Analyzer configuration
Analyzers control how the content of a document is broken into 'terms' (words) during the indexing process. For example, an analyzer can remove common words, reduce plurals to their singular form, and perform other language-specific operations to improve search quality.
<analyzer>
    <class>...</class>
    <stemmer>...</stemmer>
    <locale>...</locale>
</analyzer>
Configuration nodes
The following nodes are used to specify an analyzer:
- the <locale> node specifies the locale, such as "de" or "en". An index configuration node uses this locale to select the appropriate analyzer for the contents of an index.
- the <class> node specifies the package/class name of the analyzer class.
- the <stemmer> node is used to specify the stemmer algorithm of the analyzer.
Available analyzers
Currently, these analyzers are part of the OpenCms search package:
- org.apache.lucene.analysis.de.GermanAnalyzer
  Analyzer for German language content.
- org.apache.lucene.analysis.ru.RussianAnalyzer
  Analyzer for Russian language content.
- org.apache.lucene.analysis.standard.StandardAnalyzer
  Analyzer for English and other language content.
- org.apache.lucene.analysis.snowball.SnowballAnalyzer
  Analyzer for various languages; see the Snowball homepage.
  For this analyzer, the language is specified using the additional <stemmer> node, with one of the following values: Danish, Dutch, English, Finnish, French, German, Italian, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish
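As a further illustration, analyzers for several locales might be declared side by side, with one <analyzer> node per locale. This is only a sketch: it assumes that the <stemmer> node can be left out for analyzers other than the SnowballAnalyzer, and it does not show the enclosing configuration element.

```xml
<!-- Sketch: one analyzer per locale. Leaving out <stemmer> here is an assumption. -->
<analyzer>
    <class>org.apache.lucene.analysis.de.GermanAnalyzer</class>
    <locale>de</locale>
</analyzer>
<analyzer>
    <class>org.apache.lucene.analysis.standard.StandardAnalyzer</class>
    <locale>en</locale>
</analyzer>
```

With a setup like this, an index configured for the locale "de" would be analyzed with the GermanAnalyzer, and one configured for "en" with the StandardAnalyzer.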
Example
This example shows how to configure an analyzer for French language content:
<analyzer>
    <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
    <stemmer>French</stemmer>
    <locale>fr</locale>
</analyzer>