Analyzer configuration

Analyzers control how the content of a document is broken into 'terms' (words) during indexing. For example, an analyzer can remove common words, normalize plurals to their singular form, and perform other language-specific operations to improve search quality.

<analyzer>
	<class>...</class>
	<stemmer>...</stemmer>
	<locale>...</locale>
</analyzer>

Configuration nodes

The following nodes are used to specify an analyzer:

  • the <locale> node specifies the locale, such as "de" or "en". An index configuration node uses this locale to select the appropriate analyzer for the contents of an index.
  • the <class> node specifies the package/class name of the analyzer class.
  • the <stemmer> node is used to specify the stemmer algorithm of the analyzer.

Available analyzers

Currently, these analyzers are part of the OpenCms search package:

  • org.apache.lucene.analysis.de.GermanAnalyzer
    Analyzer for German language content.
  • org.apache.lucene.analysis.ru.RussianAnalyzer
    Analyzer for Russian language content.
  • org.apache.lucene.analysis.standard.StandardAnalyzer
    Analyzer for English and other language content.
  • org.apache.lucene.analysis.snowball.SnowballAnalyzer
    Analyzer for various languages, see the Snowball homepage.
    For this analyzer, the language is specified using the additional <stemmer> node, with one of the following values: Danish, Dutch, English, Finnish, French, German, Italian, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish.
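
For the analyzers that are tied to a single language, a configuration might look like the following sketch. It assumes that the <stemmer> node can be left out when the analyzer class does not take a stemmer parameter; this page does not state that explicitly, so verify it against your OpenCms version.

<analyzer>
	<!-- Assumption: no <stemmer> node is needed here, since GermanAnalyzer is already language-specific -->
	<class>org.apache.lucene.analysis.de.GermanAnalyzer</class>
	<locale>de</locale>
</analyzer>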

Example

This example shows how to configure an analyzer for French language content:

<analyzer>
	<class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
	<stemmer>French</stemmer>
	<locale>fr</locale>
</analyzer>