Enumeration KnownTokenizerNames (readonly)

Defines values for TokenizerName.

Enumeration Members

Classic

Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html

EdgeNGram
Keyword
Letter
Lowercase
MicrosoftLanguageStemmingTokenizer

Divides text using language-specific rules and reduces words to their base forms.

MicrosoftLanguageTokenizer

Divides text using language-specific rules.

NGram
PathHierarchy
Pattern

Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html

Standard

Standard Lucene tokenizer. Breaks text into tokens following the Unicode Text Segmentation rules. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html

UaxUrlEmail
Whitespace
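
Example

The following is a minimal sketch of how a member of this enumeration might be used when describing a custom analyzer. To keep it runnable without any dependency, it declares a local stand-in enum with a few of the member names listed above. The string values, the `AnalyzerSketch` interface, and the `makeAnalyzer` helper are assumptions made for illustration and are not taken from this page; check the installed package for the actual values and types.

```typescript
// Local stand-in for the SDK enum (subset of the members listed above).
// ASSUMPTION: the string values are illustrative; verify them against the package.
enum KnownTokenizerNames {
  Classic = "classic",
  Pattern = "pattern",
  Standard = "standard_v2",
  Whitespace = "whitespace",
}

// Hypothetical, simplified shape of a custom analyzer definition.
interface AnalyzerSketch {
  name: string;
  tokenizerName: string;
  tokenFilters: string[];
}

// Hypothetical helper: accepting the enum type (rather than a bare string)
// lets the compiler reject misspelled tokenizer names.
function makeAnalyzer(
  name: string,
  tokenizer: KnownTokenizerNames,
  tokenFilters: string[] = []
): AnalyzerSketch {
  return { name, tokenizerName: tokenizer, tokenFilters };
}

const analyzer = makeAnalyzer("my-analyzer", KnownTokenizerNames.Whitespace, ["lowercase"]);
console.log(JSON.stringify(analyzer));
```

Passing the enum member instead of a string literal keeps the tokenizer name in one place and makes typos a compile-time error rather than a service-side validation failure.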