Enumeration KnownTokenizerNames

Known values of LexicalTokenizerName that the service accepts.
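
These values plug in anywhere a LexicalTokenizerName is expected. A minimal sketch, assuming this enum ships in the @azure/search-documents package, where LexicalTokenizerName is an open string type rather than a closed union:

import { KnownTokenizerNames, LexicalTokenizerName } from "@azure/search-documents";

// A known value, spelled via the enum to avoid typos in the service string:
const known: LexicalTokenizerName = KnownTokenizerNames.Keyword; // "keyword_v2"

// Because LexicalTokenizerName is an open string type, a plain string literal
// is accepted as well; the enum is a convenience, not a requirement:
const other: LexicalTokenizerName = "letter";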

Enumeration Members

Classic: "classic"

Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html

EdgeNGram: "edgeNGram"
Keyword: "keyword_v2"
Letter: "letter"
Lowercase: "lowercase"
MicrosoftLanguageStemmingTokenizer: "microsoft_language_stemming_tokenizer"

Divides text using language-specific rules and reduces words to their base forms.

MicrosoftLanguageTokenizer: "microsoft_language_tokenizer"

Divides text using language-specific rules.

NGram: "nGram"
PathHierarchy: "path_hierarchy_v2"
Pattern: "pattern"

Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html

Standard: "standard_v2"

Standard Lucene tokenizer; breaks text following the Unicode Text Segmentation rules. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html

UaxUrlEmail: "uax_url_email"
Whitespace: "whitespace"