Interface SplitSkill

A skill to split a string into chunks of text.

interface SplitSkill {
    azureOpenAITokenizerParameters?: AzureOpenAITokenizerParameters;
    context?: string;
    defaultLanguageCode?:
        | "da"
        | "de"
        | "en"
        | "es"
        | "fi"
        | "fr"
        | "it"
        | "ko"
        | "pt"
        | "cs"
        | "nl"
        | "hu"
        | "ja"
        | "pl"
        | "ru"
        | "sv"
        | "tr"
        | "bs"
        | "et"
        | "he"
        | "hi"
        | "hr"
        | "id"
        | "lv"
        | "nb"
        | "sk"
        | "sl"
        | "zh"
        | "is"
        | "sr"
        | "ur"
        | "am"
        | "pt-br";
    description?: string;
    inputs: InputFieldMappingEntry[];
    maximumPagesToTake?: number;
    maxPageLength?: number;
    name?: string;
    odatatype: "#Microsoft.Skills.Text.SplitSkill";
    outputs: OutputFieldMappingEntry[];
    pageOverlapLength?: number;
    textSplitMode?: "pages" | "sentences";
    unit?: string;
}
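The definition below is a minimal sketch that configures the skill for overlapping, page-based chunking. To stay self-contained, it declares trimmed local copies of the types. This is an assumption: in application code you would import SplitSkill from the @azure/search-documents package instead. The input name 'text' and output name 'textItems' are the skill's standard input and output names. The skill name, lengths, and target name 'pages' are illustrative values.

```typescript
// Trimmed local stand-ins for the SDK types (assumption: only the fields used here).
interface InputFieldMappingEntry {
  name: string;
  source?: string;
}

interface OutputFieldMappingEntry {
  name: string;
  targetName?: string;
}

interface SplitSkill {
  odatatype: "#Microsoft.Skills.Text.SplitSkill";
  name?: string;
  description?: string;
  context?: string;
  defaultLanguageCode?: string;
  textSplitMode?: "pages" | "sentences";
  maxPageLength?: number;
  pageOverlapLength?: number;
  maximumPagesToTake?: number;
  unit?: string;
  inputs: InputFieldMappingEntry[];
  outputs: OutputFieldMappingEntry[];
}

// Split /document/content into pages of at most 2000 characters,
// each overlapping the previous page by 200 characters (illustrative values).
const splitSkill: SplitSkill = {
  odatatype: "#Microsoft.Skills.Text.SplitSkill",
  name: "split-into-pages",
  description: "Chunk document content into overlapping pages",
  context: "/document",
  defaultLanguageCode: "en",
  textSplitMode: "pages",
  maxPageLength: 2000,
  pageOverlapLength: 200,
  unit: "characters",
  inputs: [{ name: "text", source: "/document/content" }],
  outputs: [{ name: "textItems", targetName: "pages" }],
};

console.log(JSON.stringify(splitSkill, null, 2));
```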

Hierarchy

BaseSearchIndexerSkill
- SplitSkill

Index

Properties

azureOpenAITokenizerParameters? context? defaultLanguageCode? description? inputs maximumPagesToTake? maxPageLength? name? odatatype outputs pageOverlapLength? textSplitMode? unit?

Properties

`Optional` azureOpenAITokenizerParameters

azureOpenAITokenizerParameters?: AzureOpenAITokenizerParameters

Applies only when unit is set to 'azureOpenAITokens'. If specified, the SplitSkill uses these parameters when tokenizing the text. The parameters consist of a valid 'encoderModelName' and an optional 'allowedSpecialTokens' property.
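The sketch below shows a token-based configuration and encodes the rule that these parameters take effect only with the 'azureOpenAITokens' unit. The types are local stand-ins, the helper tokenizerParamsAreUsed is hypothetical, and the encoder name 'cl100k_base' and the special token are illustrative; check which encoder names your SDK version accepts.

```typescript
// Local stand-in for the SDK type (assumption: field names taken from the description above).
interface AzureOpenAITokenizerParameters {
  encoderModelName?: string;
  allowedSpecialTokens?: string[];
}

interface TokenSplitSettings {
  unit?: string;
  azureOpenAITokenizerParameters?: AzureOpenAITokenizerParameters;
}

// Hypothetical helper: the tokenizer parameters only matter for the token-based unit.
function tokenizerParamsAreUsed(settings: TokenSplitSettings): boolean {
  return (
    settings.unit === "azureOpenAITokens" &&
    settings.azureOpenAITokenizerParameters !== undefined
  );
}

const tokenSettings: TokenSplitSettings = {
  unit: "azureOpenAITokens",
  azureOpenAITokenizerParameters: {
    encoderModelName: "cl100k_base", // illustrative encoder name
    allowedSpecialTokens: ["<|endoftext|>"],
  },
};

const characterSettings: TokenSplitSettings = {
  unit: "characters", // the tokenizer parameters below would be ignored
  azureOpenAITokenizerParameters: { encoderModelName: "cl100k_base" },
};

console.log(tokenizerParamsAreUsed(tokenSettings)); // true
console.log(tokenizerParamsAreUsed(characterSettings)); // false
```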

`Optional` context

context?: string

Represents the level at which operations take place, such as the document root or document content (for example, /document or /document/content). The default is /document.

`Optional` defaultLanguageCode

defaultLanguageCode?:
    | "da"
    | "de"
    | "en"
    | "es"
    | "fi"
    | "fr"
    | "it"
    | "ko"
    | "pt"
    | "cs"
    | "nl"
    | "hu"
    | "ja"
    | "pl"
    | "ru"
    | "sv"
    | "tr"
    | "bs"
    | "et"
    | "he"
    | "hi"
    | "hr"
    | "id"
    | "lv"
    | "nb"
    | "sk"
    | "sl"
    | "zh"
    | "is"
    | "sr"
    | "ur"
    | "am"
    | "pt-br"

A value indicating which language code to use. The default is 'en'.
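Because defaultLanguageCode accepts only the literal codes listed above, a value that arrives as an arbitrary string has to be narrowed first. The following is a minimal sketch of one way to do that. It falls back to 'en', the documented default. The helper toSplitLanguageCode and the lower-casing normalization are hypothetical choices, not SDK behavior.

```typescript
// Supported values, copied from the defaultLanguageCode union above.
const SPLIT_LANGUAGE_CODES = [
  "da", "de", "en", "es", "fi", "fr", "it", "ko", "pt", "cs", "nl",
  "hu", "ja", "pl", "ru", "sv", "tr", "bs", "et", "he", "hi", "hr",
  "id", "lv", "nb", "sk", "sl", "zh", "is", "sr", "ur", "am", "pt-br",
] as const;

type SplitLanguageCode = (typeof SPLIT_LANGUAGE_CODES)[number];

// Hypothetical helper: normalize a tag and fall back to "en" when it is not supported.
function toSplitLanguageCode(tag: string): SplitLanguageCode {
  const normalized = tag.trim().toLowerCase();
  const match = SPLIT_LANGUAGE_CODES.find((code) => code === normalized);
  return match ?? "en";
}

console.log(toSplitLanguageCode("FR")); // fr
console.log(toSplitLanguageCode("pt-BR")); // pt-br
console.log(toSplitLanguageCode("xx")); // en
```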

`Optional` description

description?: string

A description of the skill, covering its inputs, outputs, and usage.

inputs

inputs: InputFieldMappingEntry[]

The inputs of the skill. Each input can be a column in the source data set or the output of an upstream skill.
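The sketch below shows both kinds of input. 'text' reads a field of the source document. 'languageCode' reads a value produced upstream. The source path /document/language is an assumption: it stands for the output of a language detection skill placed earlier in the same skillset. The type is a trimmed local stand-in for the SDK's InputFieldMappingEntry.

```typescript
// Trimmed local stand-in for the SDK type.
interface InputFieldMappingEntry {
  name: string;
  source?: string;
}

const inputs: InputFieldMappingEntry[] = [
  // A column in the source data set.
  { name: "text", source: "/document/content" },
  // The output of an upstream skill (assumption: language detection writes /document/language).
  { name: "languageCode", source: "/document/language" },
];

for (const input of inputs) {
  console.log(`${input.name} <- ${input.source}`);
}
```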

`Optional` maximumPagesToTake

maximumPagesToTake?: number

Applicable only when textSplitMode is set to 'pages'. If specified, the SplitSkill stops splitting after processing the first maximumPagesToTake pages, which improves performance when only the first few pages of each document are needed.
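To illustrate why the cap saves work, the hypothetical simulation below stops its loop as soon as enough pages exist, so it never visits the rest of the text. It cuts fixed-size slices by code point. This is a simplification, and the service's actual page boundaries may differ.

```typescript
// Hypothetical simulation: fixed-size pages, stopping early once maximumPagesToTake is reached.
function takePages(text: string, maxPageLength: number, maximumPagesToTake?: number): string[] {
  const chars = Array.from(text); // split by code point
  const pages: string[] = [];
  for (let start = 0; start < chars.length; start += maxPageLength) {
    if (maximumPagesToTake !== undefined && pages.length >= maximumPagesToTake) {
      break; // stop splitting; the rest of the document is never visited
    }
    pages.push(chars.slice(start, start + maxPageLength).join(""));
  }
  return pages;
}

console.log(takePages("abcdefghij", 3)); // [ 'abc', 'def', 'ghi', 'j' ]
console.log(takePages("abcdefghij", 3, 2)); // [ 'abc', 'def' ]
```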

`Optional` maxPageLength

maxPageLength?: number

The desired maximum page length, measured in the unit selected by the unit property. The default is 10000.
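The sketch below shows what a maximum page length means in practice. It packs whole words into pages that stay within the limit. Packing at whitespace is an assumption for illustration, not the service's documented boundary logic. The helper packWords is hypothetical, and it uses JavaScript string length (UTF-16 code units) as a stand-in for characters.

```typescript
// Hypothetical sketch: pack whitespace-separated words into pages of at most maxPageLength
// characters. A single word longer than the limit becomes its own oversized page here.
function packWords(text: string, maxPageLength = 10000): string[] {
  const pages: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length <= maxPageLength || current === "") {
      current = candidate; // the word still fits on the current page
    } else {
      pages.push(current); // close the page and start a new one with this word
      current = word;
    }
  }
  if (current !== "") {
    pages.push(current);
  }
  return pages;
}

console.log(packWords("the quick brown fox jumps", 10)); // [ 'the quick', 'brown fox', 'jumps' ]
```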

`Optional` name

name?: string

The name of the skill, which uniquely identifies it within the skillset. A skill with no name is given a default name consisting of the character '#' followed by its 1-based index in the skills array.
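The default-naming rule can be written out directly. The hypothetical helper below mirrors the documented rule and shows the name each skill in an array would end up with.

```typescript
interface NamedSkill {
  name?: string;
}

// Hypothetical helper: an unnamed skill gets '#' followed by its 1-based index in the array.
function effectiveSkillNames(skills: NamedSkill[]): string[] {
  return skills.map((skill, index) => skill.name ?? `#${index + 1}`);
}

console.log(effectiveSkillNames([{ name: "split" }, {}, { name: "embed" }, {}]));
// [ 'split', '#2', 'embed', '#4' ]
```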

odatatype

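odatatype: "#Microsoft.Skills.Text.SplitSkill"

Polymorphic discriminator, which specifies the different types this object can be. For SplitSkill, the value is always '#Microsoft.Skills.Text.SplitSkill'.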
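Because odatatype is a string-literal discriminator, TypeScript can narrow a union of skill types by checking it. The sketch below uses two trimmed local stand-ins, one of them a language detection skill, to show the pattern. These stand-ins are assumptions and not the SDK's full skill union. The never-typed default branch makes the compiler flag any member that is not handled.

```typescript
// Trimmed local stand-ins (assumption): two members of a skill union, told apart by odatatype.
interface SplitSkillLike {
  odatatype: "#Microsoft.Skills.Text.SplitSkill";
  textSplitMode?: "pages" | "sentences";
}

interface LanguageDetectionSkillLike {
  odatatype: "#Microsoft.Skills.Text.LanguageDetectionSkill";
  defaultCountryHint?: string;
}

type SkillLike = SplitSkillLike | LanguageDetectionSkillLike;

function describeSkill(skill: SkillLike): string {
  switch (skill.odatatype) {
    case "#Microsoft.Skills.Text.SplitSkill":
      // Narrowed to SplitSkillLike, so textSplitMode is accessible here.
      return `split (${skill.textSplitMode ?? "mode not set"})`;
    case "#Microsoft.Skills.Text.LanguageDetectionSkill":
      return "language detection";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a member is unhandled.
      const unreachable: never = skill;
      return unreachable;
    }
  }
}

console.log(describeSkill({ odatatype: "#Microsoft.Skills.Text.SplitSkill", textSplitMode: "pages" }));
// split (pages)
```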

outputs

outputs: OutputFieldMappingEntry[]

The outputs of the skill. Each output is either a field in a search index or a value that another skill can consume as an input.
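The sketch below assumes the skill's standard output, 'textItems', is renamed to 'pages' under the '/document' context. A downstream skill could then run once per chunk by using the enrichment path '/document/pages/*' as its context. The helper perItemPath is hypothetical, and it assumes the output keeps its own name when targetName is not set.

```typescript
// Trimmed local stand-in for the SDK type.
interface OutputFieldMappingEntry {
  name: string;
  targetName?: string;
}

const skillContext = "/document";
const outputs: OutputFieldMappingEntry[] = [{ name: "textItems", targetName: "pages" }];

// Hypothetical helper: the per-item path a downstream skill would use as its context.
function perItemPath(ctx: string, output: OutputFieldMappingEntry): string {
  return `${ctx}/${output.targetName ?? output.name}/*`;
}

console.log(perItemPath(skillContext, outputs[0])); // /document/pages/*
```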

`Optional` pageOverlapLength

pageOverlapLength?: number

Applicable only when textSplitMode is set to 'pages'. If specified, the (n+1)th chunk starts with this number of characters or tokens from the end of the nth chunk.
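The overlap rule amounts to a sliding window. The window advances by maxPageLength minus pageOverlapLength, so every chunk after the first repeats the tail of the chunk before it. The hypothetical simulation below works in code points and ignores any boundary adjustments the service may make.

```typescript
// Hypothetical simulation of the overlap rule: chunk n+1 begins with the last
// pageOverlapLength characters of chunk n.
function splitWithOverlap(text: string, maxPageLength: number, pageOverlapLength = 0): string[] {
  if (pageOverlapLength >= maxPageLength) {
    throw new Error("pageOverlapLength must be smaller than maxPageLength");
  }
  const chars = Array.from(text); // split by code point
  const step = maxPageLength - pageOverlapLength;
  const pages: string[] = [];
  for (let start = 0; start < chars.length; start += step) {
    pages.push(chars.slice(start, start + maxPageLength).join(""));
    if (start + maxPageLength >= chars.length) {
      break; // this page already reaches the end of the text
    }
  }
  return pages;
}

console.log(splitWithOverlap("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```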

`Optional` textSplitMode

textSplitMode?: "pages" | "sentences"

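A value indicating which split mode to perform: 'pages' splits the text into chunks of at most maxPageLength, and 'sentences' splits it into individual sentences.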
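The sketch below contrasts the two modes on the same input. It is deliberately naive. Sentence detection is approximated with a regular expression on '.', '!' and '?', whereas the service uses its own language-aware boundaries. Pages are approximated with fixed-size slices. The helper naiveSplit is hypothetical.

```typescript
// Hypothetical illustration of the two split modes (naive approximations, not the service's logic).
function naiveSplit(text: string, mode: "pages" | "sentences", maxPageLength = 10000): string[] {
  if (mode === "sentences") {
    // A run of non-terminators followed by any terminators counts as one sentence.
    const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
    return matches.map((s) => s.trim()).filter((s) => s.length > 0);
  }
  const pages: string[] = [];
  for (let start = 0; start < text.length; start += maxPageLength) {
    pages.push(text.slice(start, start + maxPageLength));
  }
  return pages;
}

const sample = "Hello world. How are you? Fine!";
console.log(naiveSplit(sample, "sentences")); // [ 'Hello world.', 'How are you?', 'Fine!' ]
console.log(naiveSplit(sample, "pages", 16)); // [ 'Hello world. How', ' are you? Fine!' ]
```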

`Optional` unit

unit?: string

Applies only when textSplitMode is set to 'pages'. There are two possible values, 'characters' and 'azureOpenAITokens', and the choice determines how lengths (maxPageLength and pageOverlapLength) are measured. The default is 'characters', which measures length in characters.
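Several settings on this page apply only in 'pages' mode. The hypothetical helpers below make those rules checkable. One resolves the unit that measures lengths and applies the 'characters' default. The other lists pages-only settings that would have no effect in 'sentences' mode. Making textSplitMode required in the local type is an assumption, added to keep the sketch unambiguous.

```typescript
type SplitMode = "pages" | "sentences";

interface SplitSettings {
  textSplitMode: SplitMode; // required here for simplicity (assumption)
  unit?: string;
  pageOverlapLength?: number;
  maximumPagesToTake?: number;
}

// Hypothetical helper: the unit that measures maxPageLength and pageOverlapLength,
// or undefined in "sentences" mode, where unit does not apply.
function effectiveUnit(settings: SplitSettings): string | undefined {
  return settings.textSplitMode === "pages" ? (settings.unit ?? "characters") : undefined;
}

// Hypothetical helper: pages-only settings that were set but will have no effect.
function ignoredPageOnlySettings(settings: SplitSettings): string[] {
  if (settings.textSplitMode === "pages") {
    return [];
  }
  const pageOnly: (keyof SplitSettings)[] = ["unit", "pageOverlapLength", "maximumPagesToTake"];
  return pageOnly.filter((key) => settings[key] !== undefined);
}

console.log(effectiveUnit({ textSplitMode: "pages" })); // characters
console.log(effectiveUnit({ textSplitMode: "pages", unit: "azureOpenAITokens" })); // azureOpenAITokens
console.log(ignoredPageOnlySettings({ textSplitMode: "sentences", pageOverlapLength: 100 }));
// [ 'pageOverlapLength' ]
```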
