azure.ai.inference.aio package

class azure.ai.inference.aio.ChatCompletionsClient(endpoint: str, credential: AzureKeyCredential | AsyncTokenCredential, *, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: Literal['text', 'json_object'] | JsonSchemaFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)[source]

An asynchronous client for getting chat completions from an AI model deployed to an AI model inference endpoint.

Parameters:
  • endpoint (str) – Service endpoint URL for AI model inference. Required.

  • credential (AzureKeyCredential or AsyncTokenCredential) – Credential used to authenticate requests to the service. Is either an AzureKeyCredential type or an AsyncTokenCredential type. Required.

Keyword Arguments:
  • frequency_penalty (float) – A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

  • presence_penalty (float) – A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model’s likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

  • temperature (float) – The sampling temperature, which controls the apparent creativity of generated completions. Higher values make output more random, while lower values make results more focused and deterministic. It is not recommended to modify temperature and top_p in the same completions request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

  • top_p (float) – An alternative to sampling with temperature, called nucleus sampling. The model considers only the tokens comprising the given cumulative probability mass. For example, a value of 0.15 causes only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p in the same completions request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

  • max_tokens (int) – The maximum number of tokens to generate. Default value is None.

  • response_format (Union[Literal['text', 'json_object'], ~azure.ai.inference.models.JsonSchemaFormat]) – The format that the AI model must output. AI chat completions models typically output unformatted text by default, which is equivalent to setting response_format to “text”. To output JSON without adhering to any schema, set this to “json_object”. To output JSON that adheres to a provided schema, set this to an object of the class ~azure.ai.inference.models.JsonSchemaFormat. Default value is None.

  • stop (list[str]) – A collection of textual sequences that will end completions generation. Default value is None.

  • tools (list[ChatCompletionsToolDefinition]) – The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

  • tool_choice (str or ChatCompletionsToolChoicePreset or ChatCompletionsNamedToolChoice) – If specified, constrains which of the provided tools the model can use for the chat completions response. Is either a str, a ChatCompletionsToolChoicePreset, or a ChatCompletionsNamedToolChoice type. Default value is None.

  • seed (int) – If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

  • api_version (str) – The API version to use for this operation. Default value is “2024-05-01-preview”. Note that overriding this default value may result in unsupported behavior.
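
For illustration, a minimal construction sketch in the doctest style of the send_request example below, assuming a key-authenticated endpoint whose URL and key are stored in the hypothetical environment variables AZURE_AI_ENDPOINT and AZURE_AI_KEY. The sampling settings accepted here are the same ones accepted by complete() and serve as client-level defaults:

>>> import os
>>> from azure.ai.inference.aio import ChatCompletionsClient
>>> from azure.core.credentials import AzureKeyCredential
>>> client = ChatCompletionsClient(
...     endpoint=os.environ["AZURE_AI_ENDPOINT"],
...     credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
...     temperature=0.7,  # optional client-level default for complete() calls
... )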

async close() None[source]
async complete(*, messages: List[ChatRequestMessage], stream: Literal[False] = False, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: Literal['text', 'json_object'] | JsonSchemaFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) ChatCompletions[source]
async complete(*, messages: List[ChatRequestMessage], stream: Literal[True], frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: Literal['text', 'json_object'] | JsonSchemaFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) AsyncIterable[StreamingChatCompletionsUpdate]
async complete(*, messages: List[ChatRequestMessage], stream: bool | None = None, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: Literal['text', 'json_object'] | JsonSchemaFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) AsyncIterable[StreamingChatCompletionsUpdate] | ChatCompletions
async complete(body: MutableMapping[str, Any], *, content_type: str = 'application/json', **kwargs: Any) AsyncIterable[StreamingChatCompletionsUpdate] | ChatCompletions
async complete(body: IO[bytes], *, content_type: str = 'application/json', **kwargs: Any) AsyncIterable[StreamingChatCompletionsUpdate] | ChatCompletions

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or “completes” provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate (with async for) over the resulting AsyncStreamingChatCompletions object to get content updates as they arrive.

Parameters:

body (JSON or IO[bytes]) – Is either a MutableMapping[str, Any] type (like a dictionary) or an IO[bytes] type that specifies the full request payload. Required.

Keyword Arguments:
  • messages (list[ChatRequestMessage]) – The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. Required.

  • stream (bool) – A value indicating whether chat completions should be streamed for this request. Default value is False. If streaming is enabled, the response will be an AsyncStreamingChatCompletions; otherwise it will be a ChatCompletions.

  • frequency_penalty (float) – A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

  • presence_penalty (float) – A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model’s likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

  • temperature (float) – The sampling temperature, which controls the apparent creativity of generated completions. Higher values make output more random, while lower values make results more focused and deterministic. It is not recommended to modify temperature and top_p in the same completions request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

  • top_p (float) – An alternative to sampling with temperature, called nucleus sampling. The model considers only the tokens comprising the given cumulative probability mass. For example, a value of 0.15 causes only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p in the same completions request, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

  • max_tokens (int) – The maximum number of tokens to generate. Default value is None.

  • response_format (Union[Literal['text', 'json_object'], ~azure.ai.inference.models.JsonSchemaFormat]) – The format that the AI model must output. AI chat completions models typically output unformatted text by default, which is equivalent to setting response_format to “text”. To output JSON without adhering to any schema, set this to “json_object”. To output JSON that adheres to a provided schema, set this to an object of the class ~azure.ai.inference.models.JsonSchemaFormat. Default value is None.

  • stop (list[str]) – A collection of textual sequences that will end completions generation. Default value is None.

  • tools (list[ChatCompletionsToolDefinition]) – The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

  • tool_choice (str or ChatCompletionsToolChoicePreset or ChatCompletionsNamedToolChoice) – If specified, constrains which of the provided tools the model can use for the chat completions response. Is either a str, a ChatCompletionsToolChoicePreset, or a ChatCompletionsNamedToolChoice type. Default value is None.

  • seed (int) – If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

Returns:

ChatCompletions for non-streaming, or AsyncStreamingChatCompletions (an AsyncIterable[StreamingChatCompletionsUpdate]) for streaming.

Return type:

ChatCompletions or AsyncStreamingChatCompletions

Raises:

HttpResponseError
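
For illustration, a usage sketch covering both modes, assuming the client constructed in the sketch above and an enclosing async context; SystemMessage and UserMessage come from azure.ai.inference.models:

>>> from azure.ai.inference.models import SystemMessage, UserMessage
>>> # Non-streaming: a single ChatCompletions result.
>>> response = await client.complete(
...     messages=[
...         SystemMessage(content="You are a helpful assistant."),
...         UserMessage(content="How many feet are in a mile?"),
...     ],
... )
>>> print(response.choices[0].message.content)
>>> # Streaming: an async iterable of StreamingChatCompletionsUpdate.
>>> updates = await client.complete(
...     messages=[UserMessage(content="Tell me a short story.")],
...     stream=True,
... )
>>> async for update in updates:
...     if update.choices:
...         print(update.choices[0].delta.content or "", end="")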

async get_model_info(**kwargs: Any) ModelInfo[source]

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method only works with a Serverless API or Managed Compute endpoint; it does not work with GitHub Models or Azure OpenAI endpoints.

Returns:

ModelInfo. The ModelInfo is compatible with MutableMapping

Return type:

ModelInfo

Raises:

HttpResponseError
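
For illustration, a minimal sketch run inside an async context, assuming an endpoint that exposes the /info route:

>>> info = await client.get_model_info()
>>> print(info.model_name, info.model_provider_name, info.model_type)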

send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) Awaitable[AsyncHttpResponse][source]

Runs the network request through the client’s chained policies.

>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = await client.send_request(request)
>>> response
<AsyncHttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

Parameters:

request (HttpRequest) – The network request you want to make. Required.

Keyword Arguments:

stream (bool) – Whether the response payload will be streamed. Defaults to False.

Returns:

The response of your network call. Does not do error handling on your response.

Return type:

AsyncHttpResponse

class azure.ai.inference.aio.EmbeddingsClient(endpoint: str, credential: AzureKeyCredential | AsyncTokenCredential, *, dimensions: int | None = None, encoding_format: str | EmbeddingEncodingFormat | None = None, input_type: str | EmbeddingInputType | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)[source]

An asynchronous client for generating embedding vectors from text prompts.

Parameters:
  • endpoint (str) – Service endpoint URL for AI model inference. Required.

  • credential (AzureKeyCredential or AsyncTokenCredential) – Credential used to authenticate requests to the service. Is either an AzureKeyCredential type or an AsyncTokenCredential type. Required.

Keyword Arguments:
  • dimensions (int) – Optional. The number of dimensions the resulting output embeddings should have. Default value is None.

  • encoding_format (str or EmbeddingEncodingFormat) – Optional. The desired format for the returned embeddings. Known values are: “base64”, “binary”, “float”, “int8”, “ubinary”, and “uint8”. Default value is None.

  • input_type (str or EmbeddingInputType) – Optional. The type of the input. Known values are: “text”, “query”, and “document”. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

  • api_version (str) – The API version to use for this operation. Default value is “2024-05-01-preview”. Note that overriding this default value may result in unsupported behavior.
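
For illustration, a construction sketch using Microsoft Entra ID authentication, assuming the endpoint supports it; DefaultAzureCredential comes from the separate azure-identity package, and the environment variable name is hypothetical:

>>> import os
>>> from azure.ai.inference.aio import EmbeddingsClient
>>> from azure.identity.aio import DefaultAzureCredential
>>> client = EmbeddingsClient(
...     endpoint=os.environ["AZURE_AI_ENDPOINT"],
...     credential=DefaultAzureCredential(),
... )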

async close() None[source]
async embed(*, input: List[str], dimensions: int | None = None, encoding_format: str | _models.EmbeddingEncodingFormat | None = None, input_type: str | _models.EmbeddingInputType | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) _models.EmbeddingsResult[source]
async embed(body: JSON, *, content_type: str = 'application/json', **kwargs: Any) _models.EmbeddingsResult
async embed(body: IO[bytes], *, content_type: str = 'application/json', **kwargs: Any) _models.EmbeddingsResult

Return the embedding vectors for given text prompts. The method makes a REST API call to the /embeddings route on the given endpoint.

Parameters:

body (JSON or IO[bytes]) – Is either a MutableMapping[str, Any] type (like a dictionary) or an IO[bytes] type that specifies the full request payload. Required.

Keyword Arguments:
  • input (list[str]) – Input text to embed, passed as a list of strings. To embed multiple inputs in a single request, include multiple strings in the list. Required.

  • dimensions (int) – Optional. The number of dimensions the resulting output embeddings should have. Default value is None.

  • encoding_format (str or EmbeddingEncodingFormat) – Optional. The desired format for the returned embeddings. Known values are: “base64”, “binary”, “float”, “int8”, “ubinary”, and “uint8”. Default value is None.

  • input_type (str or EmbeddingInputType) – Optional. The type of the input. Known values are: “text”, “query”, and “document”. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

Returns:

EmbeddingsResult. The EmbeddingsResult is compatible with MutableMapping

Return type:

EmbeddingsResult

Raises:

HttpResponseError
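
For illustration, a usage sketch run inside an async context, assuming the client constructed above; the returned EmbeddingsResult exposes one EmbeddingItem per input in its data list:

>>> result = await client.embed(input=["first phrase", "second phrase"])
>>> for item in result.data:
...     print(f"index={item.index}, vector length={len(item.embedding)}")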

async get_model_info(**kwargs: Any) ModelInfo[source]

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method only works with a Serverless API or Managed Compute endpoint; it does not work with GitHub Models or Azure OpenAI endpoints.

Returns:

ModelInfo. The ModelInfo is compatible with MutableMapping

Return type:

ModelInfo

Raises:

HttpResponseError

send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) Awaitable[AsyncHttpResponse][source]

Runs the network request through the client’s chained policies.

>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = await client.send_request(request)
>>> response
<AsyncHttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

Parameters:

request (HttpRequest) – The network request you want to make. Required.

Keyword Arguments:

stream (bool) – Whether the response payload will be streamed. Defaults to False.

Returns:

The response of your network call. Does not do error handling on your response.

Return type:

AsyncHttpResponse

class azure.ai.inference.aio.ImageEmbeddingsClient(endpoint: str, credential: AzureKeyCredential | AsyncTokenCredential, *, dimensions: int | None = None, encoding_format: str | EmbeddingEncodingFormat | None = None, input_type: str | EmbeddingInputType | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)[source]

An asynchronous client for generating embedding vectors from images.

Parameters:
  • endpoint (str) – Service endpoint URL for AI model inference. Required.

  • credential (AzureKeyCredential or AsyncTokenCredential) – Credential used to authenticate requests to the service. Is either an AzureKeyCredential type or an AsyncTokenCredential type. Required.

Keyword Arguments:
  • dimensions (int) – Optional. The number of dimensions the resulting output embeddings should have. Default value is None.

  • encoding_format (str or EmbeddingEncodingFormat) – Optional. The desired format for the returned embeddings. Known values are: “base64”, “binary”, “float”, “int8”, “ubinary”, and “uint8”. Default value is None.

  • input_type (str or EmbeddingInputType) – Optional. The type of the input. Known values are: “text”, “query”, and “document”. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

  • api_version (str) – The API version to use for this operation. Default value is “2024-05-01-preview”. Note that overriding this default value may result in unsupported behavior.

async close() None[source]
async embed(*, input: List[_models.ImageEmbeddingInput], dimensions: int | None = None, encoding_format: str | _models.EmbeddingEncodingFormat | None = None, input_type: str | _models.EmbeddingInputType | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) _models.EmbeddingsResult[source]
async embed(body: JSON, *, content_type: str = 'application/json', **kwargs: Any) _models.EmbeddingsResult
async embed(body: IO[bytes], *, content_type: str = 'application/json', **kwargs: Any) _models.EmbeddingsResult

Return the embedding vectors for given images. The method makes a REST API call to the /images/embeddings route on the given endpoint.

Parameters:

body (JSON or IO[bytes]) – Is either a MutableMapping[str, Any] type (like a dictionary) or an IO[bytes] type that specifies the full request payload. Required.

Keyword Arguments:
  • input (list[ImageEmbeddingInput]) – Input image to embed. To embed multiple inputs in a single request, pass an array. The input must not exceed the max input tokens for the model. Required.

  • dimensions (int) – Optional. The number of dimensions the resulting output embeddings should have. Default value is None.

  • encoding_format (str or EmbeddingEncodingFormat) – Optional. The desired format for the returned embeddings. Known values are: “base64”, “binary”, “float”, “int8”, “ubinary”, and “uint8”. Default value is None.

  • input_type (str or EmbeddingInputType) – Optional. The type of the input. Known values are: “text”, “query”, and “document”. Default value is None.

  • model (str) – ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

  • model_extras (dict[str, Any]) – Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

Returns:

EmbeddingsResult. The EmbeddingsResult is compatible with MutableMapping

Return type:

EmbeddingsResult

Raises:

HttpResponseError
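
For illustration, a usage sketch run inside an async context, assuming a key-authenticated endpoint (hypothetical environment variable names) and a local file sample.png; ImageEmbeddingInput.load is the SDK helper used in the samples to read and base64-encode an image file:

>>> import os
>>> from azure.ai.inference.aio import ImageEmbeddingsClient
>>> from azure.ai.inference.models import ImageEmbeddingInput
>>> from azure.core.credentials import AzureKeyCredential
>>> client = ImageEmbeddingsClient(
...     endpoint=os.environ["AZURE_AI_ENDPOINT"],
...     credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
... )
>>> result = await client.embed(
...     input=[ImageEmbeddingInput.load(image_file="sample.png", image_format="png")],
... )
>>> for item in result.data:
...     print(f"index={item.index}, vector length={len(item.embedding)}")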

async get_model_info(**kwargs: Any) ModelInfo[source]

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method only works with a Serverless API or Managed Compute endpoint; it does not work with GitHub Models or Azure OpenAI endpoints.

Returns:

ModelInfo. The ModelInfo is compatible with MutableMapping

Return type:

ModelInfo

Raises:

HttpResponseError

send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) Awaitable[AsyncHttpResponse][source]

Runs the network request through the client’s chained policies.

>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = await client.send_request(request)
>>> response
<AsyncHttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

Parameters:

request (HttpRequest) – The network request you want to make. Required.

Keyword Arguments:

stream (bool) – Whether the response payload will be streamed. Defaults to False.

Returns:

The response of your network call. Does not do error handling on your response.

Return type:

AsyncHttpResponse

async azure.ai.inference.aio.load_client(endpoint: str, credential: AzureKeyCredential | AsyncTokenCredential, **kwargs: Any) ChatCompletionsClient | EmbeddingsClient | ImageEmbeddingsClient[source]

Load a client from a given endpoint URL. The method makes a REST API call to the /info route on the given endpoint to determine the model type, and therefore which client to instantiate. This method only works with a Serverless API or Managed Compute endpoint; it does not work with GitHub Models or Azure OpenAI endpoints. Keyword arguments are passed through to the client constructor (for example, api_version, user_agent, or logging_enable).

Parameters:
  • endpoint (str) – Service endpoint URL for AI model inference. Required.

  • credential (AzureKeyCredential or AsyncTokenCredential) – Credential used to authenticate requests to the service. Is either an AzureKeyCredential type or an AsyncTokenCredential type. Required.

Returns:

The appropriate asynchronous client associated with the given endpoint.

Return type:

ChatCompletionsClient or EmbeddingsClient or ImageEmbeddingsClient

Raises:

HttpResponseError
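
For illustration, a usage sketch run inside an async context, assuming a Serverless API or Managed Compute endpoint and hypothetical environment variable names:

>>> import os
>>> from azure.ai.inference.aio import ChatCompletionsClient, load_client
>>> from azure.core.credentials import AzureKeyCredential
>>> client = await load_client(
...     endpoint=os.environ["AZURE_AI_ENDPOINT"],
...     credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
... )
>>> # The concrete client type depends on what the /info route reports.
>>> if isinstance(client, ChatCompletionsClient):
...     print("Endpoint serves a chat completions model")
>>> await client.close()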