azure.ai.ml.data_transfer package¶
- class azure.ai.ml.data_transfer.DataTransferCopy(*, component: str | DataTransferCopyComponent, compute: str | None = None, inputs: Dict[str, NodeOutput | Input | str] | None = None, outputs: Dict[str, str | Output] | None = None, data_copy_mode: str | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Base class for data transfer copy node.
You should not instantiate this class directly. Instead, create one with the builder function copy_data.
- Parameters:
component (Union[str, DataTransferCopyComponent]) – ID or instance of the data transfer component/job to be run in the step.
inputs (Dict[str, Union[NodeOutput, Input, str]]) – Inputs to the data transfer.
outputs (Dict[str, Union[str, Output, dict]]) – Mapping of output data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default will be set to the current directory name.
compute (str) – The compute target the job runs on.
data_copy_mode (str) – Data copy mode of the copy task. Possible values are "merge_with_overwrite" and "fail_if_conflict".
- Raises:
ValidationException – Raised if DataTransferCopy cannot be successfully validated. Details will be provided in the error message.
- clear() None. Remove all items from D.¶
- copy() a shallow copy of D¶
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dumps the job content into a file in YAML format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Raises:
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
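Example:
A minimal sketch of dumping a node to YAML, assuming copy_node is a DataTransferCopy built with copy_data:
# Write to a new file path; raises FileExistsError if the file already exists.
copy_node.dump("./copy_step.yml")

# Or write to an already-open, writable stream.
with open("./copy_step_2.yml", "w") as f:
    copy_node.dump(f)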
- fromkeys(iterable, value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() a set-like object providing a view on D's items¶
- keys() a set-like object providing a view on D's keys¶
- pop(k[, d]) v, remove specified key and return the corresponding value.¶
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() an object providing a view on D's values¶
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property status: str | None¶
The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns:
Status of the job.
- Return type:
Optional[str]
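Example:
A minimal sketch of reading the status after submission, assuming an MLClient named ml_client and a submitted pipeline job pipeline_job:
# Refresh the job from the service and inspect its status.
returned_job = ml_client.jobs.get(pipeline_job.name)
print(returned_job.status)  # e.g. "Running" or "Completed"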
- class azure.ai.ml.data_transfer.DataTransferCopyComponent(*, data_copy_mode: str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
DataTransfer copy component version, used to define a data transfer copy component.
- Parameters:
data_copy_mode (str) – Data copy mode in the copy task. Possible values are “merge_with_overwrite” and “fail_if_conflict”.
inputs (dict) – Mapping of input data bindings used in the job.
outputs (dict) – Mapping of output data bindings used in the job.
kwargs – Additional parameters for the data transfer copy component.
- Raises:
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
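Example:
A minimal sketch of defining a copy component in code; the component name and the port names "folder1" and "output_folder" are illustrative assumptions (components of this kind are also commonly defined in YAML and loaded with load_component):
from azure.ai.ml import Input, Output
from azure.ai.ml.data_transfer import DataTransferCopyComponent

# Hypothetical component definition; replace names and ports with your own.
copy_component = DataTransferCopyComponent(
    name="copy_uri_folder",
    display_name="Copy URI folder",
    inputs={"folder1": Input(type="uri_folder")},
    outputs={"output_folder": Output(type="uri_folder")},
    data_copy_mode="merge_with_overwrite",
)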
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dump the component content into a file in yaml format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property data_copy_mode: str | None¶
Data copy mode of the component.
- Returns:
Data copy mode of the component.
- Return type:
Optional[str]
- property display_name: str | None¶
Display name of the component.
- Returns:
Display name of the component.
- Return type:
Optional[str]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property is_deterministic: bool | None¶
Whether the component is deterministic.
- Returns:
Whether the component is deterministic
- Return type:
Optional[bool]
- property task: str | None¶
Task type of the component.
- Returns:
Task type of the component.
- Return type:
Optional[str]
- class azure.ai.ml.data_transfer.DataTransferExport(*, component: str | DataTransferCopyComponent | DataTransferImportComponent, compute: str | None = None, sink: Dict | Database | FileSystem | None = None, inputs: Dict[str, NodeOutput | Input | str] | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Base class for data transfer export node.
You should not instantiate this class directly. Instead, create one with the builder function export_data.
- Parameters:
component (str) – ID of the built-in data transfer component to be run in the step.
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (Dict[str, Union[NodeOutput, Input, str]]) – Mapping of input data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default will be set to the current directory name.
compute (str) – The compute target the job runs on.
- Raises:
ValidationException – Raised if DataTransferExport cannot be successfully validated. Details will be provided in the error message.
- clear() None. Remove all items from D.¶
- copy() a shallow copy of D¶
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dumps the job content into a file in YAML format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Raises:
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
- fromkeys(iterable, value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() a set-like object providing a view on D's items¶
- keys() a set-like object providing a view on D's keys¶
- pop(k[, d]) v, remove specified key and return the corresponding value.¶
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() an object providing a view on D's values¶
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property sink: Dict | Database | FileSystem | None¶
The sink of external data and databases.
- Returns:
The sink of external data and databases.
- Return type:
Optional[Union[Dict, Database, FileSystem]]
- property status: str | None¶
The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns:
Status of the job.
- Return type:
Optional[str]
- class azure.ai.ml.data_transfer.DataTransferExportComponent(*, inputs: Dict | None = None, sink: Dict | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
DataTransfer export component version, used to define a data transfer export component.
- Parameters:
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (dict) – Mapping of input data bindings used in the job.
kwargs – Additional parameters for the data transfer export component.
- Raises:
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dump the component content into a file in yaml format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property display_name: str | None¶
Display name of the component.
- Returns:
Display name of the component.
- Return type:
Optional[str]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property is_deterministic: bool | None¶
Whether the component is deterministic.
- Returns:
Whether the component is deterministic
- Return type:
Optional[bool]
- property task: str | None¶
Task type of the component.
- Returns:
Task type of the component.
- Return type:
Optional[str]
- class azure.ai.ml.data_transfer.DataTransferImport(*, component: str | DataTransferImportComponent, compute: str | None = None, source: Dict | Database | FileSystem | None = None, outputs: Dict[str, str | Output] | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Base class for data transfer import node.
You should not instantiate this class directly. Instead, create one with the builder function import_data.
- Parameters:
component (str) – ID of the built-in data transfer component to be run in the step.
source (Union[Dict, Database, FileSystem]) – The data source, a file system or a database.
outputs (Dict[str, Union[str, Output, dict]]) – Mapping of output data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default will be set to the current directory name.
compute (str) – The compute target the job runs on.
- Raises:
ValidationException – Raised if DataTransferImport cannot be successfully validated. Details will be provided in the error message.
- clear() None. Remove all items from D.¶
- copy() a shallow copy of D¶
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dumps the job content into a file in YAML format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Raises:
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
- fromkeys(iterable, value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() a set-like object providing a view on D's items¶
- keys() a set-like object providing a view on D's keys¶
- pop(k[, d]) v, remove specified key and return the corresponding value.¶
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() an object providing a view on D's values¶
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property status: str | None¶
The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns:
Status of the job.
- Return type:
Optional[str]
- class azure.ai.ml.data_transfer.DataTransferImportComponent(*, source: Dict | None = None, outputs: Dict | None = None, **kwargs: Any)[source]¶
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
DataTransfer import component version, used to define a data transfer import component.
- Parameters:
source (dict) – The source of external data and databases.
outputs (dict) – Mapping of output data bindings used in the job.
kwargs – Additional parameters for the data transfer import component.
- Raises:
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
- dump(dest: str | PathLike | IO, **kwargs: Any) None¶
Dump the component content into a file in yaml format.
- Parameters:
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property base_path: str¶
The base path of the resource.
- Returns:
The base path of the resource.
- Return type:
str
- property creation_context: SystemData | None¶
The creation context of the resource.
- Returns:
The creation metadata for the resource.
- Return type:
Optional[SystemData]
- property display_name: str | None¶
Display name of the component.
- Returns:
Display name of the component.
- Return type:
Optional[str]
- property id: str | None¶
The resource ID.
- Returns:
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type:
Optional[str]
- property is_deterministic: bool | None¶
Whether the component is deterministic.
- Returns:
Whether the component is deterministic
- Return type:
Optional[bool]
- property task: str | None¶
Task type of the component.
- Returns:
Task type of the component.
- Return type:
Optional[str]
- class azure.ai.ml.data_transfer.Database(*, query: str | None = None, table_name: str | None = None, stored_procedure: str | None = None, stored_procedure_params: List[Dict] | None = None, connection: str | None = None)[source]¶
Define a database class for a DataTransfer Component or Job.
- Keyword Arguments:
query (str) – The SQL query to retrieve data from the database.
table_name (str) – The name of the database table.
stored_procedure (str) – The name of the stored procedure.
stored_procedure_params (List) – The parameters for the stored procedure.
connection (str) – The connection string for the database. The credential information should be stored in the connection.
- Raises:
ValidationException – Raised if the Database object cannot be successfully validated. Details will be provided in the error message.
Example:
Create a database source for querying a database table or invoking a stored procedure.
from azure.ai.ml.entities._inputs_outputs import Database

# For querying a database table
source_database = Database(
    query="SELECT * FROM my_table",
    connection="azureml:my_azuresql_connection",
)

# For invoking a stored procedure with parameters
stored_procedure_params = [
    {"name": "job", "value": "Engineer", "type": "String"},
    {"name": "department", "value": "Engineering", "type": "String"},
]
source_database = Database(
    stored_procedure="SelectEmployeeByJobAndDepartment",
    stored_procedure_params=stored_procedure_params,
    connection="azureml:my_azuresql_connection",
)
- class azure.ai.ml.data_transfer.FileSystem(*, path: str | None = None, connection: str | None = None)[source]¶
Define a file system class for a DataTransfer Component or Job.
e.g. source_s3 = FileSystem(path='s3://my_bucket/my_folder', connection='azureml:my_s3_connection')
- Parameters:
path (str) – The path in the file system, e.g. 's3://my_bucket/my_folder'.
connection (str) – The workspace connection for the file system. The credential information should be stored in the connection.
- Raises:
ValidationException – Raised if the FileSystem object cannot be successfully validated. Details will be provided in the error message.
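Example:
A minimal sketch of importing a folder from S3 with the import_data builder documented below; the S3 path, connection name, and compute name are assumptions, and the uri_folder output type follows the typical file-system import pattern:
from azure.ai.ml import Output, dsl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.data_transfer import FileSystem, import_data

@dsl.pipeline(description="Import a folder from S3")
def s3_import_pipeline():
    import_node = import_data(
        source=FileSystem(path="s3://my_bucket/my_folder", connection="azureml:my_s3_connection"),
        outputs={"sink": Output(type=AssetTypes.URI_FOLDER)},
        compute="my_data_transfer_compute",  # assumed data transfer compute target
    )
    return {"imported_folder": import_node.outputs.sink}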
- azure.ai.ml.data_transfer.copy_data(*, name: str | None = None, description: str | None = None, tags: Dict | None = None, display_name: str | None = None, experiment_name: str | None = None, compute: str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, is_deterministic: bool = True, data_copy_mode: str | None = None, **kwargs: Any) DataTransferCopy[source]¶
Note
This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Create a DataTransferCopy object which can be used inside dsl.pipeline as a function.
- Keyword Arguments:
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
inputs (dict) – Mapping of input data bindings used in the job.
outputs (dict) – Mapping of output data bindings used in the job.
is_deterministic (bool) – Specifies whether the command will return the same output given the same input. If a command (component) is deterministic, then when it is used as a node/step in a pipeline, it will reuse results from a previously submitted job in the current workspace that has the same inputs and settings. In that case, the step will not use any compute resources. Defaults to True; specify is_deterministic=False if you would like to avoid such reuse behavior.
data_copy_mode (str) – Data copy mode of the copy task. Possible values are "merge_with_overwrite" and "fail_if_conflict".
- Returns:
A DataTransferCopy object.
- Return type:
DataTransferCopy
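Example:
A minimal sketch of using copy_data as a node inside a dsl.pipeline; the compute name and the port names "folder1" and "output_folder" are illustrative assumptions:
from azure.ai.ml import Input, Output, dsl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.data_transfer import copy_data

@dsl.pipeline(description="Copy a folder between storage locations")
def copy_pipeline(source_folder: Input):
    copy_node = copy_data(
        inputs={"folder1": source_folder},
        outputs={"output_folder": Output(type=AssetTypes.URI_FOLDER)},
        data_copy_mode="merge_with_overwrite",
        compute="my_data_transfer_compute",  # assumed data transfer compute target
    )
    return {"copied_folder": copy_node.outputs.output_folder}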
- azure.ai.ml.data_transfer.export_data(*, name: str | None = None, description: str | None = None, tags: Dict | None = None, display_name: str | None = None, experiment_name: str | None = None, compute: str | None = None, sink: Dict | Database | FileSystem | None = None, inputs: Dict | None = None, **kwargs: Any) DataTransferExport[source]¶
Note
This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Create a DataTransferExport object which can be used inside dsl.pipeline.
- Keyword Arguments:
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (dict) – Mapping of input data bindings used in the job.
- Returns:
A DataTransferExport object.
- Return type:
DataTransferExport
- Raises:
ValidationException – Raised if sink is not provided or if exporting to a file system is not supported.
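Example:
A minimal sketch of exporting tabular data to an Azure SQL table inside a dsl.pipeline; the connection name, table name, compute name, and the input port key "source" are assumptions:
from azure.ai.ml import Input, dsl
from azure.ai.ml.data_transfer import Database, export_data

@dsl.pipeline(description="Export a table to Azure SQL")
def export_pipeline(table_data: Input):
    export_data(
        inputs={"source": table_data},
        sink=Database(table_name="dbo.my_table", connection="azureml:my_azuresql_connection"),
        compute="my_data_transfer_compute",  # assumed data transfer compute target
    )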
- azure.ai.ml.data_transfer.import_data(*, name: str | None = None, description: str | None = None, tags: Dict | None = None, display_name: str | None = None, experiment_name: str | None = None, compute: str | None = None, source: Dict | Database | FileSystem | None = None, outputs: Dict | None = None, **kwargs: Any) DataTransferImport[source]¶
Note
This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Create a DataTransferImport object which can be used inside dsl.pipeline.
- Keyword Arguments:
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
source (Union[Dict, Database, FileSystem]) – The data source of file system or database.
outputs (dict) – Mapping of output data bindings used in the job. The default will be an output port with the key "sink" and type "mltable".
- Returns:
A DataTransferImport object.
- Return type:
DataTransferImport
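Example:
A minimal sketch of importing the result of a SQL query into the workspace; the connection and compute names are assumptions, and the output uses the default "sink" port described above:
from azure.ai.ml import Output, dsl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.data_transfer import Database, import_data

@dsl.pipeline(description="Import a SQL query result")
def sql_import_pipeline():
    import_node = import_data(
        source=Database(query="SELECT * FROM my_table", connection="azureml:my_azuresql_connection"),
        outputs={"sink": Output(type=AssetTypes.MLTABLE)},
        compute="my_data_transfer_compute",  # assumed data transfer compute target
    )
    return {"imported_table": import_node.outputs.sink}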