azure.storage.filedatalake.aio package

class azure.storage.filedatalake.aio.DataLakeDirectoryClient(account_url: str, file_system_name: str, directory_name: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | AsyncTokenCredential | None = None, **kwargs: Any)[source]

A client to interact with the DataLake directory, even if the directory may not yet exist.

For operations relating to a specific subdirectory or file under the directory, a directory client or file client can be retrieved using the get_sub_directory_client() or get_file_client() functions.

Variables:
  • url (str) – The full endpoint URL to the file system, including SAS token if used.

  • primary_endpoint (str) – The full primary endpoint URL.

  • primary_hostname (str) – The hostname of the primary endpoint.

Parameters:
  • account_url (str) – The URI to the storage account.

  • file_system_name (str) – The file system for the directory or files.

  • directory_name (str) – The whole path of the directory, e.g. {directory under file system}/{directory to interact with}.

  • credential (AzureNamedKeyCredential or AzureSasCredential or AsyncTokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredential class from azure.identity. If the resource URI already contains a SAS token, it will be ignored in favor of an explicit credential, except in the case of AzureSasCredential, where the conflicting SAS tokens will raise a ValueError. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:
  • api_version (str) – The Storage API version to use for requests. Default value is the most recent service version that is compatible with the current SDK. Setting to an older version may result in reduced feature compatibility.

  • audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Example:

Creating the DataLakeDirectoryClient from a connection string.

from azure.storage.filedatalake.aio import DataLakeDirectoryClient
directory_client = DataLakeDirectoryClient.from_connection_string(connection_string, "myfilesystem", "mydirectory")

classmethod from_connection_string(conn_str: str, file_system_name: str, directory_name: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any) Self[source]

Create DataLakeDirectoryClient from a Connection String.

Parameters:
  • conn_str (str) – A connection string to an Azure Storage account.

  • file_system_name (str) – The name of the file system to interact with.

  • credential (AzureNamedKeyCredential or AzureSasCredential or TokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredential class from azure.identity. If the resource URI already contains a SAS token, it will be ignored in favor of an explicit credential, except in the case of AzureSasCredential, where the conflicting SAS tokens will raise a ValueError. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

  • directory_name (str) – The name of the directory to interact with. The directory is under the file system.

Keyword Arguments:

audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Returns:

A DataLakeDirectoryClient.

Return type:

DataLakeDirectoryClient
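As a minimal sketch, the client returned by from_connection_string() can be used as an async context manager so that its sockets are closed automatically. The helper name directory_exists and the names "myfilesystem" and "mydirectory" are illustrative assumptions, and the import is deferred into the function only so that the sketch loads without the SDK installed.

```python
import asyncio


async def directory_exists(connection_string: str) -> bool:
    """Return True if "mydirectory" exists in "myfilesystem" (illustrative names)."""
    # Deferred import: an assumption made so this sketch loads without the SDK.
    from azure.storage.filedatalake.aio import DataLakeDirectoryClient

    directory_client = DataLakeDirectoryClient.from_connection_string(
        connection_string, "myfilesystem", "mydirectory"
    )
    # Leaving the "async with" block closes the client's sockets,
    # so an explicit close() call is not needed.
    async with directory_client:
        return await directory_client.exists()


# Usage (needs a real connection string, so it is left commented out):
# asyncio.run(directory_exists("<connection string>"))
```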

async acquire_lease(lease_duration: int | None = -1, lease_id: str | None = None, **kwargs) DataLakeLeaseClient

Requests a new lease. If the file or directory does not have an active lease, the DataLake service creates a lease on the file/directory and returns a new lease ID.

Parameters:
  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) – Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

A DataLakeLeaseClient object that can be used as a context manager.

Return type:

DataLakeLeaseClient
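The lease_duration rule described above (either -1 for an infinite lease, or 15 to 60 seconds) can be checked before calling acquire_lease(). This is only a sketch: validate_lease_duration is a hypothetical client-side helper, not part of the SDK, and the service performs its own validation regardless.

```python
def validate_lease_duration(lease_duration: int = -1) -> int:
    """Return lease_duration if it is -1 (infinite) or between 15 and 60 seconds."""
    if lease_duration == -1 or 15 <= lease_duration <= 60:
        return lease_duration
    raise ValueError(
        f"lease_duration must be -1 or between 15 and 60 seconds, got {lease_duration}"
    )
```

A validated value could then be passed as acquire_lease(lease_duration=validate_lease_duration(30)).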

async close() None

Closes the sockets opened by the client. Calling it is unnecessary when the client is used as a context manager.

async create_directory(metadata: Dict[str, str] | None = None, **kwargs) Dict[str, str | datetime][source]

Create a new directory.

Parameters:

metadata (dict(str, str)) – Name-value pairs associated with the directory as metadata.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u (the bitwise AND of p with the complement of u), where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

A dictionary of response headers.

Return type:

dict[str, str] or dict[str, datetime]
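The umask arithmetic described for the umask keyword can be reproduced locally to predict the resulting permission. The sketch below is a plain-Python illustration; apply_umask is a hypothetical helper and does not call the service.

```python
def apply_umask(permission: str, umask: str = "0027") -> str:
    """Return the permission left after masking, in 4-digit octal notation."""
    p = int(permission, 8)  # e.g. "0777" -> 0o777
    u = int(umask, 8)       # e.g. "0057" -> 0o057
    # Clear every bit of p that is set in u, then keep the 12 permission bits.
    return format(p & ~u & 0o7777, "04o")


print(apply_umask("0777", "0057"))  # the example from the text: 0720
print(apply_umask("0777"))          # directory default with the default umask
print(apply_umask("0666"))          # file default with the default umask
```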

Example:

Create directory.

await directory_client.create_directory()

async create_file(file: FileProperties | str, **kwargs) DataLakeFileClient[source]

Create a new file and return the file client to be interacted with.

Parameters:

file (str or FileProperties) – The file with which to interact. This can either be the name of the file, or an instance of FileProperties.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • metadata (dict(str, str)) – Name-value pairs associated with the file as metadata.

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u (the bitwise AND of p with the complement of u), where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • expires_on (datetime or int) – The time at which the file expires. If expires_on is an int, the expiration time is set as the number of milliseconds elapsed from the creation time. If expires_on is a datetime, the expiration time is set to the absolute time provided. If no time zone info is provided, the value is interpreted as UTC.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

DataLakeFileClient with the new file.

Return type:

DataLakeFileClient
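The acl format “[scope:][type]:[id]:[permissions]” can be easier to follow with a small parser. The sketch below is illustrative only: parse_acl and AccessControlEntry are hypothetical names, and the sample string (including the "default" scope and the id "1234") is an assumed example rather than output from the service.

```python
from typing import List, NamedTuple, Optional


class AccessControlEntry(NamedTuple):
    scope: Optional[str]  # None when the optional scope prefix is absent
    ace_type: str
    identifier: str       # empty string when no user or group id is given
    permissions: str


def parse_acl(acl: str) -> List[AccessControlEntry]:
    """Split a comma-separated ACL string into its access control entries."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        if len(parts) == 4:
            scope, ace_type, identifier, permissions = parts
        elif len(parts) == 3:
            scope = None
            ace_type, identifier, permissions = parts
        else:
            raise ValueError(f"Malformed access control entry: {raw!r}")
        entries.append(AccessControlEntry(scope, ace_type, identifier, permissions))
    return entries


entries = parse_acl("user::rwx,group::r-x,other::r--,default:user:1234:rwx")
```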

async create_sub_directory(sub_directory: DirectoryProperties | str, metadata: Dict[str, str] | None = None, **kwargs) DataLakeDirectoryClient[source]

Create a subdirectory and return the subdirectory client to be interacted with.

Parameters:
  • sub_directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

  • metadata (dict(str, str)) – Name-value pairs associated with the subdirectory as metadata.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u (the bitwise AND of p with the complement of u), where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

DataLakeDirectoryClient for the subdirectory.

Return type:

DataLakeDirectoryClient

async delete_directory(**kwargs) None[source]

Marks the specified directory for deletion.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

None.

Return type:

None
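The time zone rule stated for if_modified_since and if_unmodified_since (naive datetimes are assumed to be UTC, aware ones are converted to UTC) can be mirrored locally. This is a sketch of the documented rule, not the SDK's internal code; to_utc is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value: datetime) -> datetime:
    """Normalize a datetime the way the parameter description states."""
    if value.tzinfo is None:
        # No time zone info: the value is assumed to already be UTC.
        return value.replace(tzinfo=timezone.utc)
    # Time zone info present: convert to UTC.
    return value.astimezone(timezone.utc)


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 14, 0, tzinfo=timezone(timedelta(hours=2)))
# Both values describe the same instant once normalized.
same_instant = to_utc(naive) == to_utc(aware)
```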

Example:

Delete directory.

await new_directory.delete_directory()

async delete_sub_directory(sub_directory: DirectoryProperties | str, **kwargs) DataLakeDirectoryClient[source]

Marks the specified subdirectory for deletion.

Parameters:

sub_directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

DataLakeDirectoryClient for the subdirectory.

Return type:

DataLakeDirectoryClient

async exists(**kwargs: Any) bool[source]

Returns True if a directory exists and returns False otherwise.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

True if a directory exists, False otherwise.

Return type:

bool
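A common pattern is to combine exists() with create_directory() so that the directory is created only when it is missing. The sketch below assumes a client object exposing those two documented coroutines; ensure_directory and _FakeDirectoryClient are hypothetical names, and the fake is an in-memory stand-in used only to make the example runnable without a storage account.

```python
import asyncio


async def ensure_directory(directory_client) -> bool:
    """Create the directory if it is missing; return True if it was created."""
    if await directory_client.exists():
        return False
    await directory_client.create_directory()
    return True


class _FakeDirectoryClient:
    """Hypothetical in-memory stand-in for DataLakeDirectoryClient."""

    def __init__(self) -> None:
        self.created = False

    async def exists(self, **kwargs) -> bool:
        return self.created

    async def create_directory(self, metadata=None, **kwargs) -> dict:
        self.created = True
        return {}


fake = _FakeDirectoryClient()
first_call = asyncio.run(ensure_directory(fake))   # directory was missing
second_call = asyncio.run(ensure_directory(fake))  # directory now exists
```

The check and the creation are two separate requests, so another writer could create the directory in between; conditional headers such as etag and match_condition are the documented way to guard against that.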

async get_access_control(upn: bool | None = None, **kwargs) Dict[str, Any]

Get the owner, group, permissions, or access control list for a path.

Parameters:

upn (bool) – Optional. Valid only when Hierarchical Namespace is enabled for the account. If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If False, the values will be returned as Azure Active Directory Object IDs. The default value is False. Note that group and application Object IDs are not translated because they do not have unique friendly names.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

Response dict containing access control options (ETag and last modified).

Return type:

dict[str, str] or dict[str, datetime]

async get_directory_properties(**kwargs: Any) DirectoryProperties[source]

Returns all user-defined metadata, standard HTTP properties, and system properties for the directory. It does not return the content of the directory.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the directory or file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the directory was created with a customer-provided key.

  • upn (bool) – If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names in the owner, group, and acl fields of DirectoryProperties. If False, the values will be returned as Azure Active Directory Object IDs. The default value is False. Note that group and application Object IDs are not translated because they do not have unique friendly names.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

Information including user-defined metadata, standard HTTP properties, and system properties for the file or directory.

Return type:

DirectoryProperties

Example:

Getting the properties for a file/directory.

props = await new_directory.get_directory_properties()

get_file_client(file: FileProperties | str) DataLakeFileClient[source]

Get a client to interact with the specified file.

The file need not already exist.

Parameters:

file (str or FileProperties) – The file with which to interact. This can either be the name of the file, or an instance of FileProperties, e.g. directory/subdirectory/file.

Returns:

A DataLakeFileClient.

Return type:

DataLakeFileClient

get_paths(*, recursive: bool = True, max_results: int | None = None, upn: bool | None = None, timeout: int | None = None, **kwargs: Any) AsyncItemPaged[PathProperties][source]

Returns an async generator to list the paths under the specified file system and directory. The generator will lazily follow the continuation tokens returned by the service.

Keyword Arguments:
  • recursive (bool) – Set to True to list paths recursively, or False to list only the immediate children of the directory. The default value is True.

  • max_results (Optional[int]) – An optional value that specifies the maximum number of items to return per page. If omitted or greater than 5,000, the response will include up to 5,000 items per page.

  • upn (Optional[bool]) – If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names in the owner, group, and acl fields of PathProperties. If False, the values will be returned as Azure Active Directory Object IDs. The default value is None. Note that group and application Object IDs are not translated because they do not have unique friendly names.

  • timeout (Optional[int]) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately. The default value is None.

Returns:

An async iterable (auto-paging) response of PathProperties.

Return type:

AsyncItemPaged[PathProperties]
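Because get_paths() returns an async iterable rather than a coroutine, it is consumed with "async for" and is not awaited itself. The sketch below assumes each yielded item has a name attribute, as PathProperties does; list_path_names and _FakeDirectoryClient are hypothetical names, and the fake and its paths are invented so the example runs without a storage account.

```python
import asyncio
from types import SimpleNamespace


async def list_path_names(directory_client, recursive: bool = True) -> list:
    """Collect the names of all paths yielded by get_paths()."""
    names = []
    # No "await" on get_paths(): paging happens lazily during iteration.
    async for path in directory_client.get_paths(recursive=recursive):
        names.append(path.name)
    return names


class _FakeDirectoryClient:
    """Hypothetical stand-in that yields objects with a name attribute."""

    def get_paths(self, *, recursive: bool = True, **kwargs):
        async def _paths():
            yield SimpleNamespace(name="mydirectory/a.txt")
            yield SimpleNamespace(name="mydirectory/sub")
            if recursive:
                yield SimpleNamespace(name="mydirectory/sub/b.txt")

        return _paths()


all_names = asyncio.run(list_path_names(_FakeDirectoryClient()))
top_level = asyncio.run(list_path_names(_FakeDirectoryClient(), recursive=False))
```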

get_sub_directory_client(sub_directory: DirectoryProperties | str) DataLakeDirectoryClient[source]

Get a client to interact with the specified subdirectory of the current directory.

The subdirectory need not already exist.

Parameters:

sub_directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

Returns:

A DataLakeDirectoryClient.

Return type:

DataLakeDirectoryClient

async remove_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Removes the Access Control on a path and sub-paths.

Parameters:

acl (str) – Removes POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, and a user or group identifier in the format “[scope:][type]:[id]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation will be split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token will only be returned when continue_on_failure is True. The default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client; client-side network timeouts are configured separately.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – The user can restart the operation using the continuation_token field of the AzureError, if the token is available.
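
As a minimal sketch of the “[scope:][type]:[id]” format described above, the helper below assembles a removal ACL string and passes it to remove_access_control_recursive(). The names build_removal_acl and remove_entries are hypothetical, directory_client is assumed to be an existing DataLakeDirectoryClient, and the entity IDs in any call are placeholders.

```python
from typing import Iterable, Optional, Tuple


def build_removal_acl(entries: Iterable[Tuple[Optional[str], str, str]]) -> str:
    """Join (scope, type, id) tuples into comma-separated "[scope:][type]:[id]" entries.

    Hypothetical helper: scope is None or "default"; type is e.g. "user" or "group".
    """
    parts = []
    for scope, ace_type, entity_id in entries:
        prefix = f"{scope}:" if scope else ""
        parts.append(f"{prefix}{ace_type}:{entity_id}")
    return ",".join(parts)


async def remove_entries(directory_client, entries):
    # directory_client is assumed to be a DataLakeDirectoryClient.
    acl = build_removal_acl(entries)
    return await directory_client.remove_access_control_recursive(acl=acl)
```

Note that removal entries carry no permissions field, unlike the entries accepted by the set and update variants.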

async rename_directory(new_name: str, **kwargs) DataLakeDirectoryClient[source]

Rename the source directory.

Parameters:

new_name (str) – The new name for the directory. The value must have the following format: “{filesystem}/{directory}/{subdirectory}”.

Keyword Arguments:
  • source_lease (DataLakeLeaseClient or str) – A lease ID for the source path. If specified, the source path must have an active lease and the lease ID must match.

  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • source_if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • source_if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • source_etag (str) – The source ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • source_match_condition (MatchConditions) – The source match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

DataLakeDirectoryClient containing the renamed directory.

Return type:

DataLakeDirectoryClient

Example:

Rename the source directory.

new_dir_name = "testdir2"
print("Renaming the directory named '{}' to '{}'.".format(dir_name, new_dir_name))
new_directory = await directory_client.rename_directory(
    new_name=directory_client.file_system_name + '/' + new_dir_name)

async set_access_control(owner: str | None = None, group: str | None = None, permissions: str | None = None, acl: str | None = None, **kwargs) Dict[str, str | datetime]

Set the owner, group, permissions, or access control list for a path.

Parameters:
  • owner (str) – Optional. The owner of the file or directory.

  • group (str) – Optional. The owning group of the file or directory.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported. permissions and acl are mutually exclusive.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”. permissions and acl are mutually exclusive.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

dict containing access control options after setting modifications (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
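
The two permission notations mentioned above can be related with a small worked example. The sketch below converts a 9-character symbolic string to 4-digit octal using the standard POSIX bit values, and rejects the combination of permissions and acl, which are mutually exclusive. Both function names are hypothetical and are not part of the SDK.

```python
def symbolic_to_octal(symbolic: str) -> str:
    """Convert 9-character symbolic permissions to 4-digit octal notation."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxrw-rw-'")
    # A trailing 't' (sticky + execute) or 'T' (sticky only) sets the sticky bit.
    sticky = symbolic[8] in ("t", "T")
    digits = []
    for i in range(0, 9, 3):
        read, write, execute = symbolic[i:i + 3]
        value = (4 if read == "r" else 0) + (2 if write == "w" else 0)
        if execute == "x" or (i == 6 and execute == "t"):
            value += 1
        digits.append(str(value))
    return ("1" if sticky else "0") + "".join(digits)


def access_control_kwargs(permissions=None, acl=None) -> dict:
    """Build keyword arguments, rejecting the mutually exclusive combination."""
    if permissions is not None and acl is not None:
        raise ValueError("permissions and acl are mutually exclusive")
    candidates = (("permissions", permissions), ("acl", acl))
    return {key: value for key, value in candidates if value is not None}
```

With an existing client, the result could be passed along as `await directory_client.set_access_control(**access_control_kwargs(permissions="0766"))`.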

async set_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Sets the Access Control on a path and sub-paths.

Parameters:

acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. The default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – The user can restart the operation using the continuation_token field of the AzureError, if the token is available.
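
The interplay of max_batches and continuation_token can be sketched as a loop that processes a bounded number of batches per call and resumes until no token is returned. This is an illustration rather than the SDK's own pattern: set_acl_in_rounds is a hypothetical name, and the code assumes the returned AccessControlChangeResult exposes counters (with directories_successful, files_successful and failure_count) and continuation attributes.

```python
async def set_acl_in_rounds(directory_client, acl: str, batches_per_round: int = 10):
    """Apply acl recursively, resuming with the continuation token until done.

    Returns a (successes, failures) tuple summed over all rounds.
    """
    token = None
    successes = 0
    failures = 0
    while True:
        result = await directory_client.set_access_control_recursive(
            acl=acl,
            continuation_token=token,
            batch_size=2000,
            max_batches=batches_per_round,
        )
        # Assumed attribute names on AccessControlChangeResult.
        counters = result.counters
        successes += counters.directories_successful + counters.files_successful
        failures += counters.failure_count
        token = result.continuation
        if not token:
            # No token means every sub-path has been processed.
            return successes, failures
```

Bounding each round this way gives the caller a natural point to persist the token, so that a long-running change can be resumed after an interruption.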

async set_http_headers(content_settings: ContentSettings | None = None, **kwargs) Dict[str, Any]

Sets system properties on the file or directory.

If one property is set for the content_settings, all properties will be overridden.

Parameters:

content_settings (ContentSettings) – ContentSettings object used to set file/directory properties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, the operation only succeeds if the lease on the file or directory is active and matches this ID.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

file/directory-updated property dict (Etag and last modified)

Return type:

dict[str, Any]

async set_metadata(metadata: Dict[str, str], **kwargs) Dict[str, str | datetime]

Sets one or more user-defined name-value pairs for the specified file or directory. Each call to this operation replaces all existing metadata attached to the file or directory. To remove all metadata from the file or directory, call this operation with no metadata dict.

Parameters:

metadata (dict[str, str]) – A dict containing name-value pairs to associate with the file or directory as metadata. Example: {‘category’:’test’}

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, the operation only succeeds if the lease on the file or directory is active and matches this ID.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

file/directory-updated property dict (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
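
Because each call replaces all existing metadata, adding a single key requires reading the current values first. The sketch below shows one way to do that, assuming directory_client is a DataLakeDirectoryClient whose get_directory_properties() result exposes a metadata dict; merge_metadata is a hypothetical helper name.

```python
async def merge_metadata(directory_client, updates: dict) -> dict:
    """Merge updates into the existing metadata and write the combined dict back."""
    properties = await directory_client.get_directory_properties()
    # Existing keys are kept unless overridden by updates.
    merged = {**(properties.metadata or {}), **updates}
    await directory_client.set_metadata(merged)
    return merged
```

This read-then-write sequence is not atomic. Passing the etag and match_condition keyword arguments described above is one way to guard against a concurrent change between the two calls.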

async update_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Modifies the Access Control on a path and sub-paths.

Parameters:

acl (str) – Modifies POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. The default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – The user can restart the operation using the continuation_token field of the AzureError, if the token is available.
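
A progress_hook can be used to collect the paths that failed to change, as described above. The following is a sketch under two assumptions that should be checked against the installed SDK version: that the async client awaits the hook (so it is written as a coroutine function), and that the AccessControlChanges object passed to it exposes a batch_failures list whose items have a name attribute. The function names are hypothetical.

```python
def make_failure_collector():
    """Return an async progress hook and the list it appends failed paths to."""
    failed_paths = []

    async def progress_hook(changes):
        # changes is assumed to be an AccessControlChanges instance.
        for failure in changes.batch_failures:
            failed_paths.append(failure.name)

    return progress_hook, failed_paths


async def update_acl_collecting_failures(directory_client, acl: str):
    hook, failed_paths = make_failure_collector()
    result = await directory_client.update_access_control_recursive(
        acl=acl,
        progress_hook=hook,
        continue_on_failure=True,  # keep going past user errors (4XX)
    )
    return result, failed_paths
```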

property api_version

The version of the Storage API used for requests.

Return type:

str

property location_mode

The location mode that the client is currently using.

By default this will be “primary”. Options include “primary” and “secondary”.

Return type:

str

property primary_endpoint

The full primary endpoint URL.

Return type:

str

property primary_hostname

The hostname of the primary endpoint.

Return type:

str

property secondary_endpoint

The full secondary endpoint URL if configured.

If not available a ValueError will be raised. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

str

Raises:

ValueError

property secondary_hostname

The hostname of the secondary endpoint.

If not available this will be None. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

Optional[str]

property url

The full endpoint URL to this entity, including SAS token if used.

This could be either the primary endpoint or the secondary endpoint, depending on the current location_mode.

Returns:

The full endpoint URL to this entity, including SAS token if used.

Return type:

str

class azure.storage.filedatalake.aio.DataLakeFileClient(account_url: str, file_system_name: str, file_path: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | AsyncTokenCredential | None = None, **kwargs: Any)[source]

A client to interact with the DataLake file, even if the file may not yet exist.

Variables:
  • url (str) – The full endpoint URL to the file system, including SAS token if used.

  • primary_endpoint (str) – The full primary endpoint URL.

  • primary_hostname (str) – The hostname of the primary endpoint.

Parameters:
  • account_url (str) – The URI to the storage account.

  • file_system_name (str) – The file system for the directory or files.

  • file_path (str) – The whole file path, used to interact with a specific file, e.g. “{directory}/{subdirectory}/{file}”

  • credential (AzureNamedKeyCredential or AzureSasCredential or AsyncTokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. If the resource URI already contains a SAS token, this will be ignored in favor of an explicit credential - except in the case of AzureSasCredential, where the conflicting SAS tokens will raise a ValueError. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:
  • api_version (str) – The Storage API version to use for requests. Default value is the most recent service version that is compatible with the current SDK. Setting to an older version may result in reduced feature compatibility.

  • audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Example:

Creating the DataLakeFileClient from a connection string.

from azure.storage.filedatalake.aio import DataLakeFileClient
# file_path is a single argument, so the directory and file name are joined with "/".
DataLakeFileClient.from_connection_string(connection_string, "myfilesystem", "mydirectory/myfile")

classmethod from_connection_string(conn_str: str, file_system_name: str, file_path: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any) Self[source]

Create DataLakeFileClient from a Connection String.

Parameters:
  • conn_str (str) – A connection string to an Azure Storage account.

  • file_system_name (str) – The name of file system to interact with.

  • file_path (str) – The whole file path, used to interact with a specific file, e.g. “{directory}/{subdirectory}/{file}”

  • credential (AzureNamedKeyCredential or AzureSasCredential or TokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token, or the connection string already has shared access key values. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. Credentials provided here will take precedence over those in the connection string. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:

audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Returns:

A DataLakeFileClient.

Return type:

DataLakeFileClient

async acquire_lease(lease_duration: int | None = -1, lease_id: str | None = None, **kwargs) DataLakeLeaseClient

Requests a new lease. If the file or directory does not have an active lease, the DataLake service creates a lease on the file/directory and returns a new lease ID.

Parameters:
  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A DataLakeLeaseClient object, that can be run in a context manager.

Return type:

DataLakeLeaseClient
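
The lease duration rule above (either -1 or a value from 15 to 60 seconds) can be checked on the client before the request is made. The sketch below pairs that check with an acquire and release sequence. It assumes file_client is a DataLakeFileClient and that the returned lease client offers an awaitable release() method; validate_lease_duration and with_short_lease are hypothetical names.

```python
def validate_lease_duration(lease_duration: int) -> int:
    """Return lease_duration if it is -1 (infinite) or within 15..60 seconds."""
    if lease_duration == -1 or 15 <= lease_duration <= 60:
        return lease_duration
    raise ValueError("lease_duration must be -1 or between 15 and 60 seconds")


async def with_short_lease(file_client, work, lease_duration: int = 15):
    """Acquire a lease, run the awaitable callback work(lease), then release the lease."""
    lease = await file_client.acquire_lease(
        lease_duration=validate_lease_duration(lease_duration)
    )
    try:
        return await work(lease)
    finally:
        # Release even if work() raises, so the file is not left locked.
        await lease.release()
```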

async append_data(data: bytes | str | Iterable | IO, offset: int, length: int | None = None, **kwargs) Dict[str, str | datetime | int][source]

Append data to the file.

Parameters:
  • data (bytes, str, Iterable[AnyStr], or IO[AnyStr]) – Content to be appended to file

  • offset (int) – Start position in the file at which the data is to be appended.

  • length (int or None) – Size of the data in bytes.

Keyword Arguments:
  • flush (bool) – If true, will commit the data after it is appended.

  • validate_content (bool) – If true, calculates an MD5 hash of the block content. The storage service checks the hash of the content that has arrived with the hash that was sent. This is primarily valuable for detecting bitflips on the wire if using http instead of https as https (the default) will already validate. Note that this MD5 hash is not stored with the file.

  • lease_action (Literal["acquire", "auto-renew", "release", "acquire-release"]) –

    Used to perform lease operations along with appending data.

    “acquire” - Acquire a lease.

    “auto-renew” - Renew an existing lease.

    “release” - Release the lease once the operation is complete. Requires flush=True.

    “acquire-release” - Acquire a lease and release it once the operation is complete. Requires flush=True.

  • lease_duration (int) –

    Valid if lease_action is set to “acquire” or “acquire-release”.

    Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease or if lease_action is set to “acquire” or “acquire-release”. If the file has an existing lease, this will be used to access the file. If acquiring a new lease, this will be used as the new lease id. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

Returns:

dict of the response header.

Return type:

dict[str, str], dict[str, datetime], or dict[str, int]
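
The relationship between offset and length is easiest to see when a payload is appended in several pieces. The sketch below is one possible approach, not the SDK's own upload logic: it appends fixed-size chunks at increasing offsets and then commits them with flush_data(), passing the total length written. append_in_chunks is a hypothetical name and file_client is assumed to be a DataLakeFileClient for a file that already exists and is empty.

```python
async def append_in_chunks(file_client, content: bytes, chunk_size: int = 1024) -> int:
    """Append content in chunk_size pieces, then flush; returns total bytes written."""
    offset = 0
    for start in range(0, len(content), chunk_size):
        chunk = content[start:start + chunk_size]
        # Each chunk is appended at the position where the previous one ended.
        await file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # Appended data is not committed until it is flushed at the final length.
    await file_client.flush_data(offset)
    return offset
```

Alternatively, passing flush=True on the last append_data() call would commit the data without a separate flush call.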

Example:

Append data to the file.

await file_client.append_data(data=file_content[2048:3072], offset=2048, length=1024)

async close() None

Closes the sockets opened by the client. It need not be called when the client is used as a context manager.
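
The two ways of releasing the client's sockets can be sketched side by side. Both helpers are hypothetical; client is assumed to be a DataLakeFileClient (or any object with the same close() and async context manager behavior), and operation is any coroutine function that takes the client.

```python
async def use_and_close(client, operation):
    """Run operation(client) and close the client explicitly afterwards."""
    try:
        return await operation(client)
    finally:
        await client.close()


async def use_as_context_manager(client, operation):
    """Run operation(client) inside 'async with', which closes the client on exit."""
    async with client:
        return await operation(client)
```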

async create_file(content_settings: ContentSettings | None = None, metadata: Dict[str, str] | None = None, **kwargs) Dict[str, str | datetime][source]

Create a new file.

Parameters:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • metadata (Optional[dict[str, str]]) – Name-value pairs associated with the file as metadata.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u, where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • expires_on (datetime or int) – The time at which the file expires. If the type of expires_on is an int, expiration time will be set as the number of milliseconds elapsed from creation time. If the type of expires_on is datetime, expiration time will be set absolute to the time provided. If no time zone info is provided, this will be interpreted as UTC.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

  • encryption_context (str) – Specifies the encryption context to set on the file.

Returns:

response dict (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
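
The umask rule given above (resulting permission = p & ^u) can be verified with a few lines of arithmetic. The helper below is a hypothetical illustration that runs locally and does not call the service; it reproduces the 0777 and 0057 example from the umask description.

```python
def apply_umask(permission: str, umask: str) -> str:
    """Return permission with the umask bits cleared, in 4-digit octal notation."""
    p = int(permission, 8)
    u = int(umask, 8)
    # Clear every bit that is set in the umask; keep the result within 12 bits.
    return format(p & ~u & 0o7777, "04o")


print(apply_umask("0777", "0057"))  # 0720, the example from the umask description
print(apply_umask("0666", "0027"))  # 0640, file default with the default umask
```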

Example:

Create file.

file_client = filesystem_client.get_file_client(file_name)
await file_client.create_file()

async delete_file(**kwargs) None[source]

Marks the specified file for deletion.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

None.

Return type:

None

Example:

Delete file.

await new_client.delete_file()

async download_file(offset: int | None = None, length: int | None = None, **kwargs: Any) StorageStreamDownloader[source]

Downloads a file to the StorageStreamDownloader. The readall() method must be used to read all the content, or readinto() must be used to download the file into a stream. Using chunks() returns an async iterator which allows the user to iterate over the content in chunks.

Parameters:
  • offset (int) – Start of byte range to use for downloading a section of the file. Must be set if length is provided.

  • length (int) – Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, download only succeeds if the file’s lease is active and matches this ID. Required if the file has an active lease.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a Customer-Provided Key.

  • max_concurrency (int) – Maximum number of parallel connections to use when transferring the file in chunks. This option does not affect the underlying connection pool, and may require a separate configuration of the connection pool.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here. This method may make multiple calls to the service and the timeout will apply to each call individually.

Returns:

A streaming object (StorageStreamDownloader)

Return type:

StorageStreamDownloader

Example:

Return the downloaded data.
download = await file_client.download_file()
downloaded_bytes = await download.readall()
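
As a further sketch, the offset and length parameters and the chunks() iterator described above might be used as follows. The function names read_range and read_in_chunks are illustrative only, and file_client is assumed to be an already constructed async DataLakeFileClient.

```python
async def read_range(file_client, offset: int, length: int) -> bytes:
    """Download `length` bytes starting at byte `offset`."""
    downloader = await file_client.download_file(offset=offset, length=length)
    return await downloader.readall()


async def read_in_chunks(file_client) -> bytes:
    """Download the whole file, consuming it one chunk at a time."""
    downloader = await file_client.download_file()
    parts = []
    # chunks() returns an async iterator over the file content.
    async for chunk in downloader.chunks():
        parts.append(chunk)
    return b"".join(parts)
```

Buffering every chunk in a list is only for illustration; a caller that wants to limit memory use would process each chunk inside the loop instead.
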
async exists(**kwargs: Any) bool[source]

Returns True if a file exists and returns False otherwise.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

True if a file exists, False otherwise.

Return type:

bool
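
A common use of exists() is a create-if-missing check. The sketch below assumes file_client is an async DataLakeFileClient; ensure_file is a hypothetical helper name. The check and the creation are two separate service calls, so this is not atomic if other writers may create the file concurrently.

```python
async def ensure_file(file_client) -> bool:
    """Create the file if it is missing. Return True if it was created."""
    if await file_client.exists():
        return False
    await file_client.create_file()
    return True
```
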

async flush_data(offset: int, retain_uncommitted_data: bool | None = False, **kwargs) Dict[str, str | datetime][source]

Commits the previously appended data.

Parameters:
  • offset (int) – The length of the file after the previously appended data is committed.

  • retain_uncommitted_data (bool) – Valid only for flush operations. If “true”, uncommitted data is retained after the flush operation completes; otherwise, the uncommitted data is deleted after the flush operation. The default is false. Data at offsets less than the specified position are written to the file when flush succeeds, but this optional parameter allows data after the flush position to be retained for a future flush operation.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • close (bool) – Azure Storage Events allow applications to receive notifications when files change. When Azure Storage Events are enabled, a file changed event is raised. This event has a property indicating whether this is the final change to distinguish the difference between an intermediate flush to a file stream and the final close of a file stream. The close query parameter is valid only when the action is “flush” and change notifications are enabled. If the value of close is “true” and the flush operation completes successfully, the service raises a file change notification with a property indicating that this is the final update (the file stream has been closed). If “false” a change notification is raised indicating the file has changed. The default is false. This query parameter is set to true by the Hadoop ABFS driver to indicate that the file stream has been closed.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • lease_action (Literal["acquire", "auto-renew", "release", "acquire-release"]) –

    Used to perform lease operations along with appending data.

    “acquire” - Acquire a lease. “auto-renew” - Renew an existing lease. “release” - Release the lease once the operation is complete. “acquire-release” - Acquire a lease and release it once the operation is complete.

  • lease_duration (int) –

    Valid if lease_action is set to “acquire” or “acquire-release”.

    Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease or if lease_action is set to “acquire” or “acquire-release”. If the file has an existing lease, this will be used to access the file. If acquiring a new lease, this will be used as the new lease id. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

Returns:

Response headers as a dict.

Return type:

dict[str, str] or dict[str, datetime]

Example:

Commit the previous appended data.
file_client = file_system_client.get_file_client("myfile")
await file_client.create_file()
with open(SOURCE_FILE, "rb") as source:
    data = source.read()
# Append at offset 0, then flush at the resulting file length to commit the data.
await file_client.append_data(data, 0, len(data))
await file_client.flush_data(len(data))
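
The relationship between append offsets and the flush offset can be sketched for a multi-part upload as below. This is an illustrative helper, not part of the SDK: append_and_flush and its chunk_size default are assumptions, and file_client is assumed to be an async DataLakeFileClient. Each append starts where the previous one ended, and the final flush offset equals the total number of bytes appended.

```python
async def append_and_flush(file_client, data: bytes, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Create the file, append `data` in chunks, and commit it with one flush."""
    await file_client.create_file()
    offset = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is appended at the current end of the uncommitted data.
        await file_client.append_data(chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # The flush offset is the file length after all appended data is committed.
    await file_client.flush_data(offset)
    return offset
```
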
async get_access_control(upn: bool | None = None, **kwargs) Dict[str, Any]

Get the owner, group, permissions, or access control list for a path.

Parameters:

upn (bool) – Optional. Valid only when Hierarchical Namespace is enabled for the account. If “true”, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If “false”, the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

response dict containing access control options (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]

async get_file_properties(**kwargs: Any) FileProperties[source]

Returns all user-defined metadata, standard HTTP properties, and system properties for the file. It does not return the content of the file.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the directory or file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a customer-provided key.

  • upn (bool) – If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names in the owner, group, and acl fields of FileProperties. If False, the values will be returned as Azure Active Directory Object IDs. The default value is False. Note that group and application Object IDs are not translated because they do not have unique friendly names.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

All user-defined metadata, standard HTTP properties, and system properties for the file.

Return type:

FileProperties

Example:

Getting the properties for a file.
properties = await file_client.get_file_properties()
query_file(query_expression: str, **kwargs: Any) DataLakeFileQueryReader[source]

Enables users to select/project on datalake file data by providing simple query expressions. This operation returns a DataLakeFileQueryReader; use readall() or readinto() to get the query data.

Parameters:

query_expression (str) – Required. A query statement, e.g. Select * from DataLakeStorage

Keyword Arguments:
  • on_error (Callable[DataLakeFileQueryError]) – A function to be called on any processing errors returned by the service.

  • file_format (DelimitedTextDialect or DelimitedJsonDialect or QuickQueryDialect or str) – Optional. Defines the serialization of the data currently stored in the file. The default is to treat the file data as CSV data formatted in the default dialect. This can be overridden with a custom DelimitedTextDialect, or DelimitedJsonDialect or “ParquetDialect” (passed as a string or enum). These dialects can be passed through their respective classes, the QuickQueryDialect enum or as a string.

  • output_format (DelimitedTextDialect or DelimitedJsonDialect or list[ArrowDialect] or QuickQueryDialect or str) – Optional. Defines the output serialization for the data stream. By default the data will be returned as it is represented in the file. By providing an output format, the file data will be reformatted according to that profile. This value can be a DelimitedTextDialect or a DelimitedJsonDialect or ArrowDialect. These dialects can be passed through their respective classes, the QuickQueryDialect enum or as a string.

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a Customer-Provided Key.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A streaming object (DataLakeFileQueryReader)

Return type:

DataLakeFileQueryReader

Example:

select/project on datalake file data by providing simple query expressions.
errors = []
def on_error(error):
    errors.append(error)

# upload the csv file
file_client = datalake_service_client.get_file_client(filesystem_name, "csvfile")
file_client.upload_data(CSV_DATA, overwrite=True)

# select the second column of the csv file
query_expression = "SELECT _2 from DataLakeStorage"
input_format = DelimitedTextDialect(delimiter=',', quotechar='"', lineterminator='\n', escapechar="", has_header=False)
output_format = DelimitedJsonDialect(delimiter='\n')
reader = file_client.query_file(query_expression, on_error=on_error, file_format=input_format, output_format=output_format)
content = reader.readall()
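
When the output format is a DelimitedJsonDialect with a newline delimiter, as in the example above, the bytes returned by readall() can be split into records. The sketch below uses only the standard library; parse_json_records is a hypothetical helper, and it assumes each delimited segment is one UTF-8 encoded JSON document.

```python
import json


def parse_json_records(content: bytes, delimiter: str = "\n") -> list:
    """Split delimited JSON output into a list of decoded records."""
    text = content.decode("utf-8")
    # Skip empty segments, such as the one after a trailing delimiter.
    return [json.loads(segment) for segment in text.split(delimiter) if segment.strip()]
```
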
async remove_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Removes the Access Control on a path and sub-paths.

Parameters:

acl (str) – Removes POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, and a user or group identifier in the format “[scope:][type]:[id]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. If not set, the default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – User can restart the operation using continuation_token field of AzureError if the token is available.
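
Removal entries use the format “[scope:][type]:[id]”, without the permissions field that set and update operations take. As a sketch, an ACL string written in the longer “[scope:][type]:[id]:[permissions]” form could be converted as below. strip_permissions is a hypothetical helper, and it assumes that the only scope prefix in use is “default”.

```python
def strip_permissions(acl: str) -> str:
    """Drop the permissions field from each access control entry."""
    stripped = []
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        has_scope = parts[0] == "default"
        # With permissions, an entry has 4 fields when scoped and 3 otherwise.
        fields_with_permissions = 4 if has_scope else 3
        if len(parts) == fields_with_permissions:
            parts = parts[:-1]
        stripped.append(":".join(parts))
    return ",".join(stripped)
```

The result could then be passed as the acl argument of remove_access_control_recursive().
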

async rename_file(new_name: str, **kwargs: Any) DataLakeFileClient[source]

Rename the source file.

Parameters:

new_name (str) – The new name for the file. The value must have the following format: “{filesystem}/{directory}/{subdirectory}/{file}”.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • source_lease (DataLakeLeaseClient or str) – A lease ID for the source path. If specified, the source path must have an active lease and the lease ID must match.

  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • source_if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • source_if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • source_etag (str) – The source ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • source_match_condition (MatchConditions) – The source match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

The renamed file client.

Return type:

DataLakeFileClient

Example:

Rename the source file.
new_client = await file_client.rename_file(file_client.file_system_name + '/' + 'newname')
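
Because new_name must include the file system name followed by the full path, it can help to build the value in one place. The following is a minimal sketch; build_rename_target is a hypothetical helper that only joins segments with “/” and is not part of the SDK.

```python
def build_rename_target(file_system_name: str, *path_segments: str) -> str:
    """Join a file system name and path segments into '{filesystem}/{path}'."""
    parts = [file_system_name.strip("/")]
    # Strip stray slashes so segments are joined by exactly one separator.
    parts.extend(segment.strip("/") for segment in path_segments if segment.strip("/"))
    return "/".join(parts)
```

For example, build_rename_target(file_client.file_system_name, "archive", "newname") produces a value in the format expected by rename_file().
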
async set_access_control(owner: str | None = None, group: str | None = None, permissions: str | None = None, acl: str | None = None, **kwargs) Dict[str, str | datetime]

Set the owner, group, permissions, or access control list for a path.

Parameters:
  • owner (str) – Optional. The owner of the file or directory.

  • group (str) – Optional. The owning group of the file or directory.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported. permissions and acl are mutually exclusive.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”. permissions and acl are mutually exclusive.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

dict containing access control options after setting modifications (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
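
The permissions parameter accepts both symbolic and 4-digit octal notation. The sketch below shows how the two relate, using the example from the parameter description (rwxrw-rw- and 0766). symbolic_to_octal is a hypothetical helper; its handling of the sticky bit (“t” for sticky with execute, “T” for sticky without execute, in the final position) follows the usual POSIX convention and is an assumption here.

```python
def symbolic_to_octal(symbolic: str) -> str:
    """Convert a 9-character symbolic permission string to 4-digit octal."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxrw-rw-'")
    sticky = 0
    digits = []
    for index in range(0, 9, 3):
        read, write, execute = symbolic[index:index + 3]
        if read not in ("r", "-") or write not in ("w", "-"):
            raise ValueError(f"invalid permission string: {symbolic!r}")
        value = (4 if read == "r" else 0) + (2 if write == "w" else 0)
        is_others = index == 6
        if execute == "x":
            value += 1
        elif is_others and execute == "t":
            # Sticky bit set and execute granted to others.
            value += 1
            sticky = 1
        elif is_others and execute == "T":
            # Sticky bit set without execute for others.
            sticky = 1
        elif execute != "-":
            raise ValueError(f"invalid permission string: {symbolic!r}")
        digits.append(value)
    return f"{sticky}{digits[0]}{digits[1]}{digits[2]}"
```
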

async set_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Sets the Access Control on a path and sub-paths.

Parameters:

acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. If not set, the default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – User can restart the operation using continuation_token field of AzureError if the token is available.
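
The interaction of max_batches and continuation_token can be sketched as a resume loop. This is an illustration under assumptions, not a definitive implementation: set_acl_with_resume is a hypothetical helper, client is assumed to be an async file or directory client, and the result is assumed to expose a continuation attribute plus counters with directories_successful, files_successful, and failure_count, reported per call.

```python
async def set_acl_with_resume(client, acl: str, max_batches: int = 10) -> dict:
    """Apply `acl` recursively, resuming until no continuation token remains."""
    totals = {"directories": 0, "files": 0, "failures": 0}
    token = None
    while True:
        result = await client.set_access_control_recursive(
            acl,
            continuation_token=token,
            max_batches=max_batches,
            continue_on_failure=True,
        )
        # Assumed: counters cover only the batches processed by this call.
        totals["directories"] += result.counters.directories_successful
        totals["files"] += result.counters.files_successful
        totals["failures"] += result.counters.failure_count
        token = result.continuation
        if not token:
            return totals
```
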

async set_file_expiry(expiry_options: str, expires_on: datetime | int | None = None, **kwargs) None[source]

Sets the time a file will expire and be deleted.

Parameters:
  • expiry_options (str) – Required. Indicates mode of the expiry time. Possible values include: ‘NeverExpire’, ‘RelativeToCreation’, ‘RelativeToNow’, ‘Absolute’

  • expires_on (datetime or int) – The time at which the file expires. When expiry_options is RelativeTo*, expires_on should be an int in milliseconds.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Return type:

None
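
For the RelativeTo* modes, expires_on is an integer number of milliseconds. A small sketch of deriving that value from a timedelta follows; relative_expiry_ms is a hypothetical helper and not part of the SDK.

```python
from datetime import timedelta


def relative_expiry_ms(delta: timedelta) -> int:
    """Convert a positive timedelta to whole milliseconds."""
    if delta <= timedelta(0):
        raise ValueError("expiry offset must be positive")
    # Floor division by one millisecond yields an exact integer count.
    return delta // timedelta(milliseconds=1)
```

A call might then look like await file_client.set_file_expiry("RelativeToNow", expires_on=relative_expiry_ms(timedelta(hours=1))).
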

async set_http_headers(content_settings: ContentSettings | None = None, **kwargs) Dict[str, Any]

Sets system properties on the file or directory.

If one property is set for the content_settings, all properties will be overridden.

Parameters:

content_settings (ContentSettings) – ContentSettings object used to set file/directory properties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, the operation only succeeds if the lease on the file or directory is active and matches this ID.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

file/directory-updated property dict (Etag and last modified)

Return type:

dict[str, Any]
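
Because setting one property in content_settings overrides all of them, a caller that wants to change a single header may prefer to start from the current values. A minimal sketch, assuming file_client is an async DataLakeFileClient and that the properties returned by get_file_properties() carry a content_settings attribute; update_content_type is a hypothetical helper name.

```python
async def update_content_type(file_client, content_type: str):
    """Change only the content type, keeping the other content settings."""
    properties = await file_client.get_file_properties()
    settings = properties.content_settings
    # Modify one field and send the full set back so nothing else is reset.
    settings.content_type = content_type
    return await file_client.set_http_headers(content_settings=settings)
```
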

async set_metadata(metadata: Dict[str, str], **kwargs) Dict[str, str | datetime]

Sets one or more user-defined name-value pairs for the specified file or directory. Each call to this operation replaces all existing metadata attached to the file or directory. To remove all metadata from the file or directory, call this operation with no metadata dict.

Parameters:

metadata (dict[str, str]) – A dict containing name-value pairs to associate with the file or directory as metadata. Example: {‘category’:’test’}

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, the operation only succeeds if the lease on the file or directory is active and matches this ID.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

file system-updated property dict (Etag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
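Because each call replaces all existing metadata, adding or changing a single key generally means reading the current metadata, merging the change, and writing the full dict back. The following is a minimal sketch of that pattern, not part of the SDK: `update_metadata`, `read_metadata`, and `write_metadata` are hypothetical names, and the two callables stand in for whichever client calls fetch the current metadata and invoke this operation.

```python
from typing import Awaitable, Callable, Dict, Optional


async def update_metadata(
    read_metadata: Callable[[], Awaitable[Optional[Dict[str, str]]]],
    write_metadata: Callable[[Dict[str, str]], Awaitable[object]],
    updates: Dict[str, str],
) -> Dict[str, str]:
    """Hypothetical helper: read-modify-write for replace-all metadata.

    read_metadata and write_metadata are assumed async callables wrapping
    the real client calls; they are not SDK functions.
    """
    current = await read_metadata()
    # Overlay the updates on a copy of the current metadata so that
    # keys not mentioned in `updates` survive the replace-all write.
    merged = {**(current or {}), **updates}
    await write_metadata(merged)
    return merged
```

This sketch does not guard against concurrent writers; the etag and match_condition keyword arguments described above could be passed on the write to make the update conditional.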

async update_access_control_recursive(acl: str, **kwargs: Any) AccessControlChangeResult

Modifies the Access Control on a path and sub-paths.

Parameters:

acl (str) – Modifies POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

Keyword Arguments:
  • progress_hook (func(AccessControlChanges)) – Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

  • continuation_token (str) – Optional continuation token that can be used to resume a previously stopped operation.

  • batch_size (int) – Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. Batch size should be between 1 and 2000. The default when unspecified is 2000.

  • max_batches (int) – Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

  • continue_on_failure (bool) – If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. The default value is False.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Return type:

AccessControlChangeResult

Raises:

AzureError – The operation can be restarted using the continuation_token field of the AzureError, if the token is available.
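The ACL string format and the max_batches / continuation_token interplay can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SDK's own code: `build_acl` and `update_acl_resumable` are hypothetical helpers, `client` is assumed to be an async directory or file client exposing update_access_control_recursive, and the returned result is assumed to expose `counters.failure_count` and `continuation`.

```python
from typing import Iterable, Optional, Tuple


def build_acl(entries: Iterable[Tuple[str, str, str, str]]) -> str:
    """Join (scope, type, id, permissions) tuples into one ACL string.

    scope is "" or "default"; id is "" for the owning user, owning group,
    or other. Output follows "[scope:][type]:[id]:[permissions]".
    """
    parts = []
    for scope, ace_type, identifier, permissions in entries:
        prefix = f"{scope}:" if scope else ""
        parts.append(f"{prefix}{ace_type}:{identifier}:{permissions}")
    return ",".join(parts)


async def update_acl_resumable(client, acl: str, max_batches: int = 10) -> int:
    """Hypothetical helper: repeat the call until no continuation token remains.

    Returns the total failure count reported across all calls.
    """
    token: Optional[str] = None
    failures = 0
    while True:
        result = await client.update_access_control_recursive(
            acl,
            continuation_token=token,
            max_batches=max_batches,
            continue_on_failure=True,
        )
        # Assumed result shape: counters.failure_count and continuation.
        failures += result.counters.failure_count
        token = result.continuation
        if not token:
            return failures
```

Capping max_batches keeps each call short, at the cost of the caller having to carry the token between calls.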

async upload_data(data: bytes | str | Iterable | AsyncIterable | IO, length: int | None = None, overwrite: bool | None = False, **kwargs) Dict[str, Any][source]

Upload data to a file.

Parameters:
  • data (bytes, str, Iterable[AnyStr], AsyncIterable[AnyStr], or IO[AnyStr]) – Content to be uploaded to file

  • length (int) – Size of the data in bytes.

  • overwrite (bool) – Whether to overwrite an existing file.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • metadata (dict[str, str] or None) – Name-value pairs associated with the file as metadata.

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u, where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • validate_content (bool) – If true, calculates an MD5 hash for each chunk of the file. The storage service checks the hash of the content that has arrived with the hash that was sent. This is primarily valuable for detecting bitflips on the wire if using http instead of https, as https (the default), will already validate. Note that this MD5 hash is not stored with the blob. Also note that if enabled, the memory-efficient upload algorithm will not be used because computing the MD5 hash requires buffering entire blocks, and doing so defeats the purpose of the memory-efficient algorithm.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • cpk (CustomerProvidedEncryptionKey) – Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here. This method may make multiple calls to the service and the timeout will apply to each call individually.

  • max_concurrency (int) – Maximum number of parallel connections to use when transferring the file in chunks. This option does not affect the underlying connection pool, and may require a separate configuration of the connection pool.

  • chunk_size (int) – The maximum chunk size for uploading a file in chunks. Defaults to 100*1024*1024, or 100MB.

  • encryption_context (str) – Specifies the encryption context to set on the file.

Returns:

response dict (Etag and last modified).

Return type:

dict[str, Any]
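The umask rule described for the umask keyword (resulting permission is p & ^u, where ^u is the bitwise complement of u) can be checked with a few lines of arithmetic. This is a small sketch for illustration; `apply_umask` is a hypothetical helper, not an SDK function, and in Python the complement is written `~u`.

```python
def apply_umask(permission: str, umask: str) -> str:
    """Return p & ~u as a 4-digit octal string.

    Both arguments use 4-digit octal notation, e.g. "0777" and "0057".
    """
    p = int(permission, 8)
    u = int(umask, 8)
    # Mask to 12 bits so the result always fits 4 octal digits.
    return format(p & ~u & 0o7777, "04o")


print(apply_umask("0777", "0057"))  # the documented example
print(apply_umask("0666", "0027"))  # default file permission with default umask
```

With the documented defaults, a new file (0666) under umask 0027 ends up as 0640 and a new directory (0777) as 0750.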

property api_version

The version of the Storage API used for requests.

Return type:

str

property location_mode

The location mode that the client is currently using.

By default this will be “primary”. Options include “primary” and “secondary”.

Return type:

str

property primary_endpoint

The full primary endpoint URL.

Return type:

str

property primary_hostname

The hostname of the primary endpoint.

Return type:

str

property secondary_endpoint

The full secondary endpoint URL if configured.

If not available a ValueError will be raised. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

str

Raises:

ValueError

property secondary_hostname

The hostname of the secondary endpoint.

If not available this will be None. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

Optional[str]

property url

The full endpoint URL to this entity, including SAS token if used.

This could be either the primary endpoint, or the secondary endpoint depending on the current location_mode.

Returns:

The full endpoint URL to this entity, including SAS token if used.

Return type:

str

class azure.storage.filedatalake.aio.DataLakeLeaseClient(client: FileSystemClient | DataLakeDirectoryClient | DataLakeFileClient, lease_id: str | None = None)[source]

Creates a new DataLakeLeaseClient.

This client provides lease operations on a FileSystemClient, DataLakeDirectoryClient or DataLakeFileClient.

Variables:
  • id (str) – The ID of the lease currently being maintained. This will be None if no lease has yet been acquired.

  • etag (str) – The ETag of the lease currently being maintained. This will be None if no lease has yet been acquired or modified.

  • last_modified (datetime) – The last modified timestamp of the lease currently being maintained. This will be None if no lease has yet been acquired or modified.

Parameters:
  • client (FileSystemClient or DataLakeDirectoryClient or DataLakeFileClient) – The client of the file system, directory, or file to lease.

  • lease_id (str) – A string representing the lease ID of an existing lease. This value does not need to be specified in order to acquire a new lease, or break one.

async acquire(lease_duration: int = -1, **kwargs: int | None) None[source]

Requests a new lease.

If the file/file system does not have an active lease, the DataLake service creates a lease on the file/file system and returns a new lease ID.

Parameters:

lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Return type:

None
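The lease_duration constraint above (either -1 for an infinite lease, or 15 to 60 seconds) can be checked before calling the service. The following is a sketch of such a client-side pre-check; `validate_lease_duration` is a hypothetical helper and is not something the SDK performs itself.

```python
def validate_lease_duration(lease_duration: int = -1) -> int:
    """Return lease_duration if it is -1 (infinite) or within 15..60 seconds.

    Hypothetical pre-check mirroring the documented constraint.
    """
    if lease_duration == -1 or 15 <= lease_duration <= 60:
        return lease_duration
    raise ValueError(
        "lease_duration must be -1 (infinite) or between 15 and 60 seconds, "
        f"got {lease_duration}"
    )
```

A validated value could then be passed on, for example as `await lease.acquire(lease_duration=validate_lease_duration(30))`, where `lease` is an existing DataLakeLeaseClient.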

async break_lease(lease_break_period: int | None = None, **kwargs: Any) int[source]

Break the lease, if the file system or file has an active lease.

Once a lease is broken, it cannot be renewed. Any authorized request can break the lease; the request is not required to specify a matching lease ID. When a lease is broken, the lease break period is allowed to elapse, during which time no lease operation except break and release can be performed on the file system or file. When a lease is successfully broken, the response indicates the interval in seconds until a new lease can be acquired.

Parameters:

lease_break_period (int) – This is the proposed duration of seconds that the lease should continue before it is broken, between 0 and 60 seconds. This break period is only used if it is shorter than the time remaining on the lease. If longer, the time remaining on the lease is used. A new lease will not be available before the break period has expired, but the lease may be held for longer than the break period. If this header does not appear with a break operation, a fixed-duration lease breaks after the remaining lease period elapses, and an infinite lease breaks immediately.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

Approximate time remaining in the lease period, in seconds.

Return type:

int
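The lease_break_period rules can be restated as a small function. This is one reading of the description above, written as a sketch: `effective_break_seconds` is a hypothetical helper, `remaining=None` is used here to mean an infinite lease, and the case of an infinite lease with an explicit break period is inferred from the "only used if shorter than the time remaining" rule rather than stated outright.

```python
from typing import Optional


def effective_break_seconds(
    remaining: Optional[int], lease_break_period: Optional[int]
) -> int:
    """Seconds until the lease is actually broken.

    remaining: seconds left on a fixed-duration lease, or None if infinite.
    lease_break_period: proposed break period (0..60), or None if omitted.
    """
    if lease_break_period is None:
        # No break period given: an infinite lease breaks immediately,
        # a fixed-duration lease breaks when its remaining time elapses.
        return 0 if remaining is None else remaining
    if remaining is None:
        # Inferred: an infinite lease always has more time remaining.
        return lease_break_period
    # The break period only applies when shorter than the time remaining.
    return min(lease_break_period, remaining)
```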

async change(proposed_lease_id: str, **kwargs: Any) None[source]

Change the lease ID of an active lease.

Parameters:

proposed_lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

None
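Since proposed_lease_id must be a GUID string, the standard library's uuid module is one way to produce or pre-check a value. A minimal sketch, where `new_lease_id` and `is_guid` are hypothetical helper names and the check is a local approximation; the service remains the authority on what it accepts.

```python
import uuid


def new_lease_id() -> str:
    """Generate a random GUID string suitable as a proposed lease ID."""
    return str(uuid.uuid4())


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare
    # against the canonical rendering to keep the check strict.
    return str(parsed) == value.lower()
```

A generated ID could then be used as `await lease.change(new_lease_id())` on an active lease.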

async release(**kwargs: Any) None[source]

Release the lease.

The lease may be released if the client lease id specified matches that associated with the file system or file. Releasing the lease allows another client to immediately acquire the lease for the file system or file as soon as the release is complete.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

None

async renew(**kwargs: Any) None[source]

Renews the lease.

The lease can be renewed if the lease ID specified in the lease client matches that associated with the file system or file. Note that the lease may be renewed even if it has expired as long as the file system or file has not been leased again since the expiration of that lease. When you renew a lease, the lease duration clock resets.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

None
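Because renewing resets the lease duration clock, a fixed-duration lease can be held for longer than its duration by renewing at an interval shorter than that duration. The following is a rough sketch of that idea; `keep_lease_alive` is a hypothetical helper, and `lease` is assumed to be a DataLakeLeaseClient that has already acquired a lease.

```python
import asyncio


async def keep_lease_alive(lease, interval_seconds: float, renewals: int) -> int:
    """Renew `lease` every `interval_seconds`, `renewals` times.

    Returns the number of renewals completed. interval_seconds should be
    comfortably shorter than the lease duration (for example, 20 for a
    60-second lease) so the lease does not expire between renewals.
    """
    completed = 0
    for _ in range(renewals):
        await asyncio.sleep(interval_seconds)
        await lease.renew()
        completed += 1
    return completed
```

In practice this would usually run as a background task alongside the work that needs the lease, and be cancelled before calling release().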

class azure.storage.filedatalake.aio.DataLakeServiceClient(account_url: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | AsyncTokenCredential | None = None, **kwargs: Any)[source]

A client to interact with the DataLake Service at the account level.

This client provides operations to retrieve and configure the account properties as well as list, create and delete file systems within the account. For operations relating to a specific file system, directory or file, clients for those entities can also be retrieved using the get_client functions.

Variables:
  • url (str) – The full endpoint URL to the datalake service endpoint.

  • primary_endpoint (str) – The full primary endpoint URL.

  • primary_hostname (str) – The hostname of the primary endpoint.

Parameters:
  • account_url (str) – The URL to the DataLake storage account. Any other entities included in the URL path (e.g. file system or file) will be discarded. This URL can be optionally authenticated with a SAS token.

  • credential (AzureNamedKeyCredential or AzureSasCredential or AsyncTokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. If the resource URI already contains a SAS token, this will be ignored in favor of an explicit credential - except in the case of AzureSasCredential, where the conflicting SAS tokens will raise a ValueError. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:
  • api_version (str) – The Storage API version to use for requests. Default value is the most recent service version that is compatible with the current SDK. Setting to an older version may result in reduced feature compatibility.

  • audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Example:

Creating the DataLakeServiceClient from connection string.
from azure.storage.filedatalake.aio import DataLakeServiceClient
datalake_service_client = DataLakeServiceClient.from_connection_string(connection_string)
Creating the DataLakeServiceClient with Azure Identity credentials.
from azure.identity.aio import DefaultAzureCredential
token_credential = DefaultAzureCredential()
datalake_service_client = DataLakeServiceClient("https://{}.dfs.core.windows.net".format(account_name),
                                                credential=token_credential)
classmethod from_connection_string(conn_str: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any) Self[source]

Create DataLakeServiceClient from a Connection String.

Parameters:
  • conn_str (str) – A connection string to an Azure Storage account.

  • credential (AzureNamedKeyCredential or AzureSasCredential or TokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token, or the connection string already has shared access key values. The value can be a SAS token string, an instance of AzureSasCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. Credentials provided here will take precedence over those in the connection string.

Keyword Arguments:

audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Returns:

A DataLakeServiceClient.

Return type:

DataLakeServiceClient

Example:

Creating the DataLakeServiceClient from a connection string.
from azure.storage.filedatalake.aio import DataLakeServiceClient
datalake_service_client = DataLakeServiceClient.from_connection_string(connection_string)
async close() None[source]

Closes the sockets opened by the client. It need not be called when the client is used as a context manager.
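The two ways of releasing the client's sockets can be sketched side by side. This is an illustrative sketch only: `run_with_context_manager` and `run_with_explicit_close` are hypothetical helpers, `client` is assumed to be an async client such as DataLakeServiceClient, and `work` is any coroutine function that takes the client.

```python
async def run_with_context_manager(client, work):
    """Run work(client); the context manager closes the client on exit."""
    async with client:
        return await work(client)


async def run_with_explicit_close(client, work):
    """Run work(client) and close the client explicitly, even on error."""
    try:
        return await work(client)
    finally:
        await client.close()
```

The try/finally form is mainly useful when the client's lifetime does not fit neatly inside a single `async with` block.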

async create_file_system(file_system: FileSystemProperties | str, metadata: Dict[str, str] | None = None, public_access: PublicAccess | None = None, **kwargs) FileSystemClient[source]

Creates a new file system under the specified account.

If the file system with the same name already exists, a ResourceExistsError will be raised. This method returns a client with which to interact with the newly created file system.

Parameters:
  • file_system (str) – The name of the file system to create.

  • metadata (dict[str, str]) – A dict with name-value pairs to associate with the file system as metadata. Example: {'Category': 'test'}

  • public_access (PublicAccess) – Possible values include: file system, file.

Returns:

FileSystemClient under the specified account.

Return type:

FileSystemClient

Example:

Creating a file system in the datalake service.
await datalake_service_client.create_file_system("filesystem")
async delete_file_system(file_system: FileSystemProperties | str, **kwargs) FileSystemClient[source]

Marks the specified file system for deletion.

The file system and any files contained within it are later deleted during garbage collection. If the file system is not found, a ResourceNotFoundError will be raised.

Parameters:

file_system (str or FileSystemProperties) – The file system to delete. This can either be the name of the file system, or an instance of FileSystemProperties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, delete_file_system only succeeds if the file system’s lease is active and matches this ID. Required if the file system has an active lease.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

FileSystemClient after marking the specified file system for deletion.

Return type:

FileSystemClient

Example:

Deleting a file system in the datalake service.
await datalake_service_client.delete_file_system("filesystem")
get_directory_client(file_system: FileSystemProperties | str, directory: DirectoryProperties | str) DataLakeDirectoryClient[source]

Get a client to interact with the specified directory.

The directory need not already exist.

Parameters:
  • file_system (str or FileSystemProperties) – The file system that the directory is in. This can either be the name of the file system, or an instance of FileSystemProperties.

  • directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

Returns:

A DataLakeDirectoryClient.

Return type:

DataLakeDirectoryClient

Example:

Getting the directory client to interact with a specific directory.
directory_client = datalake_service_client.get_directory_client(file_system_client.file_system_name,
                                                                "mydirectory")
get_file_client(file_system: FileSystemProperties | str, file_path: FileProperties | str) DataLakeFileClient[source]

Get a client to interact with the specified file.

The file need not already exist.

Parameters:
  • file_system (str or FileSystemProperties) – The file system that the file is in. This can either be the name of the file system, or an instance of FileSystemProperties.

  • file_path (str or FileProperties) – The file with which to interact. This can either be the full path of the file from the root directory (e.g. directory/subdirectory/file), or an instance of FileProperties.

Returns:

A DataLakeFileClient.

Return type:

DataLakeFileClient

Example:

Getting the file client to interact with a specific file.
file_client = datalake_service_client.get_file_client(file_system_client.file_system_name, "myfile")
get_file_system_client(file_system: FileSystemProperties | str) FileSystemClient[source]

Get a client to interact with the specified file system.

The file system need not already exist.

Parameters:

file_system (str or FileSystemProperties) – The file system. This can either be the name of the file system, or an instance of FileSystemProperties.

Returns:

A FileSystemClient.

Return type:

FileSystemClient

Example:

Getting the file system client to interact with a specific file system.
# Instantiate a DataLakeServiceClient using a connection string
from azure.storage.filedatalake.aio import DataLakeServiceClient
datalake_service_client = DataLakeServiceClient.from_connection_string(self.connection_string)

async with datalake_service_client:
    # Instantiate a FileSystemClient
    file_system_client = datalake_service_client.get_file_system_client("mynewfilesystems")
async get_service_properties(**kwargs: Any) Dict[str, Any][source]

Gets the properties of a storage account’s datalake service, including Azure Storage Analytics.

Added in version 12.4.0: This operation was introduced in API version ‘2020-06-12’.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

An object containing datalake service properties such as analytics logging, hour/minute metrics, cors rules, etc.

Return type:

dict[str, Any]

async get_user_delegation_key(key_start_time: datetime, key_expiry_time: datetime, **kwargs: Any) UserDelegationKey[source]

Obtain a user delegation key for the purpose of signing SAS tokens. A token credential must be present on the service object for this request to succeed.

Parameters:
  • key_start_time (datetime) – A DateTime value. Indicates when the key becomes valid.

  • key_expiry_time (datetime) – A DateTime value. Indicates when the key stops being valid.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

The user delegation key.

Return type:

UserDelegationKey

Example:

Get user delegation key from datalake service client.
from datetime import datetime, timedelta, timezone
key_start_time = datetime.now(timezone.utc)
user_delegation_key = await datalake_service_client.get_user_delegation_key(key_start_time,
                                                                      key_start_time + timedelta(hours=1))
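To avoid ambiguity about time zones, key_start_time and key_expiry_time can be built as timezone-aware UTC values. The sketch below is illustrative: `to_utc` and `delegation_key_window` are hypothetical helpers, and `to_utc` mirrors the convention stated for the datetime keyword arguments elsewhere in this reference (naive values are assumed to be UTC, aware values are converted to UTC).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_utc(value: datetime) -> datetime:
    """Return value as a timezone-aware UTC datetime.

    Naive datetimes are assumed to already be in UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def delegation_key_window(
    hours: float, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return (start, expiry) in UTC for a key valid for `hours` hours."""
    start = to_utc(now) if now is not None else datetime.now(timezone.utc)
    return start, start + timedelta(hours=hours)
```

The pair could then be unpacked into the call, for example `await datalake_service_client.get_user_delegation_key(*delegation_key_window(1))`.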
list_file_systems(name_starts_with: str | None = None, include_metadata: bool | None = None, **kwargs) ItemPaged[FileSystemProperties][source]

Returns a generator to list the file systems under the specified account.

The generator will lazily follow the continuation tokens returned by the service and stop when all file systems have been returned.

Parameters:
  • name_starts_with (str) – Filters the results to return only file systems whose names begin with the specified prefix.

  • include_metadata (bool) – Specifies that file system metadata be returned in the response. The default value is False.

Keyword Arguments:
  • results_per_page (int) – The maximum number of file system names to retrieve per API call. If the request does not specify a value, the server will return up to 5,000 items per page.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

  • include_deleted (bool) – Specifies that deleted file systems to be returned in the response. This is for file system restore enabled account. The default value is False. .. versionadded:: 12.3.0

  • include_system (bool) – Flag specifying that system file systems should be included. Added in version 12.6.0.

Returns:

An iterable (auto-paging) of FileSystemProperties.

Return type:

ItemPaged[FileSystemProperties]

Example:

Listing the file systems in the datalake service.
file_systems = datalake_service_client.list_file_systems()
async for file_system in file_systems:
    print(file_system.name)
async set_service_properties(**kwargs: Any) None[source]

Sets the properties of a storage account’s Datalake service, including Azure Storage Analytics.

If an element (e.g. analytics_logging) is left as None, the existing settings on the service for that functionality are preserved.

Added in version 12.4.0: This operation was introduced in API version ‘2020-06-12’.

Keyword Arguments:
  • analytics_logging – Groups the Azure Analytics Logging settings.

  • hour_metrics – The hour metrics settings provide a summary of request statistics grouped by API in hourly aggregates.

  • minute_metrics – The minute metrics settings provide request statistics for each minute.

  • cors – You can include up to five CorsRule elements in the list. If an empty list is specified, all CORS rules will be deleted, and CORS will be disabled for the service.

  • target_version (str) – Indicates the default version to use for requests if an incoming request’s version is not specified.

  • delete_retention_policy – The delete retention policy specifies whether to retain deleted files/directories. It also specifies the number of days and versions of file/directory to keep.

  • static_website – Specifies whether the static website feature is enabled, and if yes, indicates the index document and 404 error document to use.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

async undelete_file_system(name: str, deleted_version: str, **kwargs: Any) FileSystemClient[source]

Restores a soft-deleted file system.

Operation will only be successful if used within the specified number of days set in the delete retention policy.

Added in version 12.3.0: This operation was introduced in API version ‘2019-12-12’.

Parameters:
  • name (str) – Specifies the name of the deleted filesystem to restore.

  • deleted_version (str) – Specifies the version of the deleted filesystem to restore.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

The FileSystemClient of the restored soft-deleted filesystem.

Return type:

FileSystemClient
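
The deleted_version required by undelete_file_system() can be discovered by listing file systems with include_deleted=True. The following is a minimal sketch of that flow, not an SDK function: restore_deleted_file_systems is a hypothetical helper, service_client is assumed to behave like the async DataLakeServiceClient, and each listed item is assumed to expose name, deleted, and deleted_version attributes.

```python
async def restore_deleted_file_systems(service_client, prefix=None):
    """Restore every soft-deleted file system whose name starts with prefix.

    service_client is assumed to behave like the async DataLakeServiceClient.
    Returns the names of the file systems that were restored.
    """
    restored = []
    listing = service_client.list_file_systems(
        name_starts_with=prefix, include_deleted=True
    )
    async for file_system in listing:
        # Live file systems are skipped; only soft-deleted ones carry a version.
        if not getattr(file_system, "deleted", False):
            continue
        await service_client.undelete_file_system(
            file_system.name, file_system.deleted_version
        )
        restored.append(file_system.name)
    return restored
```

The restore only succeeds while the delete retention policy window is still open, so a caller may want to catch service errors per file system rather than abort the whole loop.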

property api_version

The version of the Storage API used for requests.

Return type:

str

property location_mode

The location mode that the client is currently using.

By default this will be “primary”. Options include “primary” and “secondary”.

Return type:

str

property primary_endpoint

The full primary endpoint URL.

Return type:

str

property primary_hostname

The hostname of the primary endpoint.

Return type:

str

property secondary_endpoint

The full secondary endpoint URL if configured.

If not available a ValueError will be raised. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

str

Raises:

ValueError

property secondary_hostname

The hostname of the secondary endpoint.

If not available this will be None. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

Optional[str]

property url

The full endpoint URL to this entity, including SAS token if used.

This could be either the primary endpoint or the secondary endpoint, depending on the current location_mode.

Returns:

The full endpoint URL to this entity, including SAS token if used.

Return type:

str

class azure.storage.filedatalake.aio.ExponentialRetry(initial_backoff: int = 15, increment_base: int = 3, retry_total: int = 3, retry_to_secondary: bool = False, random_jitter_range: int = 3, **kwargs)[source]

Exponential retry.

Constructs an Exponential retry object. The initial_backoff is used for the first retry. Subsequent retries occur after initial_backoff + increment_base^retry_count seconds. For example, by default the first retry occurs after 15 seconds, the second after (15+3^1) = 18 seconds, and the third after (15+3^2) = 24 seconds.

Parameters:
  • initial_backoff (int) – The initial backoff interval, in seconds, for the first retry.

  • increment_base (int) – The base, in seconds, to increment the initial_backoff by after the first retry.

  • retry_total (int) – The maximum number of retry attempts.

  • retry_to_secondary (bool) – Whether the request should be retried to secondary, if able. This should only be enabled if RA-GRS accounts are used and potentially stale data can be handled.

  • random_jitter_range (int) – A number in seconds which indicates a range to jitter/randomize for the back-off interval. For example, a random_jitter_range of 3 causes the back-off interval x to vary between x-3 and x+3.
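
The arithmetic described above can be checked with a short sketch. backoff_schedule is a hypothetical helper written for illustration only; it reproduces the documented formula and jitter bounds and does not call into the SDK or claim to mirror its internal implementation.

```python
def backoff_schedule(initial_backoff=15, increment_base=3, retry_total=3,
                     random_jitter_range=3):
    """Return (base_delay, low, high) in seconds for each retry attempt."""
    schedule = []
    for retry_count in range(retry_total):
        # The first retry waits initial_backoff; later retries add
        # increment_base raised to the retry count.
        extra = 0 if retry_count == 0 else increment_base ** retry_count
        base_delay = initial_backoff + extra
        schedule.append((base_delay,
                         base_delay - random_jitter_range,
                         base_delay + random_jitter_range))
    return schedule


for attempt, (delay, low, high) in enumerate(backoff_schedule(), start=1):
    print(f"retry {attempt}: about {delay}s (jitter range {low}s to {high}s)")
```

With the default arguments the base delays are 15, 18, and 24 seconds, matching the example in the class description.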

configure_retries(request: PipelineRequest) Dict[str, Any]
get_backoff_time(settings: Dict[str, Any]) float[source]

Calculates how long to sleep before retrying.

Parameters:

settings (Dict[str, Any]) – The configurable values pertaining to the backoff time.

Returns:

A float indicating how long to wait before retrying the request, or None to indicate no retry should be performed.

Return type:

float or None

increment(settings: Dict[str, Any], request: PipelineRequest, response: PipelineResponse | None = None, error: AzureError | None = None) bool

Increment the retry counters.

Parameters:
  • settings (Dict[str, Any]) – The configurable values pertaining to the increment operation.

  • request (PipelineRequest) – A pipeline request object.

  • response (Optional[PipelineResponse]) – A pipeline response object.

  • error (Optional[AzureError]) – An error encountered during the request, or None if the response was received successfully.

Returns:

Whether the retry attempts are exhausted.

Return type:

bool

async send(request)

Abstract send method for a synchronous pipeline. Mutates the request.

Context content is dependent on the HttpTransport.

Parameters:

request (PipelineRequest) – The pipeline request object

Returns:

The pipeline response object.

Return type:

PipelineResponse

async sleep(settings, transport)
connect_retries: int

The max number of connect retries.

increment_base: int

The base, in seconds, to increment the initial_backoff by after the first retry.

initial_backoff: int

The initial backoff interval, in seconds, for the first retry.

next: HTTPPolicy[HTTPRequestType, HTTPResponseType]

Pointer to the next policy or a transport (wrapped as a policy). Will be set at pipeline creation.

random_jitter_range: int

A number in seconds which indicates a range to jitter/randomize for the back-off interval.

retry_read: int

The max number of read retries.

retry_status: int

The max number of status retries.

retry_to_secondary: bool

Whether the secondary endpoint should be retried.

total_retries: int

The max number of retries.

class azure.storage.filedatalake.aio.FileSystemClient(account_url: str, file_system_name: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | AsyncTokenCredential | None = None, **kwargs: Any)[source]

A client to interact with a specific file system, even if that file system may not yet exist.

For operations relating to a specific directory or file within this file system, a directory client or file client can be retrieved using the get_directory_client() or get_file_client() functions.

Variables:
  • url (str) – The full endpoint URL to the file system, including SAS token if used.

  • primary_endpoint (str) – The full primary endpoint URL.

  • primary_hostname (str) – The hostname of the primary endpoint.

Parameters:
  • account_url (str) – The URI to the storage account.

  • file_system_name (str) – The file system for the directory or files.

  • credential (AzureNamedKeyCredential or AzureSasCredential or AsyncTokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. If the resource URI already contains a SAS token, it is ignored in favor of an explicit credential, except in the case of AzureSasCredential, where the conflicting SAS tokens raise a ValueError. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:
  • api_version (str) – The Storage API version to use for requests. Default value is the most recent service version that is compatible with the current SDK. Setting to an older version may result in reduced feature compatibility.

  • audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Example:

Get a FileSystemClient from an existing DataLakeServiceClient.
# Instantiate a DataLakeServiceClient using a connection string
from azure.storage.filedatalake.aio import DataLakeServiceClient
datalake_service_client = DataLakeServiceClient.from_connection_string(self.connection_string)

async with datalake_service_client:
    # Instantiate a FileSystemClient
    file_system_client = datalake_service_client.get_file_system_client("mynewfilesystems")
classmethod from_connection_string(conn_str: str, file_system_name: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any) Self[source]

Create FileSystemClient from a Connection String.

Parameters:
  • conn_str (str) – A connection string to an Azure Storage account.

  • file_system_name (str) – The name of file system to interact with.

  • credential (AzureNamedKeyCredential or AzureSasCredential or TokenCredential or str or dict[str, str] or None) – The credentials with which to authenticate. This is optional if the account URL already has a SAS token, or the connection string already has shared access key values. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredentials class from azure.identity. Credentials provided here take precedence over those in the connection string. If using an instance of AzureNamedKeyCredential, “name” should be the storage account name, and “key” should be the storage account key.

Keyword Arguments:

audience (str) – The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://<account>.blob.core.windows.net.

Returns:

A FileSystemClient.

Return type:

FileSystemClient

Example:

Create FileSystemClient from connection string
from azure.storage.filedatalake.aio import FileSystemClient
file_system_client = FileSystemClient.from_connection_string(self.connection_string, "filesystem")
async acquire_lease(lease_duration: int = -1, lease_id: str | None = None, **kwargs) DataLakeLeaseClient[source]

Requests a new lease. If the file system does not have an active lease, the DataLake service creates a lease on the file system and returns a new lease ID.

Parameters:
  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

Keyword Arguments:
  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A DataLakeLeaseClient object, that can be run in a context manager.

Return type:

DataLakeLeaseClient

Example:

Acquiring a lease on the file_system.
# Acquire a lease on the file system
lease = await file_system_client.acquire_lease()

# Delete file system by passing in the lease
await file_system_client.delete_file_system(lease=lease)
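
The lease_duration rule stated above (either -1, or a finite value between 15 and 60 seconds) can be checked before the request is sent. This is a minimal sketch: is_valid_lease_duration is a hypothetical client-side helper, not part of the SDK, and the service remains the authority on what it accepts.

```python
def is_valid_lease_duration(lease_duration):
    """Return True for -1 (infinite) or a finite lease of 15 to 60 seconds."""
    return lease_duration == -1 or 15 <= lease_duration <= 60


# A 30-second lease and an infinite lease pass; a 5-second lease does not.
for candidate in (30, -1, 5):
    print(candidate, is_valid_lease_duration(candidate))
```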
async close() None[source]

Closes the sockets opened by the client. Calling this method is unnecessary when the client is used as a context manager.

async create_directory(directory: DirectoryProperties | str, metadata: Dict[str, str] | None = None, **kwargs) DataLakeDirectoryClient[source]

Create directory

Parameters:
  • directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

  • metadata (dict(str, str)) – Name-value pairs associated with the directory as metadata.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u, where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.
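
The umask rule in the keyword list above, p & ^u, reads as "permission AND NOT umask". The sketch below works through it with plain integer arithmetic; apply_umask is a hypothetical helper for illustration and is not part of the SDK.

```python
def apply_umask(permission, umask):
    """Return the effective permission, p AND NOT u, as a 4-digit octal string."""
    p = int(permission, 8)
    u = int(umask, 8)
    # Mask to 12 bits so the result always fits 4 octal digits.
    return format(p & ~u & 0o7777, "04o")


print(apply_umask("0777", "0057"))  # 0720, the example from the umask description
print(apply_umask("0777", "0027"))  # 0750, a directory under the default umask
print(apply_umask("0666", "0027"))  # 0640, a file under the default umask
```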

Returns:

DataLakeDirectoryClient with new directory and metadata.

Return type:

DataLakeDirectoryClient

Example:

Create directory in the file system.
directory_client = await file_system_client.create_directory("mydirectory")
async create_file(file: FileProperties | str, **kwargs) DataLakeFileClient[source]

Create file

Parameters:

file (str or FileProperties) – The file with which to interact. This can either be the name of the file, or an instance of FileProperties.

Keyword Arguments:
  • content_settings (ContentSettings) – ContentSettings object used to set path properties.

  • metadata (dict[str, str]) – Name-value pairs associated with the file as metadata.

  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • umask (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u, where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

  • owner (str) – The owner of the file or directory.

  • group (str) – The owning group of the file or directory.

  • acl (str) – Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format “[scope:][type]:[id]:[permissions]”.

  • lease_id (str) – Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

  • lease_duration (int) – Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

  • expires_on (datetime or int) – The time at which the file expires. If expires_on is an int, the expiration time is set as the number of milliseconds elapsed from creation time. If expires_on is a datetime, the expiration time is set to the absolute time provided. If no time zone info is provided, the value is interpreted as UTC.

  • permissions (str) – Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.
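
The acl keyword argument above takes a comma-separated list of entries in the form [scope:][type]:[id]:[permissions]. The following sketch splits such a string into its parts to make the format concrete. AccessControlEntry and parse_acl are hypothetical names introduced for illustration; the SDK itself accepts the raw string, and the sample ACL value is an assumption chosen to fit the documented format.

```python
from typing import List, NamedTuple, Optional


class AccessControlEntry(NamedTuple):
    scope: Optional[str]   # e.g. "default", or None when the scope is omitted
    kind: str              # the [type] field, e.g. "user", "group", "other"
    identifier: str        # the [id] field; empty for the owning user or group
    permissions: str       # e.g. "rwx" or "r-x"


def parse_acl(acl: str) -> List[AccessControlEntry]:
    """Split a comma-separated ACL string into its access control entries."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        if len(parts) == 4:
            scope, kind, identifier, permissions = parts
        elif len(parts) == 3:
            scope = None
            kind, identifier, permissions = parts
        else:
            raise ValueError(f"malformed access control entry: {raw!r}")
        entries.append(AccessControlEntry(scope, kind, identifier, permissions))
    return entries


for entry in parse_acl("user::rwx,group::r-x,other::---,default:user::rwx"):
    print(entry)
```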

Returns:

DataLakeFileClient with new file created.

Return type:

DataLakeFileClient

Example:

Create file in the file system.
file_client = await file_system_client.create_file("myfile")
async create_file_system(metadata: Dict[str, str] | None = None, public_access: PublicAccess | None = None, **kwargs) Dict[str, str | datetime][source]

Creates a new file system under the specified account.

If a file system with the same name already exists, a ResourceExistsError will be raised. This method returns a dictionary of response headers.

Parameters:
  • metadata (dict(str, str)) – A dict with name-value pairs to associate with the file system as metadata. Example: {'Category': 'test'}

  • public_access (PublicAccess) – To specify whether data in the file system may be accessed publicly and the level of access.

Keyword Arguments:
Returns:

A dictionary of response headers.

Return type:

dict[str, Union[str, datetime]]

Example:

Creating a file system in the datalake service.
await file_system_client.create_file_system()
async delete_directory(directory: DirectoryProperties | str, **kwargs) DataLakeDirectoryClient[source]

Marks the specified path for deletion.

Parameters:

directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

DataLakeDirectoryClient after deleting specified directory.

Return type:

DataLakeDirectoryClient

Example:

Delete directory in the file system.
await file_system_client.delete_directory("mydirectory")
async delete_file(file: FileProperties | str, **kwargs) DataLakeFileClient[source]

Marks the specified file for deletion.

Parameters:

file (str or FileProperties) – The file with which to interact. This can either be the name of the file, or an instance of FileProperties.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

DataLakeFileClient after deleting specified file.

Return type:

DataLakeFileClient

Example:

Delete file in the file system.
await file_system_client.delete_file("myfile")
async delete_file_system(**kwargs: Any) None[source]

Marks the specified file system for deletion.

The file system and any files contained within it are later deleted during garbage collection. If the file system is not found, a ResourceNotFoundError will be raised.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, delete_file_system only succeeds if the file system’s lease is active and matches this ID. Required if the file system has an active lease.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.
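
The if_modified_since and if_unmodified_since descriptions above state that naive datetimes are assumed to be UTC and that timezone-aware values are converted to UTC. The sketch below illustrates that rule with the standard library only; as_utc is a hypothetical helper, not an SDK function, and it may be useful for making the intended instant explicit before passing a value in.

```python
from datetime import datetime, timedelta, timezone


def as_utc(value):
    """Apply the documented rule: naive values are taken as UTC, aware ones converted."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

print(as_utc(naive).isoformat())  # noon is kept and labeled as UTC
print(as_utc(aware).isoformat())  # noon at UTC+2 becomes 10:00 UTC
```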

Return type:

None

Example:

Deleting a file system in the datalake service.
await file_system_client.delete_file_system()
async exists(**kwargs: Any) bool[source]

Returns True if the file system exists and returns False otherwise.

Keyword Arguments:

timeout (int) –

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

True if the file system exists, False otherwise.

Return type:

bool
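
A common use of exists() is a create-if-missing check. The sketch below assumes file_system_client behaves like the async FileSystemClient; ensure_file_system is a hypothetical helper, not part of the SDK.

```python
async def ensure_file_system(file_system_client):
    """Create the file system if exists() reports that it is missing.

    Returns True when a create call was issued, False otherwise.
    """
    if await file_system_client.exists():
        return False
    await file_system_client.create_file_system()
    return True
```

The check and the create are two separate requests, so another caller may create the file system in between. In that case create_file_system() raises ResourceExistsError, which a caller may choose to catch and treat as success.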

get_directory_client(directory: DirectoryProperties | str) DataLakeDirectoryClient[source]

Get a client to interact with the specified directory.

The directory need not already exist.

Parameters:

directory (str or DirectoryProperties) – The directory with which to interact. This can either be the name of the directory, or an instance of DirectoryProperties.

Returns:

A DataLakeDirectoryClient.

Return type:

DataLakeDirectoryClient

Example:

Getting the directory client to interact with a specific directory.
# Get the DataLakeDirectoryClient from the FileSystemClient to interact with a specific directory
directory_client = file_system_client.get_directory_client("mynewdirectory")
get_file_client(file_path: FileProperties | str) DataLakeFileClient[source]

Get a client to interact with the specified file.

The file need not already exist.

Parameters:

file_path (str or FileProperties) – The file with which to interact. This can either be the path of the file from the root directory (e.g. directory/subdirectory/file) or an instance of FileProperties.

Returns:

A DataLakeFileClient.

Return type:

DataLakeFileClient

Example:

Getting the file client to interact with a specific file.
# Get the FileClient from the FileSystemClient to interact with a specific file
file_client = file_system_client.get_file_client("mynewfile")
async get_file_system_access_policy(**kwargs: Any) Dict[str, Any][source]

Gets the permissions for the specified file system. The permissions indicate whether file system data may be accessed publicly.

Keyword Arguments:
Returns:

Access policy information in a dict.

Return type:

dict[str, Any]

async get_file_system_properties(**kwargs: Any) FileSystemProperties[source]

Returns all user-defined metadata and system properties for the specified file system. The data returned does not include the file system’s list of paths.

Keyword Arguments:
Returns:

Properties for the specified file system within a file system object.

Return type:

FileSystemProperties

Example:

Getting properties on the file system.
properties = await file_system_client.get_file_system_properties()
get_paths(path: str | None = None, recursive: bool | None = True, max_results: int | None = None, **kwargs: Any) AsyncItemPaged[PathProperties][source]

Returns a generator to list the paths (files or directories) under the specified file system. The generator will lazily follow the continuation tokens returned by the service.

Parameters:
  • path (str) – Filters the results to return only paths under the specified path.

  • recursive (Optional[bool]) – Optional. Set True for recursive, False for iterative.

  • max_results (int) – An optional value that specifies the maximum number of items to return per page. If omitted or greater than 5,000, the response will include up to 5,000 items per page.

Keyword Arguments:
  • upn (bool) – If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names in the owner, group, and acl fields of PathProperties. If False, the values will be returned as Azure Active Directory Object IDs. The default value is False. Note that group and application Object IDs are not translated because they do not have unique friendly names.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

An iterable (auto-paging) response of PathProperties.

Return type:

AsyncItemPaged[PathProperties]
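
The lazy paging described above can be pictured with a small, self-contained sketch. It does not use the SDK: fetch_page, iter_paths and ALL_PATHS are hypothetical stand-ins for the service and the pager, and the integer continuation token is an assumption made purely for illustration.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple

# Hypothetical data held by a pretend service.
ALL_PATHS = [f"dir/file{i}.txt" for i in range(7)]


async def fetch_page(token: Optional[int], page_size: int) -> Tuple[List[str], Optional[int]]:
    """Return one page of names plus a continuation token (None when done)."""
    start = token or 0
    end = start + page_size
    next_token = end if end < len(ALL_PATHS) else None
    return ALL_PATHS[start:end], next_token


async def iter_paths(page_size: int = 3) -> AsyncIterator[str]:
    """Yield items one by one, requesting the next page only when needed."""
    token: Optional[int] = None
    while True:
        page, token = await fetch_page(token, page_size)
        for name in page:
            yield name
        if token is None:
            break


async def collect() -> List[str]:
    # The caller sees a flat stream of items; page boundaries stay hidden.
    return [name async for name in iter_paths()]


names = asyncio.run(collect())
```

In this sketch page_size plays the role of max_results: it bounds the size of each page, not the total number of items yielded.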

Example:

List the paths in the file system.
path_list = file_system_client.get_paths()
async for path in path_list:
    print(path.name + '\n')
list_deleted_paths(**kwargs: Any) AsyncItemPaged[DeletedPathProperties][source]

Returns a generator to list the deleted (file or directory) paths under the specified file system. The generator will lazily follow the continuation tokens returned by the service.

Added in version 12.4.0: This operation was introduced in API version ‘2020-06-12’.

Keyword Arguments:
  • path_prefix (str) – Filters the results to return only paths under the specified path.

  • results_per_page (int) – An optional value that specifies the maximum number of items to return per page. If omitted or greater than 5,000, the response will include up to 5,000 items per page.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

An iterable (auto-paging) response of DeletedPathProperties.

Return type:

AsyncItemPaged[DeletedPathProperties]

async set_file_system_access_policy(signed_identifiers: Dict[str, AccessPolicy], public_access: str | PublicAccess | None = None, **kwargs) Dict[str, str | datetime][source]

Sets the permissions for the specified file system or stored access policies that may be used with Shared Access Signatures. The permissions indicate whether files in a file system may be accessed publicly.

Parameters:
  • signed_identifiers (dict[str, AccessPolicy]) – A dictionary of access policies to associate with the file system. The dictionary may contain up to 5 elements. An empty dictionary will clear the access policies set on the service.

  • public_access (PublicAccess) – To specify whether data in the file system may be accessed publicly and the level of access.

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – Required if the file system has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

  • if_modified_since (datetime) – A datetime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified date/time.

  • if_unmodified_since (datetime) – A datetime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A dict of updated file system properties (ETag and last modified).

Return type:

dict[str, str or datetime]
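
The UTC rule stated for if_modified_since and if_unmodified_since can be sketched as follows. This is an illustration of the described behavior, not the SDK's internal code; to_utc is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value: datetime) -> datetime:
    """Normalize a datetime following the rule described above."""
    if value.tzinfo is None:
        # Naive datetimes are assumed to already be in UTC.
        return value.replace(tzinfo=timezone.utc)
    # Aware datetimes are converted to UTC.
    return value.astimezone(timezone.utc)


# 12:00 with no timezone info is treated as 12:00 UTC.
naive = datetime(2024, 1, 1, 12, 0, 0)

# 12:00 at UTC+02:00 corresponds to 10:00 UTC.
aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
```

Passing timezone-aware datetimes avoids any ambiguity about how a naive value will be interpreted.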

async set_file_system_metadata(metadata: Dict[str, str], **kwargs) Dict[str, str | datetime][source]

Sets one or more user-defined name-value pairs for the specified file system. Each call to this operation replaces all existing metadata attached to the file system. To remove all metadata from the file system, call this operation with no metadata dict.

Parameters:

metadata (dict[str, str]) – A dict containing name-value pairs to associate with the file system as metadata. Example: {‘category’:’test’}

Keyword Arguments:
  • lease (DataLakeLeaseClient or str) – If specified, set_file_system_metadata only succeeds if the file system’s lease is active and matches this ID.

  • if_modified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

  • if_unmodified_since (datetime) – A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

  • etag (str) – An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

  • match_condition (MatchConditions) – The match condition to use upon the etag.

  • timeout (int) –

    Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns:

A dict of updated file system properties (ETag and last modified).

Return type:

dict[str, str] or dict[str, datetime]
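
Because each call replaces all existing metadata, adding a single key requires merging with the current values first. The sketch below shows that pattern against a hypothetical in-memory stand-in (FakeFileSystem and its set_metadata method are not part of the SDK).

```python
from typing import Dict


class FakeFileSystem:
    """Hypothetical stand-in that mimics replace-all metadata semantics."""

    def __init__(self) -> None:
        self.metadata: Dict[str, str] = {}

    def set_metadata(self, metadata: Dict[str, str]) -> None:
        # Each call replaces the whole metadata set; nothing is merged.
        self.metadata = dict(metadata)


fs = FakeFileSystem()
fs.set_metadata({"category": "test"})

# Setting a new key on its own drops the existing "category" entry.
fs.set_metadata({"owner": "data-team"})
dropped = dict(fs.metadata)

# To keep existing entries, read the current metadata, merge, and write back.
fs.set_metadata({"category": "test"})
merged = {**fs.metadata, "owner": "data-team"}
fs.set_metadata(merged)
```

With the real client, the current metadata would first be read from the properties returned by get_file_system_properties().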

Example:

Setting metadata on the file system.
# Create key, value pairs for metadata
metadata = {'type': 'test'}

# Set metadata on the file system
await file_system_client.set_file_system_metadata(metadata=metadata)
property api_version

The version of the Storage API used for requests.

Return type:

str

property location_mode

The location mode that the client is currently using.

By default this will be “primary”. Options include “primary” and “secondary”.

Return type:

str

property primary_endpoint

The full primary endpoint URL.

Return type:

str

property primary_hostname

The hostname of the primary endpoint.

Return type:

str

property secondary_endpoint

The full secondary endpoint URL if configured.

If not available a ValueError will be raised. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

str

Raises:

ValueError

property secondary_hostname

The hostname of the secondary endpoint.

If not available this will be None. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Return type:

Optional[str]

property url

The full endpoint URL to this entity, including SAS token if used.

This could be either the primary endpoint, or the secondary endpoint depending on the current location_mode.

Returns:

The full endpoint URL to this entity, including SAS token if used.

Return type:

str

class azure.storage.filedatalake.aio.LinearRetry(backoff: int = 15, retry_total: int = 3, retry_to_secondary: bool = False, random_jitter_range: int = 3, **kwargs: Any)[source]

Linear retry.

Constructs a Linear retry object.

Parameters:
  • backoff (int) – The backoff interval, in seconds, between retries.

  • retry_total (int) – The maximum number of retry attempts.

  • retry_to_secondary (bool) – Whether the request should be retried to secondary, if able. This should only be enabled if RA-GRS accounts are used and potentially stale data can be handled.

  • random_jitter_range (int) – A number in seconds which indicates a range to jitter/randomize for the back-off interval. For example, a random_jitter_range of 3 results in the back-off interval x to vary between x+3 and x-3.
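
The relationship between backoff and random_jitter_range can be sketched as below. This is a minimal illustration of a linear back-off with jitter, not the library's implementation; linear_backoff is a hypothetical function, and clamping the lower bound at zero is an assumption.

```python
import random
from typing import Optional


def linear_backoff(backoff: int = 15, random_jitter_range: int = 3,
                   rng: Optional[random.Random] = None) -> float:
    """Return a wait time in [backoff - jitter, backoff + jitter], never negative."""
    if rng is None:
        rng = random.Random()
    low = max(backoff - random_jitter_range, 0)  # Assumed clamp: no negative waits.
    high = backoff + random_jitter_range
    return rng.uniform(low, high)


# With the defaults, every delay falls between 12 and 18 seconds.
rng = random.Random(0)
delays = [linear_backoff(rng=rng) for _ in range(3)]
```

Unlike an exponential policy, the interval does not grow with the attempt number; only the jitter varies between retries.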

configure_retries(request: PipelineRequest) Dict[str, Any]
get_backoff_time(settings: Dict[str, Any]) float[source]

Calculates how long to sleep before retrying.

Parameters:

settings (Dict[str, Any]) – The configurable values pertaining to the backoff time.

Returns:

A float indicating how long to wait before retrying the request, or None to indicate no retry should be performed.

Return type:

float or None

increment(settings: Dict[str, Any], request: PipelineRequest, response: PipelineResponse | None = None, error: AzureError | None = None) bool

Increment the retry counters.

Parameters:
  • settings (Dict[str, Any]) – The configurable values pertaining to the increment operation.

  • request (PipelineRequest) – A pipeline request object.

  • response (Optional[PipelineResponse]) – A pipeline response object.

  • error (Optional[AzureError]) – An error encountered during the request, or None if the response was received successfully.

Returns:

Whether the retry attempts are exhausted.

Return type:

bool

async send(request)

Abstract send method for an asynchronous pipeline. Mutates the request.

Context content is dependent on the HttpTransport.

Parameters:

request (PipelineRequest) – The pipeline request object

Returns:

The pipeline response object.

Return type:

PipelineResponse

async sleep(settings, transport)
connect_retries: int

The max number of connect retries.

initial_backoff: int

The backoff interval, in seconds, between retries.

next: HTTPPolicy[HTTPRequestType, HTTPResponseType]

Pointer to the next policy or a transport (wrapped as a policy). Will be set at pipeline creation.

random_jitter_range: int

A number in seconds which indicates a range to jitter/randomize for the back-off interval.

retry_read: int

The max number of read retries.

retry_status: int

The max number of status retries.

retry_to_secondary: bool

Whether the secondary endpoint should be retried.

total_retries: int

The max number of retries.

class azure.storage.filedatalake.aio.StorageStreamDownloader(downloader)[source]

A streaming object to download from Azure Storage.

Variables:
  • name (str) – The name of the file being downloaded.

  • properties (FileProperties) – The properties of the file being downloaded. If only a range of the data is being downloaded, this will be reflected in the properties.

  • size (int) – The size of the total data in the stream. This will be the byte range if specified, otherwise the total size of the file.

chunks() AsyncIterator[bytes][source]

Iterate over chunks in the download stream.

Returns:

An async iterator over the chunks in the download stream.

Return type:

AsyncIterator[bytes]

async read(size: int | None = -1) bytes[source]

Read up to size bytes from the stream and return them. If size is unspecified or is -1, all bytes will be read.

Parameters:

size (Optional[int]) – The number of bytes to download from the stream. Leave unspecified or set to -1 to download all bytes.

Returns:

The requested data as bytes. If the return value is empty, there is no more data to read.

Return type:

bytes
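
The read contract (return up to size bytes, and an empty value once the stream is exhausted) lends itself to a simple loop. The sketch below uses a hypothetical FakeDownloader backed by io.BytesIO in place of a real StorageStreamDownloader.

```python
import asyncio
import io
from typing import List


class FakeDownloader:
    """Hypothetical stand-in exposing an async read(size) as described above."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    async def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


async def read_in_pieces(downloader: FakeDownloader, size: int) -> List[bytes]:
    pieces = []
    while True:
        piece = await downloader.read(size)
        if not piece:  # Empty bytes: nothing left in the stream.
            break
        pieces.append(piece)
    return pieces


pieces = asyncio.run(read_in_pieces(FakeDownloader(b"hello world"), 4))
```

The last piece may be shorter than size, so callers should rely on the empty return value rather than on piece length to detect the end.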

async readall() bytes[source]

Download the contents of this file.

This operation is blocking until all data is downloaded.

Returns:

The contents of the file.

Return type:

bytes

async readinto(stream: IO[bytes]) int[source]

Download the contents of this file to a stream.

Parameters:

stream (IO[bytes]) – The stream to download to. This can be an open file-handle, or any writable stream. The stream must be seekable if the download uses more than one parallel connection.

Returns:

The number of bytes read.

Return type:

int
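
One plausible reason for the seekable-stream requirement is that chunks fetched in parallel can finish out of order and must be written at their own offsets. The sketch below illustrates that idea only; write_chunks is a hypothetical helper and this is an assumption about the motivation, not the SDK's implementation.

```python
import io
from typing import IO, Iterable, Tuple


def write_chunks(stream: IO[bytes], chunks: Iterable[Tuple[int, bytes]]) -> int:
    """Write (offset, data) chunks that may arrive out of order; return bytes written."""
    written = 0
    for offset, data in chunks:
        stream.seek(offset)  # Requires a seekable stream.
        written += stream.write(data)
    return written


target = io.BytesIO()

# The second half of the content arrives before the first half.
total = write_chunks(target, [(6, b"world"), (0, b"hello ")])
```

With a single connection the data arrives in order, so a plain writable stream without seek support is sufficient.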