azure-ai-translation-document 1.0.0


Azure Document Translation client library for Python
Azure Cognitive Services Document Translation is a cloud service that can be used to translate multiple and complex documents across languages and dialects while preserving original document structure and data format.
Use the client library for Document Translation to:

Translate numerous, large files from an Azure Blob Storage container to a target container in your language of choice.
Check the translation status and progress of each document in the translation operation.
Apply a custom translation model or glossaries to tailor translation to your specific case.

Source code | Package (PyPI) | API reference documentation | Product documentation | Samples
Disclaimer
Support for Python 2.7 in Azure SDK Python packages ended on 01 January 2022. For more information and questions, please refer to https://github.com/Azure/azure-sdk-for-python/issues/20691
Getting started
Prerequisites

Python 3.6 or later is required to use this package.
You must have an Azure subscription and a
Translator resource to use this package.

Install the package
Install the Azure Document Translation client library for Python with pip:
pip install azure-ai-translation-document


Note: This version of the client library defaults to the v1.0 version of the service

Create a Translator resource
The Document Translation feature supports single-service access only.
To access the service, create a Translator resource.
You can create the resource using either of the following:
Option 1: Azure Portal
Option 2: Azure CLI
Below is an example of how you can create a Translator resource using the CLI:
# Create a new resource group to hold the Translator resource -
# if using an existing resource group, skip this step
az group create --name my-resource-group --location westus2

# Create the Translator resource used for Document Translation
az cognitiveservices account create \
    --name document-translation-resource \
    --custom-domain document-translation-resource \
    --resource-group my-resource-group \
    --kind TextTranslation \
    --sku S1 \
    --location westus2 \
    --yes

Authenticate the client
To interact with the Document Translation service, you need to create an instance of a client.
An endpoint and credential are necessary to instantiate the client object.
Looking up the endpoint
You can find the endpoint for your Translator resource using the
Azure Portal.

Note that the service requires a custom domain endpoint. Follow the instructions in the above link to format your endpoint:
https://{NAME-OF-YOUR-RESOURCE}.cognitiveservices.azure.com/
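
As a minimal sketch of the endpoint format above, the helper below builds the custom domain endpoint from a resource name. The function name `build_endpoint` is hypothetical and is not part of the client library; it only applies the URL pattern shown above.

```python
def build_endpoint(resource_name: str) -> str:
    """Return the custom domain endpoint for a Translator resource name."""
    name = resource_name.strip()
    if not name:
        raise ValueError("resource_name must not be empty")
    return f"https://{name}.cognitiveservices.azure.com/"


# The resource name matches the one used in the CLI example above.
endpoint = build_endpoint("document-translation-resource")
print(endpoint)
```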

Get the API key
The API key can be found in the Azure Portal or by running the following Azure CLI command:
az cognitiveservices account keys list --name "resource-name" --resource-group "resource-group-name"
Create the client with AzureKeyCredential
To use an API key as the credential parameter,
pass the key as a string into an instance of AzureKeyCredential.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_translation_client = DocumentTranslationClient(endpoint, credential)

Create the client with an Azure Active Directory credential
AzureKeyCredential authentication is used in the examples in this getting started guide, but you can also
authenticate with Azure Active Directory using the azure-identity library.
To use the DefaultAzureCredential type shown below, or other credential types provided
with the Azure SDK, please install the azure-identity package:
pip install azure-identity
You will also need to register a new AAD application and grant access to your
Translator resource by assigning the "Cognitive Services User" role to your service principal.
Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables:
AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.
from azure.identity import DefaultAzureCredential
from azure.ai.translation.document import DocumentTranslationClient
credential = DefaultAzureCredential()

document_translation_client = DocumentTranslationClient(
    endpoint="https://<resource-name>.cognitiveservices.azure.com/",
    credential=credential
)
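
Before creating the credential, it can help to confirm that the three environment variables named above are set. The sketch below uses only the standard library; the helper name `missing_aad_variables` is an assumption made for illustration. Note that DefaultAzureCredential may still succeed through another mechanism even if these variables are absent, so treat this as a convenience check rather than a requirement.

```python
import os

REQUIRED_AAD_VARIABLES = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_aad_variables(environ=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AAD_VARIABLES if not env.get(name)]


missing = missing_aad_variables()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
else:
    print("All service principal variables are set.")
```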

Key concepts
The Document Translation service requires that you upload your files to an Azure Blob Storage source container and provide
a target container where the translated documents can be written. Additional information about setting this up can be found in
the service documentation:

Set up Azure Blob Storage containers with your documents
Optionally apply glossaries or a custom model for translation
Allow access to your storage account with either of the following options:

Generate SAS tokens to your containers (or files) with the appropriate permissions
Create and use a managed identity to grant access to your storage account



DocumentTranslationClient
Interaction with the Document Translation client library begins with an instance of the DocumentTranslationClient.
The client provides operations for:

Creating a translation operation to translate documents in your source container(s) and write results to your target container(s).
Checking the status of individual documents in the translation operation and monitoring each document's progress.
Enumerating all past and current translation operations.
Identifying supported glossary and document formats.

Translation Input
Input to the begin_translation client method can be provided in two different ways:

A single source container with documents can be translated to a different language:

from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

document_translation_client = DocumentTranslationClient("<endpoint>", AzureKeyCredential("<api_key>"))
poller = document_translation_client.begin_translation("<sas_url_to_source>", "<sas_url_to_target>", "<target_language>")


Or multiple different sources can be provided each with their own targets.

from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient, DocumentTranslationInput, TranslationTarget

my_input = [
    DocumentTranslationInput(
        source_url="<sas_url_to_source_A>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language="de")
        ]
    ),
    DocumentTranslationInput(
        source_url="<sas_url_to_source_B>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language="de")
        ]
    ),
    DocumentTranslationInput(
        source_url="<sas_url_to_source_C>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language="de")
        ]
    )
]

document_translation_client = DocumentTranslationClient("<endpoint>", AzureKeyCredential("<api_key>"))
poller = document_translation_client.begin_translation(my_input)


Note: the target_url for each target language must be unique.
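
One way to catch a violation of this note before submitting a request is to check the planned targets locally. The sketch below is an assumption about how such a check might look: it reads the note as "one target URL should not be shared by two different languages", works on plain `(target_url, language)` pairs rather than SDK objects, and the helper name `shared_target_urls` is hypothetical.

```python
from collections import defaultdict


def shared_target_urls(targets):
    """Return target URLs that are assigned to more than one language."""
    languages_by_url = defaultdict(set)
    for target_url, language in targets:
        languages_by_url[target_url].add(language)
    return sorted(url for url, languages in languages_by_url.items() if len(languages) > 1)


planned_targets = [
    ("<sas_url_to_target_fr>", "fr"),
    ("<sas_url_to_target_de>", "de"),
    ("<sas_url_to_target_fr>", "es"),  # reuses the French container by mistake
]
print(shared_target_urls(planned_targets))
```

Reusing the same URL for the same language across several sources, as in the example above, is not flagged by this check.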

To translate documents under a folder, or only translate certain documents, see sample_begin_translation_with_filters.py.
See the service documentation for all supported languages.
Long-Running Operations
Long-running operations are operations which consist of an initial request sent to the service to start an operation,
followed by polling the service at intervals to determine whether the operation has completed or failed, and if it has
succeeded, to get the result.
Methods that translate documents are modeled as long-running operations.
The client exposes a begin_<method-name> method that returns a DocumentTranslationLROPoller or AsyncDocumentTranslationLROPoller. Callers should wait
for the operation to complete by calling result() on the poller object returned from the begin_<method-name> method.
Sample code snippets are provided to illustrate using long-running operations below.
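
To make the polling pattern concrete, here is a library-independent sketch of what waiting on a long-running operation involves. It is not the SDK's implementation: the function `wait_for_completion`, the set of terminal status names, and the simulated responses are all assumptions chosen for illustration. In practice, calling result() on the poller does this waiting for you.

```python
import time


def wait_for_completion(get_status, interval=0.0, max_polls=100):
    """Poll get_status() until it reports a terminal state, then return that state."""
    terminal_states = {"Succeeded", "Failed", "Canceled"}  # illustrative names
    for _ in range(max_polls):
        status = get_status()
        if status in terminal_states:
            return status
        time.sleep(interval)  # wait before asking the service again
    raise TimeoutError("operation did not finish within the polling budget")


# Simulated service responses standing in for real status requests.
responses = iter(["NotStarted", "Running", "Running", "Succeeded"])
print(wait_for_completion(lambda: next(responses)))
```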
Examples
The following section provides several code snippets covering some of the most common Document Translation tasks, including:

Translate your documents
Translate multiple inputs
List translation operations

Translate your documents
Translate all the documents in your source container to the target container. To translate documents under a folder, or only translate certain documents, see sample_begin_translation_with_filters.py.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
source_container_sas_url_en = "<sas-url-en>"
target_container_sas_url_es = "<sas-url-es>"

document_translation_client = DocumentTranslationClient(endpoint, credential)

poller = document_translation_client.begin_translation(source_container_sas_url_en, target_container_sas_url_es, "es")

result = poller.result()

print(f"Status: {poller.status()}")
print(f"Created on: {poller.details.created_on}")
print(f"Last updated on: {poller.details.last_updated_on}")
print(f"Total number of translations on documents: {poller.details.documents_total_count}")

print("\nOf total documents...")
print(f"{poller.details.documents_failed_count} failed")
print(f"{poller.details.documents_succeeded_count} succeeded")

for document in result:
    print(f"Document ID: {document.id}")
    print(f"Document status: {document.status}")
    if document.status == "Succeeded":
        print(f"Source document location: {document.source_document_url}")
        print(f"Translated document location: {document.translated_document_url}")
        print(f"Translated to language: {document.translated_to}\n")
    else:
        print(f"Error Code: {document.error.code}, Message: {document.error.message}\n")
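
If you only need a summary rather than per-document details, the statuses can be tallied. The following is a minimal sketch: `count_by_status` is a hypothetical helper, and it uses stand-in objects so it runs without calling the service. It relies only on each item having a `status` attribute, as the documents in the example above do.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_status(documents):
    """Count documents per status; each item only needs a `status` attribute."""
    return Counter(document.status for document in documents)


# Stand-in objects so the sketch runs without a Translator resource.
sample_documents = [
    SimpleNamespace(id="1", status="Succeeded"),
    SimpleNamespace(id="2", status="Failed"),
    SimpleNamespace(id="3", status="Succeeded"),
]
for status, count in count_by_status(sample_documents).items():
    print(f"{status}: {count}")
```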

Translate multiple inputs
Begin translating with documents in multiple source containers to multiple target containers in different languages.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient, DocumentTranslationInput, TranslationTarget

endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
source_container_sas_url_de = "<sas-url-de>"
source_container_sas_url_en = "<sas-url-en>"
target_container_sas_url_es = "<sas-url-es>"
target_container_sas_url_fr = "<sas-url-fr>"
target_container_sas_url_ar = "<sas-url-ar>"

document_translation_client = DocumentTranslationClient(endpoint, credential)

poller = document_translation_client.begin_translation(
    [
        DocumentTranslationInput(
            source_url=source_container_sas_url_en,
            targets=[
                TranslationTarget(target_url=target_container_sas_url_es, language="es"),
                TranslationTarget(target_url=target_container_sas_url_fr, language="fr"),
            ],
        ),
        DocumentTranslationInput(
            source_url=source_container_sas_url_de,
            targets=[
                TranslationTarget(target_url=target_container_sas_url_ar, language="ar"),
            ],
        )
    ]
)

result = poller.result()

for document in result:
    print(f"Document ID: {document.id}")
    print(f"Document status: {document.status}")
    if document.status == "Succeeded":
        print(f"Source document location: {document.source_document_url}")
        print(f"Translated document location: {document.translated_document_url}")
        print(f"Translated to language: {document.translated_to}\n")
    else:
        print(f"Error Code: {document.error.code}, Message: {document.error.message}\n")

List translation operations
Enumerate over the translation operations submitted for the resource.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")

document_translation_client = DocumentTranslationClient(endpoint, credential)

operations = document_translation_client.list_translation_statuses() # type: ItemPaged[TranslationStatus]

for operation in operations:
    print(f"\nID: {operation.id}")
    print(f"Status: {operation.status}")
    print(f"Created on: {operation.created_on}")
    print(f"Last updated on: {operation.last_updated_on}")
    print(f"Total number of translations on documents: {operation.documents_total_count}")
    print(f"Total number of characters charged: {operation.total_characters_charged}")

    print("Of total documents...")
    print(f"{operation.documents_failed_count} failed")
    print(f"{operation.documents_succeeded_count} succeeded")
    print(f"{operation.documents_canceled_count} canceled")

To see how to use the Document Translation client library with Azure Storage Blob to upload documents, create SAS tokens
for your containers, and download the finished translated documents, see this sample.
Note that you will need to install the azure-storage-blob library to run this sample.
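
As a rough sketch of the SAS step mentioned above, the code below generates a container SAS token with azure-storage-blob and combines it into a container URL. Several details are assumptions: the helper names are hypothetical, the `blob.core.windows.net` host applies to the public Azure cloud, and the read and list permissions are a plausible choice for a source container that you should confirm against the service documentation. The storage import is deferred so that the URL helper runs without the package installed.

```python
from datetime import datetime, timedelta, timezone


def container_sas_url(account_name, container_name, sas_token):
    """Combine a storage account, container, and SAS token into a container URL."""
    token = sas_token.lstrip("?")
    return f"https://{account_name}.blob.core.windows.net/{container_name}?{token}"


def make_source_sas_token(account_name, container_name, account_key, hours_valid=1):
    """Generate a read/list SAS token for a source container (needs azure-storage-blob)."""
    # Imported lazily so the URL helper above works without the package installed.
    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    return generate_container_sas(
        account_name=account_name,
        container_name=container_name,
        account_key=account_key,
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=hours_valid),
    )


print(container_sas_url("<storage-account>", "<source-container>", "<sas-token>"))
```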
Advanced Topics
The following section provides some insights for some advanced translation features such as glossaries and custom translation models.
Glossaries
Glossaries are domain-specific dictionaries. For example, when translating medical documents, you may need support for terminology and idioms from the medical field that the standard translation dictionary does not cover, or you may simply need a specific translation for certain terms. Document Translation supports glossaries for these cases.
How To Create a Glossary File
Document Translation supports glossaries in the following formats:



| File Type | Extension | Description | Samples |
| --- | --- | --- | --- |
| Tab-Separated Values/TAB | .tsv, .tab | Read more on wikipedia | glossary_sample.tsv |
| Comma-Separated Values | .csv | Read more on wikipedia | glossary_sample.csv |
| Localization Interchange File Format | .xlf, .xliff | Read more on wikipedia | glossary_sample.xlf |



View all supported formats here.
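
A quick local check of a glossary file name against the extensions in the table above might look like the sketch below. The helper `is_supported_glossary` is hypothetical, and treating the extension as case-insensitive is an assumption. The client's get_supported_glossary_formats method remains the authoritative source for what the service accepts.

```python
from pathlib import PurePosixPath

# Extensions taken from the table of supported glossary formats above.
SUPPORTED_GLOSSARY_EXTENSIONS = {".tsv", ".tab", ".csv", ".xlf", ".xliff"}


def is_supported_glossary(file_name: str) -> bool:
    """Return True if the file extension appears in the table of glossary formats."""
    return PurePosixPath(file_name).suffix.lower() in SUPPORTED_GLOSSARY_EXTENSIONS


for name in ("glossary_sample.tsv", "glossary_sample.xlf", "glossary_sample.docx"):
    print(name, is_supported_glossary(name))
```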
How To Use Glossaries in Document Translation
To use glossaries with Document Translation, first upload your glossary file to a blob container, then provide the SAS URL to the file, as shown in the code sample sample_translation_with_glossaries.py.
Custom Translation Models
Instead of using Document Translation's engine for translation, you can use your own custom Azure machine/deep learning model.
How To Create a Custom Translation Model
For more info on how to create, provision, and deploy your own custom Azure translation model, please follow the instructions here: Build, deploy, and use a custom model for translation
How To Use a Custom Translation Model With Document Translation
In order to use a custom translation model with Document Translation, you first
need to create and deploy your model, then follow the code sample sample_translation_with_custom_model.py to use with Document Translation.
Troubleshooting
General
The Document Translation client library raises exceptions defined in Azure Core.
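
A minimal sketch of handling such an exception follows. It assumes the error raised is azure-core's HttpResponseError, which carries `status_code` and `message` attributes. The helper names `describe_error` and `translate_or_report` are hypothetical, and the azure-core import is deferred so that the formatting helper can be tried on its own.

```python
def describe_error(exc):
    """Build a one-line summary from an exception raised by the client."""
    status_code = getattr(exc, "status_code", None)
    message = getattr(exc, "message", None) or str(exc)
    return f"{type(exc).__name__} (HTTP {status_code}): {message}"


def translate_or_report(client, source_url, target_url, language):
    """Run a translation and print a summary instead of a traceback on service errors."""
    # Imported lazily; azure-core is installed as a dependency of the client library.
    from azure.core.exceptions import HttpResponseError

    try:
        poller = client.begin_translation(source_url, target_url, language)
        return list(poller.result())
    except HttpResponseError as exc:
        print(describe_error(exc))
        return []
```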
Logging
This library uses the standard
logging library for logging.
Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted
headers, can be enabled on the client or per-operation with the logging_enable keyword argument.
See full SDK logging documentation with examples here.
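
The logging setup described above can be sketched with the standard library alone. The formatter string is an arbitrary choice; the `azure` logger name and the logging_enable keyword argument follow the Azure SDK logging conventions.

```python
import logging
import sys

# "azure" is the parent logger for the Azure SDK for Python libraries.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

# Send log records to stdout with a simple format.
handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# Then opt in on the client (or on a single call) with logging_enable=True, e.g.:
# DocumentTranslationClient(endpoint, credential, logging_enable=True)
```

Because DEBUG output can include request and response bodies and unredacted headers, it is best kept out of production logs.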
Optional Configuration
Optional keyword arguments can be passed in at the client and per-operation level.
The azure-core reference documentation
describes available configurations for retries, logging, transport protocols, and more.
Next steps
The following section provides several code snippets illustrating common patterns used in the Document Translation Python client library.
More samples can be found under the samples directory.
More sample code
These code samples show common scenario operations with the Azure Document Translation client library.

Client authentication: sample_authentication.py
Begin translating documents: sample_begin_translation.py
Translate with multiple inputs: sample_translate_multiple_inputs.py
Check the status of documents: sample_check_document_statuses.py
List all submitted translation operations: sample_list_translations.py
Apply a custom glossary to translation: sample_translation_with_glossaries.py
Use Azure Blob Storage to set up translation resources: sample_translation_with_azure_blob.py

Async samples
This library also includes a complete set of async APIs. To use them, you must
first install an async transport, such as aiohttp. Async clients
are found under the azure.ai.translation.document.aio namespace.

Client authentication: sample_authentication_async.py
Begin translating documents: sample_begin_translation_async.py
Translate with multiple inputs: sample_translate_multiple_inputs_async.py
Check the status of documents: sample_check_document_statuses_async.py
List all submitted translation operations: sample_list_translations_async.py
Apply a custom glossary to translation: sample_translation_with_glossaries_async.py
Use Azure Blob Storage to set up translation resources: sample_translation_with_azure_blob_async.py
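
For orientation, a single-container translation with the async client might look like the sketch below. It mirrors the synchronous example and assumes an async transport such as aiohttp is installed. The function name `translate_async` is hypothetical, and the imports are deferred into the function so the definition itself loads without the packages present. See sample_begin_translation_async.py for the maintained version.

```python
import asyncio


async def translate_async(endpoint, key, source_url, target_url, language):
    """Translate one container with the async client and return (id, status) pairs."""
    # Imported lazily; requires azure-ai-translation-document and an async transport.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.translation.document.aio import DocumentTranslationClient

    client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
    async with client:
        poller = await client.begin_translation(source_url, target_url, language)
        result = await poller.result()
        return [(document.id, document.status) async for document in result]


# To run, replace the placeholders with real values and use an event loop, e.g.:
# asyncio.run(translate_async("<endpoint>", "<api_key>", "<sas_url_to_source>", "<sas_url_to_target>", "es"))
```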

Additional documentation
For more extensive documentation on Azure Cognitive Services Document Translation, see the Document Translation documentation on docs.microsoft.com.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Release History
1.0.0 (2022-06-07)
Breaking Changes

Changed: begin_translation parameter target_language_code has been renamed to target_language.
Changed: begin_translation keyword-only argument source_language_code has been renamed to source_language.
Changed: DocumentTranslationInput keyword-only argument and property source_language_code has been renamed to source_language.
Changed: TranslationTarget keyword-only argument and property language_code has been renamed to language.
Changed: TranslationStatus property documents_not_yet_started_count has been renamed to documents_not_started_count.
Removed: results_per_page keyword-only argument from list_translation_statuses and list_document_statuses.

1.0.0b6 (2022-02-08)
Other Changes

Python 2.7 is no longer supported. Please use Python version 3.6 or later.

1.0.0b5 (2021-09-08)
Breaking Changes

Changed: list_all_translation_statuses has been renamed to list_translation_statuses
Changed: list_all_document_statuses has been renamed to list_document_statuses
Changed: TranslationStatus property documents_cancelled_count has been renamed to documents_canceled_count
Changed: FileFormat has been renamed to DocumentTranslationFileFormat
Changed: Operation statuses Cancelled and Cancelling have been renamed to Canceled and Canceling, respectively.

Bugs Fixed

The operation id under details of the poller object now populates correctly.

1.0.0b4 (2021-08-10)
Features Added

The single translation input version of begin_translation(source, target, target_language_code) now accepts keyword arguments
storage_type, glossaries, category_id, prefix, suffix, and source_language_code.

Breaking Changes

Changed: renamed kwargs translated_before and translated_after to created_before and created_after, respectively,
for list_all_document_statuses.
Changed: renamed order_by sorting query option createdDateTimeUtc to created_on for list_all_translation_statuses and
list_all_document_statuses.

1.0.0b3 (2021-07-07)
Breaking changes

TranslationStatusResult was renamed to TranslationStatus.
DocumentStatusResult was renamed to DocumentStatus.
get_document_formats was renamed to get_supported_document_formats.
get_glossary_formats was renamed to get_supported_glossary_formats.

1.0.0b2 (2021-06-08)
This version of the SDK defaults to the latest supported service version, which currently is v1.0
Breaking changes

create_translation_job was removed and replaced with begin_translation which follows a long-running operation (LRO)
approach. The client method now returns a DocumentTranslationLROPoller (or AsyncDocumentTranslationLROPoller) to begin the
long-running operation. A call to .result() can be made on the poller object to wait until the translation is complete.
See the README for more information about LROs.
Upon completion of the LRO, begin_translation now returns a pageable of DocumentStatusResult. All job-level metadata can still
be found on poller.details.
has_completed has been removed from JobStatusResult and DocumentStatusResult. Use poller.done() to check if the
translation has completed.
Client method wait_until_done has been removed. Use poller.result() to wait for the LRO to complete.
Client method list_submitted_jobs has been renamed to list_all_translation_statuses.
Client method get_job_status has been renamed to get_translation_status.
Client method cancel_job has been renamed to cancel_translation.
Parameter job_id was renamed to translation_id for get_translation_status, cancel_translation, list_all_document_statuses, and get_document_status.
JobStatusResult has been renamed to TranslationStatusResult.
DocumentStatusResult property translate_to has been renamed to translated_to

New features

Authentication using azure-identity credentials is now supported. See the Azure Identity documentation for more information.


Added paging and filtering options to list_all_document_statuses and list_submitted_jobs.
The input to begin_translation now accepts either the parameter inputs as a List[DocumentTranslationInput] to
perform multiple translations, or the parameters source_url, target_url, and target_language_code to perform a
single translation of your documents.

Dependency updates

Package requires azure-core version 1.14.0 or greater.

1.0.0b1 (2021-04-06)
This is the first beta package of the azure-ai-translation-document client library that targets the Document Translation
service version 1.0-preview.1. This package's documentation and samples demonstrate the new API.

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
