datacatalog-custom-entries-manager 0.1.2

Creator: coderz1093


datacatalog-custom-entries-manager
A Python package for managing Google Cloud Data Catalog custom entries by loading metadata
from external sources. It currently supports the CSV and JSON file formats.
It is built on top of GoogleCloudPlatform/datacatalog-connectors and, unlike the existing
connectors, ingests metadata without connecting to any system other than Data Catalog. Known
use cases include validating Custom Entries ingestion workloads before coding their specific
features, and loading metadata into development / PoC environments.
If you need Tags as well as Entries to validate your model or workload, consider giving
datacatalog-custom-model-manager a try.


Table of Contents

1. Environment setup
  1.1. Python + virtualenv
    1.1.1. Install Python 3.6+
    1.1.2. Create a folder
    1.1.3. Create and activate an isolated Python environment
    1.1.4. Install the package
  1.2. Docker
    1.2.1. Get the source code
  1.3. Auth credentials
    1.3.1. Create a service account and grant it the roles below
    1.3.2. Download a JSON key and save it as
    1.3.3. Set the environment variables
2. Manage Custom Entries
  2.1. Synchronize
    2.1.1. To a CSV file
    2.1.2. To a JSON file

1. Environment setup
1.1. Python + virtualenv
Using virtualenv is optional, but strongly recommended unless you use Docker.
1.1.1. Install Python 3.6+
1.1.2. Create a folder
This is recommended so that all related files reside in the same place, which makes the
instructions below easier to follow.
mkdir ./datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager

All paths starting with ./ in the next steps are relative to the
datacatalog-custom-entries-manager folder.
1.1.3. Create and activate an isolated Python environment
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate

1.1.4. Install the package
pip install --upgrade datacatalog-custom-entries-manager

1.2. Docker
Docker may be used as an alternative to run datacatalog-custom-entries-manager. In this case,
please disregard the above virtualenv setup instructions.
1.2.1. Get the source code
git clone https://github.com/ricardolsmendes/datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager

1.3. Auth credentials
1.3.1. Create a service account and grant it the roles below

DataCatalog entryGroup Owner
DataCatalog entry Owner
Data Catalog Viewer

1.3.2. Download a JSON key and save it as

./credentials/datacatalog-custom-entries-manager.json

1.3.3. Set the environment variables
This step can be skipped if you're using Docker.
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-custom-entries-manager.json
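
Before running a synchronization outside Docker, it can be worth confirming that the variable is set and points to a real file. The check below is a minimal sketch and not part of the package; the function name `credentials_path_is_usable` is hypothetical.

```python
import os
from pathlib import Path


def credentials_path_is_usable(env=os.environ):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    Hypothetical helper for illustration; it only checks that the file
    exists, not that the key inside it is valid.
    """
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    # A relative value such as ./credentials/... is resolved against the
    # current working directory, so run this from the project folder.
    return bool(value) and Path(value).is_file()


if __name__ == "__main__":
    if credentials_path_is_usable():
        print("credentials file found")
    else:
        print("GOOGLE_APPLICATION_CREDENTIALS is unset or points to a missing file")
```

Because the path exported above is relative, the check only succeeds when run from the datacatalog-custom-entries-manager folder.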

2. Manage Custom Entries
2.1. Synchronize
2.1.1. To a CSV file

SCHEMA

The metadata schema to synchronize Custom Entries is presented below. Use as many lines as needed
to describe all Data Catalog Entries you need.



| Column | Description | Mandatory |
| --- | --- | --- |
| user_specified_system | Indicates the Entry source system | yes |
| group_id | Id of the Entry Group the Entry belongs to | yes |
| linked_resource | The resource a metadata Entry refers to | yes |
| display_name | Display information such as title and description; a short name to identify the Entry (the entry_id field will be generated as a normalized version of the display name) | yes |
| description | Can consist of several sentences that describe the Entry contents | no |
| user_specified_type | A custom value indicating the Entry type | yes |
| created_at | The creation time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | no |
| updated_at | The last-modified time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | no |
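
The created_at and updated_at values use the YYYY-MM-DDTHH:MM:SSZ layout. The sketch below shows one way to produce and check such strings with the Python standard library. It assumes the trailing Z denotes UTC, as in ISO 8601; the helper names are hypothetical and not part of the package.

```python
from datetime import datetime, timezone

# strftime/strptime pattern equivalent to YYYY-MM-DDTHH:MM:SSZ.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_timestamp(moment):
    """Format a timezone-aware datetime as a UTC string in the layout above."""
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def is_valid_timestamp(text):
    """Return True if text can be parsed with TIMESTAMP_FORMAT."""
    try:
        datetime.strptime(text, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True
```

Note that `strptime` is lenient about zero padding, so this check is a sanity test rather than a strict validator.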




SAMPLE INPUT


See sample-input/csv for reference.
The Data Catalog Sample Custom Entries spreadsheet (Google Sheets) might help to create and export a CSV file.
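
A CSV file following the schema can also be generated from code. The sketch below uses Python's csv module. It assumes the file starts with a header row using the column names from the schema, in the same order (compare with the files in sample-input/csv before relying on it). The entry values and the function name `write_entries_csv` are made up for illustration.

```python
import csv

# Column names taken from the schema; a header row in this order is an assumption.
COLUMNS = [
    "user_specified_system", "group_id", "linked_resource", "display_name",
    "description", "user_specified_type", "created_at", "updated_at",
]

# Hypothetical entry, for illustration only. Optional columns are omitted.
ROWS = [{
    "user_specified_system": "my_system",
    "group_id": "my_group",
    "linked_resource": "//my-system.example.com/assets/1",
    "display_name": "My first asset",
    "user_specified_type": "asset",
}]


def write_entries_csv(path, rows):
    """Write rows to path, leaving optional columns blank when absent."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_entries_csv("custom-entries.csv", ROWS)
```

The resulting file path is what would be passed to the --csv-file option.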


COMMANDS

Python + virtualenv
datacatalog-custom-entries sync \
  --csv-file <CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>

Docker
docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
  datacatalog-custom-entries-manager sync \
  --csv-file /data/<CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>

2.1.2. To a JSON file

STRUCTURE

The metadata structure to synchronize Custom Entries is presented below. Use as many objects as
needed to describe all Data Catalog Entries you need.
{
  "userSpecifiedSystems": [
    {
      "name": "STRING",
      "entryGroups": [
        {
          "id": "STRING",
          "entries": [
            {
              "linkedResource": "STRING",
              "displayName": "STRING",
              "description": "STRING (optional)",
              "type": "STRING",
              "createdAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)",
              "updatedAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)"
            }
          ]
        }
      ]
    }
  ]
}
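
The nesting above (systems, then entry groups, then entries) can be produced from flat records with a small grouping step. The sketch below is one way to do it. It assumes the flat records use the CSV column names and that they map onto the JSON keys as their names suggest (for example, user_specified_type to "type"). This mapping and the function names are assumptions, not part of the package.

```python
import json


def build_document(rows):
    """Nest flat entry rows under their system and entry group."""
    systems = {}
    for row in rows:
        groups = systems.setdefault(row["user_specified_system"], {})
        entries = groups.setdefault(row["group_id"], [])
        entry = {
            "linkedResource": row["linked_resource"],
            "displayName": row["display_name"],
            "type": row["user_specified_type"],
        }
        # Optional fields are emitted only when a value is present.
        for source, target in (("description", "description"),
                               ("created_at", "createdAt"),
                               ("updated_at", "updatedAt")):
            if row.get(source):
                entry[target] = row[source]
        entries.append(entry)
    return {
        "userSpecifiedSystems": [
            {
                "name": name,
                "entryGroups": [
                    {"id": group_id, "entries": group_entries}
                    for group_id, group_entries in groups.items()
                ],
            }
            for name, groups in systems.items()
        ]
    }


def to_json(rows):
    """Serialize the nested document as indented JSON text."""
    return json.dumps(build_document(rows), indent=2)
```

Writing the output of `to_json(rows)` to a file gives an input for the --json-file option.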


SAMPLE INPUT


See sample-input/json for reference.


COMMANDS

Python + virtualenv
datacatalog-custom-entries sync \
  --json-file <JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>

Docker
docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <JSON-FILE-FOLDER>:/data \
  datacatalog-custom-entries-manager sync \
  --json-file /data/<JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
