Metadata Ingestion
This module hosts an extensible Python-based metadata ingestion system for DataHub.
It supports sending metadata to DataHub either through Kafka or over the REST API.
It can be used through our CLI tool, with an orchestrator like Airflow, or as a library.
Getting Started
Prerequisites
Before running any metadata ingestion job, you should make sure that DataHub backend services are all running. If you are trying this out locally, the easiest way to do that is through quickstart Docker images.
Install from PyPI
The folks over at Acryl Data maintain a PyPI package for DataHub metadata ingestion.
# Requires Python 3.6+
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
datahub version
# If you see "command not found", try running this instead: python3 -m datahub version
If you run into an error, try checking the common setup issues.
Installing Plugins
We use a plugin architecture so that you can install only the dependencies you actually need. Click the plugin name to learn more about the specific source recipe and any FAQs!
Sources:
| Plugin Name | Install Command | Provides |
| --- | --- | --- |
| file | included by default | File source and sink |
| athena | pip install 'acryl-datahub[athena]' | AWS Athena source |
| bigquery | pip install 'acryl-datahub[bigquery]' | BigQuery source |
| bigquery-usage | pip install 'acryl-datahub[bigquery-usage]' | BigQuery usage statistics source |
| datahub-business-glossary | no additional dependencies | Business Glossary File source |
| dbt | no additional dependencies | dbt source |
| druid | pip install 'acryl-datahub[druid]' | Druid source |
| feast | pip install 'acryl-datahub[feast]' | Feast source |
| glue | pip install 'acryl-datahub[glue]' | AWS Glue source |
| hive | pip install 'acryl-datahub[hive]' | Hive source |
| kafka | pip install 'acryl-datahub[kafka]' | Kafka source |
| kafka-connect | pip install 'acryl-datahub[kafka-connect]' | Kafka Connect source |
| ldap | pip install 'acryl-datahub[ldap]' (extra requirements) | LDAP source |
| looker | pip install 'acryl-datahub[looker]' | Looker source |
| lookml | pip install 'acryl-datahub[lookml]' | LookML source, requires Python 3.7+ |
| metabase | pip install 'acryl-datahub[metabase]' | Metabase source |
| mode | pip install 'acryl-datahub[mode]' | Mode Analytics source |
| mongodb | pip install 'acryl-datahub[mongodb]' | MongoDB source |
| mssql | pip install 'acryl-datahub[mssql]' | SQL Server source |
| mysql | pip install 'acryl-datahub[mysql]' | MySQL source |
| mariadb | pip install 'acryl-datahub[mariadb]' | MariaDB source |
| openapi | pip install 'acryl-datahub[openapi]' | OpenAPI source |
| oracle | pip install 'acryl-datahub[oracle]' | Oracle source |
| postgres | pip install 'acryl-datahub[postgres]' | Postgres source |
| redash | pip install 'acryl-datahub[redash]' | Redash source |
| redshift | pip install 'acryl-datahub[redshift]' | Redshift source |
| sagemaker | pip install 'acryl-datahub[sagemaker]' | AWS SageMaker source |
| snowflake | pip install 'acryl-datahub[snowflake]' | Snowflake source |
| snowflake-usage | pip install 'acryl-datahub[snowflake-usage]' | Snowflake usage statistics source |
| sql-profiles | pip install 'acryl-datahub[sql-profiles]' | Data profiles for SQL-based systems |
| sqlalchemy | pip install 'acryl-datahub[sqlalchemy]' | Generic SQLAlchemy source |
| superset | pip install 'acryl-datahub[superset]' | Superset source |
| tableau | pip install 'acryl-datahub[tableau]' | Tableau source |
| trino | pip install 'acryl-datahub[trino]' | Trino source |
| starburst-trino-usage | pip install 'acryl-datahub[starburst-trino-usage]' | Starburst Trino usage statistics source |
| nifi | pip install 'acryl-datahub[nifi]' | NiFi source |
Sinks:
| Plugin Name | Install Command | Provides |
| --- | --- | --- |
| file | included by default | File source and sink |
| console | included by default | Console sink |
| datahub-rest | pip install 'acryl-datahub[datahub-rest]' | DataHub sink over REST API |
| datahub-kafka | pip install 'acryl-datahub[datahub-kafka]' | DataHub sink over Kafka |
These plugins can be mixed and matched as desired. For example:
pip install 'acryl-datahub[bigquery,datahub-rest]'
You can check the active plugins:
datahub check plugins
Basic Usage
pip install 'acryl-datahub[datahub-rest]' # install the required plugin
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml
The --dry-run option of the ingest command performs all of the ingestion steps except writing to the sink. This is useful to ensure that the ingestion recipe is producing the desired workunits before ingesting them into DataHub.
# Dry run
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml --dry-run
# Short-form
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml -n
The --preview option of the ingest command performs all of the ingestion steps, but limits the processing to only the first 10 workunits produced by the source.
This option helps with quick end-to-end smoke testing of the ingestion recipe.
# Preview
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml --preview
# Preview with dry-run
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml -n --preview
Install using Docker
If you don't want to install locally, you can alternatively run metadata ingestion within a Docker container.
We have prebuilt images available on Docker Hub. All plugins will be installed and enabled automatically.
Limitation: the datahub_docker.sh convenience script assumes that the recipe and any input/output files are accessible in the current working directory or its subdirectories. Files outside the current working directory will not be found, and you'll need to invoke the Docker image directly.
# Assumes the DataHub repo is cloned locally.
./metadata-ingestion/scripts/datahub_docker.sh ingest -c ./examples/recipes/example_to_datahub_rest.yml
Install from source
If you'd like to install from source, see the developer guide.
Recipes
A recipe is a configuration file that tells our ingestion scripts where to pull data from (source) and where to put it (sink).
Here's a simple example that pulls metadata from MSSQL and puts it into DataHub.
# A sample recipe that pulls metadata from MSSQL and puts it into DataHub
# using the Rest API.
source:
  type: mssql
  config:
    username: sa
    password: ${MSSQL_PASSWORD}
    database: DemoData

transformers:
  - type: "fully-qualified-class-name-of-transformer"
    config:
      some_property: "some.value"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Running a recipe is quite easy.
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml
A number of recipes are included in the examples/recipes directory. For full info and context on each source and sink, see the pages described in the table of plugins.
Handling sensitive information in recipes
We automatically expand environment variables in the config (e.g. ${MSSQL_PASSWORD}),
similar to variable substitution in GNU bash or in docker-compose files. For details, see
https://docs.docker.com/compose/compose-file/compose-file-v2/#variable-substitution. This environment variable substitution should be used to mask sensitive information in recipe files. As long as you can pass environment variables securely to the ingestion process, there is no need to store sensitive information in recipes.
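For example, with the MSSQL recipe above you can supply the secret in your shell right before running the ingest command (the password value here is just a placeholder):
# Hypothetical example: pass the secret via the environment instead of the recipe file
export MSSQL_PASSWORD='<your-password-here>'
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml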
Transformations
If you'd like to modify data before it reaches the ingestion sinks – for instance, adding additional owners or tags – you can write your own transformer module and integrate it with DataHub.
Check out the transformers guide for more info!
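As a quick sketch, the recipe snippet below swaps the placeholder transformer from the example above for one of the built-in transformers; the simple_add_dataset_tags type and its tag_urns option are described in the transformers guide, so double-check the exact names and options for your version.
# Sketch: attach a fixed tag to every ingested dataset via a built-in transformer
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"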
Using as a library
In some cases, you might want to construct Metadata events directly and use programmatic ways to emit that metadata to DataHub. In this case, take a look at the Python emitter and the Java emitter libraries which can be called from your own code.
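A minimal sketch of emitting a single dataset aspect with the Python REST emitter is shown below; the class and field names reflect recent acryl-datahub releases and are illustrative, so consult the Python emitter docs for the exact API in your version.
# Minimal sketch: emit one dataset aspect to DataHub over REST.
# Class names, fields, and the example URN are illustrative; check the
# Python emitter documentation for the exact API in your version.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:mysql,DemoData.my_table,PROD)",
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(description="A table emitted programmatically."),
)
emitter.emit_mcp(mcp)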
Programmatic Pipeline
In some cases, you might want to configure and run a pipeline entirely from within your custom python script. Here is an example of how to do it.
programmatic_pipeline.py - a basic mysql to REST programmatic pipeline.
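A condensed sketch of that script is shown below; the mysql connection details are placeholders, and the Pipeline.create / run / raise_from_status calls mirror the referenced example.
# Condensed sketch of a mysql -> datahub-rest pipeline driven from Python.
# Connection details are placeholders; see programmatic_pipeline.py for the
# complete, up-to-date example.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "username": "user",
                "password": "pass",
                "database": "db_name",
                "host_port": "localhost:3306",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)

pipeline.run()
pipeline.raise_from_status()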
Developing
See the guides on developing, adding a source and using transformers.