astro-projects 0.8.3


astro

workflows made easy

astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python.
It helps DAG authors achieve more with less code.
It is powered by Apache Airflow and maintained by Astronomer.

Disclaimer: this project's development status is alpha. In other words, it is not production-ready yet.
The interfaces may change. We welcome alpha users and brave souls to test it, and any feedback is appreciated.

Install
Astro is available on PyPI. Use the standard Python installation tools.
To install a cloud-agnostic version of Astro, run:
pip install astro-projects

If using cloud providers, install using the optional dependencies of interest:
pip install "astro-projects[amazon,google,snowflake,postgres]"

Quick-start
After installing Astro, save the following example DAG as calculate_popular_movies.py in a local directory named dags:
from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table


@aql.transform()
def top_five_animations(input_table: Table):
    return """
        SELECT Title, Rating
        FROM {{input_table}}
        WHERE Genre1=='Animation'
        ORDER BY Rating desc
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    imdb_movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/main/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(
            table_name="top_animation", database="sqlite", conn_id="sqlite_default"
        ),
    )

Set up a local instance of Airflow by running:
export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True

airflow db init

Create the SQLite database that the example uses, then run the DAG:
# The sqlite_default connection has a different host on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`

sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
airflow dags test calculate_popular_movies `date -Iseconds`

Check the top five animations calculated by your first Astro DAG by running:
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"

You should see the following output:
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
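
The same check can be made from Python with the standard-library sqlite3 module. The following is only a sketch: the helper name read_table is hypothetical and not part of astro, and the database path is whatever file your sqlite_default connection points to.

```python
import sqlite3


def read_table(db_path, table_name):
    """Return all rows of table_name from the SQLite file at db_path."""
    # Table names cannot be bound as query parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f"SELECT * FROM {quoted}").fetchall()
    finally:
        conn.close()
```

For example, read_table(os.environ["SQL_TABLE_NAME"], "top_animation") reads the table written by the DAG, assuming SQL_TABLE_NAME was exported as shown earlier and os has been imported.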

Requirements
Because astro relies on the TaskFlow API, it depends on Apache Airflow >= 2.1.0.
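
If you want to verify the minimum version from a script, a rough sketch follows. The helper names version_tuple and meets_minimum are hypothetical, and the comparison is deliberately simple: pre-release suffixes such as rc1 are ignored, so 2.1.0rc1 is treated like 2.1.0.

```python
import re


def version_tuple(version):
    """Turn '2.1.0' or '2.3.0rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="2.1.0"):
    """Return True if the installed version is at least the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.1' compares equal to '2.1.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b
```

With Python 3.8 or later, the installed version can be obtained with importlib.metadata.version("apache-airflow") and passed to meets_minimum.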
Supported technologies



Databases         File types    File locations
Google BigQuery   CSV           Amazon S3
Postgres          JSON          Filesystem
Snowflake         NDJSON        Google GCS
SQLite            Parquet

Available operations
A summary of the currently available operations in astro. More details are available in the reference guide.

load_file: load a given file into a SQL table
transform: apply a SQL select statement to a source table and save the result to a destination table
truncate: remove all records from a SQL table
run_raw_sql: run any SQL statement without handling its output
append: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
merge: insert rows from the source SQL table into the destination SQL table, depending on conflicts:
    ignore: do not add rows that already exist
    update: replace existing rows with new ones
save_file: export SQL table rows into a destination file
dataframe: export a given SQL table into an in-memory pandas DataFrame
render: given a directory containing SQL statements, dynamically create transform tasks within a DAG
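
To make the two merge conflict strategies concrete, here is a small illustration in plain sqlite3. It does not use astro and is not how astro implements merge; the movies table and the helpers merge_rows and demo are hypothetical, and SQLite's INSERT OR IGNORE and INSERT OR REPLACE stand in for the ignore and update behaviours described above.

```python
import sqlite3


def merge_rows(conn, rows, strategy):
    """Insert (id, title) rows into movies, resolving id conflicts by strategy."""
    if strategy == "ignore":
        # Rows whose id already exists are skipped.
        sql = "INSERT OR IGNORE INTO movies (id, title) VALUES (?, ?)"
    elif strategy == "update":
        # Rows whose id already exists are replaced by the new row.
        sql = "INSERT OR REPLACE INTO movies (id, title) VALUES (?, ?)"
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    conn.executemany(sql, rows)
    conn.commit()


def demo(strategy):
    """Merge two rows into a table that already holds id 1; return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO movies VALUES (1, 'old title')")
    merge_rows(conn, [(1, "new title"), (2, "another movie")], strategy)
    rows = conn.execute("SELECT id, title FROM movies ORDER BY id").fetchall()
    conn.close()
    return rows
```

Calling demo("ignore") keeps the existing row for id 1 and adds only id 2, while demo("update") overwrites the title of id 1 and also adds id 2.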

Documentation
The documentation is a work in progress, and we aim to follow the Diátaxis system:

Tutorial: a hands-on introduction to astro
How-to guides: simple step-by-step user guides to accomplish specific tasks
Reference guide: commands, modules, classes and methods
Explanation: clarification and discussion of key decisions made when designing the project

Changelog
We follow Semantic Versioning for releases. Check the changelog for the latest changes.
Release Management
To learn more about our release philosophy and steps, check here
Contribution Guidelines
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Read the Contribution Guidelines for a detailed overview of how to contribute.
Contributors and maintainers of this project are expected to abide by the Contributor Code of Conduct.
License
Apache License 2.0

