airflow-provider-skypilot 0.1.3

Creator: bradpython12


Apache Airflow Provider for SkyPilot


A provider that lets you use multiple clouds from Apache Airflow through SkyPilot.


Installation
The SkyPilot provider for Apache Airflow was developed and tested in an environment with the following dependencies installed:

Apache Airflow >= 2.6.0
SkyPilot >= 0.4.1

You can start the installation from the Docker-based Airflow environment described in "Running Airflow in Docker".
Based on that Docker configuration, add a pip install command to the Dockerfile and build your own Docker image:
RUN pip install --user airflow-provider-skypilot

Then, make sure that SkyPilot is properly installed and initialized in the same environment. Initialization includes cloud account setup and access verification.
Please refer to SkyPilot Installation for more information.
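
As a quick sanity check of the dependency list above, the following minimal sketch compares installed package versions against the stated minimums. It is not part of the provider. The distribution names `apache-airflow` and `skypilot` are assumptions about how the packages are published, and `parse_version` and `check_minimums` are hypothetical helper names that handle only simple dotted version strings.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.6.0' into (2, 6, 0); a non-numeric suffix in a component is dropped."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Minimum versions taken from the dependency list above.
# The distribution names are assumptions, not confirmed by the provider docs.
MINIMUMS = {"apache-airflow": "2.6.0", "skypilot": "0.4.1"}


def check_minimums(minimums=MINIMUMS, lookup=metadata.version):
    """Return {name: problem} for every dependency that is missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_minimums().items():
        print(f"{name}: {problem}")
```

Running the script inside the built image prints one line per dependency that is missing or older than the minimum, and prints nothing when both requirements are met.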
Configuration
A SkyPilot provider process runs on an Airflow worker, but it stores its metadata on the Airflow master node.
This scheme allows a series of consecutive Sky tasks to run across multiple workers by sharing the metadata.
The following settings in docker-compose.yaml define the data sharing, covering cloud credentials, metadata, and the workspace:
x-airflow-common:
  environment:
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    # mount cloud credentials
    - ${HOME}/.aws:/opt/airflow/sky_home_dir/.aws
    - ${HOME}/.azure:/opt/airflow/sky_home_dir/.azure
    - ${HOME}/.config/gcloud:/opt/airflow/sky_home_dir/.config/gcloud
    - ${HOME}/.scp:/opt/airflow/sky_home_dir/.scp
    # mount sky metadata
    - ${HOME}/.sky:/opt/airflow/sky_home_dir/.sky
    - ${HOME}/.ssh:/opt/airflow/sky_home_dir/.ssh
    # mount sky working dir
    - ${HOME}/sky_workdir:/opt/airflow/sky_home_dir/sky_workdir

This example mounts the cloud credentials for AWS, Azure, GCP, and SCP,
which were created by the SkyPilot cloud account setup.
For SkyPilot metadata, check that .sky/ and .ssh/ exist in your ${HOME} directory and mount them.
Additionally, you can mount your own directory, such as sky_workdir/, for user resources, including user code and YAML task definition files for SkyPilot execution.
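
Before starting the containers, it can help to confirm that the host-side directories from the volumes list actually exist. The sketch below is one way to do that check. `missing_mount_sources` is a hypothetical helper written for illustration and is not part of the provider or of SkyPilot. The path list is copied from the example above, so trim it to the clouds you actually use.

```python
from pathlib import Path

# Host-side paths, relative to the home directory, copied from the volumes list above.
MOUNT_SOURCES = [
    ".aws",
    ".azure",
    ".config/gcloud",
    ".scp",
    ".sky",
    ".ssh",
    "sky_workdir",
]


def missing_mount_sources(home, sources=MOUNT_SOURCES):
    """Return the relative paths that are not existing directories under home."""
    home = Path(home)
    return [rel for rel in sources if not (home / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_mount_sources(Path.home()):
        print(f"missing: {rel}")
```

An empty result means every mount source in the list is present on the host.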

Note that all Sky directories are mounted under sky_home_dir/.
On the workers where a SkyPilot provider process actually runs, they are symlinked into ${HOME}/.
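
To illustrate the symlinking idea, here is a minimal sketch of how entries under sky_home_dir/ could be linked into a home directory. This is an assumption about the general technique, not the provider's actual implementation. `link_sky_dirs` is a hypothetical name, and the sketch links only top-level entries, so a mounted .config/gcloud would appear through a link to the whole .config directory.

```python
from pathlib import Path


def link_sky_dirs(sky_home_dir, home):
    """Symlink each top-level entry of sky_home_dir into home.

    Entries that already exist in home are left untouched.
    Returns the list of symlink paths that were created.
    """
    sky_home_dir, home = Path(sky_home_dir), Path(home)
    created = []
    for source in sorted(sky_home_dir.iterdir()):
        target = home / source.name
        if target.is_symlink() or target.exists():
            continue  # never overwrite something already in the home directory
        target.symlink_to(source, target_is_directory=source.is_dir())
        created.append(target)
    return created


if __name__ == "__main__":
    # The mount point below matches the docker-compose example above.
    for link in link_sky_dirs("/opt/airflow/sky_home_dir", Path.home()):
        print(f"linked {link}")
```

Skipping existing entries keeps the operation safe to repeat, which matters when several tasks start on the same worker.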

Usage
The SkyPilot provider includes the following operators:

SkyLaunchOperator
SkyExecOperator
SkyDownOperator
SkySSHOperator
SkyRsyncUpOperator
SkyRsyncDownOperator

SkyLaunchOperator creates a cloud cluster and executes a Sky task, as shown below:
sky_launch_task = SkyLaunchOperator(
    task_id="sky_launch_task",
    sky_task_yaml="~/sky_workdir/my_task.yaml",
    cloud="cheapest",  # aws|azure|gcp|scp|ibm ...
    gpus="A100:1",
    minimum_cpus=16,
    minimum_memory=32,
    auto_down=False,
    sky_home_dir='/opt/airflow/sky_home_dir',  # set by default
    dag=dag
)

Once SkyLaunchOperator has created a Sky cluster with auto_down=False, the cluster can be used by the other Sky operators.
Please refer to the example DAG for multiple Sky operators running on a single Sky cluster.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
