Last updated:
0 purchases
pandablob 0.0.5
PandaBlob
Functions to easily transform Azure blobs into pandas DataFrames and vice versa.
Installation
Installing PandaBlob via pip is the preferred method, as it will always install the most recent stable release. If you do not have
pip installed, this Python installation guide can guide you through the process.
To install PandaBlob, run this command in your terminal:
# Use pip to install PandaBlob
pip install pandablob
Downloading and installing PandaBlob from source is also possible, follow the code below.
# Download the package
git clone https://github.com/uijl/pandablob
# Go to the correct folder
cd pandablob
# Install package
pip install -e .
Usage
The code snip below shows how you can use PandaBlob, all you need is a BlobClient and possibly a pandas DataFrame or some keyword arguments for pandas.
# Import the Azure SDK and pandablob
import pandablob
from azure.storage.blob import ContainerClient
# Your Azure Credentials
account_url = "https://my_account_url.blob.core.windows.net/"
token = "your_key_string"
container = "your_container"
blobname = "your_blob_name.csv"
container_client = ContainerClient(account_url, container, credential=token)
blob_client = container_client.get_blob_client(blob=blobname)
# Specifiy your pandas keyword arguments
pandas_kwargs = {"index_col": 0}
# Read the blob as a pandas DataFrame
df = pandablob.blob_to_df(blob_client, pandas_kwargs)
Potential errors
There are three common errors that can be returned. Two are related to the blob storage and one because of the current limitations of pandablob.
ResourceExistsError - If the specified blob is already on the blob, this error is returned. There are two options, you can add the overwrite=True argument to your df_to_blob function or you can catch the exception. If you wish to enter it in an except statement, you can import it using from azure.core.exceptions import ResourceExistsError;
ResourceNotFoundError - If the specified blob is not found, this error is returned. If you wish to enter it in an except statement, you can import it using from azure.core.exceptions import ResourceNotFoundError;
TypeError - This error is returned by pandablob if you want to upload or download an extensiontype that is not yet supported. Currently only the following extensions are supported: .csv .json .txt, .xls and .xlsx.
To do list:
Some other stuff that needs to be done:
Include other files;
Easier downloading a .csv file;
Added MIT license.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.