pandas-paddles 1.5.0

Creator: railscoder56

Pandas Paddles
Access the calling pandas data frame in loc[], iloc[],
assign() and other methods with DF to write better chains of
data frame operations, e.g.:
df = (df
    # Select all rows with column "x" < 2
    .loc[DF["x"] < 2]
    .assign(
        # Shift "x" by its minimum.
        y = DF["x"] - DF["x"].min(),
        # Clip "x" to its central 50% window. Note how DF is used
        # in the argument to `clip()`.
        z = DF["x"].clip(
            lower=DF["x"].quantile(0.25),
            upper=DF["x"].quantile(0.75)
        ),
    )
)






Overview

Motivation: Make chaining Pandas operations easier and bring
functionality to Pandas similar to Spark’s col()
function or referencing columns in R’s dplyr.
Install from PyPI with pip install pandas-paddles. Pandas versions 1+ (>=1,<3) are supported.
Documentation can be found at readthedocs.
Source code can be obtained from GitHub.
Changelog



Example: Create new column and filter
Instead of writing “traditional” Pandas like this:
import pandas as pd

df_in = pd.DataFrame({"x": range(5)})
df = df_in.assign(y = df_in["x"] // 2)
df = df.loc[df["y"] <= 1]
df
#    x  y
# 0  0  0
# 1  1  0
# 2  2  1
# 3  3  1
One can write:
from pandas_paddles import DF
df = (df_in
    .assign(y = DF["x"] // 2)
    .loc[DF["y"] <= 1]
)
This is especially handy when iterating on data frame manipulations
interactively, e.g. in a notebook (just imagine you have to rename
df to df_out: the chain itself would not need to change).
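For comparison, plain pandas can express the same chain without intermediate variables by passing callables to assign() and loc[]. The sketch below uses only pandas, with no pandas-paddles involved. It shows the lambda boilerplate that DF is meant to replace:

```python
import pandas as pd

df_in = pd.DataFrame({"x": range(5)})

# pandas calls each lambda with the data frame as it exists
# at that point in the chain.
df = (df_in
    .assign(y=lambda d: d["x"] // 2)
    .loc[lambda d: d["y"] <= 1]
)
print(df)
```

The result is the same four rows as above. With DF, each `lambda d: d[...]` shrinks to `DF[...]`.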
You can also access all methods and attributes of the data frame from the
context:
df = pd.DataFrame({
    "X": range(5),
    "y": ["1", "a", "c", "D", "e"],
})
df.loc[DF["y"].str.isupper() | DF["y"].str.isnumeric()]
#    X  y
# 0  0  1
# 3  3  D
df.loc[:, DF.columns.str.isupper()]
#    X
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4
You can even use DF in the arguments to methods:
df = pd.DataFrame({
    "x": range(5),
    "y": range(2, 7),
})
df.assign(z = DF['x'].clip(lower=2.2, upper=DF['y'].median()))
#    x  y    z
# 0  0  2  2.2
# 1  1  3  2.2
# 2  2  4  2.2
# 3  3  5  3.0
# 4  4  6  4.0
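To give an intuition for how an object like DF can stand in for a data frame that does not exist yet, here is a deliberately simplified, hypothetical sketch. It is not the pandas-paddles implementation. The class name Deferred and its internals are made up for illustration, and it supports only indexing and a few operators. The idea is to record operations and replay them once the object is called with the actual data:

```python
import operator


class Deferred:
    """Hypothetical stand-in for DF: records operations, replays them later."""

    def __init__(self, func=lambda obj: obj):
        # func maps the eventual data object to the value of this expression.
        self._func = func

    def __getitem__(self, key):
        return Deferred(lambda obj: self._func(obj)[key])

    def _binary(self, op, other):
        def run(obj):
            # Resolve the right-hand side too if it is itself deferred.
            rhs = other(obj) if isinstance(other, Deferred) else other
            return op(self._func(obj), rhs)
        return Deferred(run)

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __sub__(self, other):
        return self._binary(operator.sub, other)

    def __lt__(self, other):
        return self._binary(operator.lt, other)

    def __call__(self, obj):
        # Evaluate the recorded expression against the actual data.
        return self._func(obj)


D = Deferred()
expr = D["x"] - D["y"] + 10
row = {"x": 5, "y": 2}    # a plain dict plays the role of the data frame
print(expr(row))          # 13
print((D["x"] < 3)(row))  # False
```

pandas calls any callable passed to loc[] or assign() with the data frame, so a callable expression object of this kind can be dropped into a chain. The real DF also handles attribute access and method calls such as `.str.isupper()` and `.clip(...)`, which this sketch leaves out.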
When working with a pd.Series, the S object is available. It can be used
similarly to DF:
from pandas_paddles import S
s = pd.Series(range(5))
s[S < 3]
# 0    0
# 1    1
# 2    2
# dtype: int64



Similar projects for pandas

siuba

(+) active
(-) new API to learn


pandas-ply

(-) stale(?), last change 6 years ago
(-) new API to learn
(-) Symbol / pandas_ply.X works only with ply_* functions


pandas-select

(+) no explicit df necessary
(-) new API to learn


pandas-selectable

(+) simple select accessor
(-) usage inside chains is clumsy (needs explicit df):
((df
    .select.A == 'a')
    .select.B == 'b'
)

(-) hard-coded str, dt accessor methods
(?) composable?





Development
Development is containerized with [Docker](https://www.docker.com/) to
separate it from the host system and improve reproducibility. No other
prerequisites are needed on the host system.
Recommendation for Windows users: install WSL 2 (tested
on Ubuntu 20.04), and for containerized workflows, Docker
Desktop for Windows.
The common tasks are collected in the Makefile (see make help for a
complete list):

Run the unit tests: make test, or make watch to re-run the tests
continuously on code changes.
Build the documentation: make docs
TODO: Update the poetry.lock file: make lock
Add a dependency:

Start a shell in a new container.
Add the dependency with poetry add in the running container. This will update
poetry.lock automatically:
# 1. On the host system
% make shell
# 2. In the container instance:
I have no name!@7d0e85b3a303:/app$ poetry add --dev --lock falcon



Build the development image: make devimage
(Note: This should be done automatically for the targets.)

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
