pd-helper 1.0.0


pd-helper
A helper package that streamlines Pandas DataFrame optimization.
Save 50-75% on DataFrame memory usage by running the optimizer.
Automatically assign an appropriate dtype to each column with helper.
Generate a DataFrame of controlled random variables for testing with maker.
Install
pip install pd-helper

Basic Usage to Iterate over DataFrame
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df)
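
To give a feel for what dtype optimization involves, here is a minimal sketch in plain pandas. It is not pd-helper's actual ruleset: the function name `shrink_dtypes` and the `max_unique_ratio` threshold are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where that looks safe.

    Hypothetical helper for illustration; not part of pd-helper.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer type that still holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # Downcast floats when the values survive the conversion.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as categories.
            if len(series) and series.nunique() / len(series) <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


df = pd.DataFrame(
    {
        "sector": np.arange(1000, dtype="int64") % 5,
        "price": np.linspace(0.0, 10.0, 1000),
        "condition": ["Junk", "Excellent"] * 500,
    }
)
small = shrink_dtypes(df)
print(small.dtypes)
```

The sketch returns a copy so the original frame stays available for comparison.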

Better Usage With Multiprocessing
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

# Keep the __main__ guard: multiprocessing may re-import this module in worker processes
if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df, enable_mp=True)

Specify Special Mappings
from pd_helper.maker import MakeData
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    special_mappings = {'string': ['object_id'],
                        'category': ['item_name']}

    # Columns listed in special_mappings get the given dtype instead of the one
    # the optimize ruleset would pick; they are still returned in the result.
    df = optimize(df,
                  enable_mp=True,
                  special_mappings=special_mappings)
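
The mapping above has the shape `{dtype name: [column names]}`. As a rough illustration of what applying such a mapping amounts to, the sketch below casts each listed column with `astype` in plain pandas. The function `apply_special_mappings` is a hypothetical stand-in, and pd-helper's own handling may differ.

```python
import pandas as pd


def apply_special_mappings(df: pd.DataFrame, special_mappings: dict) -> pd.DataFrame:
    """Cast the listed columns to the dtype named by each key.

    Hypothetical stand-in for illustration; not pd-helper's implementation.
    """
    out = df.copy()
    for dtype_name, columns in special_mappings.items():
        for col in columns:
            out[col] = out[col].astype(dtype_name)
    return out


df = pd.DataFrame(
    {
        "object_id": ["a1", "b2", "c3"],
        "item_name": ["Toaster", "Lighter", "Toaster"],
    }
)
special_mappings = {"string": ["object_id"], "category": ["item_name"]}
mapped = apply_special_mappings(df, special_mappings)
print(mapped.dtypes)
```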

Sample Results with Helper
Starting with 175.63 MB memory.

After optimization.

Ending with 65.33 MB memory.
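
Figures like these can be reproduced for any DataFrame with pandas' own `memory_usage`. The sketch below is an assumption about how such a report could be produced, not pd-helper's logging code. The helper name `memory_mb` is hypothetical, and it treats 1 MB as 1024 * 1024 bytes.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total DataFrame memory in MB, counting the contents of text columns."""
    # deep=True inspects the actual Python strings instead of only the pointers.
    return float(df.memory_usage(deep=True).sum()) / 1024 ** 2


df = pd.DataFrame({"item_name": ["Toaster", "Lighter"] * 50_000})
before = memory_mb(df)
print(f"Starting with {before:.2f} MB memory.")

# A repetitive text column stored as a category keeps one copy of each label.
df["item_name"] = df["item_name"].astype("category")
after = memory_mb(df)
print(f"Ending with {after:.2f} MB memory.")
```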

Generating a Randomly Imperfect DataFrame with Maker
Maker provides a class, MakeData(), to generate a table of made-up records.
Each row is an event where an item was retrieved.
Options let you make the table imperfectly random in various ways.
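Sample table below:

|           | Retrieved Date        | Item Name        | Retrieved   | Condition       | Sector  |
|-----------|-----------------------|------------------|-------------|-----------------|---------|
| Example   | 2019-01-01, 2019-03-4 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2    |
| Data Type | String                | String           | String      | String          | Integer |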
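
For readers who want a similar table without installing the package, the following sketch builds one with NumPy and pandas. It only mirrors the columns in the sample table. The function `make_fake_events` and its `missing_rate` and `seed` parameters are hypothetical, and MakeData's real options and output may differ.

```python
import numpy as np
import pandas as pd


def make_fake_events(n_rows: int = 100, missing_rate: float = 0.1, seed: int = 0) -> pd.DataFrame:
    """Build a table of made-up retrieval events with some values blanked out.

    Hypothetical sketch for illustration; not the MakeData implementation.
    """
    rng = np.random.default_rng(seed)
    days = rng.integers(0, 365, size=n_rows)
    dates = pd.Timestamp("2019-01-01") + pd.to_timedelta(days, unit="D")
    df = pd.DataFrame(
        {
            "Retrieved Date": dates.strftime("%Y-%m-%d"),
            "Item Name": rng.choice(["Toaster", "Lighter"], size=n_rows),
            "Retrieved": rng.choice(["True", "False"], size=n_rows),
            "Condition": rng.choice(["Junk", "Excellent"], size=n_rows),
            "Sector": rng.integers(1, 3, size=n_rows),  # values 1 or 2
        }
    )
    # Make the table imperfect: blank out a random share of the Condition values.
    mask = rng.random(n_rows) < missing_rate
    df.loc[mask, "Condition"] = None
    return df


events = make_fake_events(n_rows=5)
print(events)
```

Passing a fixed `seed` keeps the generated table reproducible between test runs.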



References

Pandas Categorical: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html
Pandas Pickle: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html
Pandas CSV: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Pandas Datetime: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html


TODO

Improve efficiency of iterating on DataFrame.
Allow user to toggle logging.
Provide tools for imputing missing data.

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
