pandas-save-profiler 0.0.0

Creator: bigcodingguy24

Description:

Pandas Save Profiler
pandas_save_profiler helps you evaluate and compare the performance of different pandas read and write methods.
Install
pip install pandas-save-profiler

Usage
Import pandas and create a dataframe you want to save.
import pandas as pd
data = pd.util.testing.makeMissingDataframe()

Import pandas_save_profiler and use it to measure how pandas performs when saving a pickle file:
import pandas_save_profiler
data.save_profiler('to_pickle')

The output is a pandas Series:
format            pickle
writer            to_pickle
reader            read_pickle
writer_args       {'path': '/tmp/tmppk7nkivk'}
reader_args       {'filepath_or_buffer': '/tmp/tmppk7nkivk'}
writer_time       0.0798338
reader_time       0.0294895
writer_memory     1.09087e+08
reader_memory     1.09118e+08
df_memory         288
file_size         1122
writer_memory_h   109.1 MB
reader_memory_h   109.1 MB
df_memory_h       288 Bytes
file_size_h       1.1 kB
repeats           5
reads_the_same    True
dtype: object

Values in the series indicate:

The format used to persist the dataframe and the writing and reading options.
Writing and reading times in seconds.
Writing and reading memory increment.
Size of the dataframe in memory.
Size of the saved file.

Memory values are in bytes but a "humanized" version is also reported.
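
As an illustration of the "humanized" columns, the sketch below converts a byte count into the same style of string shown in the output above. It assumes decimal (base-1000) units, which is consistent with 1122 bytes being reported as 1.1 kB; humanize_bytes is a hypothetical name, and the package may well rely on a different helper internally.

```python
def humanize_bytes(num_bytes: float) -> str:
    """Format a byte count with decimal (base-1000) units.

    Hypothetical helper for illustration; not part of pandas_save_profiler.
    """
    if num_bytes == 1:
        return "1 Byte"
    if abs(num_bytes) < 1000:
        return f"{int(num_bytes)} Bytes"
    units = ["kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        value /= 1000.0
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit available.
        if abs(value) < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"


print(humanize_bytes(288))         # df_memory from the example above
print(humanize_bytes(1122))        # file_size from the example above
print(humanize_bytes(1.09087e+08)) # writer_memory from the example above
```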
The saving and reloading process is repeated 5 times and average values are returned.
The flag reads_the_same indicates whether the reloaded dataframe is exactly the same as the original one or differs from it in some way.
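
The repeat-and-average procedure described above can be sketched roughly as follows. This is not the package's implementation: profile_roundtrip, pickle_write and pickle_read are hypothetical names, memory measurement is left out, and a plain dictionary with the standard pickle module stands in for a dataframe so the example runs without extra dependencies.

```python
import os
import pickle
import tempfile
import time
from statistics import mean


def profile_roundtrip(obj, writer, reader, same, repeats=5):
    """Write and reload `obj` `repeats` times and average the timings.

    writer(obj, path) saves obj, reader(path) loads it back, and
    same(original, reloaded) decides whether the two match.
    All names here are hypothetical, not the package's API.
    """
    writer_times, reader_times, sizes, matches = [], [], [], []
    for _ in range(repeats):
        fd, path = tempfile.mkstemp()
        os.close(fd)  # only the path is needed; the writer reopens it
        try:
            start = time.perf_counter()
            writer(obj, path)
            writer_times.append(time.perf_counter() - start)

            start = time.perf_counter()
            reloaded = reader(path)
            reader_times.append(time.perf_counter() - start)

            sizes.append(os.path.getsize(path))
            matches.append(same(obj, reloaded))
        finally:
            os.remove(path)
    return {
        "writer_time": mean(writer_times),
        "reader_time": mean(reader_times),
        "file_size": mean(sizes),
        "repeats": repeats,
        "reads_the_same": all(matches),
    }


def pickle_write(obj, path):
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def pickle_read(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


result = profile_roundtrip({"a": [1, 2, None]}, pickle_write, pickle_read,
                           same=lambda a, b: a == b)
print(result)
```

For a dataframe, the same function could be called with lambda df, p: df.to_pickle(p) as the writer, pd.read_pickle as the reader, and lambda a, b: a.equals(b) as the comparison.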
To compare several writing options, call save_profiler once for each of them
and combine the results into a single dataframe:
pd.DataFrame([
    data.save_profiler('to_csv'),
    data.save_profiler('to_pickle'),
    data.save_profiler('to_parquet'),
])


returns:
    format      writer        reader                          writer_args  \
0      csv      to_csv      read_csv  {'path_or_buf': '/tmp/tmpsedehjob'}
1   pickle   to_pickle   read_pickle         {'path': '/tmp/tmp_vhue2q7'}
2  parquet  to_parquet  read_parquet         {'path': '/tmp/tmp0zn8qsnk'}

                                  reader_args  writer_time  reader_time  \
0  {'filepath_or_buffer': '/tmp/tmpsedehjob'}     0.031842     0.039830
1  {'filepath_or_buffer': '/tmp/tmp_vhue2q7'}     0.025705     0.028469
2                {'path': '/tmp/tmp0zn8qsnk'}     0.039009     0.052447

   writer_memory  reader_memory  df_memory  file_size  writer_memory_h  \
0    110149632.0    110599372.8        288        139         110.1 MB
1    110813184.0    110813184.0        288       1122         110.8 MB
2    116892467.2    118014771.2        288       3449         116.9 MB

   reader_memory_h  df_memory_h  file_size_h  repeats  reads_the_same
0         110.6 MB    288 Bytes    139 Bytes        5           False
1         110.8 MB    288 Bytes       1.1 kB        5            True
2         118.0 MB    288 Bytes       3.4 kB        5            True

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
