pandas-streaming 0.5.0

Description:

pandas-streaming processes files with pandas when they are
too big to hold in memory but too small for parallelization to bring a significant gain.
The module replicates a subset of the pandas API
and implements additional functionality for machine learning.

from pandas_streaming.df import StreamingDataFrame
sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)

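The chunk-by-chunk loop follows the same pattern that plain pandas offers through `read_csv(..., chunksize=...)`. As a minimal sketch of what "processing a chunk" can look like, the example below uses only pandas (not pandas-streaming) and a small made-up tab-separated input held in memory; the column names `key` and `value` are assumptions for illustration.

```python
import io

import pandas

# Small in-memory stand-in for a large tab-separated file (made-up data).
raw = io.StringIO("key\tvalue\na\t1\nb\t2\na\t3\nb\t4\na\t5\n")

# Running totals per key, updated one chunk at a time so that
# the whole file never has to be loaded at once.
totals = {}
for chunk in pandas.read_csv(raw, sep="\t", chunksize=2):
    # chunk is a regular DataFrame holding at most two rows
    partial = chunk.groupby("key")["value"].sum()
    for key, value in partial.items():
        totals[key] = totals.get(key, 0) + int(value)

print(totals)
```

Only aggregates that can be combined across chunks (sums, counts, minima, maxima) fit this pattern directly; anything that needs all rows at once, such as an exact median, requires a different approach.
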
The module can also stream an existing dataframe.

import pandas
df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])

from pandas_streaming.df import StreamingDataFrame
sdf = StreamingDataFrame.read_df(df)

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)

It also contains helpers to split datasets into
train and test sets under some unusual constraints.
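
The description does not say which constraints are meant. One common example of such a constraint is that all rows sharing a group identifier must end up on the same side of the split. The sketch below illustrates that idea in plain Python; `split_by_group` and the `user` column are hypothetical names for illustration, not part of pandas-streaming.

```python
import random


def split_by_group(rows, key, test_size=0.25, seed=0):
    """Split rows so that rows sharing the same rows[i][key] stay together."""
    # Shuffle the distinct group values, not the rows themselves.
    groups = sorted({row[key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Reserve a fraction of the groups (at least one) for the test set.
    n_test = max(1, round(len(groups) * test_size))
    test_groups = set(groups[:n_test])

    train = [row for row in rows if row[key] not in test_groups]
    test = [row for row in rows if row[key] in test_groups]
    return train, test


# Made-up data: eight rows spread over four users.
rows = [{"user": u, "value": v} for v, u in enumerate("aabbbcdd")]
train, test = split_by_group(rows, key="user")

train_users = {row["user"] for row in train}
test_users = {row["user"] for row in test}
print(sorted(train_users), sorted(test_users))
```

Because the split is made over groups rather than rows, the test set's share of rows can differ from `test_size` when group sizes are uneven.
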

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
