pandasstreaming 0.5.0
pandas-streaming
aims at processing big files with pandas: files
too big to hold in memory, yet too small to be parallelized with a significant gain.
The module replicates a subset of the pandas API
and implements other functionalities for machine learning.
from pandas_streaming.df import StreamingDataFrame

sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")
for df in sdf:
    # process this chunk of data
    # df is a pandas DataFrame holding one chunk of rows
    print(df)
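The idea of processing a file chunk by chunk can be sketched with plain pandas, whose `read_csv` accepts a `chunksize` argument and then returns an iterator of DataFrames. This is only an illustration of the general technique, not how pandas-streaming is implemented; the helper `streaming_column_sum` and the sample data are hypothetical. It sums one column while holding at most `chunksize` rows in memory at a time.

```python
import io
import pandas


def streaming_column_sum(source, column, chunksize=2, **kwargs):
    """Hypothetical helper: sum one column of a CSV without loading it whole.

    pandas.read_csv(..., chunksize=n) yields DataFrames of at most n rows,
    so only one chunk is held in memory at a time.
    """
    total = 0
    n_rows = 0
    for chunk in pandas.read_csv(source, chunksize=chunksize, **kwargs):
        total += chunk[column].sum()
        n_rows += len(chunk)
    return total, n_rows


# Tab-separated data kept in memory so the example is self-contained;
# pass a file path instead of io.StringIO(...) for real data.
data = io.StringIO("a\tb\n1\t10\n2\t20\n3\t30\n4\t40\n5\t50\n")
total, n_rows = streaming_column_sum(data, "b", sep="\t")
print(total, n_rows)  # 150 5
```

In practice the chunk size would be in the tens or hundreds of thousands of rows; the value 2 here only forces several chunks on tiny data.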
The module can also stream an existing dataframe.
import pandas
from pandas_streaming.df import StreamingDataFrame

df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])
sdf = StreamingDataFrame.read_df(df)
for chunk in sdf:
    # process this chunk of data
    # chunk is a pandas DataFrame
    print(chunk)
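To picture what streaming an in-memory DataFrame means, here is a minimal sketch that yields consecutive row slices with `DataFrame.iloc`. The generator `iter_chunks` is hypothetical and is not the pandas-streaming implementation of `read_df`; it only shows the chunking idea on the same three-row frame.

```python
import pandas


def iter_chunks(df, chunksize):
    """Hypothetical helper: yield consecutive row slices of df,
    each with at most chunksize rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(df), chunksize):
        # iloc slicing past the end is safe and returns the remaining rows
        yield df.iloc[start:start + chunksize]


df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])
for chunk in iter_chunks(df, chunksize=2):
    print(len(chunk))  # prints 2, then 1
```

Concatenating the chunks with `pandas.concat` gives back the original frame, since each slice keeps its original index.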
It also contains helpers to split datasets into
train and test sets under unusual constraints.
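The text does not say which constraints are supported. One common example, assumed here purely for illustration, is that all rows sharing a key (for instance a user id) must land on the same side of the split, so the test set never contains a key already seen in training. The helper `group_train_test_split` and its parameters are hypothetical and are not part of the pandas-streaming API.

```python
import random
import pandas


def group_train_test_split(df, group_col, test_size=0.25, seed=0):
    """Hypothetical helper: split df so that every value of group_col
    lands entirely in train or entirely in test.

    test_size is a fraction of distinct groups, not of rows.
    """
    groups = sorted(df[group_col].unique())
    rng = random.Random(seed)
    rng.shuffle(groups)
    # keep at least one group in the test set
    n_test = max(1, round(len(groups) * test_size))
    test_groups = list(groups[:n_test])
    mask = df[group_col].isin(test_groups)
    return df[~mask], df[mask]


df = pandas.DataFrame(dict(user=["a", "a", "b", "c", "c", "d"], x=range(6)))
train, test = group_train_test_split(df, "user", test_size=0.25)
# no user appears in both parts
print(set(train["user"]) & set(test["user"]))  # set()
```

Because the split is made on groups, the row proportions of train and test only approximate `test_size` when groups have different sizes.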