Last updated:
0 purchases
pandasdiff 1.4.7
Installation
Install pandas_diff with pip
pip install pandas_diff
Usage/Examples
import pandas_diff as pd_diff
import pandas as pd
# Create two example dataframes
df_infinity_war = pd.DataFrame([
{"hero" : "hulk" , "power" : "strength"},
{"hero" : "black_widow" , "power" : "spy"},
{"hero" : "thor" , "hammers" : 0 },
{"hero" : "thor" , "hammers" : 1 } ] )
df_endgame = pd.DataFrame([
{"hero" : "hulk" , "power" : "smart"},
{"hero" : "captain marvel" , "power" : "strength"},
{"hero" : "thor" , "hammers" : 2 } ] )
# Get differences, using the key "hero"
df = pd_diff.get_diffs(df_infinity_war ,df_endgame ,"hero")
df
#operation object_keys object_values object_json attribute_changed old_value new_value
#0 create [hero] captain marvel {'hero': 'captain marvel', 'power': 'strength'... NaN NaN NaN
#1 delete [hero] black_widow {'hero': 'black_widow', 'power': 'spy', 'hamme... NaN NaN NaN
#2 modify [hero] thor {'hero': 'thor', 'power': nan, 'hammers': 2.0} hammers 1 2
#3 modify [hero] hulk {'hero': 'hulk', 'power': 'smart', 'hammers': ... power strength smart
Why pandas diff ? Cases of use
Migrating from batch to an event driven architecture
In my work, we use a lot of data pipelines to get info from external
platforms, (active directory, github, jira). We load the new data
replacing the entire table.
By using pandas_diff we detect how the infraestructure changes between
executions, and stream those change events into a kafka cluster, so
other teams could suscribe to their favourite events. Also, by defining
a pandas_diff step in the master pipeline, every item in our project has
ther life cycle events controlled.
Events log
For every item in a table, by using pandas_diff you will have an event
log to audit of how the resources are being consumed.
Conciliation
To conciliate one datasource against the source of truth. Eg: You have a CMDB controlling with info regarding virtual machines. As there are several methods for creating those VMs, you use pandas_diff to replicate state of the infraestructure against the CMDB.
Features
Filtering of columns
Roadmap
Support for stand alone app
Documentation
Documentation
History
0.7.18 (2021-12-05)
* Add codacy badge
0.7.19 (2021-12-05)
* Feat filter column
0.7.20 (2021-12-05)
* Feat filter column
0.7.21 (2021-12-05)
* Add filter fest
0.7.22 (2021-12-06)
* Add confition keys exist in df’s
1.1.0 (2021-12-06)
* Add confition keys exist in df’s
1.2.0 (2021-12-06)
——————
* Improve doc
1.2.0 (2021-12-06)
* Improve doc
1.3.0 (2021-12-06)
* Remove workflows
1.4.0 (2021-12-06)
* Remove workflows
1.4.0 (2023-09-01)
* Improve doc
1.4.1 (2023-09-01)
* Improve doc
1.4.2 (2023-09-17)
* Bugfix version string
1.4.3 (2023-09-17)
* bugfix version tag
1.4.4 (2023-09-17)
* bugfix version tag
1.4.5 (2023-09-17)
* bugfixx history string
1.4.6 (2023-09-17)
* bugfix history string
1.4.7 (2023-09-17)
* bugfix release description
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.