pandas-diff 1.4.7

Installation
Install pandas_diff with pip
pip install pandas_diff


Usage/Examples
import pandas as pd
import pandas_diff as pd_diff

# Create two example dataframes
df_infinity_war = pd.DataFrame([
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
    {"hero": "thor", "hammers": 1},
])
df_endgame = pd.DataFrame([
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
])

# Get differences, using the key "hero"
df = pd_diff.get_diffs(df_infinity_war, df_endgame, "hero")

df

#    operation  object_keys  object_values   object_json                                        attribute_changed  old_value  new_value
# 0  create     [hero]       captain marvel  {'hero': 'captain marvel', 'power': 'strength'...  NaN                NaN        NaN
# 1  delete     [hero]       black_widow     {'hero': 'black_widow', 'power': 'spy', 'hamme...  NaN                NaN        NaN
# 2  modify     [hero]       thor            {'hero': 'thor', 'power': nan, 'hammers': 2.0}     hammers            1          2
# 3  modify     [hero]       hulk            {'hero': 'hulk', 'power': 'smart', 'hammers': ...  power              strength   smart
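
To make the three operations concrete, here is a minimal pure-Python sketch of a keyed diff over lists of dicts. It is not the pandas_diff implementation: the function name keyed_diff, the shape of its events, and the rule that the last row wins when a key repeats (as "thor" does here) are assumptions made for illustration only.

```python
def keyed_diff(old_rows, new_rows, key):
    """Return create/delete/modify events between two lists of dicts."""
    # Index rows by key; if a key repeats, the last row wins (a choice of this sketch).
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    events = []
    # Keys that only exist in the new data were created.
    for k in sorted(new.keys() - old.keys()):
        events.append({"operation": "create", "key": k})
    # Keys that only exist in the old data were deleted.
    for k in sorted(old.keys() - new.keys()):
        events.append({"operation": "delete", "key": k})
    # Keys present on both sides: one modify event per attribute that differs.
    for k in sorted(old.keys() & new.keys()):
        for attr in sorted(old[k].keys() | new[k].keys()):
            before, after = old[k].get(attr), new[k].get(attr)
            if before != after:
                events.append({
                    "operation": "modify",
                    "key": k,
                    "attribute_changed": attr,
                    "old_value": before,
                    "new_value": after,
                })
    return events


infinity_war = [
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
    {"hero": "thor", "hammers": 1},
]
endgame = [
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
]
events = keyed_diff(infinity_war, endgame, "hero")
for event in events:
    print(event)
```

The sketch reports one create, one delete and two modify events for this data. Real DataFrames also bring NaN handling and dtype questions that this sketch ignores.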


Why pandas_diff? Use cases

Migrating from batch to an event-driven architecture
At my work, we use a lot of data pipelines to pull information from
external platforms (Active Directory, GitHub, Jira). Each run loads the
new data and replaces the entire table.
With pandas_diff we detect how the infrastructure changes between
executions and stream those change events into a Kafka cluster, so
other teams can subscribe to the events they care about. Also, by
defining a pandas_diff step in the master pipeline, every item in our
project has its life cycle events tracked.
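
As a rough sketch of the streaming step, the snippet below turns diff records into (key, value) message pairs using only the standard library. It assumes the diff output has first been converted to plain dicts, for example with DataFrame.to_dict("records"); the records here are hand-written to mirror the columns shown in the usage output, and to_message is a hypothetical helper, not part of pandas_diff. No Kafka client is shown.

```python
import json
import math


def to_message(row):
    """Serialize one diff record as a (key, value) pair of strings."""
    # NaN is not valid JSON, so replace it with None (serialized as null).
    clean = {
        name: None if isinstance(value, float) and math.isnan(value) else value
        for name, value in row.items()
    }
    # Use the item's key value as the message key (an assumption of this sketch).
    key = str(clean["object_values"])
    return key, json.dumps(clean, sort_keys=True)


# Hand-written records that mirror the columns of the usage output.
rows = [
    {"operation": "create", "object_keys": ["hero"], "object_values": "captain marvel",
     "attribute_changed": float("nan"), "old_value": float("nan"), "new_value": float("nan")},
    {"operation": "modify", "object_keys": ["hero"], "object_values": "thor",
     "attribute_changed": "hammers", "old_value": 1, "new_value": 2},
]
messages = [to_message(row) for row in rows]
for key, value in messages:
    print(key, value)
```

Each pair can then be handed to whichever producer client the pipeline already uses. Keying by item means all events for one item share a message key.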


Events log
For every item in a table, pandas_diff gives you an event log that you
can use to audit how resources are being consumed.
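
One possible way to keep such a log is to append the diff records of every run to a per-item history. The sketch below assumes the diff output is available as plain dicts (for example via DataFrame.to_dict("records")); append_to_log, the "at" field and the run dates are illustrative names and values, not part of pandas_diff.

```python
from collections import defaultdict
from datetime import datetime, timezone


def append_to_log(log, rows, run_time):
    """Add each diff record to the history of the item it refers to."""
    for row in rows:
        # Stamp the record with the time of the pipeline run that produced it.
        entry = {"at": run_time.isoformat(), **row}
        log[row["object_values"]].append(entry)
    return log


log = defaultdict(list)

# Diff records from two consecutive runs (hand-written for illustration).
run_1 = [{"operation": "create", "object_values": "thor"}]
run_2 = [{"operation": "modify", "object_values": "thor",
          "attribute_changed": "hammers", "old_value": 1, "new_value": 2}]

append_to_log(log, run_1, datetime(2023, 9, 1, tzinfo=timezone.utc))
append_to_log(log, run_2, datetime(2023, 9, 17, tzinfo=timezone.utc))

# The full life cycle of one item, oldest event first.
history = [entry["operation"] for entry in log["thor"]]
print(history)
```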


Reconciliation
To reconcile one data source against the source of truth. E.g., you have a CMDB that holds information about virtual machines. As there are several ways to create those VMs, you use pandas_diff to replicate the state of the infrastructure into the CMDB.
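
A minimal sketch of that replication step follows, with the CMDB modelled as a plain dict keyed by VM name. The records are hand-written to mirror the columns of the usage output, and apply_diff, the VM names and the cpus attribute are hypothetical. A real CMDB would be updated through its own API.

```python
def apply_diff(cmdb, rows):
    """Apply create/delete/modify records to a dict keyed by item name."""
    for row in rows:
        name = row["object_values"]
        if row["operation"] == "create":
            # object_json carries the full new item.
            cmdb[name] = dict(row["object_json"])
        elif row["operation"] == "delete":
            cmdb.pop(name, None)
        elif row["operation"] == "modify":
            # Only the changed attribute is updated.
            cmdb.setdefault(name, {})[row["attribute_changed"]] = row["new_value"]
    return cmdb


# State of the CMDB before reconciliation.
cmdb = {
    "vm-1": {"name": "vm-1", "cpus": 2},
    "vm-2": {"name": "vm-2", "cpus": 4},
}

# Differences between the CMDB and the real infrastructure.
rows = [
    {"operation": "create", "object_values": "vm-3",
     "object_json": {"name": "vm-3", "cpus": 8}},
    {"operation": "delete", "object_values": "vm-2"},
    {"operation": "modify", "object_values": "vm-1",
     "attribute_changed": "cpus", "old_value": 2, "new_value": 4},
]
apply_diff(cmdb, rows)
print(cmdb)
```

After the records are applied, the dict holds vm-1 with the updated cpus value and the new vm-3, while vm-2 is gone.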



Features

Filtering of columns



Roadmap

Support for stand alone app



Documentation


History


0.7.18 (2021-12-05)
* Add codacy badge


0.7.19 (2021-12-05)
* Feat filter column


0.7.20 (2021-12-05)
* Feat filter column


0.7.21 (2021-12-05)
* Add filter fest


0.7.22 (2021-12-06)
* Add condition: keys exist in df’s


1.1.0 (2021-12-06)
* Add condition: keys exist in df’s


1.2.0 (2021-12-06)
* Improve doc


1.3.0 (2021-12-06)
* Remove workflows


1.4.0 (2021-12-06)
* Remove workflows


1.4.0 (2023-09-01)
* Improve doc


1.4.1 (2023-09-01)
* Improve doc


1.4.2 (2023-09-17)
* Bugfix version string


1.4.3 (2023-09-17)
* bugfix version tag


1.4.4 (2023-09-17)
* bugfix version tag


1.4.5 (2023-09-17)
* bugfix history string


1.4.6 (2023-09-17)
* bugfix history string


1.4.7 (2023-09-17)
* bugfix release description

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
