seamster 0.0.1

Creator: bradpython12

Last updated: September 26, 2024

Languages

Python

Description:

Seamster

High Performance Fuzzy Business Entity Matching

Motivation
This package supports a broader goal: centralizing and standardizing publicly
available data on businesses. Juniper is doing this because we believe that the key to innovation
in Commercial Insurance underwriting lies in making public data accessible, reliable, and complete.

Features

Built on top of pandas and SciPy to parallelize the calculation of string similarities.
An extensible Join class allows for custom joins.
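
To give a feel for what a string-similarity score between two business names looks like, here is a minimal standard-library sketch that compares character trigram counts with cosine similarity. This is an illustration only: the README does not say which similarity measure Seamster uses, and the helper names `char_ngrams` and `cosine_similarity` are hypothetical, not part of the package.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams for a lowercased string."""
    # Pad with spaces so that word boundaries contribute n-grams too.
    padded = " " + text.lower().strip() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b, n=3):
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # strings too short to produce any n-gram
    return dot / (norm_a * norm_b)


# Near-duplicate names should score higher than unrelated ones.
close = cosine_similarity("macdonalds hamburgers", "mcdonalds hamburgers")
far = cosine_similarity("macdonalds hamburgers", "burger king")
print(close > far)
```

A pairwise loop like this grows quadratically with the number of names, which is presumably why a library in this space would lean on vectorized or sparse operations instead.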

Installation
Seamster requires Python 3.5 or newer to run.
Python package
You can easily install Seamster using pip:
pip3 install seamster
Manual
Alternatively, to get the latest development version, you can clone this repository and then manually install it:
git clone [email protected]:juniperlabs-foss/seamster.git
cd seamster
python3 setup.py install

Usage
import pandas as pd
from seamster.join_side import JoinSide
from seamster.join import NameZipEntTypeJoin

source1 = {
    "id": [1, 2, 3, 4],
    "names": [
        "Subway",
        "Blimpies",
        "McDonalds Hamburguesas, Inc.",
        "MacDonalds Hamburgers",
    ],
    "zip": [80238, 80238, 80230, 80238],
    "entity_type": ["llc", "llc", "corporation", "corporation"],
}

source2 = pd.DataFrame(
    {
        "id": [5, 6, 7],
        "names": ["McDonalds Hamburgers Inc", "Burger King", "Wendys"],
        "zip": [80238, 80238, 80230],
        "entity_type": ["corporation", "llc", "inc"],
    }
)

js_a = JoinSide(
    data=pd.DataFrame(source1),
    source="a",
    entity_name_field="names",
    id_field="id",
    zip_field="zip",
    entity_type_field="entity_type",
)
js_b = JoinSide(
    data=pd.DataFrame(source2),
    source="b",
    entity_name_field="names",
    id_field="id",
    zip_field="zip",
    entity_type_field="entity_type",
)

bs = NameZipEntTypeJoin(join_sides=(js_a, js_b))

df = bs.join(lower_bound=0.8)

print(df.to_dict(orient="records"))
# [
#     {
#         "id_a": 4,
#         "names_a": "MacDonalds Hamburgers",
#         "zip_a": 80238,
#         "entity_type_a": "corporation",
#         "source_a": "a",
#         "clean_names_a": "macdonalds hamburgers",
#         "clean_entity_type_a": "corp",
#         "id_b": 5,
#         "names_b": "McDonalds Hamburgers Inc",
#         "zip_b": 80238,
#         "entity_type_b": "corporation",
#         "source_b": "b",
#         "clean_names_b": "mcdonalds hamburgers",
#         "clean_entity_type_b": "corp",
#         "similarity": 0.86529,
#     }
# ]
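
The output above suggests that names and entity types are normalized before comparison, and that only records sharing a zip code and entity type are paired. The following standard-library sketch shows one way such a blocked fuzzy join could work. It is not Seamster's implementation: `clean_name`, `clean_entity_type`, `blocked_fuzzy_join`, and both lookup tables are hypothetical, and `difflib.SequenceMatcher` is swapped in as the similarity measure, so its scores will not match the `similarity` value shown above.

```python
import re
from difflib import SequenceMatcher

# Assumed normalization tables, inferred loosely from the example output.
ENTITY_TYPE_ALIASES = {"corporation": "corp"}
NAME_SUFFIXES = {"inc", "llc", "corp"}


def clean_name(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in NAME_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def clean_entity_type(entity_type):
    """Map an entity type to a canonical short form when one is known."""
    key = entity_type.lower().strip()
    return ENTITY_TYPE_ALIASES.get(key, key)


def blocked_fuzzy_join(left, right, lower_bound=0.8):
    """Pair rows that share (zip, entity type) and have similar cleaned names."""
    # Index the right side by block key so only plausible pairs are compared.
    blocks = {}
    for row in right:
        key = (row["zip"], clean_entity_type(row["entity_type"]))
        blocks.setdefault(key, []).append(row)

    matches = []
    for a in left:
        key = (a["zip"], clean_entity_type(a["entity_type"]))
        for b in blocks.get(key, []):
            score = SequenceMatcher(
                None, clean_name(a["names"]), clean_name(b["names"])
            ).ratio()
            if score >= lower_bound:
                matches.append(
                    {"id_a": a["id"], "id_b": b["id"], "similarity": round(score, 5)}
                )
    return matches


left = [
    {"id": 3, "names": "McDonalds Hamburguesas, Inc.", "zip": 80230, "entity_type": "corporation"},
    {"id": 4, "names": "MacDonalds Hamburgers", "zip": 80238, "entity_type": "corporation"},
]
right = [
    {"id": 5, "names": "McDonalds Hamburgers Inc", "zip": 80238, "entity_type": "corporation"},
    {"id": 6, "names": "Burger King", "zip": 80238, "entity_type": "llc"},
]
matches = blocked_fuzzy_join(left, right)
print(matches)
```

Blocking on exact fields first keeps the number of fuzzy comparisons small, at the cost of missing matches whose zip code or entity type was recorded differently in the two sources.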

TODO

Create a transform class that can permute and enrich the dataframe (e.g., geolocation)
Support for multiple fuzzy joins

Contributing
For information on how to contribute to the project, please check the Contributor's Guide.
Contact
[email protected]
incoming+juniperlabs-foss/[email protected]
License
Apache 2.0
Credits
This package was created with Cookiecutter and the python-cookiecutter project template.

