bayte 0.2.1
Overview
This package is a lightweight implementation of Bayesian target encoding. The implementation follows Slakey et al., with the ensemble methodology from Larionov.
The encoding proceeds as follows:
1. User observes and chooses a likelihood for the target variable (e.g. Bernoulli for a binary classification problem),
2. Using Fink's Compendium of Priors, derive the conjugate prior for the likelihood (e.g. Beta),
3. Use the training data to initialize the hyperparameters for the prior distribution,
   NOTE: This process is generally reliant on common interpretations of hyperparameters.
4. Using Fink's Compendium, derive the methodology for generating the posterior distribution,
5. For each level in the categorical variable,
   1. Generate the posterior distribution using the observed target values for the categorical level,
   2. Set the encoding value to a sample from the posterior distribution.
      If a new level has appeared in the dataset, the encoding will be sampled from the prior distribution. To disable this behaviour, initialize the encoder with handle_unknown="error".
6. Repeat step 5.2 a total of n_estimators times, generating a total of n_estimators training datasets with unique encodings. The end model is a vote from each sampled dataset.
For reproducibility, you can set the encoding value to the mean of the posterior distribution instead.
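For the Bernoulli/Beta case, the conjugate update behind steps 4 and 5 looks roughly like the sketch below. This is plain numpy with made-up prior hyperparameters, intended only to illustrate the math; it is not the bayte implementation.

# Illustrative sketch only -- not the bayte implementation.
# Bernoulli likelihood with a Beta(alpha, beta) conjugate prior: after
# observing s successes out of n trials for one categorical level, the
# posterior is Beta(alpha + s, beta + n - s).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior hyperparameters, initialized from the training data
# (e.g. matched to the global positive rate).
alpha_prior, beta_prior = 2.0, 8.0

# Observed binary targets for a single categorical level, e.g. "red".
y_level = np.array([1, 0, 1, 1, 0, 1])
successes = int(y_level.sum())
failures = y_level.size - successes

# Conjugate update: posterior hyperparameters for this level.
alpha_post = alpha_prior + successes
beta_post = beta_prior + failures

# Encoding value: a draw from the posterior, or its mean for reproducibility.
sampled_encoding = rng.beta(alpha_post, beta_post)
mean_encoding = alpha_post / (alpha_post + beta_post)
print(sampled_encoding, mean_encoding)

Drawing a fresh sample per estimator is what produces the n_estimators distinct training datasets used by the ensemble; using the posterior mean instead gives a deterministic encoding.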
Installation
Install from PyPI:
python -m pip install bayte
Usage
Encoding
Let's create a binary classification dataset.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2)
X = pd.DataFrame(X)
# Categorical data
X[5] = np.random.choice(["red", "green", "blue"], size=1000)
Import and fit the encoder:
import bayte as bt
encoder = bt.BayesianTargetEncoder(dist="bernoulli")
encoder.fit(X[[5]], y)
To encode your categorical data,
X[5] = encoder.transform(X[[5]])
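To see the unseen-level behaviour from the overview in action, you can transform data containing a level that was not present during fit; with the default handle_unknown behaviour its encoding is drawn from the prior. A quick, hedged illustration reusing the fitted encoder above:

# "purple" was not seen during fit, so its encoding is sampled from the prior.
new_data = pd.DataFrame({5: ["purple", "red"]})
encoder.transform(new_data)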
Ensemble
If you want to utilize the ensemble methodology described above, construct the same dataset
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2)
X = pd.DataFrame(X)
# Categorical data
X[5] = np.random.choice(["red", "green", "blue"], size=1000)
and import a classifier to supply to the ensemble class
from sklearn.svm import SVC
import bayte as bt
ensemble = bt.BayesianTargetClassifier(
    base_estimator=SVC(kernel="linear"),
    encoder=bt.BayesianTargetEncoder(dist="bernoulli")
)
Fit the ensemble. NOTE: either supply an explicit list of categorical features to categorical_feature, or
use a DataFrame with categorical data types.
ensemble.fit(X, y, categorical_feature=[5])
When you call predict on a novel dataset, the encoder will transform your data at runtime, encoding based on the mean of the posterior distribution:
ensemble.predict(X)
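As a quick sanity check (illustrative only; a held-out split or cross-validation is the better way to evaluate), you can score the in-sample predictions with standard scikit-learn metrics:

from sklearn.metrics import accuracy_score

# In-sample accuracy of the vote across the n_estimators encoded datasets.
accuracy_score(y, ensemble.predict(X))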