KAVICA 1.3.4

KAVICA: Powerful Python Cluster Analysis and Inference Toolkit

What is it?
kavica is a Python package that provides semi-automated, flexible, and expressive clustering
analysis designed to make working with "unlabeled" data easy and intuitive.
It aims to be the fundamental high-level building block for practical, real-world cluster analysis in Python.
Additionally, it has the broader goal of becoming a powerful and flexible open source AutoML tool and pipeline for
unsupervised learning and clustering analysis. It is already well on its way towards this goal.
Main Features
Here are just a few of the things that kavica does well:

Intelligent Density Mapping to model the density structure of the data in analogy to Einstein's theory of relativity,
and automated Density Homogenizing to prepare the data for density-based clustering (e.g. DBSCAN)

Automatic and powerful Organization Component Analysis to interpret the clustering result by understanding the
topological structure of each cluster

Topological and powerful Self-Organizing Maps Inference System that uses the self-learning ability of the SOM to
understand the topological structure of the data

Automated, Bayesian-based DBSCAN Hyper-parameter Tuner to select the optimal hyper-parameter configuration of the
DBSCAN clustering algorithm

Efficient handling of feature selection in potentially high-dimensional and massive datasets

Gravitational implementation of Kohonen Generational Self-Organizing Maps (GSOM), useful for unsupervised learning and
super-clustering, with enriched graphics, plot, and animation features.

Computational geometrical model Polygonal Cage to transfer feature vectors from a curved non-Euclidean feature space
to a new Euclidean one.

Robust factor analysis to reduce a large number of variables to a smaller number

Easy handling of missing data (represented as NaN, NA, or NaT) in floating point as well as non-floating point data

Flexible implementation of directed and undirected graph data structures and algorithms.

Intuitive resampling of data sets

Powerful, flexible parser functionality to perform parsing, manipulating, and generating operations on flat, massive,
and unstructured Traces datasets generated by MareNostrum

Utility functionality: intuitive explanatory data analysis, plotting, loading and generating data, and more.
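
Several of the features above target DBSCAN, whose behaviour depends on two hyper-parameters: a neighbourhood radius and a minimum neighbour count. As a reminder of what a tuner has to choose, here is a minimal standard-library sketch of the textbook DBSCAN procedure. It is not kavica's implementation or API; the names `dbscan`, `eps`, `min_samples`, and `cluster_count`, and the toy points, are assumptions made for illustration.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joined, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                frontier.extend(neighbours[j])  # core point: keep expanding
    return labels


def cluster_count(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {NOISE})


# Two tight groups and one isolated point (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]

for eps in (0.5, 1.5, 8.0):
    labels = dbscan(points, eps=eps, min_samples=3)
    print(f"eps={eps}: {cluster_count(labels)} cluster(s)")
```

On this toy data the number of clusters changes with the radius alone, which is why choosing the configuration automatically is useful.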

Examples:

Feature Space Curvature Map

Density Homogenizing

Application of the Feature Space Curvature Map (FSCM) to a multi-density 2D dataset, Synt10, containing ten clusters.
(a) A scatter plot of clusters with varied densities. The legend shows the size/N(μ, σ²) per cluster, the colors
represent the original labeling of the data, and the red lines draw the initial Feature Space Fabric (FSF).
(b) The Feature Space Curvature (FSC) model computed with our FSCM method. Note that the red lines show the
deformation of the FSF. (c) A scatter plot of the data in (a) projected by applying our transformation through
model (b).
As a result, the diversity of the clusters' densities is scaled appropriately to achieve better density-based
clustering performance.
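
To illustrate why varied cluster densities are a problem for density-based clustering, the following sketch compares the typical nearest-neighbour distance in a dense and a sparse Gaussian cluster. It uses only the standard library and does not call kavica; the helper names, cluster parameters, and random seed are assumptions chosen for the example.

```python
import math
import random
import statistics


def gaussian_cluster(center, sigma, n, rng):
    """Draw n 2D points from an isotropic Gaussian around center."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]


def median_nn_distance(points):
    """Median distance from each point to its nearest other point."""
    nearest = []
    for i, p in enumerate(points):
        nearest.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return statistics.median(nearest)


rng = random.Random(0)  # fixed seed so the run is repeatable
dense = gaussian_cluster((0.0, 0.0), 0.1, 200, rng)
sparse = gaussian_cluster((10.0, 10.0), 1.0, 200, rng)

dense_scale = median_nn_distance(dense)
sparse_scale = median_nn_distance(sparse)
ratio = sparse_scale / dense_scale
print(f"dense: {dense_scale:.3f}  sparse: {sparse_scale:.3f}  ratio: {ratio:.1f}")
```

A neighbourhood radius suited to the dense cluster is far too small for the sparse one, and a radius suited to the sparse one can merge nearby dense clusters. Homogenizing the densities first lets a single radius serve both.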

Polygonal Cage
Multilinear transformation

(a) Feature Space Curvature (FSC)
(b) Feature Space Fabric (FSF)

Data point transformation between a bent FSC (a) and a regular FSF (b) based on the multilinear transformation in R².
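
As a rough illustration of a multilinear transformation in R², the sketch below uses standard bilinear interpolation to map a point of a regular unit cell onto a bent quadrilateral cell. This assumes the bilinear form is a fair stand-in for the idea; kavica's Polygonal Cage may be defined differently, and the function name, corner ordering, and cell coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def bilinear_map(u: float, v: float, quad: Sequence[Point]) -> Point:
    """Map (u, v) in the unit square to a point in a quadrilateral.

    quad lists the images of the corners (0,0), (1,0), (1,1), (0,1).
    """
    p00, p10, p11, p01 = quad
    # Each corner's weight is linear in u and linear in v.
    w00 = (1 - u) * (1 - v)
    w10 = u * (1 - v)
    w11 = u * v
    w01 = (1 - u) * v
    x = w00 * p00[0] + w10 * p10[0] + w11 * p11[0] + w01 * p01[0]
    y = w00 * p00[1] + w10 * p10[1] + w11 * p11[1] + w01 * p01[1]
    return (x, y)


# A bent cell given by its four corners (illustrative values).
bent_cell = [(0.0, 0.0), (2.0, 0.5), (2.5, 2.0), (-0.5, 1.5)]

corner = bilinear_map(1.0, 0.0, bent_cell)  # a corner maps to a corner
centre = bilinear_map(0.5, 0.5, bent_cell)  # the centre maps to the corner average
print(corner, centre)
```

Going the other way, from the bent cell back to the regular one, means solving this map for (u, v), which is not shown here.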

Organization Component Analysis

Application of the OCA on the Iris dataset

Video

Where to get it
The source code is currently hosted on GitHub at: kavica
Binary installers for the latest released version are available at the
Python Package Index (PyPI) and on Conda.
The recommended way to install kavica is to use:
# PyPI
pip install kavica

But it can also be installed using:
# or conda
conda config --add channels conda-forge
conda install kavica

To verify your setup, start Python from the command line and run the following:
import kavica
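
Beyond a bare import, it can help to confirm which version is installed. The sketch below uses only the standard library's importlib.metadata and assumes the distribution name is kavica, as in the pip command above; the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("kavica")
print(version if version else "kavica is not installed in this environment")
```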

Dependencies
See requirements.txt for the required packages, which can be installed with:
pip install -r requirements.txt

Publications
Unsupervised Feature Selection for Noisy Data
Organization Component Analysis: The method for extracting insights from the shape of cluster
Feature Space Curvature Map: A Method To Homogenize Cluster Densities
Issue tracker
If you find a bug, please help us solve it by filing a report.
Contributing
If you want to contribute, check out the
contribution guidelines.
License
The main library of kavica is released under the BSD 3-Clause License.