densmap-learn 0.2.2

Creator: bigcodingguy24


densMAP
This software package contains an implementation of densMAP, a density-preserving data visualization tool that augments the UMAP algorithm (based on UMAP v0.3.10).
Some of the following instructions are adapted from the UMAP repository.
Installation
densMAP shares the same dependencies as UMAP, including:

numpy
scipy
scikit-learn
numba==0.48.0

Our code currently does not support the latest version of numba (0.49.0).
Install Options
PyPI installation of densMAP can be performed as:
pip install densmap-learn

For a manual install, first download this package:
wget -O densvis-master.zip https://github.com/hhcho/densvis/archive/master.zip
unzip densvis-master.zip
rm densvis-master.zip
cd densvis-master/densmap/

Install the requirements:
sudo pip install -r requirements.txt

or
conda install scikit-learn numba

Finally, install the package:
python setup.py install

Usage
Like UMAP, the densMAP package inherits from sklearn classes, and thus drops in neatly
next to other sklearn transformers with an identical calling API.
import densmap
from sklearn.datasets import fetch_openml
from sklearn.utils import resample

digits = fetch_openml(name='mnist_784')
subsample, subsample_labels = resample(digits.data, digits.target, n_samples=7000,
stratify=digits.target, random_state=1)

embedding, ro, re = densmap.densMAP().fit_transform(subsample)

Input arguments
There are a number of parameters that can be set for the densMAP class; the
major ones inherited from UMAP are:


n_neighbors: This determines the number of neighboring points used in
local approximations of manifold structure. Larger values will result in
more global structure being preserved at the loss of detailed local
structure. In general, this parameter should be in the range 5 to
50; we set a default of 30.


min_dist: This controls how tightly the embedding is allowed to compress
points together. Larger values ensure embedded points are more evenly
distributed, while smaller values allow the algorithm to optimise more
accurately with regard to local structure. Sensible values are in the
range 0.001 to 0.5, with 0.1 being a reasonable default.


metric: This determines the choice of metric used to measure distance
in the input space. A wide variety of metrics are already coded, and a user
defined function can be passed as long as it has been JIT-compiled by numba.


The additional parameters specific to densMAP are:


dens_frac: This determines the fraction of iterations that will include
the density-preservation term of the gradient (float, between 0 and 1); default 0.3.


dens_lambda: This determines the weight of the density-preservation
objective. See the original paper for the effect this parameter has when changed (float, non-negative); default 2.0.


final_dens: When this flag is True, the code returns, in addition to the embedding,
the local radii for the original dataset and for the embedding. If False, only the embedding
is returned (bool); default True.


Other parameters that can be set include:


ndim: Dimensions of the embedding (int); default 2.


n_epochs: Number of epochs to run the algorithm (int); default 750.


var_shift: Regularization term added to the variance of embedding local radius for stability (float, non-negative); default 0.1.
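
To make the numeric ranges listed above concrete, here is a minimal sketch that checks the documented constraints and works out how many epochs would include the density term. The helper check_densmap_params is hypothetical and not part of the package, and treating dens_frac as a plain fraction of n_epochs is an assumption about how the implementation counts iterations.

```python
def check_densmap_params(dens_frac=0.3, dens_lambda=2.0, var_shift=0.1,
                         n_epochs=750):
    """Hypothetical helper: validate the ranges stated in this README."""
    if not 0.0 <= dens_frac <= 1.0:
        raise ValueError("dens_frac must be between 0 and 1")
    if dens_lambda < 0:
        raise ValueError("dens_lambda must be non-negative")
    if var_shift < 0:
        raise ValueError("var_shift must be non-negative")
    # Assumption: the density term is active for dens_frac * n_epochs epochs.
    return round(dens_frac * n_epochs)


# Number of density-preserving epochs under the documented defaults.
print(check_densmap_params())
```

With the defaults (dens_frac=0.3, n_epochs=750), the density term would be active in 225 of the 750 epochs under this assumption.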


Output arguments
If final_dens is True, returns (embedding, ro, re), where:


embedding: a (number of data points)-by-ndim numpy array containing the embedding coordinates


ro: a numpy array of length (number of data points) that contains the log local radius of
the input data


re: a numpy array of length (number of data points) that contains the log local radius
of the embedded data


If final_dens is False, returns just embedding.
An example of making use of these options:
embedding, ro, re = densmap.densMAP(n_neighbors=25, n_epochs=500, dens_frac=0.3,
dens_lambda=0.5).fit_transform(data)
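
One possible use of the returned radii is to check how well the embedding preserves local density by correlating ro with re. The sketch below uses synthetic stand-in arrays so that it runs without densMAP installed; in practice ro and re would come from fit_transform. The helper name density_correlation is hypothetical, and Pearson correlation is just one reasonable choice of summary.

```python
import numpy as np


def density_correlation(ro, re):
    """Pearson correlation between log local radii (input vs. embedding)."""
    ro = np.asarray(ro, dtype=float)
    re = np.asarray(re, dtype=float)
    if ro.shape != re.shape:
        raise ValueError("ro and re must have the same length")
    return float(np.corrcoef(ro, re)[0, 1])


# Synthetic stand-ins for the arrays returned by fit_transform.
rng = np.random.default_rng(0)
ro = rng.normal(size=200)
re = 0.8 * ro + 0.2 * rng.normal(size=200)

r = density_correlation(ro, re)
print(f"correlation between ro and re: {r:.3f}")
```

A value close to 1 would suggest that dense regions of the input remain dense in the embedding.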

R wrapper
We also provide compatibility with R through the script densmap.R, which uses the
reticulate library. Since reticulate runs Python code with an R wrapper, you must
have Python 3 installed to use it. The script will automatically install the
densmap-learn package via pip if it is not already installed.
Then, within your R script, you can run:
source("densmap.R")

# Assume `data` is an R dataframe, needs to be converted to a matrix

out <- densMAP(as.matrix(data))

The R function densMAP takes the same optional arguments listed in the Input arguments section
above, with the same names and default values. So you can, for example, run:
out <- densMAP(as.matrix(data), n_neighbors=25, n_epochs=500, dens_frac=0.3, dens_lambda=0.5)

If final_dens is TRUE then out[[1]] will contain the embedding, out[[2]] will be the
log original local radii, and out[[3]] the log embedding local radii.
If final_dens is FALSE then out will be the embedding itself.
From the command line
We also provide the file densmap.py which allows you to run densMAP from the terminal,
specifying the major options from above. Simply run:
python densmap.py -i [--input INPUT] -o [--outname OUTNAME] -f [--dens_frac DENS_FRAC]
-l [--dens_lambda DENS_LAMBDA] -s [--var_shift VAR_SHIFT] -d [--ndim NDIM]
-n [--n-epochs N-EPOCHS] -k [--n-nei N-NEIGHBORS] [--final_dens/--no_final_dens FINAL_DENS]

where the long-form flag appears within the square brackets and the capitalized text corresponds to
the parameters above. For example:
python densmap.py -i data.txt -o out -f .3 -k 25

and
python densmap.py --input data.txt --outname out --dens_frac .3 --n-nei 25

both run densMAP on input file data.txt to produce output files out_emb.txt
and out_dens.txt, using dens_frac=0.3 and n_neighbors=25.
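
If you want to launch the script from Python rather than a shell, the call can be assembled as an argument list. This is a minimal sketch: the helper build_densmap_command is hypothetical, and it only uses long-form flags that appear in the usage line above.

```python
import shlex


def build_densmap_command(input_path, outname, dens_frac=None, n_neighbors=None):
    """Hypothetical helper: assemble a densmap.py call from documented flags."""
    cmd = ["python", "densmap.py", "--input", input_path, "--outname", outname]
    if dens_frac is not None:
        cmd += ["--dens_frac", str(dens_frac)]
    if n_neighbors is not None:
        cmd += ["--n-nei", str(n_neighbors)]
    return cmd


cmd = build_densmap_command("data.txt", "out", dens_frac=0.3, n_neighbors=25)
print(shlex.join(cmd))

# To actually run it (requires densmap.py in the working directory):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems if the file names contain spaces.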
The input file is parsed using numpy’s loadtxt function if it is a .txt file; another option is to provide a .pkl file.
We assume that the first dimension (row index) iterates over the data instances, and the second dimension (column index) iterates over the features.
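
As an illustration of the expected input layout, the sketch below writes a small toy matrix in both formats. The file names are placeholders, and storing the .pkl input as a pickled numpy array is an assumption; check densmap.py for the exact object it expects.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy data: 5 data instances (rows) by 3 features (columns).
data = np.arange(15, dtype=float).reshape(5, 3)

workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "data.txt")
pkl_path = os.path.join(workdir, "data.pkl")

# Text input: whitespace-delimited, readable with numpy's loadtxt.
np.savetxt(txt_path, data)

# Pickle input: assumed here to be a pickled numpy array.
with open(pkl_path, "wb") as f:
    pickle.dump(data, f)

loaded = np.loadtxt(txt_path)
print(loaded.shape)  # (number of data instances, number of features)
```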
The output files include:

out_emb.txt a TSV file containing the embedding coordinates of the data, and
out_dens.txt a (number of data points)-by-2 TSV file containing in the first column the log local radii in the original data and in the second column the log local radii in the embedding.
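
The two output files can be read back with numpy. The sketch below first writes small stand-in files so that it runs without densmap.py; in practice the files would be those produced by the command-line script, and the shapes follow the description above.

```python
import os
import tempfile

import numpy as np

# Stand-in output files so the sketch runs on its own;
# in practice these are written by densmap.py.
workdir = tempfile.mkdtemp()
emb_path = os.path.join(workdir, "out_emb.txt")
dens_path = os.path.join(workdir, "out_dens.txt")
np.savetxt(emb_path, np.zeros((4, 2)), delimiter="\t")
np.savetxt(dens_path,
           np.column_stack([np.arange(4.0), np.arange(4.0) + 1.0]),
           delimiter="\t")

embedding = np.loadtxt(emb_path, delimiter="\t")  # (n_points, ndim)
dens = np.loadtxt(dens_path, delimiter="\t")      # (n_points, 2)

# Column 0: log local radii in the original data; column 1: in the embedding.
ro, re = dens[:, 0], dens[:, 1]

print(embedding.shape, ro.shape, re.shape)
```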

Example data and script
We have included the file trial_densmap.py which allows you to run an example straight out of
the box.
Run:
python trial_densmap.py

The code will load a dataset that contains a mixture of six Gaussian point clouds
with increasing variance, run both densMAP and UMAP on it with default parameters, and
(if you have matplotlib installed) plot the embeddings and the alignment of the local
radius in each case. It will also save the embeddings in {umap,densmap}_trial_emb.txt,
the local radii in {umap,densmap}_trial_dens.txt, and the plot in densmap_trial_fig.png.



References
Our densMAP algorithm is described in:
Ashwin Narayan, Bonnie Berger*, and Hyunghoon Cho*. "Density-Preserving Data Visualization Unveils Dynamic Patterns of Single-Cell Transcriptomic Variability", bioRxiv, 2020.
The original UMAP algorithm is described in:
Leland McInnes, John Healy, and James Melville. "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction", arXiv, 1802.03426, 2018.
License
This package is licensed under the MIT license.
Contact for questions
Ashwin Narayan, ashwinn@mit.edu
Hoon Cho, hhcho@broadinstitute.org
Additionally, some questions regarding the UMAP-specific aspects of this software may be answered by browsing the UMAP documentation at Read the Docs, which includes an FAQ.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
