gnocis 0.9.12

Creator: bradpython12

Last updated:

0 purchases

TODO
Add to Cart

Description:

gnocis 0.9.12

Gnocis
Bjørn André Bredesen-Aa, 2018-
Stability





About
Large parts of complex genomes of multi-cellular organisms are non-coding. Cis-regulatory elements (CREs) are non-coding sequences that establish or modify gene transcription by multiple mechanisms. Multiple classes of CREs have been identified, including promoters, enhancers, silencers, insulators and Polycomb/Trithorax Response Elements. CREs can be identified experimentally or by means of in silico prediction. Experimental identification of CREs can depend on the cells that are used. Genome-wide in silico prediction, on the other hand, can potentially comprehensively predict CREs in a genome. In order to use machine learning for CRE prediction, a variety of functionality is required. A variety of packages exist for Python 3 for machine learning and sequence analysis, but successfully combining them requires the implementation of interfacing between them. Ensuring that the solution is efficient is important for large genomes, but can be challenging for end-users.
Gnocis is a system in Python 3 for the interactive and reproducible analysis and modelling of CRE DNA sequences. A broad suite of tools is implemented for data preparation, feature set definition, model formulation, training, cross-validation and genome-wide prediction. Gnocis employs Cython and a variety of techniques in order to optimally implement the glue necessary in order to apply machine learning for CRE analysis and prediction.

Installing
The recommended way to install Gnocis is through the PyPI package manager. In order to install via the PyPI package manager, open a terminal and execute:
pip install gnocis
Alternatively, Gnocis can be built from source. To build a wheel and install it, run:
make wheel
pip install dist/*.whl

Finally, Gnocis can be used by building from source and including the entire gnocis directory in the source tree. In order to do so, run
make all
Installing dependencies for tutorial
sudo apt-get install python3-sphinx
sudo pip3 install notebook pandas numpy matplotlib cupy sklearn tensorflow


Documentation
For the complete manual, see: https://bjornbredesen.github.io/gnocis/
For an in-depth tutorial, see the Jupyter Notebooks in the tutorial/ folder.

Features

DNA sequence handling

File format support - Loading and streaming

FASTA
2bit


File format support - Saving

FASTA


Operations

Printing
Sliding window extraction
Reverse complement generation




Sequence region handling

File format support - Loading and saving

GFF
BED
Coordinate lists (chromosome:start..end)


Operations

Overlap acquisition
Non-overlap acquisition
Merged set generation
Exclusion set generation
Sequence region extraction




Modelling

Generative DNA sequence models, with training and sequence generation

I.i.d.
N'th order Markov chains


Confusion matrices

Generation from model statistics
Printing
Receiver Operating Characteristic curve generation
Precision Recall Curve generation
Area Under the Curve calculation


Feature models

Log-odds
Dummy
Support Vector Machines (via sklearn)
Random Forest (via sklearn)


Features

k-mer spectrum
Motif occurrence spectrum
Motif pair occurrence spectrum




Motifs

Types

IUPAC nucleotide motifs
Position Weight Matrices
k-mer spectra




Feature networks

Directed acyclic graphs of features
Transformations of feature sets: filtering; concatenation; scaling; square; ...
Feature network nodes for constructing models
Application to sequences


Optionally integrates with established packages

Numpy – for integration with external methods
Pandas – for integration with external methods
Scikit-learn – for extended analyses and classic machine learning
TensorFlow – for neural networks
Jupyter Notebooks – for interactive and reproducible analysis and modelling


Easy to use
Objects are represented by classes, with human-readable descriptions
Optimized with Cython
...


Requirements

Python 3.6, 3.7, 3.8, 3.9
Windows, MacOS or Linux
C++ compiler when installing on Linux
Optional: Cython – required only when building from source
Optional: sklearn – for SVM and RF modelling
Optional: CuPy and CUDA – for CUDA-optimized SVM
Optional: TensorFlow – for neural networks


Citing
If you use Gnocis in published research, Gnocis must be cited. An article for Gnocis is in the process of being submitted for peer review. Please check back for an updated citation policy.

License
MIT License
Copyright (c) 2018- Bjørn André Bredesen-Aa
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Logo
Logo: Copyright Bjørn André Bredesen-Aa

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Files:

Customer Reviews

There are no reviews.