pysequila 0.4.1

Creator: railscoder56

pysequila
pysequila is a Python entry point to SeQuiLa, an ANSI-SQL compliant solution for efficient processing of sequencing reads and querying of genomic intervals, built on top of Apache Spark. Range joins, depth of coverage and pileup computations are the bread and butter of NGS analysis, but the high volume of data makes them run very slowly or even fail to complete.

Requirements

Python 3.7, 3.8, 3.9



Features

custom data sources for bioinformatics file formats (BAM, CRAM, VCF)
depth of coverage calculations
pileup calculations
reads filtering
efficient range joins
other utility functions
support for both SQL and the DataFrame/Dataset API
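
To make two of the terms above concrete, here is a minimal pure-Python sketch of what a range join and a depth of coverage calculation compute on toy intervals. It only illustrates the concepts and is not how SeQuiLa implements them; the function names and the use of closed (start, end) intervals are assumptions made for this example.

```python
from collections import defaultdict


def overlaps(a, b):
    """True when closed intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def range_join(left, right):
    """Naive O(n*m) range join: every (left, right) pair of overlapping intervals."""
    return [(l, r) for l in left for r in right if overlaps(l, r)]


def depth_of_coverage(reads):
    """Sweep-line coverage: (start, end, depth) blocks for positions with depth > 0."""
    deltas = defaultdict(int)
    for start, end in reads:
        deltas[start] += 1    # a read starts covering at `start`
        deltas[end + 1] -= 1  # and stops covering right after `end`
    blocks, depth, prev = [], 0, None
    for pos in sorted(deltas):
        if depth > 0 and prev is not None:
            blocks.append((prev, pos - 1, depth))
        depth += deltas[pos]
        prev = pos
    return blocks


reads = [(1, 5), (3, 8), (10, 12)]
print(depth_of_coverage(reads))
print(range_join([(1, 5), (10, 12)], [(4, 6), (20, 30)]))
```

The nested loop in the range join is quadratic in the number of intervals, which hints at why these operations become a bottleneck on large NGS data sets. Note that this sketch does not merge adjacent blocks that happen to have the same depth.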



Setup
$ python -m pip install --user pysequila
or
(venv)$ python -m pip install pysequila


Usage
$ python
>>> from pysequila import SequilaSession
>>> # Start a session; the SeQuiLa package is resolved via spark.jars.packages
>>> ss = SequilaSession \
...     .builder \
...     .config("spark.jars.packages", "org.biodatageeks:sequila_2.12:1.1.0") \
...     .config("spark.driver.memory", "2g") \
...     .getOrCreate()
>>> # Register the BAM file as a table
>>> ss.sql(
...     """
...     CREATE TABLE IF NOT EXISTS reads
...     USING org.biodatageeks.sequila.datasources.BAM.BAMDataSource
...     OPTIONS(path "/features/data/NA12878.multichrom.md.bam")
...     """
... )
>>> # Depth of coverage using SQL
>>> ss.sql("SELECT * FROM coverage('reads', 'NA12878', '/features/data/Homo_sapiens_assembly18_chr1_chrM.small.fasta')")
>>> # or using the DataFrame/Dataset API
>>> ss.coverage("/features/data/NA12878.multichrom.md.bam", "/features/data/Homo_sapiens_assembly18_chr1_chrM.small.fasta")
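
When several BAM files need to be registered, the CREATE TABLE statement from the usage example can be generated rather than typed by hand. The sketch below is one possible way to do that; `bam_table_ddl` is a hypothetical helper, not part of pysequila, and it only builds the SQL string, so it runs without Spark. The data source class name is taken from the usage example above.

```python
def bam_table_ddl(table_name: str, bam_path: str) -> str:
    """Build a CREATE TABLE statement for a BAM-backed table (hypothetical helper)."""
    # Conservative checks, since both values are interpolated into SQL text.
    if not table_name.isidentifier():
        raise ValueError(f"not a simple table name: {table_name!r}")
    if '"' in bam_path:
        raise ValueError("path must not contain double quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name}\n"
        "USING org.biodatageeks.sequila.datasources.BAM.BAMDataSource\n"
        f'OPTIONS(path "{bam_path}")'
    )


print(bam_table_ddl("reads", "/features/data/NA12878.multichrom.md.bam"))
```

The resulting string could then be passed to ss.sql(...) in a session created as shown in the usage example.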


ChangeLog
0.1.0 (2020-09-16)

Initial release.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
