bioconvert 1.1.1
Bioconvert
Bioconvert is a collaborative project to facilitate the interconversion of life science data from one format to another.
Contributions:
Want to add a converter? Please join https://github.com/bioconvert/bioconvert/issues/1
Overview
Life science uses many different formats. They may be old or have a complex syntax, which can make converting between them challenging. Bioconvert aims to provide a common tool/interface to convert life science data formats from one to another.
Many conversion tools already exist, but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we will re-use existing tools when possible and provide facilities to compare different conversion tools or methods via benchmarking. New implementations are provided when they are considered better than existing ones.
As of January 2023, Bioconvert supported 50 formats and 100 direct conversions.
Installation
BioConvert is developed in Python. Please use conda or any other Python environment manager, then install BioConvert with pip:
pip install bioconvert
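To check from Python that the pip installation worked, a small sketch using only the standard library's importlib.metadata can query the installed version. The helper name installed_version is illustrative and is not part of Bioconvert.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "bioconvert") -> Optional[str]:
    """Return the installed version of *package*, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"bioconvert {v}" if v else "bioconvert is not installed")
```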
50% of the conversions should work out of the box. However, many conversions require external tools, which is why we
recommend using a conda environment. In particular, most external tools are available on the bioconda channel.
For instance, to convert a SAM file to a BAM file, you need to install samtools as follows:
conda install -c bioconda samtools
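Because many conversions depend on external executables, it can help to check which ones are on your PATH before running a batch of conversions. The sketch below uses shutil.which; the helper name check_external_tools and the example tool list (taken from the Default method column of the conversion table) are illustrative assumptions, and the tools you actually need depend on the conversions and methods you use.

```python
import shutil
from typing import Dict, Iterable


def check_external_tools(tools: Iterable[str]) -> Dict[str, bool]:
    """Map each executable name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    # Example tools only; adapt the list to the conversions you plan to run.
    status = check_external_tools(["samtools", "bcftools", "bedtools"])
    for tool, found in status.items():
        hint = "found" if found else f"missing (try: conda install -c bioconda {tool})"
        print(f"{tool}: {hint}")
```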
Since Bioconvert is available on bioconda, one way to install it together with all its dependencies is to use conda/mamba:
conda create --name bioconvert -c conda-forge mamba
conda activate bioconvert
mamba install -c conda-forge -c bioconda bioconvert
bioconvert --help
See the Installation section for more details and alternative solutions (docker, singularity).
Quick Start
There are many conversions available. Type:
bioconvert --help
to get a list of the available conversions. For example, to convert a FastQ file into
a FastA file (compressed or not), you could type any of the following:
bioconvert fastq2fasta input.fastq output.fasta
bioconvert fastq2fasta input.fq output.fasta
bioconvert fastq2fasta input.fq.gz output.fasta.gz
bioconvert fastq2fasta input.fq.gz output.fasta.bz2
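To illustrate what the fastq2fasta conversion does, including the compressed variants above, here is a minimal pure-Python sketch. It is not Bioconvert's implementation: it assumes unwrapped 4-line FASTQ records and chooses gzip or bz2 (de)compression from the .gz/.bz2 suffix, mirroring the file names in the examples. The function names are hypothetical.

```python
import bz2
import gzip
from typing import IO


def open_by_extension(path: str, mode: str) -> IO[str]:
    """Open *path* in text mode, (de)compressing according to its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    if path.endswith(".bz2"):
        return bz2.open(path, mode + "t")
    return open(path, mode)


def fastq_to_fasta(infile: str, outfile: str) -> int:
    """Convert 4-line FASTQ records to FASTA; return the number of records."""
    count = 0
    with open_by_extension(infile, "r") as fin, open_by_extension(outfile, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip blank lines, e.g. a trailing empty line
            if not header.startswith("@"):
                raise ValueError(f"expected a FASTQ header starting with '@', got {header!r}")
            seq = fin.readline().rstrip("\n")
            fin.readline()  # '+' separator line
            fin.readline()  # quality line, which FASTA does not keep
            # FASTQ headers start with '@', FASTA headers with '>'
            fout.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
            count += 1
    return count
```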
When there is no ambiguity, you can be implicit:
bioconvert input.fastq output.fasta
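The implicit form relies on the converter name being derivable from the file extensions. The sketch below shows one way such inference could work; it is hypothetical and is not Bioconvert's actual format detection, and its extension table covers only a handful of formats. When an extension is unknown, the sketch raises an error, in which case you would fall back to the explicit form.

```python
import os
from typing import Dict

# Hypothetical, deliberately tiny extension table for illustration only.
EXTENSIONS: Dict[str, str] = {
    ".fastq": "fastq", ".fq": "fastq",
    ".fasta": "fasta", ".fa": "fasta",
    ".sam": "sam", ".bam": "bam", ".cram": "cram",
}
COMPRESSION = (".gz", ".bz2")


def guess_format(path: str) -> str:
    """Return the format implied by *path*, ignoring one compression suffix."""
    root, ext = os.path.splitext(path.lower())
    if ext in COMPRESSION:
        root, ext = os.path.splitext(root)
    if ext not in EXTENSIONS:
        raise ValueError(f"cannot infer the format of {path!r}; name the converter explicitly")
    return EXTENSIONS[ext]


def infer_converter(infile: str, outfile: str) -> str:
    """Build a converter name such as 'fastq2fasta' from two file names."""
    return f"{guess_format(infile)}2{guess_format(outfile)}"
```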
Each conversion uses a default method, but you may choose another one. Check out the available methods with:
bioconvert fastq2fasta --show-methods
For help on a specific conversion, type:
bioconvert fastq2fasta --help
and more generally:
bioconvert --help
You may also call BioConvert from a Python shell:
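# import a converter
from bioconvert.fastq2fasta import FASTQ2FASTA
# input and output file names (replace with your own paths)
infile = "input.fastq"
outfile = "output.fasta"
# instantiate the converter with the input/output file names
convert = FASTQ2FASTA(infile, outfile)
# run the conversion itself
convert()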
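The same instantiate-then-call pattern extends naturally to several files. The hypothetical batch_convert helper below takes the converter class as a parameter, so it should work with FASTQ2FASTA or any converter following the same pattern; deriving each output name by swapping the extension is an assumption made for illustration.

```python
import os
from typing import Callable, Iterable, List


def batch_convert(
    infiles: Iterable[str],
    outdir: str,
    converter_cls: Callable[[str, str], Callable[[], object]],
    out_ext: str = ".fasta",
) -> List[str]:
    """Convert each input file into *outdir* and return the output paths.

    *converter_cls* is instantiated with (infile, outfile), then called.
    """
    os.makedirs(outdir, exist_ok=True)
    outputs = []
    for infile in infiles:
        stem, ext = os.path.splitext(os.path.basename(infile))
        # Strip one compression suffix too, e.g. 'a.fq.gz' -> 'a'
        if ext in (".gz", ".bz2"):
            stem, _ = os.path.splitext(stem)
        outfile = os.path.join(outdir, stem + out_ext)
        converter_cls(infile, outfile)()  # instantiate, then run the conversion
        outputs.append(outfile)
    return outputs
```

With Bioconvert installed, batch_convert(["sample1.fastq", "sample2.fq"], "fasta_out", FASTQ2FASTA) would write fasta_out/sample1.fasta and fasta_out/sample2.fasta.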
Available Converters
Conversion table
Converter           Default method
abi2fasta           BIOPYTHON
abi2fastq           BIOPYTHON
abi2qual            BIOPYTHON
bam2bedgraph        BEDTOOLS
bam2bigwig          DEEPTOOLS
bam2cov             BEDTOOLS
bam2cram            SAMTOOLS
bam2fasta           SAMTOOLS
bam2fastq           SAMTOOLS
bam2json            BAMTOOLS
bam2sam             SAMBAMBA
bam2tsv             SAMTOOLS
bam2wiggle          WIGGLETOOLS
bcf2vcf             BCFTOOLS
bcf2wiggle          WIGGLETOOLS
bed2wiggle          WIGGLETOOLS
bedgraph2bigwig     UCSC
bedgraph2cov        BIOCONVERT
bedgraph2wiggle     WIGGLETOOLS
bigbed2bed          DEEPTOOLS
bigbed2wiggle       WIGGLETOOLS
bigwig2bedgraph     DEEPTOOLS
bigwig2wiggle       WIGGLETOOLS
bplink2plink        PLINK
bplink2vcf          PLINK
bz22gz              Unix commands
clustal2fasta       BIOPYTHON
clustal2nexus       GOALIGN
clustal2phylip      BIOPYTHON
clustal2stockholm   BIOPYTHON
cram2bam            SAMTOOLS
cram2fasta          SAMTOOLS
cram2fastq          SAMTOOLS
cram2sam            SAMTOOLS
csv2tsv             BIOCONVERT
csv2xls             Pandas
dsrc2gz             DSRC software
embl2fasta          BIOPYTHON
embl2genbank        BIOPYTHON
fasta2clustal       BIOPYTHON
fasta2faa           BIOCONVERT
fasta2fasta_agp     BIOCONVERT
fasta2fastq         PYSAM
fasta2genbank       BIOCONVERT
fasta2nexus         GOALIGN
fasta2phylip        BIOPYTHON
fasta2twobit        UCSC
fasta_qual2fastq    PYSAM
fastq2fasta         BIOCONVERT
fastq2fasta_qual    BIOCONVERT
fastq2qual          READFQ
genbank2embl        BIOPYTHON
genbank2fasta       BIOPYTHON
genbank2gff3        BIOCODE
gfa2fasta           BIOCONVERT
gff22gff3           BIOCONVERT
gff32gff2           BIOCONVERT
gff32gtf            BIOCONVERT
gz2bz2              pigz/pbzip2 software
gz2dsrc             DSRC software
json2yaml           Python
maf2sam             BIOCONVERT
newick2nexus        GOTREE
newick2phyloxml     GOTREE
nexus2clustal       GOALIGN
nexus2fasta         BIOPYTHON
nexus2newick        GOTREE
nexus2phylip        GOALIGN
nexus2phyloxml      GOTREE
ods2csv             pyexcel library
pdb2faa             BIOCONVERT
phylip2clustal      BIOPYTHON
phylip2fasta        BIOPYTHON
phylip2nexus        GOALIGN
phylip2stockholm    BIOPYTHON
phylip2xmfa         BIOPYTHON
phyloxml2newick     GOTREE
phyloxml2nexus      GOTREE
plink2bplink        PLINK
plink2vcf           PLINK
sam2bam             SAMTOOLS
sam2cram            SAMTOOLS
sam2paf             BIOCONVERT
scf2fasta           BIOCONVERT
scf2fastq           BIOCONVERT
sra2fastq           FASTQDUMP
stockholm2clustal   BIOPYTHON
stockholm2phylip    BIOPYTHON
tsv2csv             BIOCONVERT
twobit2fasta        DEEPTOOLS
vcf2bcf             BCFTOOLS
vcf2bed             BIOCONVERT
vcf2bplink          PLINK
vcf2plink           PLINK
vcf2wiggle          WIGGLETOOLS
wig2bed             BEDOPS
xls2csv
xlsx2csv            Pandas library
xmfa2phylip         BIOPYTHON
yaml2json           Pandas library
Contributors
Setting up and maintaining Bioconvert has been possible thanks to users and contributors.
Thanks to all of them!
Changes
Version   Description
1.1.1     Fix benchmark labels.
          NEW: fast52pod5 conversion
          FIX: set goalign and gotree instead of go requirements
1.1.0     Implement the ability to benchmark CPU and memory usage (not just time)
          benchmark incorporates CPU/memory usage
1.0.0     Fix bam2fastq for paired data, which computed a useless intermediate file
          (https://github.com/bioconvert/bioconvert/issues/325)
          more realistic fastq simulator
          pin openpyxl to <=3.0.10 to prevent a regression error in v3.1.0
0.6.3     add picard method in bam2sam
          Fixed all CI workflows to use mamba
          drop python3.7 support and add 3.10 support
          update bedops test file to fit the latest bedops 2.4.41 version
          revisit logging system
0.6.2     added gff3 to gtf conversion.
          Added pdb to faa conversion
          Added missing --reference argument to the cram2sam conversion
0.6.1     output file can be in sub-directories, allowing syntax such as
          'bioconvert fastq2fasta test.fastq outputs/test.fasta'
          fix all CI actions
          add more examples as notebooks in ./examples
          add a Snakefile for the paper in ./doc/Snakefile_paper
0.6.0     Fix bug in bam2sam (method sambamba)
          Fix graph layout
          add threading in fastq2fasta (seqkit method)
          multibenchmark feature added
          stable version used for web interface
0.5.2     Update requirements and environment.yml and add a conda spec-file.txt file
0.5.1     add genbank2gff3 requirement material in bioconvert.utils.biocode
0.5.0     Add CI actions for all converters
          remove sniffer (now in biosniff on PyPI: https://pypi.org/project/biosniff/)
          A complete benchmarking suite (see the doc/Snakefile_benchmark file and the
          benchmarking section)
          documentation and tests for all converters
          removed the validators (we assume inputs are correct)
0.4.X     Aug 2019: added nexus2fasta, cram2fasta, fasta2faa …; 1-to-many and
          many-to-one converters are now part of the API.
0.3.X     May 2019: new methods abi2qual, bigbed2bed, etc.; added --threads option
0.2.X     Aug 2018: abi2fastx, bioconvert_stats tool added
0.1.X     major refactoring to have subcommands with implicit/explicit mode