Blasttools 0.1.16

blasttools
Commands for turning blast queries into pandas dataframes.
Blast against any blast databases you have already built:
blasttools blast --out=my.pkl query.fasta my_blastdbs_dir/*.pot

Install
Install with
python -m pip install -U blasttools
# *OR*
python -m pip install -U 'git+https://github.com/arabidopsis/blasttools.git'

Once installed, you can update with blasttools update.
Common Usages:
Build some blast databases from Ensembl Plants.
blasttools plants --release=40 build triticum_aestivum zea_mays

Find out what species are available:
blasttools plants --release=40 species

Blast against my.fasta and save dataframe as a pickle file (the default is to
save as a csv file named my.fasta.csv).
blasttools plants blast --out=dataframe.pkl my.fasta triticum_aestivum zea_mays

Get your blast data!
import pandas as pd
df = pd.read_pickle('dataframe.pkl')
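
Since the output is either a pickle or (by default) a csv file, a small loader that picks the reader from the file extension can be handy. This is only a sketch: load_blast_results is a hypothetical helper, not part of blasttools, and it assumes the csv was written in a form that pd.read_csv can read back without extra arguments.

```python
from pathlib import Path

import pandas as pd


def load_blast_results(path):
    """Load a blasttools output file, choosing the reader by extension.

    Hypothetical helper (not part of blasttools): only the two formats
    mentioned above, .pkl and .csv, are handled.
    """
    path = Path(path)
    if path.suffix == ".pkl":
        return pd.read_pickle(path)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"don't know how to read {path.name!r}")
```

For example, load_blast_results('my.fasta.csv') would read the default csv output and load_blast_results('dataframe.pkl') the pickle written above.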

Parallelization
When blasting, you can specify --num-threads, which is passed directly to the
underlying blast command. If you want to parallelize over species, databases or fasta files,
I suggest you use GNU Parallel [Tutorial].
parallel has a much richer set of options for controlling how the work is parallelized,
and simple cases stay simple.
For example, build blast databases from a set of fasta files concurrently:
parallel blasttools build ::: *.fa.gz

Or blast everything!
species=$(blasttools plants species)
parallel blasttools plants build ::: $species
# must have different output files here...
parallel blasttools plants blast --out=my{}.pkl my.fasta ::: $species
# or in batches of 4 species at a time
parallel -N4 blasttools plants blast --out='my{#}.pkl' my.fasta ::: $species

Then gather them all together...
blasttools concat --out=alldone.xlsx my*.pkl && rm my*.pkl

or programmatically:
from glob import glob
import pandas as pd
df = pd.concat([pd.read_pickle(f) for f in glob('my*.pkl')], ignore_index=True)

Remember: if you parallelize your blasts and also use --num-threads > 1,
then the concurrent jobs are probably going to be fighting each other
for cpu time!
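
One way to avoid oversubscribing the machine is to split the available cores between the concurrent jobs. The sketch below is illustrative only: threads_per_job and blast_command are hypothetical helpers, the command layout simply mirrors the parallel examples above, and placing --num-threads after the blast subcommand is an assumption.

```python
import os
from typing import List, Optional


def threads_per_job(n_jobs: int, total_cpus: Optional[int] = None) -> int:
    """Split the CPUs evenly between n_jobs concurrent jobs (at least 1 each)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1
    return max(1, total_cpus // n_jobs)


def blast_command(species: str, query: str, num_threads: int) -> List[str]:
    """Build one command line per species, each with its own output file.

    Assumption: --num-threads is accepted after the blast subcommand.
    """
    return [
        "blasttools", "plants", "blast",
        f"--out=my{species}.pkl",
        f"--num-threads={num_threads}",
        query,
        species,
    ]
```

With 16 cores and 4 concurrent jobs, threads_per_job(4, 16) gives 4 threads per job; the resulting argument lists could then be handed to subprocess.run from a concurrent.futures.ThreadPoolExecutor with 4 workers.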
Best matches
Usually, if you want the top hits, --best=3 will select the three hits with the lowest evalues for
each query sequence. However, if you want "best" to mean, say, the longest query match,
then you can add --expr='qstart - qend'. (Remember we are looking for the lowest values, so the
longest match gives the most negative value.)
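
To make the "lowest value wins" idea concrete, here is a rough pandas equivalent applied to an already-saved dataframe. It is a sketch of the selection logic only, not blasttools' own implementation: best_hits is a hypothetical helper, and the name of the query id column (qaccver here) is an assumption, so check df.columns first.

```python
import pandas as pd


def best_hits(df, n=3, expr="evalue", query_col="qaccver"):
    """Keep the n rows per query with the lowest value of expr.

    expr is evaluated against the dataframe columns, e.g. "evalue"
    or "qstart - qend". query_col is an assumed column name.
    """
    ranked = df.assign(_rank=df.eval(expr))
    # a stable sort keeps the original order among ties
    ranked = ranked.sort_values([query_col, "_rank"], kind="stable")
    return ranked.groupby(query_col, sort=False).head(n).drop(columns="_rank")
```

best_hits(df) mimics --best=3 on evalue, while best_hits(df, expr="qstart - qend") prefers the longest query match.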
XML
Blast offers an xml (--xml) output format that adds the query, match and sbjct strings. The other
fields are equivalent to adding --columns='+score gaps nident positive qlen slen'.
blasttools also offers a way to display the blast match as a pairwise alignment.
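import pandas as pd
from blasttools.blastxml import hsp_match
# results.csv is assumed to come from a blast run with --xml
df = pd.read_csv('results.csv')
# build a pairwise alignment string for every row
df['alignment'] = df.apply(hsp_match, axis=1)
print(df.iloc[0].alignment)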
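
To illustrate how the three strings relate, the sketch below stacks query, match and sbjct into a simple pairwise view. It is not hsp_match's implementation and its layout is not meant to match hsp_match's output; format_alignment is a hypothetical helper that assumes the three strings have equal length, as they do in a blast HSP.

```python
def format_alignment(query: str, match: str, sbjct: str, width: int = 60) -> str:
    """Stack query/match/sbjct strings in blocks of `width` columns."""
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("query, match and sbjct must have the same length")
    blocks = []
    for start in range(0, len(query), width):
        end = start + width
        blocks.append("\n".join([
            f"Query  {query[start:end]}",
            f"       {match[start:end]}",
            f"Sbjct  {sbjct[start:end]}",
        ]))
    return "\n\n".join(blocks)
```

For instance, print(format_alignment(row.query, row.match, row.sbjct)) would show one row of an --xml result, with the match line sitting between the two sequences.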

Overview

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

You're allowed to use the code bits in the repositories in unlimited projects.
Attribution is not required to use the code bits.

What you can do with it

Use them freely in your personal and professional work.

What you can't do with it

Don't be greedy. Selling or distributing these repositories in their original state is prohibited.
