conifer-analysis 0.1.0

Creator: coderz1093

Description:

Post-process conifer output for downstream statistical analysis.
conifer-analysis uses dask to analyze conifer results in a distributed,
out-of-core fashion, so the data does not have to fit into memory at once.
This can be helpful when processing many such results.

Example
Say that you have a bunch of conifer results in a directory. You can
generate a histogram of the confidence values per file (sample) and per taxon
using the provided pipeline confidence_hist. Even when you work locally, it
can be helpful to explicitly create a distributed client to control the number
of workers.

    from dask.distributed import Client
    from conifer_analysis import confidence_hist

    # Start a local cluster with eight workers and connect to it.
    client = Client(n_workers=8)

You can then open the dask dashboard in your browser (by default at
http://localhost:8787/status) to observe tasks live. Next, run the pipeline,
which returns a pandas.DataFrame.

    hist = confidence_hist("data/*.tsv")
    hist.info()

As an example of the returned shape:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7700 entries, 0 to 7699
    Data columns (total 8 columns):
     #   Column       Non-Null Count  Dtype
    ---  ------       --------------  -----
     0   path         7700 non-null   category
     1   name         7700 non-null   category
     2   taxonomy_id  7700 non-null   category
     3   bin          7700 non-null   interval[float64, right]
     4   midpoints    7700 non-null   float64
     5   read1_hist   7700 non-null   int64
     6   read2_hist   7700 non-null   int64
     7   avg_hist     7700 non-null   int64
    dtypes: category(3), float64(1), int64(3), interval(1)
    memory usage: 385.3 KB
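
Because the result is an ordinary pandas.DataFrame, the usual pandas tools apply to it. As a minimal sketch, the following sums the per-file counts into one histogram per taxon. The frame built here is a small synthetic stand-in with made-up values; only a subset of the column names from the info() output is reused, and nothing about how conifer-analysis fills them is implied.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by confidence_hist.
# The values are invented for illustration; two right-closed bins
# cover (0.0, 0.5] and (0.5, 1.0].
bins = pd.interval_range(start=0.0, end=1.0, periods=2, closed="right")
hist = pd.DataFrame(
    {
        "path": ["a.tsv", "a.tsv", "b.tsv", "b.tsv"],
        "taxonomy_id": ["562", "562", "562", "562"],
        "bin": list(bins) * 2,
        "midpoints": list(bins.mid) * 2,
        "read1_hist": [3, 7, 1, 9],
        "read2_hist": [2, 8, 0, 10],
    }
)

# Sum the counts over all files, per taxon and per confidence bin.
per_taxon = (
    hist.groupby(["taxonomy_id", "midpoints"], observed=True)[
        ["read1_hist", "read2_hist"]
    ]
    .sum()
    .reset_index()
)
print(per_taxon)
```

Grouping on midpoints rather than on the interval column keeps the group keys as plain floats, which tends to be easier to plot.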


Install
It’s as simple as:

    pip install conifer-analysis

If you want to observe tasks in the dask dashboard, you will need additional
dependencies. Quoting the requirement keeps shells such as zsh from
interpreting the square brackets.

    pip install "conifer-analysis[dashboard]"
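
To check whether the dashboard dependencies are present, you can test for an importable module. The sketch below assumes that bokeh, which the dask dashboard relies on, is among the packages pulled in by the dashboard extra; the helper name module_available is hypothetical and not part of conifer-analysis.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


# Assumption: bokeh is what the dashboard extra provides.
if module_available("bokeh"):
    print("bokeh found, the dask dashboard should be able to render")
else:
    print("bokeh missing, install the dashboard extra to see the dashboard")
```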


Copyright

Copyright © 2022, Moritz E. Beber.
Free software distributed under the Apache Software License 2.0.

