sequana-nanomerge 1.5.0

Creator: bradpython12

Last updated: September 25, 2024

0 purchases

Free

Donate

Languages

Python

Description:

sequanananomerge 1.5.0

This is is the nanomerge pipeline from the Sequana project

Overview:
merge fastq files generated by Nanopore run and generates raw data QC.

Input:
individual fastq files generated by nanopore demultiplexing

Output:
merged fastq files for each barcode (or unique sample)

Status:
production

Citation:
Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352

Installation
You can install the packages using pip:
pip install sequana_nanomerge --upgrade
An optional requirements is pycoQC, which can be install with conda/mamba using e.g.:
conda install pycoQC
you will also need graphviz installed.

Usage
sequana_nanomerge --help
If you data is barcoded, they are usually in sub-directories barcoded/barcodeXY so you will need to use a pattern
(–input-pattern) such as */*.gz:
sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv
--summary summary.txt --input-pattern '*/*fastq.gz'
otherwise all fastq files are in DATAPATH/ so the input pattern can just be *.fastq.gz:
sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv
--summary summary.txt --input-pattern '*fastq.gz'
The –summary is optional and takes as input the output of albacore/guppy demultiplexing. usually a file called sequencing_summary.txt
Note that the different between the two is the extra */ before the *.fastq.gz pattern since barcoded files are in individual subdirectories.
In both bases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline:
cd nanomerge
sh nanomerge.sh # for a local run
This launch a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:
snakemake -s nanomerge.rules -c config.yaml --cores 4 --stats stats.txt
Or use sequanix interface.
Concerning the sample sheet, whether your data is barcoded or not, it should be a CSV file
barcode,project,sample
barcode01,main,A
barcode02,main,B
barcode03,main,C
For a non-barcoded run, you must provide a file where the barcode column can be set (empty):
barcode,project,sample
,main,A
or just removed:
project,sample
main,A

Usage with apptainer:
With apptainer, initiate the working directory as follows:
sequana_nanomerge --use-apptainer
Images are downloaded in the working directory but you can store then in a directory globally (e.g.):
sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers
and then:
cd nanomerge
sh nanomerge.sh
if you decide to use snakemake manually, do not forget to add apptainer options:
snakemake -s nanomerge.rules -c config.yaml --cores 4 --stats stats.txt --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"

Requirements
This pipelines requires the following executable(s), which is optional:

pycoQC
dot

Details
This pipeline runs nanomerge in parallel on the input fastq files (paired or not).
A brief sequana summary report is also produced.

Rules and configuration details
Here is the latest documented configuration file
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

Version
Description

1.5.0

refactoring to use Click

1.4.0

sub sampling was biased in v1.3.0. Using stratified sampling to
correcly sample large file. Also set a –promethion option that
auomatically sub sample 10% of the data
add summary table

1.3.0

handle large promethium run by using a sub sample of the
sequencing summary file (–sample of pycoQC still loads the entire
file in memory)

1.2.0

handle large promethium run by using find+cat instead of just
cat to cope with very large number of input files.

1.1.0

add subsample option and set to 1,000,000 reads to handle large
runs such as promethion

1.0.1

CSV can now handle sample or samplename column name in samplesheet.
Fix the pyco file paths, update requirements and doc

1.0.0
Stable release ready for production

0.0.1
First release.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Files In This Product:

There are no reviews.

zed

sequana-nanomerge 1.5.0

Languages

Categories

Description:

License

Share

Files In This Product:

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

More From This Creator

xdict 1.1.11

xdisplayselect 1.0.0

xfcs 1.1.6

xfcsdashboard 0.0.2

xfds 0.3.0

sequana-nanomerge 1.5.0

Languages

Categories

Description:

License

Share

Files In This Product:

Customer Reviews

License

Overview

What you can do with it

What you can't do with it

Related Products

Views For YouTube Bot writed on Python

AI-Web-Scraper

quivr

roop

zed

More From This Creator

xdict 1.1.11

xdisplayselect 1.0.0

xfcs 1.1.6

xfcsdashboard 0.0.2

xfds 0.3.0