version 2023.04
Creates BAM files from non-sensitive fragment data (e.g. a FinaleDB frag.tsv.bgz file) using sequences extracted from a reference genome.

Test usage

Make sure the following dependencies are installed before you run the program:

samtools version 1.7 or higher;
bedtools version 2.30.0 or higher;
awk version 20200816 or higher;
gunzip (gzip) version 1.6 or higher;
Python version 3.10 or higher (only needed if you install fragmentstein as a Python package).
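A quick way to confirm that the command-line tools listed above are on your PATH is sketched below. The helper name `missing_tools` is hypothetical and not part of fragmentstein; it only checks that each command exists, not that its version is recent enough.

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one line per missing tool; returns 0 if none are missing, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Check the command-line dependencies named in the list above.
if missing_tools samtools bedtools awk gunzip; then
    echo "all dependencies found"
fi
```

Version numbers still need to be checked by hand, for example with `samtools --version` and `bedtools --version`.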

To install fragmentstein from the Python Package Index (PyPI):
pip install fragmentstein

Optionally, you can install it in a dedicated conda environment:
conda create -n fragmentstein python=3.10 samtools bedtools -c bioconda
conda activate fragmentstein
pip install fragmentstein

You can do the same with the Mamba package manager:
mamba create -n fragmentstein python=3.10 samtools bedtools -c bioconda
mamba activate fragmentstein
pip install fragmentstein

Afterwards, you can use fragmentstein directly from your shell:
fragmentstein -h

Alternatively, you can install it from source:
Clone the repository.
git clone
cd fragmentstein

Add the './scripts/' directory to your PATH, ideally in your ~/.bashrc or ~/.zshrc, by running the following command from the root of the cloned repository:
echo "export PATH=\"$(pwd)/scripts:\$PATH\"" >> ~/.bashrc

The script should be available in your shell: -h

The following example shows how to do a test run:
mkdir results -i -i tests/data/test_sample1.tsv.bgz -o results/test_sample1.bam \
-g tests/data/resources/test_ref_hg38.fna -c tests/data/resources/test_ref.chrom.sizes

You can also install the Python wrapper from source as follows:
First install the Python dependency management and packaging tool called Poetry:
curl -sSL | python3 -

Then install the fragmentstein Python wrapper from the root of the cloned repository:
poetry install

To run tests use the following command:
poetry run pytest

Required arguments
-i or --input Path to a FinaleDB frag.tsv.bgz file, or to a .bed or .bedpe file. The input is expected to be either a 6-column BED file or a 10-column paired-end BEDPE file.
-g or --genome Path to the reference genome fasta file.
-c or --chrom_sizes Chromosome sizes file.
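
Since the input is expected to have either 6 or 10 tab-separated columns, it can help to check the column count before running the tool. The following is a minimal sketch, not part of fragmentstein: the function name `guess_format` is hypothetical, and it assumes uncompressed, tab-separated text with no header line.

```shell
# Classify a tab-separated fragment file by the column count of its first line.
# Reads the file given as $1, or standard input when $1 is "-" or omitted.
guess_format() {
    awk -F '\t' 'NR == 1 {
        if (NF == 6) print "BED (6 columns)"
        else if (NF == 10) print "BEDPE (10 columns)"
        else print "unexpected column count: " NF
        exit
    }' "${1:--}"
}
```

For compressed input, decompress to standard output first, for example `gunzip -c < input_file | guess_format -`, where `input_file` stands for your own file.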

Optional arguments
-o or --output Path to and name of the output BAM file. Default is to substitute the .tsv.gz part of the extension with .bam.
-r or --read_length Both reverse and forward reads of a fragment will have this length unless the fragment is shorter than the read length. Default: 101.
-qf or --map_quality_filter Minimum mapping quality. Setting it to '0' accepts all fragments. Default: 30.
-qd or --map_quality_default Mapping quality to assign, for example when it is missing from the input file or when you want to override it for downstream analyses. Default: 60.
-bq or --base_quality ASCII character that encodes the Phred-scaled base quality as quality + 33. Default: F (quality: 37).
-N or --replace_incomplete_nucleotides Replace all incompletely specified nucleotides with N.
-s or --sort Sort the output BAM file by coordinate. This is a flag and takes no value; just add -s to enable sorting.
-t or --threads Number of parallel threads to be used when possible. Default: 1.
--temp Temporary folder where to store intermediate temporary files. Default: same folder as the output file.
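
The base quality option takes a character rather than a number, so the Phred+33 conversion is worth spelling out. The sketch below uses two hypothetical helper functions (not part of fragmentstein) to convert in both directions; it confirms that the default character F corresponds to quality 37, because F has ASCII code 70 and 70 - 33 = 37.

```shell
# Convert a Phred+33 quality character to its numeric Phred score.
phred_from_char() {
    # A leading single quote makes printf output the character's numeric code.
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

# Convert a numeric Phred score to its Phred+33 character.
char_from_phred() {
    # Build an octal escape for the character code and let printf render it.
    printf "\\$(printf '%03o' $(($1 + 33)))\n"
}

phred_from_char F    # prints 37
char_from_phred 37   # prints F
```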

Fragmentstein is developed and maintained by Zsolt Balázs and Todor Gitchev.


