gc-count 0.2.1

Creator: bradpython12

Last updated:

Add to Cart

Description:

gccount 0.2.1

rust-gc-count
cargo build --release
target/release/gccount --input in.fa --output out.wig

Description
A tool for generating wiggle files of GC from DNA written in Rust.
Help
Calculate GC and write into a wiggle file

Usage: gccount [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
--input <INPUT> FASTA formatted file (can be gziped) to calculate GC from
--output <OUTPUT> Output wiggle file. One file will be produced
--window <WINDOW> Window size to calculate GC over [default: 5]
--omit-tail Remove any trailing sequence and do not calcualte GC. Default behaviour is to retain the leftover sequence. GC is calculated over the remaining sequence length
--write-chrom-sizes Write a chrom.sizes file into the current directory. Use --chrom-sizes-path to configure location
--chrom-sizes-path <CHROM.SIZES> Path of the chrom.sizes file. Defaults to chrom.sizes [default: chrom.sizes]
--verbose Be verbose
-h, --help Print help
-V, --version Print version

Checksum calculator
target/release/checksumseq --input in.fa --output chrom.file

Another binary for calculating sequence lengths and checksums from a file. The resulting file is formted as tab separated with the following columns:

Sequence ID as it appears in the FASTA file
Sequence length
Refget ga4gh identifier (SQ.sha512t24u)
MD5 checksum hex encoded

The resulting file can be used as a chrom.sizes file too.
Command line
Iterates through a FASTA file calclating checksums and sequence length

Usage: checksumseq [OPTIONS]

Options:
--input <INPUT> FASTA formatted file to calculate checksums from (- mean STDIN). Reads gzipped FASTA if the filename ends with .gz (including bgzip files) [default: -]
--output <OUTPUT> Output file (- means STDOUT). Each line is tab separated reporting "ID Length sha512t24u md5" [default: -]
--verbose Be verbose
-h, --help Print help
-V, --version Print version

From within Python
Python bindings are available for the checksumseq calculation. The following code demonstrates how to use the bindings.
Install the bindings
maturin is used to build the bindings and install them into the current environment. Ensure you are using the Python environment you want to install the bindings into.
pip install maturin

Then navigate to the rust-gc-count/bindings directory and run the following command to install the bindings.
maturin build --release

Use the bindings
To use the bindings in Python, the following code demonstrates how to use the bindings.
from gc_count import checksum

results = checksum("path/to/seq/fasta")
for result in results:
print(result.sha512)

Level of code quality
The code developed here has not been extensively tested but has been verified as producing correct and expected output.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Customer Reviews

There are no reviews.