asgard-saga 0.9.0a0.dev0



ASGARD
User Manual for Asgard
ASGARD is a configuration file created by the Costa Rica National High Technology Center to automate the identification of antibiotic resistance genes in bacteria such as Salmonella. It provides an easy-to-use interface for processing large batches of fastq files with little to no configuration, along with a CPU optimization algorithm that reduces processing time. The tool is based on the ARIBA software developed by Sanger-Pathogens.
SAGA is a compiled workflow of programs that enables the alignment, indexing, and mapping of gene samples against a reference genome. Reference genomes from several databases can be retrieved as fasta files through the NCBI API. SAGA provides an easy-to-use way to select the reference genome and analyze a series of samples to obtain a phylogenetic tree using RAxML.
Usage
python asgard.py <options>
Required Arguments:
--config_dir: Path to the directory containing the configuration files. All .ajson configuration files contained in this directory will be executed in alphabetical order.
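The discovery rule described for --config_dir can be sketched in a few lines of Python. This is only an illustration of "every .ajson file, in alphabetical order"; the function name list_config_files is hypothetical and is not taken from the ASGARD sources.

```python
from pathlib import Path


def list_config_files(config_dir):
    """Return the .ajson files found directly in config_dir, sorted by file name.

    Hypothetical helper: it mirrors the documented behaviour, not ASGARD's code.
    """
    directory = Path(config_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{config_dir} is not a directory")
    return sorted(
        (p for p in directory.iterdir() if p.is_file() and p.suffix == ".ajson"),
        key=lambda p: p.name,
    )
```

Files with any other extension are ignored, so notes or backups can live next to the configuration files without being executed.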
AJSON specification
ASGARD JSON (AJSON) files are an extension of JavaScript Object Notation that supports references to internal and external properties of objects. Certain elements must be present in the configuration file for the program to work.
Syntax
Files in the config_dir directory with the .ajson extension are treated as configuration files for the execution of ASGARD.
The configuration file is read from top to bottom and any reference values are resolved in the same manner.

Internal Objects

Internal references are defined using double braces. The referenced property must be assigned before it is referenced. In this example, the value of the color key inside the motorcycle object is lightblue after evaluation.



{
    "motorcycle": {
        "variant": "light",
        "color": "{{variant}}blue",
        "year": 2010
    }
}


External Objects

References to external objects are defined using double braces, with a dot to navigate the object depth. All external references must start from the top-level object and are case sensitive. In this example, the color of the helmet will match the color of the motorcycle.



{
    "motorcycle": {
        "color": "blue",
        "year": 2010
    },
    "helmet": {
        "color": "{{motorcycle.color}}"
    }
}

It is possible to create composite values from multiple references and strings.
A name/value pair must be defined before it is referenced so that the reference can be resolved properly.
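The resolution rules above (top-to-bottom evaluation, internal references by bare name, external references by dotted path from the top object) can be sketched as follows. This is a minimal illustration written for this manual, not ASGARD's parser; the names REFERENCE, lookup, and resolve are hypothetical, and the assumption that a reference without a dot is always internal is inferred from the two examples.

```python
import json
import re

# Matches {{name}} and {{object.key}} references.
REFERENCE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(name, root, local):
    """Resolve a bare name in the current object, or a dotted path from the root."""
    if "." not in name:
        return local[name]  # KeyError if the key has not been assigned yet
    node = root
    for part in name.split("."):
        node = node[part]
    return node


def resolve_value(value, root, local):
    """Substitute references inside strings and lists of strings."""
    if isinstance(value, str):
        return REFERENCE.sub(lambda m: str(lookup(m.group(1), root, local)), value)
    if isinstance(value, list):
        return [resolve_value(item, root, local) for item in value]
    return value  # numbers, booleans and null are kept unchanged


def resolve_into(source, target, root):
    """Copy source into target key by key, so only earlier keys are visible."""
    for key, value in source.items():
        if isinstance(value, dict):
            target[key] = {}
            resolve_into(value, target[key], root)
        else:
            target[key] = resolve_value(value, root, target)
    return target


def resolve(config):
    """Return a copy of config with every reference replaced by its value."""
    root = {}
    return resolve_into(config, root, root)


text = '{"motorcycle": {"color": "blue", "year": 2010}, "helmet": {"color": "{{motorcycle.color}}"}}'
print(resolve(json.loads(text))["helmet"]["color"])  # prints: blue
```

Because the resolved copy is built in reading order, a reference to a key that appears later in the file raises KeyError instead of silently picking up an unresolved value.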



| Object | Object description | Key | Key description |
| --- | --- | --- | --- |
| constants | Contains fixed configuration parameters that can be referenced by other objects. Properties inside the "constants" object must not contain external references. | name | Name of the script that will be executed. This name is used to generate the output directory. |
| | | input_directory | Directory containing the forward and reverse fastq.gz files. Each fastq file must have its pair in the same directory. Each pair is composed of a name and a suffix specified in the forward and reverse properties. |
| | | output_directory | Directory where the output of each configuration file will be created. Each execution creates a new directory with a unique name when it starts, and the resulting files are created inside this directory. |
| | | input_extension | All files in the input directory ending with the input extension are listed and used for the execution of the commands. |
| | | reference_accession | Accession number of the genome to be downloaded and analysed. This file is downloaded with the fasta extension using the NCBI efetch utility. |
| | | accessory_accession | Accession number of the genome to be appended to the reference_accession fasta file. |
| | | entrez_database | Database in which the fasta file will be searched for and from which it will be downloaded. |
| | | workers | Number of parallel jobs created for each command. Each time a task finishes, a new job is spawned with the next iteration. |
| | | forward | Suffix of the forward files in the input directory. |
| | | reverse | Suffix of the matching pair of the input fastq files. |
| | | iterator | Expandable bash expression that represents the list of files the workflow iterates over. This expression can be a composite value. Other wildcards can be used for the filename expansion. |
| dynamic | Contains information that varies at run time, which enables the program to iterate through the files present in the input directory. | prefix_regex | Regular expression that defines the pattern of a valid filename without extension or suffix. |
| | | placeholder | Symbol used as a placeholder for the fastq file names before its evaluation at runtime. |
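How input_directory, forward, reverse, input_extension, and prefix_regex fit together can be illustrated with a short pairing routine. This is a sketch under assumptions: the function name find_pairs and the sample values in the comments (such as "_R1" and ".fastq.gz") are hypothetical, and ASGARD itself may discover pairs differently, for example through the iterator expression.

```python
import re
from pathlib import Path


def find_pairs(input_directory, forward, reverse, input_extension, prefix_regex=r".+"):
    """Return (prefix, forward_file, reverse_file) tuples sorted by file name.

    A file named <prefix><forward><input_extension> is paired with
    <prefix><reverse><input_extension>, e.g. s1_R1.fastq.gz with s1_R2.fastq.gz.
    """
    directory = Path(input_directory)
    pattern = re.compile(prefix_regex)
    forward_tail = forward + input_extension
    pairs = []
    for path in sorted(directory.iterdir()):
        if not path.name.endswith(forward_tail):
            continue  # not a forward file
        prefix = path.name[: -len(forward_tail)]
        if not pattern.fullmatch(prefix):
            continue  # prefix rejected by prefix_regex
        mate = directory / (prefix + reverse + input_extension)
        if not mate.is_file():
            # The manual requires every fastq file to have its pair.
            raise FileNotFoundError(f"missing reverse file for {path.name}")
        pairs.append((prefix, path, mate))
    return pairs
```

Each returned prefix is the kind of value that would replace the placeholder symbol when a command is evaluated at run time.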















Execution Modes
Each command can be executed in different modes depending on the number of iterations required.



| Object | Object description | Execution mode | Mode description |
| --- | --- | --- | --- |
| execute | Each key/value pair describes the execution mode of one of the commands in the configuration file. The object that describes the task of each command must have the same name as its key in the execute object. All commands, with their respective tasks, must be written after the execute object. | single | The object is evaluated and executed a single time. Dynamic values should not be used in this command, since they will not be evaluated. |
| | | iterate-parallel | The object is executed in a new process created by the subprocess library. The number of parallel processes is determined by the workers constant. Dynamic placeholders are evaluated when the new process is spawned. Filenames are substituted in no guaranteed order. |
| | | iterate-sequential | The command object is iterated, but only one process runs at a time. Dynamic placeholders are evaluated the same way as in iterate-parallel. |
| | | false | The task is disabled and will be ignored. |
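The four execution modes can be sketched as a small dispatcher. This is an illustration only: the function name run_task, the "@" placeholder symbol, and the use of a thread pool to cap the number of concurrent subprocesses are assumptions made for the example, not a description of how ASGARD schedules its jobs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one fully expanded command and capture its text output."""
    return subprocess.run(command, capture_output=True, text=True, check=True)


def run_task(command, mode, items=(), workers=1, placeholder="@"):
    """Execute a command list according to its execution mode."""
    if mode is False or mode == "false":
        return []  # task disabled
    if mode == "single":
        return [run_one(command)]  # placeholders are not evaluated
    # One expanded command per item: the placeholder becomes the item name.
    expanded = [[arg.replace(placeholder, item) for arg in command] for item in items]
    if mode == "iterate-sequential":
        return [run_one(cmd) for cmd in expanded]
    if mode == "iterate-parallel":
        # At most `workers` subprocesses run at once; a new one starts
        # as soon as a running one finishes.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(run_one, expanded))
    raise ValueError(f"unknown execution mode: {mode!r}")
```

In this sketch the results come back in input order, but the commands themselves may start and finish in any order when the mode is iterate-parallel.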



Command Types
Objects declared at the root level are checked for the "command" property. If this property is defined, the program queues the object for execution in the order in which it was read.
SAGA

Simple

These are simple commands designed to manipulate and download files and directories.



| Command | Description |
| --- | --- |
| create_file | Creates an empty text file at the location specified in the file parameter. An absolute path to the file is recommended. Required parameters: file: Symbolic link to the new file to be created. |
| check_directory | Verifies that the directory exists; if it does not, the directory is created with the specified name. Recursive creation of directories is enabled. Required parameters: directory: Absolute path to be checked or created. |
| entrez_download | Downloads the fasta files from the NCBI database using their accession number. An HTTPS GET request is used for the download. Required parameters: url: HTTPS URL to the fasta file in the NCBI database. Use the constant accession variable. file: Symbolic link to the new file to be created. |
| merge | Concatenates two or more text files into a new file. A new line is added between each file listed. Required parameters: files: JSON list of the absolute or relative paths to the files to be merged. output_file: Path to the file to be created. If the file exists it will be overwritten. |
| replace | Replaces all occurrences of a text value with a new string. Required parameters: file: Path to the file where the text fragment will be replaced. old_data: Text to be replaced. new_data: The new text that will replace the old text fragment. |
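The behaviour documented for check_directory, merge, and replace is simple enough to sketch with the Python standard library. These functions are illustrative equivalents written for this manual, using the documented parameter names; they are not the implementations shipped with ASGARD.

```python
from pathlib import Path


def check_directory(directory):
    """Create the directory, including missing parents, if it does not exist."""
    Path(directory).mkdir(parents=True, exist_ok=True)


def merge(files, output_file):
    """Concatenate text files into output_file with a new line between them.

    An existing output_file is overwritten.
    """
    contents = [Path(name).read_text() for name in files]
    Path(output_file).write_text("\n".join(contents))


def replace(file, old_data, new_data):
    """Replace every occurrence of old_data with new_data inside file."""
    path = Path(file)
    path.write_text(path.read_text().replace(old_data, new_data))
```

For instance, merge could be used to append an accessory genome to a reference fasta file, which is the role the accessory_accession constant describes.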







Complex

Complex commands are specified using a JSON array. Dynamically generated items are evaluated and then executed sequentially. These commands are run using Python's subprocess library. On POSIX systems, the path to the program must be the first item of the list.
Extra parameters can be added to the list; they are passed to the program being executed. If the expansion of bash parameters is necessary, the "shell" property specifies whether the command should be executed by the shell interpreter. Complex commands can be used to iterate over multiple files with similar names. To iterate over these files, the placeholders defined in the "dynamic" object must be used; these placeholders are replaced by the real values at runtime. To enable file iteration, the "iterate-parallel" or "iterate-sequential" execution mode must be selected.
Example:
In this case the samtools program must be accessible from the directory where ASGARD is run; this can be achieved by setting the environment variables or by specifying the full path to the executable.
The values in the command list can be composite values, constants, or strings.
"sam_view": {
    "extension": ".bam",
    "file": "{{dynamic.output_file}}{{extension}}",
    "output_pipeline": "{{file}}",
    "command": ["samtools", "view", "-bS", "-q", "15", "{{bwa_mem.file}}"]
},
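Once the references in a task such as sam_view have been resolved, running it comes down to a single subprocess call. The sketch below is a guess at the mechanics rather than ASGARD's code: the function name run_complex is hypothetical, and treating "output_pipeline" as the file that receives the program's standard output is an assumption based on the example, since the manual does not define that property.

```python
import subprocess


def run_complex(task):
    """Run one task object whose "command" list is already fully resolved."""
    command = task["command"]
    use_shell = bool(task.get("shell", False))
    if use_shell:
        # With shell=True the command is handed to the shell as one string,
        # so wildcards and variables are expanded by the shell.
        command = " ".join(command)
    output = task.get("output_pipeline")
    if output is None:
        return subprocess.run(command, shell=use_shell, check=True)
    # Assumption: "output_pipeline" names the file that receives stdout.
    with open(output, "wb") as handle:
        return subprocess.run(command, shell=use_shell, stdout=handle, check=True)
```

Writing standard output straight to the file handle avoids holding a large result, such as a BAM file, in memory. Without the shell, the first list item must be a program that can be found on the search path or given as a full path.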

Default Configuration Files
Two configuration files are provided with the software, one for ASGARD and one for SAGA. These configuration files implement the following pipelines.

TODO

ASGARD

| Task | Command | Parameters | Description |
| --- | --- | --- | --- |

SAGA

| Task | Command | Parameters | Description |
| --- | --- | --- | --- |

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
