gtf to genes 1.40
***************************************
Overview
***************************************
We want an extremely fast, lightweight way to access gene data stored in GTF format.

The parsed data is held in an intuitive gene -> transcript -> exon hierarchy, with exons stored as intervals.

Our aim is to

    * cache data in binary format, which can be
    * re-read in < 10s for even the largest genomes

Currently, the initial parse of Ensembl Homo sapiens release 56 takes around 4.5 minutes.
The binary data can be reloaded in < 10s. It contains *all* of the data structure in the original GTF file.

Note that we sacrifice memory usage for speed. This is seldom a problem for modern computers and genome sizes.
(There are around 400,000 exons, but they are stored as intervals / int pairs.)

***************************************
A simple example
***************************************
::

    from gtf_to_genes import t_parse_gtf

    def get_protein_coding_genes(gtf_file, logger):
        # gtf_file (path to the GTF file) and logger (a logging.Logger)
        # are supplied by the caller
        gene_structures = t_parse_gtf("Mus musculus")

        #
        #   use cached data for speed
        #
        ignore_cache = False

        #
        #   get protein coding genes only
        #
        genes_by_type = gene_structures.get_genes(gtf_file, logger, ["protein_coding"], ignore_cache = ignore_cache)

        #
        #   print out gene counts
        #
        t_parse_gtf.log_gene_types(logger, genes_by_type)

        return genes_by_type
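The caching idea from the overview (parse once, store the result in binary form, reload it quickly afterwards) can be illustrated with a minimal sketch. This is not the package's own cache format or API: it assumes Python's standard `pickle` module as the binary format, and `load_with_cache`, `toy_parse`, and the gene and transcript IDs are hypothetical names invented for the illustration. Exons are held as `(start, end)` integer pairs, as described above.

```python
import os
import pickle


def load_with_cache(cache_path, parse, ignore_cache=False):
    """Return parsed genes, reading the binary cache when one is available.

    `parse` is any zero-argument callable that performs the slow parse.
    """
    if not ignore_cache and os.path.exists(cache_path):
        # Fast path: reload the previously parsed structure.
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)

    # Slow path: parse from scratch, then write the cache for next time.
    genes_by_type = parse()
    with open(cache_path, "wb") as handle:
        pickle.dump(genes_by_type, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return genes_by_type


def toy_parse():
    """Stand-in for the slow GTF parse; IDs and coordinates are made up."""
    transcript = {
        "transcript_id": "T1",
        # Exons stored as (start, end) integer pairs.
        "exons": [(100, 200), (300, 450)],
    }
    gene = {
        "gene_id": "G1",
        "gene_type": "protein_coding",
        "transcripts": [transcript],
    }
    return {"protein_coding": [gene]}
```

Calling `load_with_cache(path, toy_parse)` twice parses only on the first call; the second call reads the cache file. Passing `ignore_cache=True` forces a fresh parse and rewrites the cache, mirroring the `ignore_cache` flag in the example above.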
For personal and professional use. You cannot resell or redistribute these repositories in their original state.