geo-alchemy 0.0.20

Creator: bradpython12

geo-alchemy
a Python library and command-line tool for turning GEO data into gold.

why geo-alchemy
installation
use as Python library

parse metadata from GEO

parse platform
parse sample
parse series


serialization and deserialization


use as command line software

using OCM
probe reannotation
preprocessing



why geo-alchemy
GEO (NCBI's Gene Expression Omnibus) is like a gold mine that holds a huge amount of gold ore.
But refining this ore (GEO series) into gold (expression matrices, clinical data) is not easy:

how do you map microarray probes to genes?
what if multiple probes map to the same gene?
how do you get the clinical data?
...

geo-alchemy was created to deal with these problems.
installation
If you only want to use it as a Python library:
pip install geo-alchemy

If you also want to use it as command-line software:
pip install 'geo-alchemy[cmd]'

use as Python library
parse metadata from GEO
parse platform
from geo_alchemy import PlatformParser


parser = PlatformParser.from_accession('GPL570')
platform1 = parser.parse()


# or
platform2 = PlatformParser.from_accession('GPL570').parse()


print(platform1 == platform2)

# get platform annotation data
platform = PlatformParser.from_accession('GPL570', view='full').parse()
print(platform.internal_data)

parse sample
from geo_alchemy import SampleParser


parser = SampleParser.from_accession('GSM1885279')
sample1 = parser.parse()

# or
sample2 = SampleParser.from_accession('GSM1885279').parse()

print(sample1 == sample2)

parse series
from geo_alchemy import SeriesParser


parser = SeriesParser.from_accession('GSE73091')
series1 = parser.parse()

# or
series2 = SeriesParser.from_accession('GSE73091').parse()


print(series1 == series2)
print(series1.platforms)
print(series1.samples)
print(series1.organisms)

serialization and deserialization
To make saving easy, every geo-alchemy object can be converted to a dict,
and that dict can be saved directly to a file as JSON.
geo-alchemy also provides methods to convert these dicts back into objects.
from geo_alchemy import SeriesParser


series1 = SeriesParser.from_accession('GSE73091').parse()
data = series1.to_dict()
series2 = SeriesParser.parse_dict(data)


print(series1 == series2)
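Because to_dict() returns a plain dict, Python's standard json module is enough to persist it between sessions. The sketch below round-trips a dict through a JSON file using only the standard library; save_dict and load_dict are hypothetical helper names, and the example dict is a stand-in for the output of series1.to_dict(), whose real keys may differ. The dict returned by load_dict is what you would hand to SeriesParser.parse_dict.

```python
import json
import os
import tempfile


def save_dict(data: dict, path: str) -> None:
    """Write a dict (e.g. the result of to_dict()) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False, indent=2)


def load_dict(path: str) -> dict:
    """Read the dict back from a JSON file."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


if __name__ == "__main__":
    # Hypothetical stand-in for series1.to_dict(); the real structure may differ.
    example = {"accession": "GSE73091", "samples": []}
    path = os.path.join(tempfile.mkdtemp(), "GSE73091.json")
    save_dict(example, path)
    print(load_dict(path) == example)  # True
```

The round trip is lossless only for JSON-compatible values: tuples come back as lists and non-string dict keys come back as strings.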

use as command line software
using OCM
OCM (object command mapping) is a Python framework that maps Python objects to command-line software.
It can capture a command's intermediate results; enable OCM output like this:
geo-alchemy xxx --ocmir

probe reannotation
Prerequisites:

NCBI BLAST must be installed.
A BLAST index must be built for your reference sequences.

For more details, refer to this page.
geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13


-p GPL15303: run probe reannotation for platform GPL15303
-s 9: the 9th column of the platform annotation file holds the probe sequence
-d xxx: location of the BLAST indexes
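Conceptually, -s tells reannotation which column holds each probe's sequence, so the probes can be written out as FASTA records and aligned with BLAST against the indexes given by -d. The sketch below illustrates only that extraction step, using the standard library. It assumes a tab-separated annotation table with a header row and probe IDs in the first column, and it counts columns from 1 as the option does; annotation_to_fasta is a hypothetical helper, and geo-alchemy's own parsing may differ.

```python
import csv
import io


def annotation_to_fasta(table_text: str, seq_column: int) -> str:
    """Build FASTA text from a tab-separated platform annotation table.

    Hypothetical helper: assumes a header row, probe IDs in column 1, and a
    1-based seq_column (so "-s 9" corresponds to seq_column=9).
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader, None)  # skip the header row
    records = []
    for row in reader:
        if len(row) < seq_column:
            continue  # blank or truncated row, no sequence to read
        probe_id = row[0].strip()
        sequence = row[seq_column - 1].strip().upper()
        if probe_id and sequence:  # skip probes without a sequence
            records.append(f">{probe_id}\n{sequence}")
    return "\n".join(records) + "\n" if records else ""


if __name__ == "__main__":
    # Toy table with the sequence in column 3 instead of 9, to keep it short.
    table = "ID\tNAME\tSEQUENCE\nprobe_1\tx\tacgtacgt\nprobe_2\ty\t\n"
    print(annotation_to_fasta(table, seq_column=3), end="")
```

probe_2 has an empty sequence, so only probe_1 ends up in the FASTA output.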

If your reference sequences were downloaded from GENCODE, enable --gencode
so that gene symbols are extracted from the gene IDs:
geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
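In GENCODE's transcript FASTA files, each header is a pipe-delimited record (transcript ID, gene ID, Havana gene ID, Havana transcript ID, transcript name, gene name, length, biotype), so the gene symbol can be read from the 6th field of a BLAST hit's subject ID. The sketch below shows that idea; it assumes this header layout, gencode_gene_symbol is a hypothetical helper, the header in the example is illustrative, and geo-alchemy's --gencode handling may work differently.

```python
from typing import Optional

# Assumed GENCODE transcript header layout (0-based field indexes):
# transcript_id|gene_id|havana_gene|havana_transcript|transcript_name|gene_name|length|biotype|
GENE_NAME_FIELD = 5


def gencode_gene_symbol(subject_id: str) -> Optional[str]:
    """Return the gene symbol from a GENCODE-style pipe-delimited ID, or None."""
    fields = subject_id.lstrip(">").split("|")
    if len(fields) <= GENE_NAME_FIELD:
        return None  # not a GENCODE-style header, e.g. a plain accession
    symbol = fields[GENE_NAME_FIELD].strip()
    return symbol or None


if __name__ == "__main__":
    header = (">ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|"
              "OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|")
    print(gencode_gene_symbol(header))  # DDX11L1
```

A plain accession such as a RefSeq ID has no pipe-delimited fields, which is why this kind of extraction only makes sense when the references come from GENCODE.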

preprocessing (microarray series only)
Download metadata over the network:
geo-alchemy pp -s GSE174772 -p GPL570 -g 11


-s GSE174772: preprocess series GSE174772
-p GPL570: preprocess only the samples of GSE174772 that use GPL570
-g 11: the 11th column of the GPL570 annotation file holds the gene

This command generates 2 files in the current directory:

clinical file GSE174772_clinical.txt
gene expression file GSE174772_expression.txt
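GEO stores per-sample clinical attributes as characteristic strings, usually in a "key: value" form (for example "tissue: lung" in a sample's characteristics_ch1 field). One plausible way to turn those strings into a tab-separated clinical table is sketched below. It is not necessarily how geo-alchemy builds its clinical file: the input dict of characteristic lists is a hypothetical stand-in for whatever the parsed sample objects expose, the sample IDs are placeholders, and the generic column name for entries without a colon is an arbitrary choice.

```python
from typing import Dict, List


def characteristics_to_row(characteristics: List[str]) -> Dict[str, str]:
    """Split 'key: value' characteristic strings into a dict.

    Entries without a colon are kept under a generic, hypothetical key.
    """
    row = {}
    for i, item in enumerate(characteristics):
        key, sep, value = item.partition(":")  # split at the first colon only
        if sep:
            row[key.strip()] = value.strip()
        else:
            row[f"characteristic_{i + 1}"] = item.strip()
    return row


def clinical_table(samples: Dict[str, List[str]]) -> str:
    """Build a tab-separated table: one row per sample, one column per key."""
    rows = {sample: characteristics_to_row(chars) for sample, chars in samples.items()}
    # Union of keys in first-seen order, so samples with missing fields still align.
    columns: List[str] = []
    for row in rows.values():
        for key in row:
            if key not in columns:
                columns.append(key)
    lines = ["\t".join(["sample"] + columns)]
    for sample, row in rows.items():
        lines.append("\t".join([sample] + [row.get(c, "") for c in columns]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    samples = {
        "sample_1": ["tissue: tumor", "age: 61"],
        "sample_2": ["tissue: normal"],
    }
    print(clinical_table(samples), end="")
```

Missing values are left as empty cells rather than guessed, so downstream tools can decide how to treat them.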

Use existing series metadata:
import json
from geo_alchemy import SeriesParser


series = SeriesParser.from_accession('GSE174772').parse()
data = series.to_dict()


with open('GSE174772.json', 'w') as fp:
    json.dump(data, fp)

geo-alchemy pp -sf GSE174772.json -g 11

Using an existing probe-gene mapping file:
usually you run geo-alchemy reanno to do probe reannotation,
which produces a probe-gene mapping file that you can pass to pp with -m:
geo-alchemy reanno -p GPL6480 -s 17 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
geo-alchemy pp -s GSE12435 -m GPL6480_reanno.txt
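Whether the gene comes from an annotation column (-g) or from a mapping file (-m), the remaining step is the question raised at the top of this page: several probes can map to the same gene. The sketch below shows one common approach, averaging probe values per gene sample by sample. The helper names are hypothetical, the rule of dropping probes with no gene or with several genes joined by "///" is an assumption, and geo-alchemy may use a different summary (for example the maximum or the most variable probe).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def probe_to_gene(rows: List[List[str]], gene_column: int) -> Dict[str, str]:
    """Map probe IDs (column 1) to genes in a 1-based gene_column (as in -g 11).

    rows are data rows with the header already removed. Probes with no gene,
    or with several genes joined by '///', are dropped as ambiguous.
    """
    mapping = {}
    for row in rows:
        if len(row) < gene_column:
            continue
        gene = row[gene_column - 1].strip()
        if gene and "///" not in gene:
            mapping[row[0].strip()] = gene
    return mapping


def collapse_by_gene(expr: Dict[str, List[float]],
                     mapping: Dict[str, str]) -> Dict[str, List[float]]:
    """Average the values of probes that map to the same gene, per sample."""
    grouped = defaultdict(list)
    for probe, values in expr.items():
        gene = mapping.get(probe)
        if gene is not None:  # unmapped probes are discarded
            grouped[gene].append(values)
    # zip(*probe_rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*probe_rows)]
            for gene, probe_rows in grouped.items()}


if __name__ == "__main__":
    annotation = [["p1", "GENE1"], ["p2", "GENE1"], ["p3", "GENE2 /// GENE3"], ["p4", ""]]
    expression = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [9.0, 9.0]}
    mapping = probe_to_gene(annotation, gene_column=2)
    print(collapse_by_gene(expression, mapping))  # {'GENE1': [2.0, 3.0]}
```

Averaging keeps every probe's signal but can dilute a strong probe with weak ones; that trade-off is why other summaries are also in common use.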

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
