bitepi 0.1.9
bitepi
bitepi is a Python wrapper around the BitEpi project found at https://github.com/aehrc/BitEpi. It provides a pandas
interface for identifying epistasis interactions.
It exposes a single Epistasis class, through which the analysis can be
performed by calling compute_epistasis.
Input consists of two arrays, each of which can be a list, a NumPy array or a pandas DataFrame.
sample_array maps sample names to case (1) or control (0).
Note that the header is ignored for NumPy arrays and pandas DataFrames, and
must not be present in Python lists.
sample  case/control
S1      0
S2      1
S3      1
S4      0
S5      0
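As a minimal sketch, the sample table above can be written as a plain Python list without a header row. The check_sample_array helper is hypothetical (it is not part of bitepi) and only illustrates the expected shape: one [name, status] pair per sample, with status 0 or 1.

```python
# Sample table from above as a plain Python list (no header row).
sample_array = [
    ["S1", 0],
    ["S2", 1],
    ["S3", 1],
    ["S4", 0],
    ["S5", 0],
]


def check_sample_array(rows):
    """Hypothetical helper, not part of bitepi: validate [name, status] rows."""
    seen = set()
    for row in rows:
        if len(row) != 2:
            raise ValueError(f"expected [sample, status], got {row!r}")
        name, status = row
        # A stray header row such as ['sample', 'case/control'] fails here.
        if status not in (0, 1):
            raise ValueError(f"status for {name!r} must be 0 or 1, got {status!r}")
        if name in seen:
            raise ValueError(f"duplicate sample name {name!r}")
        seen.add(name)
    return rows


check_sample_array(sample_array)
cases = [name for name, status in sample_array if status == 1]
controls = [name for name, status in sample_array if status == 0]
print(cases)     # ['S2', 'S3']
print(controls)  # ['S1', 'S4', 'S5']
```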
genotype_array contains the genotype of each sample at each SNP, with 0, 1
and 2 representing 0|0, 0|1 and 1|1 respectively. Headers are used to match
samples to the sample_array, but the first column's header is ignored.
SNP   S1  S2  S3  S4  S5
snpA  0   0   2   1   0
snpB  2   1   2   1   2
snpC  0   1   1   2   1
snpD  0   2   2   2   1
snpE  1   1   1   0   1
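If the genotypes start out as strings such as "0|1", the encoding described above could be applied along these lines. This is a sketch only: GENOTYPE_CODES and encode_row are illustrative names, not bitepi functions, and the mapping covers just the three forms listed above.

```python
# Encoding described above: 0|0 -> 0, 0|1 -> 1, 1|1 -> 2.
# Any other string (for example "1|0") is not covered here and raises KeyError.
GENOTYPE_CODES = {"0|0": 0, "0|1": 1, "1|1": 2}


def encode_row(snp_name, genotypes):
    """Illustrative helper, not part of bitepi: build one genotype_array row."""
    return [snp_name] + [GENOTYPE_CODES[g] for g in genotypes]


# Header row: the first entry is ignored, the rest name the samples.
header = ["SNP", "S1", "S2", "S3", "S4", "S5"]
raw_snpA = ["0|0", "0|0", "1|1", "0|1", "0|0"]

genotype_array = [header, encode_row("snpA", raw_snpA)]
print(genotype_array[1])  # ['snpA', 0, 0, 2, 1, 0]
```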
The sets of samples do not need to match exactly unless Epistasis is
called with strict_intersect=True. If the sample sets do not match, the analysis is run on their intersection.
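To see which samples would be shared before running an analysis, the intersection can be computed by hand. The snippet below is plain Python on made-up data and illustrates the idea only; it does not show how bitepi performs the matching internally.

```python
# Illustrative data only: the two inputs name overlapping but different samples.
sample_array = [["S1", 0], ["S2", 1], ["S3", 1], ["S4", 0]]
genotype_array = [
    ["SNP", "S2", "S3", "S4", "S5"],
    ["snpA", 0, 2, 1, 0],
    ["snpB", 1, 2, 1, 2],
]

sample_names = {name for name, _status in sample_array}
genotype_names = set(genotype_array[0][1:])  # skip the first column's header

shared = sorted(sample_names & genotype_names)
exact_match = sample_names == genotype_names

print(shared)       # ['S2', 'S3', 'S4']
print(exact_match)  # False
```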
The output is a dictionary with metric codes (e.g. "IG.1") as keys and
pandas DataFrames as values.
import bitepi

sample_array = [
    ['S1', 0],
    ['S2', 1],
    ['S3', 1],
    ['S4', 0],
    ['S5', 0],
]

genotype_array = [
    ['SNP', 'S1', 'S2', 'S3', 'S4', 'S5'],
    ['snpA', 0, 0, 2, 1, 2],
    ['snpB', 2, 1, 2, 1, 2],
    ['snpC', 0, 0, 1, 1, 1],
    ['snpD', 2, 2, 1, 2, 1],
    ['snpE', 0, 1, 1, 1, 2],
]

epistasis = bitepi.Epistasis(
    genotype_array=genotype_array,
    sample_array=sample_array,
)

interactions = epistasis.compute_epistasis(
    sort=True,
    best_ig=True,
)['best_ig']

print(interactions)
This should return:
    SNP     SNP_P    PAIR_P  TRIPLET_P  QUADLET_P    SNP_IG   PAIR_IG  TRIPLET_IG  QUADLET_IG  PAIR  TRIPLET_1  TRIPLET_2  QUADLET_1  QUADLET_2  QUADLET_3
0  snpA  1.707692  2.000000   2.109091        0.0  1.187692  0.266667    0.109091         0.0  snpE       snpB       snpE       snpA       snpA       snpA
1  snpB  1.642424  1.909091   2.109091        0.0  1.122424  0.266667    0.200000         0.0  snpC       snpC       snpE       snpA       snpA       snpA
2  snpC  1.641026  1.909091   2.109091        0.0  1.121026  0.266667    0.200000         0.0  snpB       snpD       snpE       snpA       snpA       snpA
3  snpD  1.642424  1.909091   2.109091        0.0  1.122424  0.175758    0.200000         0.0  snpE       snpC       snpE       snpA       snpA       snpA
4  snpE  1.733333  2.000000   2.109091        0.0  1.213333  0.266667    0.200000         0.0  snpA       snpC       snpD       snpA       snpA       snpA
For higher-order interactions (p3, ig3, p4 and ig4), Epistasis may take several
minutes to run, depending on the number of SNPs. For more information
during a run, set the logging level to logging.INFO
or logging.DEBUG. logging.DEBUG provides the most detail, including
logging from within the binary.
import logging
logging.root.setLevel(logging.DEBUG)
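If no log handler has been configured, Python's logging module falls back to a last-resort handler that only prints WARNING and above, so setting the level alone may show nothing. The sketch below installs a handler with timestamps, which can help when following a long run. It assumes bitepi emits its messages through the standard logging module, as the snippet above suggests; the format string is just one reasonable choice.

```python
import logging

# Configure the root logger with a handler, a level and a timestamped format.
# force=True (Python 3.8+) replaces any handlers that were already installed.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# Switch to the most detailed output later if needed.
# logging.getLogger().setLevel(logging.DEBUG)
```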