pyclusterprofiler 0.1.dev14

Creator: bradpython12

pyclusterprofiler


A limited Python implementation of the R package clusterProfiler, borrowing some functions and concepts from sharepathway and goatools.
Currently, the KEGG and GO interfaces are implemented.

Installation
You can install pyclusterprofiler via pip:
pip install pyclusterprofiler

Usage
import pyclusterprofiler

To find enriched KEGG pathways in groupings ("cluster" column) of genes ("gene_id" column) identified in df:
df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='KEGG')
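
The call above expects one row per gene, with the gene ID and its group in separate columns. As a minimal sketch of preparing such input, the snippet below flattens a cluster-to-genes mapping into that long format using only the standard library. The cluster labels, the gene assignments, and the helper name to_long_rows are illustrative assumptions, not part of pyclusterprofiler; the resulting CSV text could then be loaded with pandas.read_csv to obtain df.

```python
import csv
import io

# Hypothetical cluster assignments for illustration only.
# The numbers are NCBI gene IDs (7157 = TP53, 672 = BRCA1, 1956 = EGFR, 3845 = KRAS).
clusters = {
    "A": [7157, 672],
    "B": [1956, 3845],
}


def to_long_rows(cluster_to_genes):
    """Flatten {cluster: [gene_id, ...]} into one row per gene."""
    return [
        {"gene_id": gene_id, "cluster": cluster}
        for cluster, genes in cluster_to_genes.items()
        for gene_id in genes
    ]


rows = to_long_rows(clusters)

# Serialize to CSV text with the two column names used in this README.
# pandas.read_csv(io.StringIO(csv_text)) would turn this into a dataframe.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gene_id", "cluster"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```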

Or use GO terms (passing database="GO-slim" instead uses a reduced set of terms):
df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='GO')

Example filter for any pathways/annotations with significant enrichment:
significant_pathways = (
    df_enrichment
    .query('(corrected_pvalue<0.05)&(cluster_pathway_genes>3)')
    ['pathway']
    .unique()
)
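
This README does not say which statistical test produces the p-values behind the filter. clusterProfiler in R uses a hypergeometric over-representation test, so the following is a rough sketch under the assumption that this port does something comparable. The function name and its arguments are hypothetical and are not part of the pyclusterprofiler API.

```python
from math import comb


def enrichment_pvalue(overlap, cluster_size, pathway_size, background_size):
    """Upper-tail hypergeometric probability.

    Returns the chance of seeing at least `overlap` pathway genes when
    drawing `cluster_size` genes at random from a background of
    `background_size` genes, of which `pathway_size` belong to the pathway.
    """
    max_overlap = min(cluster_size, pathway_size)
    total = comb(background_size, cluster_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, cluster_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 4 in the pathway, a cluster of 3 genes,
# 2 of which fall in the pathway. (6*6 + 4*1) / 120 = 1/3.
p = enrichment_pvalue(overlap=2, cluster_size=3, pathway_size=4, background_size=10)
```

In a real analysis the background would be all annotated genes for the organism, and the raw p-values would still need multiple-testing correction before applying a 0.05 threshold.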

Plot results as a dot plot:
ax = pyclusterprofiler.dotplot(df_enrichment.query('pathway in @significant_pathways'))

compare_clusters arguments

argument: description

df: dataframe with a "gene_id" column containing NCBI gene IDs and a column specifying group membership
grouping: column or list of columns in df to use for group membership
correction: method for correcting p-values for multiple hypothesis testing, passed as the method argument to statsmodels.stats.multitest.multipletests (default "fdr_bh")
organism: organism databases to download. GO uses the NCBI taxid; for KEGG see their organism list (default is the human databases for each)
database: "KEGG", "GO", or "GO-slim" (default "KEGG")
exclude: pathway/annotation groupings to exclude. For KEGG, can be "human_diseases", "organismal_systems", or a list of both (see KEGG pathways). For GO, can be "molecular_function", "biological_process", "cellular_component", or a list of one or more (the abbreviations "MF", "BP", "CC" are also accepted) (default None)
force: force a fresh download of the databases; otherwise previously downloaded files are used if found in the current working directory (default False)
verbose: if True, prints the provided NCBI gene IDs that could not be found in the database (default True)
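
The default correction, "fdr_bh", is the Benjamini-Hochberg procedure. To show what that adjustment does to a set of raw p-values, here is a minimal pure-Python sketch; it is an illustration only, not the code pyclusterprofiler runs (the package delegates to statsmodels), and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)
# Approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by the number of tests divided by its rank, which is why a pathway with a raw p-value just under 0.05 can fail the corrected_pvalue<0.05 filter once many pathways are tested.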



Contributing
Contributions are very welcome.

License
Distributed under the terms of the MIT license,
"pyclusterprofiler" is free and open source software.

Issues
If you encounter any problems, please file an issue along with a detailed description.

