biosynonyms 0.0.1
Biosynonyms
A decentralized database of synonyms for biomedical entities and concepts. This
resource is meant to be complementary to ontologies, databases, and other
controlled vocabularies that provide synonyms. It's released under a permissive
license (CC0), so the synonyms can be easily adopted by or contributed back to upstream resources.
Here's how to get the data:
import biosynonyms
# Uses an internal data structure
positive_synonyms = biosynonyms.get_positive_synonyms()
negative_synonyms = biosynonyms.get_negative_synonyms()
# Get ready for use in NER with Gilda, only using positive synonyms
gilda_terms = biosynonyms.get_gilda_terms()
Synonyms
The data are also available directly as TSV files, so anyone can consume them
from any programming language.
The positives.tsv file has the following columns:
text: the synonym text itself
curie: the compact uniform resource identifier (CURIE) for a biomedical
entity or concept, standardized using the Bioregistry
name: the standard name for the concept
scope: the match type, written as a CURIE from
the OBO in OWL (oio) controlled vocabulary,
i.e., one of:
  oboInOwl:hasExactSynonym
  oboInOwl:hasNarrowSynonym
  oboInOwl:hasBroadSynonym
  oboInOwl:hasRelatedSynonym
  oboInOwl:hasSynonym (use this if the scope is unknown)
type: the synonym property type, written as a CURIE from
the OBO Metadata Ontology (omo) controlled vocabulary,
e.g., one of:
  OMO:0003000 (abbreviation)
  OMO:0003001 (ambiguous synonym)
  OMO:0003002 (dubious synonym)
  OMO:0003003 (layperson synonym)
  OMO:0003004 (plural form)
  ...
references: a comma-delimited list of CURIEs corresponding to publications
that use the given synonym (ideally using highly actionable identifiers from
semantic spaces like pubmed, pmc, doi)
contributor: the ORCID identifier of the contributor
Here's an example of some rows in the synonyms table (with linkified CURIEs):
| text | curie | scope | references | contributor |
|---|---|---|---|---|
| PI(3,4,5)P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
| phosphatidylinositol (3,4,5) P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29695532 | 0000-0003-4423-4370 |
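Because the positives file is plain TSV, it can be read with nothing more than Python's standard library. The following is a minimal sketch, assuming the header row uses the column names described above. The `SAMPLE` string stands in for the contents of positives.tsv and carries only the columns shown in the example rows, and `parse_positives` is a hypothetical helper, not part of the biosynonyms API.

```python
import csv
import io

# Inline stand-in for positives.tsv, built from the example rows shown above.
# The real file may contain additional columns (e.g., name, type).
SAMPLE = (
    "text\tcurie\tscope\treferences\tcontributor\n"
    "PI(3,4,5)P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29623928, pubmed:20817957\t0000-0003-4423-4370\n"
    "phosphatidylinositol (3,4,5) P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29695532\t0000-0003-4423-4370\n"
)


def parse_positives(handle):
    """Read synonym rows from a TSV file handle (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # The references column is a comma-delimited list of CURIEs.
        refs = row.get("references") or ""
        row["references"] = [r.strip() for r in refs.split(",") if r.strip()]
        rows.append(row)
    return rows


synonyms = parse_positives(io.StringIO(SAMPLE))
for synonym in synonyms:
    print(synonym["text"], synonym["curie"], synonym["references"])
```

To read a local copy of the file instead, pass an open file handle in place of the `io.StringIO` object.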
Incorrect Synonyms
The negatives.tsv file has the following
columns for non-trivial examples of text strings that aren't synonyms. This
document doesn't address the same issues as context-based disambiguation, but
rather helps describe issues like incorrect substring matching:
text: the non-synonym text itself
curie: the compact uniform resource identifier (CURIE) for a biomedical
entity or concept that does not match the given text, standardized
using the Bioregistry
references: same as for positives.tsv, illustrating documents where this
string appears
contributor: the ORCID identifier of the contributor
Here's an example of some rows in the negative synonyms table (with linkified
CURIEs):
| text | curie | references | contributor |
|---|---|---|---|
| PI(3,4,5)P3 | hgnc:22979 | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
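One possible way to use the negatives is as a blocklist applied after string matching. The sketch below assumes that some matcher has already produced candidate (text, CURIE) pairs. Both `filter_matches` and the candidate list are hypothetical and only illustrate the idea.

```python
# Known non-synonyms as (text, CURIE) pairs, taken from the example row above.
NEGATIVES = {("PI(3,4,5)P3", "hgnc:22979")}


def filter_matches(candidates, negatives=NEGATIVES):
    """Drop candidate (text, CURIE) matches that are listed as negatives."""
    return [pair for pair in candidates if pair not in negatives]


# Hypothetical output of a string matcher for the text "PI(3,4,5)P3".
candidates = [
    ("PI(3,4,5)P3", "CHEBI:16618"),
    ("PI(3,4,5)P3", "hgnc:22979"),
]
kept = filter_matches(candidates)
print(kept)
```

Only the pair listed in the negatives is discarded, while the CHEBI match is kept.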
Known Limitations
It's hard to know which exact matches between different vocabularies could be
used to deduplicate synonyms. Right now, this isn't covered but some partial
solutions already exist that could be adopted.
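To make the deduplication problem concrete, here is a rough sketch of what it could look like if a trusted set of exact matches between vocabularies were available. Everything here is an assumption for illustration: the `ex1` and `ex2` prefixes, the records, and the `deduplicate` helper are hypothetical and do not reflect how biosynonyms works.

```python
from collections import defaultdict

# Hypothetical exact-match mappings between two vocabularies.
# The "ex1" and "ex2" prefixes are placeholders, not real resources.
EXACT_MATCHES = {"ex2:0001": "ex1:0001"}

# Hypothetical (text, CURIE) synonym records.
records = [
    ("foo", "ex1:0001"),
    ("foo", "ex2:0001"),
    ("bar", "ex2:0001"),
]


def deduplicate(records, exact_matches):
    """Group synonym texts under one canonical CURIE per mapped concept."""
    grouped = defaultdict(set)
    for text, curie in records:
        # Fall back to the CURIE itself when no mapping is known.
        canonical = exact_matches.get(curie, curie)
        grouped[canonical].add(text)
    return {curie: sorted(texts) for curie, texts in grouped.items()}


deduplicated = deduplicate(records, EXACT_MATCHES)
print(deduplicated)
```

The hard part, as noted above, is deciding which mappings are trustworthy enough to populate something like `EXACT_MATCHES`.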
License
All data are available under the CC0 license. All code is available under the MIT
license.