PheMAP is a general, automatic, and portable approach to accurate high-throughput phenotyping within electronic health records (EHRs). PheMAP quantifies relationships between phenotypes and relevant clinical concepts represented by standard medical terminologies. For each individual, PheMAP computes a phenotype score and the probability of having a particular phenotype, based on the related concepts identified in that individual's EHR.
We parsed phenotype descriptions from multiple publicly available resources (e.g., MedlinePlus, MedicineNet, and Wikipedia) using natural language processing (NLP). We mapped the identified concepts to concept unique identifiers (CUIs) from the Unified Medical Language System (UMLS) and to codes of standard clinical terminologies (e.g., ICD-9-CM, ICD-10-CM, SNOMED CT, CPT, LOINC, and RxNorm). We then weighted each concept relative to each phenotype to reflect how important the concept is to that phenotype within the collection of all phenotype documents.
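The weighting step above can be illustrated with a small sketch. The text does not name the exact weighting formula, so this example assumes a TF-IDF-style scheme, which is one common way to score how important a term is to one document within a collection. The function name `tfidf_weights`, the phenotype names, and the concept identifiers are all hypothetical and are not taken from the PheMAP knowledge base.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Assumed TF-IDF-style weighting (not confirmed as PheMAP's formula).

    docs maps a phenotype name to the list of concept IDs found in its
    description. Returns {phenotype: {concept: weight}}.
    """
    n_docs = len(docs)

    # Document frequency: how many phenotype documents mention each concept.
    doc_freq = Counter()
    for concepts in docs.values():
        doc_freq.update(set(concepts))

    weights = {}
    for phenotype, concepts in docs.items():
        term_counts = Counter(concepts)
        total = len(concepts)
        weights[phenotype] = {
            # Term frequency within this document times inverse document frequency.
            concept: (count / total) * math.log(n_docs / doc_freq[concept])
            for concept, count in term_counts.items()
        }
    return weights


# Hypothetical concept IDs, for illustration only.
docs = {
    "diabetes": ["C_glucose", "C_insulin", "C_glucose", "C_fatigue"],
    "hypothyroidism": ["C_tsh", "C_fatigue"],
    "anemia": ["C_hemoglobin", "C_fatigue"],
}
weights = tfidf_weights(docs)
```

Under this assumed scheme, a concept that appears in every phenotype document (here `C_fatigue`) receives a weight of zero, while a concept mentioned often in one document and nowhere else (here `C_glucose` for diabetes) receives the highest weight for that phenotype.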
PheMAP is freely available and ready to use for 1400 unique phenotypes with EHRs in the OMOP Common Data Model. The knowledge base is available for download, together with a Python script for calculating phenotype scores and phenotype probabilities.
PheMap_Mapped_Terminologies_1.1.csv – The main knowledge base file containing weighted concepts mapped to standard medical terminologies, e.g., ICDs, SNOMED CT, CPT, LOINC, and RxNorm.
PheMap_UMLS_Concepts_1.1.csv – The raw PheMAP knowledge base containing weighted concepts mapped to CUIs from UMLS.
ICD_to_Phecode_mapping.csv – Mapping of ICD-9-CM and ICD-10-CM codes to phecodes (used in phemap_phenotyping.py).
Phecode_Relationship.csv – The hierarchical relationship mapping between phecodes (used in phemap_phenotyping.py).
README.txt – Description of data elements in the above files.
phemap_phenotyping.py – Python script that calculates PheMAP phenotype scores and probabilities for EHRs structured in the OMOP Common Data Model. The script is meant to be run line by line.
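To give a rough idea of how a phenotype score can be derived from weighted concepts, here is a minimal sketch. It is not the contents of phemap_phenotyping.py. It assumes that the score for one phenotype is the sum of the weights of the knowledge-base concepts that appear at least once in a patient's record. The function name `phenotype_score`, the codes, and the weights are hypothetical, and the probability calculation is not covered.

```python
def phenotype_score(patient_codes, concept_weights):
    """Sum the weights of knowledge-base concepts observed in one patient's record.

    Assumption for illustration: each concept counts once, no matter how
    often its code occurs in the record.
    """
    observed = set(patient_codes)
    return sum(
        weight for code, weight in concept_weights.items() if code in observed
    )


# Hypothetical weights for one phenotype and hypothetical codes for one patient.
concept_weights = {"CODE_A": 0.5, "CODE_B": 0.25, "CODE_C": 1.0}
patient_codes = ["CODE_A", "CODE_C", "CODE_A", "CODE_X"]

score = phenotype_score(patient_codes, concept_weights)
print(score)  # 1.5 (CODE_A + CODE_C; CODE_X is not in the knowledge base)
```

Counting each concept once keeps the score from being dominated by a single code that is recorded at many visits. Whether repeated occurrences should count is a design choice this sketch does not settle.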
PheMap v1.1 (07/07/20)
PheMap v1.0 (05/14/20)