The Use of Computational Phenotypes within Electronic Healthcare Data to
Identify Transgender People in the United States: A Narrative Review
Abstract
Purpose: Despite the expansion of research utilizing electronic
healthcare data to identify transgender (TG) population health trends,
the validity of computational phenotype (CP) algorithms to identify TG
patients is not well understood. We aim to describe the current state of
the literature that has used CPs to identify TG people within
electronic healthcare data, assess the validity of those CPs, identify
potential gaps, and synthesize recommendations for future work based on
past studies.
Methods: The authors searched the National Library of Medicine’s
PubMed, Scopus, and the American Psychological Association’s PsycINFO
databases to identify studies published in the United States that
applied CPs to identify TG people within electronic healthcare data.
Results: Twelve studies validated or enhanced the positive
predictive value (PPV) of their CP through manual chart reviews (n=5),
hierarchy of code mechanisms (n=4), key text-strings (n=2), or
self-surveys (n=1). CPs with the highest PPV for identifying TG patients
within their study population combined diagnosis codes with other
components such as key text-strings. However, when key text-strings were
not available, researchers were still able to identify most TG patients
within their electronic healthcare databases through diagnosis codes
alone.
Conclusion: CPs with the highest accuracy to identify TG
patients contained diagnosis codes along with components such as
procedural codes or key text-strings. CPs with high validity are
essential to identifying TG patients when self-reported gender identity
is not available. Still, self-reported gender identity information
should be collected within electronic healthcare data because it is the
gold standard for understanding TG population health patterns.
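
To make the PPV comparison in the abstract concrete, the following is a minimal sketch of how a rule-based CP might be scored against manual chart review. Everything in it is an assumption for illustration: the code list, the key text-strings, the `Patient` record layout, and the four synthetic records are hypothetical and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass

# Illustrative assumptions only: two ICD-10-CM codes from the F64 group and
# two key text-strings. The reviewed studies may use different lists.
DIAGNOSIS_CODES = {"F64.0", "F64.9"}
KEY_TEXT_STRINGS = ("transgender", "gender dysphoria")


@dataclass
class Patient:
    patient_id: str
    codes: set            # diagnosis codes recorded for the patient
    notes: str            # free-text clinical notes
    chart_review_tg: bool  # reference standard from manual chart review


def cp_flags(patient, require_text=False):
    """Return True if the hypothetical CP flags this patient as TG."""
    has_code = bool(patient.codes & DIAGNOSIS_CODES)
    has_text = any(s in patient.notes.lower() for s in KEY_TEXT_STRINGS)
    return (has_code and has_text) if require_text else has_code


def ppv(patients, require_text=False):
    """PPV = true positives / all patients flagged by the CP."""
    flagged = [p for p in patients if cp_flags(p, require_text)]
    if not flagged:
        return None  # PPV is undefined when nothing is flagged
    true_positives = sum(p.chart_review_tg for p in flagged)
    return true_positives / len(flagged)


# Synthetic records, invented purely to exercise the functions above.
patients = [
    Patient("p1", {"F64.0"}, "Patient identifies as transgender.", True),
    Patient("p2", {"F64.9"}, "Routine visit.", False),
    Patient("p3", {"F64.0"}, "History of gender dysphoria.", True),
    Patient("p4", {"E11.9"}, "Diabetes follow-up.", False),
]

ppv_codes_only = ppv(patients)
ppv_codes_and_text = ppv(patients, require_text=True)
```

In this toy data, requiring a key text-string in addition to a diagnosis code removes the one false positive, which mirrors the direction of the finding reported above; the actual magnitudes in real data depend on the database and the code and string lists chosen.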