Introduction
Legal rules and regulations keep healthcare data secure and prevent
violations of patient privacy.1 While necessary, these
precautions make it difficult to obtain permission to combine data from
multiple sources, increase the time required to conduct straightforward
analyses and make more complex analyses impossible.2-4 As a result, data on population-level drug safety and effectiveness
generally come from a smattering of single-database studies with limited
precision and in-sample diversity, each applying differing analytic approaches
and statistical analyses.5
As healthcare data were digitized and information technology advanced,
an alternative approach was proposed: analyses using distributed data.
In 2008, the FDA launched the Sentinel Initiative to explore a system
where database custodians, called “partners,” maintained ownership of
their data as separate “nodes” of the network but transformed them into
a common data model to be analyzed in a consistent
way.6 A similar effort started in Canada with the
Canadian Network for Observational Drug Effect Studies
(CNODES),7, 8 formally funded in 2011, and the
Patient-Centered Outcomes Research Institute (PCORI) began to design its
own distributed network of partner organizations, PCORnet (the National
Patient Centered Clinical Research Network), in 2013.9 All three networks focus on generating one “network-wide” effect
estimate in some fashion from the node-level data. Other distributed
networks include the Data Analysis and Real World Interrogation Network
(DARWIN-EU) project in Europe;10 a network that
leverages the infrastructure built by the Observational Health Data
Sciences and Informatics (OHDSI) community;11 the
Asian Pharmacoepidemiology Network (AsPEN);12 the
Vaccine Safety Datalink (VSD);13 and a distributed
network created for pregnancy research,
ConcePTION.14
Much has already been written about the steps these and other networks
take to reduce confounding and information bias15-18 in analyses within the individual nodes; after all, internal validity
within nodes is necessary to generate unbiased estimates in
nonexperimental research.19 Concepts related to
external validity – such as effect measure modification, target
populations, generalizability, and transportability – have received
comparatively less attention in methodologic work on distributed data.
Here, we describe the unique roles external validity and related
concepts play in analyses of distributed data networks, especially those
that seek to obtain a single “network-wide” effect estimate. We then
provide an overview of the structure of Sentinel, CNODES, and PCORnet
and describe how each network deals with these concepts.