Discussion
The limitations of reporting subgroup analyses in RCTs have been widely
described in the literature. Inflated false-positive rates due to
multiple testing, high false-negative rates due to inadequate
statistical power and the lack of appropriate a priori specification are
well-known limitations of subgroup analysis [2,7-8,22-24]. A
prespecified subgroup analysis is one that is planned and documented
before any examination of the data. Prespecified analyses are more
reliable than those that are not prespecified because their hypotheses
are based on biological rationale or on data obtained in previous
studies. In this review, only half of the trials conducted prespecified
subgroup analyses. When a large number of subgroups are analysed, even
if a hypothesis has been clearly specified, the results should be
interpreted cautiously, since the strength of inference associated with
the apparent confirmation of any single hypothesis decreases if it is
one of a large number that have been tested [25]. In this systematic
review, multiple subgroup analyses were performed: around three quarters
of the trials reported at least 6 subgroups. A statistical test of
interaction assesses the difference in benefit between subgroups by
calculating an interaction p-value; a low p-value suggests that chance
is an unlikely explanation for the apparent differences. The interaction
test is therefore the appropriate method to analyse subgroups. In this
review, only a few trials (18.37%) used an interaction test to assess
heterogeneity of the treatment effect.
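The effect of multiplicity on false positives can be illustrated with a
short calculation. The sketch below is illustrative only and is not part
of the review's methods: it assumes that every subgroup test is
independent and is run at a nominal significance level of 0.05, which is
a simplification, since subgroup analyses within the same trial are
usually correlated. The function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests tests.

    Assumes the tests are independent and that every null hypothesis
    is true (an illustrative simplification).
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # 6 is the number of subgroups reported by about three quarters of
    # the trials in this review; the other values are for comparison.
    for k in (1, 6, 10, 20):
        print(f"{k:2d} tests -> {familywise_error_rate(k):.3f}")
```

Under these assumptions, six subgroup tests already carry a chance of
roughly one in four of producing at least one spurious significant
result.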
Owing to important methodological problems and biases, the
interpretation of subgroup analyses can lead to erroneous conclusions
and, in turn, to inappropriate clinical decision making. Several tools
have been developed to assess the credibility of subgroup effects
reported in clinical trials [12-17]. Our assessment was based on the
“10 criteria to assess credibility of subgroup claims” by Sun et al
2012 [17]. The credibility of subgroup claims in phase III haematology
RCTs was low. Of the 44 claims of a subgroup effect for the primary
outcome identified, 26 were strong claims; only 24% (n = 6) of these
claims satisfied at least half of the credibility criteria, and none
satisfied all of them. Multiple significant interactions was the only
criterion satisfied by more than 50% of the claims. All 24 assessed
studies failed to prespecify the correct direction of the subgroup
hypotheses, and the hypothesis was prespecified for only 11 (25%)
claims.
Sun et al 2012 [17] considered three of their 10 criteria as critical:
the use of subgroup variables measured at baseline, the prespecification
of the subgroup hypothesis and the statistical significance of the
interaction test. In our study, the first of these criteria was met in
most cases (86.36%); however, the other two criteria were met in only
25.2% and 40.91%, respectively. As stated above, the interaction test is
the appropriate method to analyse subgroups, but only 40% of the strong
claims in this review were based on this test. This finding suggests
that most authors are unaware of how to interpret a subgroup analysis
correctly and make statements based on intragroup comparisons instead of
intergroup comparisons. The latter determine whether the results differ
between subgroups, and this comparison is made with the interaction
test. The lack of compliance with the previously cited criteria among
the claims in the haematology RCTs demonstrates their limited
credibility.
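The difference between intragroup and intergroup comparisons can be made
concrete with a simple test of interaction between two subgroups. The
sketch below is illustrative and is not taken from any of the reviewed
trials: it assumes that each subgroup provides an independent log hazard
ratio with a standard error, it uses a normal approximation, and the
numerical values and function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def interaction_test(log_hr_a: float, se_a: float,
                     log_hr_b: float, se_b: float) -> tuple[float, float]:
    """Compare two independent subgroup estimates on the log scale.

    Returns the z statistic and two-sided p-value for the difference
    between the two log hazard ratios (the intergroup comparison).
    """
    diff = log_hr_a - log_hr_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    return z, two_sided_p(z)


# Hypothetical subgroup results: hazard ratio and SE of the log hazard ratio.
log_hr_a, se_a = math.log(0.60), 0.20   # subgroup A
log_hr_b, se_b = math.log(0.85), 0.25   # subgroup B

# Intragroup comparisons: treatment effect tested within each subgroup.
p_within_a = two_sided_p(log_hr_a / se_a)
p_within_b = two_sided_p(log_hr_b / se_b)

# Intergroup comparison: does the treatment effect differ between subgroups?
z_int, p_interaction = interaction_test(log_hr_a, se_a, log_hr_b, se_b)

if __name__ == "__main__":
    print(f"p within A:    {p_within_a:.3f}")
    print(f"p within B:    {p_within_b:.3f}")
    print(f"p interaction: {p_interaction:.3f}")
```

With these hypothetical values, the treatment effect is significant at
the 0.05 level in subgroup A and not in subgroup B, yet the interaction
test does not show a significant difference between the two subgroups. A
claim of a subgroup effect based only on the intragroup results would
therefore not be supported.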
Similar results have been reported in other areas of study. Zhang et al
2015 [26] reported low credibility of subgroup claims in phase III RCTs
of solid tumours, using the CONSORT statement to evaluate subgroup
claims [27]. They found that the most common problems in the reporting
of subgroup analyses were the large number of subgroups reported,
frequently without prespecification, and the underuse of the interaction
test. Sun et al 2012 [17] reported low credibility of subgroup claims in
pharmacological RCTs published in 2007. Most of these trials failed to
prespecify the hypotheses or to present significant interaction tests.
Two recent reviews investigated the quality of subgroup analyses in low
back pain management trials [28-29] and reported the failure to specify
the subgroup hypotheses a priori as a common problem in trials, which is
also consistent with our findings. Vidic et al 2016 [10] reviewed phase
III cardiovascular RCTs with subgroup analyses and concluded that these
analyses were reported with several shortcomings, including lack of
prespecification and testing of a large number of subgroups without the
use of the statistically appropriate test for interaction. All these
studies reported the failure to prespecify subgroup hypotheses, the
large number of subgroup analyses conducted and the underuse of the
interaction test as common problems in trials, which is consistent with
our findings.
In contrast to other studies, the number of claims of subgroup effect in
this review was low. Zhang et al 2015 [26], Sun et al 2012 [17],
Saragiotto et al [29] and Vidic et al 2016 [10] reported that 54.26%,
40.10%, 57.57% and 53.84% of the trials assessed made claims of subgroup
effect, respectively. The number of subgroup claims identified in
haematological trials was half of that reported in other areas.
This study has several strengths. It is the first systematic review of
the credibility of subgroup analyses reported in RCTs of haematological
malignancies. A rigorous systematic review method was employed, and
standardized criteria were used to assess the credibility of subgroup
claims [17].
This study also has several limitations. It is based on trial
information reported by the authors in published articles, which may be
vulnerable to selective reporting or underreporting. Our study was
limited to phase III RCTs, although the Sun et al 2012 [17] criteria
could be applied to clinical trials of all phases. The low number of
subgroup claims identified is also a limitation of this study.