Discussion
The limitations of subgroup analyses reported in RCTs have been widely described in the literature. Inflated false-positive rates due to multiple testing, high false-negative rates due to inadequate statistical power, and the lack of appropriate a priori specification are well-known limitations of subgroup analysis [2,7-8,22-24]. A prespecified subgroup analysis is one that is planned and documented before any examination of the data. Such analyses are more reliable than those that are not prespecified because their hypotheses are based on biological rationale or on data obtained in previous studies. In this review, only half of the trials conducted prespecified subgroup analyses. When a large number of subgroups are analysed, the results should be interpreted cautiously even if a hypothesis has been clearly specified, since the strength of inference associated with the apparent confirmation of any single hypothesis decreases if it is one of a large number that have been tested [25]. In this systematic review, multiple subgroup analyses were performed: around three quarters of the trials reported at least 6 subgroups. A statistical test of interaction assesses whether the treatment benefit differs between subgroups by calculating an interaction p value; a low p value suggests that chance is an unlikely explanation for the apparent differences. The interaction test is therefore the appropriate method for analysing subgroups. In this review, only a few trials (18.37%) used an interaction test to assess heterogeneity of the treatment effect.
Owing to important methodological problems and bias, the interpretation of subgroup analyses can lead to erroneous conclusions and, in turn, to incorrect clinical decisions. Several tools have been developed to assess the credibility of subgroup effects reported in clinical trials [12-17]. Our assessment was based on the "10 criteria to assess credibility of subgroup claims" by Sun et al. 2012 [17]. The credibility of subgroup claims in phase III haematology RCTs was low. Of the 44 claims of a subgroup effect for the primary outcome identified, 26 were strong claims; only 24% (n = 6) of these satisfied at least half of the credibility criteria, and none satisfied all of them. The criterion concerning multiple significant interactions was the only one satisfied by more than 50% of the claims. All 24 assessed studies failed to prespecify the correct direction of the subgroup hypotheses, and the hypothesis was prespecified for only 11 (25%) claims.
Sun et al. 2012 [17] considered three of their 10 criteria to be critical: the use of subgroup variables measured at baseline, prespecification of the subgroup hypothesis, and statistical significance of the interaction test. In our study, the first of these criteria was met by most claims (86.36%); however, the other two were met by only 25.2% and 40.91%, respectively. As stated above, the interaction test is the appropriate method for analysing subgroups, but only 40% of the strong claims in this review were based on this test. This finding suggests that most authors are unaware of how to interpret a subgroup analysis correctly and make statements based on comparisons within subgroups rather than between subgroups. Only the latter provides evidence that the results differ between subgroups, and this comparison is made with the interaction test. The poor compliance with these criteria among the claims made in haematology RCTs demonstrates their limited credibility.
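The distinction between within-subgroup and between-subgroup comparisons can be illustrated with a minimal sketch. It assumes two independent subgroups, each summarised by a hazard ratio and its 95% confidence interval, and uses a simple z-test on the difference of the log hazard ratios. The function names and all numerical values are hypothetical and are not taken from any trial in this review; trials usually obtain the interaction p value from a regression model that includes a treatment-by-subgroup term.

```python
import math


def se_from_ci(lower, upper):
    """Approximate the standard error of a log hazard ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


def interaction_test(hr_a, se_a, hr_b, se_b):
    """z-test for the difference between two independent subgroup estimates.

    hr_a and hr_b are hazard ratios; se_a and se_b are the standard errors
    of their logarithms. Returns (z, two-sided p value).
    """
    diff = math.log(hr_a) - math.log(hr_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical example: the effect looks "significant" in subgroup A
# (95% CI excludes 1) but not in subgroup B (95% CI includes 1).
se_a = se_from_ci(0.40, 0.90)  # subgroup A: HR 0.60 (0.40 to 0.90)
se_b = se_from_ci(0.60, 1.20)  # subgroup B: HR 0.85 (0.60 to 1.20)
z, p = interaction_test(0.60, se_a, 0.85, se_b)
print(f"z = {z:.2f}, interaction p = {p:.2f}")
```

With these hypothetical values the interaction p value is about 0.2, so the data do not support a difference in treatment effect between the two subgroups, even though only one of the two within-subgroup confidence intervals excludes 1.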
Similar results have been reported in other areas of study. Zhang et al. 2015 [26] reported low credibility of subgroup claims in phase III RCTs of solid tumours, using the CONSORT statement to evaluate subgroup claims [27]. The most common problems they found in the reporting of subgroup analyses were the large number of subgroups reported, which were frequently not prespecified, and the underuse of the interaction test. Sun et al. 2012 [17] reported low credibility of subgroup claims in pharmacological RCTs published in 2007. Most of these trials failed to prespecify the hypotheses or to present significant interaction tests. Two recent reviews investigated the quality of subgroup analyses in trials of low back pain management [28-29] and reported failure to specify the subgroup hypotheses a priori as a common problem. Vidic et al. 2016 [10] reviewed phase III cardiovascular RCTs with subgroup analyses and concluded that these analyses were reported with several shortcomings, including lack of prespecification and testing of a large number of subgroups without the statistically appropriate test for interaction. All of these studies identified failure to prespecify the subgroup hypotheses, a large number of subgroup analyses and underuse of the interaction test as common problems, which is consistent with our findings.
In contrast to other studies, the number of claims of subgroup effect in this review was low. Zhang et al. 2015 [26], Sun et al. 2012 [17], Saragiotto et al. [29] and Vidic et al. 2016 [10] reported that 54.26%, 40.10%, 57.57% and 53.84% of the trials assessed, respectively, made claims of subgroup effect. The number of subgroup claims identified in haematological trials was half of that reported in other areas.
This study has several strengths. It is the first systematic review of the credibility of subgroup analyses reported in RCTs of haematological malignancies. A rigorous systematic review method was employed, and standardized criteria were used to assess the credibility of subgroup claims [17].
This study also has several limitations. It is based on the trial information that authors reported in published articles, which may be vulnerable to selective reporting or underreporting. Our study was limited to phase III RCTs, although the Sun et al. 2012 [17] criteria could be applied to clinical trials of any phase. The low number of subgroup claims identified is also a limitation of this study.