Statistical Controversies in Reporting of Clinical Trials

In a recent issue of the Journal of the American College of Cardiology, Professor Stuart Pocock and colleagues discussed controversies in reporting clinical trial results. Prof. Pocock, a well-known biostatistician from the Department of Medical Statistics at the London School of Hygiene and Tropical Medicine, is a widely recognized expert in the field. In the article, the authors list multiplicity of data, composite endpoints, covariate adjustment, subgroup analysis, assessment of individual results, analysis of the intention-to-treat (ITT) population and interpretation of surprises as the issues that are often a real pain in the neck for statisticians.

One of the most important statistical issues in reporting of clinical trials is multiplicity of data, i.e. “repeated looks at a data set in different ways, until something statistically significant emerges” [5]. In other words, it is hard to make a valid selection, from the numerous variables collected at baseline and during follow-up, of the data that should be included in major trial publications, so that the report gives a fair account of the trial.
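The cost of repeated looks can be put into numbers with a short sketch. It is an illustration rather than anything taken from the paper, and it rests on two simplifying assumptions: the tests are independent, and every null hypothesis is true. The function names are made up for this example.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent tests,
    each run at level alpha, when every null hypothesis is true (assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, overall_alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall error rate at or below
    overall_alpha (Bonferroni correction)."""
    return overall_alpha / n_tests


# Show how quickly the overall false-positive risk grows with the number of looks.
for k in (1, 5, 10, 20):
    print(
        f"{k:2d} tests: overall false-positive risk = {familywise_error_rate(k):.3f}, "
        f"Bonferroni per-test threshold = {bonferroni_threshold(k):.4f}"
    )
```

Under these assumptions, ten looks at the data already give roughly a 40% chance of at least one spuriously "significant" result, which is why the number of analyses should be fixed in advance.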

According to the authors, the best way to avoid this problem is to have a predefined statistical analysis plan (SAP) that is fully signed off before database lock and study unblinding. The SAP is also a regulatory requirement, which must not be overlooked. Furthermore, it is critical to pre-define the primary endpoint, with particular focus on the definition of the endpoint itself, the time of follow-up and the precise statistical method for determining its point estimate, confidence interval (CI) and p value.

It is also good practice to have a pre-defined and limited set of secondary endpoints for treatment efficacy, whose results are shown alongside those of the primary endpoint. When the primary endpoint findings are inconclusive, claims of efficacy for any secondary endpoint are more doubtful, as in the PROactive (PROspective pioglitazone Clinical Trial In macroVascular Events) trial [2]. Dormandy et al. put the emphasis on the main secondary endpoint, with a hazard ratio (HR) of 0.84 (95% CI: 0.72 to 0.98; p = 0.027), and ignored the lack of statistical significance for the primary endpoint (HR 0.90; 95% CI: 0.80 to 1.02; p = 0.095). From the regulatory point of view such practice is more than controversial, but as Prof. Pocock says, “regulators need to recognize the statistical uncertainties” and interpret the results taking them into consideration.
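The link between an HR, its confidence interval and its p value can be checked with a rough back-calculation. The sketch below assumes a normal approximation on the log-HR scale and works only from the rounded figures quoted above, so it will not reproduce the published p values exactly. The helper name is hypothetical.

```python
import math


def p_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Approximate z statistic and two-sided p value from an HR and its 95% CI.

    Assumes log(HR) is roughly normal, so the CI is log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Figures quoted in the text for the PROactive trial.
for label, hr, lo, hi in [
    ("main secondary endpoint", 0.84, 0.72, 0.98),
    ("primary endpoint", 0.90, 0.80, 1.02),
]:
    z, p = p_from_hr_ci(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, approximate p = {p:.3f}")
```

The secondary endpoint comes out at about p = 0.027, in line with the published figure, while the primary endpoint comes out near 0.09 instead of 0.095. A gap of that size is expected when working from rounded inputs, and in both cases the conclusion is the same: the interval for the primary endpoint includes 1, the interval for the secondary endpoint does not.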

Composite endpoints are the result of combining two or more outcomes into a single primary endpoint. However, such a combination carries a risk of oversimplifying the evidence by highlighting the composite without properly assessing the contribution of each outcome separately. An example is the SYNTAX (Synergy between Percutaneous Coronary Intervention [PCI] with Taxus and Cardiac Surgery) trial of bypass surgery (CABG) versus the TAXUS drug-eluting stent (DES) [3, 4], whose composite primary endpoint comprised, among other outcomes, stroke and repeat revascularization. More events occurred after DES, which suggests that DES is inferior to CABG. However, the main difference was in repeat revascularization (mostly repeat PCIs). At the same time, there was a significant excess of strokes after CABG, even though there was no overall difference in the composite endpoint.

The next challenge in reporting clinical trial data is deciding whether key results should be adjusted for baseline covariates and, if so, for which ones. In randomised trials this issue largely takes care of itself, as randomisation ensures good balance across treatments for baseline variables, and hence covariate adjustment usually makes little difference. Nevertheless, to avoid controversies, it is good practice to set up an appropriate covariate analysis in the SAP. To achieve this, one should first specify a limited number of covariates known (on the basis of prior knowledge) to have a substantial impact on patient prognosis. Then, the SAP should clearly state the covariate-adjusted model that is to be fitted. Furthermore, one should avoid post-hoc variable selection, as such choices may be used to enhance the apparent effect of the treatment. Finally, the covariate-adjusted analysis can be considered the primary analysis if the choice of covariates is a generally accepted convention for a specific endpoint.

In many trials, the recruited patients do not form a homogeneous group. Hence, it may be important to check whether the effects of the treatment apply to the entire study population or depend on a particular baseline characteristic, e.g. age. Interpreting the results of subgroup analyses is nevertheless difficult. Trials usually lack the power to detect subgroup effects reliably. Moreover, there may be too many subgroups to control for. Every additional subgroup analysis increases the risk of a spurious significant finding, because each extra test inflates the overall chance of a false positive. Without going into details: when several subgroup analyses are performed, the 5% significance level holds only comparison-wise, and keeping the overall (study-wise) significance level at 5% requires each individual comparison to be tested at a level far below 5% before a subgroup claim can be made. Rather than relying on subgroup-specific p values, it is better to use statistical tests of interaction, which examine the extent to which the observed differences across subgroups, for example in HRs, may be attributed to chance.
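A simple form of interaction test compares the log HRs of two subgroups directly. The sketch below assumes two non-overlapping subgroups, a normal approximation on the log scale, and standard errors recovered from the reported 95% CIs. The subgroup figures are invented for illustration and do not come from any trial mentioned here.

```python
import math


def log_hr_se(lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Standard error of log(HR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """z test for a difference in log HR between two independent subgroups."""
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.sqrt(log_hr_se(*ci_a) ** 2 + log_hr_se(*ci_b) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p


# Hypothetical subgroups: one looks "significant" on its own, the other does not.
z, p = interaction_test(0.70, (0.50, 0.98), 0.95, (0.70, 1.29))
print(f"interaction z = {z:.2f}, p = {p:.3f}")
```

In this invented example the first subgroup's interval excludes 1 and the second's does not, yet the interaction test gives no convincing evidence that the treatment effect differs between them. That is the usual argument for reporting the interaction test rather than separate subgroup p values.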

Even when there are no efficacy and safety differences between subgroups, there may be important differences between individuals. Therefore, one needs to determine each individual's risks and benefits in order to check whether the studied treatment is effective and safe in each case. A good way to do this is to use multivariable logistic models to predict each patient's risks separately. The aim is to estimate, on the absolute scale, how the trade-off between treatment differences in particular outcomes is patient specific.
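To show what "on the absolute scale" means, the following sketch applies a logistic model with invented coefficients to two patients. None of the numbers come from the paper or from a fitted model; the intercept, the covariates and the treatment odds ratio are assumptions chosen purely for illustration.

```python
import math

# Hypothetical coefficients, for illustration only (not from any fitted model).
INTERCEPT = -3.0
COEFFICIENTS = {"decades_over_60": 0.4, "diabetes": 0.6}
LOG_OR_TREATMENT = math.log(0.80)  # assumed common treatment odds ratio of 0.80


def predicted_risk(patient: dict, treated: bool) -> float:
    """Predicted event probability from the logistic model."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)
    if treated:
        linear += LOG_OR_TREATMENT
    return 1.0 / (1.0 + math.exp(-linear))


def absolute_risk_reduction(patient: dict) -> float:
    """Difference in predicted risk without and with treatment."""
    return predicted_risk(patient, treated=False) - predicted_risk(patient, treated=True)


low_risk = {"decades_over_60": 0, "diabetes": 0}
high_risk = {"decades_over_60": 2, "diabetes": 1}
for label, patient in (("low-risk", low_risk), ("high-risk", high_risk)):
    print(
        f"{label}: untreated risk = {predicted_risk(patient, False):.3f}, "
        f"absolute risk reduction = {absolute_risk_reduction(patient):.3f}"
    )
```

Both patients receive the same relative treatment effect, but the higher-risk patient gains a larger absolute reduction. If the treatment also carried a fixed absolute harm, the balance of benefit and harm could therefore differ from one patient to the next.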

One can also have doubts about how to deal with missing data and non-adherence during follow-up. It is almost impossible to have full follow-up data for every patient, because some patients withdraw from the study or are lost to follow-up. The more patients are lost to follow-up, the further the analysis deviates from true ITT; therefore, loss to follow-up should be minimized. The easiest ways to manage this are, first, to improve treatment compliance and, second, to continue the follow-up of a subject even after he or she drops out of the study.

In most time-to-event analyses, the number of patients actually under observation varies during follow-up. However, if a patient withdraws at an early stage of the study, the withdrawal cannot be assumed to occur at random (e.g. the patient is likely to be at higher risk of a primary event, which then goes unrecorded). Consequently, a relatively high percentage of early drop-outs might bias the estimation of the endpoints, and neglecting this fact could lead to a biased treatment comparison.
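The effect of non-random drop-out can be seen in a toy Kaplan-Meier calculation. The ten-patient data set below is invented for illustration, and the estimator is a minimal sketch rather than a replacement for a validated survival analysis package.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the primary event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed_events = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            observed_events += data[i][1]
            leaving += 1
            i += 1
        if observed_events:
            survival *= 1.0 - observed_events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cohort with complete follow-up: four events among ten patients.
full_times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10]
full_events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Same cohort, but the two patients who would have had events at times 4 and 5
# withdraw at time 1, so their events are never recorded.
dropout_times = [2, 3, 1, 1, 6, 7, 8, 9, 10, 10]
dropout_events = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print("complete follow-up:", kaplan_meier(full_times, full_events)[-1])
print("early drop-out:    ", kaplan_meier(dropout_times, dropout_events)[-1])
```

With complete follow-up the final survival estimate in this toy cohort is 0.60. When the two high-risk patients withdraw early it rises to 0.75, so the estimate looks more favourable than the truth. If such drop-out is more common in one treatment arm, the comparison between arms is biased in the same way.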

Last but not least is the interpretation of unexpected findings, which might relate to endpoints, subgroups or treatment effects. Sometimes a finding is inconsistent with the overall treatment effect, or shows an exaggerated effect that exceeds prior expectations. Small studies are more likely to be subject to this type of bias.

As Prof. Pocock concludes: “Nevertheless, controversies will continue to arise”, and he hopes that his paper “has provided a statistical insight that will help trialists to present and readers to acquire a balanced perspective.”


  1. Statistical Controversies in Reporting of Clinical Trials
  2. Dormandy JA, Charbonnel B, Eckland DJA, et al., for the PROactive Investigators. Secondary prevention of macrovascular events in patients with type 2 diabetes in the PROactive Study (PROspective pioglitazone Clinical Trial In macroVascular Events): a randomised controlled trial. Lancet 2005; 366:1279–89.
  3. Serruys PW, Morice MC, Kappetein AP, et al., for the SYNTAX Investigators. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. N Engl J Med 2009; 360:961–72.
  4. Mohr FW, Morice MC, Kappetein AP, et al. Coronary artery bypass graft surgery versus percutaneous coronary intervention in patients with three-vessel disease and left main coronary disease: 5-year follow-up of the randomised, clinical SYNTAX trial. Lancet 2013; 381:629–38.