Clinical research mostly involves observing a population or individual patients for clinical events. But medicine is not mathematics and there are innumerable unknown factors at play. This is why we use statistical methods to find meaning from our observations.

As an example, suppose we want to know whether a drug can cure a disease. If we observe the effect of the drug, we will see that some patients who receive it are cured, whereas others who receive it are not. On the other hand, some people will be cured even without receiving the drug. In this case we need statistics to figure out whether the drug has any *significant* benefit.

## Two approaches

There are two approaches to assessing statistical significance:

- Hypothesis testing (p-value)
- Quantification of effect (confidence interval)

Both have their advantages and disadvantages, which we will discuss at the end. Let's understand their meanings first.

## P-value

The p-value comes from testing a *null hypothesis*. The null hypothesis is the assumption that the apparent effect has been seen by chance and is not really there. The p-value is the probability of observing a result at least as extreme as the one we actually saw, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true. The name *p* comes from probability.

### Example

Suppose we are comparing drug *A* and drug *B* in a particular disease. Our objective is to find any difference between the effect of the two drugs. Let's say 50 patients received drug *A* and 50 patients received drug *B*. Among the recipients of *A*, 26 people were cured and among the recipients of *B*, 20 people were cured. At first glance it would appear that drug *A* performed better, but this result may also appear by chance alone and we need to use statistical analysis to be sure.

In this case our null hypothesis will be, *there is no difference in cure rate between drug A and drug B*. We will test this hypothesis using the chi-squared test in this particular example.

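For these counts the chi-squared test gives a p-value of about 0.32 (with Yates' continuity correction, which R's `chisq.test` applies to 2×2 tables by default). This means that if the two drugs were truly equally effective, a difference at least as large as the one we observed would still turn up by chance in roughly 32% of such studies.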
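The test can be recomputed from the counts in the example. The sketch below is a minimal standard-library version, not the code used for the article; the function name `chi_squared_2x2` is made up for illustration. It reports the result with and without Yates' continuity correction, since the two variants give noticeably different p-values on a table this small.

```python
import math

def chi_squared_2x2(a, b, c, d, yates=True):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the upper
    tail probability of the chi-squared distribution is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, never pushing the difference below zero
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))

# Rows: drug A, drug B. Columns: cured, not cured.
stat_c, p_c = chi_squared_2x2(26, 24, 20, 30, yates=True)
stat_u, p_u = chi_squared_2x2(26, 24, 20, 30, yates=False)
print(f"with correction:    chi2 = {stat_c:.3f}, p = {p_c:.3f}")
print(f"without correction: chi2 = {stat_u:.3f}, p = {p_u:.3f}")
```

Either way the p-value is well above 0.05, so the choice of variant does not change the conclusion for this table.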

### Accepting or rejecting the null hypothesis

When the p-value is large, i.e. the observed result would be unsurprising even if the null hypothesis were true, we cannot reject the null hypothesis and accept the apparent effect as real. This raises the question of how small the p-value has to be before we reject the null hypothesis.

The cut-off mark for the p-value is entirely arbitrary. In almost all medical literature a p-value of 0.05 is taken as the cut-off, which means accepting a 5% chance of declaring an effect when the null hypothesis is in fact true.

This means that if a study reports a positive finding with a p-value of 0.04, a result at least this extreme would still occur in 4% of studies where there is no real effect. A significant result can therefore still be a false positive, and our conclusion can still be wrong.
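What the 0.05 cut-off buys us can be illustrated by simulation. The sketch below invents many studies in which both drugs have exactly the same cure rate, so the null hypothesis is true by construction, and counts how often the chi-squared test still comes out "significant". The cure rate of 0.4, the number of studies, the seed, and the function names are all assumptions chosen for illustration; the test is the uncorrected chi-squared test.

```python
import math
import random

def p_value_2x2(a, b, c, d):
    """Uncorrected chi-squared p-value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin gives no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2))

def false_positive_rate(n_studies=2000, n_per_arm=50, cure_rate=0.4,
                        alpha=0.05, seed=1):
    """Share of simulated no-effect studies whose p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both arms are drawn with the SAME cure rate: the null hypothesis holds.
        cured_a = sum(rng.random() < cure_rate for _ in range(n_per_arm))
        cured_b = sum(rng.random() < cure_rate for _ in range(n_per_arm))
        p = p_value_2x2(cured_a, n_per_arm - cured_a,
                        cured_b, n_per_arm - cured_b)
        if p < alpha:
            hits += 1
    return hits / n_studies

rate = false_positive_rate()
print(f"share of no-effect studies with p < 0.05: {rate:.3f}")
```

The share should come out close to 5%: even when there is nothing to find, roughly one study in twenty crosses the threshold.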

## Confidence interval

A confidence interval also gives us information about statistical significance, but in a slightly different manner.

Instead of testing a null hypothesis, we find some way to quantify the effect, such as the risk ratio when comparing two drugs.

### Example

We will take the above example, where 26 out of 50 people were cured with drug *A* and 20 out of 50 people were cured with drug *B*. This time, instead of testing a null hypothesis, we will measure the effect in terms of the risk ratio.

The risk ratio for this observation is 1.3 (0.52 / 0.40), which means that the cure rate with drug *A* appears to be 30% higher than with drug *B*. The calculation also gives a *95% confidence interval*, here about 0.84 to 2.00 using the usual normal approximation on the log scale. This is a range produced by a procedure that, over many repeated studies, captures the actual value of the whole population 95% of the time. For the risk ratio, the value 1 means both drugs are equal, and as the 95% confidence interval spans both sides of this mark, the result is not statistically significant.
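The risk ratio and its interval can be recomputed from the counts in the example. The sketch below assumes the common Wald method on the log scale; the function name `risk_ratio_ci` is made up for illustration, and other software may use a different interval method and give slightly different limits.

```python
import math

def risk_ratio_ci(cured_a, n_a, cured_b, n_b, z=1.959964):
    """Risk ratio of A vs B with a Wald confidence interval on the log scale.

    z = 1.959964 is the normal quantile for a 95% interval.
    Assumes both cured counts are greater than zero.
    """
    rr = (cured_a / n_a) / (cured_b / n_b)
    # Standard error of log(RR) for two independent binomial samples
    se_log = math.sqrt(1 / cured_a - 1 / n_a + 1 / cured_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

rr, lower, upper = risk_ratio_ci(26, 50, 20, 50)
print(f"risk ratio = {rr:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
print("interval includes 1:", lower <= 1 <= upper)
```

The interval is built on the log scale because ratios are asymmetric: 0.5 and 2 are equally far from 1 in relative terms.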

### Reproducibility

The concept of the confidence interval may be explained in terms of reproducibility. If we performed the study on the entire population, whatever result we got would be the final answer. But we are doing the study on a sample from that population, and our finding is bound to differ from the real value in the whole population. So, if we performed the same study multiple times with a different sample each time, we would get a different result every time. However, if our sampling methods are good enough, all these findings will be quite close to one another.

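This is the basis of the confidence interval. If we repeated the study 100 times and calculated a 95% confidence interval each time, about 95 of those 100 intervals would contain the true value of the population.

In everyday terms this is often summarised as being 95% confident that the true value of the population lies within the 95% confidence interval of our study result. Strictly speaking, the 95% describes how often the method succeeds over many repetitions, not the probability for any single interval.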
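The repeated-study idea can be checked by simulation. The sketch below assumes true cure rates of 0.52 and 0.40 (values picked for illustration, not estimates from real data), draws many samples of 50 patients per arm, builds a Wald 95% interval for the risk ratio each time, and counts how often the interval contains the true risk ratio. All names and the seed are hypothetical.

```python
import math
import random

def risk_ratio_ci(cured_a, n_a, cured_b, n_b, z=1.959964):
    """Risk ratio with a Wald 95% confidence interval on the log scale."""
    rr = (cured_a / n_a) / (cured_b / n_b)
    se_log = math.sqrt(1 / cured_a - 1 / n_a + 1 / cured_b - 1 / n_b)
    return (rr,
            math.exp(math.log(rr) - z * se_log),
            math.exp(math.log(rr) + z * se_log))

def coverage(n_studies=2000, n_per_arm=50, rate_a=0.52, rate_b=0.40, seed=1):
    """Share of simulated studies whose interval contains the true risk ratio."""
    rng = random.Random(seed)
    true_rr = rate_a / rate_b
    usable = 0
    covered = 0
    for _ in range(n_studies):
        cured_a = sum(rng.random() < rate_a for _ in range(n_per_arm))
        cured_b = sum(rng.random() < rate_b for _ in range(n_per_arm))
        if cured_a == 0 or cured_b == 0:
            continue  # risk ratio undefined for this sample, skip it
        usable += 1
        _, lower, upper = risk_ratio_ci(cured_a, n_per_arm, cured_b, n_per_arm)
        if lower <= true_rr <= upper:
            covered += 1
    return covered / usable

share = coverage()
print(f"intervals containing the true risk ratio: {share:.1%}")
```

The share should land near 95%. Each individual interval either contains the true value or it does not; the 95% is a property of the procedure.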

The value of 95% is entirely arbitrary, just like the 0.05 cut-off for the p-value, but it has been widely accepted as the standard.

### Statistical significance

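When we measure the effect of an intervention or a risk factor, we use a measurement such as the odds ratio or the risk ratio. Each measurement has an equivalent point: the value that indicates no difference between the groups being compared.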

Measurement | Equivalent point |
---|---|
Odds ratio | 1 |
Risk ratio | 1 |
Risk difference | 0 |

A result is considered significant only if its 95% confidence interval *doesn't include* the equivalent point of the measurement used.
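The rule is simple enough to write down as a check. In the sketch below, the dictionary and function names are made up for illustration, and the intervals passed in are hypothetical values rather than results from any study.

```python
# Equivalent point (value meaning "no difference") for each measurement
EQUIVALENT_POINT = {
    "odds ratio": 1.0,
    "risk ratio": 1.0,
    "risk difference": 0.0,
}

def is_significant(measurement, lower, upper):
    """True if the confidence interval excludes the equivalent point."""
    point = EQUIVALENT_POINT[measurement]
    return not (lower <= point <= upper)

# Hypothetical intervals, for illustration only
print(is_significant("risk ratio", 0.8, 1.9))         # False: interval includes 1
print(is_significant("risk ratio", 1.1, 1.8))         # True: interval excludes 1
print(is_significant("risk difference", -0.05, 0.2))  # False: interval includes 0
```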

## Comparison

P-value | Confidence interval |
---|---|
Comes from null-hypothesis testing | Comes from measurements of some quantity |
Information about statistical significance only | Both statistical significance and quantification of effect |
Comparison between two p-values is meaningless e.g. an effect with a p-value of 0.2 cannot be said to be better than an effect with a p-value of 0.4. | The measurements underlying confidence intervals can be meaningfully compared in most cases e.g. a drug with a risk ratio of 1.4 vs placebo can be considered to be better than a drug with a risk ratio of 1.1 vs placebo. |

So it appears that the confidence interval is the better choice compared with the p-value. However, in some cases quantification of the effect may not be possible, making hypothesis testing and p-value estimation the only option.

## TL;DR

A finding is considered statistically significant if the p-value is below 0.05 or if the 95% confidence interval doesn't include the point of equivalence.

If it's possible to calculate both p-value and confidence interval, it is better to go for the confidence interval as it is more informative.

The calculations mentioned in this article have been performed with *R*. The discussion on statistical significance also involves *type 1* and *type 2* errors, but we will discuss them in another article.
