SMHS SurvivalAnalysis

From SOCR
Revision as of 13:51, 27 August 2014 by Dinov (talk | contribs) (Hazard Ratios)

Scientific Methods for Health Sciences - Survival Analysis

Overview

Survival analysis comprises statistical methods for analyzing longitudinal data on the occurrence of events. Events may include death, injury, onset of illness, recovery from illness (binary variables), or transition above or below a clinically meaningful threshold of a continuous variable (e.g., CD4 counts). Typically, survival analysis accommodates data from randomized clinical trials or cohort study designs. This section presents a general introduction to survival analysis, including terminology and data structure, survival and hazard functions, parametric versus semi-parametric regression techniques, and an introduction to (non-parametric) Kaplan-Meier methods. Code examples are also included showing applications of survival analysis in practical studies.

Motivation

Studies in public health often involve questions like “What proportion of a population will survive past time t?”, “What is the expected rate of death among study participants, or in the population?”, and “How do particular circumstances increase or decrease the probability of survival?” To answer such questions, we need to define what is meant by lifetime and to model time-to-event data. In these settings, death or failure is treated as an event, and survival analysis models the length of time until one or more events happen.

Theory

Hazard Ratios

Hazard is the slope of the survival curve, a measure of how rapidly critical events occur (e.g., how rapidly subjects are dying).

  • The hazard ratio compares two treatments. If the hazard ratio is 2.0, then the rate of deaths in one treatment group is twice the rate in the other group.
  • The hazard ratio is not computed at any one time point, but is computed from all the data in the survival curve.
  • Since there is only one hazard ratio reported, it can only be interpreted if you assume that the population hazard ratio is consistent over time, and that any differences are due to random sampling. This is called the assumption of proportional hazards.
  • If the hazard ratio is not consistent over time, the value reported for the hazard ratio may not be useful. If two survival curves cross, the hazard ratios are certainly not consistent (unless they cross at late time points, when there are few subjects still being followed so there is a lot of uncertainty in the true position of the survival curves).
  • The hazard ratio is not directly related to the ratio of median survival times. A hazard ratio of 2.0 does not mean that the median survival time is doubled (or halved). A hazard ratio of 2.0 means a patient in one treatment group who has not died (or progressed, or whatever end point is tracked) at a certain time point has twice the probability of having died (or progressed...) by the next time point compared to a patient in the other treatment group.
  • Hazard ratios, and their confidence intervals, may be computed using two methods, each reporting both the hazard ratio and its reciprocal. If people in group A die at twice the rate of people in group B (HR=2.0), then people in group B die at half the rate of people in group A (HR=0.5).
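The point that a hazard ratio is not directly tied to the ratio of median survival times can be illustrated with a small sketch. It assumes, purely for illustration, that survival times follow a Weibull model with $S(t) = e^{-\lambda t^k}$, so that two groups sharing the same shape $k$ have a constant hazard ratio equal to the ratio of their $\lambda$ values. The function name and the numeric values are hypothetical and are not part of the original text.

```python
import math

def weibull_median(lam: float, k: float) -> float:
    """Median survival time when S(t) = exp(-lam * t**k) (assumed model)."""
    return (math.log(2.0) / lam) ** (1.0 / k)

hazard_ratio = 2.0    # group A hazard is twice group B hazard at every time
baseline_rate = 0.1   # hypothetical lam for group B

# Same hazard ratio, different shapes: the ratio of medians changes with k.
for shape in (0.5, 1.0, 2.0):
    median_b = weibull_median(baseline_rate, shape)
    median_a = weibull_median(hazard_ratio * baseline_rate, shape)
    print(f"shape={shape}: median ratio A/B = {median_a / median_b:.3f}")
```

Under this assumed model the ratio of medians is $HR^{-1/k}$, so a hazard ratio of 2.0 halves the median only in the exponential case ($k = 1$).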

The LogRank and Mantel-Haenszel methods

The logrank and Mantel-Haenszel methods usually give nearly identical results, except when several subjects die at the same time or when the hazard ratio is far from 1.0.

The Mantel-Haenszel method reports hazard ratios that are further from 1.0 (so the reported hazard ratio is too large when the hazard ratio is greater than 1.0, and too small when the hazard ratio is less than 1.0):
(1) Compute the total variance, V;
(2) Compute $K = \frac{O_1 - E_1}{V}$, where $O_1$ is the total observed number of events in group 1 and $E_1$ is the total expected number of events in group 1. Using the other group gives a value of the same magnitude with the opposite sign, and hence the reciprocal hazard ratio;
(3) The hazard ratio equals $e^K$;
(4) The 95% confidence interval of the hazard ratio is: ($e^{K - \frac{1.96}{\sqrt{V}}}$, $e^{K + \frac{1.96}{\sqrt{V}}}$).
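The four Mantel-Haenszel steps above can be sketched as a short function that starts from the summary quantities $O_1$, $E_1$ and $V$. The function name and the input numbers are hypothetical, chosen only to show the arithmetic; in practice these quantities come from the logrank calculations on real data.

```python
import math

def mantel_haenszel_hr(observed_1, expected_1, variance, z=1.96):
    """Return (HR, lower, upper) from O1, E1 and the total variance V."""
    k = (observed_1 - expected_1) / variance   # step (2)
    half_width = z / math.sqrt(variance)       # half-width on the log scale
    hr = math.exp(k)                           # step (3)
    return hr, math.exp(k - half_width), math.exp(k + half_width)  # step (4)

# Hypothetical summary values for group 1.
hr, lower, upper = mantel_haenszel_hr(observed_1=30, expected_1=20.0, variance=12.0)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Swapping the roles of the groups flips the sign of $O - E$, so the same function returns the reciprocal hazard ratio.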
The logrank method (also referred to as the O/E method) reports values that are closer to 1.0 than the true hazard ratio, especially when the hazard ratio is large or the sample size is large. When there are ties, both methods are less accurate, and the logrank method tends to report hazard ratios that are even closer to 1.0 (so the reported hazard ratio is too small when the hazard ratio is greater than 1.0, and too large when the hazard ratio is less than 1.0):
(1) As part of the Kaplan-Meier calculations, compute the number of observed events (deaths, usually) in each group ($O_a$ and $O_b$), and the number of expected events assuming a null hypothesis of no difference in survival ($E_a$ and $E_b$);
(2) The hazard ratio then is: $HR = \frac{O_a / E_a}{O_b / E_b}$;
(3) The standard error of the natural logarithm of the hazard ratio is: $\sqrt{\frac{1}{E_a} + \frac{1}{E_b}}$;
(4) The lower and upper limits of the 95% confidence interval of the hazard ratio are: $e^{\ln(HR) \pm 1.96 \sqrt{\frac{1}{E_a} + \frac{1}{E_b}}}$.
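As a rough sketch of the logrank (O/E) steps, the code below computes observed and expected event counts for two groups and then the hazard ratio, taken as $(O_a/E_a)/(O_b/E_b)$, with its 95% confidence interval. It assumes the expected counts are accumulated over risk sets at each distinct event time under the null hypothesis of no difference in survival. The function names and the small data set are hypothetical.

```python
import math

def logrank_counts(times_a, events_a, times_b, events_b):
    """Return (O_a, E_a, O_b, E_b); events are 1 (event) or 0 (censored)."""
    pairs = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pairs if e})
    e_a = e_b = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s in times_a if s >= t)
        at_risk_b = sum(1 for s in times_b if s >= t)
        deaths = sum(1 for s, e in pairs if e and s == t)
        total = at_risk_a + at_risk_b
        # Under the null, deaths are split in proportion to the numbers at risk.
        e_a += deaths * at_risk_a / total
        e_b += deaths * at_risk_b / total
    return sum(events_a), e_a, sum(events_b), e_b

def logrank_hr(o_a, e_a, o_b, e_b, z=1.96):
    """Return (HR, lower, upper) using the O/E formulas."""
    hr = (o_a / e_a) / (o_b / e_b)
    se = math.sqrt(1.0 / e_a + 1.0 / e_b)   # SE of ln(HR)
    return hr, math.exp(math.log(hr) - z * se), math.exp(math.log(hr) + z * se)

# Hypothetical follow-up times (days) and event indicators.
times_a, events_a = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
times_b, events_b = [34, 32, 12, 28, 27, 40], [1, 0, 1, 1, 1, 0]

o_a, e_a, o_b, e_b = logrank_counts(times_a, events_a, times_b, events_b)
hr, lower, upper = logrank_hr(o_a, e_a, o_b, e_b)
print(f"O_a={o_a}, E_a={e_a:.3f}, O_b={o_b}, E_b={e_b:.3f}")
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A useful sanity check on any such calculation is that the expected counts sum to the total number of observed events, $E_a + E_b = O_a + O_b$.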

Survival analysis goals

  • To estimate time-to-event for a group of individuals, such as time until a second heart attack for a group of myocardial infarction (MI) patients;
  • To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial;
  • To assess the relationship of covariates to time-to-event, such as: do weight, insulin resistance, or cholesterol influence survival time of MI patients?

Terminology

  • Time-to-event: The time from entry into a study until a subject has a particular outcome
  • Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have another outcome of interest. Censored subjects are counted as alive or disease-free for the time they were enrolled in the study. If dropout is related to both outcome and treatment, dropouts may bias the results.
  • Two-variable outcome: time variable: $t_i$ = time of last disease-free observation or time at event; censoring variable: $c_i=1$ if the subject had the event; $c_i=0$ if there was no event by time $t_i$.
  • Right Censoring ($T>t$): Common examples: termination of the study; death due to a cause that is not the event of interest; loss to follow-up. We know that the subject survived at least to time $t$.
  • Example: Suppose subject 1 enrolls on day 55 and dies on day 76, subject 2 enrolls on day 87 and dies on day 102, subject 3 enrolls on day 75 but is lost (censored) by day 81, and subject 4 enrolls on day 99 and dies on day 111. The figure below shows these data graphically. Note the varying start times. This figure is generated using the SOCR Y Interval Chart.
SMHS SurvivalAnalysis Fig1.png
The next Figure shows a plot of every subject's time since their baseline time collection (right censoring).
SMHS SurvivalAnalysis Fig2.png
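The two-variable outcome for the four example subjects can be written out explicitly. The sketch below shifts each subject to a common baseline by subtracting the enrollment day, which gives the time variable $t_i$, and records the censoring indicator $c_i$. The tuple layout and variable names are illustrative choices, not part of the original text.

```python
# (subject id, enrollment day, day of death or last observation, event observed?)
subjects = [
    (1, 55, 76, True),
    (2, 87, 102, True),
    (3, 75, 81, False),   # lost to follow-up: right-censored
    (4, 99, 111, True),
]

# Convert calendar days to time since each subject's own baseline.
records = [(sid, end - start, 1 if died else 0)
           for sid, start, end, died in subjects]

for sid, t_i, c_i in records:
    status = "event" if c_i == 1 else "censored"
    print(f"subject {sid}: t_i = {t_i} days, c_i = {c_i} ({status})")
```

Subject 3 contributes 6 days of follow-up with $c_i = 0$: all that is known is that the event time exceeds 6 days.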

Survival distributions

  • $T_i$, the event time for an individual, is a random variable having certain probability distribution;
  • Different models for survival data are distinguished by different choice of distribution for the variable $T_i$;
  • Parametric survival analysis is based on Waiting Time distributions (e.g., exponential probability distribution);
  • Assume that times-to-event for individuals in a dataset follow a continuous probability distribution (for which we may have an analytic mathematical representation or not). For all possible times $T_i$ after baseline, there is a certain probability that an individual will have an event at exactly time $T_i$. For example, human beings have a certain probability of dying at ages 1, 26, 86, 100, denoted by $P(T=1)$, $P(T=26)$, $P(T=86)$, $P(T=100)$, respectively, and these probabilities are obviously vastly different from one another;
  • Probability density function $f(t)$: In the case of human longevity, $T_i$ is unlikely to follow a unimodal normal distribution, because the probability of death is not highest in the middle ages, but at the beginning and end of life (possibly bimodal). The density of the failure time at time $t$, that is, the limiting probability per unit time that failure occurs in a small interval starting at $t$ (out of the whole range of possible $t$’s), is: $f(t)=\lim_{\Delta t \to 0}\frac{P(t \le T < t+\Delta t)}{\Delta t}$.
  • Example: Suppose we have the following hypothetical data (Figure below). People have a high chance of dying in their 70’s and 80’s, but they have a smaller chance of dying in their 90’s and 100’s, because few people make it long enough to die at these ages.
SMHS SurvivalAnalysis Fig3.png
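The limit that defines the density $f(t)$ can be checked numerically. The sketch below assumes, for illustration only, an exponential waiting-time distribution with survival function $S(t) = e^{-\lambda t}$, for which $P(t \le T < t+\Delta t) = S(t) - S(t+\Delta t)$ and the exact density is $\lambda e^{-\lambda t}$. The rate and time values are hypothetical.

```python
import math

def survival(t: float, rate: float) -> float:
    """S(t) = P(T > t) for an exponential waiting time (assumed model)."""
    return math.exp(-rate * t)

def density_approx(t: float, rate: float, dt: float) -> float:
    """Finite-interval version of f(t): P(t <= T < t + dt) / dt."""
    return (survival(t, rate) - survival(t + dt, rate)) / dt

rate, t = 0.05, 10.0                   # hypothetical rate and time point
exact = rate * math.exp(-rate * t)     # closed-form exponential density

# Shrinking dt moves the interval probability per unit time toward f(t).
for dt in (1.0, 0.1, 0.001):
    print(f"dt={dt}: approx={density_approx(t, rate, dt):.6f}, exact={exact:.6f}")
```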





