Parametric Survival Models in SAS

Survival analysis models factors that influence the time to an event. Ordinary least squares regression methods fall short because the time to event is typically not normally distributed, and the model cannot handle censoring, very common in survival data, without modification. Nonparametric methods provide simple and quick looks at the survival experience, and the Cox proportional hazards regression model remains the dominant analysis method. This seminar introduces procedures and outlines the coding needed in SAS to model survival data through both of these methods, as well as many techniques to evaluate and possibly improve the model. Particular emphasis is given to proc lifetest for nonparametric estimation, and proc phreg for Cox regression and model evaluation.

After fitting a model it is good practice to assess the influence of observations in your data, to check whether any outlier has a disproportionately large impact on the model. The assess statement with the ph option provides an easy method to assess the proportional hazards assumption both graphically and numerically for many covariates at once. In estimating the baseline hazard function, a Cox model uses the so-called Aalen-Breslow estimator, which is a generalization of the non-parametric Nelson-Aalen estimator of the cumulative hazard function. In practice, the choice of which parametric distribution to use is made by comparing the model fit for a variety of different distributions. That choice should be driven by the desired outcome or the fit to the data, and never by which distribution gives a significant p-value for the predictor of interest.

  • Survival analysis models factors that influence the time to an event.
  • Data that measure lifetime or the length of time until the occurrence of an event are called lifetime, failure time, or survival data.
  • To handle these outcomes, as well as censored observations where the event was not observed during follow-up, survival analysis methods should be used.

Data that measure lifetime or the length of time until the occurrence of an event are called lifetime, failure time, or survival data. For example, variables of interest might be the lifetime of diesel engines, the length of time a person stayed on a job, or the survival time for heart transplant patients. The purpose of survival analysis is to model the underlying distribution of the failure time variable and to assess the dependence of the failure time variable on the independent variables.

You can fit models that have a variety of configurations with respect to the baseline hazard function, including the piecewise constant model and the cubic spline model.

Standard errors of the estimates are obtained by inverting the observed information matrix that is derived from the full likelihood. The LIFEREG procedure fits parametric models to failure time data that can be uncensored, right censored, left censored, or interval censored. The models for the response variable consist of a linear effect composed of the covariates and a random disturbance term. The distribution of the random disturbance can be taken from a class of distributions that includes the extreme value, normal, logistic, and, by using a log transformation, the exponential, Weibull, lognormal, log-logistic, and three-parameter gamma distributions.
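As a minimal sketch of fitting one of these parametric models with proc lifereg (the data set name whas500, the time variable lenfol, the censoring indicator fstat, and the covariates are assumed names for illustration, not confirmed by this text):

```sas
/* Hypothetical sketch: a Weibull accelerated failure time model.
   whas500, lenfol, fstat (0 = censored), age and bmi are assumed names. */
proc lifereg data=whas500;
   model lenfol*fstat(0) = age bmi / distribution=weibull;
run;
```

Swapping distribution=weibull for exponential, lnormal, llogistic, or gamma fits the other distributions in the class described above.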

A common feature of lifetime or survival data is the presence of right-censored observations due either to withdrawal of experimental units or to termination of the experiment. For such observations, you know only that the lifetime exceeded a given value; the exact lifetime remains unknown. The analysis methodology must correctly use the censored observations in addition to the uncensored observations.

The LIFETEST procedure computes nonparametric estimates of the survivor function either by the product-limit method also called the Kaplan-Meier method or by the lifetable method also called the actuarial method.

Cox's semiparametric model is widely used in the analysis of survival data to explain the effect of explanatory variables on hazard rates, and to estimate hazard rates when adequate explanatory variables are available. The following are highlights of the ICPHREG procedure's features:

  • tests linear hypotheses about the regression coefficients
  • computes customized hazard ratios
  • estimates and plots the survival function and the cumulative hazard function for a new set of covariates
  • creates a SAS data set that contains the predicted values
  • enables you to include an offset variable in the model
  • enables you to weight the observations in the input data
  • supports BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics

For further details, see the ICPHREG Procedure documentation.

Density functions are essentially histograms comprised of bins of vanishingly small widths. Thus, we define the cumulative distribution function as F(t) = Pr(T ≤ t) = ∫₀ᵗ f(u) du, the probability of observing a survival time of at most t. As an example, we can use the cdf to determine the probability of observing a survival time of up to days. The above relationship between the cdf and pdf also implies that the survivor function is S(t) = 1 − F(t), the probability of surviving beyond time t. It appears the probability of surviving beyond days is a little less than 0. We can estimate the hazard function in SAS as well using proc lifetest:
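A minimal sketch of such a request, assuming the data set is named whas500 and the censoring indicator is fstat (0 = censored) — names not confirmed by this text:

```sas
/* Hypothetical sketch: request a kernel-smoothed hazard plot.
   bw=200 is an invented bandwidth value. */
proc lifetest data=whas500 plots=hazard(bw=200);
   time lenfol*fstat(0);  /* lenfol = follow-up time; fstat(0) marks censored */
run;
```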

As we have seen before, the hazard appears to be greatest at the beginning of follow-up time and then rapidly declines and finally levels off. Thus, at the beginning of the study, we would expect around 0. Also useful to understand is the cumulative hazard function, which as the name implies, cumulates hazards over time.

It is calculated by integrating the hazard function over an interval of time: H(t) = ∫₀ᵗ h(u) du. For example, if the hazard rate were constant at h over an interval of length t, the cumulative hazard would simply be h × t, the expected number of failures over that interval. It is not at all necessary that the hazard function stay constant for the above interpretation of the cumulative hazard function to hold, but for illustrative purposes it is easier to calculate the expected number of failures since integration is not needed.

As time progresses, the survival function proceeds towards its minimum, while the cumulative hazard function proceeds to its maximum. In other words, we would expect to find a lot of failure times in a given time interval if (1) the hazard rate is high and (2) there are still a lot of subjects at risk. We see a sharper rise in the cumulative hazard right at the beginning of analysis time, reflecting the larger hazard rate during this period.

This seminar covers both proc lifetest and proc phreg, and data can be structured in one of two ways for survival analysis. First, there may be one row of data per subject, with one outcome variable representing the time to event, one variable that codes for whether the event occurred or not (censored), and explanatory variables of interest, each with fixed values across follow-up time. Both proc lifetest and proc phreg will accept data structured this way. The WHAS data are structured this way.

Notice there is one row per subject, with one variable coding the time to event, lenfol. In the second structure, a subject may instead be represented by multiple rows, each covering an interval of follow-up time; covariates are permitted to change value between intervals. Additionally, another variable counts the number of events occurring in each interval (either 0 or 1 in Cox regression, the same as the censoring variable). As an example, imagine subject 1 in the table above, who died at 2, days, was in a treatment group of interest for the first days after hospital admission.

This subject could be represented by two rows, one per interval. This structuring allows the modeling of time-varying covariates, or explanatory variables whose values change across follow-up time. Data that are structured in the first, single-row way can be modified to be structured like the second, multi-row way, but the reverse is typically not true.
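A hypothetical two-row layout for such a subject (interval endpoints and variable names are invented for illustration; each row runs from start up to stop, in_treatment is the time-varying covariate, and status counts events in the interval):

```
id   start   stop      in_treatment   status
 1       0   t1                   1        0
 1      t1   t_death              0        1
```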

We will model a time-varying covariate later in the seminar. Any serious endeavor into data analysis should begin with data exploration, in which the researcher becomes familiar with the distributions and typical values of each variable individually, as well as relationships between pairs or sets of variables.

Within SAS, proc univariate provides easy, quick looks into the distributions of each variable, whereas proc corr can be used to examine bivariate relationships. Because this seminar is focused on survival analysis, we provide code for each proc and example output from proc corr with only minimal explanation. The mean time to event or loss to follow-up is reported in this output, and all of these variables vary quite a bit in these data. Survival analysis often begins with examination of the overall survival experience through non-parametric methods, such as Kaplan-Meier (product-limit) and life-table estimators of the survival function.
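Such exploratory looks might be coded as follows (the data set and variable names are assumptions, not confirmed by this text):

```sas
/* Hypothetical sketch: univariate distributions of each variable. */
proc univariate data=whas500;
   var lenfol age bmi hr;
run;

/* Hypothetical sketch: bivariate correlations among the variables. */
proc corr data=whas500;
   var lenfol gender age bmi hr;
run;
```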

Non-parametric methods are appealing because no assumption of the shape of the survivor function nor of the hazard function need be made. However, nonparametric methods do not model the hazard rate directly nor do they estimate the magnitude of the effects of covariates. In the code below, we show how to obtain a table and graph of the Kaplan-Meier estimator of the survival function from proc lifetest:
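A sketch of such a request, under the same naming assumptions as before (whas500, lenfol, fstat):

```sas
/* Hypothetical sketch: Kaplan-Meier estimates, with a survival plot
   that displays the number of subjects at risk. */
proc lifetest data=whas500 plots=survival(atrisk);
   time lenfol*fstat(0);
run;
```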

Above we see the table of Kaplan-Meier estimates of the survival function produced by proc lifetest. For example, the time interval represented by the first row is from 0 days to just before 1 day. It is important to note that the survival probabilities listed in the Survival column are unconditional, and are to be interpreted as the probability of surviving from the beginning of follow-up time up to the number of days in the LENFOL column.

Subjects that are censored after a given time point contribute to the survival function until they drop out of the study, but are not counted as a failure. We see that the unconditional probability of surviving beyond days is.

Graphs of the Kaplan-Meier estimate of the survival function allow us to see how the survival function changes over time and are fortunately very easy to generate in SAS. The step function form of the survival function is apparent in the graph of the Kaplan-Meier estimate. When a subject dies at a particular time point, the step function drops, whereas in between failure times the graph remains flat. Censored observations are represented by vertical ticks on the graph. Notice the survival probability does not change when we encounter a censored observation.

Because the observation with the longest follow-up is censored, the survival function will not reach 0. Instead, the survival function will remain at the survival probability estimated at the previous interval.

The survival function is undefined past this final interval at days. The Nelson-Aalen estimator is a non-parametric estimator of the cumulative hazard function and is given by H(t) = Σ_{t_i ≤ t} d_i / n_i, where d_i is the number of failures and n_i the number at risk at time t_i. The interpretation of this estimate is that we expect 0. This matches closely with the Kaplan-Meier product-limit estimate of survival beyond 3 days of 0. In very large samples the Kaplan-Meier estimator and the transformed Nelson-Aalen (Breslow) estimator will converge.
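In proc lifetest, Nelson-Aalen estimates can be requested with the nelson option; a sketch under the same naming assumptions:

```sas
/* Hypothetical sketch: request Nelson-Aalen (Breslow) cumulative
   hazard estimates alongside the product-limit estimates. */
proc lifetest data=whas500 nelson;
   time lenfol*fstat(0);
run;
```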

We obtain estimates of these quartiles as well as estimates of the mean survival time by default from proc lifetest. This reinforces our suspicion that the hazard of failure is greater during the beginning of follow-up time. One can also use non-parametric methods to test for equality of the survival function among groups in the following manner:
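A sketch of such a test across gender groups (names assumed as before); the strata statement produces the log-rank, Wilcoxon, and likelihood-ratio tests of equality by default:

```sas
/* Hypothetical sketch: compare survival functions across gender strata. */
proc lifetest data=whas500 plots=survival;
   time lenfol*fstat(0);
   strata gender;
run;
```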

In the graph of the Kaplan-Meier estimator stratified by gender below, it appears that females generally have a worse survival experience. This is reinforced by the three significant tests of equality. In the output we find three Chi-square based tests of the equality of the survival function over strata, which support our suspicion that survival differs between genders.

In a nutshell, these statistics sum the weighted differences between the observed number of failures and the expected number of failures for each stratum at each timepoint, assuming the same survival function of each stratum. In other words, if all strata have the same survival function, then we expect the same proportion to die in each interval. Standard nonparametric techniques do not typically estimate the hazard function directly. However, we can still get an idea of the hazard rate using a graph of the kernel-smoothed estimate.

We generally expect the hazard rate to change smoothly if it changes over time, rather than jump around haphazardly. To accomplish this smoothing, the hazard function estimate at any time interval is a weighted average of differences within a window of time that includes many differences, known as the bandwidth.

However, widening will also mask changes in the hazard function, as local changes in the hazard function are drowned out by the larger number of values that are being averaged together. Below is an example of obtaining a kernel-smoothed estimate of the hazard function across BMI strata with a bandwidth of days. The lines in the graph are labeled by the midpoint bmi in each group.
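Such a stratified, kernel-smoothed request might look like this (the BMI cutpoints and the bandwidth are invented for illustration):

```sas
/* Hypothetical sketch: smoothed hazard estimates within BMI strata. */
proc lifetest data=whas500 plots=hazard(bw=200);
   time lenfol*fstat(0);
   strata bmi(20, 25, 30);  /* hypothetical cutpoints defining the strata */
run;
```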

The hazard function is also generally higher for the two lowest BMI categories. The sudden upticks at the end of follow-up time are not to be trusted, as they are likely due to the few number of subjects at risk at the end.

The hazard function for a particular time interval gives the probability that the subject will fail in that interval, given that the subject has not failed up to that point in time. In regression models for survival analysis, we attempt to estimate parameters which describe the relationship between our predictors and the hazard rate. A common way to address both issues is to parameterize the hazard function as h(t|x) = h₀(t) exp(xβ), where h₀(t) is a baseline hazard function and x is a vector of covariates. For studies in which the shape of the baseline hazard is not of primary interest, a semi-parametric model, in which we estimate regression parameters as covariate effects but leave the dependence on time unspecified, is appropriate.

The exponential function is also equal to 1 when its argument is equal to 0. This parameterization forms the Cox proportional hazards model. It is called the proportional hazards model because the ratio of hazard rates between two groups with fixed covariates will stay constant over time in this model. Because of this parameterization, covariate effects are multiplicative rather than additive and are expressed as hazard ratios, rather than hazard differences.

Instead, we need only assume that whatever the baseline hazard function is, covariate effects multiplicatively shift the hazard function and these multiplicative shifts are constant over time. Cox models are typically fitted by maximum partial likelihood methods, which estimate the regression parameters that maximize the probability of observing the given set of survival times.

We request Cox regression through proc phreg in SAS. Previously, we graphed the survival functions of males and females in the WHAS dataset and suspected that the survival experience after heart attack may be different between the two genders.

Perhaps you also suspect that the hazard rate changes with age as well. Below we demonstrate a simple model in proc phreg, where we determine the effects of a categorical predictor, gender, and a continuous predictor, age, on the hazard rate. The above output is only a portion of what SAS produces each time you run proc phreg. In particular we would like to highlight the following tables. Handily, proc phreg has pretty extensive graphing capabilities. In this model, this reference curve is for males at the reference age. Here is the typical set of steps to obtain survival plots by group:
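A sketch of such a model, under the naming assumptions used throughout (whas500, lenfol, fstat):

```sas
/* Hypothetical sketch: Cox regression of the hazard on gender and age,
   with overlaid survival curves by group. */
proc phreg data=whas500 plots(overlay)=survival;
   class gender;
   model lenfol*fstat(0) = gender age;
run;
```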

The survival curve for females is slightly higher than the curve for males, suggesting that the survival experience is possibly slightly better (if significant) for females, after controlling for age. The estimated hazard ratio of. In our previous model we examined the effects of gender and age on the hazard rate of dying after being hospitalized for heart attack. For example, we found that the gender effect seems to disappear after accounting for age, but we may suspect that the effect of age is different for each gender.

We could test for different age effects with an interaction term between gender and age. Based on past research, we also hypothesize that BMI is predictive of the hazard rate, and that its effect may be non-linear. Finally, we strongly suspect that heart rate is predictive of survival, so we include this effect in the model as well. In the code below we fit a Cox regression model where we examine the effects of gender, age, bmi, and heart rate on the hazard rate. Here, we would like to introduce two types of interaction: an interaction between gender and age, and an interaction of bmi with itself (a quadratic effect).
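Under the same naming assumptions, the expanded model might be coded with the bar operator, which expands gender|age to the main effects plus their interaction and bmi|bmi to bmi plus its quadratic term:

```sas
/* Hypothetical sketch: Cox model with a gender-age interaction
   and a quadratic effect for bmi. */
proc phreg data=whas500;
   class gender;
   model lenfol*fstat(0) = gender|age bmi|bmi hr;
run;
```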

We would probably prefer this model to the simpler model with just gender and age as explanatory factors for a couple of reasons. First, each of the effects, including both interactions, is significant. Second, all three fit statistics, -2 LOG L, AIC and SBC, are each points lower in the larger model, suggesting that including the extra parameters improves the fit of the model substantially.

We should begin by analyzing our interactions. Recall that when we introduce interactions into our model, each individual term comprising that interaction such as GENDER and AGE is no longer a main effect, but is instead the simple effect of that variable with the interacting variable held at 0.

It appears that for males the log hazard rate increases with each year of age by 0. We cannot tell whether this age effect for females is significantly different from 0 just yet see below , but we do know that it is significantly different from the age effect for males.

Notice in the Analysis of Maximum Likelihood Estimates table above that the Hazard Ratio entries for terms involved in interactions are left empty. SAS omits them to remind you that the hazard ratios corresponding to these effects depend on other variables in the model. Below, we show how to use the hazardratio statement to request that SAS estimate 3 hazard ratios at specific levels of our covariates.
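A sketch of three such hazardratio requests (the labels, ages, and BMI levels are invented for illustration):

```sas
/* Hypothetical sketch: hazard ratios evaluated at specific covariate levels. */
proc phreg data=whas500;
   class gender;
   model lenfol*fstat(0) = gender|age bmi|bmi hr;
   hazardratio 'Gender across ages' gender / at(age=(40 60 80));
   hazardratio 'BMI at two levels'  bmi    / at(bmi=(25 35)) units=5;
   hazardratio 'Heart rate'         hr;
run;
```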

In each of the tables, we have the hazard ratio listed under Point Estimate and confidence intervals for the hazard ratio. Confidence intervals that do not include the value 1 imply that the hazard ratio is significantly different from 1, and that the log hazard rate change is significantly different from 0.

We previously saw that the gender effect was modest, and it appears that for ages 40 and up, which are the ages of patients in our dataset, the hazard rates do not differ by gender. Graphs are particularly useful for interpreting interactions. We can plot separate graphs for each combination of values of the covariates comprising the interactions. Below we plot survivor curves across several ages for each gender through the following steps. Thus far in this seminar we have only dealt with covariates with values fixed across follow-up time.

With such data, each subject can be represented by one row of data, as each covariate requires only one value.

Note: A number of sub-sections are titled Background. These provide some statistical background for survival analysis for the interested reader (and for the author of the seminar!). Provided the reader has some background in survival analysis, these sections are not necessary to understand how to run survival analysis in SAS.

These may be either removed or expanded in the future. Note: The terms event and failure are used interchangeably in this seminar, as are time to event and failure time. Click here to download the dataset used in this seminar. This study examined several factors, such as age, gender and BMI, that may influence survival time after heart attack.

Follow up time for all participants begins at the time of hospital admission after heart attack and ends with death or loss to follow up censoring. The variables used in the present seminar are:. The data in the WHAS are subject to right-censoring only.

That is, for some subjects we do not know when they died after heart attack, but we do know at least how many days they survived. Understanding the mechanics behind survival analysis is aided by facility with the distributions used, which can be derived from the probability density function and cumulative distribution function of survival times. Integrating the pdf over a range of survival times gives the probability of observing a survival time within that interval. Here we see the estimated pdf of survival times in the WHAS data set, from which all censored observations were removed to aid presentation and explanation.

In the graph above we see the correspondence between pdfs and histograms.
