Family Practice Advance Access originally published online on April 4, 2006
Family Practice 2006 23(4):407-413; doi:10.1093/fampra/cml012

© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Methods for analysing recurrent events in health care data. Examples from admissions in Ebeltoft Health Promotion Project

Janus L Thomsen (a) and Erik T Parner (b)

(a) Institute of Public Health, Department of General Practice, University of Aarhus, Denmark
(b) Institute of Public Health, Department of Biostatistics, University of Aarhus, Denmark

Correspondence to Janus Laust Thomsen, Department of General Practice, University of Aarhus, Vennelyst Boulevard 6, DK-8000 Aarhus C, Denmark; Email: janus.laust.thomsen{at}alm.au.dk

Received 9 October 2005; Accepted 8 March 2006.


    Abstract
Background. Evaluation of health care contacts from first events alone often misses large amounts of potentially important data and may produce different results than evaluation of all data including recurrent events.

Objective. We aim to bring the different methodological approaches for analysing longitudinal health care data to the attention of researchers in primary care.

Methods. We used hospital admission data from the Ebeltoft Health Promotion Project, a randomized trial in primary care examining the effect of preventive health checks. Comparisons included three randomized groups: an intervention group receiving health checks, a group where intervention consisted of a health check followed by a health discussion with the GP and one control group.

Results. Both intervention groups had ~20% fewer hospital admissions than the control group over a 6 year period. If dependence among recurrent events is ignored, such a reduction amounts to a highly significant effect. Using the standard Poisson distribution to analyse recurrent events while ignoring their dependence structure leads to incorrect interpretation of the data, because the model does not account for the extra variability between persons; the resulting 95% CIs would therefore be too narrow.

Conclusion. Analysis of health care contacts should embrace both first and recurrent events and it should use a model appropriate to these data. An individual rate model that includes an individual frailty parameter with an unspecified distribution may be a natural choice when analysing longitudinal data on contacts with the health care system in broad terms.

Keywords. Poisson distribution, recurrence, statistics, epidemiological methods, health care utilization.


    Introduction
Data comprising multiple events per person, like number of health care contacts over time compared between groups of individuals, are common in both clinical and epidemiological studies. A natural measure of the number of events is the ‘event rate’. One of the main methodological challenges is how to address the dependence of multiple events.

For example, it may empirically be expected that during a fixed time interval some persons will have no admissions, some will have one or two, while a small number may have many. Such persons obviously are not facing the same risk and admissions within individuals cannot be seen as independent events.

Although well established in the biostatistics literature [1] and to some extent in the epidemiological literature [2], methods for overcoming these challenges are rarely encountered in the primary health care literature and only briefly mentioned in standard textbooks of epidemiology [3].

This article discusses different methodological approaches for analysing longitudinal data using hospital admission data from the Ebeltoft Health Promotion Project [4]. Comparisons included three randomized groups: an intervention group receiving health checks (Intervention 1), a group where intervention consisted of a health check followed by a health discussion with the GP (Intervention 2) and one control group. Both intervention groups had ~20% fewer hospital admissions than the control group over a 6 year period [4].

A simple analysis of the rates in the three groups based on the Poisson distribution gave narrow confidence intervals (CIs) around the point estimates of ~20% fewer admissions in the intervention groups, indicating a clear effect of the interventions (P < 0.001). This paper explains why this Poisson analysis is inappropriate and it presents a method for analysing the recurrent events that captures the individual proneness for admissions by a so-called frailty variable.

Data from the Ebeltoft Health Promotion Project and the study design are described in detail in previous publications [5]. Data were analysed using Stata Statistical Software: Release 7.0 (Stata Corporation 1999).


    First event or multiple events
Analysis of rates of recurrent events is often handled with statistical approaches appropriate only for analysis of first events, and it has therefore been suggested that analysis of repeated events be replaced with analysis of first events [6].

Whether either approach is appropriate evidently depends on the scientific question posed. Rates of first events may be appropriate if such events affect study adherence or other behaviour and risk factors for disease, for example trials investigating diagnoses like specific cancers or stroke. Consideration of all events may be more relevant in other settings where events are more frequent and some patients studied may already have had previous events. In such situations restriction of analysis to first event would discard much relevant information and may be difficult in settings where patients have previous events.

The literature holds several examples of such reduction of data to first event [6–10]. Glynn et al. [8] thus illustrate the importance of investigating multiple events with data from a clinical trial of the effect of intervention in the form of regular intake of cranberry juice drink on bacteriuria in elderly women. After randomization, six urine samples were collected at roughly monthly intervals [10]. The cranberry group had 18% fewer first events than the placebo group. A difference of this size could be explained by chance, in which case the study would have concluded that there was no statistically significant difference between the groups [8].

However, analysis of all urine specimens collected throughout the study disclosed that 50% fewer of the urine samples in the cranberry group were positive compared with the placebo group. This difference in recurrent bacteriuria was highly statistically significant [8]. The discrepancy between rates of first events and overall rates arose because women in the cranberry group were more likely in the long term to recover than women in the placebo group, and restriction of analyses to only first events would hence have obscured the effect of cranberry juice.

Analysis of time to first health care contact is not relevant when using health care utilization as a proxy for a population's morbidity because it omits much information about its total morbidity. Analysis of time to first event may also give misleading results in settings where intervention may cause admission rates to rise. Consideration of all admissions can accordingly change the estimates drastically.

The use of inappropriate statistical methods for analysis of the rate of recurrent events may be rooted in unfamiliarity with both the Poisson distribution and the extended Poisson models. This article therefore offers a review of the arguments for using the Poisson distribution for analysing longitudinal data.


    First event
Models for the relative risk and incidence rate ratio for first event
In longitudinal data the risk of an event generally refers to the probability of that event occurring within a certain amount of time. If all subjects in a group are followed for the same period of time and the event occurs only once per subject, the cumulative incidence proportion (CIP) is calculated as the number of subjects who experience the event divided by the total number of subjects in the group. The relative risk (RR) compares two independent CIPs. Statistical analysis of the CIP and the RR rests on a binomial data model, because an event either occurs or does not occur during the period under consideration. Table 1 shows the number of admissions and the time at risk with CIPs and RRs calculated under the binomial model for the three study groups in the Ebeltoft Health Promotion Study.
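The calculation described above can be sketched in a few lines of Python. This is only an illustration: the counts are hypothetical and are not the Ebeltoft data, the function names are invented for the example, and the Wald interval on the log scale is one common choice among several. The analyses in this article were run in Stata.

```python
import math

def cumulative_incidence(subjects_with_event, n_subjects):
    """CIP: subjects with at least one event divided by all subjects followed."""
    return subjects_with_event / n_subjects

def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """RR of group A versus group B with a Wald CI on the log scale (binomial model)."""
    rr = cumulative_incidence(events_a, n_a) / cumulative_incidence(events_b, n_b)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts, NOT taken from Table 1
rr, rr_low, rr_high = relative_risk(events_a=200, n_a=500, events_b=250, n_b=500)
print(f"RR = {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
```

The interval is valid only under the assumptions stated in the text: equal follow-up for all subjects and at most one counted event per subject.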


TABLE 1 The overall time at risk for hospital admissions was calculated from 21 September 1991 to 31 December 1997 and the value for total years at risk was adjusted for hospital admission time

 
Where all subjects cannot be followed over the same time-period, the standard procedure is to calculate the incidence rate which is defined as the number of events in a group divided by the sum of individual time at risk for subjects in the group, when the subjects are no longer at risk after the event (Fig. 1, upper panel). The incidence rate ratio (IRR) compares two incidence rates and was virtually identical to the RR for hospital admission (Table 2), because the individuals in the groups were followed for approximately the same amount of time.
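A minimal Python sketch of the incidence rate and the IRR for first events follows. The event counts and person-years are hypothetical, the function names are invented for the example, and the confidence interval assumes the standard Poisson model with one event at most per person.

```python
import math

def incidence_rate(events, person_years):
    """Events divided by the summed individual time at risk."""
    return events / person_years

def incidence_rate_ratio(events_a, py_a, events_b, py_b, z=1.96):
    """IRR of group A versus group B with a Wald CI on the log scale (Poisson model)."""
    irr = incidence_rate(events_a, py_a) / incidence_rate(events_b, py_b)
    # Under the Poisson model the variance of log(rate) is approximately 1/events
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)

# Hypothetical first-event data: time at risk stops at the first event
irr, irr_low, irr_high = incidence_rate_ratio(200, 2500.0, 250, 2400.0)
print(f"IRR = {irr:.2f} (95% CI {irr_low:.2f} to {irr_high:.2f})")
```

When follow-up is nearly equal in the groups, as in the present study, the IRR from this calculation will be close to the RR.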


FIGURE 1 Upper panel shows example of data considering only first event. Dots mark an event and persons do not contribute to risk time after first event. Lower panel shows data with multiple events and each person may contribute with more admissions and more time at risk. Cross marks a person censored at the time of death

 

TABLE 2 Analysis of first event. The overall time at risk for hospital admissions was calculated from 21 September 1991 to 31 December 1997

 
The basic model usually associated with rates is the Poisson distribution, just as the basic model for proportions is the binomial distribution and the basic model for the mean is the normal distribution. If we observe n individuals in a study with the probability p for one of two possible outcomes, the binomial model Bi(n,p) assumes that the number of individuals n is fixed and therefore does not depend on the sampling, i.e. a previous outcome in the same study. Furthermore, the probability of the event has to be the same for all individuals and the observations in the n individuals must be independent.

The argument for using the Poisson model as the basic model for rates in the longitudinal setting lies in its approximation to the binomial distribution known as the Law of Small Numbers [11]. In its simple form the result shows that if the sample size n is large and the probability of event p is small, then the binomial model and the Poisson model are almost identical. The expected number of events in the former, n × p, is therefore equal to the expected number in the latter. The expected number in the Poisson distribution is often written as T × λ, where T is the sum of the observed time at risk of the individuals and λ is the expected number of events per time unit observed. The Poisson approximation to the binomial distribution is good if the sample size is large [12]. As a rule of thumb for small datasets, a good approximation is obtained when the sample size is above 20 and the probability of an event is below 5%. A very good approximation is obtained if the sample size is above 100 and the probability of an event is below 10% [12].
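The quality of the approximation can be inspected numerically. The Python sketch below compares Bi(n, p) with a Poisson distribution that has the same expected number n × p, using the total variation distance as the measure of disagreement. The choice of distance and the function names are assumptions made for this illustration and are not taken from the cited references.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Po(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def total_variation(n, p):
    """Total variation distance between Bi(n, p) and Po(n * p)."""
    mu = n * p
    poisson_probs = [poisson_pmf(k, mu) for k in range(n + 1)]
    body = sum(abs(binomial_pmf(k, n, p) - q) for k, q in enumerate(poisson_probs))
    # Poisson mass above n, where the binomial probability is zero
    tail = max(0.0, 1.0 - sum(poisson_probs))
    return 0.5 * (body + tail)

for n, p in [(20, 0.05), (100, 0.10), (100, 0.01)]:
    print(f"n = {n:3d}, p = {p:.2f}: distance = {total_variation(n, p):.4f}")
```

For a fixed sample size the distance shrinks as p becomes smaller, which is the sense in which the Poisson model suits rare events.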

Setting the observed number of events x equal to the expected number of events T × λ, we obtain the well-known estimate of the rate (the unknown parameter) as the number of events divided by the sum of the observed time at risk, x/T. This estimate can also be derived by the more general method of maximum likelihood estimation. The likelihood function generally describes how the probability of the observed data (the likelihood) depends on the unknown parameter value. The maximum likelihood estimate is the parameter value that maximizes the likelihood of the observed data. The method of maximum likelihood is used both to construct estimates of the unknown parameters and to derive their statistical properties.
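The maximum likelihood derivation for the Poisson rate can be written out as follows. The notation (x events, total time at risk T, rate λ) follows the text; the log-likelihood symbol ℓ is introduced here for the illustration.

```latex
\begin{align*}
L(\lambda) &= \frac{(\lambda T)^{x}\, e^{-\lambda T}}{x!}, \\
\ell(\lambda) &= \log L(\lambda) = x \log \lambda + x \log T - \lambda T - \log x!, \\
\frac{d\ell}{d\lambda} &= \frac{x}{\lambda} - T = 0
  \quad\Longrightarrow\quad \hat{\lambda} = \frac{x}{T}, \\
\frac{d^{2}\ell}{d\lambda^{2}} &= -\frac{x}{\lambda^{2}} < 0 .
\end{align*}
```

The negative second derivative confirms that x/T is a maximum. The same curvature gives the usual approximate standard error of log λ̂ under the Poisson model, 1/√x, which depends only on the number of events.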

Rates are, however, used more generally than for analysing rare events. In the Poisson model we focus on the number of events which is set in relation to the cumulative time at risk. We may, however, also view the data as time to events as is done in survival analysis. In survival analysis the rate (hazard rate) refers to the risk of an event during a very brief period, assuming no event has occurred at the beginning of the period, and it can be interpreted as the risk of an event at a specific time. The hazard rate describes the rate in a very short interval of time as a function of time. The simplest scenario is when the hazard rate is constant. This model is termed the exponential model and it is closely connected to the Poisson model. One important similarity is that the likelihood functions in the two models are identical. This implies that the estimates of the rate derived in the two models and their statistical properties are identical. The two models are only approximately the same when the events are rare. When analysing rates for longitudinal event data, the exponential survival model is generally the underlying statistical model. Many extensions of rates to recurrent events can be formulated both in extension of the Poisson model and of the exponential survival model.
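The statement that the exponential and Poisson likelihoods coincide can be checked directly. In the sketch below, t_i denotes the observed time at risk of individual i and d_i equals 1 if the follow-up ended with an event and 0 if it was censored; these symbols are introduced for the illustration only.

```latex
\begin{align*}
L_{\text{exp}}(\lambda)
  &= \prod_{i=1}^{n} \lambda^{d_i} e^{-\lambda t_i}
   = \lambda^{x} e^{-\lambda T},
  \qquad x = \sum_{i=1}^{n} d_i, \quad T = \sum_{i=1}^{n} t_i, \\
L_{\text{Po}}(\lambda)
  &= \frac{(\lambda T)^{x} e^{-\lambda T}}{x!}
   = \frac{T^{x}}{x!}\, \lambda^{x} e^{-\lambda T}
   \;\propto\; L_{\text{exp}}(\lambda).
\end{align*}
```

The factor T^x / x! does not involve λ, so both likelihoods are maximized at x/T and have the same curvature.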

Standard Poisson regression
Simple comparisons of rates between two groups of subjects may not be valid because they may differ on important confounding variables. A common strategy in the presence of confounding is to present standardized rates, which are composed of weighted strata-specific rates, where the strata are formed by categories of the confounding variables. Regression approaches, however, will often be more efficient when several confounders have to be controlled for simultaneously. The results of the crude regression analysis of first events without controlling for confounding variables are shown in Table 2. Individuals with events before the study period are not excluded.
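Direct standardization as described above amounts to a weighted sum of stratum-specific rates. The Python sketch below uses two hypothetical age strata and hypothetical weights from a reference population; none of the numbers or names come from the Ebeltoft data.

```python
def standardized_rate(strata, weights):
    """Weighted sum of stratum-specific rates.

    strata:  {stratum: (events, person_years)}
    weights: {stratum: weight}, with weights summing to one
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(weights[s] * events / py for s, (events, py) in strata.items())

# Hypothetical strata and reference-population weights
group = {"30-39 years": (30, 1000.0), "40-49 years": (80, 1000.0)}
reference_weights = {"30-39 years": 0.6, "40-49 years": 0.4}
rate = standardized_rate(group, reference_weights)
print(f"Standardized rate: {rate:.3f} events per person-year")
```

With several confounders the number of strata grows quickly and many strata become sparse, which is one reason regression approaches are often preferred.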

Stratification of rate ratios for hospital admissions at annual intervals in the Ebeltoft Health Promotion Study revealed an interesting trend with a significant fall 4 and 5 years after intervention launch [4]. Rate ratios for hospital admissions in the intervention groups compared with the control group considering the entire follow-up period did not change when adjusting for time as a categorical variable (data not shown).

If the rate is not constant over time, we may include time as a categorical analytical variable. Time-dependent rates can also be handled within a Cox regression analysis. This will theoretically be approximately the same as dividing time into very small intervals in a Poisson regression analysis. Results for Cox regression of first event are presented in Table 2. Methods for Cox regression on recurrent events do exist [1,2], but for simplicity we focus on extensions of the Poisson model.


    Recurrent events
IRRs of recurrent events
When patients may have had previous events and more such events are considered, the analysis becomes fundamentally different from analysis of first events (Fig. 1, lower panel). When more events are studied for each individual, the models have to account for the extra variability between persons, as some persons will tend to have experienced many events and others only few or none.

In the Ebeltoft Health Promotion data, both intervention groups had ~20% fewer hospital admissions than the control group over a 6 year period. If dependence among recurrent events is ignored, such a reduction amounts to a highly significant effect: rate ratios for intervention compared with control were 0.78 (95% CI: 0.69–0.89, P < 0.001) for group 1 and 0.81 (95% CI: 0.71–0.92, P < 0.001) for group 2. Using the standard Poisson distribution to analyse recurrent events while ignoring their dependence structure leads to incorrect interpretation of the data, because the model does not account for the extra variability between persons; the 95% CIs and P-values would therefore usually become too small [7].
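A small simulation can make the extra variability visible. The Python sketch below draws admission counts for two hypothetical populations with the same average rate: one in which everyone shares the same risk, and one in which each person's rate is multiplied by a gamma-distributed factor with mean one. The rate, follow-up time, gamma variance and seed are arbitrary assumptions, and the simulation is not based on the Ebeltoft data.

```python
import math
import random
import statistics

def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method; adequate for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(n, rate, years, theta, rng):
    """Counts for n persons; theta is the variance of the person-level factor (0 = none)."""
    counts = []
    for _ in range(n):
        # Gamma with shape 1/theta and scale theta has mean 1 and variance theta
        z = rng.gammavariate(1.0 / theta, theta) if theta > 0 else 1.0
        counts.append(poisson_draw(z * rate * years, rng))
    return counts

def dispersion(counts):
    """Variance-to-mean ratio; close to 1 for Poisson data."""
    return statistics.pvariance(counts) / statistics.mean(counts)

rng = random.Random(2006)
same_risk = simulate_counts(5000, rate=0.15, years=6, theta=0.0, rng=rng)
varying_risk = simulate_counts(5000, rate=0.15, years=6, theta=2.0, rng=rng)
print(f"same risk:    variance/mean = {dispersion(same_risk):.2f}")
print(f"varying risk: variance/mean = {dispersion(varying_risk):.2f}")
```

A variance-to-mean ratio well above one means that the Poisson variance understates the true variance, so intervals computed under the standard Poisson model come out too narrow.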

Rates and rate ratios for recurrent events are shown in Table 3, where it is seen that there was no difference in rate ratio estimates between the first and recurrent events. This is, however, not always the case and, as mentioned above, the literature holds several examples of reduction of data to first events, which conceals information [6–10]. The analysis in Table 3 is described in the following.


TABLE 3 Analysis of recurrent events. The overall time at risk for hospital admissions was calculated from 21 September 1991 to 31 December 1997 and adjusted for hospital admission time

 
Regression models with recurrent events
An important dimension of recurrent events is that the variability is greater than expected by the standard Poisson distribution, which is often referred to as over-dispersion. Figure 2 illustrates the different tendencies of events (frailties) to occur among individuals; the dots mark the individual rates, the dotted line the average rate in the group. The individual frailty variable Z_i may be illustrated as the rate ratio between the individual rates compared with the average rate in the group.


FIGURE 2 Left panel illustrates variation in individual rates, described as rate ratios from the group average rate in the extended Poisson model. Right panel gives an example of the Gamma distribution, which is used to model the distribution of the frailty variable (mean one and variance θ) in the negative binomial distribution

 
The negative binomial distribution assumes that the frailty variable follows a Gamma distribution with mean one and variance θ [1]. The parameter θ describes the heterogeneity between individuals. A large θ value indicates much heterogeneity between individuals and a small θ value less heterogeneity. If θ is very small, the negative binomial distribution is approximately equal to the Poisson distribution, i.e. the risk of admissions is approximately the same for all individuals. This model belongs to a group of models called extended Poisson models or frailty models and may be written as

Neg.Bi: Po(Z·λ·T), where Z follows a gamma distribution.

The individual's event rate (Z·λ) is the average event rate in the group (λ) multiplied by the relative size of the individual rate compared with the estimated average event rate in the group (Z). The variable T denotes the individual's time at risk.
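Averaging the Poisson distribution over a gamma frailty with mean one and variance θ gives a closed-form probability function for the counts. The Python sketch below evaluates it and checks numerically that the mean is λ·T and the variance is λ·T + θ·(λ·T)². The parameter values are arbitrary and the function name is invented for the illustration.

```python
import math

def neg_binomial_pmf(k, mu, theta):
    """P(X = k) when X | Z ~ Po(Z * mu) and Z ~ Gamma(mean 1, variance theta).

    mu is the expected count for an average person (rate times time at risk).
    """
    r = 1.0 / theta  # gamma shape parameter
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(1.0 / (1.0 + theta * mu))
        + k * math.log(theta * mu / (1.0 + theta * mu))
    )
    return math.exp(log_p)

mu, theta = 0.9, 2.0  # arbitrary illustrative values
probs = [neg_binomial_pmf(k, mu, theta) for k in range(500)]
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"P(no events): neg. binomial {probs[0]:.3f}, Poisson {math.exp(-mu):.3f}")
```

Compared with a Poisson distribution with the same mean, this distribution puts more probability on zero events and on very high counts, which matches the pattern of admissions described in the Introduction.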

Fitting the negative binomial distribution to hospital admission data (Table 3, first row) yields an apparently good visual agreement between observed data and expected values from the negative binomial distribution (Fig. 3), but the difference between the observed and the fitted values reached statistical significance (P < 0.001) due to the large number of data. However, frailty models like the negative binomial regression model seem to give a good description of our data. Other studies have also found that the negative binomial distribution provides a good description of both hospital admissions [7] and consultations in general practice [13].


FIGURE 3 Observed hospital admissions in the Ebeltoft Health Promotion Project and expected values from a negative binomial distribution

 
The negative binomial distribution model is appealing as it naturally accommodates the different probabilities for events across members of a population. Moreover, the distribution of the variation of individual rate ratios is depicted as a gamma distribution, which is mathematically convenient and a highly flexible distribution. However, inter-individual frailty variations could, indeed, follow other distributions. It may, however, require a large sample to distinguish between the negative binomial model and a Poisson model with a different frailty distribution [14].

There may be sources of additional variability other than the recurrent events among individuals. The individuals are clustered within GPs, and GPs may be another level of extra variability. In the present study we examined this by assigning a level to each practice and found no extra variability among practices. No data were available to allow stratification within GPs. In other settings correlation of events within practices or GPs may result in additional variation that needs to be accounted for in the analysis.

Other approaches to the analysis of recurrent events
Assumptions about the underlying distribution of the variation of the individual rate ratios may be avoided using a pseudo-likelihood method by Carroll and Ruppert [15], i.e.

Po(Z·λ·T), where no assumption about the distribution of Z is made.

This approach solves the challenges described earlier, i.e. it handles incomplete follow-up of some individuals and dependence between recurrent events using an individual frailty for which a special distribution is not required (Table 3, second row).

An alternative to the frailty models described above is to use the standard Poisson model with robust variance estimation. As previously discussed, the first event with its corresponding time at risk could be analysed using a standard Poisson regression model. So could the second event with its corresponding time at risk. Individuals who experience no (first) event would not contribute in the analysis of the second event. The analyses of first event, second event and so forth are, of course, correlated since they use data on the same individuals. This correlation can be taken into account using robust variance estimation, which estimates the correlation between observations on the same individuals and corrects for the correlation. It is, however, important to note that there is a selection of individuals with time; all individuals contribute in the first analysis, but only individuals experiencing at least one event contribute in the second analysis and so forth. Such selection may create bias [16], and some care should therefore be exercised when analysing recurrence with robust variance estimation. In the hospital admission data, the selection seemed to be balanced in the three randomized groups, and we obtained comparable parameter estimates (last row in Table 3).
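The idea behind robust variance estimation can be illustrated in its simplest form. The Python sketch below is a deliberate simplification of the event-by-event analysis described above: it uses one total count and one time at risk per person, fits a single rate, and compares the model-based standard error of the log rate with the sandwich (robust) standard error that treats each person as a cluster. The data are hypothetical and the function name is invented for the example.

```python
import math

def log_rate_standard_errors(counts, times):
    """Rate with model-based and person-clustered robust SE of log(rate).

    counts[i] is the number of events and times[i] the time at risk of person i.
    """
    total_events, total_time = sum(counts), sum(times)
    rate = total_events / total_time
    # Model-based (standard Poisson) variance of log(rate) is 1 / total events
    model_se = math.sqrt(1.0 / total_events)
    # Sandwich estimator: squared person-level score residuals over squared information
    residual_ss = sum((c - rate * t) ** 2 for c, t in zip(counts, times))
    robust_se = math.sqrt(residual_ss) / total_events
    return rate, model_se, robust_se

# Hypothetical data: ten persons followed for 6 years each, events concentrated in a few
counts = [0, 0, 0, 1, 0, 5, 0, 2, 0, 0]
times = [6.0] * 10
rate, model_se, robust_se = log_rate_standard_errors(counts, times)
print(f"rate = {rate:.3f} per year")
print(f"model-based SE of log(rate) = {model_se:.3f}, robust SE = {robust_se:.3f}")
```

When events cluster in a few persons, the robust standard error exceeds the model-based one, and confidence intervals widen accordingly. The sketch does not address the selection issue raised above, which concerns who contributes to the analysis of second and later events.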


    Conclusion
Calculation of rate ratios under the assumption that data follow a specific mathematical distribution generally enhances the statistical precision of the estimate. It is, however, important that the theoretical model fits data adequately and affords straightforward and intuitive data interpretation.

An individual rate model that includes an individual frailty parameter with an unspecified distribution may be a natural choice when analysing longitudinal data on contacts with the health care system in broad terms.

Evaluation of health care contacts from first events alone often misses large amounts of potentially important data and may produce different results than evaluation including all (recurrent) events. Analysis of health care contacts should therefore embrace both first and recurrent events and it should use a model appropriate to these data.


    Declaration
Funding: ETP is employed at the Institute of Public Health, Aarhus University; JLT is funded by a grant from the Danish National Research Foundation.

Ethical approval: permission to conduct register investigation in relation to the Ebeltoft Health Promotion Project was given by the Scientific Ethical Committee of Aarhus County (J. no. 1990/1966) and the Danish Data Protection Agency (J. no. 2001-41-0738).

Conflict of interest: none.


    Acknowledgments
 
We thank the Ebeltoft Health Promotion Project for providing data for the examples in this study.


    Notes
 
Thomsen JL and Parner ET. Methods for analysing recurrent events in health care data. Examples from admissions in Ebeltoft Health Promotion Project. Family Practice 2006; 23: 407–413.


    References
1 Cook RJ and Lawless JF. (2002) Analysis of repeated events. Stat Methods Med Res 11:141–166.

2 Sturmer T, Glynn RJ, Kliebsch U, Brenner H. (2000) Analytic strategies for recurrent events in epidemiologic studies: background and application to hospitalization risk in the elderly. J Clin Epidemiol 53:57–64.

3 Rothman KJ and Greenland S. (1998) Modern Epidemiology 2nd edn. (Lippincott Williams & Wilkins, Philadelphia).

4 Thomsen JL, Karlsmose B, Parner ET, Thulstrup AM, Lauritzen T, Engberg M. (2006) Secondary health care contacts after preventive health screening—A randomized trial. Scand J Public Health (Epub ahead of print, DOI: 10.1080/14034940500307564).

5 Lauritzen T, Leboeuf-Yde C, Lunde IM, Nielsen KD. (1995) Ebeltoft project: baseline data from a five-year randomized, controlled, prospective health promotion study in a Danish population. Br J Gen Pract 45:542–547.

6 Windeler J and Lange S. (1995) Events per person year--a dubious concept. BMJ 310:454–456.

7 Glynn RJ, Stukel TA, Sharp SM, Bubolz TA, Freeman JL, Fisher ES. (1993) Estimating the variance of standardized rates of recurrent events, with application to hospitalizations among the elderly in New England. Am J Epidemiol 137:776–786.

8 Glynn RJ and Buring JE. (1996) Ways of measuring rates of recurrent events. BMJ 312:364–367.

9 Cumming RG, Kelsey JL, Nevitt MC. (1990) Methodologic issues in the study of frequent and recurrent health problems. Falls in the elderly. Ann Epidemiol 1:49–56.

10 Avorn J, Monane M, Gurwitz JH, Glynn RJ, Choodnovskiy I, Lipsitz LA. (1994) Reduction of bacteriuria and pyuria after ingestion of cranberry juice. JAMA 271:751–754.

11 Hill G and Paine B. (2002) Horse kicks, anthrax and the Poisson model for deaths. Chronic Dis Can 23:77.

12 Matsunawa T. (1982) Some strong ε-equivalence of random variables. Ann Inst Statist Math 34:209–224.

13 Kilpatrick SJ Jr. (1977) Consultation frequencies in general practice. Health Serv Res 12:284–298.

14 Palmgren J. (1998) Poisson distribution. Encyclopaedia of Biostatistics 2nd Edn. John Wiley & Sons.

15 Carroll RJ and Ruppert D. (1982) Robust estimation in heteroscedastic linear models. Ann Stat 10:429–441.

16 Olesen AV and Parner ET. (2006) Correcting for selection using frailty models. Stat Med (Epub ahead of print, DOI: 10.1002/sim.2298).

