Family Practice Vol. 16, No. 2, 184-189
© Oxford University Press 1999

How does a change in the administration method affect the reliability of the COOP/WONCA Charts?

Cindy LK Lam, Ian J Lauder(a) and Daniel TP Lam

General Practice Unit, 3rd Floor, Ap Lei Chau Clinic, 161 Main Street, Ap Lei Chau, Hong Kong and
(a) Department of Statistics, University of Hong Kong, Pokfulam, Hong Kong.

Lam CLK, Lauder IJ and Lam DTP. How does a change in the administration method affect the reliability of the COOP/WONCA Charts? Family Practice 1999; 16: 184–189.

Received 15 March 1998; Revised 27 August 1998; Accepted 19 November 1998.


    Abstract
 
Background. An interviewer is often needed to administer the COOP/WONCA Charts to Chinese patients, and this may affect the reliability of results.

Objectives. We aimed to find out the reliability of the COOP/WONCA Charts administered by an interviewer, and whether a change in the interviewer or administration method would affect the results.

Methods. We carried out a cross-sectional test–retest study on 487 Chinese adult patients attending a family medicine clinic in Hong Kong. The COOP/WONCA Charts were administered by the same interviewer, two different interviewers or self-completion and interviewer administration, on test and retest. The random, inter-observer and inter-method variances were compared with the inter-subject variance. The reliability coefficient of each COOP/WONCA Chart was calculated for each method of administration.

Results. Random errors could change the scores by 0.57–1.04, inter-observer variations could change the scores of four charts by 0.72–0.80, and a change in the method could change the physical fitness score by 1.79 and the daily activities score by 1.31, on a five-point scale. The reliability coefficients of the six COOP/WONCA Charts were 0.68–0.92 for one interviewer, 0.59–0.82 for two interviewers and 0.46–0.81 for two methods.

Conclusion. The Chinese COOP/WONCA Charts were reliable in detecting real differences when administered by an interviewer. A change in the method of administration significantly decreased the reliability of the results. The use of more than one method of data collection in the same survey should be discouraged.

Keywords. Chinese, COOP/WONCA Charts, functional health, reliability.


    Introduction
 
The Dartmouth COOP Functional Health Assessment Charts/WONCA (COOP/WONCA Charts) are a popular instrument for the measurement of functional status in primary care.1–3 They were first developed by Nelson et al. and later modified by the Classification Committee of the World Organization of National Colleges, Academies and Academic Associations of General Practitioners/Family Physicians (WONCA).1–3 There are six charts, one each on physical fitness, feelings, daily activities, social activities, change in health and overall health. Each chart is rated on a five-point scale with higher scores indicating worse functional status. The COOP/WONCA Charts have been translated and validated for many cultures, including the Chinese.3,4 They are commonly used for comparing health status between patient groups, monitoring changes in functional status over time and measuring the outcomes of interventions.

The COOP/WONCA Charts can be self-administered or administered by an interviewer. Scholten and Van Weel proposed self-completion to be the method of choice to avoid observer (interviewer) bias.2 However, this method is not feasible for people who are illiterate. Thirteen per cent of the general population and 43% of those aged 55 years or over in Hong Kong are illiterate; the rates are even higher in mainland China.5,6 The charts often need to be administered by an interviewer when they are applied to these Chinese populations, and this raises a concern for the reliability of results. Nelson et al. showed that the original Dartmouth COOP Charts had good 1-hour test–retest reliability when administered by one or more interviewers to American patients,1 but this has not been tested on the revised COOP/WONCA Charts, and the technical equivalence of self-completion and interviewer administration has never been assessed.

The aim of our study was to find out whether the COOP/WONCA Charts were reliable when administered to Chinese subjects by an interviewer. We also wanted to find out how a change in the interviewer or interviewing method would affect the scores.

Ideally, the same result should be obtained on repeat assessments of the same individual in the same situation irrespective of the observer or measurement method. Unfortunately, variations in measurements are inevitable due to random and replicative errors, even if the measurements have been taken by the same observer using the same method.7,8 The subjective nature of health status assessment makes it more liable to variations because people's perceptions may change with time and the environment. Different interviewers may lead to different responses because their attitudes, communication skills and personal preferences may influence a subject's perception. The interpretation of the questions and response choices could be different when they are self-administered or administered by an interviewer, leading to different results.

An observed difference or change over time could be the result of measurement variation.7,8 This has important implications when health assessment is used as an evaluative or outcome measure. We need to know the magnitude of the measurement errors before we can decide whether an observed difference is significant or not. An assessment instrument is reliable if any difference detected is predominantly due to a true difference between subjects or a real change over time. It is useless if measurement errors are greater than true differences.


    Subjects and methods
 
The study was carried out in a family medicine clinic that had two full-time and two part-time doctors serving a population of 5000 Chinese people in Hong Kong. Data collection was carried out in three phases; all adult patients (aged 18 years or over) attending the clinic during the specified survey periods were invited to take part and each patient could be included in only one phase of the study. Table 1 shows the characteristics of the patient samples in the three phases of the study. We used a test–retest study design in which each subject answered the Chinese version of the COOP/WONCA Charts4 before and after his/her doctor consultation.


TABLE 1 Subject characteristics
 
The first phase (two-interviewer) was designed for the assessment of the inter-observer variance (Vo). Eighty-four patients were randomly assigned to be interviewed by the same (n = 40) or two different (n = 44) interviewers on test and retest. Vo was estimated from the paired test–retest score variance of the two-interviewer group after controlling for the variance of the same-interviewer group.

The second phase surveyed 195 patients who said that they could read and write. They completed the charts first by self-completion and then the charts were administered by an interviewer (two-method sample). A change from self-completion to interviewer-administration involved a change in the observer as well as a change in the method. The inter-method variance (Vm) was estimated from the paired test–retest score variance of the two-method sample after controlling for the two-interviewer variance found in the first phase of the study.
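The subtraction steps described for Vo and Vm can be sketched as follows. This is only an illustration under stated assumptions: the within-pair variance is taken as the sum of squared test–retest differences divided by 2n, negative estimates are floored at zero, and the function names and scores are hypothetical rather than taken from the study.

```python
def within_pair_variance(test, retest):
    """Within-subject mean square for paired scores: sum(d^2) / (2n)."""
    diffs = [b - a for a, b in zip(test, retest)]
    return sum(d * d for d in diffs) / (2 * len(diffs))


def extra_variance(pairs_with_change, pairs_baseline):
    """Variance added by a change (observer or method), obtained by
    subtracting the baseline within-pair variance. Floored at zero
    (an assumption; the paper does not say how negatives were handled)."""
    total = within_pair_variance(*pairs_with_change)
    baseline = within_pair_variance(*pairs_baseline)
    return max(0.0, total - baseline)


# Made-up (test, retest) scores on the five-point scale, for illustration only.
same_interviewer = ([1, 2, 3, 4], [1, 2, 4, 4])
two_interviewers = ([1, 2, 3, 4], [2, 2, 4, 3])
two_methods = ([3, 2, 4, 4], [1, 2, 3, 3])

v_o = extra_variance(two_interviewers, same_interviewer)  # inter-observer
v_m = extra_variance(two_methods, two_interviewers)       # inter-method
print(v_o, v_m)
```

The two-method comparison is set against the two-interviewer variance rather than the same-interviewer variance because, as noted above, a change of method also involves a change of observer.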

The third phase surveyed 208 patients with the COOP/WONCA Charts administered by the same interviewer in both test and retest (one-interviewer sample). The data were used to assess the intra-observer random replicative variance (Vr) and the inter-subject variance (V). Vr was calculated from the differences between the paired test–retest scores, and V was obtained by excluding Vr from the total variance.
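One way to obtain Vr and V from paired test–retest scores is a one-way random-effects decomposition with two measurements per subject. The sketch below assumes that model (the exact formulas used in the study are not given here), and the function name and data are hypothetical.

```python
def variance_components(test, retest):
    """Return (V, Vr) for paired scores under a one-way random-effects model.

    Vr is the within-subject mean square, sum(d^2) / (2n).
    V is (between-subject mean square - Vr) / 2, because the expected
    between-subject mean square is Vr + 2V with two scores per subject.
    """
    n = len(test)
    grand_mean = (sum(test) + sum(retest)) / (2 * n)
    ms_within = sum((a - b) ** 2 for a, b in zip(test, retest)) / (2 * n)
    ms_between = 2 * sum(
        ((a + b) / 2 - grand_mean) ** 2 for a, b in zip(test, retest)
    ) / (n - 1)
    v_r = ms_within
    v = max(0.0, (ms_between - ms_within) / 2)  # floor at zero (assumption)
    return v, v_r


# Made-up scores for four subjects, for illustration only.
v, v_r = variance_components([1, 2, 3, 4], [1, 3, 3, 5])
print(round(v, 4), round(v_r, 4))
```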

The standard technique of analysis of variance (ANOVA) was used to determine the variance components by equating the computed mean squares with their expected values from ANOVA theory.9 The standard F test for variance ratios was used to compare the different variance components at the 5% level of significance. Since variance is the square of standard deviation, the 95% CI of the score change was estimated to be ± 2 times the square-root of the variance.
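The conversion from a variance component to the 95% limit of score change stated above is simple arithmetic. A minimal sketch, with hypothetical function names:

```python
import math


def score_change_95(variance):
    """Approximate 95% limit of score change: 2 * sqrt(variance)."""
    return 2 * math.sqrt(variance)


def variance_from_score_change(change):
    """Inverse of the rule above: variance = (change / 2) ** 2."""
    return (change / 2) ** 2


print(score_change_95(0.25))                          # 1.0
print(round(variance_from_score_change(1.04), 4))     # 0.2704
```

Read backwards, a reported score change of 1.04 would correspond to a variance of about 0.27 under this rule.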

We calculated the reliability coefficients of each COOP/WONCA Chart by dividing the true (inter-subject) variance by the total variance for one interviewer, two interviewers and two methods, respectively.7,8 The reliability coefficient is a measure of the reliability of the instrument in detecting true differences. The most widely accepted standard is 0.7 or more for group comparison,10 although Helmstadter has proposed a lower standard of 0.5.8
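The ratio described above can be sketched as below. It assumes that the total variance is the inter-subject variance plus whichever error components apply (Vr alone for one interviewer, Vr + Vo for two interviewers, Vr + Vo + Vm for two methods); the function name and the numbers are hypothetical and are not taken from Table 2.

```python
def reliability_coefficient(v_subject, *error_variances):
    """True (inter-subject) variance divided by total variance."""
    total = v_subject + sum(error_variances)
    return v_subject / total


# Illustrative values only.
v, v_r, v_o, v_m = 1.0, 0.25, 0.15, 0.60

one_interviewer = reliability_coefficient(v, v_r)
two_interviewers = reliability_coefficient(v, v_r, v_o)
two_methods = reliability_coefficient(v, v_r, v_o, v_m)
print(round(one_interviewer, 2), round(two_interviewers, 2), round(two_methods, 2))
```

With these made-up inputs the coefficient falls from 0.80 to about 0.71 and then to 0.50, which shows how each added error component pulls the coefficient towards or below the 0.7 and 0.5 standards.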

The Wilcoxon matched-pairs signed-ranks test of the SPSS for Windows program was used to test if there was any significant bias in the retest scores.
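For readers unfamiliar with the test, the sketch below shows the usual large-sample form of the Wilcoxon matched-pairs signed-ranks test: zero differences are dropped, tied absolute differences share an average rank, and a normal approximation with tie correction gives a two-sided P value. It is not the SPSS routine, the function name is hypothetical, and the data are made up.

```python
import math


def wilcoxon_signed_rank(test, retest):
    """Return (W+, W-, two-sided P) using the normal approximation."""
    diffs = [b - a for a, b in zip(test, retest) if b != a]  # drop zeros
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Made-up scores; lower retest scores mean better reported function.
w_plus, w_minus, p_value = wilcoxon_signed_rank(
    [3, 3, 4, 2, 5, 4], [2, 3, 3, 2, 4, 3]
)
print(w_plus, w_minus, round(p_value, 4))
```

The normal approximation is rough for samples as small as this example; it is used here only to keep the sketch short.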


    Results
 
Table 2 shows the inter-subject variance (V), intra-observer random replicative variance (Vr), inter-observer variance (Vo) and inter-method variance (Vm), and their corresponding 95% CI of score changes, for the six COOP/WONCA Charts. All the COOP/WONCA Charts were scored on a five-point scale. Random replicative errors could cause changes in the chart scores of 0.57–1.04. A change in the observer could cause additional changes of 0.72–0.80 in the scores of the physical fitness, daily activities, social activities and overall health charts. The random and observer variations together could change the scores up to 1.81 (daily activities chart) when there was a change in the interviewer. A change in the method of administration could further change the physical fitness score by 1.79 and the daily activities score by 1.31. The total measurement variations could change the physical fitness and daily activities scores by more than three when the administration method was changed.


TABLE 2 Variance and 95% CI of score changes of the COOP/WONCA Charts; variance (95% CI of score changes)
 
Table 3 shows the reliability coefficients of the COOP/WONCA Charts for the same interviewer, two interviewers and two methods, respectively. Five charts had coefficients greater than 0.7 and only one (change in health) chart was marginally below the standard when the charts were administered by the same interviewer. The reliability coefficients of three charts were below 0.7, but all were above 0.5 when they were administered by two interviewers. When two methods were used, the reliability coefficients of only two charts were above 0.7, three were between 0.5 and 0.7, and that of the daily activities chart was less than 0.5.


TABLE 3 Reliability coefficients of the COOP/WONCA Charts by the number of observers/methods
 
Table 4 shows the paired differences in the test–retest scores of the COOP/WONCA Charts when they were administered by the same interviewer, two different interviewers or two different methods. The test–retest concordance (no change in score) rates were all above 75%, with few score changes of more than one when the COOP/WONCA Charts were administered by the same interviewer. There was a tendency for the retest scores to be better than the test scores for the feelings and daily activities charts when they were administered by the same interviewer. The two-interviewer concordance rates of most charts were lower (59–86%) than those achieved by the same interviewer, but there was no significant bias in the retest scores. The concordance rates between the scores of self-completion and interviewer administration were only moderate (44–78%), and there was a bias towards better retest (interviewer administration) scores on the physical fitness chart.
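The concordance rate and the distribution of paired differences can be computed as in the sketch below. The function names and scores are hypothetical; differences are taken as retest minus test, so a negative difference means a better retest score because higher scores indicate worse functional status.

```python
from collections import Counter


def concordance_rate(test, retest):
    """Proportion of subjects whose score did not change on retest."""
    unchanged = sum(1 for a, b in zip(test, retest) if a == b)
    return unchanged / len(test)


def paired_difference_counts(test, retest):
    """Count of each retest-minus-test difference."""
    return Counter(b - a for a, b in zip(test, retest))


# Made-up scores on the five-point scale, for illustration only.
test_scores = [1, 2, 3, 4, 5]
retest_scores = [1, 2, 2, 4, 3]

print(concordance_rate(test_scores, retest_scores))
print(sorted(paired_difference_counts(test_scores, retest_scores).items()))
```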


TABLE 4 Paired differences in the test–retest scores of the COOP/WONCA Charts
 

    Discussion
 
We used convenience samples of patients in a family medicine clinic because they were easily accessible and they represented the target population of the COOP/WONCA Charts. As our samples included males and females from different age groups and educational backgrounds, we believe that our results could be generalized to other Chinese adult patients in primary care.

The differences in the mean age, educational level and sex ratio among the three samples were as expected: females and older people were less likely to be included in the two-method sample because more of them were illiterate. Any bias from the age and educational differences should have favoured results in the two-method sample who were younger and better educated, but this was not the case. Therefore, it was unlikely that these demographic differences had affected the reliability of the COOP/WONCA Charts.

We initially fixed the test–retest time-interval at 1 hour to be consistent with the study by Nelson et al.,1 but many subjects were unwilling to wait for an hour. We then allowed a flexible time interval between test and retest, but the two had to be separated by the doctor consultation. The relatively short time interval between test and retest could have inflated the reliability of the COOP/WONCA Charts, but the interviewers did not find patients remembering their answers. This was supported by the fact that the concordance rates of the two-method sample were the lowest for most of the charts although the mean test–retest time interval was the shortest.

Random replicative errors caused changes of no more than one in the COOP/WONCA Chart scores. A difference in the scores of one or more was likely to be a real difference if the COOP/WONCA Charts were administered by the same interviewer. We found that the reliability coefficients of some of the COOP/WONCA Charts decreased with a change in the interviewer or administration method. When there was a change in the interviewer, a difference in the score of one could be the result of measurement variation, although score changes of two or more were likely to be real. This implies that the health status of a patient could be monitored more reliably if there were personal continuity of care. On the other hand, one has to be aware of the tendency for patients to give more positive responses to some questions on repeated assessments by the same interviewer.

There was no significant bias in the retest scores when the charts were administered by two different interviewers. This means that measurement variation would not cause any net change in the mean COOP/WONCA scores of a group of people. The charts would be more reliable in detecting group differences than changes in an individual patient.

Our reliability coefficients were in general lower than those found in the US by Nelson et al.1 The reliability coefficients of the charts varied from 0.73 to 0.98 for the same interviewer and they were 0.50–0.98 for two interviewers. The reliability of the instrument might have been affected by the cultural tendency of the less-educated Chinese to give socially approved answers, as shown in an earlier survey with the Minnesota Multiphasic Personality Inventory 2 (MMPI-2) L scale.11,12 We cannot assume that a health measure that has been shown to be reliable in one culture will be so in another. The reliability of an instrument must be confirmed on the target population before it is applied cross-culturally.

We found that the physical fitness and daily activities scores could differ by up to three when they were obtained by two different methods. Our interviewers noticed that some subjects misinterpreted the physical fitness and daily activities charts as an assessment on what they actually did rather than what they could do. The meaning could be clarified when the charts were administered by an interviewer but not when they were self-completed. This might be the reason why the scores obtained by self-completion were worse than those obtained by interviewer administration.

It is disturbing to find that self-completion and interviewing could give markedly different results. This is particularly relevant to family practice in that we often use the two methods together to collect patient information in clinical practice and research. Evidence on the technical equivalence of these two methods is scarce and conflicting. Some studies showed that there was little difference, but others found that interviewer administration was more reliable.13 Our study also showed that a change in the method of administration affected some results but not others, probably because some questions were more prone to misinterpretation. Self-completion is more liable to give missing, inconsistent or inaccurate data, but an interviewer may be a barrier to honest responses. One method may be more suitable than the other for certain types of information. The effect of the method of data collection on the quality of information deserves more attention and research.


    Conclusions
 
The COOP/WONCA Charts were reliable in detecting true differences between Chinese subjects when they were administered by the same interviewer. The reliability decreased but it was still within acceptable standards when the charts were administered by different interviewers. The reliability of three of the charts was quite low when they were administered by both self-completion and an interviewer. Misinterpretation of the questions could be a problem in self-completion of the charts. Interviewer administration will be the method of choice when the COOP/WONCA Charts are applied to the Chinese, until we have more data confirming the reliability of self-completion.

We recommend the use of a single interviewer in the administration of the COOP/WONCA Charts to the Chinese wherever possible. When more than one interviewer is used, one must be aware of the inter-observer errors, and differences in scores of less than two need to be interpreted with caution. As self-completion and interviewer administration could give very different results for the same individual, the two methods should not be used together in the same survey and it may not be appropriate to compare data collected by different methods.

We found that a change in the method of administration caused significant changes in the COOP/WONCA scores despite the simplicity of the instrument. The method of administration may have an even greater effect on the results of longer and more complex health surveys. The reliability of any instrument and method of administration needs to be confirmed on the target population before they are applied to clinical practice or research, otherwise the results could be misleading. This is particularly important when cross-cultural adaptation is necessary.


    Acknowledgments
 
This study was funded by a research grant from the Committee on Research and Conference Grants, University of Hong Kong. We would like to thank Ms Cyrina Chan and Ada Au for helping us with the data collection.


    References
 
1 Nelson E, Wasson J, Kirk J et al. Assessment of function in routine clinical practice: description of the COOP chart method and preliminary findings. J Chron Dis 1987; 40 (Suppl 1): 55S–63S.

2 Scholten JHG, Van Weel C. Functional status assessment. In Family Practice. Lelystad: Meditekst, 1992.

3 Van Weel C, Konig-Zahn C, Touw-Otten FWMM, Van Duijn NP, Meyboom-de Jong B. Measuring Functional Health Status With The COOP/WONCA Charts. NCH series No. 7. Groningen: Northern Centre of Health Care Research (NCH), 1995.

4 Lam CLK, Van Weel C, Lauder IJ. Can the Dartmouth COOP/WONCA Charts be used to assess the functional status of Chinese patients? Fam Pract 1994; 11: 85–94.

5 Census and Statistics Department, Hong Kong. Hong Kong Social and Economic Trends 1982–1992. Hong Kong: Government Printer, 1993.

6 Asian Development Bank. Key Indicators of Developing Asian and Pacific Countries. Vol. 24. Manila: The Bank, 1993.

7 Kerlinger FN. Foundations Of Behavioral Research. 3rd edn. Orlando: Holt, Rinehart & Winston, 1986: Ch. 26.

8 Helmstadter GC. Principles Of Psychological Measurements. New York: Appleton Century Crofts, 1964: Ch. 3.

9 Montgomery DC. The Design and Analysis of Experiments. 3rd edn. New York: Wiley, 1991.

10 Nunnally JC. Psychometric Theory. 3rd edn. New York: McGraw Hill, 1994.

11 Butcher JN. Introduction to the MMPI-2. In: Butcher JN (ed.). MMPI-2 in Psychological Treatment. New York: Oxford University Press, 1990: 5–20.

12 Lam CLK, Chan MS, Poon V. Health survey tools—what work and what don't? (abstract). In Irish College of General Practitioners, People and Their Family Doctors—Partners in Care. Book of Abstracts of the 15th WONCA World Conference, 1998, Dublin, Ireland. Oxford: Alden Press, 1998: 247.

13 Cella DF, Lloyd SR, Wright BD. Cross-cultural instrument equating: current research and future directions. In Spilker B (ed.). Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd edn. Philadelphia: Lippincott-Raven, 1996: 707–715.

