When is screening the population for a disease appropriate?




















What was the probability that the screening test would correctly indicate disease in this subset? This probability, the sensitivity, is simply the percentage of diseased people who had a positive screening test. I could interpret this by saying, "The probability of the screening test correctly identifying diseased subjects was …"

Specificity focuses on the accuracy of the screening test in correctly classifying truly non-diseased people.

It is the probability that non-diseased subjects will be classified as normal by the screening test.

Table - Illustration of the Specificity of a Screening Test.

I could interpret this by saying, "The probability of the screening test correctly identifying non-diseased subjects was …"

Question: In the above example, what was the prevalence of disease among the 64, people in the study population?
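The three quantities discussed so far (sensitivity, specificity and prevalence) all come from the four cells of the 2 x 2 table. The following is a minimal sketch of that arithmetic. The counts are hypothetical and chosen only for illustration; they are not the study data referred to in the text, and the function names are assumptions.

```python
def sensitivity(tp, fn):
    """Fraction of truly diseased subjects who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly non-diseased subjects who test negative."""
    return tn / (tn + fp)


def prevalence(tp, fp, fn, tn):
    """Fraction of the whole study population that is truly diseased."""
    return (tp + fn) / (tp + fp + fn + tn)


# Hypothetical counts (NOT the study data discussed in the text).
tp, fn, fp, tn = 90, 10, 180, 720

print(sensitivity(tp, fn))         # 0.9
print(specificity(tn, fp))         # 0.8
print(prevalence(tp, fp, fn, tn))  # 0.1
```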

One problem is that a decision must be made about what test value will be used to distinguish normal versus abnormal results. Unfortunately, when we compare the distributions of screening measurements in subjects with and without disease, we find that there is almost always some overlap, as shown in the figure to the right.

Deciding the criterion for "normal" versus "abnormal" can be difficult. There may be a very low range of test results, e.g., …

However, where the distributions overlap, there is a "gray zone" in which there is much less certainty about the results. If we move the cut-off to the left, we can increase the sensitivity, but the specificity will be worse. If we move the cut-off to the right, the specificity will improve, but the sensitivity will be worse. Altering the criterion for a positive test ("abnormality") will always influence both the sensitivity and specificity of the test.
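The trade-off described above can be seen by applying several cut-offs to two overlapping sets of test values. This is only a sketch: the values are made up, higher values are assumed to be more abnormal, and a result is called positive when it is at or above the cut-off.

```python
def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """Sensitivity and specificity when 'positive' means value >= cutoff."""
    true_pos = sum(v >= cutoff for v in diseased)
    true_neg = sum(v < cutoff for v in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)


# Made-up test values with overlapping ranges (illustration only).
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
diseased = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]

for cutoff in (3, 5, 7):
    sens, spec = sens_spec_at_cutoff(diseased, healthy, cutoff)
    print(cutoff, sens, spec)
# 3 1.0 0.3
# 5 0.8 0.7
# 7 0.4 1.0
```

Lowering the cut-off (moving it to the left) raises sensitivity and lowers specificity; raising it does the opposite.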

ROC curves provide a means of defining the criterion of positivity that maximizes test accuracy when the test values in diseased and non-diseased subjects overlap. As the previous figure demonstrates, one could select several different criteria of positivity and compute the sensitivity and specificity that would result from each cut point.

In the example above, suppose I computed the sensitivity and specificity that would result if I used cut points of 2, 4, or 6. If I were to do this for the example above, my table would look something like this:

I could then plot the true positive rate (the sensitivity) as a function of the false positive rate (1-specificity), and the plot would look like the figure below. Note that the true positive and false positive rates obtained with the three different cut points (criteria) are shown by the three blue points.

This is a receiver-operator characteristic curve that assesses test accuracy by looking at how true positive and false positive rates change when different criteria of positivity are used.
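As a rough sketch of how such a curve is built, the code below computes one (false positive rate, true positive rate) point per candidate cut-off and then summarizes the curve by its area, using the trapezoidal rule. The data are made up, a result is assumed positive when it is at or above the cut-off, and the function names are hypothetical.

```python
def roc_points(diseased, healthy):
    """(false positive rate, true positive rate) for every distinct cut-off."""
    points = {(0.0, 0.0), (1.0, 1.0)}  # the two trivial end points
    for cutoff in set(diseased) | set(healthy):
        tpr = sum(v >= cutoff for v in diseased) / len(diseased)
        fpr = sum(v >= cutoff for v in healthy) / len(healthy)
        points.add((fpr, tpr))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up test values with overlapping ranges (illustration only).
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
diseased = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]

print(round(auc(roc_points(diseased, healthy)), 2))  # 0.85
print(auc(roc_points([5, 6], [1, 2])))               # 1.0 (no overlap)
print(auc(roc_points([1, 2], [1, 2])))               # 0.5 (complete overlap)
```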

If the diseased people had test values that were always greater than the test values in non-diseased people, i.e., … The closer the ROC curve hugs the left axis and the top border, the more accurate the test, i.e., … The diagonal blue line illustrates the ROC curve for a useless test for which the true positive rate and the false positive rate are equal regardless of the criterion of positivity that is used; in other words, the distributions of test values for diseased and non-diseased people overlap entirely.

So, the closer the ROC curve is to the blue star, the better it is, and the closer it is to the diagonal blue line, the worse it is. This provides a standard way of assessing test accuracy, but perhaps another approach might be to consider the seriousness of the consequences of a false negative test.

For example, failing to identify diabetes right away from a dip stick test of urine would not necessarily have any serious consequences in the long run, but failing to identify a condition that was more rapidly fatal or had serious disabling consequences would be much worse.

Consequently, a common sense approach might be to select a criterion that maximizes sensitivity, and to accept the higher false positive rate that goes with it, if the condition is very serious and the patient would benefit from early diagnosis.

A journal article in BMC Family Practice describes a study looking at the sensitivity and specificity of PSA testing for prostate cancer. In an accompanying video, Dr. David Felson from the Boston University School of Medicine discusses sensitivity and specificity of screening tests and diagnostic tests.

When evaluating the feasibility or the success of a screening program, one should also consider the positive and negative predictive values. These are also computed from the same 2 x 2 contingency table, but the perspective is entirely different. One way to avoid confusing this with sensitivity and specificity is to imagine that you are a patient and you have just received the results of your screening test (or imagine you are the physician telling a patient about their screening test results).

If the test was positive, the patient will want to know the probability that they really have the disease. Conversely, if it is good news, and the screening test was negative, how reassured should the patient be?
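The two questions above correspond to the positive and negative predictive values, which read the same 2 x 2 table by test result rather than by disease status. A minimal sketch, using hypothetical counts that are not taken from any study cited here:

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2 x 2 table."""
    ppv = tp / (tp + fp)  # among positive tests, fraction truly diseased
    npv = tn / (tn + fn)  # among negative tests, fraction truly disease-free
    return ppv, npv


# Hypothetical counts (illustration only).
tp, fn, fp, tn = 90, 10, 180, 720

ppv, npv = predictive_values(tp, fp, fn, tn)
print(round(ppv, 3), round(npv, 3))  # 0.333 0.986
```

With these made-up counts the test has 90% sensitivity and 80% specificity, yet only about one positive result in three reflects true disease.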

The availability of a simple, accurate and inexpensive test has led many states, including New York State, to require PKU screening for all newborns.

Life-threatening diseases, such as breast cancer, and those known to have serious and irreversible consequences if not treated early, such as congenital hypothyroidism, are appropriate for screening.

Treatment of diseases at their earlier stages should be more effective than treatment begun after the development of symptoms. For example, cancer of the uterine cervix develops slowly, taking more than a decade for the cancer cells to progress to a phase of invasiveness.

During this preinvasive stage, the cancer is usually asymptomatic but can be detected by screening using the Pap smear. Treatment is more effective during this stage than when the cancer has become invasive.

On the other hand, lung cancer has a poor prognosis regardless of the stage at which treatment is initiated. Early diagnosis and treatment appear to prolong life little more than therapy after symptoms have developed. The distinction between benefits to the community and to individuals needs to be borne in mind when considering recommendations to participate in organised population screening programs. Participants in screening programs are ostensibly healthy people, so a program should, at the very least, be able to demonstrate evidence of an overall benefit to the community and a minimum of risk that certain individuals may be disadvantaged by the program [1].

Not only is it important that information on the effectiveness of screening programs be available, it should also be disseminated widely. Regular monitoring and evaluation of screening programs is also vital to ensure that effectiveness is maintained and improved where possible. It is essential to recognise that an organised population approach to screening, which ultimately achieves a net health benefit to a community, can result in adverse outcomes for some individuals.

There is a risk that people who receive false negative results may experience delays in diagnosis and treatment. Some may develop a false sense of security and ignore warning symptoms. Increasingly, false negative results can give rise to legal action by people whose cancers appear to have been missed. A false positive result can mean that people without the disease undergo follow-up testing that may be uncomfortable, expensive, and, in some cases, potentially harmful.

Rarely, this can lead to unnecessary treatment. There may be psychological consequences such as anxiety for both the patient and their family. For example, a woman with a false positive mammogram undergoing surgical investigation e. A person undergoing a colonoscopy as a result of a false positive faecal occult blood test faces the possibility of a bowel perforation during the procedure.

This risk might be as high as one in [4]. There are concerns that false negative results can give rise to legal action by people whose cancers appear to have been missed. This must be communicated effectively to the potential participants in a screening program to allow informed consideration of their involvement before any test is done.

Communications must address differences in literacy and language competency to ensure that individuals are properly informed.

Implementation of possible screening programs will be influenced by consideration of the equal distribution of limited resources across the whole community for maximum benefit. Resources allocated to a screening program will reduce the resources available for other health needs.

The average age at the time of diagnosis is about … Based on this and other research teams' findings, a nationwide screening program was proposed and has been endorsed by several organizations [e.g., …].

As noted by Diederich: Many pulmonary nodules, even in smokers, are due to benign lesions such as granulomas and hamartomas. In short, although this test is highly sensitive, it has a low specificity. Table 5 provides estimates of the false positive rate (benign nodules discovered by CT scans) as reported in several studies, even those that favor routine screening of this population subgroup.

The calculated or reported false positive rates shown in Table 5 vary substantially among the studies; some of the differences can be explained by different criteria for defining a positive, e.g., …

Despite this variability it is apparent that most reported estimates of false positive probabilities are quite high.

The study by van Klavern et al. The actual test and decision criteria developed by these investigators differed from others. Specifically, they used a mathematical model to evaluate a non-calcified nodule according to its volume or volume-doubling time.

Another concern of critics of the NLST is that it might be difficult to generalize the results to community practices. Silvestri, for example, wrote:

Participants in the NLST were enrolled in tertiary care hospitals with expertise in all aspects of cancer care. As a result, few patients required invasive testing and radiographic follow-up was sufficient for many patients. However, community radiologists without expertise in evaluating lung nodules may feel compelled to advise invasive testing for a screening-detected nodule.

Most scans … Variation in how nodules are managed could lead to a substantial increase in transthoracic needle aspiration of lung nodules, unnecessary surgery, additional morbidity and even mortality for some persons who never had cancer to begin with.

From the data given in Table 5, it is clear that a conservative estimate of the false positive probability is at least 0. Thus, even for the potentially high risk group of elderly heavy cigarette smokers included in the screening trials, the Positive Predictive Value of the test is not likely to be high.

Ruano-Ravia et al. … Kovalchik et al. … Using model predictions, they divided the study population into quintiles of predicted 5-year risk of lung cancer. They analyzed the NLST data and found: Screening with low-dose CT prevented the greatest number of deaths from lung cancer among participants who were at highest risk and prevented very few deaths among those at lowest risk.

These findings provide empirical support for risk-based targeting of smokers for such screening. This finding highlights the importance of identifying the target population that is likely to benefit most from the screening procedure. Overdiagnosis is another factor to consider in assessing the merits of LDCT cancer screening. This is because although screening has a high sensitivity and potential to detect aggressive tumors, screening will also detect indolent tumors that otherwise might not cause immediate clinical symptoms.

Patz et al. … Depending upon what is done in terms of follow-up in the event of a positive screening test result, the impact of false positives could be substantial. Although hemorrhage was rare, complicating 1. … In contrast, the risk of any pneumothorax was … Patients aged 60-69 years (as opposed to younger or older patients), smokers and those with chronic obstructive pulmonary disease had higher risk of complications. It is apparent from these results that the consequences of false positives are potentially material.

The MEDCAC reviews and evaluates medical literature and technology assessments, and examines data and information on the effectiveness and appropriateness of medical items and services that are covered under Medicare, or that may be eligible for coverage under Medicare. The decision may be made on policy grounds but, from a scientific perspective, the ultimate outcome is likely to hinge on the judgment of the key parameters: prevalence in the population, the high false positive rate, and ultimately the low PPV (Nelson; Phend; US Preventive Services Task Force).

As a related example, we were asked by ECFIA (a trade association of manufacturers of high temperature insulating wools) to comment on the suitability of routine use of LDCT scans in a medical surveillance program for workers of all ages (including both smokers and non-smokers) engaged in the manufacture of refractory ceramic fiber (RCF) and other high temperature insulating wools in France.

The available results of a mortality study of these workers in two US plants do not indicate any increase over baseline cancer rates (LeMasters et al.). This is because most of the employed population is substantially younger than those included in the NLST (indeed, the retirement age in France is 60-62, depending upon the age at which the employee entered the workforce) and not all employees are smokers, let alone heavy smokers.

Taking this as an estimate applicable to the French population prevalence and using the sensitivity and specificity values from Table 4, the Positive Predictive Value of CT lung cancer screening is approximately 0.

Obviously it would be much lower for young men and non-smokers and higher among those nearing retirement and heavy smokers. This means that the a posteriori probability (regret) that a subject who tests positive in a single CT scan does not have lung cancer is approximately 0. Despite the high probability that a subject with a positive test does not have lung cancer, these subjects would undergo whatever follow-up procedures might accompany such a test result.
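The dependence of this a posteriori probability on prevalence follows from Bayes' theorem. The sketch below computes the probability that a subject with a positive test does not have the disease (that is, 1 minus the PPV) for several prevalences. The sensitivity and specificity of 0.9 are placeholders chosen for illustration; they are not the Table 4 values, which are not reproduced here.

```python
def prob_no_disease_given_positive(prevalence, sensitivity, specificity):
    """1 - PPV: probability that a positive test is a false positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)


# Placeholder test characteristics (NOT the values from Table 4).
sens, spec = 0.9, 0.9

for prev in (0.001, 0.01, 0.1, 0.3):
    print(prev, round(prob_no_disease_given_positive(prev, sens, spec), 3))
# 0.001 0.991
# 0.01 0.917
# 0.1 0.5
# 0.3 0.206
```

Even with a fairly accurate test, almost every positive result is a false positive when the disease is rare in the screened group.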

Members of this group would, at a minimum, suffer some mental distress and would be subject to follow-up CT scans and possibly invasive procedures. This screening test would clearly be inappropriate for this group. And, Bach et al. For individuals who have accumulated fewer than 30 pack-years of smoking or are either younger than 55 years or older than 74 years, or individuals who quit smoking more than 15 years ago, and for individuals with severe comorbidities that would preclude potentially curative treatment, limit life expectancy or both, we suggest that CT screenings should not be performed.

Thus, regardless of whether one believes that LDCT is an appropriate screening test for the population of older smokers, it is not justified for a population with much lower prevalence or those who are not likely to benefit from a correct diagnosis, evaluation and treatment.

For example, mammography and clinical breast examination have been proposed for screening for breast cancer. As Elmore et al. noted: If a woman undergoes annual screening beginning at the age of 40, she will have had 60 opportunities for a false positive result by the age of 70, with 30 mammograms and 30 clinical breast examinations.

The cumulative lifetime risk from her having a result from a screening test that requires further workup, even though no breast cancer is present, is not known… It is important to determine the cumulative risk of false positive tests, because women are advised to have breast-cancer screening every 1-2 years over several decades of their lifetimes, and false positive rates can provoke anxiety, increase costs and cause morbidity.

Thus, in evaluating periodic screening, it is necessary to measure or calculate cumulative probabilities. Care must be taken because the results of multiple tests may not be independent events.
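As a simple illustration of a cumulative probability, the sketch below computes the chance of at least one false positive over a series of tests. It assumes the tests are independent with a constant per-test false positive rate, which, as cautioned above, may not hold in practice; the 5% rate is a hypothetical figure, while the 60 tests match the number of opportunities mentioned in the text.

```python
def cumulative_false_positive_risk(per_test_fp_rate, n_tests):
    """P(at least one false positive in n_tests), ASSUMING independent tests."""
    return 1 - (1 - per_test_fp_rate) ** n_tests


# Hypothetical 5% false positive rate per test (illustration only).
for n in (1, 10, 60):
    print(n, round(cumulative_false_positive_risk(0.05, n), 3))
# 1 0.05
# 10 0.401
# 60 0.954
```

If results for the same person are positively correlated across tests, the true cumulative risk would be lower than this independence-based figure.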

As the PPV of a screening test depends critically on the prevalence of the disease in the population it is important to identify criteria to define a population group or subgroup with a high disease incidence to begin with. As noted above, this is why the LDCT program was limited to older smokers. Lung cancer rates increase with age and the vast majority of lung cancers occur in smokers. This is potentially a reasonable population subgroup for screening.

To illustrate the selection of a relevant population subgroup, we use an example from a study of breast cancer screening. Kerlikowske et al. They segmented the population into women of various age groups with and without a family history of breast cancer. Figure 5 shows a bar chart of the estimated PPVs for these groups. These investigators found that five times as many cancers per first-screening mammographic examinations were diagnosed in women aged 50 years or older compared with women aged less than 50 years.

The highest PPVs for mammography were for older women with a family history of breast cancer. This finding guided their recommendation.

Figure 5. Positive predictive value from mammography for women in various age groups with and without a family history of cancer, according to data provided in Kerlikowske et al.

Possible criteria for defining a population subgroup include various demographic factors (age, gender, race and country), known risk factors, e.g., …

For screening to be highly effective, the prevalence in the population should be as high as is practicable. Harper et al. It is noted above that there may be opportunities to design a screening test that has different combinations of sensitivity and specificity. If so, there are opportunities to design the test to possess characteristics that are superior in terms of the combination of possible consequences of false positives and false negatives.

For example, in the LDCT test, the threshold for size (mm) of the nodules or other characteristics, e.g., … Thompson et al. …

Figure 6 shows a typical receiver operating characteristic (ROC) curve for a prostate specific antigen (PSA) test administered to men aged 70 or more.

Figure 6. Receiver operating characteristic curve of prostate specific antigen (PSA) test, based on data from Thompson et al. Numbers shown are the specific cutoff on the PSA test result.

The area under the curve (AUC) in this case is 0. Each subject was tested and a specific PSA score determined. Subjects were also administered digital rectal examinations and biopsies; those with positive biopsies were used as the gold standard for assessment of disease status.

Each cutoff score resulted in a partitioning of subjects into those who tested positive and those who tested negative. Knowing the actual disease status of the subjects enabled calculation of the sensitivity and specificity of the test.

The ROC curve plots the calculated sensitivity against the false positive error (1 − Sp). Thus, each plotted point on the curve represents a different possible screening test with its own sensitivity and specificity. By considering the consequences of false positives and false negatives, it is possible to determine a cutoff value for the PSA test that is optimal in some sense.
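One way to make "optimal in some sense" concrete is to attach a relative cost to each kind of error and pick the cutoff with the lowest expected cost per person screened. This is only one possible criterion, not necessarily the one used by the authors. The cutoffs, sensitivities, specificities, prevalence and costs below are all hypothetical and are not taken from Thompson et al.

```python
def expected_cost(sens, spec, prevalence, cost_fn, cost_fp):
    """Expected error cost per person screened."""
    false_neg = prevalence * (1 - sens)
    false_pos = (1 - prevalence) * (1 - spec)
    return false_neg * cost_fn + false_pos * cost_fp


def best_cutoff(candidates, prevalence, cost_fn, cost_fp):
    """candidates maps cutoff -> (sensitivity, specificity)."""
    return min(
        candidates,
        key=lambda c: expected_cost(*candidates[c], prevalence, cost_fn, cost_fp),
    )


# Hypothetical operating points along an ROC curve (illustration only).
candidates = {1.0: (0.95, 0.40), 2.5: (0.80, 0.70), 4.0: (0.50, 0.92)}

# A missed case is judged 20 times as costly as a false alarm.
print(best_cutoff(candidates, prevalence=0.1, cost_fn=20, cost_fp=1))  # 1.0
# A missed case is judged only twice as costly as a false alarm.
print(best_cutoff(candidates, prevalence=0.1, cost_fn=2, cost_fp=1))   # 4.0
```

The more serious a false negative is judged to be, the further the preferred cutoff moves toward high sensitivity.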

The value for the PSA tests studied by Thompson et al. … Figure 7 shows the ROC curve (topmost curve) for this possible screening test. The dashed line in Figure 7 shows the ROC curve that would occur under chance alone.

Figure 7. Receiver operating characteristic curves of prostate specific antigen (PSA) test, based on data from Thompson et al.

The ROC curve is just one piece of the puzzle, but this type of analysis shows that it is possible to design a screening test with several alternative combinations of sensitivity and specificity.

A complete specification of a screening test includes the intrinsic test characteristics (sensitivity, specificity and cost, and the ROC curve if multiple tests are possible), characteristics of the subject population (including opportunities for segmenting the population to identify high risk groups), the key derived quantities (PPV and NPV) and the consequences of false positives and negatives.

Screening tests have the potential to be a cost effective means for identifying subjects with early stage and thus potentially more treatable disease before symptoms develop and therefore, for saving lives. The ideal screening test would discriminate perfectly between those who have or do not have the disease and be inexpensive and not invasive.

In practice, screening tests exhibit false positives and false negatives — errors with consequences that need to be carefully considered when evaluating the advantages and disadvantages of the test.

The predictive value of the test depends in part on the technical parameters of the test, including the sensitivity and specificity, but also on the prevalence of the disease in the population.

For this reason, it is necessary to be able to define the population to be tested so that the prevalence is high. This is why mammography is appropriate only for older women and those with a family history of breast cancer and why lung CT scans are not appropriate for screening the general population. With some screening tests it is possible to alter the test decision criterion to alter the balance between sensitivity and specificity in which case it may be possible to develop an optimal screening test.

Nonetheless, screening of asymptomatic populations is not always appropriate and could do more harm than good.

We appreciate the constructive comments offered by two anonymous reviewers. Their comments have improved this manuscript.

In addition, depending upon the population under study, some diseases (sometimes termed pseudo diseases) are detected that do not affect mortality because the subject may die from another disease or event.

This is termed overdiagnosis (refer to Black for more detail).

Many studies, however, are conducted on few individuals and it is important to understand the consequences in terms of the likely precision of the estimates.
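To give a sense of how study size affects precision, the sketch below computes a 95% Wilson score interval for an estimated proportion such as a sensitivity. The Wilson interval is one common choice among several and is not necessarily the method used in the studies cited; the counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# The same observed sensitivity (90%) from a small and a large hypothetical study.
low, high = wilson_interval(9, 10)
print(round(low, 3), round(high, 3))   # 0.596 0.982

low, high = wilson_interval(900, 1000)
print(round(low, 3), round(high, 3))   # 0.88 0.917
```

With only 10 diseased subjects, an observed sensitivity of 90% is compatible with a true value anywhere from roughly 60% to 98%.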

For example, Elmore et al. … Moreover, the 7. … The probability of contracting cancer through age 60 or 62, when workers will retire, is certainly lower. Thus, this estimate probably overstates the actual prevalence for the worker cohort.

ROC curves were first used by scientists in Britain during World War II as the abilities of radar receiver operators were being assessed based on their ability to differentiate signal, e.g., …

The term was later borrowed by statisticians assessing screening tests.

This paper represents independent research and the authors are solely responsible for the content.

Inhalation Toxicology (Inhal Toxicol). Published online Sep. Authors include Daniel Maxim and Mark J. Utell.

Correspondence: Daniel Maxim. This article has been corrected; see Inhal Toxicol.

Abstract: Screening tests are widely used in medicine to assess the likelihood that members of a defined population have a particular disease.

Keywords: Benefits and limitations, positive and negative predicted value, prevalence, screening tests, sensitivity, specificity.

Introduction

A screening test (sometimes termed medical surveillance) is a medical test or procedure performed on members (subjects) of a defined asymptomatic population or population subgroup to assess the likelihood of their members having a particular disease.

Definitions

In its simplest form, the screening test has only two outcomes: positive (suggesting that the subject has the disease or condition) or negative (suggesting that the subject does not have the disease or condition).

Table 1. Logical possibilities for true disease state and screening test outcome.

Table 2. Examples of screening and diagnostic tests and possible Gold Standards.

Figure 1.

Table 3. Common sources of bias in study design.

A numerical example

All of these screening test characteristics are determined by testing a particular population using one or more screening tests and recording the number of subjects that fall into the various categories shown in Table 1.


