In our test, we selected α = 0.05 and reject H0 if the observed sample mean exceeds 93.92 (focusing on the upper tail of the rejection region for now). The formula produces the minimum sample size to ensure that the margin of error in a confidence interval will not exceed E. In planning studies, investigators should also consider attrition or loss to follow-up. The effect size is the difference in the parameter of interest that represents a clinically meaningful difference. While each test involved details that were specific to the outcome of interest (e.g., continuous or dichotomous) and to the number of comparison groups (one, two, more than two), there were common elements to each test. However, it is more often the case that data on the variability of the outcome are available from only one group, usually the untreated (e.g., placebo control) or unexposed group. Nevertheless, the study was stopped after an interim analysis. Now, suppose that the alternative hypothesis, H1, is true (i.e., μ ≠ 90) and that the true mean is actually 94. Had we assumed a standard deviation of 15, the sample size would have been n=35. From the Epi Info™ main page, select StatCalc. In sample size computations, investigators often use a value for the standard deviation from a previous study or a study performed in a different but comparable population. The critical value (93.92) is indicated by the vertical line. Enter the expected frequency (an estimate of the true prevalence, e.g., 80% ± your minimum standard). In order to evaluate the properties of the screening test (e.g., the sensitivity and specificity), each pregnant woman will be asked to provide a blood sample and in addition to undergo an amniocentesis.
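To make the idea of power under a specific alternative concrete, the following Python sketch computes the upper-tail rejection probability for the test described above (H0: μ = 90, reject if the sample mean exceeds 93.92). The standard error of 2 is not stated in the text; it is inferred here from 93.92 = 90 + 1.96 × SE. The function name `upper_tail_power` is illustrative, and the lower tail is ignored, as in the text.

```python
from statistics import NormalDist


def upper_tail_power(true_mean, critical_value, standard_error):
    """P(sample mean > critical_value) when the sample mean is normal
    with the given true mean and standard error."""
    z = (critical_value - true_mean) / standard_error
    return 1.0 - NormalDist().cdf(z)


# Assumption: standard error backed out of the stated critical value.
se = (93.92 - 90) / 1.96

power_94 = upper_tail_power(94, 93.92, se)  # true mean 94
power_98 = upper_tail_power(98, 93.92, se)  # true mean 98

print(f"SE = {se:.2f}")
print(f"power if mu = 94: {power_94:.3f}")
print(f"power if mu = 98: {power_98:.3f}")
```

Under these assumptions the power is a little over one half when the true mean is 94 and about 0.98 when it is 98, which is why a larger difference between the null and alternative means is easier to detect.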
The number of pounds lost will be computed for each child. The sample sizes are computed as follows: A major issue is determining the variability in the outcome of interest (σ), here the standard deviation of HDL cholesterol. If the new drug shows a 5 unit reduction in mean systolic blood pressure, this would represent a clinically meaningful reduction. Samples of size n1=33 and n2=33 will ensure that the test of hypothesis will have 80% power to detect this difference in the proportions of patients who are cured of C. diff. A sample size of 364 stents will ensure that a two-sided test with α=0.05 has 90% power to detect a 0.05, or 5%, difference in the proportion of defective stents produced. The investigators feel that a 30% increase in flu among those who used the athletic facility regularly would be clinically meaningful. The point estimate for the population mean is the sample mean, and the margin of error is E = Zσ/√n. The formulas shown below produce the number of participants needed with complete data, and we will illustrate how attrition is addressed in planning studies. In order to determine the sample size needed, the investigator must specify the desired margin of error. β is shown in the figure above as the area under the rightmost curve (H1) to the left of the vertical line (where we do not reject H0). In fact, it is the objective of the current study to estimate the prevalence in Boston. The challenge is then to determine the sample size needed to achieve this 80% power. A two sided test will be used with a 5% level of significance. This calculator allows you to evaluate the properties of different statistical designs when planning an experiment (trial, test) utilizing a Null-Hypothesis Statistical Test to make inferences.
Here we are planning a study to generate a 95% confidence interval for the unknown population proportion, p. The equation to determine the sample size for estimating p seems to require knowledge of p, but this is obviously a circular argument, because if we knew the proportion of successes in the population, then a study would not be necessary! The formula above gives the number of participants needed with complete data to ensure that the margin of error in the confidence interval does not exceed E. We will illustrate how attrition is addressed in planning studies through examples in the following sections. From the figure above we can see what happens to β and power if we increase α. Birth weights in infants clearly have a much more restricted range than weights of female college students. To facilitate interpretation, we will continue this discussion with the sample mean (X̄) as opposed to Z. Suppose, for example, we increase α to α=0.10. The upper critical value would be 92.56 instead of 93.92. Power is the probability that a test correctly rejects a false null hypothesis. An investigator wants to plan a clinical trial to evaluate the efficacy of a new drug designed to increase HDL cholesterol (the "good" cholesterol). Try to work through the calculation before you look at the answer. Note that there is an alternative formula for estimating the mean of a continuous outcome in a single population, and it is used when the sample size is small (n<30). Based on prior experience with similar trials, the investigator expects that 10% of all participants will be lost to follow up or will drop out of the study.
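The attrition adjustment mentioned above can be sketched in Python as follows. The relationship used is N (number to enroll) × (proportion retained) = number needed with complete data. The inputs of 232 completers and 10% attrition are the figures quoted elsewhere in this text for the drug trial; the function name `enrollment_needed` is illustrative.

```python
import math


def enrollment_needed(complete_n, attrition_rate):
    """Number to enroll so that, after the expected attrition,
    at least complete_n participants remain with complete data."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    # Always round up: enrolling fewer would fall short of complete_n.
    return math.ceil(complete_n / (1 - attrition_rate))


total_to_enroll = enrollment_needed(232, 0.10)
print(total_to_enroll)
```

With these inputs the result is 258, the enrollment figure quoted in the text.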
Recall that the confidence interval formula to estimate prevalence is p̂ ± Z √(p̂(1-p̂)/n). Assuming that the prevalence of breast cancer in the sample will be close to that based on national data, we would expect the margin of error to be approximately 1.96 √(0.0043(1-0.0043)/5,000) = 0.0018. Thus, with n=5,000 women, a 95% confidence interval would be expected to have a margin of error of 0.0018 (or 18 per 10,000). The investigator must enroll 258 participants to be randomly assigned to receive either the new drug or placebo. For example, suppose we want to estimate the mean birth weight of infants born to mothers who smoke cigarettes during pregnancy. Power is defined as 1-β = P(Reject H0 | H0 is false) and is shown in the figure as the area under the rightmost curve (H1) to the right of the vertical line (where we reject H0). We first compute the effect size by substituting the proportions of patients expected to be cured with each treatment, p1=0.6 and p2=0.9, and the overall proportion, p=0.75. We now substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size. The study reported a standard deviation in weight lost over 8 weeks on a low fat diet of 8.4 pounds and a standard deviation in weight lost over 8 weeks on a low carbohydrate diet of 7.7 pounds. Factors affecting statistical power: (a) the significance level (α), (b) the magnitude or size of the treatment effect (effect size), and (c) the sample size (n). Then substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size. The following example demonstrates how to calculate a sample size for a cohort or cross-sectional study. This is the first choice you need to make in the interface. We describe a novel strategy for power and sample size determination developed for studies utilizing investigational technologies with limited available preliminary data, specifically of imaging biomarkers.
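The two-proportion computation described above (effect size first, then sample size) can be sketched in Python. This sketch assumes the effect size ES = |p1 - p2| / √(p(1-p)), with p the average of the two proportions, and the per-group formula n = 2((Z1-α/2 + Z1-β)/ES)², which matches the inputs quoted in the text (p1=0.6, p2=0.9, p=0.75). The function name is illustrative, and exact normal quantiles are used rather than rounded table values.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test comparing two
    independent proportions, using a standardized effect size."""
    z = NormalDist()
    p_bar = (p1 + p2) / 2  # overall proportion, equal group sizes assumed
    effect_size = abs(p1 - p2) / math.sqrt(p_bar * (1 - p_bar))
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Round up so the stated power is met or exceeded.
    return math.ceil(2 * (z_sum / effect_size) ** 2)


n_cdiff = n_per_group_two_proportions(0.6, 0.9)
print(n_cdiff)
```

For the C. difficile inputs this gives 33 per group, in line with the figure reported in the text. Results for other examples may differ slightly from hand calculations that round the effect size to two decimals before squaring.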
Again the issue is determining the variability in the outcome of interest (σ), here the standard deviation in pounds lost over 8 weeks. In the planned study, participants will be asked to fast overnight and to provide a blood sample for analysis of glucose levels. This is a situation where investigators might decide that a sample of this size is not feasible. Clostridium difficile (also referred to as "C. difficile" or "C. How many women 19 years of age and under must be enrolled in the study to ensure that a 95% confidence interval estimate of the mean birth weight of their infants has a margin of error not exceeding 100 grams? Data from the participants in the pilot study can be used to compute a sample standard deviation, which serves as a good estimate for σ in the sample size formula. In planning the study, the investigator must consider the fact that some women may deliver prematurely. In participants who attended the seventh examination of the Offspring Study and were not on treatment for high cholesterol, the standard deviation of HDL cholesterol is 17.1. The margin of error is so wide that the confidence interval is uninformative. An alternative is to conduct a matched case-control study rather than the above unmatched design. (Do the computation yourself, before looking at the answer.). Statistical power is a fundamental consideration when designing research experiments. In fact, the investigators enrolled 38 into each group to allow for attrition. National data suggest that 12% of infants are born prematurely. • The larger the sample size, the higher will be the degree of accuracy, but this is limited by the availability of resources. The application will show three different sample size estimates according to three different statistical calculations. If a study is planned where different numbers of patients will be assigned or different numbers of patients will comprise the comparison groups, then alternative formulas can be used. 
What is sample size and why is it important? Of 16 patients in the infusion group, 13 (81%) had resolution of C. difficile–associated diarrhea after the first infusion. • It can be determined using formulae, readymade tables and computer software. How precisely can we estimate the prevalence with a sample of size n=5,000? In studies where the plan is to estimate the mean difference of a continuous outcome based on matched data, the formula for determining sample size is n = (Zσd/E)², where Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%), E is the desired margin of error, and σd is the standard deviation of the difference scores. Again, these sample sizes refer to the numbers of participants with complete data. The areas in the two tails of the curve represent the probability of a Type I error, α = 0.05. An investigator is planning a clinical trial to evaluate the efficacy of a new drug designed to reduce systolic blood pressure. p is the proportion of successes in the population. Recall from the module on Hypothesis Testing that, when we performed tests of hypothesis comparing the means of two independent groups, we used Sp, the pooled estimate of the common standard deviation, as a measure of variability in the outcome. [Note: We always round up; the sample size formulas always generate the minimum number of subjects needed to ensure the specified precision.] Samples of size n1=324 and n2=324 will ensure that the test of hypothesis will have 80% power to detect a 30% difference in the proportions of students who develop flu between those who do and do not use the athletic facilities regularly.
The effect size is the difference in the parameter of interest (e.g., μ) that represents a clinically meaningful difference. Those who undertake a research project are often unaware of the differences between qualitative and quantitative research methods. An investigator hypothesizes that there is a higher incidence of flu among students who use their athletic facility regularly than their counterparts who do not. The effect size represents the meaningful difference in the population mean, here 95 versus 100, or 0.51 standard deviation units. In studies where the plan is to perform a test of hypothesis on the mean difference in a continuous outcome variable based on matched data, the hypotheses of interest are H0: μd = 0 versus H1: μd ≠ 0, where μd is the mean difference in the population. The rejection region is shown in the tails of the figure below. (Power and Sample Size Determination, Bret Hanlon and Bret Larget, Department of Statistics, University of Wisconsin-Madison, November 3-8, 2011.) To this point in the semester, we have largely focused on methods to analyze the data that we have with little regard to the decisions on how to gather the data. If so, the known proportion can be used for both p1 and p2 in the formula shown above. Statistical Methods for Rates and Proportions. σ again reflects the standard deviation of the outcome variable. It is customary to calculate sample size based on power (Adcock, 1997). If that is unsuccessful, the infection has been treated by switching to another antibiotic. Now substitute the effect size and the appropriate Z values for α and power to compute the sample size. In statistical hypothesis terms, power is the probability of rejecting the null hypothesis when it … Suppose that the collection and processing of the blood sample costs $250 per participant and that the amniocentesis costs $900 per participant.
In the previous figure for H0: μ = 90 and H1: μ = 94, if we observed a sample mean of 93, for example, it would not be as clear whether it came from a distribution whose mean is 90 or one whose mean is 94. We now substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size. Example: Suppose one wishes to detect a simple correlation r (r=0.4) from N observations. In sample size computations, investigators often use a value for the standard deviation from a previous study or a study done in a different, but comparable, population. In designing studies most people consider power of 80% or 90% (just as we generally use 95% as the confidence level for confidence interval estimates). An investigator wants to estimate the proportion of freshmen at his University who currently smoke cigarettes (i.e., the prevalence of smoking). A 95% confidence interval will be estimated to quantify the difference in weight lost between the two diets and the investigator would like the margin of error to be no more than 3 pounds. A cross-sectional study is planned to assess the mean fasting blood glucose levels in people who drink at least two cups of coffee per day. We will use that estimate for both groups in the sample size computation. The manufacturer wants to test whether the proportion of defective stents is more than 10%. Interested readers can see Fleiss for more details.4 The plan is to enroll participants and to randomly assign them to receive either the new drug or a placebo. Just as it is important to consider both statistical and clinical significance when interpreting results of a statistical analysis, it is also important to weigh both statistical and logistical issues in determining the sample size for a study. Usually, studies have a power of around 80%, which means that you accept the possibility that in 20% of the cases, the real difference was missed (you concluded there was no effect when there was one).
Statistical power is the most commonly used metric for sample size determination. Therefore, before collecting data, it is essential to determine the … β and power are also related to the variability of the outcome and to the effect size. Sometimes it is difficult to estimate σ. Sample Size to Conduct Test of Hypothesis. However, in many studies, the sample size is determined by financial or logistical constraints. Therefore, a sample of size n=31 will ensure that a two-sided test with α=0.05 has 80% power to detect a 5 mg/dL difference in mean fasting blood glucose levels. How many women must be involved in the study to ensure that the estimate is precise? Notice that there is much higher power when there is a larger difference between the mean under H0 as compared to H1 (i.e., 90 versus 98). Study D says it needs 40 subjects in each class to be confident of 80% power, but the study only has 35 subjects, so we hit the red STOP in the lower left quadrant. If a sample mean of 97 or higher is observed it is very unlikely that it came from a distribution whose mean is 90. If the null hypothesis is true (μ=90), then we are likely to select a sample whose mean is close in value to 90. The Cohort or Cross-Sectional window opens. In studies where the plan is to estimate the difference in means between two independent populations, the formula for determining the sample sizes required in each comparison group is ni = 2(Zσ/E)², where ni is the sample size required in each group (i=1,2), Z is the value from the standard normal distribution reflecting the confidence level that will be used, σ is the standard deviation of the outcome variable, and E is the desired margin of error. Before presenting the formulas to determine the sample sizes required to ensure high power in a test, we will first discuss power from a conceptual point of view.
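The n=31 figure quoted above for the fasting glucose example can be reproduced with a short Python sketch. It assumes the one-sample formula n = ((Z1-α/2 + Z1-β)/ES)² and takes ES = 0.51, the effect size this text gives for the 95 versus 100 comparison. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_one_sample_test(effect_size, alpha=0.05, power=0.80):
    """Sample size for a two-sided one-sample test of a mean, given a
    standardized effect size (difference in means / standard deviation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Round up to guarantee at least the requested power.
    return math.ceil((z_sum / effect_size) ** 2)


n_glucose = n_one_sample_test(0.51)
print(n_glucose)
```

Because the effect size appears squared in the denominator, halving the detectable difference roughly quadruples the required sample size.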
Therefore, the manufacturer wants the test to have 90% power to detect a difference in proportions of this magnitude. A statistical test is much more likely to reject the null hypothesis in favor of the alternative if the true mean is 98 than if the true mean is 94. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. This leaves √n = Zσ/E. Finally, square both sides of the equation to get n = (Zσ/E)². This formula generates the sample size, n, required to ensure that the margin of error, E, does not exceed a specified value. Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The second type of error is called a Type II error, which occurs when we do not reject H0 when it is false. A two sided test will be used with a 5% level of significance. In studies where the plan is to estimate the mean of a continuous outcome variable in a single population, the formula for determining sample size is n = (Zσ/E)², where Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%), σ is the standard deviation of the outcome variable and E is the desired margin of error. National data suggest that 1 in 235 women are diagnosed with breast cancer by age 40. The inputs for the sample size formulas include the desired power, the level of significance and the effect size. With all other parameters equal to those specified above, sampsize returns a sample size of 226 case-control pairs (total sample size 452). Antibiotic therapy sometimes diminishes the normal flora in the colon to the point that C. difficile flourishes and causes infection with symptoms ranging from diarrhea to life-threatening inflammation of the colon. The plan is to enroll patients who suffer from migraine headaches.
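The single-mean formula described above, n = (Zσ/E)², can be sketched in Python. The standard deviation of 15 is taken from the text's remark that assuming σ = 15 would give n=35; the margin of error E = 5 is an assumption made here for illustration, since the text does not state it alongside that figure. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_for_mean_ci(sigma, margin, confidence=0.95):
    """Minimum n so that a confidence interval for a mean has a
    margin of error no larger than `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: the formula gives the minimum number needed.
    return math.ceil((z * sigma / margin) ** 2)


# sigma = 15 is from the text; margin = 5 is an assumed value.
n_sd15 = n_for_mean_ci(sigma=15, margin=5)
print(n_sd15)
```

With the assumed margin of 5, the sketch returns 35, consistent with the figure in the text. Note how sensitive n is to σ: the required sample size grows with the square of the assumed standard deviation.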
Each patient will then undergo the acupuncture treatment. The probability of a Type II error is denoted β, and β = P(Do not Reject H0 | H0 is false), i.e., the probability of not rejecting the null hypothesis if the null hypothesis were false. The two major factors affecting the power of a study are the sample size and the effect size. Sample size refers to the number of participants or observations included in a study. The sample size is computed as follows: A sample of size n=16,448 will ensure that a 95% confidence interval estimate of the prevalence of breast cancer is within 0.0010 (or to within 10 women per 10,000) of its true value. The formula for determining sample size to ensure that the test has a specified power is n = [(Z1-α/2 + Z1-β)/ES]², where α is the selected level of significance and Z1-α/2 is the value from the standard normal distribution holding 1-α/2 below it. Analysis of data from the Framingham Heart Study showed that the standard deviation of systolic blood pressure was 19.0. Studies that have either an inadequate number of participants or an excessively large number of participants are both wasteful in terms of participant and investigator time, resources to conduct the assessments, analytic efforts and so on. This may or may not be a reasonable assumption. N (number to enroll) * (% following protocol) = desired sample size. Suppose one such study compared the same diets in adults and involved 100 participants in each diet group. When we run tests of hypotheses, we usually standardize the data (e.g., convert to Z or t) and the critical values are appropriate values from the probability distribution used in the test.
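The breast cancer prevalence figures in this text can be checked with a short Python sketch. It assumes the usual margin-of-error expression for a proportion, E = Z √(p(1-p)/n), and its rearrangement n = p(1-p)(Z/E)². The value p = 0.0043 is the national figure of 1 in 235 rounded to four decimals; the function names are illustrative.

```python
import math
from statistics import NormalDist


def _z(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p, n, confidence=0.95):
    """Expected margin of error for a proportion p estimated from n subjects."""
    return _z(confidence) * math.sqrt(p * (1 - p) / n)


def n_for_proportion(p, margin, confidence=0.95):
    """Minimum n so the margin of error for a proportion is at most `margin`."""
    return math.ceil(p * (1 - p) * (_z(confidence) / margin) ** 2)


p = 0.0043  # 1 in 235, rounded as an assumption of this sketch
moe_5000 = margin_of_error(p, 5000)    # precision available with n = 5,000
n_needed = n_for_proportion(p, 0.0010)  # n for a margin of 10 per 10,000

print(f"margin with n=5000: {moe_5000:.4f}")
print(f"n for margin 0.0010: {n_needed}")
```

The sketch gives a margin of about 0.0018 with 5,000 women and a requirement of 16,448 women for a margin of 0.0010, matching the text. Using the unrounded 1/235 instead of 0.0043 would give a somewhat smaller n, so the rounding choice matters for rare outcomes.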
To plan this study, we can use data from the Framingham Heart Study. 1- β is the selected power, and Z 1-β is the value from the standard normal distribution holding 1- β below it. These data can be used to estimate the common standard deviation in weight lost as follows: We now use this value and the other inputs to compute the sample sizes: Samples of size n1=56 and n2=56 will ensure that the 95% confidence interval for the difference in weight lost between diets will have a margin of error of no more than 3 pounds. C-reactive protein, the metabolic syndrome and prediction of cardiovascular events in the Framingham Offspring Study. Fleiss JL. 2001; 27(2):163-171. A two sided test of hypothesis will be conducted, at α =0.05, to assess whether there is a statistically significant difference in pain scores before and after treatment. Compute the sample size required to estimate population parameters with precision. However, it is more often the case that data on the variability of the outcome are available from only one group, often the untreated (e.g., placebo control) or unexposed group. If the process produces more than 15% defective stents, then corrective action must be taken. Sample size estimates for hypothesis testing are often based on achieving 80% or 90% power. However, the investigators hypothesized a 10% attrition rate (in both groups), and to ensure a total sample size of 232 they need to allow for attrition. During a typical year, approximately 35% of the students experience flu. In the module on hypothesis testing for means and proportions, we introduced techniques for means, proportions, differences in means, and differences in proportions. The investigators planned to randomly assign patients with recurrent C. difficile infection to either antibiotic therapy or to duodenal infusion of donor feces. It is critical to understand that different study designs need different methods of sample size estimation. 
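The weight-loss computation described above (pool the two standard deviations, then compute the per-group sample size) can be sketched in Python. It assumes the pooled standard deviation Sp = √(((n1-1)s1² + (n2-1)s2²)/(n1+n2-2)) and the per-group formula ni = 2(Zσ/E)². The standard deviations 8.4 and 7.7, the 100 participants per diet group in the earlier study, and the 3-pound margin are the values quoted in this text; the function names are illustrative.

```python
import math
from statistics import NormalDist


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of a common standard deviation from two groups."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def n_per_group_diff_means(sigma, margin, confidence=0.95):
    """Per-group n so a confidence interval for a difference in means
    has a margin of error no larger than `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(2 * (z * sigma / margin) ** 2)


sp = pooled_sd(8.4, 100, 7.7, 100)
n_diet = n_per_group_diff_means(sp, margin=3)

print(f"Sp = {sp:.2f}")
print(f"n per group = {n_diet}")
```

This sketch keeps Sp unrounded (about 8.06) and arrives at 56 per group, the figure in the text. Rounding Sp up to one decimal before substituting can push the computed value just past 56, so it is worth deciding on a rounding convention in advance.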
If data are available on variability of the outcome in each comparison group, then Sp can be computed and used to generate the sample sizes. For example, suppose a study is proposed to evaluate a new screening test for Down Syndrome. The values of p1 and p2 that maximize the sample size are p1=p2=0.5. How many subjects will be needed in each group to ensure that the power of the study is 80% with a level of significance α = 0.05? In planning studies, we want to determine the sample size needed to ensure that the margin of error is sufficiently small to be informative. How to calculate a sample size: it is fairly easy to determine your desired sample size. The standard deviation of the outcome variable measured in patients assigned to the placebo, control or unexposed group can be used to plan a future trial, as illustrated. Using this estimate of p, what sample size is needed (assuming that again a 95% confidence interval will be used and we want the same level of precision)? Similar to the situation for two independent samples and a continuous outcome at the top of this page, it may be the case that data are available on the proportion of successes in one group, usually the untreated (e.g., placebo control) or unexposed group. Resolution of C. difficile infection occurred in only 4 of 13 patients (31%) receiving the antibiotic vancomycin. However, the estimate must be realistic. How many freshmen should be involved in the study to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion?
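The claim that a proportion of 0.5 maximizes the required sample size can be illustrated with a small Python sketch for the freshman smoking question (95% confidence, margin of 5%). It assumes the single-proportion formula n = p(1-p)(Z/E)² with Z = 1.96, as used throughout this text; the grid of candidate proportions and the names are illustrative.

```python
import math


def n_for_proportion(p, margin, z=1.96):
    """Minimum n for estimating a proportion p to within `margin`."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# Required n across a grid of assumed proportions, margin fixed at 0.05.
candidates = [i / 10 for i in range(1, 10)]
sizes = {p: n_for_proportion(p, 0.05) for p in candidates}
worst_p = max(sizes, key=sizes.get)

for p, n in sizes.items():
    print(f"p = {p:.1f}: n = {n}")
print(f"largest n at p = {worst_p}")
```

The largest requirement occurs at p = 0.5, where 385 freshmen are needed, so using 0.5 is the conservative choice when no prior estimate of the proportion is available.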
Recall from the module on confidence intervals that, when we generated a confidence interval estimate for the difference in means, we used Sp, the pooled estimate of the common standard deviation, as a measure of variability in the outcome (based on pooling the data), where Sp = √[((n1-1)s1² + (n2-1)s2²)/(n1+n2-2)]. If data are available on variability of the outcome in each comparison group, then Sp can be computed and used in the sample size formula. A recent report from the Framingham Heart Study indicated that 26% of people free of cardiovascular disease had elevated LDL cholesterol levels, defined as LDL > 159 mg/dL.9 An investigator hypothesizes that a higher proportion of patients with a history of cardiovascular disease will have elevated LDL cholesterol. The figure above graphically displays α, β, and power when the difference in the mean under the null as compared to the alternative hypothesis is 4 units (i.e., 90 versus 94). We first compute the effect size by substituting the proportions of students in each group who are expected to develop flu, p1=0.46 (i.e., 0.35*1.30=0.46) and p2=0.35, and the overall proportion, p=0.41 (i.e., (0.46+0.35)/2). Samples of size n1=324 and n2=324 will ensure that the test of hypothesis will have 80% power to detect a 30% difference in the proportions of students who develop flu between those who do and do not use the athletic facilities regularly. A sample of size n=32 patients with migraine will ensure that a two-sided test with α=0.05 has 80% power to detect a mean difference of 10 points in pain before and after treatment, assuming that all 32 patients complete the treatment. 1-β is the selected power, Z1-β is the value from the standard normal distribution holding 1-β below it, and ES is the effect size, defined as ES = |p1 - p0| / √(p0(1-p0)), where p0 is the proportion under H0 and p1 is the proportion under H1.
Sample sizes of ni=44 heavy drinkers and 44 who drink fewer than five drinks per typical drinking day will ensure that the test of hypothesis has 80% power to detect a 0.25 unit difference in mean grade point averages. An investigator wants to estimate the mean birth weight of infants born full term (approximately 40 weeks gestation) to mothers who are 19 years of age and under. The sample size computations depend on the level of significance, α, the desired power of the test (equivalent to 1-β), the variability of the outcome, and the effect size.