Scientific Methods for Health Sciences - Design of Experiments
IV. HS 850: Fundamentals
Design of Experiment
1) Overview: Design of experiments (DOE) is a systematic, rigorous approach to problem solving that applies principles and techniques at the data collection stage to ensure valid, supportable and defensible conclusions. Applied at the point of greatest leverage, it can reduce costs by speeding up the design process, reducing late engineering design changes, and reducing product material and labor complexity. It is also a powerful tool for achieving manufacturing cost savings by minimizing process variation and reducing rework and the need for inspection. This lecture presents a general overview of DOE and introduces some fundamental concepts, objectives, steps and design guidelines for conducting designed experiments.
2) Motivation: An experiment is the natural way to implement a study and achieve the desired objectives. The next question is how such experiments and studies can be realized: we need a blueprint for planning the study or experiment, including ways to collect data and to control study parameters for accuracy and consistency. What are the key factors in a process? At what settings would the process deliver acceptable performance? What are the main and interaction effects in the process, and what settings would produce less variation in the output? Design of experiments addresses these questions. Experiments can be designed in many different ways to learn which process inputs have a significant impact on the process output and what the target levels of those inputs should be to achieve a desired output. There are four general problem areas in which design of experiments may be applied:
- Comparative: the designer is interested in assessing whether a change in a single factor has in fact resulted in a change/improvement to the process as a whole.
- Screening and characterizing: the designer is interested in understanding the process as a whole in the sense that they can have a ranked list of the importance of factors that can affect the process.
- Modeling: the designer is interested in functionally modeling the process, with the output being a well-fitting mathematical function and good estimates of the coefficients in that function.
- Optimizing: the designer is interested in determining the optimal settings of the process factors, that is to determine for each factor the level of the factor that optimizes the process response.
3.1) The most common components in design of experiments:
- Comparison: in some fields of study it’s not possible to have independent measurements to a traceable standard and comparisons between treatments are much more valuable and preferable. To make inference about effects, associations or predictions, one typically has to compare different groups subjected to distinct conditions.
- Randomization: the process of assigning individuals at random to groups in an experiment. It requires allocating treatments (the controlled variables) to units using some random mechanism. Random does not mean haphazard, and great care needs to be taken to make sure appropriate random methods are used.
- Experimental vs. observational studies: there are many situations where randomized experiments are impractical. In those situations we cannot deduce causality or the effects of various treatments on the response measurement. Observational studies are retrospective or prospective studies where the investigator doesn’t have control over the randomization of treatments to subjects or units. In these cases, the subjects or units fall naturally within a treatment group.
- Replication: all measurements, observations or data collection are usually subject to variation and uncertainty. Measurements are repeated and full experiments are replicated to help identify the sources of variation, to better estimate the true effects of treatments, to further strengthen the experiment’s reliability and to add to the existing knowledge of the topic.
- Blocking: the arrangement of experimental units into groups consisting of units that are similar to one another. It reduces known but irrelevant sources of variation between units and allows greater precision in the estimation of the source of variation under study.
- Orthogonality: this concerns the forms of contrasts that can be legitimately and efficiently carried out. With independence between contrasts, each orthogonal treatment provides different information from the others. The goal is to completely decompose the variance or the relations of the observed measurements into independent components.
- Factorial experiments: these are more efficient than one-factor-at-a-time experiments at evaluating the effects and possible interactions of several factors. DOE is built on the foundation of the analysis of variance, which partitions the observed variance into components according to the factors the experiment must estimate or test.
- Placebo: a sham or simulated medical intervention that has no direct health impact but may result in actual improvement of a medical condition or disorder. If such a sham effect is observed, it is called a placebo effect. Common placebos are inert tablets, sham surgery and other procedures based on false information. An example is giving a patient a pill identical to the actual treatment pill but without the treatment ingredients. Typically all patients are informed that some will be treated with the drug and some will receive the inert pill; however, the patients are blinded as to whether they actually received the drug or the placebo. Such an intervention may cause patients to believe the treatment will change their condition, which may produce a subjective perception of a therapeutic effect.
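The randomization and blocking components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomize_within_blocks`, the patient IDs and the age-group blocks are all hypothetical, and a fixed seed is used so the allocation can be reproduced.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=0):
    """Assign treatments at random within each block of similar units.

    blocks: dict mapping a block label to a list of unit IDs.
    treatments: list of treatment labels; each block size must be a
    multiple of len(treatments) so the design stays balanced.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    assignment = {}
    for label, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {label!r} cannot be balanced")
        # Repeat the treatment labels to match the block size, then shuffle.
        labels = treatments * (len(units) // len(treatments))
        rng.shuffle(labels)
        for unit, treatment in zip(units, labels):
            assignment[unit] = treatment
    return assignment

# Hypothetical example: eight patients blocked by age group.
blocks = {
    "under_50": ["p1", "p2", "p3", "p4"],
    "50_plus": ["p5", "p6", "p7", "p8"],
}
allocation = randomize_within_blocks(blocks, ["drug", "placebo"], seed=42)
for unit, treatment in sorted(allocation.items()):
    print(unit, treatment)
```

Shuffling inside each block, rather than across all units at once, guarantees that every block receives each treatment equally often, so the blocking variable cannot be confounded with the treatment.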
3.2) Components of DOE:
- Factors (inputs): these include controllable and uncontrollable variables. The former are factors we can control, such as the size of the dose or how often patients take the treatment. The latter are factors we have no power over, such as environmental factors: air conditions, temperature or humidity. People are generally considered a noise factor, that is, an uncontrollable factor that causes variability under normal operating conditions but that we can control during the experiment using blocking and randomization.
- Levels (settings of each factor): examples include the particular dosage levels chosen for evaluation.
- Response (output): consider a test of a new drug. The output could be the frequency with which patients have a stroke or their need for drugs. Experimenters usually want to avoid optimizing the process for one response at the expense of another, so all important outcomes are measured and analyzed to determine the factors and settings that provide the best overall outcome.
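To make the relationship between factors, levels and runs concrete, the following sketch enumerates a full factorial design, in which every level of each factor is combined with every level of the others. The factor names and levels (`dose_mg`, `doses_per_day`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "dose_mg": [50, 100],
    "doses_per_day": [1, 2, 3],
}

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

runs = full_factorial(factors)
for i, run in enumerate(runs, start=1):
    print(i, run)
```

With a 2-level factor and a 3-level factor the design has 2 x 3 = 6 runs; the response would be measured once per run (or several times per run if the design is replicated).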
3.3) Objectives of DOE:
- Comparing alternatives: DOE allows us to make an informed decision that evaluates both the quality and the cost.
- Identifying significant factors that affect the output: separating the vital few from the trivial many.
- Achieving an optimal process output.
- Reducing variability.
- Minimizing, maximizing or targeting an output.
- Improving process or product robustness so that performance holds up under varying conditions.
- Balancing tradeoffs between multiple quality characteristics that require optimization.
3.4) DOE guidelines: the design should address the questions outlined above by stipulating the factors to be tested, the levels of those factors, and the structure and layout of the experimental conditions. In short, DOE aims to produce an experiment that obtains the required information in a cost-effective and reproducible manner. Issues to consider include:
- Unexplained variation: in addition to measurement error, unexplained variation can obscure the results. Error here means unexplained variation, which can arise either within or between experimental runs.
- Noise factors: uncontrollable factors that induce variation under normal operating conditions. Factors such as multiple shifts, humidity or raw materials can be built into the experiment so that their variation is not lumped into unexplained error.
- Correlation: two factors that vary together may be highly correlated without one causing the other; both may instead be caused by a third factor.
- Combined effects or interactions: consider growing roses. Sufficient water benefits growth, but too much water may be harmful. Factors may have non-linear effects that are not additive, and these can only be studied with more complex experiments involving more than two level settings, which allow quadratic or cubic terms to be fitted.
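Curvature needs more than two levels, but an interaction between two factors can already be estimated from a two-level factorial. The sketch below computes the main effects and the interaction effect for a 2x2 design. The response values are hypothetical, and the levels are coded -1 (low) and +1 (high), a common convention that is assumed here rather than taken from the text.

```python
# Hypothetical mean responses for a 2x2 factorial; keys are (A level, B level),
# coded -1 for the low setting and +1 for the high setting.
response = {(-1, -1): 20.0, (+1, -1): 30.0, (-1, +1): 25.0, (+1, +1): 45.0}

def effect(response, contrast):
    """Average response where the contrast is +1 minus average where it is -1."""
    high = [y for levels, y in response.items() if contrast(*levels) > 0]
    low = [y for levels, y in response.items() if contrast(*levels) < 0]
    return sum(high) / len(high) - sum(low) / len(low)

main_a = effect(response, lambda a, b: a)              # effect of factor A
main_b = effect(response, lambda a, b: b)              # effect of factor B
interaction_ab = effect(response, lambda a, b: a * b)  # A x B interaction

print(main_a, main_b, interaction_ab)  # prints 15.0 10.0 5.0
```

A non-zero interaction effect means the effect of A depends on the level of B: in this made-up data, raising A adds 10 units when B is low but 20 units when B is high.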
3.5) DOE process:
3.6) Test of means – one factor experiment: One of the most common types of experiments is the comparison of two process methods or two methods of treatment. A straightforward way to evaluate a new process method is to plot the results on an SPC (statistical process control) chart that also includes historical data from the baseline process, with established control limits, and then apply the standard rules for detecting out-of-control conditions to see whether the process has shifted. You may need to collect several subgroups' worth of data to make a determination, although a single subgroup could fall outside the existing control limits. An alternative to the control chart approach is to use an F-test to compare the means of alternate treatments; this is done automatically by ANOVA (analysis of variance). Consider the following example, in which three treatments are analyzed with the following data:
| ||A (usual route)||B (alternate)||C (alternate)||Variance||Mean|
|Time in minutes||27.0||26.0||29.5|
|Mean \(\bar{Y}\)||29.4||28.3||26.6||1.99|
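As a quick check on the table, the 1.99 entry in the Mean row matches the sample variance of the three treatment means. The sketch below reproduces it with Python's standard `statistics` module. The average of the three means is also shown, but it equals the grand mean of all observations only if the three routes have the same number of observations, which is an assumption here.

```python
from statistics import mean, variance

# Treatment means in minutes, taken from the table above.
treatment_means = {"A": 29.4, "B": 28.3, "C": 26.6}
values = list(treatment_means.values())

mean_of_means = mean(values)          # equals the grand mean only for equal group sizes
variance_of_means = variance(values)  # sample variance (divides by a - 1 = 2)

print(round(mean_of_means, 2), round(variance_of_means, 2))  # prints 28.1 1.99
```

The spread of the treatment means is what the numerator of the F-ratio measures; whether it is large depends on how it compares with the variation within each route.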
The F-test analysis is the basis for model evaluation of both single factor and multi-factor experiments. This analysis is commonly output as an ANOVA table by statistical analysis software, as illustrated in the table below:
|ANOVA - Analysis of Variance Table|
|Source||Sum of Squares||DF||Mean Square||F-Ratio||Probability||Significant|
A probability of 0.0408 means there is only a 4.08% probability that a model F-ratio this large could occur due to noise (random chance). In other words, the three routes differ significantly in the time taken to reach home from work.
ANOVA: \(H_0: \mu_1 = \cdots = \mu_a\) vs. \(H_a\): at least one mean is different.
|Source||Sum of Squares||DF||Mean Square||F-Ratio|
|Between Groups||\(SS_{treatment}=\sum_{i=1}^{a} n_i (\bar{y}_{i.}-\bar{y}_{..})^2\)||\(a-1\)||\(MS_{treatment}=SS_{treatment}/(a-1)\)||\(F=MS_{treatment}/MS_{error}\)|
We reject the null hypothesis of equal treatment means if \(F_0 > F_{\alpha, a-1, a(n-1)}\), where \(F_0\) is the observed F-ratio. Note: \(a\) is the number of treatments, \(n_i\) is the sample size of the \(i\)-th group (with \(n_i = n\) for every group in the balanced case), \(\alpha\) is the level of significance, \(y_{ij}\) is the \(j\)-th observation from group \(i\), \(\bar{y}_{..}\) is the grand mean of all the observations, and \(\bar{y}_{i.}\) is the mean of the \(i\)-th treatment group. ANOVA is studied further in the later ANOVA section.
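The F-ratio can be computed directly from these definitions. The sketch below is a minimal pure-Python version, not a substitute for a statistics package. It uses the standard within-group quantities \(SS_{error}=\sum_i \sum_j (y_{ij}-\bar{y}_{i.})^2\) and \(MS_{error}=SS_{error}/(N-a)\), where \(N\) is the total number of observations (so \(N-a=a(n-1)\) in the balanced case). The function name `one_way_anova` and the data are hypothetical.

```python
def one_way_anova(groups):
    """Return (SS_treatment, SS_error, F) for a list of groups of measurements."""
    a = len(groups)                        # number of treatments
    n_total = sum(len(g) for g in groups)  # total number of observations N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: n_i * (group mean - grand mean)^2.
    ss_treatment = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: (observation - its group mean)^2.
    ss_error = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    ms_treatment = ss_treatment / (a - 1)
    ms_error = ss_error / (n_total - a)
    return ss_treatment, ss_error, ms_treatment / ms_error

# Hypothetical data: three treatments with three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_treatment, ss_error, f_ratio = one_way_anova(groups)
print(ss_treatment, ss_error, f_ratio)
```

For this made-up data the sums of squares are 26 and 6, giving an F-ratio of 13 on 2 and 6 degrees of freedom; the decision rule above then compares that value with the critical value \(F_{\alpha,2,6}\) taken from an F table or a statistics library.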
- 4.1) This article (http://www.cancer.org/healthy/stayawayfromtobacco/index) presents an observational study of the effects of smoking on cancer. It describes various harmful effects of tobacco on human health and provides a guide to quitting smoking. The article presents the whole study in ten sections, each fully developed in a clear question-and-answer format. It is well organized and gives readers enough knowledge of the reasons for quitting smoking, as well as suggestions and programs to help people quit. This is a typical observational study.
- 4.2) This article (http://arc.aiaa.org/doi/abs/10.2514/2.3153?journalCode=ja) brings together an empirical drag prediction model, design of experiments, response surface and data-fusion methods, and computational fluid dynamics (CFD) to provide a wing optimization system. The system allows high-quality designs to be found using a full three-dimensional CFD code without the expense of direct searches. The meta-models built are shown to be more accurate than the initial empirical model or simple response surfaces based on the CFD data alone. Data fusion is achieved by building a response surface kriging of the differences between the two drag prediction tools, which work at different levels of fidelity. The kriging model is then used with the empirical tool to predict the drag values coming from the CFD code, which is much quicker than direct searches of the CFD.
- 4.3) This article (http://www.tandfonline.com/doi/abs/10.1080/01621459.1972.10481253#.U6HGyBZRXKw) illustrates certain numerical approximations for finding one- and two-stage bioassay designs that produce small posterior variance using a one-parameter logistic distribution. It discusses the use of two prior distributions, one for design and the other for inference, with graphs for designing experiments when the prior distributions are normal. These graphs illustrate the importance of using additional dose levels when the variance of the prior distribution is large.
6) Problems
6.1) Suppose two researchers wanted to determine if aspirin reduced the chance of a heart attack. Researcher 1 studied the medical records of 500 patients. For each patient, he recorded whether the person took aspirin every day and if the person had ever had a heart attack. Then he reported the percentage of heart attacks for the patients who took aspirin every day and for those who did not take aspirin every day. Researcher 2 also studied 500 people. He randomly assigned half of the patients to take aspirin every day and the other half to take a placebo every day. After a certain length of time, he reported the percentage of heart attacks for the patients who took aspirin every day and for those who did not take aspirin every day. Suppose that both researchers found that there is a statistically significant difference in the heart attack rates for the aspirin users and the non-aspirin users and that aspirin users had a lower rate of heart attacks. Can both researchers conclude that aspirin caused the reduction?
(a) No, only researcher 2 can conclude this.
(b) No, only researcher 1 can conclude this.
(c) Yes, because aspirin is known to reduce heart attacks.
(d) Yes, because aspirin users had a larger heart attack rate in both studies.
6.2) Suppose that you were hired as a statistical consultant to design a study to examine the impact of a new medicine vs. a current medicine on lowering blood pressure. 50 patients volunteer to participate in the study. What design will you recommend?
(a) Completely randomized design with two factors.
(b) Completely randomized design with two factors and single blind.
(c) Completely randomized design.
(d) Completely randomized design with two factors and double blind.
6.3 – 6.6 are based on the following: Hospital floors are usually covered by bare tiles. Carpets would cut down on noise but might be more likely to harbor germs. To study this possibility, investigators randomly assigned 8 of 16 available hospital rooms to have carpet installed. The others were left bare. Later, air from each room was pumped over a dish of agar. The dish was incubated for a fixed period, and the number of bacteria colonies was counted.
6.3) Select the appropriate statistical term for the 8 rooms left bare.
(b) Experimental Units
(c) Control Group
6.4) Select the appropriate statistical term for the 16 hospital rooms.
(c) Experimental Units
(d) Control Group
6.5) Select the appropriate statistical term for number of colonies in a dish.
(b) Control Group
(d) Experimental Units
6.6) Select the appropriate statistical term for number of colonies in a dish.
(c) Experimental Units
(d) Control Group
6.7) A psychologist is examining the effect of showing pictures on learning of words by seven-year-olds. The seven-year-olds are randomly assigned to two groups. The experimental group is shown the word along with the picture. The control group is shown only the word. At the end of the experiment, the subjects are given a test on the number of words they get right. This is an example of:
(a) A blind study
(b) An experiment with a design flaw
(c) A double blind study
(d) A well-designed experiment
6.8) Suppose that students A and B are working for the university. The registrar asks student A to calculate the mean and SD of the GPA's for the Fall 2005 freshmen class. He asks student B to design a sampling strategy to evaluate the attitude of the undergraduates at the university toward undergraduate teaching.
(a) Student A is doing descriptive statistics and student B is doing inferential statistics.
(b) Student A is doing inferential statistics and student B is doing descriptive statistics.
(c) Both students are doing descriptive statistics.
(d) Both students are doing inferential statistics.
6.9) At the Department of Statistics, we intend to examine the effect of using computers in Statistics 10 on the attitudes of students toward statistics. We offer ten lectures of Statistics 10 in an academic year. Five of these sections are randomly assigned to the experimental group and the other five are assigned to the control group. The experimental group will go to lecture, section, and computer lab. The control group will only go to lecture and section, but will not do the computer lab. The attitude of the students toward statistics is measured before and after the course. This study is:
(a) A double blind study
(b) A well-designed experiment
(c) A blind study
(d) Not a randomized experiment
6.10) An office manager wonders whether there is any relationship between drinking coffee before 10 am and alertness. He selects at random 3 days of the week, and in those days, he compared the alertness level of 25 employees who usually drink coffee before 10 am and 25 employees who do not usually drink coffee before 10 am. Is this an observational or experimental study?
(a) We need more information to decide
(b) This is an experimental study
(c) This is an observational study
(d) This is a combination of experimental and observational study
6.11) A major car manufacturing company intends to find out if cars get better mileage with premium instead of regular unleaded gasoline. They also would like to know if the size of the car has any effect on fuel economy. 96 volunteers who are similar in age, experience and style of driving participate in the study. The drivers are randomly assigned to the premium and regular groups. The drivers assigned to the premium and regular groups are then randomly assigned to drive a small, medium, or large car. All of the drivers are asked to keep a driving log. What is the design used for this study?
(a) randomized block design
(b) Completely randomized two factor experiment
(c) Completely randomized experiment with one factor
(d) Completely randomized experiment with matching
6.12) For this research situation, decide what statistical procedure would most likely be used to answer the research question posed. Assume all assumptions have been met for using the procedure. Is ethnicity related to political party affiliation (Republican, Democrat, Other)?
(a) Test the difference in means between two paired or dependent samples.
(b) Use a chi-squared test of association.
(c) Test one mean against a hypothesized constant.
(d) Test the difference between two means (independent samples).
(e) Test for a difference in more than two means (one way ANOVA).
6.13) In a large mid-western university with 30 different departments, the university is considering eliminating standardized scores from their admission requirements. The university wants to find out whether the students agree with this plan. They decide to randomly select 100 students from each department, send them a survey, and follow up with a phone call if they do not return the survey within a week. What kind of sampling plan did they use?
(a) Stratified random sampling
(b) Simple random sampling
(c) Cluster sampling
(d) Multi-stage sampling
Answers: a, d, c, c, c, a, d, a, d, c, a, b, a