SMHS DesignOfExperiments


Scientific Methods for Health Sciences - Design of Experiments


Design of experiments (DOE) is a systematic, rigorous approach to problem solving that applies principles and techniques during the data collection stage so as to ensure the generation of valid, supportable and defensible conclusions. Design of experiments can be used at the point of greatest leverage to reduce costs by speeding up the design process, reducing late engineering design changes and reducing product material and labor complexity. It also includes powerful tools to achieve manufacturing cost savings by minimizing process variation, rework and the need for inspection. This chapter presents a general overview of DOE and an introduction to some fundamental concepts, objectives, steps and design guidelines to assist in conducting designed experiments.


An experiment is the natural way to implement a study and achieve its objectives. The next question is how such experiments and studies can be realized. We need a blueprint for planning the study or experiment, including ways to collect data and to control study parameters for accuracy and consistency. What are the key factors in a process? At what settings would the process deliver acceptable performance? What are the main and interaction effects in the process, and what settings would produce less variation in the output?

Design of experiments is the way to answer these questions. Experiments can be designed in many different ways to collect information about which process inputs have a significant impact on the process output and what the target level of those inputs should be to achieve a desired output. There are four general problem areas in which DOE may be applied:

  • Comparative: The designer is interested in assessing whether a change in a single factor has in fact resulted in a change/improvement to the process as a whole.
  • Screening and characterizing: The designer is interested in understanding the process as a whole in the sense that they can rank the importance of factors that affect the process.
  • Modeling: The designer is interested in modeling the process with a mathematical function that fits the output well, and in obtaining good estimates of the coefficients in that function.
  • Optimizing: The designer is interested in determining the optimal settings of the process factors so as to determine the level of each factor that optimizes the process response.


The most common components in design of experiments (DOE):

  • Comparison: In some fields of study, it’s not possible to have independent measurements to a traceable standard, making comparisons between treatments much more valuable and preferable. To make inference about effects, associations or predictions, one typically has to compare different groups that are subjected to distinct conditions.
  • Randomization: The process of assigning individuals at random to groups in an experiment. It requires that treatments (the controlled variables) be allocated to units using some random mechanism. Random does not mean haphazard, and great care needs to be taken to ensure that appropriate random methods are used.
  • Experimental vs. observational studies: There are many situations in which randomized experiments are impractical. In those situations, we cannot deduce causality or the effects of various treatments on the response measurement. Observational studies are retrospective or prospective studies where the investigator doesn’t have control over the randomization of treatments to subjects or units. In these cases, the subjects or units fall naturally within a treatment group.
  • Replication: All measurements, observations or data collection are usually subject to variation and uncertainty. They are repeated and full experiments are replicated to help identify the sources of variation. This helps to better estimate the true effects of treatments, to further strengthen the experiment’s reliability, and to add to the existing knowledge of the topic.
  • Blocking: The arrangement of experimental units into groups consisting of units that are similar to one another. It reduces known but irrelevant sources of variation between units and allows greater precision in the estimation of the source of variation under study.
  • Orthogonality: This concerns the forms of contrasts that can be legitimately and efficiently carried out. With independence between contrasts, each orthogonal treatment provides different information to the others. The goal is to completely decompose the variance or the relations of the observed measurements into independent components.
  • Factorial experiments: These are more efficient at evaluating the effects and possible interactions of several factors. DOE is built on the analysis of variance, which partitions the observed variance into components according to which factors the experiment must estimate or test.
  • Placebo: This is a sham or simulated medical intervention that has no direct health impact, but which may result in actual improvement of a medical condition or disorder; this improvement is called the placebo effect. Common placebos are inert tablets, sham surgery and other procedures based on false information. An example could be giving a patient a pill identical to the actual treatment pill but without treatment ingredients. Typically all patients are informed that some will be treated using the drug and some will receive the inert pill; however, the patients are blinded as to whether they actually received the drug or the placebo. Such an intervention may cause the patient to believe the treatment will change their condition, which may produce a subjective perception of a therapeutic effect.
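
The randomization and blocking components above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the patient identifiers, the age-group blocks, and the function name `randomize_within_blocks` are all hypothetical, and a real study would use its own allocation protocol.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments at random, separately within each block.

    blocks: dict mapping a block name to a list of unit identifiers.
    treatments: list of treatment labels.
    Returns a dict mapping each unit to (block name, treatment).
    """
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    assignment = {}
    for block_name, units in blocks.items():
        # Repeat the treatment labels so each appears (nearly) equally often.
        labels = [treatments[i % len(treatments)] for i in range(len(units))]
        rng.shuffle(labels)  # the random mechanism, applied within the block
        for unit, label in zip(units, labels):
            assignment[unit] = (block_name, label)
    return assignment

# Hypothetical example: 8 patients blocked by age group, two treatments.
blocks = {
    "under_50": ["p1", "p2", "p3", "p4"],
    "50_plus": ["p5", "p6", "p7", "p8"],
}
assignment = randomize_within_blocks(blocks, ["drug", "placebo"], seed=42)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Shuffling within each block, rather than across all units at once, keeps the treatments balanced inside every block, so the block-to-block variation does not get mixed into the treatment comparison.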

Components of DOE

  • Factors (inputs): Include controllable and uncontrollable variables. The former refers to factors that we can control (e.g., the size of a dose or the frequency with which a treatment is taken by patients). The latter refers to factors we cannot control (e.g., environmental factors like air condition, temperature or humidity). Human beings are generally considered a noise factor, which is an uncontrollable factor that causes variability under normal operating conditions; yet we can control these factors during the experiment using blocking and randomization.
  • Levels (settings of each factor): One example would be the particular level of dosage for evaluation.
  • Response (output): Consider testing a new drug. The output could be the frequency with which patients require treatment with drugs. Experiments should avoid optimizing the process for one response at the expense of another, and important outcomes are measured and analyzed to determine the factors and their settings in a way that will provide the best overall outcome.
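
To make the relationship between factors and levels concrete, the sketch below enumerates every run of a full factorial design. The factor names and level values (dose in mg, doses per day) are assumptions chosen only for illustration, loosely following the dose and frequency examples above.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
factors = {
    "dose_mg": [50, 100, 200],
    "frequency_per_day": [1, 2],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

runs = full_factorial(factors)
for run in runs:
    print(run)
print("number of runs:", len(runs))
```

The number of runs is the product of the number of levels of each factor (here 3 x 2 = 6), which is why the run count grows quickly as factors or levels are added.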

Objectives of DOE

  • Comparing alternatives (i.e., DOE allows us to make an informed decision that evaluates both the quality and the cost)
  • Identifying significant factors that affect the output (i.e., separating the vital few from the trivial many)
  • Achieving an optimal process output
  • Reducing variability
  • Minimizing, maximizing or targeting an output
  • Improving process or product robustness so that performance holds up under varying conditions
  • Balancing tradeoffs between multiple quality characteristics that require optimization

DOE guidelines

  • DOE guidelines address the questions outlined above by stipulating factors to be tested, as well as the levels of the factors and the structure and layout of the experimental conditions. To sum up, DOE aims to come up with an experiment that can obtain the required information in a cost effective and reproducible manner.
  • Unexplained variation, in addition to measurement error, can obscure the results. Errors can be unexplained variations that occur either within or between experiment runs.
  • Noise factors are uncontrollable factors that induce variation under normal operating conditions. For example, multiple shifts, humidity or raw materials can be built into the experiment so that variations will not be lumped together with unexplained errors.
  • Correlation: Two factors that vary together may be highly correlated without one causing the other. They may also both be caused by a third factor.
  • The combined effects or interactions: Consider growing roses. Sufficient water will be required for their growth, though too much water may also be harmful for them. Factors may generate non-linear effects that are not additive, but these can only be studied with more complex experiments involving more than 2 level settings (e.g., quadratic or cubic experiments).
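
The point about needing more than two level settings can be illustrated with the rose example. The response function below is purely hypothetical (growth peaks at an intermediate amount of water, in arbitrary units); it is a sketch of why a two-level experiment can miss a non-linear effect, not a model of real plants.

```python
def growth(water):
    """Hypothetical response curve: growth peaks at water = 2 (arbitrary units)."""
    return 10.0 - (water - 2.0) ** 2

low, mid, high = 1.0, 2.0, 3.0

# A two-level experiment only sees the difference between the extremes.
linear_effect = growth(high) - growth(low)

# Adding a middle level exposes curvature: the center response compared
# with the average of the two end-point responses.
curvature = growth(mid) - (growth(low) + growth(high)) / 2

print("two-level effect estimate:", linear_effect)
print("curvature seen with three levels:", curvature)
```

With only the low and high settings, the estimated effect of water is zero even though water clearly matters; the third level is what reveals the quadratic shape.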

DOE process

[Figure 1: DOE process]

Test of means – one factor experiment: One of the most common types of experiments is the comparison of two process methods, or of two methods of treatment. A straightforward way to evaluate a new process method is to plot the results on a statistical process control (SPC) chart that also includes historical data from the baseline process with established control limits, and then apply the standard rules for out-of-control conditions to see whether the process has shifted. Several subgroups of data may be needed to make a determination, although a single subgroup could fall outside the existing control limits. An alternative to the control chart approach is to use an F-test to compare the means of alternate treatments. This is done automatically with ANOVA (analysis of variance). Consider the following example, in which the travel times of three routes (treatments) are compared:

                     A (usual route)   B (alternate)   C (alternate)   Variance (of means)   Mean (of variances)
Time in Minutes      27.0              26.0            29.5
                     31.03             33.0            25.0
                     28.5              26.5            28.5
                     26.0              27.5            25.5
                     27.5              29.0            24.0
                     29.0              27.5            24.0
                     33.0              26.5            28.0
                     35.0              27.0            26.0
                     28.0              28.0            25.5
                     29.0              32.0            26.5
Mean \((\bar Y)\)    29.4              28.3            26.6            1.99
Variance \(s^2\)     7.9               5.7             3.0                                   5.51

The F-test analysis is the basis for model evaluation of both single factor and multi-factor experiments. This analysis is commonly output as an ANOVA table by statistical analysis software, as illustrated in the table below:

ANOVA - Analysis of Variance Table
Source           Sum of Squares   DF   Mean Square   F-Ratio   Probability   Significant
Between Groups   39.80            2    19.90         3.61      0.0408        Yes
Within Groups    148.90           27   5.51
Total            188.70           29

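The probability value of 0.0408 means that, if the three routes truly had equal mean travel times, there would be only a 4.08% probability of observing an F-ratio this large due to noise (random chance) alone. Because this is below the conventional 5% significance level, we conclude that the three routes differ significantly in the time taken to reach home from work.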
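
As a check on the arithmetic, the mean squares, F-ratio and probability in the ANOVA table can be recomputed from the sums of squares and degrees of freedom. The sketch below uses only the Python standard library; the function names are illustrative, and the closed-form tail probability is an assumption-bound shortcut that holds only when the numerator has exactly 2 degrees of freedom, as it does here with three groups.

```python
def anova_from_sums_of_squares(ss_between, ss_within, df_between, df_within):
    """Return (MS between, MS within, F-ratio) from the table's SS and DF."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

def f_upper_tail_two_numerator_df(f_ratio, df_within):
    """P(F > f_ratio) for an F distribution with (2, df_within) degrees of freedom.

    Valid ONLY for 2 numerator degrees of freedom, where the tail
    probability reduces to (1 + 2F/d2) ** (-d2/2).
    """
    return (1.0 + 2.0 * f_ratio / df_within) ** (-df_within / 2.0)

# Values taken from the ANOVA table above.
ms_between, ms_within, f_ratio = anova_from_sums_of_squares(39.80, 148.90, 2, 27)
p_value = f_upper_tail_two_numerator_df(f_ratio, 27)

print("MS between:", round(ms_between, 2))
print("MS within:", round(ms_within, 2))
print("F-ratio:", round(f_ratio, 2))
print("probability:", round(p_value, 4))
```

For designs with more than three groups, the tail probability would normally come from a statistics library rather than this special-case formula.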

ANOVA: \(H_0: \mu_1=\cdots=\mu_a\) vs. \(H_a\): at least one mean is different

Source   Sum of Squares   DF   Mean Square   F-Ratio
Between Groups   $SS_{treatment} = \sum_{i=1}^{a} n_i (\bar y_{i.}-\bar y_{..})^2$   $a-1$   $MS_{treatment}=\frac{SS_{treatment}}{a-1}$   $F=\frac{MS_{treatment}}{MS_{error}}$
Within Groups   $SS_{error} = SS_{total} - SS_{treatment}$   $N-a$   $MS_{error}=\frac{SS_{error}}{N-a}$
Total   $SS_{total} = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij}-\bar y_{..})^2$   $N-1$

We reject the null hypothesis of equal treatment means if $F>F_{α,a-1,N-a}$. For a balanced design with $n$ observations in each group, $N-a=a(n-1)$.

Note: $a$ is the number of treatments, $n_i$ is the sample size of the $i^{th}$ group, $N$ is the total number of observations, α is the level of significance, $y_{ij}$ is the $j^{th}$ observation from group $i$, $\bar{y}_{..}$ is the grand mean of all the observations, and $\bar{y}_{i.}$ is the mean of the $i^{th}$ treatment group. This approach is discussed further in the ANOVA section.
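
The sums of squares, degrees of freedom and F-ratio defined above translate directly into code. The following is a minimal standard-library sketch; the function name `one_way_anova` and the small data set are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of groups (each a list of numbers)."""
    a = len(groups)                                   # number of treatments
    n_total = sum(len(g) for g in groups)             # N, total observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # SS_treatment: group sizes times squared deviations of group means.
    ss_treatment = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # SS_total: squared deviations of every observation from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
    ss_error = ss_total - ss_treatment

    ms_treatment = ss_treatment / (a - 1)
    ms_error = ss_error / (n_total - a)
    return {
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_treatment": a - 1,
        "df_error": n_total - a,
        "f_ratio": ms_treatment / ms_error,
    }

# Hypothetical data: three treatment groups with three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(name, round(value, 4))
```

The resulting F-ratio would then be compared with the critical value $F_{α,a-1,N-a}$ from an F table or a statistics library to decide whether to reject the null hypothesis.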


  • This article presents an observational study of the effects of smoking on cancer. It presents various side effects of tobacco on human health and provides a guide to quitting smoking. The article lays out the entire study in ten sections, each fully developed in a clearly stated question-and-answer format. The article is well organized and gives readers enough knowledge of the reasons for quitting smoking, as well as suggestions and programs to help them achieve this goal. This is a typical observational study.
  • This article brought together an empirical drag prediction model plus design of experiment, response surface, and data-fusion methods with computational fluid dynamics (CFD) to provide a wing optimization system. The system presented here allows high-quality designs to be found using a full three-dimensional CFD code without the expense of direct searches. The meta-models built are shown to be more accurate than either the initial empirical model or simple response surfaces based on the CFD data alone. Data fusion is achieved by building a response surface kriging of the differences between the two drag prediction tools, which are working at varying levels of fidelity. It then uses kriging with empirical tools to predict the drags coming from the CFD code, which is much quicker to use than direct searches of the CFD.
  • This article illustrates certain numerical approximations for finding one- and two-stage bioassay designs that produce a small posterior variance using a one-parameter logistic distribution. It discusses the use of two prior distributions, one for design and the other for inference, with graphs for designing experiments when the prior distributions are normal. These graphs illustrate the importance of using additional dose levels when the variance of the prior distribution is large.



Suppose two researchers want to determine if aspirin reduces the chance of a heart attack. Researcher 1 studied the medical records of 500 patients, and she recorded whether each patient took aspirin every day and if the person had ever had a heart attack. Then she reported the percentage of heart attacks for the patients who took aspirin every day and for those who did not take aspirin every day.

Researcher 2 also studied 500 people. She randomly assigned half of the patients to take aspirin every day and the other half to take a placebo instead. After a certain length of time, she reported the percentage of heart attacks for the patients who took aspirin every day and for those who did not take aspirin every day.

Now suppose that both researchers found that there is a statistically significant difference in the heart attack rates for the aspirin users and the non-aspirin users, and that aspirin users had a lower rate of heart attacks. Can both researchers conclude that aspirin caused the reduction?

(a) No, only researcher 2 can conclude this.
(b) No, only researcher 1 can conclude this.
(c) Yes, because aspirin is known to reduce heart attacks.
(d) Yes, because aspirin users had a larger heart attack rate in both studies.

Suppose that you were hired as a statistical consultant to design a study to examine the impact of a new medicine vs. a current medicine on lowering blood pressure. 50 patients volunteer to participate in the study. What design will you recommend?

(a) Completely randomized design with two factors.
(b) Completely randomized design with two factors and single blind.
(c) Completely randomized design.
(d) Completely randomized design with two factors and double blind.

The next four questions are based on the following:

Hospital floors are usually covered by bare tiles. Carpets would cut down on noise but might be more likely to harbor germs. To study this possibility, investigators randomly assigned 8 of 16 available hospital rooms to have carpets installed, while the others were left bare. Later, air from each room was pumped over a dish of agar. The dish was incubated for a fixed period, and the number of bacteria colonies was counted.

  • 1. Select the appropriate statistical term for the 8 rooms left bare.
    (a) Treatments
    (b) Experimental Units
    (c) Control Group
    (d) Response
  • 2. Select the appropriate statistical term for the 16 hospital rooms.
    (a) Response
    (b) Treatments
    (c) Experimental Units
    (d) Control Group
  • 3. Select the appropriate statistical term for number of colonies in a dish.
    (a) Treatments
    (b) Control Group
    (c) Response
    (d) Experimental Units
  • 4. Select the appropriate statistical term for installing carpet versus leaving the floor bare.
    (a) Treatments
    (b) Response
    (c) Experimental Units
    (d) Control Group

A psychologist is examining the effect of showing pictures on learning of words by seven-year-olds. The seven-year-olds are randomly assigned to two groups. The experimental group is shown the word along with the picture. The control group is shown only the word. At the end of the experiment, the subjects are given a test on the number of words they get right. This is an example of:

(a) A blind study
(b) An experiment with a design flaw
(c) A double blind study
(d) A well-designed experiment

Suppose that students A and B are working for the university. The registrar asks student A to calculate the mean and SD of the GPAs for the Fall 2005 freshman class. He asks student B to design a sampling strategy to evaluate the attitude of the undergraduates at the university toward undergraduate teaching.

(a) Student A is doing descriptive statistics and student B is doing inferential statistics.
(b) Student A is doing inferential statistics and student B is doing descriptive statistics.
(c) Both students are doing descriptive statistics.
(d) Both students are doing inferential statistics.

At the Department of Statistics, we intend to examine the effect of using computers in Statistics 10 on the attitudes of students toward statistics. We offer ten lectures of Statistics 10 in an academic year. Five of these sections are randomly assigned to the experimental group and the other five are assigned to the control group. The experimental group will go to lecture, section, and computer lab. The control group will only go to lecture and section, but will not do the computer lab. The attitude of the students toward statistics is measured before and after the course. This study is:

(a) A double blind study
(b) A well-designed experiment
(c) A blind study
(d) Not a randomized experiment

An office manager wonders whether there is any relationship between drinking coffee before 10 am and alertness. He selects 3 days of the week at random, and on those days he compares the alertness level of 25 employees who usually drink coffee before 10 am with that of 25 employees who do not usually drink coffee before 10 am. Is this an observational or experimental study?

(a) We need more information to decide
(b) This is an experimental study
(c) This is an observational study
(d) This is a combination of experimental and observational study

A major car manufacturing company intends to find out if cars get better mileage with premium instead of regular unleaded gasoline. They also would like to know if the size of the car has any effect on fuel economy. 96 volunteers who are similar in age, experience and style of driving participate in the study. The drivers are randomly assigned to the premium and regular groups. The drivers assigned to the premium and regular groups are then randomly assigned to drive a small, medium, or large car. All of the drivers are asked to keep a driving log. What is the design used for this study?

(a) randomized block design
(b) Completely randomized two factor experiment
(c) Completely randomized experiment with one factor
(d) Completely randomized experiment with matching

For this research situation, decide what statistical procedure would most likely be used to answer the research question posed. Assume all assumptions have been met for using the procedure. Is ethnicity related to political party affiliation (Republican, Democrat, Other)?

(a) Test the difference in means between two paired or dependent samples.
(b) Use a chi-squared test of association.
(c) Test one mean against a hypothesized constant.
(d) Test the difference between two means (independent samples).
(e) Test for a difference in more than two means (one way ANOVA).

In a large Midwestern university with 30 different departments, the university is considering eliminating standardized scores from their admission requirements. The university wants to find out whether the students agree with this plan. They decide to randomly select 100 students from each department, send them a survey, and follow up with a phone call if they do not return the survey within a week. What kind of sampling plan did they use?

(a) Stratified random sampling
(b) Simple random sampling
(c) Cluster sampling
(d) Multi-stage sampling

