Ideally all variables other than those included in an experiment are held constant or blocked out in a controlled fashion. However, sometimes a variable that one knows will create an important effect, such as ambient temperature or humidity, cannot be controlled. In such cases it pays to collect measurements run by run. Then the results can be analyzed with and without this ‘covariate.’
Douglas Montgomery provides a great example of analysis of covariance in section 15.3 of his textbook Design and Analysis of Experiments. It details a simple comparative experiment aimed at assessing the breaking strength in pounds of monofilament-fiber produced by three machines. The process engineer collected five samples at random from each machine, measuring the diameter of each (knowing this could affect the outcome) and testing them out. The results by machine are shown below with the diameters, measured in mils (thousandths of an inch), provided in the parentheses:
The data on diameter can be easily captured via a second response column alongside the strength measures. Montgomery reports that “there is no reason to believe that machines produce fibers of different diameters.” Therefore, creating a new factor column, copying in the diameters and regressing out its impact on strength leads to a clearer view of the differences attributed to the machines.
I will now show you the procedure for handling a covariate with Stat-Ease software. However, before doing so, analyze the experiment as planned and save this work so you can do a before and after comparison.
Figure 1 illustrates how to insert a new factor. As seen in the screenshot, I recommend this be done before the first controlled factor.
Figure 1: Inserting a new factor column for the covariate entered initially as a response
The Edit Info dialog box then appears. Type in the name and units of measure for the covariate and the actual range from low to high.
Figure 2: Detailing the covariate as a factor, including the actual range
Press “Yes” to confirm the change in actual values when the warning pops up.
Figure 3: Warning about actual values.
After the new factor column appears, the rows will be crossed out. However, when you copy over the covariate data, the software stops being so ‘cross’ (pun intended).
Press ahead to the analysis. Include only the main effect of the covariate in your model. The remainder of the terms involving controlled factors may go beyond linear if estimable. As a start, select the same terms as done before adding the covariate.
In this case, the model must be linear due to there being only one factor (machine) and it being categorical. The p-value on the effect increases from 0.0442 (significant at p<0.05) with only the machine modeled—not the diameter—to 0.1181 (not significant!) with diameter included as a covariate. The story becomes even more interesting by viewing the effects plots.
Figure 4: No covariate.
Figure 5: With covariate accounted for.
You can see that the least significant difference (LSD) bars decrease considerably from Figure 4 to Figure 5 without and with the covariate; respectively. That is a good sign—the fitting becomes far more precise by taking diameter (the covariate) into account. However, as Montgomery says, the process engineer reaches “exactly the opposite conclusion”—Machine 3 looking very weak (literally!) without considering the monofilament diameter, but when doing the covariate analysis, it becomes more closely aligned with the other two machines.
In conclusion, this case illustrates the value of recording external variables run-by-run throughout your experiment whenever possible. They then can be studied via covariate analysis for a more precise model of your factors and their effects.
This case is a bit tricky due to the question of whether fiber strength by machine differs due to them producing differing diameters, in which case this should be modeled as the primary response. A far less problematic example would be an experiment investigating the drying time of different types of paint in an uncontrolled environment. Obviously, the type of paint does not affect the temperature or humidity. By recording ambient conditions, the coating researcher could then see if they varied greatly during the experiment and, if so, include the data on these uncontrolled variables in the model via covariate analysis. That would be very wise!
PS: Joe Carriere, a fellow consultant at Stat-Ease, suggested I discuss this topic—very appealing to me as a chemical process engineer. He found the monofilament machine example, which I found very helpful (also good by seeing agreement in statistical results between our software and the one used by Montgomery).
PPS: For more advice on covariates, see this topic Help.
Here's the latest Publication Roundup! In these monthly posts, we'll feature recent papers that cited Design-Expert® or Stat-Ease® 360 software. Please submit your paper to us if you haven't seen it featured yet!
Implementation of the QbD Approach to the Analytical Method Development and Validation for the Estimation of the Treprostinil Injection Dosage Form by RP-HPLC
ACS Omega, 2025
Authors: Narasimha Raju Alluri, Mallikharjuna Rao Bandlamudi, Sujatha Kuppusamy, Shabna Roupal Morais
Mark's comments: This one hits the spot for me by deploying a solid central composite design, being succinct in presenting only the most relevant statistics on significance and model fits, showing compelling 3D pictures, and providing the data via a link to supplementary material. It is great to see how response surface methods done with Stat-Ease software led to a “novel, precise, sensitive, stable, and cost-effective” analytical method. Well done!
Be sure to check out this important study, and the other research listed below!
Here's the latest Publication Roundup! In these monthly posts, we'll feature recent papers that cited Design-Expert® or Stat-Ease® 360 software. Please submit your paper to us if you haven't seen it featured yet!
Material-sparing degradation-kinetics model for thermolabile drug stability assessment during twin-screw melt granulation – insights with gabapentin
International Journal of Pharmaceutics, Volume 674, 15 April 2025, 125421
Authors: Adwait Pradhan, Fengyuan Yang, Kapish Karan, Thomas Durig, Quyen Schwing, Brian Haight, Mark Costello, Mark Anderson, Feng Zhang
Mark's comments: "It was a pleasure to help Adwait, et al apply response surface methods (RSM) for process optimization of twin-screw melt granulation to mininimze degradation of life-enhancing drugs such as gabapentin."
Be sure to check out this important study, and the other research listed below!
Next in our 40th anniversary “Ask an Expert” blog series is John "Jay" Davies, who's an absolute rock star when it comes to teaching and implementing DOE. He's lent us his expertise before - see this talk from our 2022 Online DOE Summit - and he shared an anecdote with our statistical experts about how he approaches switching to DOE methods when working with new groups in the Army. He kindly agreed to let us publish it as part of this series.
For the past 14 years, I’ve been a Research Physicist with the U.S. Army DEVCOM Chemical Biological Center at Aberdeen Proving Grounds, MD as a member of the Decontamination Sciences Branch, which specializes in developing techniques/chemistries to neutralize chemical warfare agents. I’m dedicated to applied statistical analysis ranging from multi-laboratory precision studies to design of experiments (DOE). The Decontamination Sciences Branch has been integrating DOE methods into many of their chemical agent decontamination research programs.
I’m happy to report that the DOE methods here at the Chemical Biological Center are really catching on. I collaborated on 24 DOEs from 2014 through 2021. Then, in 2021-22, we completed 26 DOEs across 10 different programs. I’ve been doing a lot of mixture-process DOEs with the Bio Sciences groups for synthetic biology and bio manufacturing applications, and once the other groups saw the information that we were getting from just a single day’s worth of data, they too wanted to try DOE.
Lately, I’ve changed the formula that I use for the initial consultations when visiting a group that has expressed an interest in DOE but has never used DOE before. Previously we’d go right into their project, and I’d tell them how we might construct a DOE for their application. However, I’ve found that it’s too much of a culture shock if we go right into talking about what a DOE for their application might look like. Instead, especially if I’m working with a group that has no DOE experience at all, I now devote about 1 hour to discuss DOE methods in general before we even mention their actual application. In this discussion, I reveal the major differences that they are going to see with a DOE, which are:
Recently, I was following this format with a group that had never used DOE before. We had a great back-and-forth dialog as I went through the bullets above and explained a bit about each point. They asked many questions and were really following along. Then, after about an hour we got into their application and I just sketched out a prototype quadratic mixture-process DOE that I thought would give them a good idea of what the initial DOE might look like, with 30 samples in total. I then went over what some simulated outputs for the DOE generated prediction model might look like. At this point one of them stopped me and with a very perplexed look on his face, said “hold on, hold on…wait a minute here. Are you telling us that if we run just those 30 samples, we would be able to predict the optimal formulation and the optimal process setting for this system?“
This scientist had been following along, asking questions and really absorbing the information in the past hour as we walked through the DOE basics, but I could see that at that moment things were just sinking in. He realized the ramifications of what we had just discussed – typically, this group might have had to run several hundreds of samples to characterize similar systems, but with DOE they would only need about 30 samples. I responded to his question saying, “Yes that’s exactly what I’m telling you. We’ve run dozens of these mixture-process DOEs, many of them much more complex than this system, and they do work.” This individual, a mid-career researcher, then responded, “How is it possible that we have not heard of this stuff before?” I told him, “I can’t give you a good answer to that one.”
And there you have it! Let us know if you want to talk about saving time & money with DOE: our statistical experts and first-in-class software make it easier than ever.
DISTRIBUTION STATEMENT A. Approved for Public Release: Distribution is Unlimited
Here's the latest Publication Roundup! In these monthly posts, we'll feature recent papers that cited Design-Expert® or Stat-Ease® 360 software. Please submit your paper to us if you haven't seen it featured yet!
Mark's comment: Make sure to check out article #10, where the authors deploy response surface methods (with lots of impressive 3D plots!) to produce an eco-friendly material for civil engineering.