Stat-Ease Blog

Salvaging a designed experiment via covariate analysis

posted by Mark Anderson on May 16, 2025

Ideally, all variables other than those included in an experiment are held constant or blocked out in a controlled fashion. However, sometimes a variable that one knows will create an important effect, such as ambient temperature or humidity, cannot be controlled. In such cases it pays to measure that variable run by run. Then the results can be analyzed with and without this ‘covariate.’

Douglas Montgomery provides a great example of analysis of covariance in section 15.3 of his textbook Design and Analysis of Experiments. It details a simple comparative experiment aimed at assessing the breaking strength, in pounds, of monofilament fiber produced by three machines. The process engineer collected five samples at random from each machine, measuring the diameter of each (knowing this could affect the outcome) and then testing its breaking strength. The results by machine are shown below, with the diameters, measured in mils (thousandths of an inch), in parentheses:

  Machine 1: 36 (20), 41 (25), 39 (24), 42 (25), 49 (32)
  Machine 2: 40 (22), 48 (28), 39 (22), 45 (30), 44 (28)
  Machine 3: 35 (21), 37 (23), 42 (26), 34 (21), 32 (15)

The diameter data can be captured easily in a second response column alongside the strength measurements. Montgomery reports that “there is no reason to believe that machines produce fibers of different diameters.” Diameter can therefore be treated as a covariate: creating a new factor column, copying in the diameters, and regressing out their impact on strength leads to a clearer view of the differences attributable to the machines.
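For readers who want to check the "no difference in diameters" premise outside of Stat-Ease software, here is a minimal Python sketch of a one-way ANOVA on diameter by machine, using only the data listed above. The function names are my own illustration, and the p-value shortcut is valid only because three machines give exactly 2 numerator degrees of freedom.

```python
# Diameters in mils for each machine, copied from the data listed in the post.
diameter = {
    1: [20, 25, 24, 25, 32],
    2: [22, 28, 22, 30, 28],
    3: [21, 23, 26, 21, 15],
}

def one_way_anova_f(groups):
    """Return (F statistic, error df) for a one-way ANOVA on a list of samples."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_within

def f_tail_2df(f, df_error):
    """Upper-tail p-value of an F distribution with exactly 2 numerator df.

    Closed form for this special case only; a general F test needs a
    statistics library.
    """
    return (1 + 2 * f / df_error) ** (-df_error / 2)

f_value, df_error = one_way_anova_f(list(diameter.values()))
p_value = f_tail_2df(f_value, df_error)
print(f"Diameter by machine: F = {f_value:.2f}, p = {p_value:.3f}")
```

A p-value well above 0.05 here is consistent with Montgomery's statement, although with only five samples per machine the test has limited power.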

I will now show you the procedure for handling a covariate with Stat-Ease software. However, before doing so, analyze the experiment as planned and save this work so you can do a before and after comparison.

Figure 1 illustrates how to insert a new factor. As seen in the screenshot, I recommend this be done before the first controlled factor.


Design-Expert software screenshot showing the right-click menu for a factor.

Figure 1: Inserting a new factor column for the covariate entered initially as a response

The Edit Info dialog box then appears. Type in the name and units of measure for the covariate and the actual range from low to high.


Edit factor info dialogue box

Figure 2: Detailing the covariate as a factor, including the actual range

Press “Yes” to confirm the change in actual values when the warning pops up.


Warning box for changing actual values to coded values.

Figure 3: Warning about actual values.

After the new factor column appears, the rows will be crossed out. However, when you copy over the covariate data, the software stops being so ‘cross’ (pun intended).

Press ahead to the analysis. Include only the main effect of the covariate in your model. The remainder of the terms involving controlled factors may go beyond linear if estimable. As a start, select the same terms as done before adding the covariate.

In this case, the model must be linear because there is only one controlled factor (machine) and it is categorical. The p-value for the machine effect increases from 0.0442 (significant at p<0.05) with only machine in the model to 0.1181 (not significant!) with diameter included as a covariate. The story becomes even more interesting by viewing the effects plots.
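The two p-values can be cross-checked by hand. The sketch below is not how Stat-Ease software computes them internally (that is not described here); it is a plain-Python version of the textbook sums-of-squares approach for a one-way layout with a single covariate, with variable names of my own choosing. The F-tail shortcut again assumes exactly 2 numerator degrees of freedom (three machines).

```python
# Breaking strength (pounds) and diameter (mils) by machine, from the post.
strength = {1: [36, 41, 39, 42, 49], 2: [40, 48, 39, 45, 44], 3: [35, 37, 42, 34, 32]}
diameter = {1: [20, 25, 24, 25, 32], 2: [22, 28, 22, 30, 28], 3: [21, 23, 26, 21, 15]}

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def f_tail_2df(f, df_error):
    """Upper-tail p-value of F with exactly 2 numerator df (closed form)."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)

machines = sorted(strength)
all_y = [y for m in machines for y in strength[m]]
all_x = [x for m in machines for x in diameter[m]]
n_total, n_machines = len(all_y), len(machines)

# Total sums of squares and cross-products (ignoring machine).
s_yy, s_xx, s_xy = cross(all_y, all_y), cross(all_x, all_x), cross(all_x, all_y)

# Within-machine (error) sums of squares and cross-products.
e_yy = sum(cross(strength[m], strength[m]) for m in machines)
e_xx = sum(cross(diameter[m], diameter[m]) for m in machines)
e_xy = sum(cross(diameter[m], strength[m]) for m in machines)

# Model 1: machine only (ordinary one-way ANOVA).
df_err = n_total - n_machines
f_anova = ((s_yy - e_yy) / (n_machines - 1)) / (e_yy / df_err)
p_anova = f_tail_2df(f_anova, df_err)

# Model 2: machine plus diameter as a covariate.
sse_full = e_yy - e_xy ** 2 / e_xx      # residual SS with machine + diameter
sse_reduced = s_yy - s_xy ** 2 / s_xx   # residual SS with diameter only
df_err_cov = n_total - n_machines - 1   # one more df spent on the slope
f_ancova = ((sse_reduced - sse_full) / (n_machines - 1)) / (sse_full / df_err_cov)
p_ancova = f_tail_2df(f_ancova, df_err_cov)

print(f"Machine only:       F = {f_anova:.2f}, p = {p_anova:.4f}")
print(f"Machine + diameter: F = {f_ancova:.2f}, p = {p_ancova:.4f}")
```

Note that the residual sum of squares drops sharply once diameter enters the model, which is why the fit becomes more precise even as the machine effect loses significance.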


Effect plot for Strength without covariate.

Figure 4: No covariate.

Effect plot for Strength with covariate.

Figure 5: With covariate accounted for.

You can see that the least significant difference (LSD) bars shrink considerably from Figure 4 (without the covariate) to Figure 5 (with it). That is a good sign: the fit becomes far more precise by taking diameter (the covariate) into account. However, as Montgomery says, the process engineer reaches “exactly the opposite conclusion.” Machine 3 looks very weak (literally!) when the monofilament diameter is ignored, but once the covariate analysis is done, it falls much more closely in line with the other two machines.
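One way to see why Machine 3 moves back toward the others is to compute covariate-adjusted means: each machine's raw mean strength is shifted to what the pooled within-machine slope predicts at the overall average diameter. The sketch below uses this standard textbook adjustment; whether the effects plots in the software display exactly these values is an assumption on my part, and the variable names are illustrative.

```python
# Breaking strength (pounds) and diameter (mils) by machine, from the post.
strength = {1: [36, 41, 39, 42, 49], 2: [40, 48, 39, 45, 44], 3: [35, 37, 42, 34, 32]}
diameter = {1: [20, 25, 24, 25, 32], 2: [22, 28, 22, 30, 28], 3: [21, 23, 26, 21, 15]}

def mean(values):
    return sum(values) / len(values)

machines = sorted(strength)
grand_mean_x = mean([x for m in machines for x in diameter[m]])

# Pooled within-machine slope of strength on diameter.
e_xy = sum(
    (x - mean(diameter[m])) * (y - mean(strength[m]))
    for m in machines
    for x, y in zip(diameter[m], strength[m])
)
e_xx = sum((x - mean(diameter[m])) ** 2 for m in machines for x in diameter[m])
slope = e_xy / e_xx

# Raw means versus means adjusted to the overall average diameter.
raw = {m: mean(strength[m]) for m in machines}
adjusted = {m: raw[m] - slope * (mean(diameter[m]) - grand_mean_x) for m in machines}

for m in machines:
    print(f"Machine {m}: raw mean = {raw[m]:.2f}, adjusted mean = {adjusted[m]:.2f}")
```

Machine 3 happened to produce the thinnest fiber on average (21.2 mils versus about 24.1 overall), so its adjusted mean rises from 36.0 to roughly 38.8 pounds, while the other two machines are adjusted slightly downward.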

In conclusion, this case illustrates the value of recording external variables run-by-run throughout your experiment whenever possible. They then can be studied via covariate analysis for a more precise model of your factors and their effects.

This case is a bit tricky because it raises the question of whether fiber strength differs by machine precisely because the machines produce differing diameters, in which case diameter should be modeled as the primary response. A far less problematic example would be an experiment investigating the drying time of different types of paint in an uncontrolled environment. Obviously, the type of paint does not affect the temperature or humidity. By recording ambient conditions, the coating researcher could then see if they varied greatly during the experiment and, if so, include the data on these uncontrolled variables in the model via covariate analysis. That would be very wise!

PS: Joe Carriere, a fellow consultant at Stat-Ease, suggested I discuss this topic, which is very appealing to me as a chemical process engineer. He found the monofilament machine example, which I found very helpful. It was also good to see the statistical results from our software agree with those from the software used by Montgomery.

PPS: For more advice on covariates, see this Help topic.