Design of Experiments

Passive data collection leads to a number of problems in statistical modeling. Observed changes in a response variable may be correlated with, but not caused by, observed changes in individual factors (process variables). Simultaneous changes in multiple factors may produce interactions that are difficult to separate into individual effects. Observations may be dependent even though the model of the data assumes that they are independent.

Designed experiments address these problems. In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters.

For example, a simple model of a response y in an experiment with two controlled factors x₁ and x₂ might look like this:

$y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{1} x_{2} + \varepsilon$

Here ε includes both experimental error and the effects of any uncontrolled factors in the experiment. The terms β₁x₁ and β₂x₂ are main effects and the term β₃x₁x₂ is a two-way interaction effect. A designed experiment would systematically manipulate x₁ and x₂ while measuring y, with the objective of accurately estimating β₀, β₁, β₂, and β₃.
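
As a rough illustration of this idea, the following Python sketch builds a two-level full factorial design for x₁ and x₂, simulates responses from the model above, and estimates the four coefficients by ordinary least squares. The coded levels (-1 and +1), the "true" coefficient values, the noise level, the number of replicates, and all variable names are assumptions made for the example; they do not come from any particular experiment.

```python
import itertools
import numpy as np

# Hypothetical "true" coefficients (beta0, beta1, beta2, beta3),
# chosen only so that there is something to recover.
beta_true = np.array([10.0, 2.0, -3.0, 0.5])

# Two-level full factorial design: every combination of x1 and x2
# at the coded levels -1 and +1 (an assumed coding).
levels = [-1.0, 1.0]
runs = np.array(list(itertools.product(levels, levels)))
x1, x2 = runs[:, 0], runs[:, 1]

# Model matrix: intercept, two main effects, two-way interaction.
X = np.column_stack([np.ones(len(runs)), x1, x2, x1 * x2])

# Replicate the four runs and simulate noisy responses.
# The noise stands in for epsilon (experimental error plus
# uncontrolled factors); its scale is an arbitrary assumption.
rng = np.random.default_rng(0)
n_rep = 5
X_rep = np.tile(X, (n_rep, 1))
y = X_rep @ beta_true + rng.normal(scale=0.1, size=X_rep.shape[0])

# Ordinary least-squares estimates of beta0..beta3.
beta_hat, *_ = np.linalg.lstsq(X_rep, y, rcond=None)

print("design runs (x1, x2):")
print(runs)
print("estimated coefficients:", beta_hat)
```

With this coding the columns of the model matrix are mutually orthogonal, so the main effects and the interaction are estimated independently of one another. Passively collected data, in which x₁ and x₂ tend to move together, generally does not have this property.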