
MANE 6313

Week 12, Module A

Student Learning Outcome

  • Select an appropriate experimental design with one or more factors,
  • Select an appropriate model with one or more factors,
  • Evaluate statistical analyses of experimental designs,
  • Assess the model adequacy of any experimental design, and
  • Interpret model results.

Module Learning Outcome

Describe linear regression.


Introduction to Linear Regression

  • We are interested in the relationship between a single dependent or response variable \(y\) and \(k\) independent or regressor variables \(x_1,x_2,\ldots,x_k\).

  • We assume that there is some mathematical function \(y=\phi(x_1,x_2,\ldots,x_k)\). In general, we do not know this function.

  • We will use low-order polynomial equations as approximating functions. This is called empirical modeling.

  • What methods can we use to determine whether there is a relationship between two (or more) variables? One simple approach is sketched below.
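As a first look, one might plot the data and compute a sample correlation; a minimal R sketch, using made-up data for illustration:

    # Two quick checks for a relationship between two variables.
    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)   # made-up data
    plot(x, y)    # scatter plot: look for a systematic trend
    cor(x, y)     # sample correlation coefficient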


Relationship between two or more variables

Example 12.8

Example is from Walpole, Myers, and Myers (1998)1


Linear regression models

  • In general, they look like
\[ y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_{11}x_1^2+\beta_{22}x_2^2+\beta_{12}x_1x_2+\varepsilon \]
  • This model is linear in the parameters \(\beta\), even though it is second order in the regressors; an R sketch of fitting it appears after this list.

  • See the graphical explanation from Longnecker and Ott.
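In R, the second-order model above can be fit with lm(); the data frame name dat and the column names x1, x2, and y are hypothetical:

    # I() protects the squared terms inside the formula; x1:x2 is the
    # cross-product (interaction) term.
    fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)
    coef(fit)   # estimates of beta0, beta1, beta2, beta11, beta22, beta12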

Figure 11.2

Taken from Longnecker and Ott (2001)2


Estimation of Parameters

  • Parameter estimates are derived using least squares. The goal is to minimize the sum of squared errors.
  • Parameter estimation can be done algebraically or using linear algebra. Montgomery focuses on the linear-algebra formulation.
  • In general, the matrix formulation is used. The model is defined to be
\[ \mathbf{y=X\beta+\varepsilon} \]
  • The least squares estimates are found by minimizing
\[ L=\sum_i\varepsilon_i^2=\varepsilon^\prime\varepsilon=\mathbf{(y-X\beta)^\prime(y-X\beta)} \]
  • The least squares estimates must satisfy
\[ \frac{\partial L}{\partial\mathbf{\beta}}\big|_\mathbf{\hat{\beta}}=-2\mathbf{X^\prime y+2X^\prime X\hat{\beta}}=0 \]

  • Solving these equations gives the least squares estimator
\[ \mathbf{\hat{\beta}=(X^\prime X)^{-1}X^\prime y} \]
  • We can define the predicted response to be
\[ \mathbf{\hat{y}=X\hat{\beta}} \]
  • The residuals are defined to be
\[ \mathbf{e=y-\hat{y}} \]
  • Thus the error sum of squares can be shown to be (a matrix-algebra sketch in R follows the list)
\[ \begin{aligned} SS_E&=(\mathbf{y-X\hat{\beta}})^\prime (\mathbf{y-X\hat{\beta}})\\ &=\mathbf{y^\prime y - \hat{\beta}^\prime X^\prime y} \end{aligned} \]
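The matrix formulas above translate directly into R; a minimal sketch with made-up data, checked against lm():

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.0, 4.1, 5.9, 8.2, 9.9)               # made-up data
    X <- cbind(1, x)                               # model matrix: intercept column plus x
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
    y_hat <- X %*% beta_hat                        # predicted responses
    e <- y - y_hat                                 # residuals
    SS_E <- drop(t(y) %*% y - t(beta_hat) %*% t(X) %*% y)  # error sum of squares
    cbind(beta_hat, coef(lm(y ~ x)))               # the two columns should agree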

Coding Variables

  • From the example, there were two ways to represent the same problem: coded and uncoded variables.

  • Why use coded variables?3

    • Computational ease and increased accuracy in estimating the model coefficients.

    • Enhanced interpretability of the coefficient estimates in the model.

  • Internally, most statistical software codes the variables when estimating parameters; a sketch of the coding transformation follows.
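A common coding maps each natural variable onto [-1, 1] by centering at the midpoint of its range and scaling by half the range; a minimal sketch (the variable name and levels are hypothetical):

    temp <- c(150, 160, 170, 180, 190)   # natural (uncoded) levels
    code <- function(x) (x - mean(range(x))) / (diff(range(x)) / 2)
    code(temp)                           # -1.0 -0.5  0.0  0.5  1.0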


Plot: Example 12.8

Example 12.8 Scatter Plot


Regression: Example 12.8

Example 12.8 lm() Output
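Output like this comes from fitting and summarizing a simple linear model; a sketch of the call, assuming the data sit in a data frame named ex12_8 with columns x and y (names are assumptions, not the textbook's):

    fit <- lm(y ~ x, data = ex12_8)   # fit y = beta0 + beta1*x + error
    summary(fit)                      # coefficient table, residual SE, R-squared, F-test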


Values from lm() function

Source: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm

Values from the lm() documentation
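The documented components can be pulled from the fitted object with standard extractor functions; a brief sketch, reusing the fit object from the previous slide:

    coef(fit)                 # estimated coefficients beta-hat
    fitted(fit)               # predicted responses y-hat
    residuals(fit)            # residuals e = y - y-hat
    summary(fit)$r.squared    # coefficient of determination R-squared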



  1. Walpole, Myers, and Myers (1998). Probability and Statistics for Engineers and Scientists, 6th edition. Prentice-Hall. 

  2. Longnecker and Ott (2001). An Introduction to Statistical Methods and Data Analysis, 5th edition. Duxbury. 

  3. Khuri and Cornell (1987). Response Surfaces: Designs and Analyses. Dekker.