
MANE 6313

Week 12, Module A

Student Learning Outcome

  • Select an appropriate experimental design with one or more factors,
  • Select an appropriate model with one or more factors,
  • Evaluate statistical analyses of experimental designs,
  • Assess the model adequacy of any experimental design, and
  • Interpret model results.

Module Learning Outcome

Describe linear regression.


Introduction to Linear Regression

  • We are interested in the relationship between a single dependent or response variable \(y\) and \(k\) independent or regressor variables \(x_1,x_2,\ldots,x_k\).

  • We assume that there is some mathematical function \(y=\phi(x_1,x_2,\ldots,x_k)\). In general, we do not know this function.

  • We will use low-order polynomial equations as approximating functions. This is called empirical modeling.

  • What methods can we use to determine whether there is a relationship between two (or more) variables? One simple approach is sketched below.
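As a first look, one might plot the data and compute a sample correlation; a minimal R sketch, using made-up data for illustration:

    # Two quick checks for a relationship between two variables.
    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)   # made-up data
    plot(x, y)    # scatter plot: look for a systematic trend
    cor(x, y)     # sample correlation coefficient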


Relationship between two or more variables

Example 12.8

Example is from Walpole, Myers, and Myers (1998)1


Linear regression models

  • In general, they look like
\[ y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_{11}x_1^2+\beta_{22}x_2^2+\beta_{12}x_1x_2+\varepsilon \]
  • This model is linear in the parameters \(\beta\), even though it is second order in the regressors; an R sketch of fitting it appears after this list.

  • See the graphical explanation from Longnecker and Ott.
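In R, the second-order model above can be fit with lm(); the data frame name dat and the column names x1, x2, and y are hypothetical:

    # I() protects the squared terms inside the formula; x1:x2 is the
    # cross-product (interaction) term.
    fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)
    coef(fit)   # estimates of beta0, beta1, beta2, beta11, beta22, beta12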

Figure 11.2

Taken from Longnecker and Ott (2001)2


Estimation of Parameters

  • Parameter estimates are derived using least squares. The goal is to minimize the sum of squared errors.
  • Parameter estimation can be done algebraically or using linear algebra. Montgomery focuses on the linear-algebra formulation.
  • In general, the matrix formulation is used. The model is defined to be
\[ \mathbf{y=X\beta+\varepsilon} \]
  • The least squares estimates are found by minimizing
\[ L=\sum_i\varepsilon_i^2=\varepsilon^\prime\varepsilon=\mathbf{(y-X\beta)^\prime(y-X\beta)} \]
  • The least squares estimates must satisfy
\[ \frac{\partial L}{\partial\mathbf{\beta}}\big|_\mathbf{\hat{\beta}}=-2\mathbf{X^\prime y+2X^\prime X\hat{\beta}}=0 \]

  • Solving these equations gives the least squares estimator
\[ \mathbf{\hat{\beta}=(X^\prime X)^{-1}X^\prime y} \]
  • We can define the predicted response to be
\[ \mathbf{\hat{y}=X\hat{\beta}} \]
  • The residuals are defined to be
\[ \mathbf{e=y-\hat{y}} \]
  • Thus the error sum of squares can be shown to be (a matrix-algebra sketch in R follows the list)
\[ \begin{aligned} SS_E&=(\mathbf{y-X\hat{\beta}})^\prime (\mathbf{y-X\hat{\beta}})\\ &=\mathbf{y^\prime y - \hat{\beta}^\prime X^\prime y} \end{aligned} \]
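The matrix formulas above translate directly into R; a minimal sketch with made-up data, checked against lm():

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.0, 4.1, 5.9, 8.2, 9.9)               # made-up data
    X <- cbind(1, x)                               # model matrix: intercept column plus x
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
    y_hat <- X %*% beta_hat                        # predicted responses
    e <- y - y_hat                                 # residuals
    SS_E <- drop(t(y) %*% y - t(beta_hat) %*% t(X) %*% y)  # error sum of squares
    cbind(beta_hat, coef(lm(y ~ x)))               # the two columns should agree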

Coding Variables

  • From the example, there were two ways to represent the same problem: coded and uncoded variables.

  • Why use coded variables?3

    • Computational ease and increased accuracy in estimating the model coefficients.

    • Enhanced interpretability of the coefficient estimates in the model.

  • Internally, most statistical software codes the variables when estimating parameters; a sketch of the coding transformation follows.
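A common coding maps each natural variable onto [-1, 1] by centering at the midpoint of its range and scaling by half the range; a minimal sketch (the variable name and levels are hypothetical):

    temp <- c(150, 160, 170, 180, 190)   # natural (uncoded) levels
    code <- function(x) (x - mean(range(x))) / (diff(range(x)) / 2)
    code(temp)                           # -1.0 -0.5  0.0  0.5  1.0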


Plot: Example 12.8

Example 12.8 Scatter Plot


Regression: Example 12.8

Example 12.8 lm() Output
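Output like this comes from fitting and summarizing a simple linear model; a sketch of the call, assuming the data sit in a data frame named ex12_8 with columns x and y (names are assumptions, not the textbook's):

    fit <- lm(y ~ x, data = ex12_8)   # fit y = beta0 + beta1*x + error
    summary(fit)                      # coefficient table, residual SE, R-squared, F-test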


Values from lm() function

Source: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm

Values from the lm() documentation
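The documented components can be pulled from the fitted object with standard extractor functions; a brief sketch, reusing the fit object from the previous slide:

    coef(fit)                 # estimated coefficients beta-hat
    fitted(fit)               # predicted responses y-hat
    residuals(fit)            # residuals e = y - y-hat
    summary(fit)$r.squared    # coefficient of determination R-squared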



  1. Walpole, Myers, and Myers (1998). Probability and Statistics for Engineers and Scientists, 6th edition. Prentice-Hall. 

  2. Longnecker and Ott (2001). An Introduction to Statistical Methods and Data Analysis, 5th edition. Duxbury. 

  3. Khuri and Cornell (1987). Response Surfaces: Designs and Analyses. Dekker.