
MANE 3332.05

Lecture 18

Agenda

  • Midterm exams are not yet graded; still contacting students who missed the exam
  • Linear Combination Practice Problems (assigned 10/28, due 10/30)
  • Linear Combination Quiz (assigned 10/30, due 11/4)
  • Complete Chapter 6 and start Chapter 7
  • Attendance
  • Questions?

Handouts


Chapter 6, continued

Calculating Quantiles

reference for calculating quantiles


Quantile Example

Quantile Example
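
As a runnable counterpart to the example above, the following sketch computes quartiles with R's quantile() function. The score vector is invented for illustration, and type = 6 is one of several interpolation rules R offers, so the results may differ slightly from a hand calculation.

scores <- c(62, 71, 74, 78, 81, 83, 85, 88, 90, 94)      # invented scores for illustration
quantile(scores, probs = c(0.25, 0.50, 0.75))            # default interpolation (type 7)
quantile(scores, probs = c(0.25, 0.50, 0.75), type = 6)  # alternative rule, often closer to hand methods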


Exploratory Data (Graphical) Analysis

  • Exploratory data analysis (EDA) is the use of graphical procedures to analyze data.

  • John Tukey was a pioneer in this field and invented several of the procedures

  • Tools include stem-and-leaf diagrams, box plots, time series plots and digidot plots


Stem and Leaf Diagram

  • Excellent tool that maintains data integrity

  • The stem is the leading digit or digits

  • The leaf is the remaining digit

  • Make sure to include units

  • R Code

stem(midterm$MidtermExam)

Stem and Leaf Example

  • R output of a Stem and Leaf diagram

Stem and Leaf Plot of Midterm Exam Scores


Histogram

  • A histogram is a bar chart that displays the frequency distribution of the data

  • There are three types of histograms: frequency, relative frequency and cumulative relative frequency

  • R code

hist(midterm$MidtermExam)
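
The hist() call above produces a frequency histogram. A hedged sketch of the other two types follows, using simulated scores in place of midterm$MidtermExam since the course data frame is not reproduced here.

x <- rnorm(40, mean = 80, sd = 8)                   # simulated stand-in for midterm$MidtermExam
h <- hist(x, plot = FALSE)                          # bin the data without drawing
barplot(h$counts / length(x), names.arg = round(h$mids),
        ylab = "Relative frequency", main = "Relative Frequency Histogram")
plot(h$mids, cumsum(h$counts) / length(x), type = "s",
     xlab = "Score", ylab = "Cumulative relative frequency",
     main = "Cumulative Relative Frequency")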

Histogram Example

  • R output of histogram

Histogram of Midterm Exam Scores


Boxplot

  • Graphical display that simultaneously describes several important features of a data set such as center, spread, departure from symmetry and outliers

  • Requires the calculation of quantiles (quartiles)

Box Plot 1

Box plot with explanation


Box Plot 2

examples of boxplots


Box Plot 3

  • R code for Box Plot
boxplot(midterm$MidtermExam, xlab = 'Score', main = 'Boxplot of Midterm Exam Scores')
  • R Box Plot output

Boxplot of Midterm Exam Scores


Time Series Plot

  • A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say \(x\)) and the horizontal axis denotes time

  • Excellent tool for detecting:

    • trends,

    • cycles,

    • other non-random patterns


Time Series Plot in R

Time Series Plot
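
A minimal sketch of such a plot in base R; the observations below are invented and simply stand for a variable recorded in time order.

x <- c(102, 105, 98, 110, 107, 115, 112, 120, 118, 125)  # invented values in time order
plot(x, type = "b", xlab = "Time (observation number)", ylab = "Observed value x",
     main = "Time Series Plot")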


Probability Plotting

  • Probability plotting is a graphical method of determining whether sample data conform to a hypothesized distribution

  • Used for validating assumptions

  • Alternative to hypothesis testing


Construction

  1. Sort the data from smallest to largest: $$ x_{(1)},x_{(2)},\ldots,x_{(n)} $$

  2. Calculate the observed cumulative frequency \((j-0.5)/n\)

For the normal distribution, find the \(z_j\) that satisfies

\[ \frac{j-0.5}{n}=P(Z\leq z_j)=\Phi(z_j) \]

  3. Plot \(z_j\) versus \(x_{(j)}\) on special graph paper
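
The construction above can be carried out directly in R, with qnorm() standing in for the special graph paper; the data vector is invented for illustration.

x  <- c(9.8, 10.1, 10.3, 9.6, 10.0, 10.4, 9.9, 10.2)  # invented sample
xs <- sort(x)                         # step 1: order the observations
n  <- length(xs)
p  <- (seq_len(n) - 0.5) / n          # step 2: observed cumulative frequency (j - 0.5)/n
zj <- qnorm(p)                        # z_j satisfying Phi(z_j) = (j - 0.5)/n
plot(xs, zj, xlab = "x_(j)", ylab = "z_j",
     main = "Normal Probability Plot (constructed by hand)")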

Usage

  • If the plotted points fall approximately along a straight line, the hypothesized distribution is a reasonable model for the data

Normal probability plots from the textbook (Figure 6.21, page 215)


Probability Plot Example 1 in R

Normal Probability Plot
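
Base R's qqnorm() and qqline() produce a plot of this kind directly. A minimal sketch, with simulated scores standing in for the midterm data used in the lecture:

x <- rnorm(40, mean = 80, sd = 8)  # simulated stand-in for midterm$MidtermExam
qqnorm(x)                          # normal probability (Q-Q) plot
qqline(x)                          # reference line through the first and third quartiles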


Probability Plot Example 2

  • The difficulty in Example 1 is deciding how close to a straight line is "good enough"
  • Add confidence bands to the normal probability plot
    • Requires the car package to be installed and loaded in R (a code sketch follows the plot below)
    • If all of the points fall within the band, we are 95% confident that the sample comes from a normal distribution; if one or more points fall outside the band, we conclude the data are not from a normal distribution

QQ Plot with band
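
A hedged sketch of the banded plot using qqPlot() from the car package (assumed to be installed); simulated scores again stand in for the course data.

# install.packages("car")          # one-time install, if the package is not present
library(car)
x <- rnorm(40, mean = 80, sd = 8)  # simulated stand-in for midterm$MidtermExam
qqPlot(x, ylab = "Score")          # normal Q-Q plot with a 95% confidence envelope by default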


Multivariate Data

Matrix of Scatter Plot in R

Scatter Plots


Covariance in R

Covariance Matrix


Correlation

Correlation Matrix
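
The three displays above come from pairs(), cov(), and cor(). A minimal sketch on the built-in mtcars data, since the lecture's data frame is not reproduced here:

d <- mtcars[, c("mpg", "hp", "wt")]  # three numeric variables from a built-in data set
pairs(d)                             # matrix of scatter plots
cov(d)                               # covariance matrix
cor(d)                               # correlation matrix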


Chapter 7 Overview

  • Chapter 7 contains a detailed explanation of point estimates for parameters

  • Much of this chapter is of a highly statistical nature and will not be covered in this course

  • Key concepts we will discuss are:

    • Statistical inference

    • Statistic

    • Sampling distribution

    • Point estimator

    • Unbiased estimator

    • Minimum variance unbiased estimator (MVUE)

    • Central limit theorem

    • Sampling distributions


Statistical Inference

  • Montgomery gives the following description of statistical inference.

    The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These methods utilize the information contained in a sample from the population in drawing conclusions. This chapter begins our study of the statistical methods used for inference and decision making.

  • Statistical inference may be divided into two major areas: parameter estimation and hypothesis testing


Point Estimate

  • Montgomery states that "In practice, the engineer will use sample data to compute a number that is in some sense a reasonable value (or guess) of the true mean. This number is called a point estimate."

  • Discuss examples

  • A formal definition of a point estimate is

    A point estimate of some population parameter \(\theta\) is a single numerical value \(\hat{\theta}\) of a statistic \(\hat{\Theta}\). The statistic \(\hat{\Theta}\) is called the point estimator.

  • Notice the use of the "hat" notation to denote a point estimate


Statistic

  • Computing a point estimate requires a sample of random observations, say \(X_1,X_2,\ldots,X_n\)

  • Any function of the sampled random variables is called a statistic

  • The function of the random variables is itself a random variable

  • Thus, the sample mean \(\overline{X}\) and the sample variance \(S^2\) are both statistics and random variables
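
A quick illustration under an assumed simulated sample: mean() and var() compute the statistics \(\overline{X}\) and \(S^2\), and their observed values serve as point estimates of \(\mu\) and \(\sigma^2\).

set.seed(1)
x <- rnorm(25, mean = 50, sd = 5)  # simulated sample of n = 25 observations
mean(x)                            # observed value of X-bar: point estimate of mu
var(x)                             # observed value of S^2: point estimate of sigma^2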


Properties of point estimators

  • We would like point estimates to be both accurate and precise

  • An unbiased estimator addresses the accuracy criterion

  • A minimum variance unbiased estimator addresses the precision criterion


Unbiased Estimator

  • The point estimator \(\hat{\Theta}\) is an unbiased estimator for the parameter \(\theta\) if
\[ E\left(\hat{\Theta}\right)=\theta \]
  • If the point estimator is not unbiased, then the difference
\[ E\left(\hat{\Theta}\right)-\theta \]

is called the bias of the estimator \(\hat{\Theta}\)
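
A small simulation sketch of the definition (the seed and population values are arbitrary choices): averaging many realizations of \(\overline{X}\) recovers \(\mu\), while the "divide by \(n\)" variance estimator shows its bias of roughly \(-\sigma^2/n\).

set.seed(42)
mu <- 10; sigma2 <- 4; n <- 5
xbar  <- replicate(10000, mean(rnorm(n, mean = mu, sd = sqrt(sigma2))))
vdivn <- replicate(10000, {x <- rnorm(n, mean = mu, sd = sqrt(sigma2)); mean((x - mean(x))^2)})
mean(xbar) - mu        # near 0: X-bar is unbiased for mu
mean(vdivn) - sigma2   # near -sigma2/n = -0.8: dividing by n gives a biased estimator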


MVUE

  • Montgomery gives the following definition of a minimum variance unbiased estimator (MVUE)

    If we consider all unbiased estimators of \(\theta\), the one with the smallest variance is called the minimum variance unbiased estimator

  • An important fact is that the sample mean \(\overline{X}\) is the MVUE for \(\mu\) when the data come from a normal distribution


Accuracy vs. Precision

graph of accuracy vs. precision


Sampling Distribution

  • The probability distribution of a statistic is called a sampling distribution

Central Limit Theorem

  • The definition of the Central Limit Theorem is

    If \(X_1,X_2,\ldots,X_n\) is a random sample of size \(n\) taken from a population (either finite or infinite) with mean \(\mu\) and finite variance \(\sigma^2\), and if \(\overline{X}\) is the sample mean, the limiting form of the distribution of

\[ Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}} \]

as \(n\rightarrow\infty\), is the standard normal distribution

  • This is an important result because, for sufficiently large \(n\), the sampling distribution of \(\overline{X}\) is approximately normal

  • This is a fundamental result that will be used extensively in the next four chapters of the textbook.
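
A simulation sketch of the theorem with an assumed exponential population (mean 1, standard deviation 1): the standardized sample means look increasingly like a standard normal as \(n\) grows. The sample sizes 5 and 30 are arbitrary choices.

set.seed(7)
std_means <- function(n, reps = 5000) {
  # standardized means Z = (X-bar - mu) / (sigma / sqrt(n)) for an Exp(1) population
  replicate(reps, (mean(rexp(n, rate = 1)) - 1) / (1 / sqrt(n)))
}
par(mfrow = c(1, 2))
hist(std_means(5),  main = "n = 5",  xlab = "z")
hist(std_means(30), main = "n = 30", xlab = "z")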