
MANE 3332.05

Lecture 1, September 2

Agenda

  • Discuss Syllabus
  • Me Talk
  • Grit Lesson
  • Lecture - Chapter 1
  • Call roll

Handouts


Statistics and Statistical Thinking

"The field of statistics deals with the collection, presentation, analysis and use of data to make decisions, solve problems, and design products and processes."



Statistics

  • Moore, as quoted in Ostle, Turner, Hicks and McElrath (1996), defines statistics as "the science of gaining information in the face of uncertainty."

  • Generally the field of statistics is divided into two major branches: descriptive and inferential


Descriptive Statistics

  • Devore and Farnum (1999) define descriptive statistics in the following manner.

    an investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics. Some of these methods are graphical in nature--the construction of histograms, boxplots, and scatter plots are primary examples. Other descriptive methods involve calculation of numerical summary measures, such as means, standard deviations and correlation coefficients.

  • Chapter 6 of the textbook emphasizes descriptive statistics




Inferential Statistics

  • Devore and Farnum (1999) give the following description of inferential statistics

    Having obtained a sample from a population, an investigator would frequently like to use sample information to draw some type of conclusion (make an inference of some sort) about the population. That is, the sample is a means to an end rather than an end in itself. Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.

  • The field of inferential statistics can be further subdivided into two general areas: estimation and hypothesis testing

  • Chapters 4 - 14 of the textbook focus on inferential statistics
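The two areas of inference, estimation and hypothesis testing, can be sketched on the same sample. This is a simplified illustration that assumes the population standard deviation is known, so the standard normal distribution applies; when it is unknown, a t-based procedure would be used instead. The fill-weight data, `sigma`, and `mu0` are hypothetical values chosen for the example.

```python
from statistics import NormalDist, mean

# Hypothetical sample of n = 8 fill weights (grams)
sample = [501.2, 499.8, 500.6, 502.1, 498.9, 501.5, 500.3, 501.0]
sigma = 1.0    # ASSUMED known population standard deviation
mu0 = 500.0    # hypothesized population mean
alpha = 0.05   # significance level

n = len(sample)
xbar = mean(sample)
se = sigma / n ** 0.5                          # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

# Estimation: two-sided 95% confidence interval for the population mean
ci = (xbar - z_crit * se, xbar + z_crit * se)

# Hypothesis testing: H0: mu = mu0 versus H1: mu != mu0
z0 = (xbar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
reject = p_value < alpha

print(f"xbar = {xbar:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z0 = {z0:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

In both cases the sample is a means to an end: the conclusion is stated about the population mean, not about the eight measurements themselves.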


Statistical Thinking

  • The textbook points out that statistical methods are used to help us describe and understand variability

  • Variability refers to the differences between successive observations of a system or phenomenon

  • Vining (1998) gives the following definition of statistical thinking

    Only by "thinking statistically" can engineers truly address the problems inherent in the variability in real data. When we think statistically, we come to know that all decisions based on real data involve risk and uncertainty. Good decisions require us to quantify this risk. As we become more mature in our thinking, we understand that there are sources or causes of variability. Discovering these sources and removing them are often the keys to engineering success.
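The definition of variability above (differences between successive observations) can be made concrete with a few repeated measurements of the same characteristic. This is a minimal sketch; the part dimensions are hypothetical, and the range and sample standard deviation are shown only as two common ways to quantify the spread.

```python
import statistics

# Hypothetical repeated measurements of the same part dimension (mm)
obs = [25.02, 24.98, 25.05, 24.97, 25.01, 25.04, 24.99]

# Differences between successive observations
diffs = [b - a for a, b in zip(obs, obs[1:])]

spread = max(obs) - min(obs)   # range of the observations
s = statistics.stdev(obs)      # sample standard deviation

print([round(d, 2) for d in diffs], round(spread, 2), round(s, 4))
```

None of these differences is zero even though the "same" dimension is measured each time; statistical thinking starts from treating that variation as something to quantify and explain.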


Population vs. Sample

The concepts of populations, samples, parameters and statistics are very important. The following definitions are taken from Ostle, Turner, Hicks and McElrath (1996):

  • Population: the totality of all possible values (measurements, counts, and so on) of a particular characteristic for a specific group of objects

  • Population parameter: a numerical descriptive measure of a population characteristic

  • Sample: a portion of the population that is selected according to some rule or plan

  • Sample statistic: a numerical descriptive measure of a particular characteristic based upon the sample values
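The distinction between a parameter and a statistic can be simulated by building a small finite population, computing its parameters, and then drawing a sample "according to some rule or plan" (here, a simple random sample without replacement). The population of resistor values is hypothetical and generated only for this sketch; in practice the population parameters are usually unknown.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical finite population: 1,000 resistor values (ohms)
population = [random.gauss(100.0, 2.0) for _ in range(1000)]

# Population parameters (computed from every value in the population)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population SD: divides by N

# Sample selected by a plan: simple random sample of n = 30 without replacement
sample = random.sample(population, 30)

# Sample statistics (computed only from the sample values)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample SD: divides by n - 1

print(f"mu = {mu:.3f}, xbar = {xbar:.3f}; sigma = {sigma:.3f}, s = {s:.3f}")
```

Rerunning with a different seed changes the statistics but not the idea: `xbar` and `s` vary from sample to sample, while `mu` and `sigma` are fixed properties of the population.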


Enumerative vs. Analytical Studies

  • Deming introduced the concept of enumerative versus analytical studies

  • An enumerative study is one in which a sample is used to make inferences about the current population. This is the safest use of statistical estimation.

  • An analytical study is one in which inferences are applied to future populations. There is nothing wrong with this approach; however, you must be aware that it carries an inherent assumption of stability (that the process will continue to behave as it did when the data were collected)


Data

  • Most statistical methods are data-driven

  • Data are almost always a sample from a population or populations

  • Engineering data are usually collected in 3 ways:

    • A retrospective study based on historical data,

    • An observational study,

    • A designed experiment


"Happenstance" Data

  • Box, Hunter and Hunter (1978) classify data from retrospective studies and observational studies as "happenstance" data

  • They point out the following dangers:

    1. Inconsistent data

    2. Range of variables limited by control

    3. Semiconfounding of effects

    4. Nonsense correlation -- beware the lurking variable

    5. Serially correlated errors

    6. Dynamic relationships

    7. Feedback

  • So why use happenstance data?


Designed Experiments

  • "In a designed experiment, the engineer makes deliberate or purposeful changes in controllable variables (called factors) of the system, observes the resulting system output, and then makes a decision or an inference about which variables are responsible for the changes that he or she observes in the output performance."

  • "An important distinction between a designed experiment and either an observational or retrospective study is that the different combinations of the factors of interest are applied randomly to a set of experimental units."

  • Box, Hunter and Hunter (1978) present a table (shown below) that clarifies how designed experiments avoid the problems that occur in the analysis of happenstance data


[Table: Box, Hunter and Hunter (1978), how designed experiments avoid the problems of happenstance data]
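The random application of factor combinations to experimental units can be sketched with a small two-factor, two-level (2^2) full factorial, replicated twice and run in a random order. The factor names `temperature` and `pressure` and the coded levels -1/+1 are hypothetical choices for this illustration, not part of the original notes.

```python
import itertools
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical factors, each at a coded low (-1) and high (+1) level
factors = {"temperature": (-1, 1), "pressure": (-1, 1)}

# All factor-level combinations (a 2^2 full factorial), each replicated twice
runs = list(itertools.product(*factors.values())) * 2

# Randomly assign the combinations to the 8 experimental units
random.shuffle(runs)

for unit, (temp, press) in enumerate(runs, start=1):
    print(f"unit {unit}: temperature={temp:+d}, pressure={press:+d}")
```

Randomizing the assignment spreads the effect of uncontrolled (lurking) variables across all factor combinations instead of letting it line up with one of them.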


Data Collected Over Time

  • Data are often collected over time (in retrospective studies, observational studies or designed experiments)

  • Most elementary statistical techniques assume that the observations are independent (not always a good assumption)

  • A sequence of observations collected over time is called a time series

  • Time series analysis does not assume that the observations are independent over time
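One simple check on the independence assumption is the sample lag-1 autocorrelation, which compares each observation with the next one. This sketch contrasts a simulated serially correlated series (a first-order autoregressive process, chosen here only as a convenient example with an assumed coefficient `phi = 0.8`) against independent observations; both series are hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the illustration is reproducible

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation: sum((x_t - xbar)(x_{t+1} - xbar)) / sum((x_t - xbar)^2)."""
    xbar = statistics.mean(x)
    num = sum((a - xbar) * (b - xbar) for a, b in zip(x, x[1:]))
    den = sum((a - xbar) ** 2 for a in x)
    return num / den

# Hypothetical serially correlated series: each value carries over 80% of the previous one
phi = 0.8
series = [0.0]
for _ in range(499):
    series.append(phi * series[-1] + random.gauss(0, 1))

# Hypothetical independent observations for comparison
independent = [random.gauss(0, 1) for _ in range(500)]

r1_series = lag1_autocorrelation(series)
r1_independent = lag1_autocorrelation(independent)
print(f"serially correlated: r1 = {r1_series:.2f}; independent: r1 = {r1_independent:.2f}")
```

A lag-1 autocorrelation well away from zero is a warning that elementary techniques assuming independence may give misleading results.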


Models

  • "Models play an important part in engineering analysis"

  • From the statistical point of view, we will divide models into two categories: mechanistic and empirical

  • All models have these characteristics:

    • One or more observed outcomes that we wish to understand or predict

    • These outcomes are referred to as the response or dependent variable(s)

    • The variables (factors) that influence the response variables are called the independent or regressor variables

    • A functional relationship between the dependent and independent variables


Mechanistic Models

  • Mechanistic models are based upon our understanding of the physical systems affecting the response variable

  • An engineer uses their knowledge, experience and training in mathematics, physics, chemistry and engineering to develop a mathematical expression that defines the response variable as a function of the regressor variables

  • The textbook uses Ohm's law as its example; a wind generator is another system to consider

  • Statistical techniques can augment mechanistic models
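One way statistics can augment a mechanistic model is to estimate an unknown constant from noisy measurements. This sketch takes Ohm's law, I = E/R, adds a random measurement error to the current, and estimates R by least squares through the origin. The true resistance, the voltages, and the normal error with standard deviation 0.002 A are all hypothetical assumptions for the illustration.

```python
import random

random.seed(11)  # fixed seed so the illustration is reproducible

R_true = 50.0  # hypothetical true resistance (ohms)
voltages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # applied voltages E (volts)

# Mechanistic model I = E / R plus ASSUMED normal measurement error (sd 0.002 A)
currents = [E / R_true + random.gauss(0, 0.002) for E in voltages]

# Least squares fit of I = b * E through the origin: b = sum(E * I) / sum(E^2)
b = sum(E * I for E, I in zip(voltages, currents)) / sum(E * E for E in voltages)
R_hat = 1 / b  # since b estimates 1/R

print(f"estimated R = {R_hat:.2f} ohms")
```

The physics supplies the form of the model (a straight line through the origin); the data and the statistical method supply the estimate of its constant and, with more work, a measure of its uncertainty.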


Empirical Models

  • There are many situations in which engineers and scientists do not clearly understand the physical mechanisms behind a phenomenon. However, they may know that a set of regressor variables influences one or more response variables

  • An empirical model is not built upon explicit knowledge of the physical phenomenon

  • Empirical models do not prove or disprove that a variable has an effect on the response variable(s)


Regression Models

  • The most popular method of developing empirical models is the use of linear regression models

  • It is assumed that the response variable can be modelled by a low-order polynomial equation of the regressor variables

  • Very common approach

  • Consider the example from Lawson and Erjavec (2001)


[Figures: Lawson and Erjavec (2001) regression example]
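The Lawson and Erjavec (2001) data are not reproduced in these notes, so the sketch below fits the simplest low-order polynomial, a first-order model y = b0 + b1*x, to a small hypothetical data set using the standard least squares formulas. The function name `fit_first_order` and the data are illustrative assumptions.

```python
def fit_first_order(x, y):
    """Least squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept
    return b0, b1

# Hypothetical data (NOT the Lawson and Erjavec example)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.0, 8.1, 9.9]

b0, b1 = fit_first_order(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 0.24 + 1.94 x
```

For these data, Sxy = 19.4 and Sxx = 10, so b1 = 1.94 and b0 = 6.06 - 1.94(3) = 0.24. A second-order model would add an x^2 term and is usually fitted with matrix methods rather than these closed-form formulas.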