Title: Fundamentals of Statistics
1Fundamentals of Statistics
2Statistics?
- A collection of quantitative data from a sample or population.
- The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
3Types of statistics
- Deductive or descriptive statistics
- describe and analyze a complete data set
- Inductive statistics
- deal with a limited amount of data (sample).
- Conclusions are stated in terms of probability.
4Population
- A population is any entire collection of people, animals, plants or things from which we may collect data.
- It is the entire group we are interested in, which we wish to describe or draw conclusions about.
- For each population there are many possible samples.
5Sample
- A sample is a group of units selected from a larger group (population).
- By studying the sample it is hoped to draw valid conclusions about the population.
- The sample should be representative of the general population.
- The best way is by random sampling.
6Parameter
- A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic.
- For example, the population mean is a parameter that is often used to indicate the average value of a quantity.
7Statistics
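[Diagram: the POPULATION is described by parameters ($\mu$, $\sigma$, $\sigma^2$) and the SAMPLE by statistics ($\bar{x}$, $s$, $s^2$); arrows labeled Deductive and Inductive (inferential statistics) link the two.]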
8Inferential Statistics
- Statistical Inference makes use of information
from a sample to draw conclusions (inferences)
about the population from which the sample was
taken.
9Types of data
- Variables data
- quality characteristics that are measurable values.
- measurable and normally continuous
- may take on any value, e.g. weight in kg
- Attribute data
- quality characteristics that are observed to be either present or absent, conforming or nonconforming.
- countable and normally discrete integers, e.g. 0, 1, 5, 25, ..., but cannot be 4.65
10Accurate and Precise
- Data: life of a light bulb, 995.6 h
- The value 995.632 h is too accurate and unnecessary.
- Keyway specification: lower limit 9.52 mm, upper limit 9.58 mm; data are collected to the nearest 0.001 mm and rounded to the nearest 0.01 mm.
11Accurate and Precise
- Measuring instruments may not give a true reading because of problems with accuracy and precision.
- Data 0.9532, 0.9534 round to 0.953
- Data 0.9535, 0.9537 round to 0.954
- If the last digit is 5 or greater, round up.
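The rounding rule above can be sketched in Python. This is an illustration only: the helper name `round_half_up` is an assumption, and the `decimal` module is used because Python's built-in `round` works on binary floats and rounds halves to the nearest even digit, which would not follow the "5 or greater rounds up" rule.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, step: str = "0.001") -> Decimal:
    """Round a decimal string to the given step; a trailing 5 rounds upward."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_UP)

# The four readings quoted on the slide.
readings = ["0.9532", "0.9534", "0.9535", "0.9537"]
rounded = [round_half_up(r) for r in readings]
print(rounded)
```

Passing the readings as strings keeps the digits exactly as recorded.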
12Describing the Data
- Graphical
- Plot or picture of a frequency distribution.
- Analytical
- Summarize data by computing a measure of central
tendency and dispersion.
13Sampling Methods
- Sampling methods are methods for selecting a sample from the population.
- Simple random sampling: each member of the population has an equal chance of being selected for the sample.
- Systematic sampling: selecting every n-th member of the population arranged in a list.
- Stratified sampling: dividing the population into subgroups and then randomly selecting from each subgroup.
- Cluster sampling: groups are selected rather than individuals.
- Incidental or convenience sampling: taking an intact group (e.g. your own fourth-grade class of pupils).
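As a rough illustration of the first four methods, the sketch below draws samples from a hypothetical population of 100 numbered units. The population, the strata, the cluster size, and the sample sizes are all assumptions chosen for the example, not values from the slides.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 numbered units

# Simple random sampling: every member has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic sampling: every n-th member of the ordered list (here n = 10),
# starting from a randomly chosen position within the first interval.
step = 10
start = random.randrange(step)
systematic = population[start::step]

# Stratified sampling: split into subgroups, then sample randomly within each.
strata = {"low": population[:50], "high": population[50:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, 5)]

# Cluster sampling: select whole groups rather than individuals.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [unit for cluster in random.sample(clusters, 2) for unit in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```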
14Frequency Distribution
- Consider the following set of data, which are the high temperatures recorded for 30 consecutive days.
- We wish to summarize these data by creating a frequency distribution of the temperatures.
15To create a frequency distribution
- Identify the highest and lowest values (51 and 43).
- Create a column for the variable, in this case temperature.
- Enter the highest score at the top, and include all values within the range from the highest score to the lowest score.
- Create a tally column to keep track of the scores.
- Create a frequency column.
- At the bottom of the frequency column, record the total frequency.
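The steps above can be sketched with `collections.Counter`. The temperature list here is hypothetical (the 30 readings from the slide are not reproduced in this transcript); it only shares the same highest and lowest values.

```python
from collections import Counter

# Hypothetical temperatures, not the slide's data set.
temps = [51, 49, 47, 47, 46, 45, 45, 45, 44, 43, 46, 47]

counts = Counter(temps)
highest, lowest = max(temps), min(temps)

# List every value from highest to lowest, including values that never occur.
table = [(value, counts[value]) for value in range(highest, lowest - 1, -1)]

for value, freq in table:
    # value, tally marks, frequency
    print(f"{value:>4}  {'|' * freq:<5} {freq}")

total = sum(freq for _, freq in table)
print(f"   N = {total}")
```

Values with a frequency of zero stay in the table so that the column covers the whole range without gaps.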
17Frequency Distribution
18Cumulative Frequency Distribution
- A cumulative frequency distribution can be created by adding an additional column called "Cumulative Frequency."
- The cumulative frequency for a given value is obtained by adding the frequency for that value to the cumulative frequency for the value below it.
- For example, the cumulative frequency for 45 is 10, which is the cumulative frequency for 44 (6) plus the frequency for 45 (4).
- Finally, notice that the cumulative frequency for the highest value should be the same as the total of the frequency column.
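A running total of this kind can be sketched with `itertools.accumulate`. The frequencies below are hypothetical; they were chosen only so that they agree with the figures quoted above (cumulative frequency 6 at 44, frequency 4 at 45, 30 days in total).

```python
from itertools import accumulate

# Hypothetical frequencies for temperatures 43..51 (see the note above).
freq = {43: 2, 44: 4, 45: 4, 46: 5, 47: 5, 48: 4, 49: 3, 50: 2, 51: 1}

values = sorted(freq)  # lowest value first, so each total builds on the one below
cum = dict(zip(values, accumulate(freq[v] for v in values)))

print(cum[44], cum[45])               # 6 10
print(cum[51] == sum(freq.values()))  # True
```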
20Grouped frequency distribution
- In some cases it is necessary to group the values of the data to summarize the data properly.
- E.g., we wish to create a frequency distribution for the IQ scores of 30 pupils.
- The IQ scores range from 73 to 139.
- To include these scores in a frequency distribution we would need 67 different score values (139 down to 73).
- This would not summarize the data very much.
- To solve this problem we would group scores together and create a grouped frequency distribution.
- If the data have more than 20 score values, we should create a grouped frequency distribution by grouping score values together into class intervals.
21To create a grouped frequency distribution
- Select an interval size that gives 7 to 20 class intervals.
- Create a class interval column and list each of the class intervals.
- Each interval must be the same size, the intervals must not overlap, and there may be no gaps within the range of class intervals.
- Create a tally column (optional).
- Create a midpoint column for the interval midpoints.
- Create a frequency column.
- Enter N (the sum of the frequencies) at the bottom of the frequency column.
22Grouped frequency
- Look at the following data of high temperatures for 50 days.
- The highest temperature is 59 and the lowest temperature is 39.
- We would have 21 temperature values.
- This is greater than 20 values, so we should create a grouped frequency distribution.
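A grouped frequency distribution for this range might be built as in the sketch below. The readings are hypothetical (the 50 temperatures are not reproduced in this transcript), and the interval size of 3 is an assumption that happens to split 39 to 59 into 7 equal class intervals.

```python
# Hypothetical temperatures between 39 and 59, not the slide's data set.
temps = [39, 41, 44, 45, 45, 47, 48, 48, 49, 50, 50, 51, 52, 53, 55, 57, 59]

low, high, size = 39, 59, 3  # interval size 3 gives 7 class intervals

rows = []
for start in range(low, high + 1, size):
    end = start + size - 1                        # intervals do not overlap
    freq = sum(start <= t <= end for t in temps)  # count readings in the interval
    midpoint = (start + end) / 2
    rows.append((start, end, midpoint, freq))

for start, end, midpoint, freq in rows:
    print(f"{start}-{end}  midpoint {midpoint:.0f}  f = {freq}")

n = sum(row[3] for row in rows)
print(f"N = {n}")
```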
24Cumulative grouped frequency distribution
25To create a histogram from this frequency
distribution
- Arrange the values along the abscissa (horizontal axis) of the graph.
- Create an ordinate (vertical axis) that is approximately three fourths the length of the abscissa, to contain the range of scores for the frequencies.
- Create the body of the histogram by drawing a bar or column, the length of which represents the frequency, for each temperature value.
- Provide a title for the histogram.
26High temperatures for 50 days
[Histogram: Frequency (vertical axis) against Temperatures (horizontal axis).]
27Histograms
- Constructing a Histogram for Discrete Data
- First, determine the frequency and relative frequency of each x value.
- Then mark the possible x values on a horizontal scale.
28How to Prepare a Histogram - Grouped Data
- Determine the range, R = largest value - smallest value, or R = Xh - Xl
- Obtain the number of histogram bars, t
- Compute the bar width, h = R/t
- Starting value of the first bar = smallest data value - (h/2), or Xl - (h/2)
- Draw the histogram.
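The arithmetic in these steps can be sketched as follows. The function name `histogram_cells`, the sample readings, and the choice t = 5 are assumptions for illustration; the rule for choosing the number of bars is not given in this transcript, so t is simply passed in. Because the first bar starts half a width below the smallest value, the sketch keeps adding bars until the largest value is covered.

```python
def histogram_cells(data, t):
    """Return (R, h, boundaries) for a histogram planned with t bars."""
    x_low, x_high = min(data), max(data)
    r = x_high - x_low      # R = Xh - Xl
    h = r / t               # bar width h = R / t
    start = x_low - h / 2   # first bar starts at Xl - h/2
    boundaries = [start]
    # Keep adding bar boundaries until the largest value falls inside a bar.
    while boundaries[-1] <= x_high:
        boundaries.append(boundaries[-1] + h)
    return r, h, boundaries

data = [39, 44, 47, 50, 52, 55, 59]  # hypothetical readings
r, h, boundaries = histogram_cells(data, t=5)
print(r, h, boundaries)
```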
29Histograms
- Constructing a Histogram for Continuous Data: equal class widths
- Number of classes ?
[Figure: Relative frequency plotted against Data.]
30Bar Graph
- A bar graph is similar to a histogram except that the bars or columns are separated from one another by a space rather than being contiguous.
- The bar graph is used to represent categorical or discrete data, that is, data at the nominal or ordinal level of measurement.
- The variable levels are not continuous.
31Bar Graph
32Descriptive statistics
- Measures of Central Tendency
- Describes the center position of the data
- Mean, Median, Mode
- Measures of Dispersion
- Describes the spread of the data
- Range, Variance, Standard deviation
33Measures of central tendency: Mean
- Arithmetic mean: $\bar{x} = \frac{\sum x_i}{N}$
- where $x_i$ is one observation, $\sum$ means add up what follows, and $N$ is the number of observations.
- So, for example, if the data are 0, 2, 5, 9, 12, the mean is (0 + 2 + 5 + 9 + 12)/5 = 28/5 = 5.6.
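34Mean for a Population
- $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ = number of values in the population
35Mean for a Sample
- Ungrouped data: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ = number of observed values
- Grouped data: $\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n}$, where $n$ = sum of the frequencies, $h$ = number of cells or number of observed values, $X_i$ = cell midpoint, and $f_i$ = frequency of cell $i$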
36Example - ungrouped data
- Resistance of 5 coils: 3.35, 3.37, 3.28, 3.34, 3.30 ohm.
- The average: $\bar{X} = (3.35 + 3.37 + 3.28 + 3.34 + 3.30)/5 = 16.64/5 \approx 3.33$ ohm
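The same average can be checked with Python's standard `statistics` module; this is a minimal sketch using the five resistance values from the example.

```python
from statistics import mean

resistance = [3.35, 3.37, 3.28, 3.34, 3.30]  # ohm, from the example above
average = mean(resistance)
print(round(average, 2))
```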
37Example - grouped data
- Frequency Distributions of the life of 320 tires
in 1000 km
38Measures of Location
- Median: the middle value, provided that the data are in increasing order
- e.g. data 2, 2, 3, 4, 15: the median is 3, while the mean is 5.2
- The median is less sensitive to outliers.
39Median - mode
- Median: the observation in the middle of the sorted data
- Mode: the most frequently occurring value
40Median and mode
- Data: 100, 91, 85, 84, 75, 72, 72, 69, 65
- Mode = 72
- Median = 75
- Mean = 79.22
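These three measures can be checked with the standard `statistics` module. The sketch below uses the nine scores from this slide, then the small data set 2, 2, 3, 4, 15 from the earlier slide to show how a single outlier pulls the mean but not the median.

```python
from statistics import mean, median, mode

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]
print(mode(scores), median(scores), round(mean(scores), 2))

# One large value (15) shifts the mean well above the median.
with_outlier = [2, 2, 3, 4, 15]
print(median(with_outlier), mean(with_outlier))
```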
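41Median
- Grouped data: $Md = L_m + \left(\frac{n/2 - cf_m}{f_m}\right) i$
- $L_m$ = lower boundary of the cell containing the median
- $cf_m$ = cumulative frequency of all cells below $L_m$
- $f_m$ = frequency of the median cell
- $i$ = cell interval
- $n$ = total number of observations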
42Median - Grouped technique
- Use data from the table above (frequency distribution of the life of 320 tires in 1000 km).
- The halfway point (320/2 = 160) is reached in the cell with a midpoint value of 37.0 and a lower limit of 35.6.
- The cumulative frequency below this cell is 4 + 36 + 51 + 63 = 154, the cell interval is 3, and the frequency of the median cell is 58.
- Median = 35.6 + ((160 - 154)/58)(3) = 35.9, i.e. 35.9 x 1000 km = 35 900 km.
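The grouped-median calculation can be sketched as a small function. The function name and argument names are assumptions that mirror the symbols on the previous slide; the numbers are the ones quoted for the tire-life example.

```python
def grouped_median(l_m, n, cf_m, f_m, i):
    """Median for grouped data: L_m + ((n/2 - cf_m) / f_m) * i."""
    return l_m + ((n / 2 - cf_m) / f_m) * i

# Tire-life example: values are in units of 1000 km.
md = grouped_median(l_m=35.6, n=320, cf_m=154, f_m=58, i=3)
print(round(md, 1))
```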
43Measures of dispersion: range
- The range is calculated by taking the maximum value and subtracting the minimum value.
- Data: 2, 4, 6, 8, 10, 12, 14
- Range = 14 - 2 = 12
44Measures of dispersion: variance
- Calculate the deviation from the mean for every observation.
- Square each deviation.
- Add them up and divide by the number of observations.
45Variance for a population
- The formula for the variance for a population using the deviation score method is as follows: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
- The mean: $\mu = 28/7 = 4$
- The population variance
46Measures of dispersion: standard deviation
- The standard deviation is the square root of the variance.
- The variance is in square units, so the standard deviation is in the same units as x.
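The deviation-score steps for the population variance and standard deviation can be sketched as below. The seven values used in the variance slide are not reproduced in this transcript, so the sketch reuses the values 2, 4, ..., 14 from the range example; that substitution is an assumption made only for illustration.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [2, 4, 6, 8, 10, 12, 14]  # values from the range example, reused here

mu = sum(data) / len(data)                     # population mean
squared_deviations = [(x - mu) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)  # divide by N for a population
std_dev = sqrt(variance)                        # back in the same units as x

# The standard library gives the same results.
print(variance, pvariance(data))
print(std_dev, pstdev(data))
```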
47Standard Deviation for a Sample
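- General formula/ungrouped data: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
- For computation purposes: $s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$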
48Standard Deviation for a Sample
49Example- ungrouped data
- Sample: moisture contents of kraft paper are 6.7, 6.0, 6.4, 6.4, 5.9, and 5.8%.
- Sample standard deviation, s = 0.35
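The sample standard deviation for this example can be checked in two ways: with the deviations from the sample mean, and with the equivalent form that uses only the sum and the sum of squares. The sketch below assumes the usual n - 1 divisor for a sample and compares both against `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

moisture = [6.7, 6.0, 6.4, 6.4, 5.9, 5.8]  # percent, from the example above
n = len(moisture)
x_bar = sum(moisture) / n

# General form: squared deviations from the sample mean, divided by n - 1.
s_general = sqrt(sum((x - x_bar) ** 2 for x in moisture) / (n - 1))

# Computational form: uses only the sum and the sum of squares.
sum_x = sum(moisture)
sum_x2 = sum(x * x for x in moisture)
s_computational = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(round(s_general, 2), round(s_computational, 2), round(stdev(moisture), 2))
```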
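50Calculating the Sample Standard Deviation - Grouped technique
- Standard deviation for a grouped sample: $s = \sqrt{\frac{n \sum_{i=1}^{h} f_i X_i^2 - \left(\sum_{i=1}^{h} f_i X_i\right)^2}{n(n - 1)}}$
- Average: $\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n}$
Table: Car speeds in km/h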
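A grouped calculation of this kind might look like the sketch below. The car-speed table is not reproduced in this transcript, so the cell midpoints and frequencies are hypothetical; only the method (weighting each cell midpoint by its frequency, with an n - 1 divisor) is being illustrated.

```python
from math import sqrt

# Hypothetical grouped data: (cell midpoint X_i, frequency f_i).
cells = [(60, 2), (70, 5), (80, 8), (90, 4), (100, 1)]

n = sum(f for _, f in cells)                 # n = sum of the frequencies
sum_fx = sum(f * x for x, f in cells)        # sum of f_i * X_i
sum_fx2 = sum(f * x * x for x, f in cells)   # sum of f_i * X_i^2

x_bar = sum_fx / n
s = sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, x_bar, round(s, 1))
```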
51Skewness
- a3 = 0: symmetrical
- a3 > 0 (positive): the data are skewed to the right, meaning that the long tail is to the right
- a3 < 0 (negative): the data are skewed to the left, meaning that the long tail is to the left
52Kurtosis
- Leptokurtic (more peaked) distribution
- Platykurtic (flatter) distribution
- Mesokurtic (between these two distributions): normal distribution.
- For example,
- a normal distribution, which is mesokurtic, has a4 = 3,
- a4 > 3 is more peaked than normal,
- a4 < 3 is less peaked than normal.
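The formulas for a3 and a4 are not reproduced in this transcript. The sketch below assumes one common moment-based definition, in which the third and fourth moments about the mean are divided by the corresponding power of the standard deviation, with every moment divided by n. Other texts use a slightly different divisor, so treat the function `shape_measures` and its exact values as illustrative.

```python
def shape_measures(data):
    """Return (a3, a4) from moments about the mean, each divided by n (an assumption)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]     # evenly spread, no long tail
right_tail = [2, 2, 3, 4, 15]   # one large value gives a long right tail

a3_sym, a4_sym = shape_measures(symmetric)
a3_right, a4_right = shape_measures(right_tail)
print(a3_sym, a4_sym)
print(a3_right, a4_right)
```

Under this definition the symmetric set has a3 equal to zero and a4 below 3 (flatter than normal), while the set with the long right tail has a positive a3.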
53Example
The data are skewed to the left.
54Standard deviation and curve shape
- If σ is small, there is a high probability of getting a value close to the mean.
- If σ is large, there is a correspondingly higher probability of getting values further away from the mean.
55The Normal Curve
- The normal curve, also called the normal frequency distribution or Gaussian distribution, is a hypothetical distribution that is widely used in statistical analysis.
- The characteristics of the normal curve make it useful in education and in the physical and social sciences.
56Characteristics of the Normal Curve
- The normal curve is a symmetrical distribution of data with an equal number of data above and below the midpoint of the abscissa.
- Since the distribution of data is symmetrical, the mean, median, and mode are all at the same point on the abscissa.
- In other words, mean = median = mode.
- If we divide the distribution up into standard deviation units, a known proportion of data lies within each portion of the curve.
57- 34.13% of the data lie between μ and 1σ above the mean (μ).
- 34.13% lie between μ and 1σ below the mean.
- Approximately two-thirds (68.26%) lie within 1σ of the mean.
- 13.59% of the data lie between one and two standard deviations from the mean on each side.
- Finally, almost all of the data (99.73%) are within 3σ of the mean.
58The normal curve
- If x follows a bell-shaped (normal) distribution, then the probability that x is within
- 1 standard deviation of the mean is 68%
- 2 standard deviations of the mean is 95%
- 3 standard deviations of the mean is 99.7%
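These proportions can be checked with `statistics.NormalDist` from the Python standard library. The helper `within` is a name introduced here for illustration; it returns the share of a normal distribution lying within k standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

between_0_and_1 = z.cdf(1) - z.cdf(0)  # mean to one standard deviation above
between_1_and_2 = z.cdf(2) - z.cdf(1)  # one to two standard deviations above

print(f"{between_0_and_1:.4f} {between_1_and_2:.4f}")
print(f"{within(1):.4f} {within(2):.4f} {within(3):.4f}")
```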
59Standardized normal value, Z
- When a score is expressed in standard deviation units, it is referred to as a Z-score.
- A score that is one standard deviation above the mean has a Z-score of 1.
- A score that is one standard deviation below the mean has a Z-score of -1.
- A score that is at the mean would have a Z-score of 0.
- The normal curve with Z-scores along the abscissa looks exactly like the normal curve with standard deviation units along the abscissa.
60Z-value
- Deviation IQ scores, sometimes called Wechsler IQ scores, are standard scores with a mean of 100 and a standard deviation of 15.
- What percentage of the general population have deviation IQs lower than 85?
- An IQ of 85 is equivalent to a z-value of (85 - 100)/15 = -1.
- So 50% - 34.13% = 15.87% of the population has IQ scores lower than 85.
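The IQ example can be reproduced with `statistics.NormalDist`, assuming (as the slide does) that deviation IQ scores are normally distributed with mean 100 and standard deviation 15. The z-value is computed as the distance from the mean divided by the standard deviation.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

z = (85 - iq.mean) / iq.stdev   # z = (score - mean) / standard deviation
share_below_85 = iq.cdf(85)     # proportion of scores below 85

print(z, f"{share_below_85:.4f}")
```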
61Frequency Polygon
- A frequency polygon is what you may think of as a curve.
- A frequency polygon can be created with interval or ratio data.
- Let's create a frequency polygon with the data we used earlier to create a histogram.
62To create a frequency polygon
- Arrange the values along the abscissa (horizontal axis).
- Arrange the lowest data on the left and the highest on the right.
- Add one value below the lowest data and one above the highest data.
- Create an ordinate (vertical axis).
- Arrange the frequency values along the ordinate.
- Provide a label for the ordinate (Frequency).
- Create the body of the frequency polygon by placing a dot for each value.
- Connect each of the dots to the next dot with a straight line.
- Provide a title for the frequency polygon.
63To create a frequency polygon