Thursday, March 25, 2010

Statistical Terminology: Why different statistics are used


Descriptive Statistics

Mean: a measure of central tendency, calculated by adding all the scores and dividing by the number of scores. Its advantage over the other measures of central tendency is that it uses all the scores; its disadvantage is that it is influenced by extreme scores. It should be used with interval or ratio levels of measurement.
Mode: A measure of central tendency which is the most frequently occurring score in a distribution.
Median: A measure of central tendency that is the middle score of a set of scores placed in size order. Advantage: not influenced by extremes. Disadvantage: it doesn't use the arithmetic values of the intervals between scores. It requires an ordinal level of measurement and should be used instead of the mean when the distribution of data on an interval scale is skewed.
Standard Deviation: The square root of the variance. It measures the dispersion of the scores around the mean in a normal distribution.
Variance: A measure of dispersion equal to the sum of the squared deviations from the mean divided by the degrees of freedom.
Range: The simplest measure of dispersion, measuring the difference between the highest and lowest scores. Sometimes the value 1 is added to the difference to reflect that the range is inclusive of the end points. (All six measures are computed in the sketch below.)
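As a minimal sketch of all six measures, using Python's standard statistics module and an invented set of scores (the post names neither the library nor the data):

```python
import statistics

scores = [4, 7, 7, 9, 12, 15, 15, 15, 40]  # invented interval-level data

print(statistics.mean(scores))      # pulled upward by the extreme score 40
print(statistics.median(scores))    # middle score (12); unaffected by the 40
print(statistics.mode(scores))      # most frequent score: 15
print(statistics.variance(scores))  # sum of squared deviations / (n - 1)
print(statistics.stdev(scores))     # square root of the variance
print(max(scores) - min(scores))    # range; add 1 for the inclusive version
```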

Inferential Statistics

t-test: tests for a difference between the means of two groups. In a one-sample t-test, an observed mean is compared to the expected mean of the population.
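A minimal sketch of both forms, assuming SciPy and invented scores (neither comes from the post):

```python
from scipy import stats

group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.8, 7.1, 6.5, 7.4, 6.9]

# Independent-samples t-test: compares the means of the two groups.
t, p = stats.ttest_ind(group_a, group_b)

# One-sample t-test: compares group_a's observed mean to a
# hypothetical expected population mean of 6.0.
t1, p1 = stats.ttest_1samp(group_a, popmean=6.0)

print(t, p, t1, p1)
```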
ANOVA (Analysis of Variance): tests for significant differences between the means of two or more groups. With exactly two groups it gives the same result as the t-test.
Repeated Measures ANOVA: used when successive observations come from the same source (the same dog or person, as in a longitudinal study). Another example: the same person is given a treatment of vodka and tested on memory, then comes back in 5 days and is tested again after a placebo.
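A one-way sketch, assuming SciPy and three invented groups of memory scores (a true repeated-measures design needs a dedicated routine, e.g. statsmodels' AnovaRM, which is beyond this sketch):

```python
from scipy import stats

vodka = [5, 6, 4, 5, 6]
placebo = [8, 9, 7, 8, 9]
control = [7, 8, 8, 7, 9]

# Tests whether at least one group mean differs from the others.
# With exactly two groups the result matches the t-test (F = t squared).
f, p = stats.f_oneway(vodka, placebo, control)
print(f, p)
```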
Correlation: a measure of the relation between two or more variables. Correlation coefficients range from -1.00 (a perfect negative correlation) to +1.00 (a perfect positive correlation). A correlation of 0.00 means A LACK OF CORRELATION.
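A sketch of a strong positive correlation, assuming SciPy and invented data:

```python
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 68, 70, 74]

r, p = stats.pearsonr(hours_studied, exam_score)
print(r)  # close to +1.00; a coefficient near 0.00 would mean no correlation
```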
Multiple Regression: analyzes the relationship between several independent (or predictor) variables and a dependent (or criterion) variable.
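A sketch with two invented predictors, assuming scikit-learn (the post names no library):

```python
from sklearn.linear_model import LinearRegression

# Predictors: [hours studied, hours slept]; criterion: exam score. All invented.
X = [[1, 8], [2, 7], [3, 8], [4, 6], [5, 7], [6, 5]]
y = [52, 55, 61, 66, 70, 73]

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # one coefficient per predictor
print(model.predict([[3, 7]]))        # predicted score for a new student
```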
Factor Analysis: reduces the number of variables and then detects the structure of the relationships between variables -- that is, it classifies the variables. E.g., if we set out to measure people's satisfaction and ask in a survey both how much they like their hobbies and how much time they spend on them, the two variables will probably have a HIGH correlation, which means we are probably being redundant with both variables - best to condense.
One can summarize the correlation between two variables in a scatterplot. A regression line can then be fitted that represents the "best" summary of the linear relationship between the variables. If we could define a variable that would approximate the regression line in such a plot, then that variable would capture most of the "essence" of the two items. Subjects' single scores on that new factor, represented by the regression line, could then be used in future data analyses to represent that essence of the two items. In a sense we have reduced the two variables to one factor.
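A sketch of the hobby example, assuming NumPy and scikit-learn; the data are simulated so that the two variables are highly correlated:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
liking = rng.normal(size=100)                     # how much they like their hobbies
time_spent = liking + 0.3 * rng.normal(size=100)  # redundant: tracks liking closely

X = np.column_stack([liking, time_spent])

# Condense the two redundant variables into one underlying factor.
fa = FactorAnalysis(n_components=1)
factor_scores = fa.fit_transform(X)  # one "hobby" score per subject
print(fa.components_)                # each variable's loading on the factor
```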
Discriminant Analysis: used to discriminate between two or more naturally occurring groups. E.g., a school researcher may try to find out which variables differ between groups of students who decide to go to college and those who don't, and figure out which variables (SAT score, GPA, etc.) predict membership in each group.
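A sketch of the college example, assuming scikit-learn and invented [SAT score, GPA] pairs:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = [[1200, 3.5], [1350, 3.8], [1100, 3.2],  # went to college (1)
     [900, 2.4], [950, 2.8], [850, 2.2]]     # did not (0)
y = [1, 1, 1, 0, 0, 0]

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                   # which variables separate the two groups
print(lda.predict([[1000, 3.0]]))  # predicted group for a new student
```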
Chi Square Test: A test that uses the chi-square statistic to test the fit between a theoretical frequency distribution and a frequency distribution of observed data for which each observation may fall into one of several classes.
f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \qquad \nu = 1, 2, \ldots; \quad 0 < x < \infty

where
\nu is the degrees of freedom,
e is the base of the natural logarithm, sometimes called Euler's e (2.71...),
\Gamma (gamma) is the Gamma function.
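A goodness-of-fit sketch, assuming SciPy and an invented die-rolling experiment:

```python
from scipy import stats

observed = [18, 22, 16, 14, 12, 18]  # face counts in 100 invented rolls
expected = [100 / 6] * 6             # a fair die predicts equal frequencies

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)  # a small p would suggest the observed data don't fit the theory
```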

