correlation→Correlation analysis is a statistical method used in research to measure the strength and direction of the linear association between two variables
Descriptive statistics→measures of central tendency (mean, median, mode) and measures of dispersion (range, standard deviation) are commonly used in the analysis of univariate data
Methods of studying correlation
Scatter Diagram
Karl Pearson's coefficient of correlation (covariance method)
Scatter diagram→the graphical representation of the pairs of data (X, Y) in an orthogonal coordinate system (a graph that displays the relationship between two variables)
If, for increasing values x of the variable X, there is a definite displacement of the values y of the variable Y, we say that there is a correlation
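A minimal sketch of a scatter diagram in Python; the (X, Y) pairs below are made up purely for illustration:

```python
# Scatter diagram: plot each (x, y) pair as a point and eyeball the trend.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]                   # values of variable X
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]   # values of variable Y

plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Scatter diagram of (X, Y) pairs")
plt.show()
```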
Simple correlation coefficient (r)
It is also called Pearson's correlation or the product-moment correlation coefficient.
It measures the nature and strength of the relationship between two quantitative variables (a measure of the linear association between two variables).
Assumptions for Karl Pearson correlation analysis
Variables are related to each other i.e. not independent
Data measured on an interval or ratio scale
There exists a linear relationship between the variables
Variables are normally distributed; absence of outliers
Both variables must have the same number of observations
Homoscedasticity: in regression analysis, the assumption that the variance of the error terms (residuals) is constant across all levels of the independent variable(s), i.e. the "noise" or random disturbance in the relationship between the independent and dependent variables is uniform (a quick visual check is sketched below)
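Since homoscedasticity is easiest to judge visually, here is a minimal sketch of the usual residuals-vs-fitted check; the data are simulated, and statsmodels is just one convenient way to fit the regression:

```python
# Plot residuals against fitted values; roughly constant vertical spread
# is consistent with homoscedasticity, a funnel shape suggests otherwise.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=100)  # constant error variance

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted values")
plt.show()
```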
r indicates:
strength of relationship (strong, weak, or none)
nature/direction of relationship
positive (direct): variables move in the same direction
negative (inverse): variables move in opposite directions
r ranges in value from -1.0 to + 1.0
r = +1 is a perfect positive linear correlation
r = 0 means no linear correlation
r = -1 is a perfect negative linear correlation
Example
if r = 0.694 we may conclude that there is a moderate positive relationship between X and Y
Sample correlation coefficient: $r = \frac{\mathrm{cov}(X, Y)}{s_X \, s_Y}$, where $s_X$ and $s_Y$ are the sample standard deviations of X and Y
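A minimal numerical sketch of this formula (the data are illustrative); NumPy's corrcoef is used only as a cross-check:

```python
# r = cov(X, Y) / (s_x * s_y), using sample (ddof=1) covariance and std.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(round(r, 3))                              # ~0.775 for this data
print(round(np.corrcoef(x, y)[0, 1], 3))        # same value via corrcoef
```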
Properties of Karl Pearson correlation coefficient
Correlation coefficient is a pure number, i.e. it does not have a unit
The range of the correlation coefficient is -1 to +1, i.e. -1 ≤ r ≤ +1
Correlation between two variables is known as simple correlation or zero-order correlation
Correlation coefficient is independent of change of both origin and scale (see the numerical check after this list)
If rxy = 0, then X and Y are not linearly related (there may be a curvilinear relationship)
If X and Y are independent variables then rxy = 0, but the converse may or may not be true
Correlation coefficient (r) between X and Y is the geometric mean of the two regression coefficients byx and bxy, i.e. $r = \sqrt{b_{yx} \cdot b_{xy}}$
The signs of the regression coefficients and the correlation coefficient are always the same (both regression coefficients have the same sign)
The arithmetic mean of the two regression coefficients byx and bxy is greater than or equal to the correlation coefficient between X and Y
Correlation coefficient is symmetric in nature, i.e. rxy = ryx
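A quick numerical check, with made-up data, of two of the properties above: invariance to change of origin and scale, and symmetry:

```python
import numpy as np

x = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
y = np.array([40.0, 42.0, 48.0, 55.0, 60.0])

r_xy = np.corrcoef(x, y)[0, 1]
# Shift the origin and rescale; scale factors are positive
# (a negative factor would flip the sign of r).
r_scaled = np.corrcoef(3 * x - 7, 0.5 * y + 100)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

print(np.isclose(r_xy, r_scaled))  # True: independent of origin and scale
print(np.isclose(r_xy, r_yx))      # True: symmetric, rxy = ryx
```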
Limitations of Karl Pearson correlation coefficient
Linearity
can't describe non-linear relationships
e.g., the relation between anxiety and performance
Truncation of range
underestimates the strength of a relationship if you can't observe the full range of X values
No proof of causation
Third variable problem: a third variable could be causing the change in both variables
Directionality: can't be sure which way causality "flows"
Types of Correlation (on the basis of direction of change)
Positive correlation
Negative correlation
Perfectly positive correlation
Perfectly negative correlation
Zero correlation
Types of correlation (on the basis of number of variables)
Simple correlation (only 2 variables)
Partial correlation (the effect of only two variables is studied while others are kept constant)
Multiple correlation (More than 2 variables)
Types of correlation (on the basis of proportion)
Linear correlation (the amount of change is in a constant ratio)
Non-linear correlation
Simple correlation coefficient (r) : Correlation between two variables
Multiple Correlation coefficient (R) : Correlation Between more than two variables
Least squares fit, Properties and examples
Polynomial regression: Use of orthogonal polynomials
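A minimal sketch of a least-squares polynomial fit; the data and the degree are illustrative. numpy.polynomial.Polynomial.fit maps the x-data into a shifted and scaled domain before fitting, which improves numerical conditioning, in the same spirit as fitting with orthogonal polynomials:

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1, 51.0])  # roughly 2x^2 + 1

p = Polynomial.fit(x, y, deg=2)   # least-squares fit of a quadratic
print(p.convert())                 # coefficients in the ordinary power basis
print(p(2.5))                      # fitted value at x = 2.5
```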
Spearman's correlation coefficient
is a statistical measure of the strength of the relationship between paired data; it is denoted by rs, with -1 ≤ rs ≤ +1
Its interpretation is similar to that of Pearson's: the closer rs is to ±1, the stronger the monotonic relationship
A non-parametric measure of the monotonic relationship between two variables (where, as one variable increases, the other either consistently increases or consistently decreases, but not necessarily at a constant rate)
Without any ties:
$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
$r_s$ = Spearman's rank correlation coefficient
$d_i$ = difference between the two ranks of each observation
$n$ = number of observations
An rs of +1 indicates a perfect positive association of ranks, an rs of 0 indicates no association between ranks, and an rs of -1 indicates a perfect negative association of ranks.
The closer rs is to zero, the weaker the association between the ranks
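A minimal sketch of the tie-free formula above; the data are made up and contain no tied values, and scipy.stats.spearmanr is used only as a cross-check:

```python
import numpy as np
from scipy.stats import spearmanr

x = np.array([35, 23, 47, 17, 10, 43, 9, 6, 28])
y = np.array([30, 33, 45, 23, 8, 49, 12, 4, 31])

# argsort of argsort gives 0-based ranks for tie-free data; +1 makes them 1-based
rank_x = np.argsort(np.argsort(x)) + 1
rank_y = np.argsort(np.argsort(y)) + 1

d = rank_x - rank_y            # rank differences d_i
n = len(x)
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(round(rs, 3))            # 0.9 for this data
print(round(spearmanr(x, y)[0], 3))  # should match
```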
The calculation of Spearman's correlation coefficient and subsequent significance testing require the following assumptions about the data:
Data measured on an ordinal, interval, or ratio scale
Monotonically related
Absence of outliers
Both variables must have the same number of observations
Unlike Pearson's correlation, there is no requirement of normality; hence it is a non-parametric statistic
Unit-2 Regression diagnostics
Autocorrelation
Correlation is the degree of similarity between two different variables
Correlation of the same variable at two different times, $Y_t$ and $Y_{t-k}$, is called autocorrelation. Autocorrelation is also called serial correlation
Autocorrelation is the correlation of a time series with a lagged version of itself.
where k is the time gap being considered and is called the lag. Autocorrelation at lag 1 (i.e. k = 1) is the correlation between values that are one time period apart
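A minimal sketch of computing a lag-k autocorrelation directly from its definition; the series below is illustrative:

```python
import numpy as np

y = np.array([2.0, 3.1, 4.2, 3.9, 5.0, 6.2, 5.8, 7.1, 8.0, 7.6])

def autocorr(y, k):
    """Sample autocorrelation at lag k: corr(y_t, y_{t-k})."""
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

print(round(autocorr(y, 1), 3))  # lag-1 autocorrelation
```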
In regression analysis, one of the assumptions is that the error terms $e_t$ have mean 0 and constant variance $\sigma^2_{e}$
Since the error terms $e_t$ have mean zero (i.e. $\bar{e} = 0$), the autocorrelation of the residuals at lag $k$ reduces to
$r_k = \frac{\sum_{t=k+1}^{n} e_t \, e_{t-k}}{\sum_{t=1}^{n} e_t^2}$
Causes of Autocorrelation
Incorrect functional form (the true relationship may be quadratic or otherwise non-linear instead of linear)
Independent variables that are lagged values of Y
Data manipulation, i.e. interpolation or extrapolation of the data
Consequences of Autocorrelation
The usual interpretation of R-squared may no longer hold
Confidence intervals for the regression coefficients are misleadingly narrow
The usual t-ratio and F-ratio tests give misleading results
Predicted Y values may have large variances
correlogram→A correlogram is a visual representation of the autocorrelation of a series at different lags
x-axis = lag
y-axis = autocorrelation values
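A minimal sketch of a correlogram using statsmodels' plot_acf; the AR(1)-style series is simulated purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(0)
e = rng.normal(size=200)
y = np.empty(200)
y[0] = e[0]
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + e[t]   # AR(1) process with coefficient 0.7

plot_acf(y, lags=20)                # x-axis: lag, y-axis: autocorrelation
plt.show()
```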
Tests of Autocorrelation
A plot of the residuals $e_t$ against $t$
Durbin-Watson test
A Lagrange Multiplier test
Ljung-Box test
A correlogram
What is the Durbin-Watson test?
The Durbin-Watson test is a measure of autocorrelation (also called serial correlation) in the residuals from a regression analysis. It looks for a specific type of serial correlation: the AR(1) process.
The Durbin-Watson test detects autocorrelation in the residuals of a regression model.
If $e_t$ is the residual at time period $t$, regressed on the residual at time $t-1$ (i.e. $e_t = \rho e_{t-1} + u_t$), the Durbin-Watson statistic is $d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - \hat{\rho})$, so values near 2 indicate no first-order autocorrelation.
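A minimal sketch of the test on the residuals of a simple regression; the data are simulated, and statsmodels provides durbin_watson directly:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = np.arange(50, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(size=50)   # independent errors here

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

d = durbin_watson(resid)   # ~2: no AR(1) autocorrelation;
print(round(d, 2))          # toward 0 (or 4): positive (or negative) serial correlation
```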