Correlations
The correlation coefficient is the standard measure of linear association between two variables. If a scatterplot shows a linear relationship, the data can be summarized by a straight line. This line is called the regression line and describes how the variable y changes when the value of x changes. Regression is used to predict the value of y at given levels of x. The method used here to fit the regression line, y = a + bx, is least squares. The parameters of the regression line are a and b, where a is the intercept and b is the slope. The strength of the relation is expressed by the correlation coefficient.
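As a rough illustration of the least-squares fit described above, the following sketch computes the intercept a and slope b from the usual closed-form expressions. The function name `fit_least_squares` and the data are assumptions made for this example; they are not part of the software described here.

```python
def fit_least_squares(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and sum of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Made-up illustrative data, not taken from the source.
a, b = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
predicted = a + b * 5  # predicted value of y at x = 5
```

The fitted line always passes through the point of means, which is why the intercept is obtained from the means once the slope is known.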

The correlation is used to measure the strength of linear association between two variables. One example of its use is the study of the relationship between height and weight of individuals in a population, and another is finding out how closely related systolic blood pressure and serum cholesterol are. The definition of the sample correlation coefficient is

$$ r = \frac{s_{xy}}{s_x \, s_y} $$,

where sxy is the sample covariance between the two variables, and sx and sy are their sample standard deviations. The correlation coefficient takes values between -1 and 1. A correlation of 1 indicates a perfect positive linear relation, because all observations lie on a regression line with positive slope; a value of -1 indicates a perfect negative linear relation. A value close to zero indicates little or no linear association between the two variables.
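The definition can be sketched directly in code: compute the sample covariance and the two sample standard deviations, then divide. The function name `sample_correlation` and the data are hypothetical and serve only to illustrate the formula; the n - 1 divisor is an assumption about the "sample" convention being used.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = sxy / (sx * sy), using n - 1 divisors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sx * sy)


# Made-up illustrative data, not taken from the source.
r_up = sample_correlation([1, 2, 3], [2, 4, 6])    # points on a rising line
r_down = sample_correlation([1, 2, 3], [6, 4, 2])  # points on a falling line
```

Because the same divisor appears in the covariance and in both standard deviations, it cancels in the ratio, so the choice between n and n - 1 does not change r.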

The correlations object gives a table of correlations between all variables. The example below, which uses the Darwin fertilization data, gives a correlation of -0.33786 between the two variables.

                   Crossfertilized   Selffertilized
Crossfertilized    1.0
Selffertilized     -0.33786          1.0
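A lower-triangular table of this kind could be produced along the following lines. This is only a sketch: the helper names `pearson` and `correlation_table` are hypothetical, the data are made up, and nothing here reflects how the correlations object is actually implemented.

```python
import math


def pearson(x, y):
    """Correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Return (name, row) pairs holding the lower triangle, diagonal included."""
    names = list(columns)
    rows = []
    for i, row_name in enumerate(names):
        row = [
            1.0 if i == j else pearson(columns[row_name], columns[names[j]])
            for j in range(i + 1)
        ]
        rows.append((row_name, row))
    return rows


# Made-up illustrative data, not the Darwin measurements.
data = {"u": [1, 2, 3, 4], "v": [2, 4, 6, 8], "w": [4, 3, 2, 1]}
for name, row in correlation_table(data):
    print(name, " ".join(f"{value:8.5f}" for value in row))
```

Only the lower triangle is stored because the correlation of x with y equals the correlation of y with x, so the upper triangle would repeat the same values.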