#### Correlation, Regression and Covariance - Biology tutorial

Introduction:

Correlation and regression are further areas of inferential statistics that deal with determining whether a relationship exists among two or more numerical or quantitative variables. Two characteristics are studied concurrently on each member of a population in order to see whether they are associated. For example, a researcher might be interested in the relationship between height and yield in the okra plant, or a zoologist might want to know whether the birth weight of a particular animal is related to its life span. Thus, correlation and regression analyses are employed to measure the association between the two variables of a bivariate data set.

Correlation:

Correlation is a statistical method which measures the degree or strength of the linear relationship in a bivariate normal distribution, where two continuous variables drawn from the same population are compared.

The correlation coefficient calculated from sample data evaluates the strength and direction of the linear relationship between the two variables. The symbol r denotes the sample correlation coefficient, whereas the Greek letter ρ (rho) denotes the population correlation coefficient.

r = [n ∑XY - (∑X)(∑Y)] / √{[n ∑X² - (∑X)²][n ∑Y² - (∑Y)²]}

Where n is the number of data pairs (X, Y).

Or, where lowercase x and y denote deviations of X and Y from their respective means: r = (∑xy) / √[(∑x²)(∑y²)]

Method for calculating simple linear correlation:

1) Calculate the means of x and y, the corrected sums of squares ∑x² and ∑y², and the corrected sum of cross products ∑xy of the variables x and y.

2) Calculate the r value.

3) Compare the absolute value of the calculated r to the tabular r values with (n - 2) degrees of freedom at the 5% and 1% levels of significance.

4) If the calculated r value is greater than the tabular r value at the 5% level but smaller than the tabular r value at the 1% level, then the simple linear correlation is significant at the 5% level of significance.
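The steps above can be sketched in Python. The height and yield figures below are invented illustration values (the okra example from the introduction), not real data:

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient r, via the computational formula
    r = [n ∑XY - (∑X)(∑Y)] / sqrt{[n ∑X² - (∑X)²][n ∑Y² - (∑Y)²]}."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)            # ∑X²
    syy = sum(v * v for v in y)            # ∑Y²
    sxy = sum(a * b for a, b in zip(x, y)) # ∑XY
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Hypothetical okra data: plant height (cm) and yield (g)
height = [30, 35, 40, 45, 50, 55]
yield_g = [110, 118, 132, 137, 148, 160]
r = pearson_r(height, yield_g)  # close to +1: strong positive linear relationship
```

The calculated r would then be compared against the tabular value with n - 2 = 4 degrees of freedom, as in step 3.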

The range of 'r' values:

The correlation coefficient ranges from -1 to +1. That is, when there is:

a) A strong positive linear relationship between the variables, the value of r will be close to +1.

b) A strong negative linear relationship between the variables, the value of r will be close to -1.

c) No linear relationship between the variables, or only a weak one, the value of r will be close to zero (0).
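The three cases can be checked with the deviation form of the formula, r = ∑xy/√(∑x²∑y²); the small data sets below are contrived so that r lands at exactly +1, exactly -1, and approximately 0:

```python
from math import sqrt

def r_from_deviations(xs, ys):
    """r = ∑xy / sqrt(∑x² · ∑y²), where x and y are deviations from the means."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    x = [v - xbar for v in xs]
    y = [v - ybar for v in ys]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sqrt(sum(a * a for a in x) * sum(b * b for b in y))

print(r_from_deviations([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # +1.0 (perfect positive)
print(r_from_deviations([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(r_from_deviations([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # ≈ 0  (no linear relation)
```

Note that the third data set has a clear pattern (a symmetric curve), yet r is near zero: r measures only *linear* association.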

Regression:

Regression is a statistical process used to describe the nature of the relationship between variables, that is, whether it is negative or positive and linear or non-linear. It tells us at what rate two significantly correlated variables are related, by means of a regression line.

A regression line (sometimes termed the least-squares line) is the line which best fits the points in a scatter diagram, and it always passes through the point (x̄, ȳ), the means of the two variables. It allows one to estimate or predict the value of one variable from a given value of the other. The general equation of a fitted regression line is:

Y = a + bX

Where,

Y: Dependent variable on the vertical axis.

X: Independent variable on the horizontal axis.

a: Intercept

b: Slope of the regression line, i.e. the regression coefficient of Y on X.

The intercept 'a' is estimated as: a = ȳ - b·x̄

Where: ȳ = mean of the dependent variable and x̄ = mean of the independent variable.

The regression coefficient is determined as:

b = [n (∑XY) - (∑X)(∑Y)] / [n (∑X²) - (∑X)²]

Or, using deviations from the means: b = (∑xy)/(∑x²)

A dependent variable (Y) is the variable in regression which can't be manipulated or controlled.

An independent variable (X) is the variable in regression which can be manipulated or controlled.

Method for computing simple linear regression:

1) Calculate the means x̄ and ȳ, the corrected sums of squares ∑x² and ∑y², and the corrected sum of cross products ∑xy of the variables x and y.

2) Calculate the estimates a and b of the regression parameters α and β (that is, a and b are the sample estimates of the population parameters α and β):

b = (∑xy)/(∑x²)

3) Then substitute the value of b in the equation a = ȳ - b·x̄ to obtain the intercept. The estimated linear regression is then: y = a + bx

4) Plot the observed points and sketch a graphical representation of the estimated linear regression equation above.

Plot the scatter diagram and calculate: ymin = a + b·xmin and ymax = a + b·xmax.

Plot the two points (xmin, ymin) and (xmax, ymax) and draw the line through the two points.
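The fitting steps above can be sketched as follows; the protein-level and weight-gain figures are made-up illustration values echoing the snail example later in this tutorial:

```python
def fit_line(x, y):
    """Least-squares estimates a (intercept) and b (slope) for y = a + b*x,
    using b = ∑xy/∑x² on deviations and a = ȳ - b·x̄."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # corrected cross products
    sxx = sum((xi - xbar) ** 2 for xi in x)                       # corrected sum of squares
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical data: feed protein level (%) vs snail weight gain (g)
protein = [10, 15, 20, 25, 30]
gain = [5, 9, 12, 16, 19]
a, b = fit_line(protein, gain)

# Endpoints for drawing the fitted line on the scatter diagram (step 4)
y_min = a + b * min(protein)
y_max = a + b * max(protein)
```

The two endpoints `(min(protein), y_min)` and `(max(protein), y_max)` are exactly the pair of points the procedure says to join with a straight line.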

Scatter diagram or Plot:

A scatter diagram is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable X and the dependent variable Y. It is a visual way to describe the nature of the relationship between x and y; that is, it enables us to see whether there is any pattern in the points. The more distinct the pattern is, the more closely the two variables are associated in some manner.

Types of Correlation and Regression:

Correlation can be simple or multiple. In simple relationships there are just two variables under study. For illustration, a researcher might wish to study the relationship between height and weight in a population of rats.

In multiple relationships, more than two variables are under study. For illustration, a zoologist might wish to examine the relationship between growth in snails and factors such as different feed protein levels, the quantity of feed given per day, and the hours of lighting per day.

Regression can be linear or curvilinear. Linear regression can be either simple linear regression or multiple linear regression. Curvilinear regression can be exponential, quadratic, logarithmic and so on.

Simple relationships can also be negative or positive. A positive relationship exists if the two variables under study increase or decrease together. In a negative relationship, as one variable increases the other decreases, and vice versa.

In general, simple correlation and simple linear regression might be:

1) Positive correlation: an increase in one variable is accompanied, to a greater or lesser degree, by an increase in the other.

2) Negative correlation: an increase in one variable is accompanied, to a greater or lesser degree, by a decrease in the other.

3) Perfect correlation: a change in one variable is precisely matched by a change in the other. When both increase together, it is perfect positive correlation; when one decreases as the other increases, it is perfect negative correlation.

4) High correlation: a change in one variable is almost precisely matched by a change in the other.

5) Low correlation: a change in one variable is matched only to a small degree by a change in the other.

6) Zero correlation: the two variables are not matched at all, and there is no relationship between changes in one variable and changes in the other.

Spurious Correlation:

When interpreting the correlation coefficient r, it is important to realize that there might be no direct connection at all between highly correlated variables. When this is so, the correlation is known as spurious or nonsense correlation. It can arise in two ways:

a) There might be an indirect connection.

b) There might be a sequence of coincidences.

Covariance:

When the association of two variables is evaluated, we can refer to the resulting measure as the covariance ('Cov') of the variables. The use of analysis of covariance helps to remove variability. Covariance is evaluated by taking the average of the products of the deviations of each of the paired variables from the mean of the relevant variable, that is,

Cov (x, y) = ∑(xi - x̄)(yi - ȳ)/n
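A minimal sketch of that computation; the data pairs are invented for illustration:

```python
def covariance(x, y):
    """Cov(x, y) = ∑(xi - x̄)(yi - ȳ) / n: the average product of paired
    deviations from the respective means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n

# A positive covariance means the variables tend to rise and fall together
print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5
```

Unlike r, the covariance is not scaled to the range -1 to +1; its size depends on the units of x and y.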
