How to find a relationship hiding inside a cloud of points.
A researcher tracked how many hours per week teenagers spent exercising and their resting heart rates. She plotted each person as a single dot — hours on the horizontal axis, heart rate on the vertical. When she stepped back and looked at the whole picture, she saw something: as hours of exercise went up, resting heart rates tended to go down. That picture is a scatter plot, and the pattern she noticed is a correlation.
A scatter plot places two numerical variables against each other on a coordinate plane. Each person, object, or event becomes one point. The horizontal axis holds the independent variable — the one you think might be doing the influencing. The vertical axis holds the dependent variable — the one you think might be responding.
Here is what that exercise data might look like:
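As a rough illustration, the sketch below pairs made-up values into the points a scatter plot would show. The numbers and the names `hours_per_week`, `resting_bpm`, and `points` are hypothetical, chosen only to show a negative pattern; they are not the researcher's measurements.

```python
# Hypothetical values for illustration only; not real measurements.
hours_per_week = [0, 2, 4, 6, 8, 10]      # x: weekly exercise, in hours
resting_bpm = [80, 77, 72, 70, 65, 62]    # y: resting heart rate, in beats per minute

# Each teenager becomes one (x, y) point on the scatter plot.
points = list(zip(hours_per_week, resting_bpm))

print("hours  bpm")
for hours, bpm in points:
    print(f"{hours:>5}  {bpm:>3}")
```

Reading down the table, the heart rates fall as the hours rise, which is the downward drift the dots would show on the plot.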
Once the points are plotted, you describe the association. There are three things to name: direction, form, and strength.
Direction tells you which way the pattern goes. A positive association means that as the $x$-variable increases, the $y$-variable tends to increase too. A negative association means that as $x$ increases, $y$ tends to decrease. The exercise and heart rate data shows a negative association. If there is no pattern at all, you say there is no association.
Form describes the shape of the pattern. If the points cluster around a straight line, the association is linear. If they bend or curve, it is nonlinear.
Strength describes how tightly the points cluster around that pattern. If the points hug the line closely, the association is strong. If they are spread out loosely, it is weak.
A complete description of a scatter plot combines all three: "There is a strong, negative, linear association between hours of exercise and resting heart rate."
To put an exact number on the strength and direction of a linear association, statisticians use the correlation coefficient, written $r$. It always falls between $-1$ and $1$.
When $r$ is close to $1$, the points fall nearly on a line with positive slope: a strong positive linear association. When $r$ is close to $-1$, they fall nearly on a line with negative slope: a strong negative linear association. When $r$ is close to $0$, the linear pattern is weak or absent. The sign of $r$ tells you direction. The size of $r$, ignoring the sign, tells you strength.
For the Regents, your calculator computes $r$ for you. You will not calculate it by hand. But you need to interpret what it means. An $r$-value close to $-1$ means a strong negative linear association. An $r$-value that is positive but close to $0$ means a weak positive linear association.
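To see what the calculator is doing behind the scenes, here is a minimal sketch of the standard formula for $r$ (Pearson's correlation coefficient). The data values and the function name `correlation_coefficient` are hypothetical, made up for illustration.

```python
import math

# Hypothetical values for illustration only; not real measurements.
hours_per_week = [0, 2, 4, 6, 8, 10]      # x
resting_bpm = [80, 77, 72, 70, 65, 62]    # y

def correlation_coefficient(xs, ys):
    """Return Pearson's r for the paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = correlation_coefficient(hours_per_week, resting_bpm)
if r < 0:
    direction = "negative"
elif r > 0:
    direction = "positive"
else:
    direction = "no linear association"
print(f"r = {r:.3f} ({direction})")
```

For these made-up numbers, $r$ comes out at about $-0.996$, which you would read as a strong negative linear association.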
When the association is linear, you can draw a line of best fit — also called a trend line or regression line. This line passes through the middle of the data, with roughly equal numbers of points above and below it. On the Regents, you may be asked to draw one by hand or to use a calculator to find its equation.
The line of best fit has the same form as any linear equation:
$$y = mx + b$$
Here $m$ is the slope and $b$ is the $y$-intercept, just like always. What changes is how you read them. The slope tells you how much $y$ changes for each one-unit increase in $x$. In context, that matters. For the exercise data, the association is negative, so the slope $m$ of the line of best fit is negative: for each additional hour of weekly exercise, predicted resting heart rate drops by about $|m|$ beats per minute.
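The sketch below shows one way to get the slope and intercept, assuming the line of best fit is the least-squares line (the one a graphing calculator's linear regression reports). The data values and the function name `line_of_best_fit` are hypothetical.

```python
# Hypothetical values for illustration only; not real measurements.
hours_per_week = [0, 2, 4, 6, 8, 10]      # x
resting_bpm = [80, 77, 72, 70, 65, 62]    # y

def line_of_best_fit(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    b = mean_y - m * mean_x   # the least-squares line passes through (mean_x, mean_y)
    return m, b

m, b = line_of_best_fit(hours_per_week, resting_bpm)
print(f"y = {m:.2f}x + {b:.2f}")
```

With these made-up numbers the slope is about $-1.83$ and the intercept about $80.14$. In context, the slope would read: for each additional hour of weekly exercise, predicted resting heart rate drops by about 1.83 beats per minute.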
You can also use the equation to make predictions. Plug in an $x$-value and compute the predicted $y$. If you predict within the range of your data, that is called interpolation. If you predict outside the range of your data, that is extrapolation, and you should be cautious, because the pattern may not hold forever.
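Here is a small sketch of that check. The slope `M`, the intercept `B`, the data range, and the helper `predict` are all hypothetical values and names chosen for illustration.

```python
# Hypothetical x-values and line of best fit, for illustration only.
hours_per_week = [0, 2, 4, 6, 8, 10]
M = -1.83   # assumed slope, in beats per minute per hour
B = 80.14   # assumed y-intercept, in beats per minute

def predict(x, xs, m, b):
    """Return (predicted y, label); the label says whether x is inside the observed x-range."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return m * x + b, kind

for hours in (5, 40):
    bpm, kind = predict(hours, hours_per_week, M, B)
    print(f"{hours} hours -> {bpm:.2f} bpm ({kind})")
```

Under these assumptions, 5 hours falls inside the data and gives a believable prediction of about 71 beats per minute. Forty hours falls far outside the data and gives about 7 beats per minute, which no living teenager has. That is the danger of extrapolation.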
On Part II and Part III of the Algebra I Regents, scatter plot questions often ask you to write a sentence interpreting the slope in context. Writing just "the slope is" followed by a number earns no credit. You need to say what the numbers mean using the variables described in the problem. A complete answer names both variables, states the direction of change, and includes units.
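One way to remember the parts of a complete answer is to treat the sentence as a template with slots to fill. The sketch below is only a study aid, not an official scoring rubric; the function `interpret_slope` and the slope value are hypothetical.

```python
def interpret_slope(slope, x_name, x_unit, y_name, y_unit):
    """Build a sentence naming both variables, the direction of change, and units.

    Assumes a nonzero slope.
    """
    direction = "increases" if slope > 0 else "decreases"
    return (
        f"For each additional {x_unit} of {x_name}, the predicted {y_name} "
        f"{direction} by about {abs(slope)} {y_unit}."
    )

# Hypothetical slope, for illustration only.
sentence = interpret_slope(
    -1.83, "weekly exercise", "hour", "resting heart rate", "beats per minute"
)
print(sentence)
```

Every slot in the template corresponds to something a grader looks for: the $x$-variable with its unit, the $y$-variable with its unit, and the direction and size of the change.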