The Forgotten Tabs: Correlation Analysis
Next up in the Forgotten Tabs series is the Correlation Analysis tab, which provides a correlation coefficient for any two variables in our dataset. To get these values, simply check the boxes next to the variables you’re interested in correlating. The resulting coefficient ranges from -1 to +1, so it can be either positive or negative, and as a general rule, if its absolute value is greater than 0.1, we say that the two variables are significantly correlated. Knowing how different variables are correlated helps us understand variable selection and build more accurate models.
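For readers who want to see what sits behind a number like this, here is a minimal sketch in Python. It assumes the tab reports a standard Pearson coefficient, which the post does not confirm, and the column names and values are made up purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

def correlation_table(columns):
    """Return {(name_a, name_b): r} for every distinct pair of named columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical columns with made-up values (not real data).
dataset = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 61, 70, 74, 81],
    "commute_minutes": [30, 5, 60, 10, 45, 20],
}

# Flag each pair against the 0.1 rule of thumb, using the absolute value.
for (a, b), r in correlation_table(dataset).items():
    flag = "above 0.1" if abs(r) > 0.1 else "at or below 0.1"
    print(f"{a} vs {b}: r = {r:+.3f} ({flag})")
```

Comparing the absolute value against the threshold is what lets a strong negative relationship count just as much as a strong positive one.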
Sometimes a high correlation value can explain why a variable did not make its way into a final model when a similar variable did. An example of this is the correlation between the variables “SAT Math”, “SAT Verbal”, and “HS GPA”. As indicators of student success, you might guess that these variables have a positive correlation; that is, you would expect a student with a relatively high HS GPA to also have relatively high SAT scores, and vice versa. If we were to build a model that used these variables, however, we would typically get something like the following:
Note that each correlation coefficient is well above the general 0.1 threshold for significant correlation, meaning that these variables are, in fact, strongly correlated. This correlation is accounted for when we build our predictive models: if a change in one variable generally accompanies a change in another, Veera Predict will pick the stronger predictor of the two and leave the other out.
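To make the idea of keeping the stronger of two correlated predictors concrete, here is a rough sketch in Python. It is not Veera Predict's actual algorithm, which this post does not describe. It is an assumed greedy filter: rank predictors by their correlation with the outcome, then skip any predictor that is too highly correlated with one already kept. The function name, the `max_pairwise_r` cutoff of 0.8, the extra “Distance From Campus” column, and all of the values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def select_predictors(features, target, max_pairwise_r=0.8):
    """Greedy filter (an illustrative assumption, not Veera's method).

    Visit features from strongest to weakest |r| with the target and keep
    a feature only if its |r| with every already-kept feature is at or
    below max_pairwise_r.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[other])) > max_pairwise_r
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept

# Made-up values for illustration only (not real student data).
features = {
    "HS GPA": [2.5, 2.9, 3.2, 3.5, 3.7, 4.0],
    "SAT Math": [450, 500, 540, 600, 640, 700],
    "SAT Verbal": [470, 520, 530, 610, 620, 690],
    "Distance From Campus": [30, 5, 60, 10, 45, 20],
}
first_year_gpa = [2.3, 2.8, 3.0, 3.4, 3.5, 3.9]  # hypothetical outcome

print(select_predictors(features, first_year_gpa))
```

With data like this, the three academic variables are so tightly correlated with one another that only one of them survives the filter, while the unrelated variable is kept because it carries different information.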
-Caitlin Garrett, Statistical Analyst at Rapid Insight