Brushing Up on R-Squared
When was the last time you took a course in statistics?
For many of us, it’s been a few years… at least. When talking to customers about the statistical concepts that factor into predictive models, I’ve found that while many topics are “kind of familiar,” most need some explanation or revisiting. To help with that, we’ve created a new Brown Bag Learning Series that explores the statistical topics relevant to modeling one at a time, in bite-sized 15-minute segments.
The first Brown Bag, given by Mike Laracy, focused on the R-squared statistic, aka the Coefficient of Determination. So, what is R-squared, and how can it help us? Here’s a quick crash course:
The R-squared statistic tells you how well your model fits your data. Put another way, R-squared tells you how much of the variation in your Y-values (the observed values of the response variable) can be explained by variation in your X-variables (the predictor variables) under a given model.
For a linear regression model fit with an intercept, R-squared ranges from zero to one. An R-squared value of zero means that the model explains none of the variance in Y. An R-squared value of one means that the model explains all of the variance in Y. So, in general, the closer R-squared is to one, the better the model describes the data it was fit to.
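To make the two endpoints concrete, here is a minimal Python sketch using made-up Y-values and a hypothetical helper, `r_squared`, that computes one minus the unexplained variation divided by the total variation. A "model" that simply predicts the average of Y for every row explains none of the variance and scores zero, while perfect predictions score one; anything in between lands between the two.

```python
def r_squared(y, y_hat):
    """Hypothetical helper: 1 - (unexplained variation / total variation)."""
    mean_y = sum(y) / len(y)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # squared prediction errors
    total = sum((yi - mean_y) ** 2 for yi in y)                     # squared deviations from the mean
    return 1 - unexplained / total

y = [3.0, 5.0, 7.0, 9.0]               # made-up observed Y-values
mean_y = sum(y) / len(y)

print(r_squared(y, y))                 # perfect predictions: 1.0
print(r_squared(y, [mean_y] * len(y))) # always predicting the mean: 0.0
print(r_squared(y, [3.5, 4.5, 7.5, 8.5]))  # close but imperfect predictions: about 0.95
```

Predicting the mean is a natural baseline here because the total variation is itself measured around the mean, so a model has to beat that baseline to earn an R-squared above zero.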
How is it calculated?
In its most basic form, R-squared is the explained variance divided by the total variance, where “explained variance” means the variance explained by the model. For a visual explanation of explained and unexplained variance, see the graphic below. Because the total variance is the sum of the explained and unexplained variance (for a least-squares model with an intercept), R-squared can also be written as one minus the unexplained variance divided by the total variance:
R² = explained variance / total variance = 1 – (unexplained variance / total variance)
More formally, R-squared can be computed as R² = 1 – SSE/CTSS, where SSE is the Sum of Squared Errors from the model, Σ(yᵢ – ŷᵢ)², and CTSS is the Corrected Total Sum of Squares, Σ(yᵢ – ȳ)², that is, the squared deviations of the observed Y-values from their mean.
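The sketch below fits a straight line by ordinary least squares to a small made-up dataset and computes R-squared both ways: explained sum of squares over CTSS, and 1 – SSE/CTSS. The helper `fit_line` and the data are illustrative assumptions, not part of the Brown Bag material; the point is that for a least-squares fit with an intercept the two forms agree, because the explained and unexplained sums of squares add up to CTSS.

```python
def fit_line(x, y):
    """Illustrative helper: ordinary least-squares fit of y = a + b*x, returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                      # slope
    return mean_y - b * mean_x, b      # (intercept, slope)

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up predictor values
y = [2.0, 4.0, 5.0, 4.0, 5.0]          # made-up observed responses

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]       # model predictions
mean_y = sum(y) / len(y)

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained: Sum of Squared Errors
ctss = sum((yi - mean_y) ** 2 for yi in y)             # total: Corrected Total Sum of Squares
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)          # explained by the model

print(ssr / ctss)       # explained / total
print(1 - sse / ctss)   # 1 - unexplained / total (same value, about 0.6)
```

Working through the arithmetic by hand gives SSE = 2.4, CTSS = 6 and an explained sum of squares of 3.6, so both expressions come out to 0.6: the line explains about 60% of the variation in Y.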
Wait, there’s more!
For more information about R-squared and a deeper look at how it is calculated, check out the recording of our first Brown Bag session, “A Crash Course on R²”.
PS: Upcoming Brown Bag sessions include “What the Heck is Multicollinearity?”, “Outliers and Their Impact on a Predictive Model”, and “Hypothesis Testing and Variable Significance”. Besides the Brown Bag series, we have lots of other education-based events coming up; you can find them all here.