Segmentation: The Potential of Higher Predictive Power

James Cousins
Senior Statistical Analyst

If you build models on your data, you’ve probably considered segmenting your modeling effort, though you may have called it something else. A segmented approach to modeling means building separate models for different subgroups of your population. Before we answer “how?”, let’s answer “why segment models?” The answer is simple: a segmented approach to modeling can dramatically improve how well you capture the unique behaviors within your population.

Why Segment Your Model Populations?

Models are incredibly helpful for exploring what drives key outcomes for your students, or for your institution at large. But your students aren’t all identical. Different groups of students have different motivations and inspirations, and a single model may not capture those subtleties across your whole student population. These four questions are useful guideposts as you consider this approach:

  • How well can one model explain my entire population?
  • What groups might be misrepresented (or sub-optimally represented)?
  • Could I build models for each group?
  • Should I build models for each group?

I would never claim that the standard reaction to finishing a predictive model is “Can I do this more than once for the same overall group?” But the insights gained from that extra effort can be really important. You work with data and build models because you care about your outcomes, or at the very least because you want to pull back the curtain on what your data can reveal.

Why Not Segment Your Model Populations?

So now we know why to segment your data when modeling. Why not? Time. Modeling takes time, and for many teams it isn’t feasible to build multiple models for the same outcome. Another reason you might not segment is that it simply might not be worth it. To justify splitting your modeling population, you need evidence that different groups of students respond differently to the same factors.

For example, perhaps you have both resident and commuter students. The distance between a student’s permanent address and the institution probably doesn’t affect students living on campus the same way it affects commuters.
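One way to look for that evidence is to fit the same simple model once on the pooled population and once per segment, then compare how the key factor behaves in each. The sketch below is a minimal illustration under stated assumptions, not Sarah’s method or a Rapid Insight feature: the data are a small hypothetical toy set built so that commuters’ retention falls with distance while residents’ does not, the function names (`make_toy_data`, `fit_line`, `compare_pooled_vs_segmented`) are illustrative, and a straight-line (linear probability) fit stands in for whatever model you would actually use, such as logistic regression.

```python
from collections import defaultdict


def make_toy_data():
    """Hypothetical toy data: (segment, distance_miles, retained). Not real students."""
    rows = []
    for distance in range(0, 50, 5):
        for k in range(20):
            u = (k + 0.5) / 20  # evenly spaced stand-in for a random draw
            # Assumed pattern: commuters' retention drops as distance grows...
            p_commuter = 0.95 - 0.015 * distance
            rows.append(("commuter", distance, 1 if u < p_commuter else 0))
            # ...while residents' retention does not depend on distance at all.
            rows.append(("resident", distance, 1 if u < 0.80 else 0))
    return rows


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b*x on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def compare_pooled_vs_segmented(rows):
    """Fit one line to everyone, then one line per segment; return both results."""
    xs = [d for _, d, _ in rows]
    ys = [r for _, _, r in rows]
    a, b = fit_line(xs, ys)
    pooled = {"slope": b, "sse": sse(xs, ys, a, b)}

    groups = defaultdict(lambda: ([], []))  # segment -> (distances, outcomes)
    for seg, d, r in rows:
        groups[seg][0].append(d)
        groups[seg][1].append(r)

    slopes, segmented_sse = {}, 0.0
    for seg, (gx, gy) in groups.items():
        ga, gb = fit_line(gx, gy)
        slopes[seg] = gb
        segmented_sse += sse(gx, gy, ga, gb)
    return pooled, slopes, segmented_sse


pooled, slopes, segmented_sse = compare_pooled_vs_segmented(make_toy_data())
print(f"pooled slope per mile: {pooled['slope']:+.4f}  pooled SSE: {pooled['sse']:.2f}")
for seg in sorted(slopes):
    print(f"{seg} slope per mile: {slopes[seg]:+.4f}")
print(f"segmented SSE: {segmented_sse:.2f}")
```

Note that the segmented fit can never have a higher in-sample error than the pooled fit, because the pooled model is a special case of it, so the error comparison alone doesn’t prove segmentation is worthwhile. The more telling signal here is the distance slope diverging between segments (negative for commuters, essentially zero for residents). In practice you would also check performance on held-out data, or test a segment-by-distance interaction term in a single model.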

How Do You Segment Your Model Populations?

There are plenty of benefits to creating multiple models for your population segments, and the reasons against doing so come down mainly to feasibility. But what if it didn’t take much time, and there were an easy way to tell whether your efforts would be worthwhile? As you might have guessed by now, there is. You can ask anyone at Rapid Insight; I’d love to talk about how we can manage this. One of our customers, Dr. Sarah Caro of the University of New Haven, has a real-world example. She’s giving key subgroups of her student population the dedicated understanding they deserve, and she’s getting much more accurate results because of it.

Sarah presented a webinar on her approach as part of our Women in Data Science series.

To watch a recording of Sarah’s presentation, see the webinar details below.

Sarah Caro, PhD
Senior Research Analyst
University of New Haven

A Data Segmentation Approach to Modeling Freshman Retention
“One size fits all” doesn’t always work for predictive models. Sarah shares the approach she took to develop multiple freshman retention models targeted at unique segments of the University of New Haven’s student population.