Predicting Retention for Online Students: Where to Start

As enrollment in online programs and MOOCs rises, we’re seeing more and more students forgo traditional classroom experiences in favor of flexible online programs. This shift brings a whole new set of considerations for enrollment management, financial aid, and retention programs. Retention, in particular, has trended significantly downward as learning moves from in-person to online classrooms.

My interest lies in figuring out which variables are worth including in an analysis that attempts to predict online student retention. I did a bit of research hoping to find a list of variables that had worked in the past, but I couldn’t find a comprehensive resource, so I’ve started to build my own. In the sections below, I’ve listed the types of information that I think are worth analyzing, broken out into four categories. Some of these are variables in and of themselves, and some can be broken down in different ways; for example, “age” can be used by itself, but creating a “non-traditional age” flag is useful as well. Realistically, not all schools will have all of this information, so treat this list as a starting point for what to aim for when collecting data.
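To make the “age” example concrete, here is a minimal sketch of deriving a flag from a raw variable. The cutoff of 25 and the field names are my assumptions for illustration; use whatever definition of non-traditional your institution follows.

```python
# Assumed threshold for "non-traditional"; adjust to your institution's definition.
NON_TRADITIONAL_AGE_CUTOFF = 25


def add_age_flag(student):
    """Return a copy of the student record with a non-traditional age flag added."""
    flagged = dict(student)  # copy so the original record is left untouched
    flagged["non_traditional_age"] = int(student["age"] >= NON_TRADITIONAL_AGE_CUTOFF)
    return flagged


# Hypothetical records; "student_id" and "age" are illustrative field names.
students = [
    {"student_id": 1, "age": 19},
    {"student_id": 2, "age": 34},
]
flagged_students = [add_age_flag(s) for s in students]
```

Keeping both the raw age and the flag lets the model decide which form carries more signal.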

Also, if you have any variables to add (and I’m sure there are some I’ve missed), I’d love to hear about them in the comments.

Student Demographic Information

  • Socioeconomic status / financial aid information
    • FAFSA info, Pell eligibility, any scholarship or award info
  • Ethnicity
  • Minority status
  • Gender
  • Home state
  • Distance from physical campus (if applicable)
  • Age; traditional or non-traditional?
  • Military background?
  • Have children?
  • Currently employed full-time?
  • First-generation college student?
  • Legacy student? (Did a parent/grandparent/sibling attend?)

Student Online Learning History

  • Registered for classes online or in person?
  • How many days did they register before the start of the term?
  • Ever attended a class on-campus?
  • Do they plan to attend both online and on-campus classes?
  • Did they attend any type of orientation?
  • Number of previous online courses taken
    • First-time online learner?
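A couple of the items in this list have to be computed rather than pulled straight from a record. As a rough sketch, assuming you have a registration date, a term start date, and a count of prior online courses (the function and field names here are mine, not from any particular system):

```python
from datetime import date


def registration_features(registered_on, term_start, prior_online_courses):
    """Build registration-timing and online-history variables for one student."""
    return {
        # Negative if the student registered after the term had already started.
        "days_registered_before_term": (term_start - registered_on).days,
        "prior_online_courses": prior_online_courses,
        # Flag derived from the count above: 1 means no previous online courses.
        "first_time_online_learner": int(prior_online_courses == 0),
    }


# Hypothetical student who registered on August 1 for a term starting August 26.
features = registration_features(date(2024, 8, 1), date(2024, 8, 26), 0)
```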

Student Academic History

  • GPA
  • SAT/ACT scores
  • Degree hours completed
  • Degree hours attempted
  • Taking developmental courses?
  • Transfer student?
  • Degree program / major
  • Program level (Associate, Bachelors, Masters, etc.)
  • Number of program or major changes (if applicable)
  • Any previous degrees?

Course- and Program-Related

  • Amount of text vs. interactive content
  • Lessons with immediate feedback?
  • Any peer-to-peer forum for interaction?
  • Lessons in real time or recorded?
  • Amount of teacher interaction with students
    • Chat, email exchange, turn-around time on assignments

Closing notes:
Getting course-related data might be difficult, but the course variables I listed above come from studies on how to improve online courses, which identify them as areas to focus on. My thinking is that the more engaged a student is, with both peers and instructors, the better their chances of online success. If you have the data available, it would be worth incorporating it into your modeling dataset to see whether it is predictive.

Rather than using retention as the y-variable when building these models, we typically create an attrition variable (exactly the opposite of retention) and use that as our y instead. This way, we’re getting more directly at the characteristics of a student who is likely to leave rather than stay.

Typically, when building attrition models, I create separate models for freshmen and upperclassmen. I’d suggest doing that here as well, since previous online coursework will probably be a good indicator of how a student fares in future online coursework. In that case, you’d want to take out many of the variables listed above when modeling freshman retention, since new students won’t have that history yet.
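The attrition target and the freshman/upperclassman split described above could be sketched roughly like this. All field names are hypothetical, and the set of history fields I drop for freshmen is only my guess at which variables won't apply; choose your own based on what new students actually have.

```python
# Hypothetical field names; which history fields to drop for freshmen is an assumption.
HISTORY_FIELDS = {
    "prior_online_courses",
    "degree_hours_completed",
    "degree_hours_attempted",
}


def to_attrition(record):
    """Copy a record, replacing the 0/1 retained flag with its opposite."""
    out = {k: v for k, v in record.items() if k != "retained"}
    out["attrited"] = 1 - record["retained"]
    return out


def split_cohorts(records):
    """Return (freshmen, upperclassmen), each ready for its own model."""
    freshmen, upperclassmen = [], []
    for record in map(to_attrition, records):
        if record["class_level"] == "freshman":
            # Freshmen have no coursework history yet, so drop those fields.
            freshmen.append(
                {k: v for k, v in record.items() if k not in HISTORY_FIELDS}
            )
        else:
            upperclassmen.append(record)
    return freshmen, upperclassmen


records = [
    {"student_id": 1, "class_level": "freshman", "retained": 1,
     "prior_online_courses": 0, "degree_hours_completed": 0,
     "degree_hours_attempted": 0},
    {"student_id": 2, "class_level": "junior", "retained": 0,
     "prior_online_courses": 4, "degree_hours_completed": 60,
     "degree_hours_attempted": 66},
]
freshmen, upperclassmen = split_cohorts(records)
```

Each cohort list would then feed its own model, with `attrited` as the y-variable.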

Finally, it’s important to keep in mind that student success has different meanings for different institutions. You could be basing success on the number of credits completed, transitions from semester to semester, or a particular GPA cutoff, among other indicators. When building these different types of student success models, you will probably need to tailor some of these variables to fit the model you’re building.
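One way to keep those definitions interchangeable is to write each as a small labeling function and pick one per model. This is only a sketch: the thresholds (12 credits, a 2.0 GPA) and the field names are placeholders I made up, not recommendations.

```python
def credits_success(student, min_credits=12):
    """Success = completed at least min_credits (placeholder threshold)."""
    return student["credits_completed"] >= min_credits


def persistence_success(student):
    """Success = enrolled again in the following term."""
    return student["enrolled_next_term"]


def gpa_success(student, min_gpa=2.0):
    """Success = GPA at or above min_gpa (placeholder cutoff)."""
    return student["gpa"] >= min_gpa


SUCCESS_DEFINITIONS = {
    "credits": credits_success,
    "persistence": persistence_success,
    "gpa": gpa_success,
}


def label_success(student, definition):
    """Return a 0/1 success label under the chosen definition."""
    return int(SUCCESS_DEFINITIONS[definition](student))


# Hypothetical student who persisted with a passing GPA but a light credit load.
student = {"credits_completed": 9, "enrolled_next_term": True, "gpa": 2.8}
labels = {name: label_success(student, name) for name in SUCCESS_DEFINITIONS}
```

The same student can be a success under one definition and not under another, which is why the predictor list usually needs tailoring per model.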

-Caitlin Garrett is a Statistical Analyst at Rapid Insight



