What’s the deal with Clustering?

Information is very often stored either as a number on a spectrum or as a category. Predictive modeling benefits from both of these kinds of data. One neat, often underused trick is using clusters to help your model capture behaviors even better. Simply put, imagine that a characteristic in your dataset, like SAT scores, has a pretty straightforward relationship with retention: the higher the score, the more likely a student is to be retained.


But there are many cases where the data deviate from this “pretty straightforward” relationship, and a linear term on its own can’t do much to improve the fit. These are the cases where clusters will be a big help. Using a series of clusters, as specified below, we could fine-tune our model. The nice part is that, once the clusters are formed, the computer takes care of this correction for us.
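To make the idea concrete, here is a minimal Python sketch of a linear term getting a per-cluster correction. Everything in it is an assumption made for illustration: the SAT scores and retention rates are made up (with a deliberate dip between 1200 and 1300), the cutoffs are hand-picked, and the helper names `fit_line` and `cluster_of` are hypothetical. It is also a simple two-step approximation (fit the line, then average the leftover error within each cluster), not a description of how Rapid Insight Analytics does it.

```python
# Made-up data: retention rate rises with SAT score, except for a dip
# between 1200 and 1300 that a straight line cannot follow.
scores = list(range(800, 1601, 20))
rates = [
    0.5 + 0.0005 * (s - 800) - (0.15 if 1200 <= s < 1300 else 0.0)
    for s in scores
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cluster_of(score, cutoffs):
    """Index of the cluster a score falls in, given ascending cutoffs."""
    for i, cutoff in enumerate(cutoffs):
        if score < cutoff:
            return i
    return len(cutoffs)


def sse(ys, preds):
    """Sum of squared errors between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


# Step 1: the linear term on its own.
slope, intercept = fit_line(scores, rates)
linear_pred = [intercept + slope * s for s in scores]
residuals = [y - p for y, p in zip(rates, linear_pred)]

# Step 2: hand-picked (assumed) cutoffs give five clusters; each cluster's
# correction is the average error the line leaves behind inside it.
cutoffs = [1000, 1200, 1300, 1400]
n_clusters = len(cutoffs) + 1
sums = [0.0] * n_clusters
counts = [0] * n_clusters
for s, r in zip(scores, residuals):
    k = cluster_of(s, cutoffs)
    sums[k] += r
    counts[k] += 1
offsets = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(n_clusters)]

# Step 3: linear prediction plus the correction for the score's cluster.
clustered_pred = [
    p + offsets[cluster_of(s, cutoffs)] for s, p in zip(scores, linear_pred)
]

sse_linear = sse(rates, linear_pred)
sse_clustered = sse(rates, clustered_pred)
print(f"Squared error, linear term only: {sse_linear:.4f}")
print(f"Squared error, with clusters:    {sse_clustered:.4f}")
print("Cluster corrections:", [round(o, 3) for o in offsets])
```

In this toy data the correction for the 1200 to 1299 cluster comes out negative, which is exactly the dip the straight line misses. A real model would typically estimate the linear term and the cluster indicators together rather than in two steps, but the intuition is the same.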


The other nice part is that Analytics can help you take care of the clustering too, without needing to investigate cutoff points or run the calculations yourself.

Here is a quick video on how the feature works within Rapid Insight Analytics.

This is the sort of thing we’re always happy to chat with people about, so if you want to know more about the details of modeling with clustering variables, leave us a comment or email us at information@rapidinsightinc.com! Do you have any experience with binning or clustering your data? Have you seen any interesting results?