Customer Retention Analytics in a Subscription Setting

In a subscription-based business model, customers pay for goods or services by first subscribing to them. The amount they pay usually depends on the quantity they consume, and charges are billed monthly, quarterly, or annually. The subscription-based model exists in both the physical world and the online world. Digital subscriptions are usually services by nature, with no exchange of physical products. Video streaming services such as Netflix and YouTube TV are good examples.

Naturally, the major challenge of running a subscription-based business is retaining customers and collecting recurring payments from them. Understanding customers through various analytics methods has therefore become a core focus for companies.

Customer retention analytics (CRA) helps companies develop and execute a data-driven strategy for retaining their subscribers.

To understand when a customer will leave a firm or unsubscribe from its services (known as customer churn), we have to distinguish between two types of settings: contractual and non-contractual. Customer churn is always relevant in a contractual setting, often referred to as a subscription setting, because there we observe the exact time at which a customer becomes inactive. In marketing analytics, researchers have developed a discrete-time duration model for the churn process. The model makes a simple assumption about customer retention: at the end of each period, an individual cancels his or her contract with an unobserved probability θ.

Let us take a simple case. Suppose an online content service provider acquires 1000 initial customers (Table 1). The second column shows the number of customers retained, whereas the last column shows how many customers leave at the end of that period.

Period	Customers	Churn
0	1000
1	869	131
2	743	126
3	653	90
4	593	60
5	551	42
6	517	34
7	491	26
8	0	491

Table 1 Example Data of Customer Retention[1]
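To make the table concrete, the short sketch below computes the period-over-period retention and churn rates from the Customers column. It uses periods 0 to 7 only; leaving out the last row is an assumption made here, on the reading that it marks the end of the observation window rather than real cancellations. The variable names are illustrative.

```python
# Customers still subscribed at the end of periods 0..7 (Table 1).
customers = [1000, 869, 743, 653, 593, 551, 517, 491]

# Retention rate in period t = customers at end of t / customers at end of t-1.
retention_rates = [
    customers[t] / customers[t - 1] for t in range(1, len(customers))
]

for t, rate in enumerate(retention_rates, start=1):
    print(f"period {t}: retention {rate:.3f}, churn {1 - rate:.3f}")
```

In this data the retention rate moves from 0.869 in period 1 to roughly 0.95 in period 7, so the customers who remain appear to become less likely to leave over time.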

The discrete-time model has two variants:

All customers share the same churn probability θ.
θ varies across customers, reflecting individual heterogeneity.

To keep things simple, we first focus on the first model and use T to denote the duration of a customer’s subscription with the business. Because the model treats the cancellation decision a customer makes at the end of each period as a Bernoulli trial, T follows a shifted geometric distribution with parameter θ. The probability that a customer churns at the end of the t-th period is, for t = 1, 2, 3, …,

P(T = t | θ) = θ (1 - θ)^(t-1)   (eq.1)

Next, P(T > t | θ) = (1 - θ)^t (eq.2). This is the probability that a customer is still subscribed after the t-th period.
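As a quick illustration of eq.1 and eq.2, the following sketch evaluates both probabilities for an arbitrary θ. The function names and the value θ = 0.1 are assumptions chosen for illustration only, not estimates from Table 1.

```python
def churn_prob(t, theta):
    """P(T = t | theta): stay for t-1 periods, then cancel (eq.1)."""
    return theta * (1 - theta) ** (t - 1)


def survival_prob(t, theta):
    """P(T > t | theta): still subscribed after t periods (eq.2)."""
    return (1 - theta) ** t


theta = 0.1  # illustrative value, not fitted to any data
for t in range(1, 4):
    print(t, round(churn_prob(t, theta), 4), round(survival_prob(t, theta), 4))

# Sanity check: churning in periods 1..5 or surviving past period 5
# are the only possibilities, so the probabilities should sum to 1.
total = sum(churn_prob(s, theta) for s in range(1, 6)) + survival_prob(5, theta)
print(total)
```

With θ = 0.1 the survival probabilities after periods 1, 2 and 3 are 0.9, 0.81 and 0.729, so the implied retention rate is the same 90% in every period.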

Given the observed data, we need to find the churn probability θ that is most likely to have generated the observed subscription cancellations, that is, the maximum likelihood estimate. The likelihood of the observed retention data is constructed as follows.

Combining eq.1 and eq.2, we obtain the likelihood function of the model parameters (here θ), L(parameters | data), which is the probability of observing the sample data, p(data | parameters). For a given dataset, we then find the parameter values that maximize the logarithm of L(parameters | data). In SAS, the NLMIXED procedure can be used to search for the parameters under the maximum likelihood estimation (MLE) framework. In R, we can write the log-likelihood function and use an optimizer (e.g. optim) to find the parameters that maximize it.
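The estimation step can be sketched in Python as well. The example below is a minimal illustration under stated assumptions, not the SAS or R workflow itself: it treats the 491 customers still active after period 7 as right-censored (each contributes eq.2 with t = 7), treats each cancellation in periods 1 to 7 as one draw from eq.1, and swaps in a hand-written golden-section search for a general-purpose optimizer. All names are hypothetical.

```python
import math

# Cancellations at the end of periods 1..7 (Table 1).
churned = [131, 126, 90, 60, 42, 34, 26]
# Assumption: customers still active after period 7 are right-censored.
survivors = 491


def log_likelihood(theta):
    """Log-likelihood of the constant-theta (shifted geometric) model."""
    ll = 0.0
    for t, n in enumerate(churned, start=1):
        # eq.1: n customers cancelled exactly at the end of period t
        ll += n * (math.log(theta) + (t - 1) * math.log(1 - theta))
    # eq.2: survivors outlasted all observed periods
    ll += survivors * len(churned) * math.log(1 - theta)
    return ll


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


theta_hat = golden_section_max(log_likelihood, 1e-6, 1 - 1e-6)

# Check: setting the derivative of the log-likelihood to zero gives
# total cancellations divided by total customer-periods at risk.
at_risk = sum(n * t for t, n in enumerate(churned, start=1)) + survivors * len(churned)
closed_form = sum(churned) / at_risk

print(theta_hat, closed_form, log_likelihood(theta_hat))
```

Because the log-likelihood is concave in θ, a one-dimensional search is enough here; a general optimizer only becomes necessary once the model has more than one parameter.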

Because this simple model assumes that all customers share the same θ, it usually predicts future customer churn poorly. We therefore turn to the second variant, in which the heterogeneity in θ across customers is captured by a Beta distribution with the following probability density function (pdf).

[1] Adapted from Berry and Linoff 2004

f(θ | α, β) = θ^(α-1) (1 - θ)^(β-1) / B(α, β), where B(α, β) is the Beta function and α, β > 0   (eq.3)

Thus, analogous to eq.1 and eq.2, averaging over the Beta distribution in eq.3 gives the probability of customer churn at the end of the t-th period (eq.4) and the probability of customer retention beyond the t-th period (eq.5).

P(T = t | α, β) = B(α + 1, β + t - 1) / B(α, β)   (eq.4)

P(T > t | α, β) = B(α, β + t) / B(α, β)   (eq.5)

The second model, in which each customer’s θ is drawn from a Beta distribution, is called the shifted Beta-Geometric (sBG) model. MLE is again used to find the parameters α and β that make the model fit the observed data best.
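A possible way to fit the sBG model to Table 1 is sketched below. It assumes the standard sBG expressions P(T = t) = B(α + 1, β + t - 1) / B(α, β) for churn and P(T > t) = B(α, β + t) / B(α, β) for retention, treats the 491 customers remaining after period 7 as right-censored, and uses a small hand-written Nelder-Mead search over log α and log β in place of a library optimizer. Every function name here is hypothetical.

```python
import math

churned = [131, 126, 90, 60, 42, 34, 26]  # cancellations, periods 1..7
survivors = 491  # assumption: right-censored after period 7


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sbg_log_likelihood(alpha, beta):
    base = log_beta(alpha, beta)
    ll = 0.0
    for t, n in enumerate(churned, start=1):
        ll += n * (log_beta(alpha + 1, beta + t - 1) - base)  # churn at t
    ll += survivors * (log_beta(alpha, beta + len(churned)) - base)  # retained
    return ll


def neg_ll(x):
    # Optimize on the log scale so alpha and beta stay positive.
    return -sbg_log_likelihood(math.exp(x[0]), math.exp(x[1]))


def nelder_mead(f, start, step=0.5, tol=1e-10, max_iter=2000):
    """Minimal Nelder-Mead simplex minimizer."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(scale):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + scale * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [(b + q) / 2 for b, q in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


log_alpha, log_beta_hat = nelder_mead(neg_ll, [0.0, 0.0])
alpha_hat, beta_hat = math.exp(log_alpha), math.exp(log_beta_hat)
print(alpha_hat, beta_hat, sbg_log_likelihood(alpha_hat, beta_hat))
```

Comparing the maximized log-likelihood with that of the constant-θ model on the same data gives a rough indication of how much the heterogeneity assumption improves the fit.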

In the next step, we will combine this approach with a clustering method to uncover customer segments (for example, high-end and regular customers, who have very different likelihoods of renewing their contracts) and then fit a separate sBG model to each segment to improve the accuracy of churn prediction. Considerable empirical evidence shows that a discrete-time model built on the probability of individual behavior can model the churn process more accurately than other common parametric regression models.