PistonHeads » Gassing Station » The Pie & Piston » Science!

Statistical modelling question

speedy_thrills

Original Poster

Saturday 20th January 2018

First of all, I apologise if this is the wrong place to post this question. I did consider the "finance" section, but most of the questions there pertain to personal finance.

I'm normally a "people manager" (although really I'm just a subject matter expert for a team), but I've been asked to look at a small fixed rate residential loan portfolio with a consistently very high churn rate. Unfortunately, it has come to light that little consideration had been given to managing customer churn in this portfolio, and we know from external measures that it performs poorly (acquisition is excellent, retention is terrible). We have a range of historic data: all the usual loan data (e.g. amounts, repayment rates) and a few peripherals, such as whether early exit fees have previously been calculated, whether discounts and incentives were offered previously, whether a client has been retained using incentives, and monthly reporting on how many customers have refinanced with other providers.

Every month we generate a report of fixed rate agreements that will be expiring, but those customers are infrequently contacted. What I want to do is build a simple model so that we can isolate residential loans coming due that have a higher probability of refinancing with another financial service provider.
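Once any model produces a refinancing probability per loan, the monthly expiry report can be turned into a ranked contact list. The sketch below is a minimal illustration of that step only, assuming hypothetical field names (`loan_id`, `balance`, `fixed_rate_end`, `previously_retained_with_incentive`) and a 90-day look-ahead window; `score` is a placeholder for whichever fitted model is eventually chosen.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Loan:
    # Hypothetical fields for illustration; a real portfolio extract will differ.
    loan_id: str
    balance: float
    fixed_rate_end: date
    previously_retained_with_incentive: bool


def loans_to_contact(
    loans: List[Loan],
    score: Callable[[Loan], float],  # placeholder for a fitted model's P(refinance away)
    today: date,
    horizon_days: int = 90,  # assumed look-ahead window
    top_n: int = 10,
) -> List[Loan]:
    """Return up to top_n loans whose fixed rate ends within horizon_days,
    ordered by estimated refinancing probability, highest first."""
    expiring = [
        loan for loan in loans
        if 0 <= (loan.fixed_rate_end - today).days <= horizon_days
    ]
    expiring.sort(key=score, reverse=True)
    return expiring[:top_n]
```

Passing the model in as a function keeps the selection logic the same whichever modelling approach wins out.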

What sort of modelling do you think I should be looking into initially? While I studied a STEM subject at university, it was a few years ago now and I did not take higher-level statistics courses, so your assistance and/or thoughts would be very much appreciated.

V8LM

Saturday 20th January 2018

You could look at a multivariate analysis (partial least squares, principal component analysis, multivariate curve resolution), where you look for a model that takes all your input data and predicts the refinancing probability. The value of this depends on whether there is in fact any relationship between your inputs and refinancing, and on how large and accurate your historical data set is. Alternatively, there are machine learning methods, such as a neural network.
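Of the methods listed, PCA is the simplest to sketch. Note that PCA on its own does not use the refinancing outcome: it compresses correlated inputs into a few components, which could then feed a predictive model such as logistic regression (partial least squares, by contrast, chooses its components with the outcome in view). A minimal pure-Python sketch of the first principal component via power iteration on the correlation matrix, assuming numeric inputs with no constant columns; in practice a library routine such as R's `prcomp` or scikit-learn's `PCA` would be the usual choice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardise(rows: Matrix) -> Matrix:
    """Centre each column and scale it to unit sample standard deviation.
    Assumes no column is constant (a zero SD would divide by zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_component(rows: Matrix, iterations: int = 500) -> Tuple[List[float], float]:
    """Return (loadings, eigenvalue) of the first principal component,
    found by power iteration on the correlation matrix."""
    z = standardise(rows)
    n, p = len(z), len(z[0])
    # Correlation matrix: covariance of the standardised columns.
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def component_scores(rows: Matrix, loadings: List[float]) -> List[float]:
    """Project each standardised row onto the loading vector."""
    return [sum(x * w for x, w in zip(r, loadings)) for r in standardise(rows)]
```

The eigenvalue divided by the number of columns gives the share of total (standardised) variance the component explains, which helps decide how many components are worth keeping.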

Jag_luvver

Saturday 20th January 2018

Neural networks seem to be 'de rigueur' at the moment. This document seems to give a good overview of stuff that should be relevant:

https://www.cc.gatech.edu/~isbell/tutorials/rbf-in...

The intro highlights some search terms that you could chuck into Google Scholar (i.e. different names for the same thing), and there seem to be some good examples that should help you apply radial basis function (RBF) networks to your data.
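An RBF network has one hidden layer of radial (here Gaussian) basis functions and a linear output layer. A minimal sketch, assuming hand-picked centres and a single shared width (both hypothetical choices; centres are often taken from k-means or a subset of training points), with the output weights fitted by ridge-regularised least squares. Trained on 0/1 refinancing outcomes, the output approximates a probability but is not constrained to [0, 1].

```python
import math
from typing import List

Vector = List[float]


def gaussian(x: Vector, centre: Vector, width: float) -> float:
    """Gaussian radial basis function of the distance from x to centre."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-d2 / (2 * width ** 2))


def design_row(x: Vector, centres: List[Vector], width: float) -> Vector:
    # Bias term followed by one hidden-unit activation per centre.
    return [1.0] + [gaussian(x, c, width) for c in centres]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(xs: List[Vector], ys: Vector, centres: List[Vector],
            width: float, ridge: float = 1e-6) -> Vector:
    """Fit output weights via the normal equations (Phi^T Phi + ridge*I) w = Phi^T y.
    For simplicity the small ridge penalty is also applied to the bias."""
    phi = [design_row(x, centres, width) for x in xs]
    k = len(phi[0])
    ata = [[sum(r[i] * r[j] for r in phi) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(phi, ys)) for i in range(k)]
    return solve(ata, aty)


def predict_rbf(x: Vector, weights: Vector, centres: List[Vector], width: float) -> float:
    return sum(w * f for w, f in zip(weights, design_row(x, centres, width)))
```

Fitting only the output layer keeps training a linear least-squares problem, which is one of the attractions of RBF networks over fully trained multilayer networks.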

speedy_thrills

Original Poster

Saturday 20th January 2018

Thanks, I'll investigate both of those and read through that document.

(steven)

Saturday 20th January 2018

Logistic regression is probably the best approach: it is well known and it models a binary outcome, so it is well suited to churn modelling.

If you work in finance, it’s also the approach commonly used to forecast delinquent loans, so you may already have some SMEs in the business.

There are more glamorous techniques, but logistic regression is probably the most understood in this context.
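To show what logistic regression actually fits, here is a minimal pure-Python sketch: an intercept plus one coefficient per input, chosen by batch gradient descent on the log-loss, so that the predicted probability of refinancing away is the logistic function of a linear score. The learning rate and epoch count are assumptions for small, roughly scaled inputs; in practice R's `glm(..., family = binomial)` or scikit-learn's `LogisticRegression` would do this properly and report standard errors too.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs: List[Sequence[float]], ys: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Return [intercept, coef_1, ..., coef_p] fitted by gradient descent
    on the mean log-loss. ys are 1 (refinanced away) or 0 (stayed)."""
    p, n = len(xs[0]), len(xs)
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, y in zip(xs, ys):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    """Estimated probability that the loan refinances away."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

A positive coefficient means the input raises the odds of refinancing away; each coefficient is a change in log-odds per unit of its input, which is a large part of why the method is so well understood in credit settings.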

V8LM

Saturday 20th January 2018

‘R’ is possibly the best way to test and try things out. Plenty of tutorials and examples on the web.

nammynake

Sunday 21st January 2018

As above, assuming you have data for each account (i.e. key variables that you suspect are good indicators of whether a customer will leave) then logistic regression is perfectly suited to this kind of problem. It's the method used to build application scorecards.
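In application scorecards the fitted log-odds are usually rescaled into points using a "points to double the odds" (PDO) convention: score = offset + factor × ln(odds), with factor = PDO / ln 2 and the offset chosen so that a target score corresponds to target odds. A minimal sketch applying that convention to churn, assuming odds are defined as stay:leave (by analogy with good:bad in credit) and using arbitrary illustrative parameters (600 points at 50:1, 20 PDO) that do not come from this thread:

```python
import math


def scorecard_points(p_leave: float, target_score: float = 600.0,
                     target_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Convert a modelled probability of leaving into scorecard points.

    score = offset + factor * ln(odds_stay), where
    factor = pdo / ln(2) and offset = target_score - factor * ln(target_odds).
    Higher scores mean lower churn risk. Parameter defaults are illustrative.
    """
    if not 0.0 < p_leave < 1.0:
        raise ValueError("p_leave must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    odds_stay = (1.0 - p_leave) / p_leave
    return offset + factor * math.log(odds_stay)
```

Because a logistic regression for leaving already outputs log-odds, ln(odds_stay) is just the negated linear predictor, so each coefficient can also be converted into points per attribute in the same way.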

speedy_thrills

Original Poster

Monday 22nd January 2018

Logistic regression does look like a good start. After reading a few case studies, it looks like kNN (k-nearest neighbours) is often the next step on from that.
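kNN estimates a loan's refinancing probability from the outcomes of the most similar historical loans. A minimal sketch, assuming numeric features already put on comparable scales (e.g. standardised), since plain Euclidean distance otherwise lets the largest-valued feature dominate; it needs Python 3.8+ for `math.dist`. scikit-learn's `KNeighborsClassifier` with `predict_proba` is the usual library equivalent.

```python
import math
from typing import List, Sequence


def knn_churn_probability(train_x: List[Sequence[float]], train_y: Sequence[int],
                          query: Sequence[float], k: int = 3) -> float:
    """Estimate P(refinance away) as the fraction of the k nearest historical
    loans (Euclidean distance) that did refinance away (train_y == 1)."""
    if not 0 < k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    # Rank historical loans by distance to the query; ties keep input order.
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in ranked[:k]) / k
```

Unlike logistic regression there is nothing to fit up front, but every prediction scans the whole history, and the choice of k and of feature scaling has to be tuned, for example on a hold-out set of past expiries.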

Toaster
