Anomaly Detection Solution

In September 2022, as an MSBA grad student at UC Davis, I started my journey with GAIN Credit to create an intelligent anomaly detection solution. My team of six (Tamalika, Sungho, Yifan, Simon, Sumanth, and I) took on a challenge that truly pushed our boundaries. Some of us were not expert coders, some did not know machine learning, and all of us were new to the credit lending and fintech business. Over the next few quarters, all of us grew in these areas and developed new skills!

Project Overview

Machine Learning - Bringing the Intelligence in Anomaly Detection

We explored various unsupervised machine learning models to build a robust anomaly detection model with as few false positives as possible. We then narrowed these down to three models whose results agreed reliably with our own manual validation and analysis. Because each model uses a different algorithm to find outliers, the three do a better job together than any one model alone: if two or more different methods flag the same data point as an anomaly, we can be more confident that it really is one. Hence we created an ensemble of the following three models:
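The agreement idea can be sketched as a simple majority vote. This is only an illustration, not the project's actual code: the function name `majority_vote`, the `min_votes` parameter, and the flag values are all made up for the example.

```python
import numpy as np

def majority_vote(flags_by_model, min_votes=2):
    """Flag a row as anomalous when at least `min_votes` models flag it.

    flags_by_model: dict mapping a model name to a boolean array,
    one entry per data row (True = the model considers it an anomaly).
    """
    stacked = [np.asarray(flags, dtype=bool) for flags in flags_by_model.values()]
    votes = np.sum(stacked, axis=0)  # number of models flagging each row
    return votes >= min_votes

# Hypothetical per-model flags for four rows of data.
flags = {
    "dbscan": np.array([True, False, True, False]),
    "isolation_forest": np.array([True, False, False, False]),
    "autoencoder": np.array([True, True, False, False]),
}

ensemble_flags = majority_vote(flags)
print(ensemble_flags)  # only the first row is flagged by two or more models
```

Raising `min_votes` to 3 would trade recall for even fewer false positives, since all three models would then have to agree.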

DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) distinguishes between high-density and low-density regions. It creates clusters by grouping together data points with a specified minimum number of neighbors within a given radius (epsilon). Points outside of these clusters are considered to be anomalies/outliers.
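As a rough sketch of how this looks in practice, scikit-learn's `DBSCAN` labels noise points with `-1`. The synthetic data and the `eps` and `min_samples` values below are assumptions chosen for illustration, not the settings we used on GAIN's data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic data: one dense blob plus two far-away points (illustrative only).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([normal, outliers])

# Scale features so that a single radius (eps) is meaningful for every column.
X_scaled = StandardScaler().fit_transform(X)

# eps is the neighborhood radius; min_samples is the neighbor count
# (including the point itself) needed to form a dense region.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)

# DBSCAN assigns the label -1 to points that belong to no cluster.
dbscan_flags = labels == -1
print("rows flagged as noise:", np.flatnonzero(dbscan_flags))
```

Both parameters usually need tuning per dataset: a smaller `eps` or larger `min_samples` flags more points as noise.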

Isolation Forest

Isolation Forest is an anomaly detection algorithm that operates on the principle of isolating anomalies. It uses a set of decision trees where each split is based on a random feature and a random split value, resulting in a forest of such trees. In this process, anomalous points, which are rare and different, tend to be isolated closer to the root of the trees.
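A minimal sketch with scikit-learn's `IsolationForest` is shown here. The synthetic data and the `contamination` value (the expected share of anomalies) are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: one dense blob plus two far-away points (illustrative only).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([normal, outliers])

# contamination sets the share of rows treated as anomalies when thresholding.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
predictions = model.fit_predict(X)  # -1 = anomaly, 1 = normal

iforest_flags = predictions == -1

# Lower scores mean a point was isolated in fewer splits, i.e. more anomalous.
scores = model.score_samples(X)
print("rows flagged:", np.flatnonzero(iforest_flags))
```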

Autoencoder 

An autoencoder is a neural network used for data compression and anomaly detection. It comprises an encoder, which compresses the input data into a lower-dimensional code, and a decoder, which reconstructs the original data from that code. It is trained to minimize the difference between input and output, so it learns to capture the most important features of the data. A data point with a high reconstruction error does not fit the patterns the network has learned, which can signal an anomaly.
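The reconstruction-error idea can be sketched without a deep learning framework by training scikit-learn's `MLPRegressor` to reproduce its own input through a narrow hidden layer. This is a stand-in for illustration, not the network we built: the layer sizes, the 99th-percentile threshold, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic "normal" data: 5 features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 5))
odd = np.array([[8.0, -8.0, 8.0, -8.0, 8.0]])  # a point unlike the training data

scaler = StandardScaler().fit(normal)
X_train = scaler.transform(normal)

# Encoder 5 -> 8 -> 2 (bottleneck), decoder 2 -> 8 -> 5; the target is the input.
autoencoder = MLPRegressor(hidden_layer_sizes=(8, 2, 8), activation="tanh",
                           solver="lbfgs", max_iter=1000, random_state=0)
autoencoder.fit(X_train, X_train)

def reconstruction_error(model, X):
    """Mean squared difference between each row and its reconstruction."""
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Assumed rule: flag rows whose error exceeds the 99th percentile seen in training.
threshold = np.percentile(reconstruction_error(autoencoder, X_train), 99)

X_new = scaler.transform(np.vstack([normal[:5], odd]))
errors = reconstruction_error(autoencoder, X_new)
ae_flags = errors > threshold
print(ae_flags)
```

The threshold is the main tuning knob here: a lower percentile flags more rows and produces more false positives.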

Making Solutions Seamless with Interactive Dashboards  

For this project, we created a two-page dashboard with the Python libraries Plotly and Dash so that users can track, validate, and act on the anomalies flagged by the ML models. The dashboard gives the team high-level insights for tracking anomalies in their data. It also offers the specific details needed to understand each anomaly and lets users give feedback on any false positives.

Impact