Solving the 190 Billion Dollar Issue

Riya Kumar
10 min read · Mar 22, 2019

Demo video: https://youtu.be/PGvLXGPyEFs

In the United States alone, companies lose an estimated 190 billion dollars annually to credit card fraud.

What if I told you we could solve this problem today and save at least a few billion dollars? Well, that’s exactly what I tried to do with my fraud-detecting SOM (self-organizing map).

But before I get into how my model works, what the heck is a SOM?


What are SOMs?

A SOM, aka a self-organizing map, is a type of unsupervised neural network. It takes vast amounts of data and simplifies it into a map organized by the trends in the data. SOMs essentially reduce dimensionality, meaning data that was 3-dimensional would be 2-dimensional by the end of the process. They are especially useful with huge amounts of data, as they can group data into categories and compare it, making vast datasets much easier to analyze.

Let’s take an example of a SOM:

Countries graphed by quality of life.

The graph above represents countries’ quality of life based on a number of factors, such as healthcare, the quality of education, and average income. The dataset started off with about 39 columns and more than 200 countries, but this graph lets us easily see the spectrum these countries are on. Countries with a lower quality of life tend to sit on the left end of the graph, while countries with a better quality of life sit on the right.

How do SOMs work?

Unlike most neural networks, SOMs have more output nodes than input nodes.

A super simple model of a SOM.

I know what you’re thinking: aren’t they supposed to make the data easier to understand? How can they have more output nodes?

Well, it’s a bit trickier than it seems. Take the diagram above: each input node represents a column with 3 rows. In the end, the data is expressed as a 2-dimensional array, meaning it’s no longer a 3-dimensional set of data, and depending on the situation this can make the data a lot easier to understand and analyze.

Weights applied to input nodes!

SOMs also lack activation functions, which differentiates them from most other types of neural networks, and the weights aren’t directly applied to the input and stored. Instead, the weights belong to the output nodes: each output node carries a version of each input node with a weight applied.

Let’s bring it back to our example: in the sample with 3 input nodes, we see at the output layer that each output node carries 3 versions of the original inputs, with the weights applied.

How Does it Learn?

To get our SOM to learn, that is, to bring it closer to the actual dataset (which is the end goal), we need to calculate its distance from the dataset. Each output node (remember, each carries 3 values) is compared with the original dataset, and we calculate the Euclidean distance. The node closest to the original dataset is called the BMU (best-matching unit). But before we move on, let’s go back to our example.

Calculating Euclidean distance, ooooo!

So, if we want to compare the first row against the three output nodes, we take the first W in each node and subtract it from the actual value of that row in the dataset. This gives us the Euclidean distance, and the closest node, in this case node 3, is the BMU (best-matching unit)!
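To make that concrete, here’s a tiny NumPy sketch of how a BMU gets picked. The numbers and the 3-node setup are made up for illustration; they aren’t from the article’s code or dataset:

```python
# Minimal sketch of finding the BMU: compare one data row against every
# node's weight vector and pick the node with the smallest Euclidean distance.
# (The values here are illustrative, not from the article's dataset.)
import numpy as np

row = np.array([0.2, 0.7, 0.4])            # one row of the dataset
weights = np.array([[0.9, 0.1, 0.3],       # node 1's weight vector
                    [0.5, 0.5, 0.5],       # node 2's weight vector
                    [0.25, 0.65, 0.45]])   # node 3's weight vector

distances = np.linalg.norm(weights - row, axis=1)  # Euclidean distance per node
bmu = np.argmin(distances)                         # index of the best-matching unit
print(bmu)  # 2 -> node 3 is the BMU, just like in the example above
```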

Then the SOM goes back and changes the weights of the surrounding nodes to be closer to the weights of the BMU. Basically, it’s dragging the whole set of nodes toward the BMU, bringing the whole SOM closer to the actual dataset.

This diagram helps visualize what’s occurring: the white dot represents the actual data point, the yellow dot is the BMU, and the grid represents our SOM (basically a grid of nodes). A radius is drawn around the BMU, and any node that falls inside it has its weights updated toward the BMU’s; the closer a node is to the BMU, the more its weights are updated.

Multiple BMUs and radius

So, as the weights of the nodes near the BMU are updated, the SOM is pulled closer to the dataset until the BMU eventually matches our dataset precisely. This continues for each row of data (going back to the example, this means we move on to the second W in each node), and we get a new BMU for each row.
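If you like seeing it in code, here’s a rough sketch of one such update step. The article doesn’t say which neighborhood function its SOM uses, so I’m assuming the common Gaussian one here:

```python
# Sketch of one weight update: every node is pulled toward the data row,
# scaled by how close it sits to the BMU on the grid (Gaussian neighborhood).
import numpy as np

def update_weights(weights, positions, row, bmu_pos, learning_rate, radius):
    """weights: (n_nodes, n_features); positions: (n_nodes, 2) grid coordinates."""
    grid_dist = np.linalg.norm(positions - bmu_pos, axis=1)    # distance to the BMU on the grid
    influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))  # 1 at the BMU, fading with distance
    return weights + learning_rate * influence[:, None] * (row - weights)
```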

Weight a minute!

Wait, what if there’s overlap between the radii of BMUs like this?

So, what happens now?

The node will be affected by both BMUs, since it falls under both of their radii, but it will be affected more by whichever BMU it is closer to!

As training continues, each BMU’s radius shrinks, meaning it influences fewer and fewer nodes. The focus is on getting the whole grid of nodes closer to the actual dataset rather than just the BMUs touching it. Each node is then assigned a BMU, and we get a graph like this. That’s all!

Wow sorted SOM map !!!

Applying it to Fraud Detection + Code!

Using credit card client data from UC Irvine, I made a SOM that classifies potentially fraudulent customers and graphs them out. The SOM goes through exactly the steps described above when training; the only difference is the code, which I’ll get into right now!

Libraries and Datasets

So I started by importing the libraries I need and loading the dataset I got from UC Irvine’s website. I then defined x and y, with x being every column in the dataset except the last one, which is why I put the -1. The “iloc” function lets us call a specific index, so the -1 excludes the last column. I then defined y as just the last column, which shows whether each customer was approved or not. Both x and y are followed by “.values” so they return their values as arrays. I excluded the last column because the first column is the customer ID and the next 14 columns are attributes such as gender, banking history, etc., while the last column is whether the customer got approved or not.
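Since the code itself appeared as a screenshot, here’s a sketch of what that setup looks like. The CSV file name is my assumption; use whatever name you saved the UCI file under:

```python
# Sketch of the setup described above. The file name is an assumption,
# not necessarily what was used in the original code.
import numpy as np
import pandas as pd

dataset = pd.read_csv('Credit_Card_Applications.csv')
x = dataset.iloc[:, :-1].values  # every column except the last: customer ID + 14 attributes
y = dataset.iloc[:, -1].values   # last column: 1 if approved, 0 if not
```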

Data Preprocessing

After that, since the data has a lot of non-linear relations that aren’t easy for the neural network to understand, I normalized it. Using the MinMaxScaler from the sklearn library, I made the variable “sc” to define the range I want the normalized numbers to fall in, which was 0 to 1 in this case. The last line of code in this part fits the scaler to x, and the transform step returns a normalized version of x (a version of x with every value between 0 and 1).
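Here’s a sketch of that preprocessing step:

```python
# Normalize every feature into the range [0, 1] with sklearn's MinMaxScaler.
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))  # the range the normalized numbers fall in
x = sc.fit_transform(x)                  # fit to x, then return the scaled version
```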

Training!

Now, to get to the training of this SOM, I started off by importing MiniSom. The MiniSom class comes from the minisom file I downloaded online, and it gives me a very basic SOM model I can build on. So in the next line, I defined “som” by calling the MiniSom class I imported and setting a few arguments.

I defined x and y as 10 and 10; these can be any dimensions we want for our map, and I chose a 10x10 grid. Next is input_len, which is the number of attributes, or columns, in our original x, since we are training this neural network on the x dataset. That means we have 14 attribute columns plus one column for the customer ID, so we can go back to our data at the end and identify the possible fraudulent customers. Therefore, input_len is defined as 15. Sigma is similar to the radius around the BMU described above; I kept it at its default of 1, and I kept the learning rate at its default of 0.5.

The next line just gives the network some random initial weights, which are changed as training continues. The very last line trains the SOM following the steps described above, where the weights are updated over many iterations and the radius eventually shrinks. To use “train_random” I again had to define some arguments, in this case the data and the number of iterations. I defined “data” as x, since that’s what I’m training the SOM on, and “num_iteration” as 100, since that’s enough for my SOM model.
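A sketch of the training step as described:

```python
# Build and train the SOM with MiniSom, using the settings described above.
from minisom import MiniSom

som = MiniSom(x=10, y=10,         # a 10x10 map
              input_len=15,       # 14 attributes + the customer ID column
              sigma=1.0,          # neighborhood radius, left at its default
              learning_rate=0.5)  # also left at its default
som.random_weights_init(x)                   # random initial weights
som.train_random(data=x, num_iteration=100)  # train for 100 iterations
```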

Visualizing Data Results

Now, onto building our graph/map of fraudulent customers!

I started off by importing some useful functions, like bone, pcolor, colorbar, plot, and show, from pylab, which help us get visual graphs/charts. To start, I need “bone()”, which creates a plain white window I can build on. The next line maps out the SOM by mean interneuron distance (MID): the larger a node’s MID, the further it is from its neighboring nodes, and the more likely it is to be an outlier.
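A sketch of that first bit of plotting code:

```python
# Plot the SOM's mean interneuron distances: lighter cells are further
# from their neighbors and therefore more likely to be outliers.
from pylab import bone, pcolor, colorbar, plot, show

bone()                        # plain white window to build on
pcolor(som.distance_map().T)  # color each cell by its MID
colorbar()                    # legend for the distance scale
```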

But, there’s more.

What if, in addition to the outliers, we could map out who was approved or not? We can do this using dataset y, which, if you recall, was the last column we left out of dataset x. So we define the markers as ‘o’ and ‘s’ (an o for a circle and an s for a square), while the colors we are going to use are red and green.

Here I defined a for loop that goes through each customer and, using “som.winner” (which returns the winning node, or BMU, for that customer), plots out a green square (approved) or a red circle (not approved).

In the “plot” call there are a few arguments that need to be defined before I go on. First are w[0] + 0.5 and w[1] + 0.5, which set where the marker is placed: the 0.5 centers it in its square on the map. Next is “markers[y[i]]”, which indicates whether it is a square or a circle. Using dataset y, it returns a 1 if the customer was approved, which corresponds to a square, or a 0 if the customer wasn’t approved, corresponding to a circle. The same happens with “colors[y[i]]”, but instead of a circle or a square it returns red or green. “markersize” and “markeredgewidth” just indicate how big the marker will be on the map, and I left the “markerfacecolor” empty.
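And here’s roughly what that loop looks like (note that “markeredgewidth” is matplotlib’s actual keyword for the marker’s edge width):

```python
# Overlay a marker on each customer's winning node:
# green square = approved, red circle = not approved.
markers = ['o', 's']  # y = 0 -> circle, y = 1 -> square
colors = ['r', 'g']   # y = 0 -> red,    y = 1 -> green
for i, customer in enumerate(x):
    w = som.winner(customer)       # the BMU for this customer
    plot(w[0] + 0.5, w[1] + 0.5,   # + 0.5 centers the marker in its cell
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',   # hollow, so overlapping markers stay visible
         markersize=10,
         markeredgewidth=2)
show()
```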

The last line executes the whole map and shows us something like this!

SOM map with markers indicating approval!

Finding Fraudulent Customers

The very last part of this program gets us the list of frauds. Using the map (captured in the variable “mappings”), we look for the outliers in white and pull them out into the variable “frauds”. This gets us the data for the outliers (which on this map are at (6, 8) and (0, 1), but since we get a different map each time we run the program, the coordinates can be anything). The last line then brings the customer IDs back by reversing the preprocessing and unscaling the data, giving us a final list of possible fraudulent customers.
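A sketch of that last step (remember, the coordinates change with every run, so these two are just from this particular map):

```python
# Pull the customers that landed on the outlier nodes and undo the scaling.
mappings = som.win_map(x)  # dict: node coordinates -> customers mapped to that node
frauds = np.concatenate((mappings[(6, 8)], mappings[(0, 1)]), axis=0)
frauds = sc.inverse_transform(frauds)  # reverse the preprocessing to recover customer IDs
```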

By clicking on “Variable Explorer” and going to the variable “frauds”, I get a list of the possible fraudulent customers with their IDs!

Frauds list!

And that’s all!

Key Takeaways + Looking at the Future

SOMs are pretty useful for synthesizing a whole lot of data into an easy-to-understand form such as a map, but they can also be used to produce lists, as I did in my example program. SOMs can be extremely useful for grouping or classifying data, and I think in the future we’ll see a lot more of them in research and even in commercial use, such as fraud detection!

Slide deck: https://docs.google.com/presentation/d/1Cnt0UucYC4L2lYz3ZKkCST6oRKH7Ngm5qCjTBThGWwM/edit?usp=sharing
