Hanalei, Hawaii

Scott and I were talking yesterday about his compromised Apple ID. I was telling him I didn’t understand why he was so upset about it - it seemed perfectly reasonable (as MG Siegler said) that Occam’s Razor applies here: Scott had his login/password hacked somehow. But that’s not the problem - turns out there’s a lot more going on here.

The Problem

Scott was more than irritated when he wrote that post. He said as much in a followup:

I suppose. Sure, it was a rant, I’ll accept that.

Unfortunately, when you’re irritated and trying to write, things don’t come out clearly. In talking to Scott yesterday it became clearer (at least to me) that there’s no way Scott’s password was brute-forced. He explained to me how he created his password and let’s say that there’s no way it was guessed/bruted.

But that doesn’t matter as it wasn’t really the point of his post. It was the circumstances surrounding the purchase that seemed ridiculous. The question: Should Apple have been able to detect the fraudulent charge? And if so - what should they do about it?

It turns out they did detect the charge - they sent him an email. Scott’s wondering if they should have stopped the charge as it was blatantly wrong. To complete the charge, Scott should have been required to click on a link.

Now he has to challenge the charges and go through a recovery process. Annoying.

But did Apple have enough information to warrant that kind of response? Blocking a sale is a very, very risky thing to do.

For Apple to do this kind of thing they would need to snoop your purchases and derive some information about you - perhaps information you’d rather not have them know - in an effort to create some type of profile. This sounds scary to a lot of people.

“You’re not buying things according to our profile of you. So we stopped the sale”

Would you be more pissed off if you received that email in error, or if, like Scott, you had to challenge a fraudulent charge?

“Get out of my life” and “Protect me from evil” - that’s the problem we face in today’s Cloud world. Companies find themselves too far on both sides of this see-saw. Apple is constantly reminded that we don’t want them knowing too much about us. 

So what can they do here?

It turns out that there is a better way to do Fraud Detection using a little-known symmetrical weirdness of numbers. I’m not a math person and the things you’re about to read might make you cringe as I bleat my way along. If you think I’m incorrect - please let me know (with details) in the comments.

Benford’s Law

Simply put: Benford’s Law states that across a big enough set of numbers, the distribution of the first sequence of digits in each number will follow a predictable curve:

This is a really weird thing, if you ask me, and doesn’t make intuitive sense.

A good example of this is your bank account. If you were to take all of your debits over the course of 2 years and then squeeze out the distributions of 1’s, 2’s and 3’s in the first 2 or 3 positions of each number - it would follow this curve almost exactly.

Another example comes from pure mathematical sequences - like the Fibonacci numbers and the factorials. This is crazy.
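The Fibonacci claim is easy to check for yourself. Here is a minimal Python sketch that counts the leading digits of the first 1,000 Fibonacci numbers and prints them next to the share Benford's Law predicts. The function names and the choice of 1,000 terms are my own assumptions for illustration, and the expected share uses the standard first-digit formula log10(1 + 1/d).

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def fibonacci(count: int):
    """Yield the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


def observed_shares(numbers) -> dict:
    """Share of the given numbers that start with each digit 1-9."""
    counts = Counter(leading_digit(n) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def benford_share(d: int) -> float:
    """Share Benford's Law predicts for first digit d (1-9)."""
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    shares = observed_shares(fibonacci(1000))
    for d in range(1, 10):
        print(f"{d}: observed {shares[d]:.3f}  expected {benford_share(d):.3f}")
```

Running it should show the observed and expected columns tracking each other closely, with 1 as the most common first digit and 9 as the least.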

If you’re a math wonk - here’s the equation:
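The original image of the equation isn't reproduced here, so what follows is the textbook statement of the first-digit law rather than necessarily the exact rendering the post showed. For a first digit d from 1 to 9:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \dots, 9\}
```

Plugging in values gives P(1) ≈ 0.301, P(2) ≈ 0.176 and P(9) ≈ 0.046, which is the downward-sloping curve described above.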

Benford’s Law is so exact that it’s currently being used to detect fraud - so much so that analysis results are admissible as evidence in court:

… Mark Nigrini showed that Benford’s law could be used as an indicator of accounting and expenses fraud.[7] In the United States, evidence based on Benford’s law is legally admissible in criminal cases at the federal, state, and local levels.

But there’s a catch: the dataset needs to be rather large and the growth pattern needs to be understandable.

But how would this apply to Hanselman’s purchases? And how would Apple use it?

Let’s take a look at Scott’s numbers. We don’t know the purchase prices - but they don’t matter exactly, as long as they fall within the same order of magnitude and are somewhat predictable - which they are, since most apps Scott buys are between $0 and $5:

I have, according to iTunes, 492 applications. They have all been purchased on either my iPad or my iPhone.

OK - so we have a purchase history here and we can throw Benford’s Law at it. But will it apply? Unfortunately I don’t know Scott’s purchase totals over time, but we do know the offending purchases:

Lots of 9’s. You might think “aha! this is where your math breaks down dude!” - but that’s not the case believe it or not. The retail world is full of “99s” - and this is the catch: Benford’s Law accounts for that.

In our case we’re mostly concerned with the first digits. As Mark Nigrini explains, Benford:

found that about 31% of the numbers had 1 as the first digit, 19% had 2, and only 5% had 9 as a first digit. Benford then made some physics-related assumptions about the distribution of naturally occurring data and, using integral calculus, he computed the expected frequencies of the digits and digit combinations.

Given this, we can stop worrying about the 9’s - it’s the first sequence of digits in the total purchase price that we care about.

The purchase price above? $39.89. I’ll lay some money down that if we extrapolated Scott’s purchases out, it would fall into Benford’s Law and this number right here would stick out some because 3 rarely appears as a first digit in Scott’s purchase history.
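To make the "would it stick out?" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the purchase history is invented (we don't have Scott's real totals), the function names are mine, and the 2% threshold is arbitrary. It is not a description of anything Apple actually does.

```python
import math
from collections import Counter


def leading_digit(amount: str) -> int:
    """First non-zero digit of an amount written as text, e.g. "39.89" -> 3."""
    for ch in amount:
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no non-zero digit in {amount!r}")


def history_share(digit: int, history: list[str]) -> float:
    """Fraction of past amounts whose leading digit equals `digit`."""
    counts = Counter(leading_digit(a) for a in history)
    return counts[digit] / len(history)


def benford_share(digit: int) -> float:
    """Share Benford's Law predicts for a first digit of 1-9."""
    return math.log10(1 + 1 / digit)


def stands_out(amount: str, history: list[str], max_share: float = 0.02) -> bool:
    """True when the amount's leading digit is rare in the history.

    max_share is an arbitrary illustrative threshold, not a tuned value.
    """
    return history_share(leading_digit(amount), history) < max_share


# Hypothetical purchase totals, invented for illustration only.
HYPOTHETICAL_HISTORY = ["0.99", "1.99", "4.99", "0.99", "2.99",
                        "1.99", "9.99", "0.99", "1.98", "5.98"]

if __name__ == "__main__":
    for amount in ("39.89", "1.99"):
        d = leading_digit(amount)
        print(amount,
              f"history share {history_share(d, HYPOTHETICAL_HISTORY):.2f}",
              f"Benford share {benford_share(d):.2f}",
              "stands out" if stands_out(amount, HYPOTHETICAL_HISTORY) else "looks normal")
```

Two caveats worth keeping in mind. A ten-item list is far too small to say anything real. Benford-style checks are also generally considered most reliable when the values span several orders of magnitude, which a pile of small app prices may not.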

But does that mean it’s fraudulent? Is it enough to warrant Apple firing off an email to Scott or blocking the purchase?

This is the hard part.

Actionable Data?

Here’s Apple’s choice when they see the Benford outlier:

  1. Stop the transaction and have Scott approve the purchase by email.
  2. Do what they did and send off an email to Scott saying “we’ve noticed an irregular charge”

This is where we come to the laws of probability one more time. The chances of this being fraudulent are actually pretty low. In Scott’s case, as it turns out, it’s a 1 in 493 occurrence (about 0.2%).

Blocking a purchase based on dumb fraud checks is annoying. We’ve all been there: at the airport, our credit card is blocked and we have to get on the phone, explaining to the droning account rep that “yes I’m really Rob”.

No company wants to look stupid. For some companies it’s much easier to correct a bad situation with good customer service than it is to play policeman.

I, for one, would like Apple to not play policeman with my purchases… but I’m getting off track here.

Summary

I’m not a math wonk and I may very well have screwed up my thoughts on Benford’s here. I remember studying it in school as a bit of trivia in one of my calculus classes and later I heard about it on RadioLab.

My gut tells me that there’s not enough data in Scott’s app purchases to offer a valid analysis here - but I’m hoping that some math-y types out there might pitch in and help me out.

At the same time, if you look at *all* of Scott’s purchases with Apple:

  • iTunes music
  • Computers and devices
  • Apple Store purchases
  • App Store purchases

I wonder if it would be enough. I wonder if the distribution of “3’s” in the first digit would be random enough that the analyzer would say “need more data”…

Who knows - it’s fun to think about though.


My name is Rob Conery and I am the owner/smooth operator of Tekpub, creator of
This Developer's Life, and an avid Ruby/Rails/.NET developer.
