## Monday, January 30, 2012

### Comparing distributions

Say you have several distributions, for example four different normal distributions, each plotted in its own color.
I got to wondering: if you randomly pick a number from, say, the green distribution, how often would a number picked from, say, the red distribution be greater? One way to find out is to simulate, say, a million trials on a computer and see how often one is bigger than the other. So doing that first:
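
The simulation itself isn't reproduced here, but a minimal sketch of the idea in Python might look like the following. The means and standard deviations are placeholder values, not the ones from the plotted distributions, and `simulate_greater` is just a name picked for illustration.

```python
import random


def simulate_greater(mu_x, sigma_x, mu_y, sigma_y, trials=1_000_000, seed=0):
    """Estimate how often a draw from X is greater than a draw from Y."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(trials):
        q = rng.gauss(mu_x, sigma_x)  # pick a number from X
        r = rng.gauss(mu_y, sigma_y)  # pick a number from Y
        if q > r:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Placeholder parameters (assumed for illustration, not the post's values).
    print(simulate_greater(mu_x=1.0, sigma_x=1.0, mu_y=0.0, sigma_y=1.0))
```

With a million trials the estimate should wobble only in the third or fourth decimal place from run to run.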

But I was more interested in an exact solution so I thought of this formula:
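
The formula isn't reproduced here as an image. Going by the description of the bounds and the integrand further down, it would read roughly as follows, where X and Y are the two density functions and [a, d] is an interval wide enough to cover both of them:

$$
P(Q > R) = \int_a^d \int_r^d X(q)\, Y(r)\, dq\, dr
$$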

I'll explain the reasoning in a second, but let's see that it works first:

It came out to 94.8753%, pretty much the same as the simulation.
Anyway, Maple makes it look really ugly, but that calculation is just evaluating the integral from the formula with the two distributions plugged in, over a suitable interval. And you see the result is the same as the computer simulation to within a couple of millionths. So my formula is probably correct, and I'll explain the reasoning:
Looking at the bounds of integration, the inner integral is from r to d and the outer is from a to d. That means the integral is only being evaluated over points (q, r) where q is greater than r. For a particular r in the outer integral, the inner one looks at every q greater than that r, up to d, and so on. In short, q > r.
So for every possible pair (q, r) such that q > r, the function being evaluated is X(q)*Y(r), which is the probability density of choosing the point (q, r). There's a certain density for picking q from X and a certain density for picking r from Y, and since the two picks are independent, X(q)*Y(r) is the density for picking both. So when you add up all those infinitesimal probabilities, you end up with the total probability of picking a pair such that q > r for those two distributions.
In other words, you have the probability that, if you pick a number from each distribution, the first is greater than the second.
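
As a rough check of this reasoning without Maple, here is a small Python sketch that adds up X(q)*Y(r) over the region q > r on a grid (a plain midpoint rule, not whatever Maple does internally). The normal parameters and the interval [-10, 10] are placeholder assumptions, and `normal_pdf` and `prob_greater` are names made up for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def prob_greater(pdf_x, pdf_y, a, d, steps=2000):
    """Midpoint-rule estimate of the integral of X(q)*Y(r) over a <= r < q <= d."""
    h = (d - a) / steps
    total = 0.0
    tail = 0.0  # running estimate of the integral of X(q) above the current cell
    # Walk r from the top of the interval down, so `tail` always covers q > r.
    for i in reversed(range(steps)):
        m = a + (i + 0.5) * h
        x_cell = pdf_x(m) * h
        # Cells with q above this one count fully; the cell on the
        # diagonal is half inside the region q > r, so it counts half.
        total += pdf_y(m) * h * (tail + 0.5 * x_cell)
        tail += x_cell
    return total


if __name__ == "__main__":
    # Placeholder distributions (assumed, not the ones from the post).
    x = lambda t: normal_pdf(t, 1.0, 1.0)
    y = lambda t: normal_pdf(t, 0.0, 1.0)
    print(prob_greater(x, y, -10.0, 10.0))
```

Running it with the same placeholder parameters as a simulation should give a number that agrees with the simulated frequency to a few decimal places.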
*Edit*
This formula scales to any number of variables, incidentally. Here is an example where you have 6 independent uniform variables ranging from 0 to 1, and you want to know the likelihood of them being chosen so that each one you choose is bigger than the last one.
So it turns out to be 1/6!, which makes sense: there are 6! possible orderings, all equally likely, so any one of them should have probability 1/6!. I wrote the integrand as 1^6 just so you could see it is six distributions (each with density 1) multiplied together.
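
One way to double-check the 1/6! without Maple is to do the nested integrals exactly with fractions. This sketch assumes the integrals are nested so that each variable runs from the previous one up to 1 (a guess at the setup, since the Maple input isn't shown), and the function names are made up for illustration.

```python
from fractions import Fraction


def integrate_from_x_to_1(coeffs):
    """For p(t) = sum(coeffs[k] * t**k), return the coefficients of the
    polynomial in x given by the integral of p(t) dt from t = x to t = 1."""
    antideriv = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    value_at_1 = sum(antideriv)       # antiderivative evaluated at t = 1
    result = [-c for c in antideriv]  # minus the antiderivative at t = x
    result[0] += value_at_1
    return result


def prob_increasing(n):
    """Probability that n independent uniform(0, 1) picks come out increasing."""
    poly = [Fraction(1)]  # the integrand 1^n is just the constant 1
    for _ in range(n - 1):
        poly = integrate_from_x_to_1(poly)  # one inner integral per variable
    # Outermost integral: the first variable runs over all of [0, 1].
    return sum(c / (k + 1) for k, c in enumerate(poly))


if __name__ == "__main__":
    print(prob_increasing(6))  # 1/720
```

Changing the argument to `prob_increasing` shows the same 1/n! pattern for other numbers of variables.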

*Update 2*
Another interesting thing to do is calculate the probability that 3 uniformly random numbers between 0 and 1 can be the side lengths of a triangle. You can set up the integral like this:
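Here c is forced to be between b and a+b (and no more than 1), and b is greater than a, so the integral covers just one ordering of the three numbers, a < b < c, with c as the longest side. There are 3 ways to choose which variable is the longest side and 2 ways to order the other two, so 6 orderings in all, each contributing 1/12. All in all there is a 50% chance that the three numbers will be able to form a triangle.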
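
The 50% figure is easy to sanity-check by simulation. This is a minimal Python sketch, with `triangle_probability` being a name made up for illustration: it picks three uniform numbers and tests whether the longest is shorter than the sum of the other two.

```python
import random


def triangle_probability(trials=1_000_000, seed=0):
    """Estimate the chance that three uniform(0, 1) numbers can form a triangle."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a, b, c = rng.random(), rng.random(), rng.random()
        longest = max(a, b, c)
        # Triangle inequality: the longest side must be shorter than
        # the sum of the other two sides.
        if longest < (a + b + c) - longest:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(triangle_probability())
```

The estimate should land close to 0.5, with the usual run-to-run noise in the later decimal places.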