EXERCISE IN PROBABILITY NO. I




Many people who study mathematics are exposed to the idea of a bell curve. The formula of the standard normal distribution curve (mean 0, standard deviation 1) is:

f(x) = (1/sqrt(2*pi)) * exp(-(x^2)/2)

The notation used is fairly readable to a computer programmer. It is the same syntax used by a popular symbolic calculator.

If you have a set of data values, you can compute their mean and shift the bell curve so it peaks at that point. You can also compute the standard deviation and scale the width of the curve accordingly. The question I take on in this paper is how to approach a bell curve when you are working with finite samples and have no data set to observe.
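For comparison, here is the usual data-driven approach as a small Python sketch: center the curve on the sample mean and scale it by the standard deviation. The function name `fitted_normal_pdf` and the choice of the population standard deviation (`statistics.pstdev`) are assumptions for illustration; the sample standard deviation (`statistics.stdev`) is an equally common choice.

```python
import math
import statistics

def fitted_normal_pdf(data: list[float], x: float) -> float:
    """Bell curve centered on the mean of `data` and spread by its standard deviation."""
    mu = statistics.mean(data)       # shift: the peak sits at the mean
    sigma = statistics.pstdev(data)  # scale: wider data gives a wider curve (assumed pstdev)
    z = (x - mu) / sigma             # convert x to standard units
    # Dividing by sigma keeps the total area under the curve equal to 1.
    return math.exp(-(z ** 2) / 2) / (sigma * math.sqrt(2 * math.pi))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # mean 5, population standard deviation 2
print(fitted_normal_pdf(data, 5))    # peak height, about 0.1995
```

This is exactly what the dice problem lacks: there is no data set to feed into `mean` and `pstdev`.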

If you are given 3 dice and you roll them 6^3=216 times so that each permutation occurs exactly once, what will the distribution curve look like? This is a much more complex problem than pulling black and white balls out of a box and noting their distribution. That example is commonly given to show that a bell curve develops, but you only get a true curve with an infinite number of samples and an infinite sample size.
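The brute-force count used as the reference throughout this paper can be sketched in a few lines of Python. This is an illustrative sketch, not the author's program; `brute_force_counts` is a hypothetical name. It enumerates all 6^3 = 216 ordered rolls (the "permutations") and tallies their sums.

```python
from collections import Counter
from itertools import product

def brute_force_counts(faces: range, dice: int) -> Counter:
    """Count how many ordered rolls (permutations) produce each sum."""
    return Counter(sum(roll) for roll in product(faces, repeat=dice))

counts = brute_force_counts(range(1, 7), 3)   # three ordinary dice, faces 1..6
print(sum(counts.values()))                   # 216 rolls in total
print([counts[s] for s in range(3, 19)])
# [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]
```

This approach visits every roll, so its cost grows as 6^n with the number of dice n, which is why a formula is worth deriving.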

In the case of 3 dice, I am asking what the true distribution curve looks like. This information can be critical when small sample sizes are considered. Although I can easily calculate where the peak occurs, I do not know the number of permutations that correspond to that peak. I also have no idea how the curve spreads out, so I cannot fit it to a bell curve even if I wanted to.

I am going to derive a formula for the above problem with an experimental approach. I will use a computer to verify that the answers are correct for small samples. As long as the formula seems to work, I will use it. If I ever get a wrong result, I will investigate the formula further. Often the error points to where the problem is, much like debugging a program.

Initially, I found a rather complex expression for the above problem with 3 dice, but trying larger numbers of dice quickly choked the computer. I then tried a radically different approach: to know how many permutations of 3 dice add up to the value s, I need only count the integer points (x, y, z) on the plane x+y+z=s.

[Figure: 3D plot]


In order to simplify the calculations, assume the dice have the values 0 to 5; the final formula can be adjusted to correct for this. Another useful simplification is to project the plane x+y+z=s onto the xy plane. None of the points are lost, even though the original plane is tilted (at about 54.7 degrees) relative to the xy plane: once x and y are chosen, z=s-x-y is fixed, so every point on the plane lands on a different point of the xy plane. This only works because we are counting discrete points, not calculating areas of planes.
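The projection trick translates directly into code: loop over the (x, y) points of the xy plane, compute the one z that puts the point on x+y+z=s, and keep it only if z is a legal die value. A minimal Python sketch, assuming dice valued 0 to m (m=5 here); `plane_points` is a hypothetical name.

```python
def plane_points(s: int, m: int = 5) -> int:
    """Count integer points on the plane x+y+z=s inside the cube 0..m (dice valued 0..m)."""
    count = 0
    for x in range(m + 1):
        for y in range(m + 1):
            z = s - x - y        # projection: each (x, y) corresponds to exactly one z
            if 0 <= z <= m:      # keep the point only if it lies inside the cube
                count += 1
    return count

print([plane_points(s) for s in range(0, 8)])  # [1, 3, 6, 10, 15, 21, 25, 27]
```

Only (m+1)^2 candidate points are examined per sum instead of (m+1)^3 rolls, because the third coordinate is never searched.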

[Figure: projection of the plane x+y+z=s onto the xy plane]


Now let's look at the projection of the plane x+y+z=5 onto the xy plane. The rows contain 6 points, then 5, then 4, ..., down to 1. The total number of points in the triangle is (6*7)/2=21. (You may recall that the sum of the integers from 1 to n is n*(n+1)/2.) We can conclude that there are 21 permutations that add up to 5. We check this with a computer program that considers every permutation by brute force, and it also yields 21.

As long as we consider sums from s=0 to s=5, we conclude the number of permutations p is:

p=(s+1)*(s+2)/2 where s=0 to 5

It is a good idea to use a computer to verify every case above. We have not yet reached the peak of the distribution curve, though, and a computer shows that p(6)=25. This contradicts the above formula, which yields 28. Now things are more interesting.
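A quick check of the triangle formula against brute force shows exactly where it breaks. A Python sketch under the same 0-to-5 convention; `triangle_formula` and `brute` are hypothetical helper names.

```python
from itertools import product

def triangle_formula(s: int) -> int:
    """Points in the full projected triangle: (s+1)*(s+2)/2."""
    return (s + 1) * (s + 2) // 2

def brute(s: int) -> int:
    """Count ordered rolls of three 0..5 dice whose sum is s."""
    return sum(1 for roll in product(range(6), repeat=3) if sum(roll) == s)

for s in range(0, 8):
    print(s, triangle_formula(s), brute(s))
```

The two columns agree for s=0 to 5 and first disagree at s=6 (28 versus 25), matching the observation above.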

What happens here is that the plane x+y+z=s is no longer entirely inside the cube of points bounded by the planes x=0, x=5, y=0, y=5, z=0, and z=5. This cube represents all the legal permutations. We can still project the plane x+y+z=s onto the xy plane, but we must trim the corners that reach beyond the cube of legal values.

To shorten matters, we only consider the values s=6 and s=7. This is in fact as far as we have to go, since p(7)=p(8) is the peak value. The curve is symmetric about the peak: replacing every die value x by 5-x turns a roll with sum s into a roll with sum 15-s, and this pairing is one-to-one, so p(s)=p(15-s). We can therefore use a reflection to obtain p(8) through p(3*5)=p(15).

By the way, how do we know p(7)=p(8)=peak? The average value of one die is (0+5)/2=2.5, so 3 dice add up to 3*2.5=7.5 on average. The peak is always centered on this number. Since it is halfway between 7 and 8, both 7 and 8 are peak values.

The projected plane for x+y+z=7 and its trimmed corners are shown below:

[Figure: projection of the plane x+y+z=7 onto the xy plane, with its trimmed corners]


The number of points in a single corner is 1 for s=6 and 1+2=3 for s=7. We are again adding successive integers, only now they start at s=6: a corner holds (s-5)*(1+s-5)/2 points, and three times that amount (one for each corner) is subtracted from the total number of points.

This yields the formula:

p=(s+1)*(s+2)/2 : s=0 to 5
p=((s+1)*(s+2)/2)-3*((s-5)*(1+s-5)/2) : s=6 to 7
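As a worked check, substituting s=6 and s=7 into the formula for s=6 to 7 reproduces p(6)=25 found by brute force and gives the peak value p(7)=27:

```latex
\begin{aligned}
p(6) &= \frac{7 \cdot 8}{2} - 3 \cdot \frac{1 \cdot 2}{2} = 28 - 3 = 25 \\
p(7) &= \frac{8 \cdot 9}{2} - 3 \cdot \frac{2 \cdot 3}{2} = 36 - 9 = 27
\end{aligned}
```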

In order to create a reflection about the peak:

p(s)=p(15-s) : s=8 to 15
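Putting the pieces together, the piecewise formula and its reflection give the whole curve for dice valued 0 to 5. A minimal Python sketch; `p0` is a hypothetical name, and the check that all 16 values add up to 6^3 = 216 is a sanity test added here, not a step from the text.

```python
def p0(s: int) -> int:
    """Permutations of three dice valued 0..5 that sum to s (s = 0..15)."""
    if not 0 <= s <= 15:
        return 0
    if s <= 5:
        return (s + 1) * (s + 2) // 2
    if s <= 7:
        # Full triangle minus three trimmed corners of (s-5)*(1+s-5)/2 points each.
        return (s + 1) * (s + 2) // 2 - 3 * ((s - 5) * (1 + s - 5) // 2)
    return p0(15 - s)  # reflection about the peak for s = 8..15

curve = [p0(s) for s in range(16)]
print(curve)       # [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]
print(sum(curve))  # 216
```

Integer division is exact here because (s+1)*(s+2) and (s-5)*(s-4) are products of consecutive integers, hence always even.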

Finally, we will correct for the fact that a die is in the range 1 to 6. If we increase the value on every die face by one, the sum will always be 3 more. Below the peak, for any given sum, the number of permutations is the same as for a sum 3 less with the 0-to-5 dice. We substitute s-3 for s:

p=(s-2)*(s-1)/2 : s=3 to 8
p=((s-2)*(s-1)/2)-3*((s-8)*(1+s-8)/2) : s=9 to 10
p(s)=p(21-s) : s=11 to 18

(We use 21 instead of 15 in the reflection because the center of symmetry moves from 7.5 to 10.5, and 2*10.5=21.)
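The shifted formula for ordinary dice can be checked the same way every earlier step was checked, by comparing it with a brute-force count over all 216 rolls. A Python sketch; `p` follows the text's notation, while `brute_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import product

def p(s: int) -> int:
    """Permutations of three ordinary dice (faces 1..6) that sum to s (s = 3..18)."""
    if not 3 <= s <= 18:
        return 0
    if s <= 8:
        return (s - 2) * (s - 1) // 2
    if s <= 10:
        # Triangle minus three trimmed corners, shifted by 3 from the 0..5 version.
        return (s - 2) * (s - 1) // 2 - 3 * ((s - 8) * (1 + s - 8) // 2)
    return p(21 - s)  # reflection about the peak at 10.5

brute_counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
print(all(p(s) == brute_counts[s] for s in range(3, 19)))  # True
```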

We conclude this paper with a graph of the distribution curve.

[Figure: distribution curve for the sum of 3 dice]


Another paper is planned for this topic. It is of interest to extend the above formula for p(s) to n dice, each in the range 0 to m. This can become a real challenge because we must move into hyperspace. There are cutting planes that add and subtract points in a binomial expansion fashion. You cannot really visualize what is happening, but you can see patterns in dimensions 2 and 3 and try to extend the formulas to higher dimensions. What suggests that they are correct is their agreement with brute-force computer calculations.
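For readers who want to experiment before then, the n-dice, 0-to-m case has a well-known closed form from inclusion-exclusion: count the non-negative solutions of x1+...+xn=s (stars and bars), then alternately subtract and add the points where some dice exceed m, weighted by binomial coefficients. This standard textbook formula is swapped in here for illustration; it is not a result derived in this paper. The sketch checks it against brute force in the same spirit as the text; `count_sums` and `brute` are hypothetical names.

```python
from itertools import product
from math import comb

def count_sums(s: int, n: int, m: int) -> int:
    """Permutations of n dice valued 0..m that sum to s, by inclusion-exclusion."""
    total = 0
    for k in range(n + 1):
        rest = s - k * (m + 1)  # sum left after forcing k dice past the face value m
        if rest < 0:
            break
        # Choose which k dice overflow, then spread the rest freely (stars and bars).
        total += (-1) ** k * comb(n, k) * comb(rest + n - 1, n - 1)
    return total

def brute(s: int, n: int, m: int) -> int:
    """Direct count over all (m+1)**n ordered rolls."""
    return sum(1 for roll in product(range(m + 1), repeat=n) if sum(roll) == s)

# Compare the closed form with brute force for a few small cases.
for n, m in [(3, 5), (4, 5), (5, 3)]:
    assert all(count_sums(s, n, m) == brute(s, n, m) for s in range(n * m + 1))
print(count_sums(7, 3, 5), count_sums(10, 4, 5))  # 27 146
```

For n=3 and m=5 the k=1 term is exactly the "three trimmed corners" correction used above, 3*(s-5)*(s-4)/2.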


