EXERCISE IN PROBABILITY NO. I
Many people who study mathematics are exposed to the idea of a bell curve.
The general formula of the normalized (standard) bell curve is:
f(x) = (1/sqrt(2*pi)) * exp(-(x^2)/2)
The notation used is fairly readable to a computer programmer. It is the
same syntax used by a popular symbolic calculator.
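As a quick check of the notation, the formula translates almost directly into
Python. This is only a minimal sketch; the function name standard_normal_pdf
is an illustrative choice and does not come from the calculator mentioned
above.

```python
import math


def standard_normal_pdf(x: float) -> float:
    """Evaluate f(x) = (1/sqrt(2*pi)) * exp(-(x^2)/2)."""
    return (1.0 / math.sqrt(2.0 * math.pi)) * math.exp(-(x ** 2) / 2.0)


# The curve is highest at x = 0 and symmetric about that point.
print(standard_normal_pdf(0.0))
print(standard_normal_pdf(1.0), standard_normal_pdf(-1.0))
```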
If you have a set of data values, you can compute the mean and shift the
bell curve so that it peaks at that point. You can also compute the standard
deviation and spread out the curve based on that information. The question I
take on in this paper is how to approach a bell curve when you are working
with finite samples and you don't have a data set to observe.
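The shift and scale described above can be sketched as follows. The sketch
assumes the curve is centered on the mean of the data (for a symmetric bell
curve this coincides with the median) and uses the population standard
deviation; the data values and the name fitted_bell_curve are made up for
illustration only.

```python
import math
import statistics


def fitted_bell_curve(data):
    """Return (mu, sigma, f), where f is the bell curve shifted to mu
    and spread out by sigma."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation (assumption)

    def f(x):
        z = (x - mu) / sigma
        return math.exp(-(z ** 2) / 2.0) / (sigma * math.sqrt(2.0 * math.pi))

    return mu, sigma, f


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data set
mu, sigma, f = fitted_bell_curve(sample)
print(mu, sigma, f(mu))
```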
If you are given 3 dice and you roll them 6^3 times so that each permutation
occurs exactly once, what will the distribution curve of the sums look like?
This is a much more complex problem than pulling black and white balls out of
a box and noting their distribution. That example is commonly given to show
that a bell curve develops, but you only get a true curve with an infinite
number of samples and an infinite sample size.
In the case of 3 dice, I am asking what the true distribution curve looks
like. This information can be critical when small sample sizes are
considered. I also have the problem that, although I can easily calculate
where the peak occurs, I do not know the number of permutations that
correspond to that peak. I also have no idea how the curve spreads out, so
I cannot fit it to a bell curve even if I wanted to.
I am going to derive a formula for the above problem with an experimental
approach. I will use a computer to verify that the answers are correct for
small samples. As long as the formula seems to work, I will use it. If I ever
get a wrong result, I will investigate the formula further. Often the error
points to where the problem is. This is much like debugging a program.
Initially, I found a rather complex expression for the above problem with 3
dice. Trying larger numbers of dice quickly choked the computer. I then
tried a radically different approach: if I want to know how many
permutations of 3 dice add up to the value s, I need only count the points
with integer coordinates on the plane x+y+z=s, where x, y, and z are legal
die values.
In order to simplify the calculations, assume the dice have the values 0 to
5. The final formula can be adjusted to correct for this. Another useful
simplification is to project the plane x+y+z=s onto the xy plane. None of
the points are lost even though the original plane is tilted (by about 54.7
degrees, not 45) relative to the xy plane: each point on the plane has
z=s-x-y, so no two points share the same (x,y). This only works because we
are counting finite points - not calculating areas of planes.
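The claim that no points are lost in the projection can be checked directly.
The following is a minimal sketch, assuming dice valued 0 to 5; the helper
names plane_points and project are illustrative.

```python
from itertools import product


def plane_points(s, top=5):
    """All (x, y, z) with each coordinate in 0..top and x + y + z == s."""
    return [t for t in product(range(top + 1), repeat=3) if sum(t) == s]


def project(points):
    """Drop the z coordinate and keep the distinct (x, y) pairs."""
    return {(x, y) for x, y, _ in points}


# If two points on the plane landed on the same (x, y), the projected
# set would be smaller than the original list.
for s in range(16):
    pts = plane_points(s)
    assert len(project(pts)) == len(pts)
print("projection keeps every point for s = 0..15")
```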
Now let's look at the projection of the plane x+y+z=5 onto the xy plane. We
have a row of 6 points, then 5, then 4, ..., down to 1. The total number of
points in the triangle is (6*7)/2=21. (You may recall the sum of the integers
from 1 to n = n*(n+1)/2.) We can conclude that there are 21 permutations that
add up to 5. We check this with a computer that considers every permutation
using brute force and it yields 21 also.
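A brute-force check of this kind might look like the sketch below. It assumes
dice valued 0 to 5, and the function names are illustrative rather than taken
from the program actually used.

```python
from itertools import product


def brute_force_count(s, top=5, dice=3):
    """Count ordered rolls of `dice` dice, valued 0..top, that sum to s."""
    return sum(1 for roll in product(range(top + 1), repeat=dice)
               if sum(roll) == s)


def row_lengths(s, top=5):
    """Points per row of the projected triangle, one row per value of y."""
    return [sum(1 for x in range(top + 1) if 0 <= s - x - y <= top)
            for y in range(top + 1)]


rows = row_lengths(5)
print(rows)                  # [6, 5, 4, 3, 2, 1]
print(sum(rows))             # 21
print(brute_force_count(5))  # 21
```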
As long as we consider sums from s=0 to s=5, we conclude the number of
permutations p is:
p=(s+1)*(s+2)/2 where s=0 to 5
It is a good idea to use a computer to verify every case above. We have not
reached the peak of the distribution curve, though, and if you use a computer
you will find p(6)=25. This contradicts the above formula, which yields a
value of 28. Now things are more interesting.
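The comparison can be sketched as follows, again assuming dice valued 0 to 5.
The name triangle_formula is illustrative; it simply evaluates
(s+1)*(s+2)/2.

```python
from itertools import product


def triangle_formula(s):
    """p = (s+1)*(s+2)/2, valid only while the triangle fits in the cube."""
    return (s + 1) * (s + 2) // 2


def brute_force_count(s, top=5):
    """Count ordered rolls of 3 dice, valued 0..top, that sum to s."""
    return sum(1 for roll in product(range(top + 1), repeat=3)
               if sum(roll) == s)


# The two columns agree for s = 0..5 and part ways at s = 6 (28 vs 25).
for s in range(7):
    print(s, triangle_formula(s), brute_force_count(s))
```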
What happens here is that the x+y+z=s plane is not entirely inside the cube
of points bounded by the planes x=0, x=5, y=0, y=5, z=0, and z=5. This cube
represents all the legal permutations. We can still project our x+y+z=s
plane onto the xy plane, but we must trim the corners that reach beyond the
cube of legal values.
To shorten matters, we only consider the values s=6 and s=7. This is in
fact as far as we have to go since p(7)=p(8)=peak value. We know, or at
least should believe, that the curve is symmetric about the peak, so we can
use a reflection to obtain the values from p(8) up to p(3*5)=p(15).
By the way, how do we know p(7)=p(8)=peak? The average value of one die is
(0+5)/2=2.5 and 3 dice add up to 3*2.5=7.5. The peak is always centered on
this number. Since it is halfway between 7 and 8, both 7 and 8 are peak
values.
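Both the location of the peak and the symmetry can be confirmed by counting
every roll. This is a small sketch assuming dice valued 0 to 5; the variable
names are illustrative.

```python
from collections import Counter
from itertools import product

# Tally the sum of every ordered roll of 3 dice valued 0..5.
counts = Counter(sum(roll) for roll in product(range(6), repeat=3))

total_rolls = sum(counts.values())
mean = sum(s * c for s, c in counts.items()) / total_rolls
peak = max(counts.values())
peak_sums = sorted(s for s, c in counts.items() if c == peak)
symmetric = all(counts[s] == counts[15 - s] for s in range(16))

print(mean)        # 7.5
print(peak_sums)   # [7, 8]
print(peak)        # 27
print(symmetric)   # True
```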
The projected plane for x+y+z=7 and its trimmed corners are shown below:
A single trimmed corner contains 1 point for s=6 and 1+2=3 points for s=7.
It seems clear that we are adding successive integers, only they start at
s=6, and that three times their sum is subtracted from the total number of
points in the triangle.
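The corner counts can be checked with a short sketch. It assumes dice valued
0 to 5 and looks at the corner where x exceeds 5; by symmetry the y and z
corners hold the same number of points. The name corner_points is
illustrative.

```python
def corner_points(s, top=5):
    """Points of the untrimmed triangle x+y+z=s (x, y, z >= 0) with x > top."""
    return [(x, y, s - x - y)
            for x in range(top + 1, s + 1)
            for y in range(0, s - x + 1)]


for s in (6, 7):
    corner = len(corner_points(s))
    triangle = (s + 1) * (s + 2) // 2
    # Full triangle minus the three trimmed corners.
    print(s, corner, triangle - 3 * corner)
```

For s=6 and s=7 each corner holds 1 and 3 points, and removing three corners
from the full triangle leaves 25 and 27 points, matching the brute-force
counts.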
This yields the formula:
p=(s+1)*(s+2)/2 : s=0 to 5
p=((s+1)*(s+2)/2)-3*((s-5)*(1+s-5)/2) : s=6 to 7
In order to create a reflection about the peak:
p(s)=p(15-s) : s=8 to 15
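The three rules can be combined into one function and compared against brute
force for every sum. This is a minimal sketch for dice valued 0 to 5; the
name p_zero_based is illustrative.

```python
from itertools import product


def p_zero_based(s):
    """Permutations of 3 dice, valued 0..5, that add up to s."""
    if not 0 <= s <= 15:
        raise ValueError("s must be between 0 and 15")
    if s >= 8:
        return p_zero_based(15 - s)          # reflection about the peak
    triangle = (s + 1) * (s + 2) // 2
    if s <= 5:
        return triangle
    return triangle - 3 * ((s - 5) * (1 + s - 5) // 2)  # trim three corners


def brute_force_count(s):
    return sum(1 for roll in product(range(6), repeat=3) if sum(roll) == s)


values = [p_zero_based(s) for s in range(16)]
print(values)
print(sum(values))  # 216, i.e. 6^3
```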
Finally, we will correct for the fact that a die is in the range 1 to 6. If
we increase the value on every die face by one, the sum will always be 3
more. Below the peak, for any given sum, the permutations will be as if the
sum was 3 less than it is. We substitute s-3 for s:
p=(s-2)*(s-1)/2 : s=3 to 8
p=((s-2)*(s-1)/2)-3*((s-8)*(1+s-8)/2) : s=9 to 10
p(s)=p(21-s) : s=11 to 18 (We substitute 21 for 15 here because the sums
now run from 3 to 18, so s and 21-s are mirror images about the peak.)
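The adjusted formula for ordinary dice can be checked the same way. This is a
minimal sketch; the function name p is taken from the formula above, while
brute_force_count is an illustrative helper.

```python
from itertools import product


def p(s):
    """Permutations of 3 ordinary dice (faces 1..6) that add up to s."""
    if not 3 <= s <= 18:
        raise ValueError("s must be between 3 and 18")
    if s >= 11:
        return p(21 - s)                      # reflection about the peak
    triangle = (s - 2) * (s - 1) // 2
    if s <= 8:
        return triangle
    return triangle - 3 * ((s - 8) * (1 + s - 8) // 2)  # trim three corners


def brute_force_count(s):
    return sum(1 for roll in product(range(1, 7), repeat=3) if sum(roll) == s)


for s in range(3, 19):
    assert p(s) == brute_force_count(s)
print([p(s) for s in range(3, 19)])
```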
We conclude this paper with a graph of the distribution curve.
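A rough text rendering of the curve can be produced with a sketch like the
one below. It counts every roll of 3 ordinary dice by brute force and prints
one row of asterisks per sum; the layout is an illustrative choice.

```python
from collections import Counter
from itertools import product

# Tally the sum of every ordered roll of 3 dice with faces 1..6.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

# One line per sum: the sum, the number of permutations, and a bar.
lines = [f"{s:2d} {counts[s]:2d} {'*' * counts[s]}" for s in range(3, 19)]
print("\n".join(lines))
```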
Another paper is planned for this topic. It is of interest to extend the above
formula for p(s) to n dice with each die in the range 0 to m. This can
become a real challenge because we must move into hyperspace. There are
cutting planes that add and subtract points in a binomial expansion fashion.
You cannot really visualize what is happening, but you can see patterns at
dimensions 2 and 3 and try to extend the formulas to higher dimensions.
What suggests that the extended formulas are correct is their agreement with
brute force computer calculations.
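One way to explore the general case is the standard inclusion-exclusion count
for bounded sums, checked against brute force for small n and m. This is a
sketch of that textbook technique and not necessarily the formula the planned
paper will arrive at; the names count_sums and brute_force_count are
illustrative.

```python
from itertools import product
from math import comb


def count_sums(s, n, m):
    """Ordered rolls of n dice, each valued 0..m, that add up to s.

    Inclusion-exclusion: start with all non-negative solutions, then
    alternately subtract and add those where k dice exceed m.
    """
    if s < 0 or s > n * m:
        return 0
    total = 0
    for k in range(s // (m + 1) + 1):
        total += (-1) ** k * comb(n, k) * comb(s - k * (m + 1) + n - 1, n - 1)
    return total


def brute_force_count(s, n, m):
    return sum(1 for roll in product(range(m + 1), repeat=n) if sum(roll) == s)


# Compare the two counts for a few small cases.
for n, m in [(2, 3), (3, 5), (4, 5), (5, 2)]:
    for s in range(n * m + 1):
        assert count_sums(s, n, m) == brute_force_count(s, n, m)
print(count_sums(6, 3, 5))  # 25
```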