Lovegrove Mathematicals


"Dedicated to making Likelinesses the entity of prime interest"

The Multinomial Theorem

Prologue

Let P be a set of N strictly positive reals, say P={x1, ..., xN}.

If n is a non-negative integer then usually the mean of the n'th powers of x1, ..., xN is not equal to the n'th power of their mean. Sometimes, however, it is. Under what circumstances do we have equality?

It is not difficult to identify 3 cases:-

  1. N = 1: we have x1^n/1 and [x1/1]^n, respectively
  2. n = 0: we have N/N and (something)^0, respectively
  3. n = 1: we have [x1 + ... + xN]/N in both cases

We can write these as:-

  1. P is a singleton set: case 1
  2. n < 2: cases 2 & 3

Of these, the first is the important case. The second is trivial: we usually wouldn't even think of invoking the Multinomial Theorem to raise something to the power 0 or 1.
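
A minimal Python sketch of the comparison (illustrative only; the function names are my own):

from statistics import mean

def mean_of_powers(xs, n):
    # Mean of the n'th powers of the elements of xs.
    return mean(x ** n for x in xs)

def power_of_mean(xs, n):
    # n'th power of the mean of the elements of xs.
    return mean(xs) ** n

# Case 1: singleton set -- always equal.
print(mean_of_powers([3.0], 5), power_of_mean([3.0], 5))                      # 243.0  243.0

# Cases 2 & 3: n = 0 or n = 1 -- always equal.
print(mean_of_powers([1.0, 2.0, 4.0], 0), power_of_mean([1.0, 2.0, 4.0], 0))  # 1.0  1.0
print(mean_of_powers([1.0, 2.0, 4.0], 1), power_of_mean([1.0, 2.0, 4.0], 1))  # 2.333...  2.333...

# Otherwise the two generally differ, e.g. n = 2:
print(mean_of_powers([1.0, 2.0, 4.0], 2), power_of_mean([1.0, 2.0, 4.0], 2))  # 7.0  vs  5.444...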

The relevance of this is that we are talking about the interaction between means and raising to powers. The importance of this is that we are inter alia talking about:-

  1. The interaction between likelinesses and the Multinomial Theorem: likelinesses are mean probabilities, and the Multinomial Theorem is all about raising to powers.
  2. The Perfect Cube Factory: the argument behind the Perfect Cube Factory is founded on the squaring (and cubing) of mean distributions and then comparing with the mean of squares (see the sketch below).
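
As a deliberately simplified, scalar illustration of that comparison (a sketch only, assuming the common formulation in which the side length is uniform on [0,1]; the formulation used elsewhere on this site may differ), the following Python compares the square and cube of a mean with the mean of squares and cubes:

import random

random.seed(0)
sides = [random.uniform(0.0, 1.0) for _ in range(100_000)]

mean_side = sum(sides) / len(sides)                  # about 1/2
mean_area = sum(s ** 2 for s in sides) / len(sides)  # about 1/3
mean_vol  = sum(s ** 3 for s in sides) / len(sides)  # about 1/4

print(mean_side ** 2, mean_area)   # square of the mean (~0.25) vs mean of the squares (~0.33)
print(mean_side ** 3, mean_vol)    # cube of the mean (~0.125) vs mean of the cubes (~0.25)
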
Multinomial Theorem

Commentary on the Multinomial Theorem

The proof of the Theorem can be found in Fundamentals of Likelinesses.

[Multinomial Statement formula]

This statement is what I call the 'Multinomial Statement'.

The LHS of the Multinomial Statement is essentially about the mean of powers, while the RHS is about powers of means. As would be expected from the Prologue, the Multinomial Theorem says that the Multinomial Statement holds if the underlying set is a singleton or if the required integram is 0 or "i" (for some i).

Multinomial Consistency

'Great Likelinesses' calculates the Multinomial Consistency. This is the ratio between the two sides of the Multinomial Statement, and is given by the following formula:

[Multinomial Consistency formula]

This can be interpreted as the ratio between what we are trying to find (the best estimate of a probability) and what we get if we try to find it by substituting the best estimates of the probabilities of "1", ..., "N" into the Multinomial Statement without regard to the conditions in the Multinomial Theorem.
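
Read in that way, and assuming (this is my assumption, not something stated on this page) that the Multinomial Statement takes the familiar multinomial form, the consistency could be sketched in LaTeX as the ratio below. The notation L(g | h, P) for a likeliness is mine, and ω(g), which appears further down this page, I take to be the sum of the entries of g.

% Sketch only: the denominator assumes the standard multinomial form.
\[
  C_M(g,h,P) \;=\;
  \frac{L(g \mid h, P)}
       {\dfrac{\omega(g)!}{g_1!\,\cdots\,g_N!}\;\prod_{i=1}^{N} L(\text{"}i\text{"} \mid h, P)^{g_i}},
  \qquad \omega(g) = g_1 + \cdots + g_N.
\]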

Example

[Figure: Multinomial Consistency graph]

This figure shows CM(g,αh,P) for g=(1,2,4), h=(1,0,0), P=R(3).

The introduction of α is a simple way of varying the sample size of the given histogram whilst keeping the relative frequencies the same: for example, α=5 turns h=(1,0,0) into αh=(5,0,0), i.e. five observations of "1" rather than one.

This graph shows that if we were to best-estimate the probabilities of "1", "2" and "3", and then treat those estimates as if they were the probabilities themselves (rather than just estimates) by substituting them into the Multinomial Statement, then our estimate of the probability of (1,2,4) would be in error by a factor of about 5 with the given histogram h=(1,0,0). With h=(5,0,0) the error would be more than a factor of 15.

At heart, all that the Theorem is saying (or, rather, all that its converse would say) is that, apart from the few simple exceptions stated in the Theorem, the mean of n'th powers is not equal to the n'th power of the mean.

It is at this point that the Multinomial Theorem links with the Perfect Cube Factory.

The existence of examples with ω(g)>1 for which the Multinomial Consistency is not 1 means that the Multinomial Statement cannot justifiably be used in any case involving a non-singleton underlying set together with ω(g)>1, unless it is proved that this may be done in that particular instance.

There is nothing special about the Multinomial Statement here. The same difficulty arises with any non-linear (strictly, non-affine) formula.
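
A minimal Python sketch of this last point (the example functions are my own): the mean commutes with an affine map, but generally not with a non-affine one.

from statistics import mean

xs = [1.0, 3.0, 5.0]

def affine(x):
    # Affine map: the mean commutes with it.
    return 2.0 * x + 1.0

def square(x):
    # Non-affine map: the mean generally does not commute with it.
    return x ** 2

print(mean(affine(x) for x in xs), affine(mean(xs)))   # 7.0  7.0
print(mean(square(x) for x in xs), square(mean(xs)))   # 11.666...  vs  9.0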