The outcomes of a chance event such as tossing a coin, rolling a die, or
drawing a card from a deck can be described by two related methodologies,
probability and statistics. Roughly, they differ in that
probability concerns statements made before events take place, while
statistics involves analysis after the fact. Qualitatively, probability
rests more on proof, while statistics tends to involve judgment.
There are three approaches to probability: analysis of the ways that
a chance event can occur; specification of one's personal belief that
the event will take place; and the relative frequency with which it
happens, that is, the proportion of favorable cases in a fixed number of trials.
In some situations reasoning alone can establish how likely a result is.
When two possibilities are equally likely, as in
the case of tossing a coin, this leads to geometric probabilities
of one half each for the head and tail results. The same is
true for the six faces of a die or for one card chosen from fifty-two. Without
any other indication, each of the six die faces has a one-sixth
probability, and each card in the deck a one-fifty-second
probability of being selected. One of the
basic tools of probability by geometric means is the sample
space: essentially a diagram that shows all the possible outcomes
and thus lists the alternatives. Without any other factors one
would conclude that the sample space outcomes are equally likely. That
leads to dividing the number one, which stands for all the probability, by the
number of distinct sample space points. The following are exercises in
geometric probability.
This concerns simultaneously tossing a fair coin while rolling a fair die
whose six faces are numbered one through six.
Draw the sample space.
How big is the sample space, in terms of points corresponding to
die-roll/coin-toss outcomes?
What is the probability of a three(die)-head(coin) outcome?
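As a check on the exercise, the sample space can be enumerated directly. This is a minimal sketch, assuming a fair die and coin; the variable names are mine, not part of the text:

```python
from itertools import product
from fractions import Fraction

coins = ["head", "tail"]
die_faces = [1, 2, 3, 4, 5, 6]

# Enumerate every die-roll/coin-toss pair: this is the sample space.
sample_space = list(product(die_faces, coins))
size = len(sample_space)  # 6 * 2 = 12 points

# With equally likely points, each outcome has probability 1/size.
favorable = sum(1 for d, c in sample_space if d == 3 and c == "head")
p_three_head = Fraction(favorable, size)
print(size, p_three_head)  # 12 1/12
```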
Four births occur. Of the following two events, which is more probable?
Half male, half female.
Exactly three of one sex.
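This exercise, too, yields to enumeration of the sample space. A minimal sketch, assuming each of the sixteen birth sequences is equally likely:

```python
from itertools import product
from fractions import Fraction

# All length-4 birth sequences (M = male, F = female), each equally likely.
births = list(product("MF", repeat=4))
n = len(births)  # 2**4 = 16

half_half = sum(1 for b in births if b.count("M") == 2)       # 6 sequences
three_one = sum(1 for b in births if b.count("M") in (1, 3))  # 8 sequences

p_half = Fraction(half_half, n)    # 6/16 = 3/8
p_three = Fraction(three_one, n)   # 8/16 = 1/2
```

So, perhaps surprisingly, exactly three of one sex is the more probable event.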
The contrast with analysis is provided by acting or betting.
People can believe strongly in a future occurrence such as
rain, a price change, or another random event. When the event is not
certain, the odds or probability that it takes place represent the belief of an
individual, or of many individuals. Relative frequency, by contrast, measures
what actually happens in many repetitions of the same random situation.
Favorable results divided by total trials, a statistical measure, is an
empirical probability.
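An empirical probability can be illustrated by simulation. A minimal sketch, assuming a fair coin modeled by a pseudo-random number; the trial count and seed are arbitrary choices of mine:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
trials = 100_000
heads = sum(1 for _ in range(trials) if random.random() < 0.5)

# Relative frequency: favorable results divided by total trials.
empirical_p = heads / trials  # close, but not exactly equal, to 0.5
```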
If two random events take place that have nothing to do with one another,
there are three separate probabilities to consider: that of the first
alone, that of the second alone, and that of both taking place. When the events
truly have no relationship, the last probability is the product of the first
two, and the events are called independent. But it often happens that random
events have some relationship. If so, we use a notation
based on a division-like symbol to record that in a formal way.
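The product rule for independent events can be checked by enumeration on the coin-and-die experiment from the exercises. A minimal sketch, with events and names of my choosing:

```python
from itertools import product
from fractions import Fraction

# Joint experiment: roll a fair die and toss a fair coin (H = head, T = tail).
space = list(product(range(1, 7), "HT"))  # 12 equally likely points
P = lambda pred: Fraction(sum(1 for w in space if pred(w)), len(space))

A = lambda w: w[0] == 3    # die shows 3 (depends only on the die)
B = lambda w: w[1] == "H"  # coin shows head (depends only on the coin)

# Independent events: the joint probability is the product of the parts.
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)  # 1/12 == 1/6 * 1/2
```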
Two random events called A and B are dependent when the product of
the individual probabilities, P(A) times P(B), does not equal P(A, B),
the probability of both occurring simultaneously. When we know that one
of the events has occurred, say B, we can find how that affects the
probability of A taking place. We write this as P(A|B) (read
"probability of A given that B occurred" or "the conditional probability
of A given B"). The basis of decision-making is the following
definition of conditional probability:
P(A|B) = P(A, B)/P(B), whenever P(B) > 0 (1)
Multiplying both sides of (1) by the denominator P(B) clears the division.
But (1) holds whatever letter labels are used, so interchanging A and B and
then clearing the division by P(A) in the resulting expression leads to the
following:
P(A|B)P(B) = P(A, B) = P(B|A)P(A) (2)
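Definition (1) and identity (2) can be verified by counting on a small example. The events below (an even roll, and a roll above three, on a single fair die) are illustrative choices of mine, not from the text:

```python
from fractions import Fraction

# Single fair die with six equally likely faces.
space = set(range(1, 7))
A = {x for x in space if x % 2 == 0}  # die shows an even number: {2, 4, 6}
B = {x for x in space if x > 3}       # die shows more than 3: {4, 5, 6}

P = lambda event: Fraction(len(event), len(space))
p_joint = P(A & B)            # P(A, B): both occur, faces {4, 6}
p_A_given_B = p_joint / P(B)  # definition (1)
p_B_given_A = p_joint / P(A)

# Identity (2): both products recover the joint probability.
assert p_A_given_B * P(B) == p_joint == p_B_given_A * P(A)
```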
Suppose a ten-sound string is associated with a word like
orthopedist or orthopedic. We can index the sounds to indicate their
sequence positions. The decision to be made is what has generated
the observed s10, given the two things we know: a) the previous nine
sounds s1, ..., s9, and b) observations about s10. [We identify the
event A as the observed sound string s1, ..., s9, and B as s10.]
Models
In many situations there is neither independence nor complete
dependence. In analytical terms, this middle ground means that P(A, B)
differs from P(A)P(B) and that the parts of A differ in their influence on
how it is related to B.
Independence means that (1) becomes
P(A|B) = P(A, B)/P(B) = P(A)P(B)/P(B) = P(A)
and likewise:
P(B|A) = P(A, B)/P(A) = P(B)P(A)/P(A) = P(B) (3)
Can (2) deal with the tenth-sound-in-a-word decision? Formally, if the
current sound value depends only on the immediately prior one, (2)
becomes:
P(s9|s10)P(s10) = P(s10|s9)P(s9)
If we know: 1) for every possible s10 value, the probability that
each possible s9 value occurred; 2) the same thing with 9
and 10 interchanged; and 3) the unconditional probabilities of all the
s9 values; then the decision about s10 can be made. One chooses the
s10 for which the probability P(s10), computed from
P(s10|s9)P(s9)/P(s9|s10),
is the greatest.
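The decision rule can be sketched in code. Every candidate sound and probability below is invented purely for illustration (none comes from real speech data); the sketch simply applies the formula above to each candidate and keeps the largest value:

```python
# Hypothetical candidate values for the tenth sound s10.
candidates = ["t", "d", "k"]

p_s10_given_s9 = {"t": 0.6, "d": 0.3, "k": 0.1}  # hypothetical P(s10|s9)
p_s9_given_s10 = {"t": 0.4, "d": 0.5, "k": 0.2}  # hypothetical P(s9|s10)
p_s9 = 0.2                                       # hypothetical P(s9)

# For each candidate s10, compute P(s10|s9) P(s9) / P(s9|s10)
# and choose the candidate where that quantity is the greatest.
scores = {s: p_s10_given_s9[s] * p_s9 / p_s9_given_s10[s] for s in candidates}
best = max(scores, key=scores.get)
```

With these made-up tables the candidate "t" scores 0.6 * 0.2 / 0.4 = 0.3, the largest of the three, and so would be chosen.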
A sometimes reasonable assumption is that only adjacent sounds influence
one another. That could be true for the occurrence of s10. The next
section describes this.