Probability

Basics

The outcomes of a chance event such as tossing a coin, rolling a die, or drawing a card from a deck can be described by two related methodologies, probability and statistics. Roughly, they differ in that probability concerns statements made before events take place, while statistics involves after-the-fact analysis. Qualitatively, probability rests more on proof while statistics tends to involve judgement. There are three approaches to probability: analysis of the ways that a chance event can occur; specification of one's personal belief that the event will take place; and the relative frequency of its happening, the proportion of favorable cases in a fixed number of trials. In some situations reasoning alone can determine how likely a result is. When two possibilities are equally likely, as in tossing a fair coin, this leads to a probability of one half for each of the head and tail results. The same reasoning applies to the six faces of a die or to one card chosen from fifty-two: without any other indication there is a one-sixth probability for each of the six numbers on the die faces, and a one-fifty-second probability of selecting each card in the deck. One of the basic tools of this geometric view of probability is the sample space: essentially a diagram that shows all the possible outcomes and thus lists the alternatives. Without any other factors one concludes that the sample space outcomes are equally likely, which leads to dividing the number one, standing for all the probability, by the number of different sample space points. The following are exercises in geometric probability; a short sketch after them shows how a sample space can be enumerated.

  1. This concerns simultaneously tossing a fair coin and rolling a fair die whose six faces are numbered one through six.
    1. Draw the sample space.
    2. How big is the sample space in terms of points corresponding to die-roll/coin-toss outcomes?
    3. What is the probability of a three(die)-head(coin) outcome?
  2. Four births occur. Of the following two events which is more probable?
    1. Half male, half female.
    2. Exactly three of one sex.
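
As an aside (not part of the exercise statements), here is a minimal Python sketch of the enumeration idea; the only assumptions carried over from the text are fairness and equal likelihood, and the variable names and printed quantities are just one possible choice.

    import itertools

    # Exercise 1: sample space for one die roll together with one coin toss.
    coins = ["head", "tail"]
    dice = [1, 2, 3, 4, 5, 6]
    sample_space = list(itertools.product(dice, coins))    # all (die, coin) pairs
    print(len(sample_space))                                # 12 equally likely points
    print(1 / len(sample_space))                            # probability of (3, head)

    # Exercise 2: four births, each equally likely to be male (M) or female (F).
    births = list(itertools.product("MF", repeat=4))        # 16 equally likely strings
    half_half = sum(1 for b in births if b.count("M") == 2)
    three_one = sum(1 for b in births if b.count("M") in (1, 3))
    print(half_half / 16, three_one / 16)                   # 0.375 versus 0.5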

The contrast with analysis is provided by acting or betting. People can believe strongly in a future occurrence such as rain, a price change, or another random event. When that outcome is not certain, the odds or probability assigned to it represent the belief of an individual, or of many individuals. Relative frequency, on the other hand, measures what actually happens in many repetitions of the same random situation. Favorable results divided by total trials, a statistical measure, is an empirical probability.
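
A brief Python sketch of relative frequency, with an arbitrary trial count and random seed (not from the text): the empirical probability of rolling a six is the number of favorable results divided by the total number of trials.

    import random

    random.seed(0)                       # arbitrary seed so the run repeats exactly
    trials = 10000
    favorable = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
    print(favorable / trials)            # empirical probability, near 1/6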

If two random events take place that have nothing to do with one another, there are three separate probabilities to consider: that of the first alone, that of the second alone, and that of both taking place. When the events truly have no relationship they are called independent, and the last probability is the product of the first two. But it often happens that random events have some relationship. When this is so we use a notation based on a division-like symbol to record that relationship in a formal way.
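
The product test for independence can be checked directly. The following Python sketch uses a made-up joint table whose numbers were chosen so that the test is satisfied; nothing about the table comes from the text.

    # Joint probabilities P(A, B) for two events, keyed by the truth of A and B.
    joint = {(True, True): 0.06, (True, False): 0.14,
             (False, True): 0.24, (False, False): 0.56}

    p_a = joint[(True, True)] + joint[(True, False)]      # P(A) = 0.20
    p_b = joint[(True, True)] + joint[(False, True)]      # P(B) = 0.30
    independent = abs(joint[(True, True)] - p_a * p_b) < 1e-12
    print(independent)                                     # True for these numbers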

Two random events A and B are dependent when the product of the individual probabilities, P(A) times P(B), does not equal P(A, B), the probability of both occurring simultaneously. When we know that one of the events has occurred, say B, we can find how that knowledge affects the probability of A taking place. We write this as P(A|B) (read "probability of A given that B occurred" or "the conditional probability of A given B"). The basis of decision-making is the following definition of conditional probability:

P(A|B) = P(A, B)/P(B), whenever P(B) > 0 (1)
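
As a numeric illustration with assumed values (not taken from the text), suppose P(A, B) = 0.12 and P(B) = 0.3. Then (1) gives

P(A|B) = 0.12/0.3 = 0.4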

Multiplying both sides of (1) by the denominator P(B) gives the product form P(A|B)P(B) = P(A, B). But (1) holds whatever letter labels are used, so interchanging A and B in it and again clearing the denominator, now P(A), leads to the following:

P(A|B)P(B) = P(A, B) = P(B|A)P(A) (2)
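
With the same assumed numbers, and taking P(A) = 0.2 (also assumed), both sides of (2) agree:

P(A|B)P(B) = 0.4 × 0.3 = 0.12 = P(A, B)

P(B|A)P(A) = (0.12/0.2) × 0.2 = 0.12

Note that here P(A)P(B) = 0.06 differs from P(A, B) = 0.12, so these A and B are dependent.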

Suppose a ten-sound string is associated with a word like orthopedist or orthopedic. We can index the sounds to indicate their sequence positions. The decision to be made is what has generated the observed s10, given the two things we know: a) the previous nine sounds s1, ... , s9, and b) observations about s10. [We identify the event A with the observed sound string s1, ... , s9, and B with s10.]

Models

In many situations there is neither independence nor complete dependence. In analytical terms, this middle ground means that P(A, B) differs from P(A)P(B), but the parts of A differ in how much they influence its relation to B.

Independence means that (1) becomes P(A|B) = P(A, B)/P(B) = P(A)P(B)/P(B) = P(A), and likewise:

P(B|A) = P(A, B)/P(A) = P(B)P(A)/P(A) = P(B) (3)
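
For instance, with the assumed independent values P(A) = 0.2, P(B) = 0.3, and P(A, B) = 0.06, the conditional probabilities reduce to the unconditional ones:

P(A|B) = 0.06/0.3 = 0.2 = P(A)

P(B|A) = 0.06/0.2 = 0.3 = P(B)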

Can (2) deal with the tenth-sound-in-a-word decision? Formally,

P(s1, ... , s9|s10)P(s10) = P(A|B)P(B) = P(B|A)P(A) = P(s10|s1, ... , s9)P(s1, ... , s9)

If the current sound value depends only on the immediately prior one, this becomes:

P(s9|s10)P(s10) = P(s10|s9)P(s9)


If we know: 1) for every possible s10 value, the probability of each possible s9 value given that s10; 2) the same thing with 9 and 10 interchanged; and 3) the unconditional probabilities of all the s9 values; then the decision about s10 can be made. One chooses the s10 for which the probability P(s10) computed from

P(s10|s9)P(s9)/P(s9|s10)

is the greatest.
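
A minimal Python sketch of this decision rule follows; the sound labels and every probability value are made up for illustration and are not drawn from the text.

    # Assumed tables for a tiny two-candidate example: the observed s9 is fixed,
    # and "t", "d" stand for two hypothetical candidate values of s10.
    p_s10_given_s9 = {"t": 0.6, "d": 0.4}   # P(s10 | s9), known for the observed s9
    p_s9_given_s10 = {"t": 0.5, "d": 0.8}   # P(s9 | s10), known for each candidate
    p_s9 = 0.3                              # unconditional probability of the observed s9

    # Evaluate P(s10) = P(s10|s9) P(s9) / P(s9|s10) for each candidate s10 and
    # choose the candidate whose value is the greatest.
    scores = {s10: p_s10_given_s9[s10] * p_s9 / p_s9_given_s10[s10]
              for s10 in p_s10_given_s9}
    best = max(scores, key=scores.get)
    print(scores)                            # near {'t': 0.36, 'd': 0.15}
    print(best)                              # 't', the greatest value here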

A sometimes reasonable assumption is that only adjacent sounds influence one another; that could be the case for the occurrence of s10. The next section describes this.
4/22/02 Version http://www.cs.ucla.edu/~klinger/conditional.html
©2002 Allen Klinger