\documentstyle[fleqn,epsf,aima-slides]{article}
\begin{document}
\begin{huge}
\titleslide{Uncertainty}{Chapter 14}
\sf
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Outline}
\blob Uncertainty
\blob Probability
\blob Syntax
\blob Semantics
\blob Inference rules
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Uncertainty}
Let action $A_t$ = leave for airport $t$ minutes before flight\\
Will $A_t$ get me there on time?
Problems:\al
1) partial observability (road state, other drivers' plans, etc.)\al
2) noisy sensors (KCBS traffic reports)\al
3) uncertainty in action outcomes (flat tire, etc.)\al
4) immense complexity of modelling and predicting traffic
Hence a purely logical approach either\\
\phantom{or }1) risks falsehood: ``$A_{25}$ will get me there on time''\\
or 2) leads to conclusions that are too weak for decision making:\nl
``$A_{25}$ will get me there on time if there's no accident on the bridge\nl
and it doesn't rain and my tires remain intact etc etc.''
($A_{1440}$ might reasonably be said to get me there on time\\
but I'd have to stay overnight in the airport $\ldots$)
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Methods for handling uncertainty}
\u{Default} or \u{nonmonotonic} logic:\al
Assume my car does not have a flat tire\al
Assume $A_{25}$ works unless contradicted by evidence\\
Issues: What assumptions are reasonable? How to handle contradiction?
\u{Rules with fudge factors}:\al
$A_{25} \mapsto_{0.3}$ get there on time\al
$Sprinkler \mapsto_{0.99} WetGrass$\al
$WetGrass \mapsto_{0.7} Rain$\\
Issues: Problems with combination, e.g., $Sprinkler$ causes $Rain$??
\u{Probability}\al
Given the available evidence,\nl
$A_{25}$ will get me there on time with probability 0.04\\
Mahaviracarya (9th C.), Cardano (1565) theory of gambling
(\u{Fuzzy logic} handles {\em degree of truth} NOT uncertainty e.g.,\al
$WetGrass$ is true to degree 0.2)
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Probability}
Probabilistic assertions {\em summarize} effects of\al
\u{laziness}: failure to enumerate exceptions, qualifications, etc.\al
\u{ignorance}: lack of relevant facts, initial conditions, etc.
\u{Subjective} or \u{Bayesian} probability:\\
Probabilities relate propositions to one's own state of knowledge\nl
e.g., $P(A_{25} | \mbox{no reported accidents}) = 0.06$
These are \u{not} assertions about the world
Probabilities of propositions change with new evidence:\nl
e.g., $P(A_{25} | \mbox{no reported accidents},\ \mbox{5 a.m.}) = 0.15$
(Analogous to logical entailment status $KB \models \alpha$, not truth.)
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Making decisions under uncertainty}
Suppose I believe the following:
\begin{eqnarray*}
P(A_{25}\mbox{ gets me there on time} | \ldots) &=& 0.04 \\
P(A_{90}\mbox{ gets me there on time} | \ldots) &=& 0.70 \\
P(A_{120}\mbox{ gets me there on time} | \ldots) &=& 0.95 \\
P(A_{1440}\mbox{ gets me there on time} | \ldots) &=& 0.9999
\end{eqnarray*}
Which action to choose?
Depends on my \u{preferences} for missing flight vs. airport cuisine, etc.
\u{Utility theory} is used to represent and infer preferences
\u{Decision theory} = utility theory + probability theory
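The decision-theoretic choice can be sketched numerically. The probabilities below are the slide's; the utility numbers ($-1000$ for missing the flight, $-1$ per minute of waiting) are purely hypothetical stand-ins for one's preferences:

```python
# Sketch: choosing among A_25 ... A_1440 by maximum expected utility.
# Probabilities are from the slide; utilities are hypothetical.
p_on_time = {'A25': 0.04, 'A90': 0.70, 'A120': 0.95, 'A1440': 0.9999}
wait_minutes = {'A25': 25, 'A90': 90, 'A120': 120, 'A1440': 1440}

U_MISS = -1000.0        # hypothetical utility of missing the flight
U_WAIT_PER_MIN = -1.0   # hypothetical cost per minute waiting at the airport

def expected_utility(action):
    p = p_on_time[action]
    return p * (U_WAIT_PER_MIN * wait_minutes[action]) + (1 - p) * U_MISS

best = max(p_on_time, key=expected_utility)
```

With these (hypothetical) utilities, $A_{120}$ wins: $A_{1440}$ is almost certain to succeed but the waiting cost dominates, exactly the trade-off the slide describes.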
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Axioms of probability}
For any propositions $A$, $B$
1. $0 \leq P(A) \leq 1$\\
2. $P(True) = 1$ and $P(False) = 0$\\
3. $P(A \lor B) = P(A) + P(B) - P(A\land B)$
\vspace*{0.2in}
\epsfxsize=0.45\textwidth
\fig{\file{figures}{axiom3-venn.ps}}
de Finetti (1931): an agent who bets according to probabilities that violate
these axioms can be forced to bet so as to lose money regardless of outcome.
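The axioms can be sanity-checked on a tiny possible-worlds model. This sketch uses a made-up uniform distribution over two Boolean variables $A$ and $B$:

```python
# Sketch: check axiom 3 (inclusion-exclusion) on a toy model of four
# equiprobable worlds over two Boolean variables A and B.
from itertools import product

worlds = list(product([True, False], repeat=2))  # each world = (A, B)
P = {w: 0.25 for w in worlds}                    # hypothetical uniform distribution

def prob(event):
    """Probability of an event, given as a predicate over worlds."""
    return sum(p for w, p in P.items() if event(w))

a       = prob(lambda w: w[0])
b       = prob(lambda w: w[1])
a_or_b  = prob(lambda w: w[0] or w[1])
a_and_b = prob(lambda w: w[0] and w[1])

assert abs(a_or_b - (a + b - a_and_b)) < 1e-12   # axiom 3
assert abs(prob(lambda w: True) - 1.0) < 1e-12   # axiom 2: P(True) = 1
```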
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Syntax}
Similar to propositional logic: possible worlds defined by assignment of
values to \u{random variables}.
\u{Propositional} or \u{Boolean} random variables\al
e.g., $Cavity$ (do I have a cavity?)\\
Include propositional logic expressions\al
e.g., $\lnot Burglary \lor Earthquake$
\u{Multivalued} random variables\al
e.g., $Weather$ is one of $\<sunny,rain,cloudy,snow\>$\\
Values must be exhaustive and mutually exclusive
Proposition constructed by assignment of a value:\al
e.g., $Weather \eq sunny$; also $Cavity \eq true$ for clarity
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Syntax contd.}
\u{Prior} or \u{unconditional probabilities} of propositions\al
e.g., $P(Cavity) = 0.1$ and $P(Weather \eq sunny) = 0.72$\\
correspond to belief prior to arrival of any (new) evidence
\u{Probability distribution} gives values for all possible assignments:\al
$\pv(Weather) = \<0.72,0.1,0.08,0.1\>$ (\u{normalized}, i.e., sums to 1)
\u{Joint probability distribution} for a set of variables gives\\
values for each possible assignment to all the variables\al
$\pv(Weather,Cavity)$ = a $4 \times 2$ matrix of values:
\[\begin{array}{l|cccc}
\hfil Weather \eq & sunny & rain & cloudy & snow \\
\hline
Cavity \eq true & & & & \\
Cavity \eq false & & & &
\end{array}\]
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Syntax contd.}
\u{Conditional} or \u{posterior probabilities}\al
e.g., $P(Cavity | Toothache) = 0.8$\al
i.e., \u{given that $Toothache$ is all I know}
Notation for conditional distributions:\al
$\pv(Weather | Earthquake)$ = 2-element vector of 4-element vectors
If we know more, e.g., $Cavity$ is also given, then we have\al
$P(Cavity | Toothache,Cavity) = 1$\\
Note: the less specific belief {\em remains valid} after more evidence
arrives, but is not always {\em useful}
New evidence may be irrelevant, allowing simplification, e.g.,\al
$P(Cavity | Toothache,49ersWin) = P(Cavity | Toothache) = 0.8$\\
This kind of inference, sanctioned by domain knowledge, is crucial
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Conditional probability}
Definition of conditional probability:
\[
P(A|B) = \frac{P(A\land B)}{P(B)} \mbox{ if } P(B) \neq 0
\]
\u{Product rule} gives an alternative formulation:\al
$P(A\land B) = P(A|B)P(B) = P(B|A)P(A)$
A general version holds for whole distributions, e.g.,\al
$\pv(Weather,Cavity) = \pv(Weather|Cavity) \pv(Cavity)$\\
(View as a $4\times 2$ set of equations, {\em not} matrix mult.)
\u{Chain rule} is derived by successive application of product rule:\al
$\pv(X_1,\ldots,X_n) = \pv(X_1,\ldots,X_{n-1})\
\pv(X_n | X_1,\ldots,X_{n-1})$\nl
= $\pv(X_1,\ldots,X_{n-2})\
\pv(X_{n-1} | X_1,\ldots,X_{n-2})\
\pv(X_n | X_1,\ldots,X_{n-1})$\nl
= $\ldots$\nl
= $\myprod_{i\eq 1}^n \pv(X_i | X_1,\ldots,X_{i-1})$
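The chain-rule factorization can be demonstrated numerically. This sketch builds a hypothetical joint distribution over three Boolean variables (the raw weights are arbitrary) and checks each joint entry against the product of the conditional factors:

```python
# Sketch: chain rule P(x1,x2,x3) = P(x1) P(x2|x1) P(x3|x1,x2) on a
# hypothetical joint over three Boolean variables (weights are made up).
from itertools import product

raw = [3, 1, 4, 1, 5, 9, 2, 6]                   # arbitrary positive weights
worlds = list(product([0, 1], repeat=3))
joint = {w: r / sum(raw) for w, r in zip(worlds, raw)}

def P(pred):
    """Probability of the event defined by predicate pred."""
    return sum(p for w, p in joint.items() if pred(w))

for x1, x2, x3 in worlds:
    p1  = P(lambda w: w[0] == x1)                # P(x1)
    p12 = P(lambda w: (w[0], w[1]) == (x1, x2))  # P(x1, x2)
    # P(x1) * P(x2|x1) * P(x3|x1,x2), each conditional from the definition
    chain = p1 * (p12 / p1) * (joint[(x1, x2, x3)] / p12)
    assert abs(chain - joint[(x1, x2, x3)]) < 1e-12
```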
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Bayes' Rule}
Product rule $P(A\land B) = P(A|B)P(B) = P(B|A)P(A)$
\[
{}\implies \mbox{\u{Bayes' rule }} P(A|B) = \frac{P(B|A)P(A)}{P(B)}
\]
Why is this useful???
For assessing \u{diagnostic} probability from \u{causal} probability:
\[
P(Cause|Effect) = \frac{P(Effect|Cause)P(Cause)}{P(Effect)}
\]
E.g., let $M$ be meningitis, $S$ be stiff neck:
\[
P(M|S) = \frac{P(S|M)P(M)}{P(S)} = \frac{0.8 \times 0.0001}{0.1} = 0.0008
\]
Note: posterior probability of meningitis still very small!
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Normalization}
Suppose we wish to compute a posterior distribution over $A$\\
given $B\eq b$, and suppose $A$ has possible values $a_1 \ldots a_m$
We can apply Bayes' rule for each value of $A$:\al
$P(A\eq a_1|B\eq b) = P(B\eq b|A\eq a_1)P(A\eq a_1)/P(B\eq b)$\al
$\ldots$\al
$P(A\eq a_m|B\eq b) = P(B\eq b|A\eq a_m)P(A\eq a_m)/P(B\eq b)$\\
Adding these up, and noting that $\mysum_i P(A\eq a_i|B\eq b) = 1$:
\[1/P(B\eq b) = 1/\mysum_i P(B\eq b|A\eq a_i)P(A\eq a_i)\]
This is the \u{normalization factor}, constant w.r.t.~$i$, denoted $\alpha$:
\[
\pv(A|B\eq b) = \alpha \pv(B\eq b | A)\pv(A)
\]
Typically compute an unnormalized distribution, normalize at end\al
e.g., suppose $\pv(B\eq b | A)\pv(A) = \<0.4,0.2,0.2\>$\nl
then $\pv(A|B\eq b) = \alpha \<0.4,0.2,0.2\>
= \frac{\<0.4,0.2,0.2\>}{0.4+0.2+0.2}
= \<0.5,0.25,0.25\>$
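The normalize-at-the-end trick, with the slide's unnormalized vector:

```python
# Sketch: normalization -- the unnormalized vector is from the slide.
unnorm = [0.4, 0.2, 0.2]              # P(B=b|A=a_i) P(A=a_i) for each a_i
alpha = 1.0 / sum(unnorm)             # normalization factor, constant in i
posterior = [alpha * p for p in unnorm]

assert all(abs(p - q) < 1e-12 for p, q in zip(posterior, [0.5, 0.25, 0.25]))
assert abs(sum(posterior) - 1.0) < 1e-12
```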
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Conditioning}
Introducing a variable as an extra condition:
\[
P(X|Y) = \mysum_z P(X|Y,Z\eq z) P(Z\eq z|Y)
\]
Intuition: often easier to assess each specific circumstance, e.g.,\\
$P(RunOver|Cross)$\al
= $P(RunOver|Cross,Light\eq green)P(Light\eq green|Cross)$\al
+ $P(RunOver|Cross,Light\eq yellow)P(Light\eq yellow|Cross)$\al
+ $P(RunOver|Cross,Light\eq red)P(Light\eq red|Cross)$
When $Y$ is absent, we have \u{summing out} or \u{marginalization}:
\[
P(X) = \mysum_z P(X|Z\eq z) P(Z\eq z) = \mysum_z P(X,Z\eq z)
\]
In general, given a joint distribution over a set of variables, the
distribution over any subset (called a \u{marginal} distribution for
historical reasons) can be calculated by summing out the other
variables.
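Summing out can be sketched directly. The joint table here is hypothetical (two variables $X$ and $Z$, entries chosen to sum to 1):

```python
# Sketch: marginalization -- sum out Z from a hypothetical joint P(X, Z).
joint = {                             # hypothetical entries, sum to 1
    ('x1', 'z1'): 0.3, ('x1', 'z2'): 0.2,
    ('x2', 'z1'): 0.1, ('x2', 'z2'): 0.4,
}

p_x = {}
for (x, z), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p      # P(X=x) = sum_z P(X=x, Z=z)

assert abs(p_x['x1'] - 0.5) < 1e-12
assert abs(p_x['x2'] - 0.5) < 1e-12
```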
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Full joint distributions}
A \u{complete probability model} specifies every entry in the joint
distribution for all the variables $\mbf{X} = X_1,\ldots,X_n$\\
I.e., a probability for each possible world $X_1\eq x_1,\ldots,X_n\eq x_n$
(Cf. complete theories in logic.)
E.g., suppose $Toothache$ and $Cavity$ are the random variables:
\[\begin{array}{l|cc}
& Toothache\eq true & Toothache\eq false \\
\hline
Cavity \eq true & 0.04 & 0.06 \\
Cavity \eq false & 0.01 & 0.89
\end{array}\]
Possible worlds are mutually exclusive $\implies$ $P(w_i \land w_j) = 0$ for $i \neq j$\\
Possible worlds are exhaustive $\implies$ $w_1 \lor \cdots \lor w_n$ is $True$\nl
hence $\mysum_i P(w_i) = 1$
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Full joint distributions contd.}
1) For any proposition $\phi$ defined on the random variables\nl
$\phi(w_i)$ is true or false
2) $\phi$ is equivalent to the disjunction of $w_i$s where $\phi(w_i)$ is true
Hence $P(\phi) = \mysum_{\{w_i:\ \phi(w_i)\}} P(w_i)$
I.e., the unconditional probability of any proposition is computable
as the sum of entries from the full joint distribution
Conditional probabilities can be computed in the same way as a ratio:
\[
P(\phi|\xi) = \frac{P(\phi\land \xi)}{P(\xi)}
\]
E.g.,
\[
P(Cavity |Toothache)
= \frac{P(Cavity \land Toothache)}{P(Toothache)}
= \frac{0.04}{0.04+0.01} = 0.8
\]
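The same ratio, computed from the slide's joint table:

```python
# Sketch: P(Cavity|Toothache) as a ratio of full-joint entries
# (numbers from the slide's Toothache/Cavity table).
joint = {                       # keys are (Cavity, Toothache)
    (True,  True):  0.04, (True,  False): 0.06,
    (False, True):  0.01, (False, False): 0.89,
}
p_cavity_and_toothache = joint[(True, True)]
p_toothache = joint[(True, True)] + joint[(False, True)]  # marginal

p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
assert abs(p_cavity_given_toothache - 0.8) < 1e-12
```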
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Inference from joint distributions}
Typically, we are interested in \al
the posterior joint distribution of the \u{query variables} $\mbf{Y}$\al
given specific values $\mbf{e}$ for the \u{evidence variables} $\mbf{E}$
Let the \u{hidden variables} be $\mbf{H} = \mbf{X} - \mbf{Y} - \mbf{E}$
Then the required summation of joint entries is done by summing out
the hidden variables:
\[
\pv(\mbf{Y}|\mbf{E}\eq \mbf{e}) = \alpha \pv(\mbf{Y},\mbf{E}\eq \mbf{e})
= \alpha \mysum_{\smbf{h}} \pv(\mbf{Y},\mbf{E}\eq \mbf{e},\mbf{H}\eq \mbf{h})
\]
The terms in the summation are joint entries because $\mbf{Y}$, $\mbf{E}$, and $\mbf{H}$ together exhaust the set of random variables
Obvious problems:\al
1) Worst-case time complexity $O(d^n)$ where $d$ is the largest arity\al
2) Space complexity $O(d^n)$ to store the joint distribution\al
3) How to find the numbers for $O(d^n)$ entries???
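The summation over hidden variables can be sketched as a generic enumeration routine for Boolean variables. The three-variable joint table below is hypothetical (a third variable, here called $Catch$, plays the role of the hidden variable $\mbf{H}$):

```python
# Sketch: inference by enumeration over a full joint distribution of
# Boolean variables.  The joint table is hypothetical; 'Catch' is a
# hidden variable that gets summed out.
def enumerate_posterior(joint, var_names, query, evidence):
    """Posterior P(query | evidence): sum out hidden vars, then normalize."""
    qi = var_names.index(query)
    unnorm = {}
    for world, p in joint.items():
        if all(world[var_names.index(v)] == val for v, val in evidence.items()):
            unnorm[world[qi]] = unnorm.get(world[qi], 0.0) + p
    alpha = 1.0 / sum(unnorm.values())     # normalization constant
    return {v: alpha * p for v, p in unnorm.items()}

# Variables in order (Cavity, Toothache, Catch); hypothetical entries, sum to 1
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
post = enumerate_posterior(joint, ['Cavity', 'Toothache', 'Catch'],
                           'Cavity', {'Toothache': True})
```

The routine is $O(d^n)$ in both time and space, which is exactly the complexity problem listed above.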
\end{huge}
\end{document}