Assignment 6, due 5/10, total value EXTRA 5% of grade

If you do none of it, you still receive the assignment's 8% (100/100) automatically.

If you do all of it, you earn an EXTRA 5% on your semester total grade.

This assignment deals with learning in Bayesian networks. You will develop learning algorithms for both the complete-data case and the incomplete-data case, and apply them to the restaurant and oil-drilling domains. The complete-data algorithm is essentially counting in the training set. The incomplete-data algorithm is a modification of MCMC that counts in the sampled complete states generated by the MCMC process. You should have an MCMC algorithm from A5; if not, we will publish the solution on Monday, May 3rd (after the expiry of the 5 late days for A5). In the meantime, if you want to start earlier, you can use complete states generated by the rejection-sampling algorithm instead.

**Be sure to use the latest version from ~cs188.**

**Question 1 (5 pts).**
To familiarize yourself with the learning code, generate
an incremental learning curve for decision tree learning applied to the
100 restaurant examples. Your curve should reflect 100 trials with data points
every 5 examples. Use `plot-alist` to write the output to a file
called `restaurant-dtl-curve.data`.
There is a gnuplot command file
called `restaurant-dtl-curve.plot`. Copy this
to your directory, and run it using the Unix command

    /usr/sww/bin/gnuplot restaurant-dtl-curve.plot

The results should appear on the screen and will then be written to an output file.
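
For reference, the call will look roughly like the sketch below. It is a sketch only: the exact signature of `incremental-learning-curve`, the keyword names for the trial count and increment, and the algorithm and example-list names are assumptions to check against the course code.

    ;; Sketch only -- signatures and names below are assumptions.
    (plot-alist
     (incremental-learning-curve
      #'decision-tree-learning   ; assumed name of the DTL induction algorithm
      *restaurant-examples*      ; assumed name of the 100-example data set
      :trials 100                ; 100 trials ...
      :increment 5)              ; ... with data points every 5 examples
     "restaurant-dtl-curve.data")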

**Question 2 (10 pts).**
Complete-data learning is described on pages 716-718 of AIMA2e.
The key is counting: the maximum-likelihood parameter estimates
are exactly the relative frequencies of the appropriate events in
the data. For example, the conditional probability
*P*(*Y=true* | *X=true*) is estimated by the
fraction of cases with *X=true* that also have *Y=true*.
We will need to keep counts corresponding to every CPT entry in the network.
Recall that a CPT for a tabulated node is an array, indexed by parent values,
each element of which is a discrete distribution, i.e., a vector of probabilities.
Hence, corresponding to each CPT, we will need an array, each element
of which is a vector of counts. Write a function

    (make-bn-counts bn)

that creates and returns a set of such arrays, one for each node in the BN. Write also a function

    (increment-bn-counts event bn counts)

that updates and returns the set of count arrays for the new event.
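
Since the later questions build on these two functions, here is a minimal sketch of the plumbing. The accessors `bn-nodes` and `node-cpt`, and the index helpers `parent-index` and `value-index`, are assumptions about the BN representation (the sketch also assumes a CPT is a vector indexed by an encoding of the parent values); adapt them to the real code.

    ;;; Minimal sketch -- the accessors and index helpers named below are
    ;;; assumptions about the BN representation, not the real API.
    (defun make-bn-counts (bn)
      "Return one count array per node, shaped like that node's CPT."
      (mapcar #'(lambda (node)
                  ;; one vector of zero counts per parent-value combination
                  (map 'vector
                       #'(lambda (dist)
                           (make-array (length dist) :initial-element 0))
                       (node-cpt node)))         ; assumed accessor
              (bn-nodes bn)))                    ; assumed accessor

    (defun increment-bn-counts (event bn counts)
      "Bump, for each node, the count selected by the parent values and
    the node's own value in EVENT; return the updated counts."
      (loop for node in (bn-nodes bn)
            for node-counts in counts
            do (incf (aref (aref node-counts
                                 (parent-index node event)) ; assumed helper
                           (value-index node event))))      ; assumed helper
      counts)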

**Question 3 (5 pts).**
Probably your first answer to Question 2 initialized the counts to 0.
Explain, by means of a simple example, why this can cause
problems when an ML-trained BN is used to answer queries
concerning a new example. [Hint: what happens to an inference algorithm when
given evidence that has probability zero according to the network?]
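
If you also want a fix, not just an explanation: a standard remedy is Laplace (add-one) smoothing, i.e., initializing every count to 1 rather than 0, so that no CPT entry is ever estimated as exactly zero. In a sketch like the one under Question 2, that is a one-line change:

    ;; Laplace (add-one) smoothing: pseudocounts of 1 instead of 0
    (make-array (length dist) :initial-element 1)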

**Question 4 (10 pts).**
Now write a function

    (complete-data-bn-learning examples bn)

which is given a set of examples and a BN, and returns the BN with CPT entries set to the ML estimates given the examples. [Hint: the counting functions from Question 2 do most of the work; what remains is normalization.]
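
The structure is count-then-normalize. A sketch, with the write-back into the CPTs left as a hypothetical helper `set-cpts-from-counts`, since that step depends on the CPT representation:

    (defun normalize-counts (count-vector)
      "Turn a vector of counts into a probability distribution."
      (let ((total (reduce #'+ count-vector)))
        (map 'vector #'(lambda (c) (/ c total)) count-vector)))

    (defun complete-data-bn-learning (examples bn)
      "Set each CPT entry of BN to its maximum-likelihood estimate."
      (let ((counts (make-bn-counts bn)))
        (dolist (example examples)
          (increment-bn-counts example bn counts))
        ;; hypothetical helper: install (normalize-counts v) for every
        ;; count vector v into the corresponding CPT slot of BN
        (set-cpts-from-counts bn counts)
        bn))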

**Question 5 (5 pts).**
Write a function

    (bn->hypothesis bn goal)

that turns a Bayes net into a hypothesis, i.e., a function that, given a new example, predicts the value of the goal attribute by inference in the network. Then write a function

    (complete-data-bayes-net-hypothesis examples goal bn)

that is analogous to the other induction algorithms: it learns the CPTs from the examples with `complete-data-bn-learning` and returns the resulting network, packaged by `bn->hypothesis`, as its hypothesis.
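
A sketch of both functions; `elimination-ask` (any of your inference algorithms will do here) and `most-likely-value` (argmax over the returned distribution) are assumed names:

    ;;; Sketch -- elimination-ask and most-likely-value are assumed names.
    (defun bn->hypothesis (bn goal)
      "Return a prediction function: given an example, infer GOAL from BN."
      #'(lambda (example)
          (most-likely-value (elimination-ask goal example bn))))

    (defun complete-data-bayes-net-hypothesis (examples goal bn)
      "Learn the CPTs from EXAMPLES, then package BN as a hypothesis."
      (bn->hypothesis (complete-data-bn-learning examples bn) goal))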

**Question 6 (10 pts).**
In the file `learning/domains/restaurant-naive-bayes.bn` is a naive Bayes model
for the restaurant data, i.e., the root is *WillWait*
and the leaves are the 10 other attributes. (The model has
uniform CPTs; since your learning algorithm will modify them,
you can always use `set-cpts-to-uniform` from
`bayes-nets.lisp` to reset them.) Use an appropriate call
to `incremental-learning-curve` and call
`plot-alist` to write the output to a file
called `restaurant-naive-bayes-curve.data`.
Call gnuplot on `restaurant-naive-bayes-curve.plot`
to show the results. [Hint: this is a little bit tricky, because the
learning curve function expects an induction algorithm
that takes examples, attributes, and a goal as input.
Lambda to the rescue!]
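
One way to take the hint, as a sketch with the same assumed `incremental-learning-curve` signature as in Question 1 (`*restaurant-naive-bayes-bn*` is an assumed name for the network loaded from `restaurant-naive-bayes.bn`):

    ;; The wrapper makes the BN learner look like an induction algorithm
    ;; of three arguments; the attributes are ignored because the network
    ;; structure already fixes them.
    (plot-alist
     (incremental-learning-curve
      #'(lambda (examples attributes goal)
          (declare (ignore attributes))
          (complete-data-bayes-net-hypothesis
           examples goal *restaurant-naive-bayes-bn*)) ; assumed name
      *restaurant-examples*
      :trials 100
      :increment 5)
     "restaurant-naive-bayes-curve.data")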

For the incomplete-data case we use the EM algorithm: the E-step completes the data given the current parameters, and the M-step re-estimates the parameters from the completed data. With MCMC, the E-step is very simple indeed: each state visited by the MCMC algorithm, given an incomplete example as evidence, can be viewed as a possible completion of the example. Thus the E-step calls a suitably modified MCMC algorithm once for each example, and accumulates counts (just as in complete-data learning) for every complete state visited by MCMC. From the accumulated counts, the M-step recalculates all the CPTs.

**Question 7 (10 pts).**
You should have a function `(mcmc X e bn)` from A5.
Modify this to define a function

    (counting-mcmc e bn counts)

which calls `increment-bn-counts` on every complete state visited by the chain and returns the updated counts.
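
A sketch of the shape of the change; `random-complete-state` (random initialization of the nonevidence variables) and `gibbs-step` (one transition of your A5 chain) are hypothetical names, and `*mcmc-steps*` is a tuning parameter:

    ;;; Sketch -- the chain-manipulation names below are assumptions
    ;;; about your A5 code.
    (defun counting-mcmc (e bn counts)
      "Run the chain with evidence E, counting every state it visits."
      (let ((state (random-complete-state e bn)))
        (dotimes (i *mcmc-steps*)
          (setf state (gibbs-step state e bn))
          (increment-bn-counts state bn counts))
        counts))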

**Question 8 (20 pts).**
Write a function

    (incomplete-data-bn-learning examples bn)

that is analogous to `complete-data-bn-learning`, except that it runs EM: each iteration accumulates counts by calling `counting-mcmc` on every example (the E-step) and then resets the CPTs from the accumulated counts (the M-step).
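
The EM loop can then be a thin wrapper around Questions 2 and 7; `*em-iterations*` is a tuning parameter, and `set-cpts-from-counts` is again the hypothetical M-step helper assumed in Question 4:

    (defun incomplete-data-bn-learning (examples bn)
      "EM with an MCMC E-step, as described above."
      (dotimes (iteration *em-iterations*)
        (let ((counts (make-bn-counts bn)))
          ;; E-step: accumulate counts over MCMC completions of each example
          (dolist (example examples)
            (counting-mcmc example bn counts))
          ;; M-step: renormalize the accumulated counts into new CPTs
          (set-cpts-from-counts bn counts)))  ; hypothetical helper
      bn)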

**Question 9 (10 pts).**
We can generate incomplete data from a Bayes net by
generating complete samples (using `prior-sample`)
and deleting some of the attribute values; a sketch of this
recipe appears below. For example, the examples in
`*oil-100-incomplete-examples*` in
`learning/domains/oil.lisp` were generated from
`uncertainty/domains/oil.bn` with some of the variables hidden.
Generate a learning curve for your algorithm when applied
to this data with the goal of predicting *Bankrupt* (see
`*oil-goal*`). You may need to play with the number
of EM iterations and the number of MCMC steps per example;
you should be able to do 10 trials with data points every 10 examples.
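
The data-generation recipe mentioned above, sketched under the assumption that events are alists of (variable . value) pairs; `hide-values` and `make-incomplete-examples` are hypothetical helpers:

    ;;; Sketch -- assumes events are alists of (variable . value) pairs.
    (defun hide-values (event hidden-vars)
      "Return EVENT with the values of HIDDEN-VARS removed."
      (remove-if #'(lambda (pair) (member (car pair) hidden-vars))
                 event))

    (defun make-incomplete-examples (n bn hidden-vars)
      "Draw N complete samples from BN, then hide HIDDEN-VARS in each."
      (loop repeat n
            collect (hide-values (prior-sample bn) hidden-vars)))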

Plot the results using
`oil-incomplete-curve.plot`.
How do the results compare to the prediction performance of the true
network on the same 100 examples? Explain any apparently surprising
aspects of your learning curve.