This notebook accompanies Lecture 17. The exercises should be used to gain facility with computing interventional distributions and generally playing around with encoding joint distributions as Bayes Nets.
To keep dependencies to a minimum, this notebook encodes distributions in a bare-bones way. You should feel free to use matrix or object representations as you see fit.
Our first task will be to encode the conditional probability distributions for the graph
$B$ $E$
$\searrow~\swarrow$
$A$
$\swarrow~\searrow$
$F$ $L$
# use indices to denote values that the variable takes on
B = [0.99, 0.01]
E = [0.98, 0.02]
# Altered from lecture -- using the outer index for the variable's value
# So: P(A=1 | B=0, E=1) is accessed like so:
# > A[1][0][1]
# old definition: A[b][e][a] -> A[a][b][e]
# A = [[[0.99, 0.01], [0.10, 0.90]], [[0.05, 0.95], [0.01, 0.99]]]
A = [[[None, None], [None, None]], [[None, None], [None, None]]]
for (b, bprobs) in enumerate([[[0.99, 0.01], [0.10, 0.90]], [[0.05, 0.95], [0.01, 0.99]]]):
    for (e, eprobs) in enumerate(bprobs):
        for (a, p) in enumerate(eprobs):
            A[a][b][e] = p
# old definitions: outer index is value of A
# F = [[0.95, 0.05], [0.25, 0.75]]
# L = [[0.90, 0.10], [0.05, 0.95]]
F = [[None, None], [None, None]]
L = [[None, None], [None, None]]
for (a, aprobs) in enumerate([[0.95, 0.05], [0.25, 0.75]]):
    for (f, p) in enumerate(aprobs):
        F[f][a] = p
for (a, aprobs) in enumerate([[0.90, 0.10], [0.05, 0.95]]):
    for (l, p) in enumerate(aprobs):
        L[l][a] = p
# Quick sanity check to ensure, e.g., P(A=0 | B=1, E=0) + P(A=1 | B=1, E=0) = 1.0
assert(abs(A[0][0][0] + A[1][0][0] - 1.0) < 0.001)
assert(abs(A[0][0][1] + A[1][0][1] - 1.0) < 0.001)
assert(abs(A[0][1][0] + A[1][1][0] - 1.0) < 0.001)
assert(abs(A[0][1][1] + A[1][1][1] - 1.0) < 0.001)
assert(abs(F[0][1] + F[1][1] - 1.0) < 0.001)
assert(abs(F[0][0] + F[1][0] - 1.0) < 0.001)
assert(abs(L[0][1] + L[1][1] - 1.0) < 0.001)
assert(abs(L[0][0] + L[1][0] - 1.0) < 0.001)
We can then answer the usual queries, e.g., $P(F=1\mid A=1)$, by looking up the corresponding entries.
print(F[1][1])
0.75
We can compute arbitrary queries in the usual way. For example, we can compute the marginal probability of F (i.e., $P(F)$) using the factorization of $P(B, E, A, F, L)$.
# F takes on 2 values (0 and 1), so we initialize our representation of its
# parameters as a 2-vector, indexed on the value of F
PF = [0.0, 0.0]
for b in range(len(B)):
    for e in range(len(E)):
        for a in range(len(A)):
            for f in range(len(F)):
                for l in range(len(L)):
                    # marginalize over P(F|A)P(L|A)P(A|E,B)P(E)P(B)
                    PF[f] += F[f][a] * L[l][a] * A[a][b][e] * E[e] * B[b]
# Let's make sure we've just computed a valid probability distribution
print(PF)
assert(abs((PF[0] + PF[1]) - 1.0) < 0.001)
[0.924079, 0.075921]
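Because $L$ is a leaf, summing over $l$ contributes a factor of 1, so the same marginal can be computed more cheaply by first forming $P(A)$ and then summing out only $A$. A sketch (the CPTs are repeated here so the cell stands alone):

```python
# CPTs from above, repeated so this cell runs standalone
B = [0.99, 0.01]
E = [0.98, 0.02]
A = [[[0.99, 0.10], [0.05, 0.01]],   # A[a][b][e], slice for a = 0
     [[0.01, 0.90], [0.95, 0.99]]]   # slice for a = 1
F = [[0.95, 0.25], [0.05, 0.75]]     # F[f][a]

# P(A=a) = sum_{b,e} P(A=a|B=b,E=e) P(B=b) P(E=e)
PA = [sum(A[a][b][e] * B[b] * E[e] for b in range(2) for e in range(2))
      for a in range(2)]
# P(F=f) = sum_a P(F=f|A=a) P(A=a); the sum over L contributes 1
PF_alt = [sum(F[f][a] * PA[a] for a in range(2)) for f in range(2)]
print(PF_alt)  # ~[0.924079, 0.075921], matching PF above
```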
We can now compute $P(F\mid do(A:=1))$. We can express this in two ways: we can replace the CPT for $A$ with one that puts all probability on $A=1$ regardless of $B$ and $E$, or we can modify our loop directly. Let's do one, then the other.
# What is P(F|do(A:=1))?
newA = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]
PFdoA1 = [0.0, 0.0]
for b in range(len(B)):
    for e in range(len(E)):
        for a in range(len(newA)):
            for f in range(len(F)):
                for l in range(len(L)):
                    # marginalize over P(F|A)P(L|A)P(A|E,B)P(E)P(B), with
                    # P(A|E,B) replaced by newA, a point mass at A=1
                    PFdoA1[f] += F[f][a] * L[l][a] * newA[a][b][e] * E[e] * B[b]
# Let's make sure we've just computed a valid probability distribution
assert(abs((PFdoA1[0] + PFdoA1[1]) - 1.0) < 0.001)
# equivalently, without using newA
PFdoA1_2 = [0.0, 0.0]
for b in range(len(B)):
    for e in range(len(E)):
        # Change (1): manually restrict values of A from [0, 1] to [1]
        for a in [1]:
            for f in [0, 1]:
                for l in [0, 1]:
                    # marginalize over P(F|A)P(L|A)P(A|E,B)P(E)P(B)
                    # Change (2): A takes on 1 with probability 1, so replace
                    # A[a][b][e] with 1.0 (or drop the factor entirely)
                    PFdoA1_2[f] += F[f][a] * L[l][a] * 1.0 * E[e] * B[b]
# Let's make sure we've just computed a valid probability distribution
assert(abs((PFdoA1_2[0] + PFdoA1_2[1]) - 1.0) < 0.001)
# assert that the results are the same
assert(abs((PFdoA1[0] - PFdoA1_2[0])) < 0.001)
assert(abs((PFdoA1[1] - PFdoA1_2[1])) < 0.001)
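Since we will need the $A:=0$ case below as well, the truncated-factorization loop generalizes naturally to a helper function (`p_f_do_a` is a hypothetical name, not from the lecture). Note that $B$, $E$, and $L$ all marginalize out to 1, so the result reduces to the column `F[:][a0]`. A sketch, with the CPTs repeated so the cell stands alone:

```python
# CPTs from above, repeated so this cell runs standalone
B = [0.99, 0.01]
E = [0.98, 0.02]
F = [[0.95, 0.25], [0.05, 0.75]]   # F[f][a]
L = [[0.90, 0.05], [0.10, 0.95]]   # L[l][a]

def p_f_do_a(a0):
    """P(F | do(A:=a0)) via the truncated factorization (hypothetical helper)."""
    pf = [0.0, 0.0]
    for b in range(2):
        for e in range(2):
            for f in range(2):
                for l in range(2):
                    # P(A|B,E) is replaced by a point mass at a0
                    pf[f] += F[f][a0] * L[l][a0] * E[e] * B[b]
    return pf

print(p_f_do_a(0))  # ~[0.95, 0.05]
print(p_f_do_a(1))  # ~[0.25, 0.75]
```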
Now that we have the interventional distribution, we can compute the effect of setting A:=0 vs. A:=1. Since we may not have observed instances of both A:=0 and A:=1, the way to do this is to compare the expected values under each intervention: $\mathbb{E}[F\mid do(A:=0)] - \mathbb{E}[F\mid do(A:=1)] = \sum_{f\in\{0, 1\}} f \cdot P(f\mid do(A:=0)) - \sum_{f\in\{0, 1\}} f \cdot P(f\mid do(A:=1))$.
# What is the effect of intervention on A?
# First need to compute the effect of intervention under A:=0:
# TODO
PFdoA0 = [-1, -1]
EFdoA0 = sum([f * PFdoA0[f] for f in range(len(F))])
EFdoA1 = sum([f * PFdoA1[f] for f in range(len(F))])
print("Effect of setting A:=1:\t", EFdoA0 - EFdoA1)
Effect of setting A:=1: -1.75
# Does P(F|A=1) differ from P(F|do(A:=1))? Why or why not?
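One way to explore the first question numerically is to compute the observational conditional $P(F\mid A=1) = P(F, A=1)/P(A=1)$ by enumerating the joint, and compare it with the interventional result. A sketch (the CPTs are repeated, in the `A[a][b][e]` indexing used above, so the cell stands alone):

```python
# CPTs from above, repeated so this cell runs standalone
B = [0.99, 0.01]
E = [0.98, 0.02]
A = [[[0.99, 0.10], [0.05, 0.01]],   # A[a][b][e], slice for a = 0
     [[0.01, 0.90], [0.95, 0.99]]]   # slice for a = 1
F = [[0.95, 0.25], [0.05, 0.75]]     # F[f][a]
L = [[0.90, 0.05], [0.10, 0.95]]     # L[l][a]

# P(F=f, A=1) by summing the joint over b, e, l
joint = [0.0, 0.0]
for b in range(2):
    for e in range(2):
        for f in range(2):
            for l in range(2):
                joint[f] += F[f][1] * L[l][1] * A[1][b][e] * E[e] * B[b]
# Condition: P(F=f | A=1) = P(F=f, A=1) / P(A=1)
p_a1 = sum(joint)
cond = [p / p_a1 for p in joint]
print(cond)  # compare against P(F | do(A:=1)) computed above
```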
# What is the effect of intervention on A for the modified graph below?
# use indices to denote values that the variable takes on
B = [0.99, 0.01]
E = [0.98, 0.02]
# outer index is B; inner is E. don't do this at home.
A = [[[0.99, 0.01], [0.10, 0.90]], [[0.05, 0.95], [0.01, 0.99]]]
# outer index is value of A; inner is B
F = [[[0.99, 0.01], [0.98, 0.02]], [[0.97, 0.03], [0.96, 0.04]]]
L = [[0.90, 0.10], [0.05, 0.95]]
# practice computing each of P(F), P(F|do(A:=0)), P(F|do(A:=1)), and E[F|do(A:=0)] - E[F|do(A:=1)]
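As a starting point for these practice queries, here is a sketch of the marginal $P(F)$ under the modified factorization, where $F$ now has parents $A$ and $B$ (the indexing follows the comments above: `A[b][e][a]`, `F[a][b][f]`, `L[a][l]`); the do-queries follow the same truncation pattern as before.

```python
# Modified-graph CPTs from above, repeated so this cell runs standalone
B = [0.99, 0.01]
E = [0.98, 0.02]
A = [[[0.99, 0.01], [0.10, 0.90]], [[0.05, 0.95], [0.01, 0.99]]]  # A[b][e][a]
F = [[[0.99, 0.01], [0.98, 0.02]], [[0.97, 0.03], [0.96, 0.04]]]  # F[a][b][f]
L = [[0.90, 0.10], [0.05, 0.95]]                                  # L[a][l]

PF_mod = [0.0, 0.0]
for b in range(2):
    for e in range(2):
        for a in range(2):
            for f in range(2):
                for l in range(2):
                    # marginalize over P(F|A,B)P(L|A)P(A|B,E)P(E)P(B)
                    PF_mod[f] += F[a][b][f] * L[a][l] * A[b][e][a] * E[e] * B[b]
print(PF_mod)
assert abs(PF_mod[0] + PF_mod[1] - 1.0) < 1e-9
```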