This notebook walks through the usage of the context tree weighting (CTW) implementation in ctw.py.
First, we'll generate some data from a tree source. We'll do this using a somewhat unsightly parameter dictionary in which each key represents a context and each value specifies the parameters for the next samples. The parameter dictionary defined below generates two sequences, $X_n\in\{0,1,2\}$ and $Y_n\in\{0,1\}$ for $n=1,\dots,N$. The depth of the tree is chosen to be $D=2$, meaning that for each context $s=(x_{n-1},x_{n-2},y_{n-1},y_{n-2})$ we have the two parameter vectors $\theta_x = [p_{x0},p_{x1},p_{x2}]$ and $\theta_y = [p_{y0},p_{y1}]$, with $p_{xi} \triangleq \Pr(x_n=i)$ and $\sum_i p_{xi}=1$ (and similarly for $y$). This particular setup demonstrates the ability of the CTW class to make predictions of $M$-ary sequences (in this case $M=3$) and to make predictions using side information, $Y_n$. You can make changes to the generating tree source, but note that, for ease of building the data generation function, only complete trees are handled (an incomplete tree can be emulated by giving neighboring leaf nodes the same parameters).
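To make the context-to-parameter lookup concrete, here is a minimal sketch of how one step of such a tree source could be sampled. This is an illustration only, not the actual gendata implementation; the function sample_step and its key-building scheme are hypothetical, chosen to match the dictionary format below.

```python
import numpy as np

def sample_step(params, xc, yc, rng):
    # Build the context key in the same string format as the params dictionary,
    # where xc = [x_{n-1}, x_{n-2}] and yc = [y_{n-1}, y_{n-2}]
    key = f'[[{xc[0]},{xc[1]}],[{yc[0]},{yc[1]}]]'
    theta_x, theta_y = params[key]
    xn = rng.choice(len(theta_x), p=theta_x)  # draw x_n ~ theta_x
    yn = rng.choice(len(theta_y), p=theta_y)  # draw y_n ~ theta_y
    return xn, yn

rng = np.random.default_rng(0)
toy_params = {'[[0,0],[0,0]]': ([1/3]*3, [1/2]*2)}
xn, yn = sample_step(toy_params, [0, 0], [0, 0], rng)
```

Iterating this step, feeding each new pair back into the contexts, generates the full sequences.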
%matplotlib inline
import sys
sys.path
sys.path.append('../python/')
import numpy as np
np.random.seed(0)
from gendata import gendata, plotprobs
# dictionary of parameters for depth 2 tree with x in {0,1,2} and y in {0,1}
# params['[[x{n-1},x{n-2}],[y{n-1},y{n-2}]]'] = ([p(xn=0),p(xn=1),p(xn=2)],[p(yn=0),p(yn=1)])
params ={
'[[0,0],[0,0]]':([1/3]*3,[1/2]*2),
'[[0,0],[0,1]]':([1/3,1/2,1/6],[1/2]*2),
'[[0,0],[1,0]]':([1/6,1/6,2/3],[1/2]*2),
'[[0,0],[1,1]]':([1/3]*3,[1/2]*2),
'[[0,1],[0,0]]':([1/3,1/2,1/6],[1/2]*2),
'[[0,1],[0,1]]':([7/9,1/9,1/9],[1/2]*2),
'[[0,1],[1,0]]':([1/3]*3,[1/2]*2),
'[[0,1],[1,1]]':([1/3]*3,[1/2]*2),
'[[0,2],[0,0]]':([1/6,1/6,2/3],[1/2]*2),
'[[0,2],[0,1]]':([1/3,1/2,1/6],[1/2]*2),
'[[0,2],[1,0]]':([1/3]*3,[1/2]*2),
'[[0,2],[1,1]]':([1/6,1/6,2/3],[1/2]*2),
'[[1,0],[0,0]]':([1/3,1/2,1/6],[1/2]*2),
'[[1,0],[0,1]]':([1/3]*3,[1/2]*2),
'[[1,0],[1,0]]':([1/3]*3,[1/2]*2),
'[[1,0],[1,1]]':([1/3]*3,[1/2]*2),
'[[1,1],[0,0]]':([1/3,1/2,1/6],[1/2]*2),
'[[1,1],[0,1]]':([1/3]*3,[1/2]*2),
'[[1,1],[1,0]]':([1/3]*3,[1/2]*2),
'[[1,1],[1,1]]':([1/3]*3,[1/2]*2),
'[[1,2],[0,0]]':([1/6,1/6,2/3],[1/2]*2),
'[[1,2],[0,1]]':([1/3]*3,[1/2]*2),
'[[1,2],[1,0]]':([1/3]*3,[1/2]*2),
'[[1,2],[1,1]]':([1/3]*3,[1/2]*2),
'[[2,0],[0,0]]':([1/3,1/2,1/6],[1/2]*2),
'[[2,0],[0,1]]':([1/3]*3,[1/2]*2),
'[[2,0],[1,0]]':([1/3]*3,[1/2]*2),
'[[2,0],[1,1]]':([1/3]*3,[1/2]*2),
'[[2,1],[0,0]]':([7/9,1/9,1/9],[1/2]*2),
'[[2,1],[0,1]]':([1/3,1/2,1/6],[1/2]*2),
'[[2,1],[1,0]]':([1/3]*3,[1/2]*2),
'[[2,1],[1,1]]':([1/3]*3,[1/2]*2),
'[[2,2],[0,0]]':([1/3,1/2,1/6],[1/2]*2),
'[[2,2],[0,1]]':([1/3]*3,[1/2]*2),
'[[2,2],[1,0]]':([7/9,1/9,1/9],[1/2]*2),
'[[2,2],[1,1]]':([7/9,1/9,1/9],[1/2]*2),
}
x,y = gendata(N=5000,params=params,plot_samples=100)
For viewing purposes, we only show the last 100 samples of the sequences. The bottom plot shows the true next-sample probabilities of $X_n$ (i.e. $\theta_x$ at time $n$), because our CTW will output a probability assignment for the next sample $X_n$ given the past of the sequence and the side information. In these plots, larger circles represent higher probabilities.
First we'll produce sequential probability assignments using a depth-2 context tree with access to side information. A couple of notes on use of the CTW class:
- When creating a CTW object, specify:
  - depth - depth of the tree
  - symbols - number of symbols that $X$ can take (i.e. $M$)
  - sidesymbols - number of symbols that $Y$ can take. If this is left blank, then no side info is assumed.
- When calling CTW.predict_sequence(x,sideseq=y), it is assumed that:
  - x only contains values in $0,1,\dots,$ symbols-1
  - y only contains values in $0,1,\dots,$ sidesymbols-1
  - y is omitted if not using side info and included if using side info
- ctw.predict_sequence returns an $M\times N$ matrix of probabilities, where the $n$th column contains the $M$ assigned probabilities for the values of $X_n$.
from ctw import CTW
ctw = CTW(depth=2,symbols=3,sidesymbols=2)
pxs = ctw.predict_sequence(x,sideseq=y)
plotprobs(pxs,estimate=True,plot_samples=100)
We can see that, by the end of the sequence, when using the correct tree with access to side information, the probability assignments match the true probabilities.
Next we will create sequential probability assignments from a depth-3 context tree. While (slightly) less efficient, we would hope that the probability assignments would be equally good after enough samples.
ctw = CTW(depth=3,symbols=3,sidesymbols=2)
pxs = ctw.predict_sequence(x,sideseq=y)
plotprobs(pxs,estimate=True,plot_samples=100)
By reducing the depth to make the tree too shallow, we will expect worse performance. In particular, consider the following two contexts from the parameter dictionary:
'[[0,2],[0,0]]':([1/6,1/6,2/3],[1/2]*2)
'[[0,2],[0,1]]':([1/3,1/2,1/6],[1/2]*2)
Note that they have the same depth-1 context but different depth-2 contexts. Since they have different parameter vectors, the predictability of $X_n$ when the depth-1 context is $(0,0)$ will be much worse.
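To see why, consider what a depth-1 model effectively observes. If (purely for illustration) these two depth-2 contexts were visited equally often, the best a depth-1 model could do for that context is the average of the two parameter vectors, which is much closer to uniform than either one; the equal-frequency weighting here is an assumption, not a property of the source.

```python
import numpy as np

theta_a = np.array([1/6, 1/6, 2/3])  # parameters for context '[[0,2],[0,0]]'
theta_b = np.array([1/3, 1/2, 1/6])  # parameters for context '[[0,2],[0,1]]'

# A depth-1 model cannot distinguish the two contexts, so it effectively
# sees a mixture of the two distributions (weighted equally here).
theta_mix = (theta_a + theta_b) / 2
print(theta_mix)  # approximately [0.25, 0.333, 0.417] -- near uniform
```

Neither strongly-peaked parameter vector survives the mixing, so the depth-1 assignments are necessarily less confident.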
ctw = CTW(depth=1,symbols=3,sidesymbols=2)
pxs = ctw.predict_sequence(x,sideseq=y)
plotprobs(pxs,estimate=True,plot_samples=100)
Similarly to above, if we remove the side information, we can expect a decrease in accuracy of the probability assignments. This "decrease in accuracy" can be used to measure the causal influence of $Y$ on $X$.
ctw = CTW(depth=2,symbols=3)
pxs = ctw.predict_sequence(x)
plotprobs(pxs,estimate=True,plot_samples=100)
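One way to quantify the "decrease in accuracy" is the average log loss of the probability assignments on the observed sequence. This is a sketch, assuming pxs is the $M\times N$ matrix returned by predict_sequence and x is the observed sequence; the helper avg_log_loss is not part of ctw.py.

```python
import numpy as np

def avg_log_loss(pxs, x):
    """Average log loss (bits/sample) of assignments pxs on sequence x."""
    pxs = np.asarray(pxs, dtype=float)
    x = np.asarray(x, dtype=int)
    n = np.arange(len(x))
    # pxs[x, n] picks out the probability assigned to the observed x_n
    return -np.mean(np.log2(pxs[x, n]))

# Toy check: a uniform assignment over 3 symbols costs log2(3) bits/sample.
pxs_uniform = np.full((3, 4), 1/3)
x_toy = np.array([0, 2, 1, 0])
print(avg_log_loss(pxs_uniform, x_toy))  # ~1.585
```

Comparing this quantity with and without side information gives a numerical version of the causal-influence measurement mentioned above.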
Suppose that $X$ and $Y$ are jointly $d$-Markov. Then, when including $Y$ as side info for predicting $X$, we only need a tree of depth $d$. In general, however, we cannot assume that $X$ is marginally $d$-Markov when leaving out the side info. As such, the scenario considered above (example 5) may not be a true representation of how well $X$ can be predicted from its own past. To account for this, we can predict $X$ based on stale side info, where we leave out the $k$ most recent samples of side info.
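A sketch of what "stale" side info means at the level of contexts, under one plausible reading: the context for $x_n$ keeps the full depth-$D$ past of $x$, but the $k$ most recent side-info samples are hidden from the predictor. This is an illustration of the idea only; the CTW implementation may realize staleness differently.

```python
def stale_context(x, y, n, depth, k):
    """Context used to predict x[n] with staleness k: the usual depth-`depth`
    past of x, but with the k most recent side-info samples removed."""
    x_ctx = x[max(n - depth, 0):n]      # x_{n-depth}, ..., x_{n-1}
    y_ctx = y[max(n - depth, 0):n - k]  # side info ending k samples earlier
    return list(x_ctx), list(y_ctx)

x_toy = [0, 1, 2, 0, 1, 2]
y_toy = [0, 0, 1, 1, 0, 1]
# Depth 4 with staleness 2, matching the parameters used below:
print(stale_context(x_toy, y_toy, n=5, depth=4, k=2))  # ([1, 2, 0, 1], [0, 1])
```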
ctw = CTW(depth=4,symbols=3,sidesymbols=2,staleness=2)
pxs = ctw.predict_sequence(x,sideseq=y)
plotprobs(pxs,estimate=True,plot_samples=100)