Like many (all?) static analysis tools, our PlanOut analyzer does not have to be complete, but it should always be sound. Therefore, we must ensure that we never return estimators that are incorrect.

Conditional random assignment presents a complication on this front – in order to reason about conditionally assigned variants, we need to determine what values of variables in a particular guard lead us down a particular path. To solve this problem in the general case, we need an SMT solver.

Do we really need an SMT solver? Why not just SAT?

Previously, I just punted whenever we saw an expression in a guard that evaluated down to a relation, constraining our analysis to only those scripts that used strict boolean expressions in their guards. This meant that programs with conditional random assignment that relied on external boolean variables (e.g. whether a user is in a valid set of participants) were okay, but those that relied on numeric ranges were not okay. At first glance, it might seem that this is an acceptable tradeoff. However, numeric ranges and comparisons are useful for testing, e.g., specifying ranges of whitelisted or blacklisted versions or identifiers like userids or versionids.

What is whitelisting/blacklisting? Whitelisting and blacklisting bring up an important issue that we face when we mechanize and formalize processes that have previously been performed entirely by humans: systems need concrete rules to follow and don’t have human intuition. Note: we want to allow whitelisting and blacklisting – these are important in vivo QA features. While an experiment is running, I may want to test the current version for me and ensure that the control is working for my coworker. My coworker and I technically should not be included in the analysis of the experiment, since our variants were not randomly assigned. However, for a sufficiently large sample, the effect of our non-random assignment would have essentially no impact on the results. It’s up to the analyst to decide whether these data make a difference. You may think, why not just exclude the data? It certainly couldn’t hurt. However, doing so may be tricky, may require extra computing resources, or may simply lead to difficult-to-read code. When there is no formal reason to include or exclude these data, the analyst may choose the option that leads to the simplest extraction or data processing code.

A new attempt, using Z3

Unfortunately, the analyzer does not have any of the analyst’s domain knowledge. We will always need to consider all paths through the program. This is not inherently a bad thing, but it means we need to seriously consider the impact of conditional random assignment and plan accordingly. However, let’s first look at how the analyzer deals with input PlanOut programs.

Parsing. First we parse the JSON representation into a typed AST. The JSON representation has no knowledge of types, so we must either infer them, or using information provided by the user in the form of config files. We have typed and untyped versions of Gets to deal with unknown types, and use any input config information to annotate external sources of randomness.

Normalization. We run constant propagation, mark external variables, reduce arithmetic and boolean expressions, and transform all if-statements into expression that can be evaluated independently, per path. So, if we have some program snippet:

a <- ext_fun_call();
if (a && b) {
  a <- "foo";
} else if (a) {
  b <- "bar";

it will be transformed into:

a <- ext_fun_call();
if (a && b) {
  a <- "foo";
} else if (a && !(a && b)) {
  b <- "bar"
} else if (!(a || (a && b))) {
  # Nothing here.

Clearly we could also add some optimization on guards: the second could reduce to a && !b and the third to !a. It’s not entirely clear to me that we would save time doing this, though. If the program runs slowly in the future, or if it turns out that minimizing the guard is necessary to prove some property in the program, then I will consider doing this in the future. However, for now we just leave them as-is.

Control Flow. Next, we split the program into unique paths according to the control flow operators in the program. We only have two control flow operators to worry about – if-statements and return-statements. Once we have split the program into paths, we compute the implicit control flow on the basis of possible values external operators and random variables may take on.

If the above code snippet were the entirety of its program, then we will have three paths, each starting with the guard, to be interpreted as an assert. This is similar to techniques in concolic execution, where the static analyzer builds up invariants along paths. Since we can reassign variables, we would need to have SSA implemented, or at least bild in a notion of shadowing variables, so that literals point to the correct version of a variables. That is, we could end up with a path that assigns some variable foo, then executes an assert using foo, and then reassigns foo. This pattern happens quite a bit when checking for null or default variables. Although the presence of an order-dependent assert may seem odd, we evaluate paths to compute the propensity score, so this linear evaluation is something we’re doing anyway.

When discussing the possibility of using Z3 with Bobby, he asked why we split into separate traces, rather than maintain the status of all possible worlds at each branch point. I told him I had thought that, but dealing with things on a per-path basis, we would avoid some duplication across paths that would lead to exponential blowup in the state space, but I will need to think carefully about whether that’s really true.

SMT=>SAT=>SMT. Now, back when I was parsing in the JSON representation of the PlanOut program, I also did some term rewriting for convenience’s sake. Remember how I said that we couldn’t deal with relations and other operators in the guards of if-statements? Well, for the interim, I rewrote all relations and unevaluated function calls as boolean values. Unevaluated function calls include random variables, indexing expressions, and external functions with potentially unknown domains. Once all of these expressions have been replaced by boolean literals, we can solve SAT the quick and dirty way, by generating all truth tables. We exclude all assignments that evalute to false, and return all assignments that evaluate to true.

The problem with this approach is that, in eliding the issues introduced by relations etc., we allow the static analyzer to return unsound assignments. For example, when we have disjunctions of injective relations (injective functions), we can generate unsound assignments. Consider the guard var1 || var2, where var1 <- userid == 1234 and var2 <- userid == 2345. Then when we generate all valid assignments for var1 || var2, we will produce the unsound assignment that sets both var1 and var2 to true. In order to know that we can never do this, we must look inside the atoms var1 and var2 to see that have a shared reference that puts a restriction on their downstream assignments. Although we could try to roll our own implementation to track constraints, we decided to give Z3 a shot instead.


So although in principle Z3 should be super easy to use, it’s a huge pain in the neck (as in, my neck is currently causing me pain from having been leaning over my computer for so long). Just kidding! Although the OCaml implementation can be in flux, Z3 abides by the SMTLib protocol. Arjun has written a nice library for communicating with Z3. I’ve adapted it to work with planalyzer.

Z3/Smtlib allow you to, in some sense, enter the entire program into the solver. If all I wanted was to know if a path through a program was unsatisfiable, or if it were satisfiable, what a witness looked like, I would just convert my entire program into a Z3 program of declarations and assertions, send it to Z3, and do whatever I need to do with the response. The problem is that I want something more than this. Specifically, I need to know what parts of a guard need to remain symbolic, and which can be instantiated. Consider the case of a guard having boolean literals and numeric relations. The boolean case will generalize to variables with bounded domain; the numbers represent sets having at least the cardinality of the natural numbers. I pinged Arjun about whether or not Z3 could tell me the cardinality of the satisfying, and he said the problem was squarely in #P and that Matt Fredrickson’s 2014 LICS paper had a supplement that could be used for this. He also warned that computationally, it might be a bit of a drag, so I decided to punt on this for now.

In any case, the current approach is to first test whether the assertion along a path is unsatisfiable. If it is unsatisfiable, then whatever values are bound in the current context are a contradicton and should not be considered when computing experiment variants. If it is satisfiable, then I ask Z3 for a model. I save this model temporarily, while I check to see if the model is unique. I do this by opening a context and pushing negated assertions of equality for the current bindings (i.e., if Z3 says 1234 is a satisfying assignment for userid, then I push an assertion that says userid can’t be 1234). I then ask again whether the conditions are satisfiable. If not, then I have a unique solution. If so, then it’s not unique. If it’s not unique, then I leave the term in its symbolic form. If it’s unique, I replace it in the current context.

Two other cool things I’ve learned from poking around Arjun’s code

  1. local opens: let open <module name> in
  2. includes – this may solve the name clash problem I tried to handle in mutually recursive modules