# Bounding Viral Impact in Experiments

Experimentation on social networks faces a unique problem: two connected members in different cohorts can influence each other's behavior. This social interference undermines the accuracy of experimental results. For more details, see Johan Ugander's talk explaining social interference, and for the technical detail, see chapter 6. In short, the solution is to partition the network so that as few connections as possible cross partition boundaries, then assign cohort treatments to whole partitions. It's a reasonable solution to a complex problem, but costly to implement in experimentation systems. Even once implemented, cross-experiment interference limits the number of experiments you can run simultaneously.

Given these limitations, it's handy to know when to invoke the Kraken solution. Fortunately, a simple two-variable equation in the virality influence V and the observed experimental impact can act as a guide. Virality is when one action triggers another, such as one person sending a message prompting the next person to send a message. Let p < 1 be the probability that an action triggers another. The expected total number of messages in such a chain is then $1/(1 - p)$, and the virality influence V is the expected number of *additional* messages triggered by a single message. In general, V < 0.2 for actions such as likes, comments, and shares. In short, here are the variables:

$V = \frac{1}{1 - p} - 1 = \frac{p}{1 - p}$

| Cohort | Actual Performance | Observed Performance |
|---|---|---|
| A | $x$ | $z$ |
| B | $a \cdot x$ | $c \cdot z$ |
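You can sanity-check the virality arithmetic with a quick simulation. The Python sketch below uses the simplest possible cascade: each action triggers at most one follow-up, with probability p. Real sharing trees branch, so treat this as illustrative only, and note that the function names are hypothetical. The simulated total per seed action should approach $1/(1 - p)$. The extra actions beyond the seed come to $p/(1 - p)$, which is zero when p = 0.

```python
import random


def cascade_size(p: float, rng: random.Random) -> int:
    """Total actions in one chain cascade: the seed action plus every follow-up.

    Hypothetical chain model: each action triggers one further action with
    probability p, and the chain stops the first time it does not.
    """
    total = 1
    while rng.random() < p:
        total += 1
    return total


def expected_total(p: float) -> float:
    """Closed form of 1 + p + p**2 + ... = 1 / (1 - p), valid for 0 <= p < 1."""
    return 1.0 / (1.0 - p)


def expected_extra(p: float) -> float:
    """Actions triggered beyond the seed: 1 / (1 - p) - 1 = p / (1 - p)."""
    return p / (1.0 - p)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns are reproducible
    p = 0.1
    trials = 100_000
    mean = sum(cascade_size(p, rng) for _ in range(trials)) / trials
    print(f"simulated total: {mean:.3f}, closed form: {expected_total(p):.3f}")
    print(f"extra actions per seed: {expected_extra(p):.3f}")
```

With p = 0.1, the closed forms give about 1.11 total actions and 0.11 extra actions per seed, and the simulated mean should land close to 1.11.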

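By assuming, as the worst case, that all interference crosses cohorts, we can bound the actual experimental impact a. Under this assumption, each cohort's observed activity is its own actions plus V induced actions for every action taken in the other cohort. The relation between observed and actual performance is then as follows, where N is a normalizing constant common to both cohorts: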

$z = \frac{(x + aVx)}{N}$

$c \cdot z = \frac{(ax + Vx)}{N}$
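The two equations translate directly into a forward model. This is a minimal sketch under the worst-case assumption above. The function names and example numbers are hypothetical, and N is treated as a shared normalizer that cancels in the ratio c. The example shows how interference pulls the observed ratio toward 1: an actual ratio of 0.9 is observed as roughly 0.907 when V = 0.04.

```python
def observed_performance(x: float, a: float, v: float, n: float) -> tuple[float, float]:
    """Observed performance (z, c*z) of cohorts A and B under worst-case interference.

    Each cohort keeps its own actions and receives V induced actions for every
    action taken in the *other* cohort; n is a shared normalizing constant.
    """
    z_a = (x + a * v * x) / n
    z_b = (a * x + v * x) / n
    return z_a, z_b


def observed_ratio(a: float, v: float) -> float:
    """c = (a + V) / (1 + aV); both x and N cancel out of the ratio."""
    return (a + v) / (1.0 + a * v)


z_a, z_b = observed_performance(x=100.0, a=0.9, v=0.04, n=10.0)
print(f"c = {z_b / z_a:.4f}  (closed form {observed_ratio(0.9, 0.04):.4f})")
# prints: c = 0.9073  (closed form 0.9073)
```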

With a little algebra, we find a beautiful bound:

$a= \frac{c - V}{1 - cV}$
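The algebra is short enough to write out, and it also shows where the formula breaks down. Dividing the second equation by the first cancels both N and x:

```latex
\begin{aligned}
c &= \frac{a + V}{1 + aV} \\
c\,(1 + aV) &= a + V \\
c - V &= a - acV = a\,(1 - cV) \\
a &= \frac{c - V}{1 - cV}, \qquad cV \neq 1.
\end{aligned}
```

The condition $cV \neq 1$ holds comfortably for V < 0.2 and c near 1.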

So, why is it gorgeous? Let us consider the extrema.

• In the ideal experiment, V = 0 and the bound is a = c. Nice! Our bound is tight.
• When all activity in cohort B is a byproduct of cohort A, c = V and the bound is a = 0. So in a highly viral system, the gap between c and a gets scary.

In general, the bound remains tight for low values of V and explodes with increasing social influence. In social networks, there are hundreds of features with V > 0. Fortunately, it's easy to bound how much virality is impacting your results. If the impact exceeds a tolerable amount of error, it's time to bring out the sledgehammer and split up your network. If the error is tolerable, march forward and conquer with your traditional A/B framework! When in doubt, here's the gold:

$a= \frac{c - V}{1 - cV}$
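The bound can also be turned into a small decision helper. In this Python sketch, one end of the interval is the no-interference reading (a = c) and the other is the worst-case formula. The `tolerance` parameter is a hypothetical error budget you would choose for your own experiments, and all names are illustrative.

```python
def worst_case_ratio(c: float, v: float) -> float:
    """Actual ratio a implied by observed ratio c if all interference crosses cohorts."""
    if c * v >= 1.0:
        raise ValueError("the bound is undefined when c * V >= 1")
    return (c - v) / (1.0 - c * v)


def actual_diff_bound(c: float, v: float) -> tuple[float, float]:
    """(low, high) interval for the actual relative difference a - 1.

    One end assumes no interference (a = c); the other assumes the worst case.
    """
    no_interference = c - 1.0
    worst_case = worst_case_ratio(c, v) - 1.0
    return min(no_interference, worst_case), max(no_interference, worst_case)


def needs_network_partitioning(c: float, v: float, tolerance: float) -> bool:
    """True if virality could move the measured difference by more than `tolerance`.

    `tolerance` is a hypothetical error budget, in absolute lift, chosen per team.
    """
    low, high = actual_diff_bound(c, v)
    return high - low > tolerance


low, high = actual_diff_bound(1.03, 0.04)
print(f"actual lift in [{low:+.2%}, {high:+.2%}]")
print(needs_network_partitioning(1.03, 0.04, tolerance=0.01))   # loose budget
print(needs_network_partitioning(1.03, 0.04, tolerance=0.001))  # strict budget
```

With c = 1.03 and V = 0.04, the interval for the actual lift is roughly [+3.00%, +3.25%]. A tolerance of one percentage point (0.01) would therefore not call for partitioning, while a much stricter tolerance of 0.001 would.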

## Additional Examples with V = 0.04

| Cohort | Actual Performance (T) | Observed Performance (T) | Bound on Actual Difference |
|---|---|---|---|
| A | $x$ | $z$ | |
| B | $a \cdot x$ | $0.97 \cdot z$ | [-3.2%, -3%] |
| B′ | $a' \cdot x$ | $1.03 \cdot z$ | [+3%, +3.3%] |
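The example rows can be reproduced from the formula. This minimal sketch assumes, as the table does, that the bound on the actual difference runs from the observed difference (V = 0) to the worst-case value at V = 0.04. The helper names are hypothetical.

```python
def worst_case_ratio(c: float, v: float) -> float:
    """a = (c - V) / (1 - cV): actual ratio if all interference crosses cohorts."""
    return (c - v) / (1.0 - c * v)


def example_rows(v: float, observed_ratios: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each cohort to the (low, high) bound on its actual relative difference."""
    rows = {}
    for cohort, c in observed_ratios.items():
        no_interference = c - 1.0  # a = c when V = 0
        worst_case = worst_case_ratio(c, v) - 1.0
        rows[cohort] = (min(no_interference, worst_case), max(no_interference, worst_case))
    return rows


for cohort, (low, high) in example_rows(0.04, {"B": 0.97, "B'": 1.03}).items():
    print(f"{cohort}: actual difference in [{low:+.2%}, {high:+.2%}]")
```

The worst-case ends come out at about -3.25% and +3.25%, which round to the table's -3.2% and +3.3%.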