# Bayesian modeling of StarCraft II ladder performance

I've been in a bit of a pickle recently. I really need to figure out Bayesian inference as practice for my master's thesis. *What kind of cool project - hopefully with my own data - could I make?*, I thought as I fired up StarCraft II in the evening, as I usually do to unwind nowadays. *What kind of fun use of PyMC3, the neat Python library for probabilistic programming, could I showcase?*, I wondered as I watched my ladder rating fall from the distraction. *What kind of useful knowledge could I try to acquire with it?*, I mused, watching my game performance fluctuate over the course of months.

And then it hit me.

In this post, I'm going to use PyMC3 to analyse my 2019 StarCraft II ladder games. In particular, I'm going to look at MMR - MatchMaking Rating, a metric of ladder performance - both mine and my opponents', and what the difference between them can tell us about my win chances.

### A few facts about StarCraft

- SC2 is a fast-paced real-time action/strategy game played competitively all over the world;
- Each game of SC2 lasts about 10-30 minutes;
- There are three *races* in SC2 - essentially factions - each with a completely different playstyle and toolset; I play Protoss (the advanced space aliens) and the other two ~~suck~~ are Terran (scrappy future humans) and Zerg (hivemind insectoid aliens).
- There is no tribal animosity between players of the races in the community whatsoever.

- Each 1v1 game pits two randomly selected players of similar MMR against each other. Better players have higher MMR, and it's used to find **worthy** (adequate) **opponents** for you. A Protoss player has three different possible matchups - Protoss vs Protoss (PvP), PvT and PvZ.

### A few facts about Bayesian inference

- it's an alternative, computationally intensive approach to statistics (as opposed to the frequentist statistics you probably know)
- it's a really neat tool for formulating complex models of processes occurring between entities
- it can let you infer knowledge about quantities not directly included in your data at all (so-called "latent" quantities)
- it can combine data with your initial assumptions ("priors") or initial beliefs about quantities in your system
- this means you need to explicitly state what you believe about the data first

- it returns probability distributions for your parameters, rather than simple point estimates like means or variances
- this makes handling asymmetric uncertainties and error bars much easier

- it's a great framework for learning new knowledge from data, as I'll try to show
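To make the "distributions, not point estimates" idea concrete, here's a toy Beta-Binomial sketch - the prior and the win/loss counts are purely illustrative, not taken from my ladder data:

```python
from scipy.stats import beta

# Toy Beta-Binomial example: start with a prior belief about a win
# probability, observe some games, and get back a posterior
# *distribution* rather than a single point estimate.
prior_a, prior_b = 2, 2     # weak prior: winrate probably somewhere near 50%
wins, losses = 7, 3         # made-up observed data

# Conjugacy makes the posterior another Beta distribution:
posterior = beta(prior_a + wins, prior_b + losses)

print(posterior.mean())          # a point summary, if you really want one
print(posterior.interval(0.94))  # an (asymmetric) 94% credible interval
```

Note how the interval around the mean comes out lopsided - exactly the kind of asymmetric uncertainty that a single mean-plus-variance summary would hide.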

In this post, we're going to use my own dataset of ladder replays. The motivation is this: there are days when I play terribly and there are days when I play my heart out. It does, however, feel like my performance fluctuates a lot. I thought I could use Bayesian modelling to learn something about these fluctuations.

I was going to have an example of pulling this data using ZephyrBlu's replay parser library and an unreleased custom wrapper I have for that. However, since I'm getting a ton of warnings that would distract from the main ideas, in the name of simplicity I'll just post the parsed version on GitHub. I'll come back to parsing those once I finish that library of mine.

```
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/StanczakDominik/stanczakdominik.github.io/src/files/replays.csv", index_col=0)
df['time_played_at'] = pd.to_datetime(df.time_played_at)
df
```

## MMR vs Elo

We're going to need a way to connect MMR to winning probabilities. Each game won awards a certain amount of MMR, which depends on the difference between the two players' pre-match MMR. If you win against a player you weren't expected to beat, the system awards you more MMR - and the other player loses that same, larger amount.
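Blizzard doesn't publish the exact update rule, but the zero-sum exchange described above can be sketched Elo-style. The function names and the `K` factor here are purely illustrative, not SC2's actual internals:

```python
def expected_score(rating, enemy_rating, scale=880):
    """Expected win probability given the rating difference."""
    return 1 / (1 + 10 ** (-(rating - enemy_rating) / scale))

def update(rating, enemy_rating, won, K=44):
    """Zero-sum update: both ratings shift by the same amount,
    in opposite directions. `won` is 1 for a win, 0 for a loss.
    The K factor is a made-up illustrative value."""
    delta = K * (won - expected_score(rating, enemy_rating))
    return rating + delta, enemy_rating - delta

# An upset win against a stronger opponent awards more than K/2 points:
me, enemy = update(3900, 4100, won=1)
print(me, enemy)
```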

Interestingly, according to some wonderful sleuthing by u/MisterL2 on r/starcraft, MMR is basically the classic chess Elo rating, except for a simple scaling - 100 Elo is basically 220 MMR. With the Elo estimated win percentage formula depending on the difference in Elo ($\Delta_{Elo}$) as

$$ P(\text{win}) = \left(1 + 10^{-\Delta_{Elo}/400}\right)^{-1} $$

we can adjust to MMR as

$$ P(\text{win}) = \left(1 + 10^{-\Delta_{MMR}/880}\right)^{-1} $$

As a quick check, let's compare with the sample data provided:

```
import numpy as np
import matplotlib.pyplot as plt

def MMR_winrate(diff):
    return 1 / (1 + 10**(-diff/880))

diffs = np.arange(0, 600, 100)
diffs = np.concatenate([-diffs[:0:-1], diffs])
winrates = np.array([0.5, 0.564, 0.626, 0.686, 0.742, 0.793])
winrates = np.concatenate([1 - winrates[:0:-1], winrates])

plt.scatter(diffs, winrates, label="Data from u/MisterL2")
diffs_plot = np.linspace(diffs.min(), diffs.max())
approx = MMR_winrate(diffs_plot)
plt.plot(diffs_plot, approx, label="Our approximate formula")
plt.legend()
plt.setp(plt.gca(), xlabel=r"$\Delta_{MMR}$", ylabel="Win %", ylim=(0, 1));
```

Splendid.

## Digging into the data

We'll need the MMR differences, and we'll use those to calculate the expected winrates:

```
df = df.sort_values('time_played_at')
df['enemy_mmr'] = df['mmr'] - df['mmr_diff']
df['expected_winrate'] = MMR_winrate(df.mmr_diff)
df
```

```
import altair
altair.Chart(df).mark_circle().encode(
    altair.X('time_played_at'),
    altair.Y('mmr', scale=altair.Scale(zero=False)),
    color='enemy_race',
).interactive()
```

There are a couple of clear outliers:

- there are some games at an MMR of -36400; I may be bad at StarCraft, but not *that* bad. I'm not willing to trust our replay parser about them, so I'll just throw these out.
- the games at an MMR of 0 are mostly custom or vs-AI games, which get recorded in replays as well. We'll dump them too.
- there are a few games where I offraced as Zerg, and the game tracks a separate MMR for each race you play. We'll skip those as well and look only at the Protoss perspective.

```
data = df[(df.mmr > 0) & (df.enemy_mmr > 0) & (df.race == "Protoss")]
data
```

```
altair.Chart(data).mark_circle().encode(
    altair.X('time_played_at'),
    altair.Y('mmr', scale=altair.Scale(zero=False)),
    color='enemy_race',
).interactive()
```

I'd say it's a steady climb upwards, but sometimes it feels nothing like that.

I'm also going to limit my current analysis to 2019 replays (I did have more before October, but I think I lost them somewhere):

```
data2019 = data[(data['time_played_at'] > '2019-01-01') & (data['time_played_at'] < '2020-01-01')]
data2019
```

```
altair.Chart(data2019).mark_circle().encode(
    altair.X('time_played_at'),
    altair.Y('mmr', scale=altair.Scale(zero=False)),
    color='enemy_race',
).interactive()
```

We can now take a look at whether our `expected_winrate` metric makes sense:

```
altair.Chart(data2019).mark_circle().encode(
    x=altair.X('enemy_mmr', scale=altair.Scale(zero=False)),
    y=altair.Y('expected_winrate'),
    color='win',
).interactive()
```

```
altair.Chart(data2019).mark_circle().encode(
    x=altair.X('enemy_mmr', scale=altair.Scale(zero=False)),
    y=altair.Y('expected_winrate'),
    color='win',
    facet='enemy_race',
).interactive()
```

There are a few things we can already say from this:

- Enemy Protosses at 4k MMR were already a large challenge, as I won no games whatsoever against them. This is, in fact, what inspired me to write this post!
- PvT was my best matchup - it has the most wins.
- PvZ games were mostly even, though there are a good number of lost games that I probably should have won! This seems to point to MMR alone not being enough to estimate my winrate properly - there's, for example, strategic variation in Zerg play to account for. I'm still somewhat confused about what to do against a very defensive, lategame-oriented Zerg player.

Let's see if our estimates were right:

```
data2019.groupby('enemy_race').win.mean()
```

Ouch. That PvP still hurts. Well, enough sulking! We'll get back to this point later - promise - but for now, let's get right to

## The Bayesian model

The idea is going to be simple. I'm going to assume that:

- my true MMR is some random number around 4k, wobbling maybe 300 up or down from that number: $\mu \sim \text{Normal}(4000, 300)$.
- in each game, my effective MMR is a random normal variable $\text{MMR}^n \sim \text{Normal}(\mu, \sigma)$ (where the superscript denotes that we sample $n$ of these, one per game) that lumps together a ton of effects, some of which could be:
    - the size of my breakfast that day
    - time of day
    - enemy race
    - whether I'd exercised before (I should really start tracking that, come to think of it...)
    - the map the game is played on

- I have no idea how much my effective per-game MMR varies, so I'm just going to say the variance of MMR is going to be some positive number, and I'll assume a half-normal distribution with a scale of 100: $\sigma \sim \text{HalfNormal}(100)$. Let's take a look at that distribution:

```
from scipy.stats import halfnorm
x = np.linspace(0, 500)
PDF = halfnorm(scale=100).pdf(x)
plt.plot(x, PDF);
```

which is another way of saying "some positive number, small-ish, probably not larger than 200".
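We can sanity-check that reading directly, by asking how much of the HalfNormal(100) prior mass sits below 200 and where its 95th percentile lands:

```python
from scipy.stats import halfnorm

prior = halfnorm(scale=100)
print(prior.cdf(200))   # prior mass below 200 - about 0.95
print(prior.ppf(0.95))  # 95th percentile - about 196
```

So "probably not larger than 200" is a fair verbal summary of this prior.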

Okay, but that's the MMR. We need some way of connecting that to our won and lost games! Luckily, we have the `MMR_winrate` formula: we know that the ladder system essentially models each game as a biased coin *(Pro)* toss (that joke is much funnier when you're the Protoss on the ladder), with the bias being equal to the estimated winrate. The probability of a coin toss landing heads is usually modelled with the Bernoulli distribution.
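Before encoding this in PyMC3, we can sketch the same generative story forwards in plain NumPy - all numbers below are made up for illustration, not fitted values:

```python
import numpy as np

rng = np.random.default_rng(42)

def MMR_winrate(diff):
    return 1 / (1 + 10 ** (-diff / 880))

# Illustrative generative story: effective MMR wobbles around a true mean,
# and each game is then a biased coin flip with the Elo-style win probability.
true_mmr, sigma = 4000.0, 80.0
enemy_mmr = rng.normal(4000, 150, size=1000)            # hypothetical opponents
effective_mmr = rng.normal(true_mmr, sigma, size=1000)  # one draw per game
p_win = MMR_winrate(effective_mmr - enemy_mmr)
wins = rng.binomial(1, p_win)                           # Bernoulli draws
print(wins.mean())  # should hover around 0.5 for evenly matched games
```

Inference is just this story run backwards: given the `wins` column, recover plausible values of `true_mmr` and `sigma`.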

Let's now go ahead and use PyMC3 to encode that into a Bayesian model:

```
import pymc3 as pm
import arviz as az

with pm.Model() as bernoulli_model:
    mmr_μ = pm.Normal('μ', 4000, 300)
    mmr_σ = pm.HalfNormal('σ', 100)
    mmr = pm.Normal('MMR', mmr_μ, mmr_σ, shape=data2019.enemy_mmr.size)
```

Note the `shape` argument - we sample one effective MMR for each game in the dataset. Now we deal with calculating the expected winrates for our games:

```
with bernoulli_model:
    diffs = pm.Deterministic('MMR_diff', mmr - data2019.enemy_mmr)
    p = pm.Deterministic('winrate', MMR_winrate(diffs))
```

`Deterministic` variables are quantities that depend in a predictable way on your random variables, once their values are known. We could skip the wrapper, but labelling them as `Deterministic` makes them easy to track later.

We can now add the actual data using the `observed` keyword:

```
with bernoulli_model:
    wl = pm.Bernoulli('win', p=p, observed=data2019.win)
```

And now we press the magic inference `pymc3.sample` button! I'll go into the details another time (when I understand them better myself!). In a nutshell, though, `sample` is going to launch a few "random walks" (technically, Hamiltonian simulations of particle motion! Those are the details I wasn't going to explore today...) in parameter space. Ideally, these explore a good number of parameter value sets and find the ones that fit the data well.

I'll also wrap the results in the neat `arviz` Bayesian visualization library.

If you're running this live, this is a good time to put the kettle on:

```
with bernoulli_model:
    bern_data = az.from_pymc3(trace=pm.sample(2000, tune=2000, chains=4))
```

Aaaaand everything crashed. We can take a look at these results, but they won't be pretty:

```
var_names = ['μ', 'σ']
az.plot_trace(bern_data, var_names=var_names);
```

Each colored line represents one of the trajectories, with histograms of their values on the left and the actual (pseudo)time trajectories during sampling on the right. As we can see, the chains don't exactly agree with each other. Divergences (the vertical lines at the bottom... yeah) mean that our random walkers, or probability-space particles, flew off to infinity, forgetting all about energy conservation and that kind of jazz. A model with divergences is, basically, bad: the chains didn't manage to sample the parameters well, getting stuck in different regions of parameter space, with slow variations and definitely not independent steps. This completely messes up the histograms. If the chains disagree as badly as they do here, that's not a model to be trusted.

We can also look at where in particular in parameter space the model started having trouble:

```
az.plot_pair(bern_data, var_names=var_names, divergences=True);
```

Well, there's clearly some issues at the bottom, at low values of $\sigma$. It is, however, kind of unhelpful.

I don't quite understand the next part fully myself just yet, but as I (perhaps imperfectly) think of it: what people usually do in this case is decouple sampling the **amplitude** or **variance** $\sigma$ of our estimated MMR from sampling the **direction** and **value** of each particular deviation, and from the mean itself. We treat the mean as we did before:

$$\mu \sim \text{Normal}(4000, 300)$$

We'll also take the variance as usual:

$$\sigma \sim \text{HalfNormal}(100)$$

What we'll also do, however, is grab a set of standard normal variables:

$$\sigma_\text{norm}^n \sim \text{Normal}(0, 1)$$

And we'll calculate the effective MMR per game as:

$$\text{MMR}^n = \mu + \sigma \cdot \sigma_\text{norm}^n$$

This is called a noncentered formulation, and is apparently a big deal. If I understand correctly, it makes exploring variations in the amplitude easier when the possibly small per-game deviations - the separate standard normal helper variables - are disconnected from it and from the mean. More on this once I understand it better myself.
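A quick NumPy check makes it plausible that, for fixed values of the parameters, the two parameterisations describe the same distribution (illustrative μ and σ values below):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 4000.0, 80.0, 100_000

# Centered: sample MMR directly from Normal(mu, sigma).
centered = rng.normal(mu, sigma, size=n)

# Noncentered: sample standard normals, then shift and scale.
sigma_norm = rng.normal(0, 1, size=n)
noncentered = mu + sigma * sigma_norm

# Same distribution, different geometry for the sampler:
print(centered.mean(), noncentered.mean())  # both ≈ 4000
print(centered.std(), noncentered.std())    # both ≈ 80
```

The payoff is purely computational: the sampler explores `σ` and the standard-normal helpers far more easily than the tightly coupled centered version.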

Without further ado, the corrected model:

```
with pm.Model() as bernoulli_model_noncentered:
    mmr_μ = pm.Normal('μ', 4000, 300)
    mmr_σ = pm.HalfNormal('σ', 100)
    mmr_σ_norm = pm.Normal('σ_norm', 0, 1, shape=data2019.enemy_mmr.size)
    mmr = pm.Deterministic('MMR', mmr_μ + mmr_σ * mmr_σ_norm)
    diffs = pm.Deterministic('MMR_diff', mmr - data2019.enemy_mmr)
    p = pm.Deterministic('winrate', MMR_winrate(diffs))
    wl = pm.Bernoulli('win', p=p, observed=data2019.win)
    trace = pm.sample(2000, tune=2000, chains=4)
    bern_noncentered_data = az.from_pymc3(
        trace=trace,
        prior=pm.sample_prior_predictive(),
        posterior_predictive=pm.sample_posterior_predictive(trace),
    )
```

That seems to have worked nicely!

```
var_names = ['μ', 'σ']
az.summary(bern_noncentered_data, var_names=var_names)
```

So my true MMR in 2019 was somewhere around 3973, with a standard deviation of ~65. A more Bayesian way of saying this: according to the model, we're 94% certain the true MMR is between 3850 and 4094 (ArviZ reports the 3% and 97% bounds of a 94% interval by default, which is where the confusing 97% label comes from). That's the Highest Posterior Density - HPD - interval.

Note how this standard deviation is an okay summary for the roughly Gaussian $\mu$, but completely fails to make sense for the skewed $\sigma$, where I'm just going to say its upper (97%) bound is 188, with a mean of 81.

Also note that HPD means what it says - according to this model, we're 94% sure that the variable lies within these bounds. The comparable construct in frequentist statistics - the confidence interval - says... something that nobody really understands. I can find the papers on that for you if you'd like, but basically, people think it says what the HPD says, and it doesn't, really.
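To make the HPD concrete, here's a minimal from-scratch version of the "shortest interval containing 94% of the samples" idea (arviz computes something along these lines for you); the exponential sample is just an illustrative stand-in for a skewed posterior:

```python
import numpy as np

def hpd(samples, mass=0.94):
    """Shortest interval containing `mass` of the samples."""
    sorted_s = np.sort(samples)
    n_kept = int(np.floor(mass * len(sorted_s)))
    # Slide a window spanning n_kept samples; keep the narrowest one.
    widths = sorted_s[n_kept:] - sorted_s[:-n_kept]
    start = np.argmin(widths)
    return sorted_s[start], sorted_s[start + n_kept]

rng = np.random.default_rng(1)
skewed = rng.exponential(scale=100, size=50_000)
lo, hi = hpd(skewed)
# For a skewed distribution, the HPD hugs the mode (here, zero) rather
# than splitting the leftover 6% of mass symmetrically between both tails.
print(lo, hi)
```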

```
az.plot_trace(bern_noncentered_data, var_names=var_names);
```