War Tournament

Wednesday, September 13th, 2017

If you'd like to follow along at home, I've published the code for this post on Github

Last Friday, FiveThirtyEight's Friday puzzle column, The Riddler, posted another exciting game theory competition. I'll let the author, Oliver Roeder, explain:

You and a random opponent are playing a simplified game of War. Both you and your opponent have 13 cards in your deck: 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king and ace. You can arrange these cards in any order you want. You’ll play a one-round game, where you go through all 13 of your cards just once. Both you and your opponent draw a single card off the top of your deck and compare them. If your card outranks your opponent’s, you get a point. (No points are awarded for ties.) After all 13 cards have been shown, the player with the most points wins.

I’ll match you up against every player who submits, and the player who wins the most games overall will be this week’s Express winner.

However, to enter into this tournament, you must first beat me, the house. I have a deck that is in this order: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A. I, being the house and all, get an additional advantage in that I also win ties. Plus I can choose to play my decks forward or backward — the order above or A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2. Your deck must be able to beat both of my decks to enter the tournament.

If this sounds familiar to my two previous posts on The Riddler's two previous Colonel Blotto tournaments, then you're onto something. The key lies in the "beat the house" requirement to entry. There are 13! or 6,227,020,800 permutations of the thirteen cards (what I'll refer to as a deck), only ~15% of which beat the house. Without the entrance requirement, the Nash equilibrium of this game is simply a random choice of those six billion options, i.e. no deck is better than any other. To visualize this, I've simulated a tournament of 40,000 random entrants, and calculated their win margin percentage against the field. This is defined a deck's winning percentage minus its losing percentage. Here, I've plotted the win margins as a function of their percentile:

As you can see, the spread of win margins is quite minimal, and can entirely be explained by chance. For example, the best deck in the field only won 1.7% more games than it lost, with a record of 16717 wins, 7247 ties, and 16036 losses. Another way to visualize this is by comparing the win margin of a random sample of decks against two different random fields. In the figure below, I've plotted the win margin of 1,000 random decks against two different random fields of 10,000 decks:

The R squared, or coefficient of determination, is practically zero, showing no relation between the scores. Thus, even the most successful decks against field one were completely average against the second one, i.e. the scores are due to random chance. However, when requiring that decks beat the house, these figures change noticeably. First, the percentiles of win margin percentage for a field of 40,000 random decks that beat the house:

The spread of win margins is much larger, with some entries dominating the field, and others being, well, utterly dominated. Next, we'll repeat the correlation plot, comparing the win margin of a random sample of decks that beat the house against two different similarly qualified random fields. Again. this is 1,000 decks scored against two fields of 10,000:

Notably, the error between the fields is similar, both in rms and mean absolute terms, since the same random variation occurs given the identical sample sizes. However, the R squared is now at 1 to three significant digits since the spread of win margins is so much larger.

So clearly, with the beat the house requirement, some decks are better than others, but which ones? In order to answer that question, I've plotted on the right the probability distribution of cards for each hand in the deck. The color of each bar represents the average win margin percentage of decks with that card in that hand.

The histogram for the first hand shows that the most likely card is a 3, with high cards being less likely than lower ones. Decks that start off with a mid-valued card are more successful, while a deck with a 2 or Ace at the top will on average lose 11-12% more games than it wins.

One of the first things to note from this chart is the symmetry of the hands in a deck. Beating the house means beating the cards in ascending and descending order, thus any valid deck's reverse order is also a valid deck, i.e. its reverse beats the cards in descending and ascending order. That leads to hands 1 and 13 having an identical distribution, along with 2 and 12, and so on.

Interestingly, it appears that a successful strategy involves playing mid-value cards at the beginning and end, with low and high cards in the middle. However, any one card placement only has so much impact on the win margin, with the spread of colors representing -15% to +10%, while the actual win margins vary all the way from -60% to +40%. Clearly, this level of analysis can only get us so far. Thus, its time to break out the optimization!

In my two previous posts on the Colonel Blotto tournament, I used a simple multi-start hill climbing algorithm to search the space for optimal troop distributions. Since the troop distributions were discretized, i.e. the number of troops defending each castle was a round number, each distribution had 90 nearest neighbors, found by taking a troop from one castle and moving him/her to another. The hill climber, given an initial start, would try out one nearest neighbor at a time (in random order) until it found a better one, at which point it would step to that distribution and start its search of nearest neighbors all over. At some point, after stepping one troop movement at a time, it would find a local maximum, at which each nearest neighbor was inferior. To combat the existence of many local but not global maxima, I would start the algorithm multiple times, i.e. multi-start, giving me a higher likelihood of searching the entire space.

It turns out a very similar framing can work wonders for this tournament. We can step through all possible decks by swapping one card with another, our new definition of nearest neighbor. In this case, a deck has 78 nearest neighbors to choose from. The only catch is that some of these nearest neighbor decks no longer beat the house, so before stepping to a new deck, we have to check its validity first.

Running this optimization algorithm with 1,000 random starts over a random field of 10,000, I found that the deck [5, 8, 7, 3, 4, K, Q, 2, A, J, 9, 10, 6] has a win margin of 52%. The distribution of win margins for locally optimal decks looks like the following:

Optimizing over the random field is interesting, but ultimately not the way to win this tournament. This is because the field against which our deck will be matched up will not be random. Instead, each competitor will, to varying degrees of obsessiveness, also be strategically planning their own deck. The question is, to what degree?

As I discussed in my second Colonel Blotto post, this tournament can be thought of as a Keynesian beauty contest:

... professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preferences of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one's judgment, are really the prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practise the fourth, fifth and higher degrees.

In this analogy, first degree thinkers randomly choose the first deck they find that beats the house. Second degree thinkers optimize around beating the first degree thinkers, i.e. the deck we previously found. Third degree thinkers optimize around beating the second degree thinkers, and so on. An infinitely rational thinker corresponds with the Nash equilibrium.

I previously used publicly available data from a "Guess ⅔ of the Average" game to estimate the "level of rationality" of the public, but in the case of this game, I don't think that data applies very well. Its clear that the complexity of a game partially determines to what depth the players analyze it. While "Guess ⅔ of the Average" is simple, and only takes a multiplication of ⅔ to dive one level deeper, some games, like this one, require a deep analysis to do so. As well, the audience of the Riddler is decidedly more analytical than your average Jack & Jill. For these reasons, I arbitrarily chose a distribution of 57% first level, 29% second level, and 14% third level thinkers (each level has half of the previous one).

One final note: one important consideration to my analysis of the Colonel Blotto tournament was factoring in the "trolls," i.e. players intentionally playing sub-optimal strategies. Notably, someone found and submitted a troop distribution guaranteed to lose every time. Impressive! By my calculation, trolls made up about a quarter of the entrants.

However, even after channeling my inner troll, I was unable to come up with a strategy meant to throw people off and/or lose for this game. For that reason, I didn't take them into account.

The results of the tournament should post in this week's Riddler on Friday, so look out for an update soon thereafter!

UPDATE: FiveThirtyEight didn't end up publishing the data from this competition in the next week's post, and since I didn't place in the top 5, I'll never know how I did. Ah well! That said, it looks like they incorrectly reported the percentage of decks that beat the house, so that's a bummer.


Questions | Comments | Suggestions

If you have any feedback and want to continue the conversation, please get in touch; I'd be happy to hear from you! Feel free to use the form, or just email me directly at matt.e.fay@gmail.com.

Using Format