Solving Wordle with Greedy Search

Monday, February 21st, 2022

Riding the Wordle viral wave by solving it using greedy search algorithms and a variety of heuristics. A reference Python notebook can be found here, and the underlying code here.

A few weeks ago, I got caught up in the Wordle craze, but not in the usual way. Well, in addition to the usual way — one of my close friends texts me a few minutes after midnight each night with his latest result ❤️. Specifically, I immediately began thinking about ways to “solve” Wordle. Classic nerd sniping!

To start, I reproduced Wordle’s response for a given guess and answer word. This is not trivial, and in fact, I think it would make for a great Fizz Buzz style programming challenge. I then found a corpus of ~2,000 common 5-letter words, and created a dictionary of responses for every guess/answer combination. Turns out this wasn’t too bad, taking up <100 MBs of memory and only taking a few seconds to calculate. Yay!

I decided a reasonable way to “solve” Wordle would be to recursively choose the guess with the minimum expected number of possible answers, assuming each word in my corpus was equally likely to be the answer. How hard would this be to calculate? Looping through each guess and answer and aggregating on each response was proving to be quite slow. To speed things up, I realized an index of possible answers given a guess and response would help tremendously, and producing such an index would only take a few seconds. With that innovation, I’d still have an outer loop for each guess, but my inner loop calculating the expected number of possible answers was now much simpler, looping through the 3 ^ 5 = 243 possible responses and intersecting the answers consistent with that single response with the previous set of possible answers (determined from previous guesses).

This managed to “solve” my list of ~2,000 words in less than a minute, at which point I was curious: does this work? On a phone call with a friend, we tried it out for that day’s word, and sure enough, it got the solution! Was this lucky, i.e. that the word for that day was contained in my list? Some quick searching for previous day’s words led me to discover that Wordle’s source code contains two lists of words: 2,315 possible answers and a further set of 10,657 allowed guesses. Unfortunately, my existing code wasn’t memory efficient enough to handle searching through that many words, so I needed to give it another try with memory in mind.

There’s quite a variety of Python data types available that could theoretically encode a Wordle response — string, int, even a list or tuple of strings/ints, but unfortunately, all of them take a significant amount of memory beyond the minimum 8 bits (2 ** 8 = 256 > 243 = 3 ** 5). Luckily, NumPy’s uint8 had much less memory overhead, so I chose a NumPy array of that type. With a NumPy array, I was also able to remove my inner loop when evaluating the expected number of possible answers, instead relying on the return_count=True option of NumPy’s unique function.

From there, implementing hard mode (guesses must be consistent with previous responses) wasn’t too difficult, other than it expanded my array of responses to be 12kx12k. To speed things up, I tried out the Python concurrent futures module’s ProcessPoolExecutor, spreading the work across (in my case) 8 virtual CPUs. This was my first time doing multi-process work in Python without joblib, and I must say, it wasn’t so bad thanks to the improvements in Python 3. Generating the response array takes ~2 minutes, so not too bad.

So, I’m sure all the Wordle fans are wondering at this point, what’s the best first guess?! Without further ado, guesses with the least expected number of possible answers:

  • ROATE - 60.4
  • RAISE - 61.0
  • RAILE - 61.3
  • SOARE - 62.3
  • ARISE - 63.7

As you can see, Wordle is rather permissive when it comes to allowable guesses. Note: these are not “globally” optimal, but rather a result of greedy search, where our guess is chosen based solely on the current step. A more thorough analysis would be to play out each of these guesses until a solution is reached, and choose the first guess with lower number of average guesses (or some other metric you like).

Generating a solution map takes about a minute, no multiprocessing involved, although I’m sure it could be massively sped up with Numba since it’s basically just loops around NumPy operations. Finally, I wrote up a simple interface to make it convenient to watch the algorithm solve for a particular word, screenshotted to the right.

Inspired by Vincent Tjeng & Laurent Lessard, I decided to also code up three other metrics for my "greedy" tree search solver. The first is a version of the Minimax algorithm, choosing the guess with the smallest worst-case response. Using this algorithm, we get ARISE as the best, first guess. Below, the scores represent the number of possible answers for that worse-case response, with a 0.1 tiebreaker for a guess that could be the answer.

It turns out this worst-case response is the one qntm’s Absurdle chooses! Very fun to play around with. For what it’s worth, the minimax algorithm solves for VOUCH in five guesses:

The next metric from Vincent & Laurent: choosing the guess with the maximum number of potential responses. The guess TRACE has the highest of any one guess with 150 (same 0.1 tiebreaker), out of a maximum possible of 243.

Finally, the fanciest metric: Shannon entropy! Here, we choose the guess with the highest average "information" contained in a response. 3Brown1Blue has a couple of great videos implementing this approach. This metric results in a first guess of SOARE, which is... a young hawk?

Comparing the resulting "solution maps" for each metric shows that my original expected value (all solutions in <= 5 guesses) along with the maximum partitions (lowest mean # of guesses) are clearly better.

Surprisingly, I get very similar results when constraining guesses to the 2,315 "common" words, so those cheat words like SOARE and ROATE weren't doing us much good anyway!

Last but not least, hard mode really is quite a bit harder, with some similar words being indiscernible without a longer string of guesses, e.g. HOLLY, JOLLY, FOLLY, DOLLY, etc.

Using Format