Picking problems for programming interviews.

April 19, 2020. Filed under python, staff-plus, interviewing

Someone recently sent me a note asking whether their internal process for interviewing Staff engineers was a good one. They were particularly concerned that they were encountering a number of Staff-plus engineers who struggled to write code for finding palindromes or reversing an array.

There is an oral tradition about programmers who simply cannot program, captured back in 2007 when Imran Ghory coined the idea of FizzBuzz problems, along with the canonical FizzBuzz question:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

One particularly banal solution in Python3 might be:

for i in range(1, 101):  # 1 through 100, inclusive
    div_3 = i % 3 == 0
    div_5 = i % 5 == 0
    if div_3 and div_5:
        print("FizzBuzz")
    elif div_5:
        print("Buzz")
    elif div_3:
        print("Fizz")
    else:
        print(i)

To be honest, I've never been willing to give a candidate a FizzBuzz question. At best it feels dismissive, and at worst genuinely disrespectful to the candidate. I've also never interviewed a software engineer who simply couldn't solve this kind of problem, although in my career I certainly have interviewed candidates for adjacent roles like Systems Administration who couldn't write this sort of program.

When interviewing Staff-plus engineers, I do believe it's important to verify that they're still capable of the thoughtful creation of software, while trying your best to not get caught up on signals that don't matter much. I've very rarely worked at a company that was constrained by access to folks with deep computer science or mathematics skills. That's not because these skills aren't valuable, but rather because you're likely to already have them in spades. On the few occasions where those skills were critically missing, what we needed was a world-class specialist in a particular area (distributed databases, compilers, etc), rather than something general.

If you're hiring a world-class specialist, by all means you should evaluate them hard on their speciality with custom, specific and hard questions. If that speciality happens to be some facet of computer science, then drill them on that facet! However, for most senior folks, I've found that direct references to computer science aren't particularly good evaluations, and instead the best bet is to find real-life problems that introduce some facets of the domain without centering on them.

When Digg was trying to get acquired some years ago, we did a bunch of phone screens, and one that I remember asked the candidate to write a function that would return the start and stop positions of each mention and hashtag, so that you could render them differently on a webpage versus a mobile app versus a text message.

For example, you might have a tweet like:

Hey @lethain, did you know about a #typo on page 24 of your book?

And then the output would be something like:

[
  {kind: "mention", name: "lethain", start: 5, end: 12},
  ...
]

I like this question a lot, because it's a real, honest problem that you would encounter in real life. You have to understand the problem, understand its purpose, and then solve it. There are also a number of interesting edge cases to think about, such as excluding punctuation from usernames and so on.

At the time I didn't have much experience writing a tokenizer, so I really don't remember what I did, but today I might try to solve it with something along these lines:

txt = "Hey @lethain, did you know about a #typo on page 24 of your book?"

WORD = 'word'
MENTION = 'mention'
HASH = 'hash'

HASH_START = '#'
MENTION_START = '@'
WHITESPACE = ' '
PUNCTUATION = '!?,;:'


def make_token(kind, txt, start, end):
    return (kind, txt, start, end)


def tokenize(txt):
    tokens = []
    acc = ""
    start = 0
    kind = WORD
    for i, ch in enumerate(txt):
        if ch == WHITESPACE or ch in PUNCTUATION:
            # a delimiter ends the current token, if there is one
            if acc:
                tokens.append(make_token(kind, acc, start, i))
            acc = ""
            start = i + 1  # the next token begins after this delimiter
            kind = WORD
        elif acc == "" and ch == HASH_START:
            kind = HASH
            start = i + 1  # token positions exclude the # marker
        elif acc == "" and ch == MENTION_START:
            kind = MENTION
            start = i + 1  # token positions exclude the @ marker
        else:
            acc += ch

    # flush the final token if the text didn't end with a delimiter
    if acc:
        tokens.append(make_token(kind, acc, start, len(txt)))

    return tokens

print(tokenize(txt))

There's a lot to like about this problem: it's real, the problem is easy to explain, it requires almost no scaffolding or boilerplate to get started, and it doesn't require knowing many libraries or built-in methods.

On the other hand, there are a few things that make this a difficult question to give candidates. First, tokenizing text is a fairly specific task, and if you've done it a handful of times in your career then you're going to be much better off than someone who hasn't. When I first got the question I'd never tokenized text before, though since then I've gotten a bit more experience with it. Good interviews don't test for specific knowledge, but for problem solving. Only the rarest engineer I interview would actually end up writing a tokenizer in their work, and even if they did need some sophisticated tokenization, I'd expect them to use a tool like ANTLR rather than writing their own.
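In practice, most working engineers faced with this exact problem would probably reach for regular expressions rather than a hand-rolled tokenizer. A minimal sketch using Python's re module might look like the following; the pattern here is my own rough approximation, not a robust Twitter-entity parser:

```python
import re

# Rough pattern: an @ or # marker followed by word characters. Real
# mention/hashtag rules (unicode, edge punctuation, etc.) are subtler.
ENTITY = re.compile(r'([@#])(\w+)')

def entities(txt):
    results = []
    for m in ENTITY.finditer(txt):
        kind = "mention" if m.group(1) == "@" else "hashtag"
        # start/end cover the name itself, excluding the @ or # marker
        results.append({"kind": kind, "name": m.group(2),
                        "start": m.start(2), "end": m.end(2)})
    return results

txt = "Hey @lethain, did you know about a #typo on page 24 of your book?"
print(entities(txt))
```

Of course, reaching for a regex sidesteps exactly the incremental, stateful thinking the tokenizer version is meant to surface, which is worth keeping in mind when deciding what you want the question to measure.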

The other concern I have with this specific problem is that it's the sort of problem that can be fairly easy while all the pieces fit in your head, but once you make one mental slip the pieces come crumbling down around you. I confused myself a bit writing this code alone in a room, and can easily imagine getting stuck during a timed interview with someone watching me.

At Calm, we've historically relied on having a single staging environment where software was tested against a complete environment, but recently the team rolled out multiple staging environments. This is a major improvement, and opened up an interesting question: how should we assign staging environments to people?

The good-enough solution is to assign one staging environment to each team, but the theoretically ideal solution might be a queue of folks who want an environment, assigning them across the pool of environments as they become available.

I was thinking that might be an interesting interview problem. It's a fairly real problem, doesn't require much backstory to answer, and it avoids the mental load or context-specific experience of something like tokenizing text.

POOL = ['a', 'b', 'c', 'd']
ASSIGNMENTS = {key: None for key in POOL}
WAITING = []

def status():
    txt = "Status\n"
    for key in POOL:
        val = ASSIGNMENTS[key]
        val_txt = val if val else ""
        txt += "\t" + key + ": " + val_txt + "\n"
    txt += "\nwaiting: %s" % (WAITING,)
    print(txt)


def queue(name):
    # hand out the first free environment, otherwise wait in line
    for key, val in ASSIGNMENTS.items():
        if val is None:
            ASSIGNMENTS[key] = name
            return
    WAITING.append(name)


def pop(name):
    # releasing an environment hands it to the first person waiting
    for key, val in ASSIGNMENTS.items():
        if val == name:
            ASSIGNMENTS[key] = WAITING.pop(0) if WAITING else None
            return
    # someone still in line can also give up their place
    if name in WAITING:
        WAITING.remove(name)


ops = [
    (queue, "will"),
    (pop, "will"),
    (queue, "will"),
    (queue, "jill"),
    (queue, "bill"),
    (queue, "phil"),
    (queue, "chill"),
    (queue, "quill"),
    (queue, "fill"),
    (pop, "chill"),
    (pop, "bill"),
    (pop, "chill"),
]

status()
for cmd, val in ops:
    cmd(val)
    status()

This is a problem where it's easy to write something that works, and watch someone debug the issues they run into along the way. Then once someone writes an initial solution, you can imagine a bunch of ways to continue adding complexity to it that require them to refactor their code. For example, you could add a maximum allowed time to use an environment, after which you'd automatically get removed, and so on.
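As a sketch of what that time-limit follow-up might look like — the names, the one-hour limit, and the exact expiry policy are all invented here for illustration — assignments could carry a timestamp and expired holders could be evicted in favor of whoever is waiting:

```python
import time

POOL = ['a', 'b', 'c', 'd']
MAX_SECONDS = 60 * 60  # hypothetical one-hour limit per assignment
# each assignment is now a (name, assigned_at) pair instead of a bare name
ASSIGNMENTS = {key: None for key in POOL}
WAITING = []

def expire(now):
    # evict anyone who has held an environment past the limit,
    # handing the freed slot to the next person in line (if any)
    for key, val in ASSIGNMENTS.items():
        if val is not None and now - val[1] > MAX_SECONDS:
            ASSIGNMENTS[key] = (WAITING.pop(0), now) if WAITING else None

def queue(name, now=None):
    now = now if now is not None else time.time()
    expire(now)
    for key, val in ASSIGNMENTS.items():
        if val is None:
            ASSIGNMENTS[key] = (name, now)
            return
    WAITING.append(name)
```

The optional `now` parameter is there so the expiry behavior can be exercised deterministically in a discussion or test, which is itself a reasonable thing to probe a candidate on.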

Requirements for a good problem

Putting all of this together, I think good problems for evaluating the programming ability of senior and Staff-plus candidates:

  1. Support simple initial solutions and compounding requirements
  2. Are solvable with a few dozen lines of code
  3. Require algorithmic thinking but not knowing a specific algorithm
  4. Don't require juggling numerous variables in your head
  5. Support debugging and evolving the code
  6. Aren't so domain-specific that they advantage arbitrary prior experience
  7. Require writing actual code in language of their choice
  8. Are done in a real code editor

I'm sure there are other criteria that are important, but generally I think those are a good starting point.


As a final warning against mathematical problems, I still fondly remember when, in my younger days, I thought it would make sense to give one of the easier-but-not-too-easy Project Euler problems to a candidate with a math background. It backfired when they immediately produced the closed-form solution rather than writing code to compute it.
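To make that concrete with a hypothetical stand-in (the actual problem I used is lost to memory): Project Euler's first problem asks for the sum of all multiples of 3 or 5 below 1000. Most of us would write the loop, but a candidate with a math background can collapse it into three arithmetic-series sums via inclusion-exclusion:

```python
def sum_multiples_below(k, n):
    # sum of k, 2k, ..., mk for the largest mk < n, using the
    # arithmetic-series formula: k * m * (m + 1) / 2
    m = (n - 1) // k
    return k * m * (m + 1) // 2

# inclusion-exclusion: multiples of 3, plus multiples of 5,
# minus the multiples of 15 counted twice
closed = (sum_multiples_below(3, 1000)
          + sum_multiples_below(5, 1000)
          - sum_multiples_below(15, 1000))
looped = sum(i for i in range(1000) if i % 3 == 0 or i % 5 == 0)
print(closed, looped)  # both 233168
```

The closed form is a perfectly good answer — it just tells you nothing about whether the candidate can write and debug programs, which was the point of asking.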

Just as some folks will find the problems you select to be artificially hard if you pick the wrong sorts, you'll also find other folks who'll find them artificially easy. Picking general problems that start small and evolve into more complex problems throughout the interview allows you to mostly solve for both.