
A Brutal Look at Balanced Parentheses, Computing Machines, and Pushdown Automata - braythwayt
http://raganwald.com/2019/02/14/i-love-programming-and-programmers.html
======
kjeetgill
This is amazingly well written. You covered a good deal of any introduction to
computational theory course in a straightforward, well motivated manner. This
is definitely something I'll be passing around to some of my bootcamp friends
who are curious to taste some of the theory they don't get exposure to.

Ironically, what I thought made this so effective and interesting was
that it dug into the theory of a _really really easy_ programming problem.
This is a reasonable homework assignment for any first year student. Maybe not
month one, but year one for sure.

> That suggests the question: Is balanced parentheses a good interview
> question?

If you just needed a solution to solve the problem and maybe check a few edge
cases; yeah, it's great because it's simple. If you're using it to broach a
conversation on automata, regular expressions, and context free languages well
yeah delving into that is bad unless you're writing real parsers/compilers at
this gig.

I was really put off by the musing over whether it's a good interview
question. For such a beautifully written piece it seems to say: if you don't
expect candidates to know as much as this don't ask it at all.

If you want to make an apple pie from scratch, you must first create the
universe.

...

But if I ask you to make an apple pie I'm not going to ding you for starting
with apples, flour, sugar, and butter.

~~~
braythwayt
I took your feedback seriously, tell me what you think:

[https://github.com/raganwald/raganwald.github.com/commit/bec...](https://github.com/raganwald/raganwald.github.com/commit/bec3a460c0fcd8afa78502c8af79b3e19d1c6e6f)

~~~
kjeetgill
I like it! I think it cuts closer to what _I_ think people generally should
know at the very least.

> An obvious question is, _Do you need to know the difference between a
> regular language and a context-free language if all you want to do is write
> some code that recognizes balanced parentheses?_

You have a real talent for motivating the conversation around theory. I like
the subject matter so I rarely need more than: hey, I think this is cool. But
when I share it with others it says all the things I can't!

------
kerkeslager
This is a very well-written essay which explains its computing concepts
effectively and thoroughly. I upvoted.

However, the bit at the end with the balanced parentheses matchers being a bad
interview question seems to be missing the forest for the trees. A simple
solution to this problem which has nothing to do with any of the computational
theory discussed:

    
    
       def is_balanced_parens(s):
           nesting_depth = 0
           contains_nesting = False
    
           for c in s:
               if c == '(':
                   nesting_depth += 1
                   contains_nesting = True
               elif c == ')':
                   nesting_depth -= 1
               else:
                   return False
    
               if nesting_depth < 0:
                   return False
    
           return contains_nesting and nesting_depth == 0
    

This isn't that hard of a problem, and if an interview candidate couldn't come
up with a basic solution like this, I'd very much like to know that.

~~~
braythwayt
As I mentioned elsewhere, I do not find it a hard problem, and neither do most
programmers that I know.

BUT.

Most of the people in my "tribe" had some formal education in computer
science.

Our experience when hiring people is that there are programmers with a lot of
raw talent and potential who will need a full slot (about 45 minutes of actual
problem-solving time) to come up with something like the code you suggest.

There will be no time to discuss their solution, ask them how to handle
multiple types of parentheses, &c.

What most of those "They seem really smart, but struggled to balance
parentheses" candidates seem to have in common is coming to programming
without the formal education, e.g. through a "hacker school."

Whereas, the folks coming to us directly out of a university with a
traditional CS curriculum grasp the basic idea right away, and we usually get
past the first question and on to a follow-up in the same 45 minutes.

We already know from their resumé that some of the folks are coming from
Waterloo or wherever, so it adds no useful information to us for them to do a
lot better than the people with less formal backgrounds. For this reason, we
now only ask this question of people doing work terms from universities with a
formal CS curriculum.

When people are coming to us via another path, we now give them another
question entirely, and we will be retiring this one outright in the near
future.

~~~
user5994461
Matching parentheses is a puzzle question. Either the candidate knows to use a
stack, or he's gonna spend one hour to figure it out.

~~~
deathanatos
I disagree that it's a puzzle.

A reasonable person could look at it, and go, okay, so a parenthesized string
is made up of some sequence of

    
    
      ( <more balanced parens> )
    

possibly repeated, to get things like "()()()". i.e., do that check in a loop.

That is, for whatever position you're at, check that the first character is
"(". Check that at position+1 there is a balanced parentheses string. Check
that at position+1+len(the balanced paren string) there is a ")".

Consume as many of those as you can in your loop; return the consumed data as
the "balanced string". Now, this function has to return the string (since our
use of it above needs the length to know where the inner string ends). It
stops after it can't consume anything further. The whole string is balanced if
we consumed everything. _That's it._

The stack is implicit here. Yeah, you have to know recursion, but that's the
thing about interview questions: at some point you have to know something.
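
The reasoning above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `consume_balanced` and `is_balanced` are made up, the function returns an end index rather than the consumed string, and it assumes the empty string counts as balanced.

```python
def consume_balanced(s: str, pos: int = 0) -> int:
    """Return the index just past the longest balanced run starting at pos."""
    while pos < len(s) and s[pos] == "(":
        # Recursively consume the balanced string inside this pair.
        inner_end = consume_balanced(s, pos + 1)
        # The inner run must be followed by a closing ")".
        if inner_end >= len(s) or s[inner_end] != ")":
            break  # this "(" can never be closed, so stop before it
        pos = inner_end + 1
    return pos


def is_balanced(s: str) -> bool:
    # The whole string is balanced only if everything was consumed.
    return consume_balanced(s) == len(s)
```

The call stack plays the role of the explicit stack, so very deeply nested input would run into Python's recursion limit.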

I asked elsewhere, and it hasn't been answered in this thread: if this is a
bad question, what would make it a _good_ one? I would love to see what the
author's idea of a good question is.

For the real-world applicability of this: I deal primarily w/ HTTP APIs.
Sometimes, that requires me to make use of certain HTTP headers, and a great
deal of web "frameworks" out there do not parse HTTP headers; they just give
you back the raw key-value map as strings or somesuch. _Several_ of HTTP's
headers are context-free. (They require at least a stack to parse.)

~~~
user5994461
It's a bad interview question IMO because it's a puzzle question. Either you
know the magic trick to use a stack and you'll be done in a few minutes, or
you could spend an hour to come up with a solution struggling to cover the
basic edge cases.

As the interviewer, all you can do is watch the developer suffer for a while,
without being able to give help or shift to a different problem. That doesn't
give you any indication of the ability to code or solve problems.

~~~
deathanatos
> _It's a bad interview question IMO because it's a puzzle question._

You asserted this in your first message. The point of my reply is to show that
it's not a "puzzle", and that a reasonable line of thinking and reasoning can
guide you through the problem to an answer. The naïve answer one might get
here is fairly "undisciplined", in that there's no rigor in language theory,
no deep discussion, but the question is still answerable.

> _the magic trick to use a stack_

My suggested line of reasoning does not use a stack, beyond function calling.
If using the actual stack is a "magic trick", again, I think we have removed
so much material from consideration as to not be able to ask anything of use.

Again, if we simply dismiss any line of reasoning that a candidate might not
see as a "puzzle", then every question is a puzzle, and nothing can be asked
in an interview. Simply repeating your thesis that it's a puzzle does not
advance this discussion: do you not think the line of reasoning I laid out is
reasonably obtainable in an interview? And if not (since I think it's a fairly
minimal one), then what types/sorts of questions should an interview ask,
since it would seem like I'm left with only things that are dead obvious (b/c
if it wasn't, it's now a magic trick puzzle), and that leaves me with nothing
to screen candidates with.

As an interviewer, I want to see you struggle with something you might not
fully understand, and I want to know your reasoning and thinking as you go
along with it. The real job is not going to come with a manual, after all.

~~~
user5994461
Incrementing on an opening parenthesis and decrementing on a closing
parenthesis is trivial. You can surely agree that it's more of an aha moment to
get there than a long line of reasoning that may or may not have led to it.

What do you do if the candidate cannot come up with the solution in 10
minutes? I've been there and find it embarrassing. It doesn't indicate
aptitude or lack of aptitude.

I find that system design, performance optimization or sometimes code review
have the great advantage of being progressive. The candidate can go in
multiple directions. The interviewer has the opportunity to lead to find
strengths and match with experiences on the resume, while it's also possible
to assist or to pull away from any particular aspect.

For coding questions, either stay on simple problems with straightforward
solutions, could be as stupid as printing numbers from one to ten, to test the
ability to actually write code. Or a progressive exercise that must have both
trivial solutions and optimized solutions and then can be integrated into a
bigger function.

------
skybrian
I skimmed, but feel like he should have talked about how in an interview
question, you could solve this by counting parens with a 64-bit counter. There
are limits on practical input size and on stack size, so arguments about
infinite languages don't directly apply.

A deeper question to consider: why do we study infinite languages and then
apply the results to finite problems? What do we learn this way?

~~~
Epholys
Yeah, that's the first solution that popped in my mind too. I think it needs
just a special case at the beginning to check if the first paren is not a
closing one.

EDIT: yep, after reading the comments below I see that I really should have
put just a little more thought before dismissing this as too trivial to think
about.

~~~
braythwayt
The stack solution checks that the stack is non-empty before popping.

The counter solution simply needs to check that the counter is non-zero before
decrementing.

All cases then work perfectly.
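
A minimal sketch of the two checks side by side, for a single bracket type. The function names are hypothetical, and skipping non-parenthesis characters is an assumption (the earlier example in this thread rejects them instead).

```python
def balanced_with_stack(s: str) -> bool:
    stack = []
    for c in s:
        if c == "(":
            stack.append(c)
        elif c == ")":
            if not stack:  # nothing to pop: unmatched ")"
                return False
            stack.pop()
    return not stack  # leftover "(" means unbalanced


def balanced_with_counter(s: str) -> bool:
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            if depth == 0:  # same guard as the non-empty stack check
                return False
            depth -= 1
    return depth == 0
```

With one bracket type the stack only ever holds copies of `(`, so its length carries all the information and a counter is enough. Python integers do not overflow; in a language with fixed-width integers the counter would bound the nesting depth.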

------
pflats
> _Asking someone who hasn’t been recently exposed to computing theory to
> write a balanced parentheses recognizer is asking them to reinvent the basic
> research of people like Kleene and von Dyck, extemporaneously. And then write
> code under the interviewer’s watching eye. That is unlikely to show these
> candidates in their best light, and the results become very uneven._

> _Outside of a special-case like certain CS students, this question is likely
> to give very inconsistent results, and those results are going to be
> dominated by a candidate’s recent familiarity with the underlying problem,
> rather than their coding skills._

> _In most cases, that makes for a bad interview question._

This only makes it a bad interview question if the goal of the question is to
only hire the candidates that can code the solution.

Otherwise, it can be a very good interview question. How do they go about
"reinvent[ing] the basic research of people like Kleene and von Dyck,
extemporaneously"? What questions do they ask? What ideas do they come up
with? What resources do they say they would consult? Under pressure, how do
they behave? If they make a mistake, are they coachable? Do they lash out or
double-down on mistakes? If they mention using a language's regex, do they
understand the performance tradeoffs of various different features?

There is a lot to see by asking the question beyond just "whether or not
candidate X can code a solution".

~~~
pflats
Beyond that, I have taught theory of computation in the past, and I think the
rest of the article is a solid writeup. The DPA section might need another
pass; that felt the most uneven. Things that jumped out at me:

- If the audience is programmers unfamiliar with automata theory, I am not
sure how many would know what a Turing machine, von Neumann machine, or even a
program counter is.

- Clarifying what is meant by "internal" and "external" state.

~~~
tropo
There is no program counter. Computers don't normally need to count programs.
("let's see, I have calc.exe, that's one, and notepad.exe makes two...")

There is an instruction pointer. Note that it doesn't count. It frequently is
incremented, but it can jump forward and even backward. It points at an
instruction.

~~~
spc476
There's a lot of literature that states otherwise. Of the assembly languages
I've studied (and within reach of me I have references to the Motorola 6809,
680x0, VAX, MIPS, Z80, 6502, and x86) it's only the x86 line that used the
name `IP` (Instruction Pointer [1])---the rest all use `PC` (Program Counter).
And (to be even _more_ pedantic) that register (`IP` or `PC`) always points to
the _next_ instruction to be executed, never the current instruction being
executed. And just to note, on the VAX, the `PC` register is also `R15`, so it
_can_ participate in any instruction as either a source or destination;
whether that's a good idea is another matter.

[1] Technically, until you get to the 64-bit version, it's `CS:IP` (Code
Segment, Instruction Pointer).

~~~
tropo
There's a lot of literature that repeats a harmful misnomer.

The register is always an instruction pointer, even if the hardware
incorrectly calls it a program counter.

The processor that matters most, by far, is x86. At one point PowerPC wasn't
too far behind, and PowerPC still dominates in high-end network gear. Both of
these processors are correctly documented. For x86 the name is ip, eip, or
rip. For PowerPC the name is usually nip, meaning "next instruction pointer".
Sometimes the PowerPC documentation will use "current" instead of "next", or
"address" instead of "pointer". All of these are correct.

~~~
spc476
And the processor that matters the most after the x86 is the ARM, which uses
`PC`. I suppose all ARM documentation is incorrect then.

Also, in electrical engineering, the direction of electricity is taught
backwards from what it really does. In all sources I've read, electricity
flows from positive to ground, whereas in reality, it's the opposite. Good
luck in changing that.

------
agumonkey
after that you're ready to read about
[https://en.wikipedia.org/wiki/Catalan_number](https://en.wikipedia.org/wiki/Catalan_number)
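
The connection is that the number of balanced strings with n pairs of parentheses is the nth Catalan number. A small brute-force check, as a sketch only (the helper names are made up, and the enumeration is exponential, so keep n small):

```python
from itertools import product
from math import comb


def catalan(n: int) -> int:
    # Closed form: C(2n, n) / (n + 1), always an integer.
    return comb(2 * n, n) // (n + 1)


def is_balanced(s: str) -> bool:
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def count_balanced(n: int) -> int:
    # Enumerate every string of length 2n over "(" and ")" and count the balanced ones.
    return sum(is_balanced("".join(p)) for p in product("()", repeat=2 * n))
```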

------
karthickgururaj
Sorry - I'm not able to understand the post. I fully understand the problem -
I ask this question sometimes during interviews - with _two_ bracket
characters. Say, () and []. So a string like "([)]" is not correctly balanced.

Some folks try with counters - but that very quickly gets out of hand. I try
to prod them to a solution - by giving patterns that will defeat their
solutions.

What I'm expecting is that the person can identify the use of a stack here,
which (and this is my source of confusion here) - is the most natural way to
solve this problem. Push when we encounter an opening bracket, pop and match
when we encounter a closing bracket. The function is generic, within 10 lines
in C, can easily be extended to more bracket characters.
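
A rough Python sketch of both approaches (not the C version mentioned above). The names `PAIRS`, `balanced_by_counters` and `balanced_by_stack` are hypothetical, and ignoring non-bracket characters is an assumption.

```python
# Maps each closing bracket to its opening partner; extend to add more types.
PAIRS = {")": "(", "]": "["}


def balanced_by_counters(s: str) -> bool:
    # One counter per bracket type: tracks depth but not ordering.
    counts = {opener: 0 for opener in PAIRS.values()}
    for c in s:
        if c in counts:
            counts[c] += 1
        elif c in PAIRS:
            if counts[PAIRS[c]] == 0:
                return False
            counts[PAIRS[c]] -= 1
    return all(n == 0 for n in counts.values())


def balanced_by_stack(s: str) -> bool:
    stack = []
    for c in s:
        if c in PAIRS.values():
            stack.append(c)  # push on an opening bracket
        elif c in PAIRS:
            # pop and match on a closing bracket
            if not stack or stack.pop() != PAIRS[c]:
                return False
    return not stack
```

On `"([)]"` the per-type counters all return to zero, so `balanced_by_counters` wrongly accepts it, while `balanced_by_stack` rejects it because the `)` arrives while `[` is on top of the stack.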

So - what is the point of this post?

~~~
poopchute
It's a quick intro to computability theory. There is a progression of more and
more complicated 'machines' on the way from Finite Automata towards Turing
Machines. This article goes over what types of things each machine can and
can't solve. It's more of a theoretical topic than a practical one (and why
you'll occasionally see silly things like someone implementing a Turing
machine in PowerPoint to show that PowerPoint could compute anything).

------
deathanatos
> _Outside of a special-case like certain CS students, this question is likely
> to give very inconsistent results, and those results are going to be
> dominated by a candidate’s recent familiarity with the underlying problem,
> rather than their coding skills._

> _In most cases, that makes for a bad interview question._

Well, I'm going to have to disagree there. Asking the question and expecting
full knowledge of the theoretics behind it, yes. But real-world "practical"
software engineering often requires writing parsers, and yes, parsers for
context-free languages (whether the parser-writer realizes that or not). A
simple, hand-rolled recursive descent parser would suffice for this language.

> _But what if someone hasn’t seen this stuff in a decade? Or learned to
> program without going through a formal CS education? Is it reasonable to
> expect them to work out the ideas in an interview?_

Someone, in a decade of SWE experience, has never needed to implement a
parser?

Parsing, as a category of things to talk about, has stuff that someone with a
theoretical background should know, but it is also so eminently practical
that someone without the degree should have hit it and be able to answer it.
I'm not asking that they realize this is a deterministic context free
language, or that a PDA is the appropriate machine to parse such a language.

The point of balanced-parens as an interview question is not to get you to
recreate the entire history and hierarchy of languages. It's to get you to
write a for loop that requires a moderately involved data structure (a stack)
to see if you have any ability to reason and to code.

(I also dislike asking the single-variant form of this question, as you can
solve it w/ a simple counter. If you involve a second type of matched
character, e.g., {}s, then it should require a PDA.)

Are other engineering disciplines so readily accepting of folks w/o training
in the field in which they're applying for a job?

If not this, then what? "They might not have run into this!" is why we
(should…) ask more than one question in an interview. But I feel like this
refrain gets used to justify never asking _any_ question which might reveal
that half the candidates that apply for "Senior Software Engineer" positions
cannot express proficiency in _any_ programming language…

~~~
braythwayt
Where I work, we track statistics on our interview questions. We are retiring
this one because we believe that the results are strongly skewed by whether
the candidate has had exposure to the theory behind it.

We only use this for people at the beginning of their career, mind you. We
have completely different expectations of someone with ten years of experience
than we do of someone applying for a CAP (aka "work-term") or SDE I (aka
"entry-level") position.

It's not that good people can't work it out from first principles, but it's
clearly a lot more work for those who have to figure it out from first
principles than for those who know immediately that a stack is involved and
apply their focus to the loop and the stack and handling follow-up questions.

The ideal question or problem from our perspective is one that is equally
challenging for everyone, that way when we're having a candidate review, we
won't need a lot of, "...But then again, the candidate clearly knew nothing
about parsing, so they actually did quite well, considering."

~~~
thaumasiotes
> The ideal question or problem from our perspective is one that is equally
> challenging for everyone

It seems like you could solve this by asking a question no one has ever been
exposed to before, _or_ by exposing each candidate to the theory relevant to
the question yourself.

Except that the first strategy is guaranteed to fail some of the time, because
there are no questions like that.

Why not go the other way?

~~~
braythwayt
There is an interesting communication problem here.

If I could guarantee that every candidate would read my blog, I'd happily
point them to an essay like this, and then in the interview we could work on
something meaty, like pattern matching.

That would suit your second recommended strategy. But I wouldn't want to write
an essay like this, and then favour those candidates who like to read Hacker
News or follow me on Twitter. We're trying to select for talent, not for
membership in a tribe :-)

So we're retiring this question, and we already have a few replacements that
we use for many candidates. But I won't blog about them, in an attempt to hew
closely to the first strategy you suggest.

~~~
thaumasiotes
I think you could pretty easily guarantee that every candidate will read email
from you (where "you" are their point of contact with the company) prior to
the interview. Why do you need to direct them to the essay from your blog?

------
praptak
A silly trick question: are balanced parentheses expressions palindromes?

~~~
noir_lord
No because () reversed is )(

~~~
pygy_
Also, they are not necessarily mirror images:

    
    
        (()())()

~~~
noir_lord
Yep but I went for the simplest case that answers the question.

~~~
pygy_
Indeed, my comment was more "even if you count `()` as a visual palindrome, it
doesn't work".
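
Both readings can be checked mechanically. A small sketch, where "visual palindrome" is taken to mean "equal to its own reversal with `(` and `)` swapped" (that definition and the function names are assumptions):

```python
def is_palindrome(s: str) -> bool:
    return s == s[::-1]


# Translation table that swaps "(" and ")".
MIRROR = str.maketrans("()", ")(")


def is_visual_palindrome(s: str) -> bool:
    # Reverse the string, then flip each bracket, as a mirror would.
    return s == s[::-1].translate(MIRROR)
```

A non-empty balanced string starts with `(` and ends with `)`, so it can never be a character-level palindrome. `"()"` is a visual palindrome under this definition, but `"(()())()"` is not, since its mirror image is `"()(()())"`.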

