
Optimizing a breadth-first search - yoha
https://www.snellman.net/blog/archive/2018-07-23-optimizing-breadth-first-search/
======
frankmcsherry
https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search

Edit: for context, this wasn't meant to be "zomg how you so dumb" so much as
"everyone should also read, cause is relevant".

~~~
rrobukef
This was my thought too, but I don't want to assume: he cites enough
standard-but-specific sources that he should know the general search
algorithms. Also, since he doesn't mention standard depth-first search at
all, perhaps it simply wasn't relevant?

~~~
jsnell
As mentioned briefly in the introduction, I didn't have good heuristics to use
for a scoring function. A totally undirected DFS didn't seem like a great
option. The readme links to an existing Python-based solver for the same game
[0], which had both BFS and DFS modes. The DFS one was considerably slower on
large puzzles even when given the optimal target depth.

Totally happy to believe I'm wrong about that, though :)

[0] https://github.com/apocalyptech/snakebirdsolver
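For reference, the BFS here is essentially the textbook version with a
visited set. A minimal sketch of that shape (State, expand() and is_goal()
are hypothetical stand-ins, not the solver's actual types):

    #include <cstdint>
    #include <queue>
    #include <unordered_set>
    #include <utility>
    #include <vector>

    // Hypothetical packed game state; the real solver's representation differs.
    using State = uint64_t;

    std::vector<State> expand(const State& s);  // stand-in: legal successors
    bool is_goal(const State& s);               // stand-in: puzzle solved?

    // Plain BFS with a visited set: each state is expanded at most once, so
    // the number of visits equals the number of unique reachable states.
    int bfs_depth(State start) {
        std::unordered_set<State> visited{start};
        std::queue<std::pair<State, int>> frontier;
        frontier.push({start, 0});
        while (!frontier.empty()) {
            auto [s, depth] = frontier.front();
            frontier.pop();
            if (is_goal(s)) return depth;
            for (State next : expand(s)) {
                if (visited.insert(next).second)  // true iff not seen before
                    frontier.push({next, depth + 1});
            }
        }
        return -1;  // no solution
    }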

~~~
frankmcsherry
It's a super interesting post, and I've no reason to think that iterative
deepening would be better, just that it is designed to deal with exactly this
problem. The lack of growth in novel states may invalidate its main
hypothesis (tree-like growth), though.
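
For anyone who hasn't run into it: iterative deepening is just a
depth-limited DFS rerun with increasing limits, which keeps DFS's memory
footprint while still finding shortest solutions. A minimal sketch, borrowing
the hypothetical State/expand()/is_goal() stand-ins from the sketch upthread:

    // Depth-limited DFS: true iff a goal is reachable within `limit` moves.
    bool dls(const State& s, int limit) {
        if (is_goal(s)) return true;
        if (limit == 0) return false;
        for (State next : expand(s))
            if (dls(next, limit - 1)) return true;
        return false;
    }

    // Iterative deepening: rerun the search with growing limits. Memory stays
    // proportional to the depth, but it only beats BFS when the state space
    // grows tree-like, i.e. when few alternate paths lead to the same state.
    int iddfs_depth(const State& start, int max_depth) {
        for (int limit = 0; limit <= max_depth; ++limit)
            if (dls(start, limit)) return limit;
        return -1;
    }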

The Python version seems to do DFS using function calls, and I could imagine
this isn't the most performant way to do it. Also, the description implies
that their DFS implementation retains state; I'm not sure what to make of
that, and I'm not a Python reader:

> Depth-First should be a bit kinder to system memory, though it'll still chew
> up quite a bit remembering which game states we've seen before.

~~~
jsnell
Fair enough, and it's easy to test :)

I made a non-iterative DFS, but hardcoded the maximum depth to the length of
the optimal solution. On a trivial puzzle that should take <1ms, it takes 6
seconds. The BFS visits 303 states, the DFS 9M (but only 237 unique ones).
That is with deduping the states on the path. If we allow the same state on
the path multiple times, it'd be 12s, 50M visited states, 239 unique states.
This is with a random visiting order; it'd be a bit worse with a static
visiting order.

This is obviously not fully optimized code (whee, std::unordered_set). But
fixing that won't help when we're off by 3-4 orders of magnitude. I think the
shape of this game's search graph just isn't well suited to any form of DFS:
there are far too many alternate paths to each state.
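
The shape of what I tested was roughly this (a sketch, not the actual code,
reusing the hypothetical State/expand()/is_goal() stand-ins from upthread):

    #include <unordered_set>

    // Depth-limited DFS that dedupes only against the current path, so the
    // same state is happily re-explored via every alternate path to it.
    // That re-exploration is exactly where the 303-vs-9M visit gap comes from.
    bool dfs(const State& s, int limit, std::unordered_set<State>& on_path) {
        if (is_goal(s)) return true;
        if (limit == 0) return false;
        for (State next : expand(s)) {
            if (!on_path.insert(next).second) continue;  // already on this path
            bool found = dfs(next, limit - 1, on_path);
            on_path.erase(next);  // backtrack
            if (found) return true;
        }
        return false;
    }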

~~~
rrobukef
Thank you. It's interesting to see the explosion of states. You could add a
cache (per search depth) to deduplicate states again; it would cut down the
small cycles, and its memory use is easier to control.
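
One way to do that (a sketch, again with the hypothetical
State/expand()/is_goal() stand-ins from upthread): remember the largest
remaining depth budget each state has been searched with, and prune any
revisit that doesn't bring more budget.

    #include <unordered_map>

    // Depth-limited DFS with a per-state depth cache. A revisit is pruned
    // unless it arrives with a larger remaining budget, which removes the
    // alternate-path blowup while bounding the cache by the unique states.
    bool dfs_cached(const State& s, int limit,
                    std::unordered_map<State, int>& best) {
        if (is_goal(s)) return true;
        if (limit == 0) return false;
        auto [it, inserted] = best.try_emplace(s, limit);
        if (!inserted) {
            if (it->second >= limit) return false;  // searched with >= budget
            it->second = limit;
        }
        for (State next : expand(s))
            if (dfs_cached(next, limit - 1, best)) return true;
        return false;
    }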

Kudos for the encoding. 0.7 bits per state is dense.

------
stcredzero
Something has happened to Comp Sci programs over the past 3 decades. Based on
what's admittedly too small a sample size (the graduates I've been
interviewing in SF), it seems like a very large number of graduates from CS
programs with GPAs of 3.75 or above can't do much more than glue together
libraries, can't practically design a system on their own, and, if ever
confronted with a graph theory problem, can't do much more than name-drop
algorithms, falling far short of being able to implement them.

There are literally problems that 1) were once covered in freshman year, 2)
could once be recognized and solved by CS grads in seconds, 3) stump recent
CS graduates, 4) prompt HN commenters to say how they could solve them if
given a few days, and 5) come up in conversation if you go to meetups and
talk to people doing actual work.

How does this relate to BFS? It used to be that someone trained as a computer
scientist would look at a data structure or a graph and start running some
quick gedankenexperiments: What would happen if I tried to find that with DFS?
What would happen if I tried to find that with BFS? Those aren't going to be
suitable solutions for all problems, but it's a good place to start thinking.
There seem to be a large number of recent grads who can't even get that far.

~~~
taeric
I think you're looking at the past through very rose-colored glasses. The
best of the best could do that, likely. However, few could ever do things in
seconds. Nor is there really any benefit to being able to solve something in
seconds.

Interestingly, it seems to me that our industry has been dominated by people
who got good at gluing things together. To a very large degree. We bemoan
this when we talk about how much more responsive machines used to be, but I
think there can be very little denying that computers do more.

Granted, I suspect I am at best one of the bad graduates you are referencing.
:(

~~~
hashkb
Nobody here would deny the business value of gluing APIs together. We're not
asking for your empathy, either, but we are telling you the truth about the
difference between where the bar was and where it is now.

The idea that "computers are fast, my code just needs to work" is a really
really horrible way to treat your users' devices. A whole host of similar
ideas are now popular, and it's pretty obvious that it's because the industry
is now dominated by people that are (just) good at gluing things together.
This is a bad thing.

~~~
taeric
I'd wager the bar wasn't as high back then as you think. Most people still
wrote inefficient things. Lots of it. There's a hope that most of the
inefficient things simply stalled out because software had to be much more
frugal with resources, but I don't know of any data backing that.

So, seriously, do you have data showing that the bar is lower? Or are you
just applying selection and survivorship bias to get such a negative view?
(Similarly, am I doing the same to get a positive view?)

My main qualm is with people who seem to hold incoming folks to the bar they
themselves meet today, without acknowledging the growth that was necessary to
get there. As a parent, I fully expect my children will be better at most
everything than I am. In time. Same for most of the younger generation coming
out of college into the industry. Most are, or will be, better than I am.
Pretty much full stop.

~~~
stcredzero
_I'd wager the bar wasn't as high back then as you think. Most people still
wrote inefficient things. Lots of it._

Lots of the people who wrote inefficient things were the C students in Comp
Sci. A lot of the others writing bad code were home-grown programmers who
were completely missing pieces of knowledge. From what I saw as an undergrad,
it wasn't the 3.75-GPA CS students who were doing those things.

_So, seriously, do you have data showing that the bar is lower?_

It's not my job to do such studies; however, I do interview job applicants.
A disturbingly large number of applicants, who all have 3.75+ GPAs, try to
tell me nonsense like "null pointers take up no data." I'd say I encounter
something about that egregious from a bit short of half of them. In my
recollection of interactions with classmates, I could take it for granted
that CS students who made it past freshman year knew enough to implement
algorithms and could actually write a recursive function. If you had tried to
talk to them about such things as some kind of deep esoteric knowledge, they
would've just given you a funny look.

~~~
taeric
More assertions that I just don't know that I buy. I don't ultimately think
you have to take my assertions as valid counterpoints, but as things
currently stand, I don't think either of us has actually presented any data.
I'm just not asking anyone to believe that things are truly different than
they used to be.

I've similarly been doing interviews for upwards of 20 years now. There are
some particularly bad interviews I've done recently, but there were some
particularly bad ones towards the beginning of my career, as well. Worse,
some of the senior engineers I used to be under were quite bad, all told.
(Which does not take away from the many I worked with who were bloody
amazing.)

~~~
stcredzero
_I don't ultimately think you have to take my assertions as valid
counterpoints_

Many of your supposed counterpoints are straw men. No one is expecting
someone to have memorized, or to be able to reinvent, KMP or this algorithm
or that. What we're looking for is basic understanding and _conceptual
tools_.

_I'm just not asking anyone to believe that things are truly different than
they used to be._

But things are clearly different than they were years ago. The number of CS
grads has fluctuated a lot, in response to increased investment and industry
bubbles bursting. Things are clearly very different today than they were in
the early 90's.

https://nces.ed.gov/programs/digest/d12/tables/dt12_349.asp

We also covered "gluing things together" in the early 90's. The concept was
already very well worn even when I was a newly minted CS grad, with Jon
Bentley discussing it in Programming Pearls using awk. The thing is this: We
would also glue libraries together, but we did that while applying our
generalist knowledge. It seems to me that there are a whole lot of CS grads
who get through their entire undergrad education pretty much only doing that.
Maybe that's all well and good, and they can accomplish many great things this
way. However, it seems to me that much of the basic generalist knowledge is
now mistakenly labeled as "specialist" and is needlessly missing from the
general population.

~~~
taeric
(Just so it isn't completely lost: I basically merged my answer to this into
the branch above. I realized it was the two of us on both of these offshoots,
and didn't see the point in pretending it was two discussions. :) )

------
megaman22
> 100GB of memory would be trivial at work, but this was my home machine with
> 16GB of RAM. And since Chrome needs 12GB of that, my actual memory budget
> was more like 4GB. Anything in excess of that would have to go to disk (the
> spinning rust kind).

I hope this is a joke, although it's a little scary totalling up how much
memory Chrome is chewing up right now, with just one window and seven tabs
open...

