
Searching 20 GB/sec: Systems Engineering Before Algorithms (2014) - misframer
http://blog.scalyr.com/2014/05/searching-20-gbsec-systems-engineering-before-algorithms/?1
======
snewman
Hi! Great to see this pop back up on HN. I'm the author of the blog post (and
Scalyr founder), happy to answer any questions.

Downthread, someone mentioned that they couldn't find the HN discussion from
when this was originally posted; it's here:

[https://news.ycombinator.com/item?id=7715025](https://news.ycombinator.com/item?id=7715025)

------
hglaser
Great post.

BTW, this product (Scalyr) is a lifesaver. We (Periscope) are able to operate
~ a dozen heterogeneous servers with no FT DevOps largely because of Scalyr.

------
twotwotwo
Lots of attention goes to OLTP-type loads for good reasons, but when you do
design to just stream fast, some fun things happen:

You can use lots of relatively cheap spindles in parallel, and think of each
one as (at least) 100MB/s of sequential read speed and a couple TB of space.
You have fast compression available that can increase your effective bandwidth
and make the effective cost of space cheaper.
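
To put a rough number on the compression point, here's a sketch (using
`java.util.zip.Deflater`, with made-up log-like sample data) of how a
compression ratio translates into effective read bandwidth:

```java
import java.util.zip.Deflater;

public class EffectiveBandwidth {
    // Generate repetitive, log-like sample data (entirely made up).
    static byte[] sampleLogs() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("2014-05-08 12:00:00 INFO request ")
              .append(i % 97).append(" handled\n");
        }
        return sb.toString().getBytes();
    }

    // Compress once at write time and report the ratio; every later
    // sequential scan reads that many fewer bytes from disk.
    static double compressionRatio(byte[] raw) {
        Deflater def = new Deflater(Deflater.BEST_SPEED);
        def.setInput(raw);
        def.finish();
        byte[] out = new byte[raw.length + raw.length / 1000 + 64];
        int clen = def.deflate(out);
        def.end();
        return (double) raw.length / clen;
    }

    public static void main(String[] args) {
        double ratio = compressionRatio(sampleLogs());
        // A 10x ratio turns a 100 MB/s spindle into ~1 GB/s of raw logs.
        System.out.printf("ratio: %.1fx -> 100 MB/s reads like %.0f MB/s%n",
                ratio, 100 * ratio);
    }
}
```

If the data compresses 10x, a spindle delivering 100 MB/s of compressed bytes
feeds the scanner the equivalent of 1 GB/s of raw logs, at the cost of some
CPU for decompression.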

You can draw on well-understood ways to search, sort, do hash- or sort-based
joining and grouping, and so on.

Streaming doesn't need a big in-memory cache to avoid disk seeks, so you can
use those gobs of RAM for other things--aggregating results or holding data to
join against, say. (Of course, if you don't need the RAM, disk cache might
still be useful for some access patterns.)

Besides log search, you see a stream-fast approach in analytics-focused DBs:
BigQuery, Redshift, Vertica, and open-source ones--Facebook put up a good post
about the work that led to their Hive ORCFile design.

Some bioinformatics tools load a big hashtable into memory and, roughly, hash-
join against a ton of raw data streamed from disk, sometimes repeating the
process with another hashtable.
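
That load-a-hashtable-then-stream pattern is essentially a classic hash join.
A minimal sketch (row shapes and names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingHashJoin {
    // Build phase: the small side (e.g. a reference table) fits in RAM.
    static Map<String, String> build(List<String[]> smallSide) {
        Map<String, String> table = new HashMap<>();
        for (String[] row : smallSide) {
            table.put(row[0], row[1]);  // key -> payload
        }
        return table;
    }

    // Probe phase: the big side is consumed as a sequential stream (an
    // Iterable stands in for a straight disk scan) and probed row by row.
    // No index, no seeks -- just RAM lookups against streamed data.
    static List<String> probe(Iterable<String[]> bigSide,
                              Map<String, String> table) {
        List<String> joined = new ArrayList<>();
        for (String[] row : bigSide) {
            String payload = table.get(row[0]);
            if (payload != null) {
                joined.add(row[0] + "," + row[1] + "," + payload);
            }
        }
        return joined;
    }
}
```

The big side never needs to fit in memory; it just has to stream past fast
enough, which is exactly what cheap spindles are good at.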

These are not at all original observations, but I managed to hear about these
sorts of analytics and bioinformatics tools for a while before really getting
how or why they did things all that differently from a typical random-access-
oriented database.

------
imaginenore
I have another idea for you guys. Instead of relying on expensive AWS SSD
instances, why not switch to Hetzner, and keep everything in RAM?

128 GB RAM for $135/month:

[https://www.hetzner.de/en/hosting/produkte_rootserver/px120](https://www.hetzner.de/en/hosting/produkte_rootserver/px120)

And you will have so much extra disk space, you can use it for backups. Or
even resell it.

Your i2.4xlarge costs you $2,455/month.

~~~
noir_lord
Tried Hetzner last year; I'd never touch them with a bargepole.

~~~
imaginenore
What's wrong with them?

~~~
noir_lord
Had the machine for ~6 weeks.

In that time there were 4-5 days when I couldn't even log in because the
network was "under attack"; they didn't tell me, I told them.

Their management interface is an exercise in frustration.

Customer support is horrific: "We are not aware of any ongoing network
issues", yet I couldn't ping the machine from any of my other servers anywhere
in the world.

You get a fast machine for a cheap price, but the network is a joke, the
support is a joke, and they are under a near-perpetual DDoS.

------
imaginenore
I wonder why they chose Java for substring search. Why not C (strstr) or grep?

[http://www.arstdesign.com/articles/fastsearch.html](http://www.arstdesign.com/articles/fastsearch.html)

~~~
snewman
We started with Java because the rest of our backend is written in Java. We
stuck with it because it has not turned out to be much of a bottleneck in
practice. This came as a surprise; I had expected that we'd need to rewrite
the substring search loop in native code. We may still do so eventually, but
we're able to get very high performance in pure Java. (1.25GB / second / core,
as noted in the post.)

Edit to add: "good enough is not good enough" -- very well put! Yes, it would
be good to move this loop to native code. However, with our current design,
the overhead of calling out from Java to native code would probably outweigh
any benefits. We're planning to move our core database storage to off-Java-
heap memory buffers, at which point it will become much more feasible to call
out to native code.

The inner loop speedup on this particular code will be more like 2x than 3x,
and the speedup of the overall system will be quite a bit less than that. But
your point is still correct.
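
For the curious, a pure-Java brute-force scan over raw bytes looks roughly
like this -- a sketch in the spirit of the post, not Scalyr's actual inner
loop; it's the sort of tight, allocation-free loop over a byte[] that HotSpot
compiles to fast machine code:

```java
public class SubstringScan {
    // Return the index of the first occurrence of needle in haystack,
    // or -1 if absent. A cheap first-byte filter skips most positions
    // before the full comparison runs.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) return 0;
        byte first = needle[0];
        int limit = haystack.length - needle.length;
        outer:
        for (int i = 0; i <= limit; i++) {
            if (haystack[i] != first) continue;  // first-byte filter
            for (int j = 1; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

No bounds beyond the buffer are ever touched (`limit` guarantees the inner
loop stays in range), and there's nothing to garbage-collect, which matters
when you're scanning gigabytes per second.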

~~~
imaginenore
Well, if you can triple your performance, you will require 1/3 of the servers,
and your clients will enjoy faster responses. I thought that was the idea,
good enough is not good enough.

Anyways, congrats on your success. I would love to read more about the
business side - finding clients, profits, expenses, etc.

~~~
exDM69
String searching is an inherently I/O or memory bound problem. Your CPU ends
up waiting for bytes to arrive from memory at around 50 GB/s theoretical max,
half of that in practice usually. The programming language or algorithm
doesn't matter that much when memory bandwidth is saturated.

A faster implementation of a string searching algorithm could only save a few
milliwatts of CPU power, it wouldn't make it faster nor require less hardware.

~~~
imaginenore
The author says they max out at 1.25 GB/s. That's a long way from the
theoretical max of even DDR3.

It's possible they are bound by the SSD speed.

I can't find much on AWS SSD max speeds.

------
lostmsu
Nice. But does not scale.

------
swatow
Judging from the comments, this article was written around May 8 2014. Can we
get a (2014) in the title?

~~~
dang
Good catch. Added.

------
kiallmacinnes
The linked article has been posted before; I can't find the old HN thread, but
it was certainly worth a re-read :)

I wonder whether Scalyr has reached their expected 100 GB/s yet?

~~~
caust1c
It was posted 17 hours ago by the same guy under a different URL:
[https://news.ycombinator.com/item?id=9201444](https://news.ycombinator.com/item?id=9201444)

~~~
misframer
I was asked by HN to repost it.

~~~
dang
Yes; part of an ongoing experiment to reduce the randomness of /newest by
giving substantive stories multiple chances to make the front page.

Edit: though it looks like we broke HN's rule about duplicates in this case.
Sorry!

~~~
tbirdz
What's the criteria for a "substantive story"? Not being snarky or difficult,
just genuinely curious.

~~~
dang
Sorry for the late reply—I wanted to give you a serious answer and didn't have
time earlier.

The criteria for "substantive" are what the HN guidelines (along with
[https://news.ycombinator.com/newswelcome.html](https://news.ycombinator.com/newswelcome.html))
say about which stories are on topic: intellectually interesting (as opposed
to sensational) and so on. HN's culture is well-enough established by now that
I think most community members can probably agree, not of course on what
interests each of us personally, but on a working set of candidate stories
that roughly fit the criteria. That's our hypothesis, anyhow.

HN sees perhaps a thousand new stories a day. Most aren't "intellectually
interesting" in HN's sense (i.e. gratifying intellectual curiosity), and the
weeds have grown too thick for the upvote system alone to reliably surface the
potential gems. To comb through /newest looking for them has become too much
work.

Some interesting stories (again, "interesting" in the HN sense) do fine under
the upvote system. Breaking news, stories about fashionable technologies, and
anything controversial reliably attract upvotes. But the quieter and deeper
submissions are often not immediately recognizable. Those deserve closer
attention than they were getting, so we've been experimenting with different
approaches to finding them.

In the spirit of "do things that don't scale", it's mostly us trying different
things manually for now, but what we're looking for is a mechanism that can be
opened to anyone willing to put in the effort. The upvote system will work
exactly as it always has, but we're hoping to add a new one that complements
it and compensates for its weaknesses.

Since it requires significant effort to look through the story stream hunting
for out-of-the-way pieces, one idea (my favorite) is to make this a new way of
earning karma on the site. But first we have to find something repeatable that
will work.

