

The Anatomy Of Search Technology: Blekko’s NoSQL Database - McKittrick
http://highscalability.com/blog/2012/4/25/the-anatomy-of-search-technology-blekkos-nosql-database.html

======
krishna2
This is the same technology that is used by the webgrepper tool
[<http://blekko.com/webgrep>] (a grep for the web pages' sources).

Disclaimer: I work at blekko and I developed the webgrepper.

As a side note, we have used this for various other purposes - some fun ones
being, store a big music collection (to extract meta data via mapjob),
citizenship test q&a (to pick random questions), the 'joke of the day' (of
course, this is our "hello world" example internally to new employees) ..etc.

~~~
DennisP
It looks like it'd be useful for a lot of stuff besides search engines. Is
there any place to get more details?

~~~
jsrfded
Greg has planned a whole series about the combinator architecture behind
blekko's datastore. Greg and I have both presented aspects of the system at
various conferences, but we're happy to chat about it with you directly too. I
think this might be the first time it's been published on the web though.

~~~
simonw
There's a video of one of Greg's talks (from Surge 2011) here:
<http://lanyrd.com/2011/surge/shzth/>

------
JohnGolt
Hey, great stuff! How do you implement high availability?

~~~
greglindahl
There are 2 parts to our high availability.

First is on our frontend side. We have 2 nginx servers (using linux HA and
vips) which send traffic out to the nodes of the cluster which are up,
retrying to a different node in the case of failure or a slow reply.

Deeper in the system, there are 3 copies of every piece of data.

Both of these are fairly normal mechanisms; the 3-copy thing is used at Google
and by Hadoop and friends.

------
kveykva
Isn't DynamoDB on SSDs? AWS alone might not be cost effective but, how does it
look with the other services also offered?

~~~
wumpus
Since our SSDs hold a cache of what's on disk, it's most effective to have the
SSDs and the disks are in the same server. It's hard to assemble separate
pieces into a tightly-coupled system.

------
Juha
Very good article in deed. I have always been amazed of how the search engines
can query so huge data sets so quickly. This brings some light to it.

It would had been nice to see some examples of the query language they use, it
if is comparable to other NoSql databases.

~~~
wumpus
Like many NoSQL databases, we don't really have a query language. Mostly you
read a few columns from a particular row. That could contain a combinator,
such as at TopN containing 1,000 inlinks to a webpage... something that you
would ordinarily fetch using a range query that returned 1,000 rows from a
table.

------
gruseom
Can you say more about the choice of swarm algorithms instead of Paxos?

~~~
jsrfded
Paxos would be good for electing a master, but we wanted to avoid having any
masters in the architecture. There are also scenarios where paxos can be slow
or fail to reach a consensus. We wanted high availability from each node in
the cluster regardless of whether 2/3 of the rest of the cluster were down or
unreachable; both parts of a partioned cluster should also be able to continue
to function as best they could.

Individual nodes can often make "personal" decisions about what to do in
subobtimal situations. If you can answer an incoming request, even with
partial or out-of-date data, do so; it's better than not replying. For the
repair agent, each node can see its own view of "holes" in the 3-level
replication, and offer to make copies of <3 buckets to bring back up to three
copies.

~~~
gruseom
This article and you guys' comments in this thread are really interesting and
suggestive. I hope you continue writing and talking about it. It's great
marketing for blekko too :)

------
lsuejung
Awesome insights into what goes into building a search engine. Very impressive
indeed.

------
tinyjoe
does blekko DB partition data base on primary key or it just query all nodes
everytime like elasticsearch?

~~~
jsrfded
Partitions are based on a hash of the primary key. The number of buckets in
the system has to be a power of 2. But we can split buckets to increase the
number, or even have some buckets that have split and some that haven't yet.
Each bucket is stored on 3 separate servers (and the assignment makes sure the
three servers are on separate racks).

