
LinkedIn open sources IndexTank: search engine and service - wvl
http://engineering.linkedin.com/open-source/indextank-now-open-source
======
emmett
This is awesome news. Massively advances the current state of the art in open
source search.

Definitely considering replacing our search backend at TwitchTV with this...

~~~
hajrice
Hey Emmett, we're one of the companies interested in continuing IndexTank's
platform.

If you want to hear more, just ping me at emil@helpjuice.com

~~~
citricsquid
Related to your startup rather than this post: you should work on the
introduction/explanation video on your home page. Even from a quick watch it
has problems. With no script (or one you didn't rehearse), the time I invest
watching your pitch as a potential customer is spent watching you think and
decide what to do next. The video on your tour page
(<http://helpjuice.com/tour>) isn't great, but it is much _much_ better as an
introduction to your product.

------
mattdeboard
What is the differentiation between using this and using Solr? ElasticSearch?

What does "real-time" mean in this context? Is it indexing database content in
real-time? Is it in reference to the look-ahead, predictive query completion
LinkedIn has?

What would compel someone like me -- a dev who has ownership over the very
significant search piece of my company's primary product -- to give this
serious evaluation?

~~~
nl
I looked at it some before IndexTank was bought (and I've done a reasonable
amount of Solr work).

The biggest conceptual difference seemed to be that IndexTank was specifically
written to autoscale - it was designed from the ground up to run on cloud
providers, and to instantiate new resources as needed. It also has no central
point of failure.

Solr Cloud (and things like Solandra) deliver some of this functionality to
Solr.

~~~
sandGorgon
If you had to incorporate search today, would you use IndexTank or Solr?

~~~
nl
Solr, because I know it well. But I'd love to play with IndexTank.

------
biznickman
Great news, but I'm still willing to pay for someone to manage the operational
side of this :) Know of any solutions? I'm aware of websolr, but their
configuration process wasn't as simple as IndexTank's.

~~~
nestlequ1k
Same here. IndexTank's service and pricing were great. Hoping someone can match
them.

------
toisanji
I'd like to see how this compares to Lucene/Solr. With Solr it's easy to index
hundreds of millions of docs, but it's a pain to write a custom scorer.

~~~
espeed
IndexTank provides real-time document indexing and its algorithm incorporates
real-time metrics, like vote data. And it scales horizontally.

------
alexro
Last time I read about IndexTank I noticed that its query language wasn't
that sophisticated; it could basically only find exact matches. Has it
improved? Is it possible to do fuzzy matches?

ADD: also, does it support non-English languages at all?

~~~
nachopg
IndexTank currently supports prefix search, stemming and a basic
implementation of a Did You Mean feature. Regarding languages, it supports
tokenization for every Western language, and not long ago we added support
for CJK too.

------
gexla
And a new startup offers a hosted IndexTank service in 3, 2, 1...

For anyone looking for a job at LinkedIn, making impactful contributions to
this project could be a way in.

~~~
sycr
Yeah, really though.

The indextank repo proper is interesting (and useful) enough, but
indextank-service (<https://github.com/linkedin/indextank-service>) made my jaw
drop a little. It's a full administrative stack for deploying IndexTank as a
service.

------
SlightGenius
Does IndexTank still integrate social inputs?

"IndexEngine: a real-time fulltext search-and-indexing system designed to
separate relevance signals from document text. This is because the life cycle
of these signals is different from the text itself, especially in the context
of user-generated social inputs (shares, likes, +1, RTs)."
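The design idea in the quote can be illustrated with a toy sketch: text postings and numeric signals live in separate stores, so a new like or RT updates only the signal store and never triggers a reindex of the text. Everything here (`ToyIndex`, `add_document`, `update_signal`, the whitespace tokenizer) is hypothetical and for illustration only; it is not IndexTank's code or API.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy index: text postings and numeric signals are stored separately."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc ids; rebuilt only when text changes
        self.signals = defaultdict(dict)  # doc id -> {signal name: value}
        self.reindex_count = 0            # how many times document text was (re)indexed

    def add_document(self, doc_id, text):
        # Expensive path (in a real engine): tokenize and post the text.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.reindex_count += 1

    def update_signal(self, doc_id, name, value):
        # Cheap path: a new share/like/RT count touches only the signal store.
        self.signals[doc_id][name] = value

    def search(self, term):
        # .get() avoids creating empty postings for unknown terms.
        return sorted(self.postings.get(term.lower(), set()))


idx = ToyIndex()
idx.add_document(1, "Open source search")
idx.update_signal(1, "likes", 10)
idx.update_signal(1, "likes", 11)  # a new like arrives; the text is not reindexed
print(idx.reindex_count, idx.signals[1]["likes"], idx.search("search"))  # prints: 1 11 [1]
```

Because the two stores have different life cycles, frequently changing social signals can be written at high rates without paying the cost of re-tokenizing the document.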

~~~
diego
It integrates anything that can be represented as a number. Prices, number of
badges, importance of titles, it doesn't matter. You can combine any of those
inputs into a relevance formula that is evaluated at query time. Of course
IndexTank won't find those inputs for you, you have to provide them.
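A minimal sketch of what "a relevance formula evaluated at query time" means, assuming the application supplies a numeric dict per document and the formula is an arbitrary function of the text match score and those numbers. The names `rank`, `matches`, `variables` and the `badges` input are hypothetical; this does not reproduce IndexTank's formula syntax.

```python
import math
from typing import Callable, Dict, List

# Per-document numeric inputs provided by the application (the engine does not find them).
Doc = Dict[str, float]


def rank(matches: Dict[str, float], variables: Dict[str, Doc],
         formula: Callable[[float, Doc], float]) -> List[str]:
    """Order matched doc ids by a formula evaluated at query time.

    matches maps doc id -> text relevance score from the full-text match;
    variables maps doc id -> numeric inputs such as price or badge count.
    """
    scored = {doc_id: formula(text_score, variables.get(doc_id, {}))
              for doc_id, text_score in matches.items()}
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scored, key=lambda d: (-scored[d], d))


matches = {"a": 1.0, "b": 0.8}
variables = {"a": {"badges": 0.0}, "b": {"badges": 20.0}}
print(rank(matches, variables, lambda s, v: s))  # ['a', 'b']
print(rank(matches, variables,
           lambda s, v: s * math.log(2 + v.get("badges", 0.0))))  # ['b', 'a']
```

Since the formula is just a parameter of the query, the same index can be ranked by text alone or by text combined with badges, prices, etc., without reindexing anything.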

~~~
mgkimsal
Are the historical values of those signals kept and queryable? That is, could
I check document ranking with signals X, Y and Z today versus 3 days ago, and
see the impact of the signal changes?

------
lobster_johnson
Does anyone know how IndexTank's facets scale with the cardinality of the
attribute? We tried using ElasticSearch's facet system for tags, but we have
about 150k tags, and this does not play well with ES. (It's very stupid about
how it caches them.)

~~~
santip
IndexTank categories are not designed for the tags use case and will not work
properly for it. They are intended for a relatively small number of
categories, where each document has a single value per category. The number of
distinct values in a category can be large, but the number of categories
cannot. To implement something like tags, each tag would have to be its own
category, because you'll want more than one tag per document. We were in the
process of designing a new feature to support this kind of use case; maybe
we'll start a branch to implement it, and hopefully the community will
collaborate.
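A toy sketch of the constraint described above, assuming a data model where each document holds at most one value per category. It shows why many distinct values per category are fine, while tags force one category per tag, so 150k tags would mean 150k categories. `facet_counts` and `tags_as_categories` are hypothetical helpers, not IndexTank code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Hypothetical model: each document maps category -> exactly one value.
docs: Dict[int, Dict[str, str]] = {
    1: {"color": "red", "size": "small"},
    2: {"color": "blue", "size": "small"},
    3: {"color": "red", "size": "large"},
}


def facet_counts(doc_ids: Iterable[int],
                 store: Dict[int, Dict[str, str]]) -> Dict[str, Counter]:
    """Count how many matched documents carry each value of each category."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc_id in doc_ids:
        for category, value in store[doc_id].items():
            counts[category][value] += 1
    return counts


def tags_as_categories(tags: List[str]) -> Dict[str, str]:
    # Workaround for multi-valued tags: one category per tag with a constant value.
    # The number of categories now grows with the number of distinct tags.
    return {f"tag:{t}": "1" for t in tags}


counts = facet_counts([1, 2, 3], docs)
print(counts["color"]["red"], counts["color"]["blue"])  # prints: 2 1

tagged = {10: tags_as_categories(["java", "search"]), 11: tags_as_categories(["search"])}
tag_counts = facet_counts([10, 11], tagged)
print(len(tag_counts), tag_counts["tag:search"]["1"])  # prints: 2 2
```

In the tagged case `len(tag_counts)` equals the number of distinct tags, which is exactly the dimension santip says the category design does not scale in.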

~~~
lobster_johnson
Thanks for clearing that up.

------
swah
Those kinds of services are mostly being written in Java these days, and
everyone would agree they constitute awesomer software than yet another
JavaScript blablabla library... so how can Java be dead? I should learn
Java...

------
fufulabs
In terms of how easy it is to go from installation to a working state, how
does it compare to ElasticSearch or Solr?

------
iag
Very impressive, LinkedIn. Good move.

