
Ask HN: Open source our multi-database indexing engine? - ChrisDutrow
A friend and I recently built an indexing engine (in Python) for use with sharded NoSQL databases.<p>It allows us to take objects with complex hierarchies and create indexes in the database of our choice. It has capabilities similar to Google App Engine's Bigtable. It allows you to create O(1) and O(log n) indexes, but doesn't allow you to accidentally do anything O(n) (like joins). (It also doesn't have a query language, so the code base isn't super complicated.)<p>This is useful because you can put larger indexes that change less frequently in a database that uses the hard drive (like Cassandra), and put smaller indexes that you expect to be hammered in Redis (an in-RAM database).<p>A typical use case is that we'll take a large, complex, hierarchical object (like a document), serialize it, then store it in Cassandra. Cassandra uses the hard drive and has multi-node and multi-datacenter replication, so it's a good place to store large pieces of data that are important not to lose. Then we'll put most of our indexes on a Redis server with a bunch of RAM because it's super fast. If the Redis server goes down, we'll just Map-Reduce over all of the serialized objects/documents in Cassandra and rebuild the Redis indexes. We also put some space-heavy indexes in Cassandra, such as tokenizations of the original document.<p>Not sure if this is obvious, but Cassandra and Redis have restrictions on putting multiple indexes on multiple properties of an object.<p>I'm curious if this is something that other people might find useful that we should release as open source, or if we are idiots who re-invented the wheel and there is already something better out there?
======
sdesol
> who re-invented the wheel and there is already something better out there?

In business school, you are taught how to create business plans, and in that
business plan, you have a section that talks about your competition. If you
don't know if there is something better or if something like this exists, you
are either not looking hard enough or you have stumbled upon something great.

If it is the former, do more research. Asking here isn't a bad start, but
really, you should be posting in some Cassandra-specific newsgroup, if such a
thing exists, or on some NoSQL site.

If it is the latter, then hooray and too bad at the same time. Another thing
you are taught in business school is that the innovators are usually not the
ones who benefit from their innovation. It's usually those who iterate on the
innovative ideas who benefit. How good a solution you have will dictate
your market entry strategy.

I guess my answer to your question is "it depends", but I would imagine that
simple benchmarks would go a long way toward validating your solution.

~~~
ChrisDutrow
I should clarify a few points:

The motive for releasing it open source is so that the code base will become
better and more useful. The goal is not to make money. If a competing code
base came in later and did a better job, this would be a good thing.

As for benchmarks: it's not a database. It's a tool to much more easily create
index tables, search against those tables, and keep those index tables
accurate as you update your objects/documents. The updating part especially is
extremely hard to do manually.
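
To illustrate why the updating part is error-prone, here is a minimal sketch
(not the engine's actual code, which has not been released). It assumes a
single equality index over one property, and uses plain Python dicts as a
stand-in for the database. `IndexedStore`, `put`, and `find` are hypothetical
names.

```python
from collections import defaultdict


class IndexedStore:
    """Toy in-memory stand-in for an object store plus one equality index."""

    def __init__(self, field):
        self.field = field             # property being indexed
        self.objects = {}              # object id -> object (a dict)
        self.index = defaultdict(set)  # property value -> set of object ids

    def put(self, obj_id, obj):
        """Insert or update an object and keep the index in sync."""
        old = self.objects.get(obj_id)
        if old is not None and self.field in old:
            # Remove the stale index entry first. Skipping this step leaves
            # the old value pointing at an object that no longer matches.
            old_ids = self.index[old[self.field]]
            old_ids.discard(obj_id)
            if not old_ids:
                del self.index[old[self.field]]
        self.objects[obj_id] = obj
        if self.field in obj:
            self.index[obj[self.field]].add(obj_id)

    def find(self, value):
        """Equality lookup through the index (O(1) on average)."""
        return sorted(self.index.get(value, ()))


store = IndexedStore("author")
store.put("doc1", {"author": "alice", "title": "Draft"})
store.put("doc1", {"author": "bob", "title": "Draft"})  # author changed
print(store.find("alice"))  # []
print(store.find("bob"))    # ['doc1']
```

With several indexes per object, spread over more than one database, every
write has to repeat this remove-then-add step for each index.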

As an added bonus/side effect, you can easily store your data across multiple
databases. A pattern I like is to store indexes on an on-site Redis server with
128 GB of RAM, and store the serialized objects/documents in an on-site
Cassandra cluster _and_ on Amazon S3. If your hardware ever goes down, you can
use the S3 backup to restore your data in Cassandra and your indexes in Redis.
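
A rough sketch of the index rebuild step, under some loud assumptions: the
documents are serialized as JSON, the backup is modeled as a plain dict of
document id to blob, and the rebuild is a single-process scan rather than a
real Map-Reduce job. `rebuild_index` and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def rebuild_index(serialized_docs, field):
    """Scan every stored document once and regroup ids by the indexed field."""
    index = defaultdict(set)
    for doc_id, blob in serialized_docs.items():
        doc = json.loads(blob)  # deserialize the stored copy
        if field in doc:
            index[doc[field]].add(doc_id)
    return dict(index)


# Hypothetical backup contents: document id -> JSON blob.
backup = {
    "doc1": json.dumps({"author": "alice", "title": "Notes"}),
    "doc2": json.dumps({"author": "bob", "title": "Plan"}),
    "doc3": json.dumps({"author": "alice", "title": "Memo"}),
}
index = rebuild_index(backup, "author")
```

The point is that the indexes are derived data: as long as the serialized
documents survive somewhere, the indexes can be regenerated from them.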

The performance part comes from the database itself.

------
akbar501
I'm definitely interested in what you've built.

I'm working on more advanced Redis use case patterns and your project fits in
with that theme.

~~~
ChrisDutrow
What use case patterns are you working on?

It's interesting how much you can actually do with Redis. We actually
implemented binary search in the indexing engine so you can do non-equality
searches against sorted tables of strings.
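
A minimal sketch of the idea, using Python's standard `bisect` module on an
in-memory sorted list as a stand-in for a sorted table in Redis (this is not
our engine's code, and `range_search` and `prefix_search` are hypothetical
names):

```python
from bisect import bisect_left, bisect_right


def range_search(sorted_keys, low, high):
    """Return all keys k with low <= k <= high using two binary searches."""
    start = bisect_left(sorted_keys, low)   # first position with key >= low
    end = bisect_right(sorted_keys, high)   # first position with key > high
    return sorted_keys[start:end]


def prefix_search(sorted_keys, prefix):
    """Return all keys that start with prefix."""
    # Binary search finds the first candidate; matches are contiguous in a
    # sorted list, so they are then walked linearly.
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


keys = ["apple", "banana", "cherry", "date", "fig"]
print(range_search(keys, "b", "d"))  # ['banana', 'cherry']
print(prefix_search(keys, "ch"))     # ['cherry']
```

Finding each boundary takes O(log n) comparisons, which is what makes
greater-than, less-than, and range queries practical on a store that only
offers positional access to a sorted collection.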

I haven't found much you can't do with Redis that you can do with other NoSQL
databases. And with the cost of RAM and motherboards that support huge amounts
of RAM going down, you can index huge amounts of data in RAM and then just
hammer it.

~~~
akbar501
Sorry for the slow reply...was in meetings.

I'm working on common use cases such as leaderboards, profiles/sessions,
voting, latest items, followers, and who's online. On the more advanced end, I
have a use case for secondary indexes in the queue but have not started
working on it yet.

> It's interesting how much you can actually do with Redis.

I fully agree. Using Redis for only simple use cases greatly underutilizes
its capabilities.

~~~
ChrisDutrow
Haha, so general database stuff then?

I think basically you can use it just like any other database, you just have
to write some more code sometimes. We actually have binary search for it in
our indexing engine so you can use it for non-equality searches on arbitrary
strings.

We might see it used more often with the cost of RAM going down. I think the
high cost of RAM in the cloud is an issue though. I have 64 GB RAM servers in
my basement that cost $200, but the same server in the cloud would cost
thousands of dollars a year to rent.

------
ddorian43
So you've built a globally sorted index? (main point of Bigtable imho)

