

Ask HN: Is there a key-time-value database? - yellowbkpk

I'm looking for a way to store kev-value pairs that change over time and maintain the key-value history.<p>I'd also like to be able to retrieve the key-values for a segment of time in the past.<p>Does anything like this already exist?
======
azanar
<http://en.wikipedia.org/wiki/Time_series_database>

I suspect this is what you are asking for. This page has links to a number of
time series database systems.

~~~
yellowbkpk
Thanks, I knew it had to be a lack of the right Google keywords.

------
StrawberryFrog
Yes, products like this do exist.See
<http://en.wikipedia.org/wiki/Time_series_database>

The oil industry uses them to record data points coming from hundreds of
instruments on equipment on their oil rigs (pump pressure, engine temperature,
pipe rate of flow, motor RPM, etc). The oil companies that operate them are
rich, and there's big money at stake to optimise the flow of oil; not to
mention lives at stake - Piper Alpha accident in 1988 cost 167 lives and £1.7
billion.

The one that I saw used was OsiSoft Pi System
<http://www.osisoft.com/Products/PI%20System/>

------
russell
What you want is a database that supports the equivalent of a multiple column
key. You can fake it by concatenating your key with a time stamp. Tokyo
cabinet supports this with its btree+ store. You can select a range like
thiskey-00000000 to thiskey-99999999. The latest is the last returned. Or give
the latest the key thiskey-00000000 and rekey it when you do an update.

------
Maro
This may or may not apply to your case:

About two years ago I implemented a database app (relational, Mysql) where
tracking versions [of certain tables'] was a requirement. I solved it by
adding a ValidFrom timestamp to the table, and now every row has an associated
validity period, eg. the customer's address was X between T0 and T1, Y between
T1 and T2. Unfortunately this is not some kind of convoluted backup solution,
as the application logic sometimes needs to know these values at some prior
time (it's a billing system).

Roughly, the basic problem is that simple queries or enforcing consistency
requirements are now very complicated, since tables have to be JOINed and
WHEREd on the validity periods. People (other than me) are unable to modify
the data per command like SQL, imagine 3 normalized tables each with validity
periods, and you're UPDATEing them but don't want to break time-consistency,
it's frustrating. The client programmer basically failed at creating a usable
GUI that was able to manage these versions, and I don't blame him (think of a
case where you have a versioned table T1, a versioned table T2, and you're
connecting them with a versioned has-a table T3). UI-wise it's kind of a
nightmare.

Lesson learned: _BAD IDEA_. Don't do it. Fight with your manager to have the
requirements changed. I've been thinking ever since what would have been a
good way to do this, and I can't figure out a good way, hence I think this is
a stupid requirement. Also, now I actually have a picture of the business
process that drove this requirement, and although this isn't a trivial issue,
I believe it could be solved by fixing certain business processes that require
application logic to know about past data [in this manner].

~~~
profquail
You're right, it is a huge pain to do this manually, but depending on what
languages and platforms you're developing for, there are libraries to handle
this automatically.

I've done something similar with NHibernate, where you only need to add a
couple of event handlers (maybe 30-40 lines of code + a bit of configuration
XML) to set this feature up. I wish I'd heard of it earlier though, since it's
really quite useful (though as you said, horrendous to implement manually).

There's a wiki on NHForge about it, though I don't know that it's really the
best way to do it: [http://nhforge.org/wikis/howtonh/creating-an-audit-log-
using...](http://nhforge.org/wikis/howtonh/creating-an-audit-log-using-
nhibernate-events.aspx)

~~~
Maro
Thanks.

I've checked out your link, but I'm not sure this would solve my core
problem(s). The problem is that even without versioning, a lot of the system
is made up of long SQL commands containing subtables and JOINs and such. In
several instances I had to optimize the queries. So I wouldn't want to use
some ActiveRecord thing or generators. And this is now further complicated by
versioning.

Technical issues aside, the fact that everything is versioned makes it hard to
reason about the data. Eg. the people in charge of "controling" are always
asking me for help. Many times it turns out that there's no problem, the data
is consistent, they just got confused by the versions.

~~~
siculars
familiarize yourself with time series data, as mentioned earlier in the
thread, and read the amazon dynamo paper specifically the parts dealing with
vector clocks.

~~~
Maro
As my startup is developing a distributed key-value store, I'm quite familiar
with Amazon's Dynamo, and what I was talking about previously has nothing to
do with vector clocks.

------
profquail
If you're storing a key (say, a GUID) and a value (maybe a string), then all
you'd need to do is make another field/column that handles timestamping. If
you're using an ORM like Hibernate, it will handle (most) of it automatically
for you.

Or, you could simply set your database up to use a composite key on the table
(so the key becomes the key + revision number), though that isn't the best way
since composite keys usually incur some performance overhead.

------
bayareaguy
Google BigTable works this way. <http://en.wikipedia.org/wiki/BigTable>

