The history of Berkeley DB (acm.org)
145 points by yarapavan on Nov 20, 2021 | 25 comments



> Chris eventually went to Amazon and brought Berkeley DB with him there, so we became one of the first backing stores for Amazon's Dynamo key-value store.

BDB was impressive in that you’d file a bug with them & they would have a patch for you by the next day. If you did the same thing with Oracle, stars would have to align even to get an ack.

I think that’s because they kept BDB pretty simple. I happened to sit in a meeting with Margo Seltzer where the (later distinguished) engineer from Amazon was asking for new features. I think Margo must have said no to every feature request because it didn’t align with their roadmap. I’ve never seen anyone do that. It was impressive to watch.

Then again, Oracle acquired Sleepycat in the end.


This story is rather misleading. BDB was at Amazon long before that.

We used BDB at Amazon way back in the beginning days. It was the first DB the company used, before Oracle and before any in-house systems were written. Shel wrote the code to suck the info we got from Baker & Taylor (basically Books In Print) into a BDB, and that was the Amazon bibliographic database, from day one.

He also knew either Keith or Margo and had a few conversations with them along the way about some performance tweaks that we either wanted or had developed (can't recall which).


I worked tangentially with Chris at Oracle's public cloud offering. His work and train of thought on how to approach systems was always inspiring.


Yes, Chris was pretty cool.

Just to clarify, the engineer in that meeting was Peter Vosshall. I wanted to quote the sentence about Amazon rather than the person.

Also, I was a newbie at the time (probably still am) & I consider myself lucky to be in the meeting. They probably let me in on the meeting because I didn’t mind carrying the pager as much. :)


"after I quit a miserable job"

I was there. He was my boss. It was miserable.


Because of him? Or because of the job?


The job was miserable. He was VP of engineering. He, I, the VP of marketing and I think someone else left within the same week because things weren't working out. (It was only a 50 person company.)

The company got bought out, mostly as acqui-hire for the scientific staff.

The main tech was by a Stanford prof. and one of his ex-grad students: an X-Windows/IRIX-only(!) tool for molecular modeling. It added support for integrating bioinformatics analyses gathered from public web servers. That was cool, but companies wouldn't buy it because sending proprietary data to public sites was a big N-O.

Many companies had similar resources in-house. The program could call out to awk or shell as a configuration language for doing those searches. But our customers were all switching to Perl and didn't care for those languages. So, late in the cycle we added Perl integration support.

There was only one person who could really work on the core tool. (I remember fixing ~10,000 compiler warnings, nearly all because it used a home-brew GUI based on void* function pointers.) There had been a push to port it to the Mac, but that took a year and the fork was unmergeable.

We developed an intranet server product for basic bioinformatics tools in Perl that would mirror the core bioinformatics databases and process them for FASTA, BLAST, and text searches. Mind you, this was 1998, so it was CGI.pm and a home-brew templating language I threw together.

We needed this because we were ramping up science consulting (I don't recall why anymore), and we needed our own internal tools because of proprietary data.

So, Mike came in as VP of engineering with a product that was hard to maintain and couldn't quite pivot to the new hotness of bioinformatics, or to the new hotness of the web. (Get this: there was a proof-of-concept version that used a persistent instance of the tool running on the developer's desktop, in an off-screen display buffer, which presented a clickable image map to the user. Each click would round-trip to the running program and return another screenshot.) And he couldn't really hire the people needed to get out of that hole, because funding was running out.

On the organizational level, there was also a then-new CEO, whom sales and marketing referred to as a tornado because she would come in and everything would get blown around and mixed up. Not a good boat to be in.

This was also Mike's first VP-level management job, so rather like jumping into the deep end.

Mind you, I was a fresh-behind-the-ears 27-year-old with only 2 years of professional experience in an academic lab. I knew little about the politics in upper management. I can agree that it was miserable.

Going back to Sleepycat, I actually evaluated BDB vs. GDB vs. a few other technologies for one of our projects, and ended up having problems with all of them. Only after I left, when I met Mike after he'd started at Sleepycat, did I point out that he probably should have told me about his connections to BDB. He agreed. ;)


> This was also Mike's first VP-level management job, so rather like jumping into the deep end.

pretty sure mike was a vp at britton-lee hardware database company during the mid 1980s in los gatos. i was there.


I am entirely willing to believe my 20+ year old memories are wrong, and I see Mike Olson was at Britton Lee.

Wouldn't he have been rather young to be a VP?


are you thinking of mike ubell? (paula's mike.)


Wow thanks for sharing that


A long time ago, someone at work used BDB for storing data, and when we replaced it with SQLite we found SQLite to be much faster. I don't remember the details, and I know that even SQLite needs some fine-tuning to be performant and safe for most of the cases you'll see (for example, maybe not durable, but at least it won't trash the database file if the process using it crashes). Time to re-evaluate!
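The comment doesn't say which SQLite settings were involved. One commonly used combination that matches the trade-off described (not fully durable, but no damaged file after a process crash) is write-ahead logging with synchronous=NORMAL. Here is a minimal sketch using Python's built-in sqlite3 module; the function name open_tuned and the file paths are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a SQLite file tuned for speed without risking corruption.

    Assumption: losing the most recent commits after a power failure is
    acceptable, but an application crash must never damage the file.
    """
    conn = sqlite3.connect(path)
    # Write-ahead logging: commits append to a separate -wal file, and
    # readers do not block the single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    # In WAL mode, NORMAL drops the fsync on every commit. Commits still
    # survive an application crash; after an OS crash or power loss the
    # latest ones may roll back, but the database stays consistent.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.db")
    conn = open_tuned(path)
    with conn:  # the connection context manager commits on success
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("hello", b"world"))
    print(conn.execute("SELECT v FROM kv WHERE k = ?", ("hello",)).fetchone()[0])
    conn.close()
```

Whether this is the right trade-off depends on the workload; synchronous=FULL restores durability across power loss at the cost of an fsync per commit.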


Oh goodness, I’m just now getting Bostic's Sleepycat -> WiredTiger play. Well done, Keith.


I've used Berkeley DB for decades, in many projects, many times having to have a yelling match with other people who wanted to use one of the large spaghetti-plate "databases". This was before the "NoSQL" trend, and even after. How many projects /need/ network access etc.? In a LOT of cases you just want a file-backed, very quick library that does the job.

For more complicated setups, SQLite also works wonders.
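As an illustration of that embedded, file-backed style, here is a small sketch using Python's standard dbm module. Berkeley DB itself isn't bundled with Python 3, so dbm stands in as an analogue rather than BDB's own API; the keys, values and file location are made up.

```python
import dbm
import os
import tempfile

# dbm.open() uses whichever dbm-style backend this Python build provides
# (for example dbm.gnu, dbm.ndbm, or the pure-Python dbm.dumb fallback).
# It is an analogue of an embedded key/value file, not Berkeley DB's API.
path = os.path.join(tempfile.mkdtemp(), "cache")

# "c": open for reading and writing, creating the file if it is missing.
with dbm.open(path, "c") as db:
    db[b"user:42"] = b"alice"
    db["user:43"] = "bob"  # str keys and values are encoded and stored as bytes

# Reopen read-only: everything comes back from the file, no server needed.
with dbm.open(path, "r") as db:
    print(db[b"user:42"])     # b'alice'
    print(sorted(db.keys()))  # [b'user:42', b'user:43']
```

The whole "database" is a library call plus a file in a directory, which is the point being made: no daemon, no network protocol, no connection management.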


BDB is brilliant. SQLite is my go-to now for new projects as it’s nearly as good as BDB in most ways, and much better the second you need rich queries.
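To illustrate the difference being described, the sketch below (Python's sqlite3 module, with a hypothetical books table and made-up placeholder rows) performs the kind of single-key lookup a BDB store handles well, followed by a filtered, grouped and sorted query that a plain key/value store would leave to application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (
        isbn   TEXT PRIMARY KEY,
        title  TEXT NOT NULL,
        author TEXT NOT NULL,
        year   INTEGER NOT NULL
    );
    CREATE INDEX books_by_author ON books(author);
    """
)
# Placeholder rows, made up for illustration.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("0000000001", "Book One", "Author A", 1994),
        ("0000000002", "Book Two", "Author A", 1998),
        ("0000000003", "Book Three", "Author B", 2001),
        ("0000000004", "Book Four", "Author C", 1987),
    ],
)

# The key/value access pattern: one lookup by primary key.
title = conn.execute(
    "SELECT title FROM books WHERE isbn = ?", ("0000000001",)
).fetchone()[0]

# A "rich" query: filter, group, aggregate and sort in one statement.
# Over a plain key/value file this would be a hand-written scan in the app.
per_author = conn.execute(
    """
    SELECT author, COUNT(*) AS n, MIN(year) AS first_year
    FROM books
    WHERE year >= ?
    GROUP BY author
    ORDER BY n DESC, author
    """,
    (1990,),
).fetchall()

print(title)       # Book One
print(per_author)  # [('Author A', 2, 1994), ('Author B', 1, 2001)]
```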


I used BDB as a Win32 profiler backend. The profiler was lightweight and would write a flat file with profiling data and function addresses to keep the captured data small.

A post processing tool would read the profiler data and create a BDB file with support for extracting call graphs and topN sort of analysis.
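The comment doesn't describe the record or key layout, so the sketch below assumes one: the profiler writes fixed-size (caller, callee, elapsed-microseconds) records, and a post-processing step loads per-function totals and call-graph edges into a key/value file, then answers top-N and "who does X call" queries from it. Python's dbm module stands in for BDB, and every name here (RECORD, write_raw, build_db, top_n, callees) is hypothetical.

```python
import dbm
import os
import struct
import tempfile
from collections import defaultdict

# Hypothetical flat-file record: caller address, callee address and elapsed
# microseconds, packed as three little-endian unsigned 64-bit integers.
RECORD = struct.Struct("<QQQ")


def write_raw(path, samples):
    """Stand-in for the lightweight profiler: dump fixed-size records."""
    with open(path, "wb") as f:
        for caller, callee, elapsed in samples:
            f.write(RECORD.pack(caller, callee, elapsed))


def build_db(raw_path, db_path):
    """Post-process the flat file into a key/value file for analysis."""
    per_fn = defaultdict(int)  # callee address -> total elapsed time
    edges = defaultdict(int)   # (caller, callee) -> number of calls
    with open(raw_path, "rb") as f:
        for caller, callee, elapsed in RECORD.iter_unpack(f.read()):
            per_fn[callee] += elapsed
            edges[(caller, callee)] += 1
    with dbm.open(db_path, "n") as db:  # "n": always start a fresh file
        for fn, total in per_fn.items():
            db[f"fn:{fn:016x}"] = str(total)
        for (caller, callee), count in edges.items():
            db[f"edge:{caller:016x}:{callee:016x}"] = str(count)


def top_n(db_path, n):
    """The n functions with the largest total time, hottest first."""
    with dbm.open(db_path, "r") as db:
        totals = [
            (int(db[key]), key.decode()[len("fn:"):])
            for key in db.keys()
            if key.startswith(b"fn:")
        ]
    return [(addr, total) for total, addr in sorted(totals, reverse=True)[:n]]


def callees(db_path, caller):
    """One call-graph step: which functions did `caller` call, how often?"""
    prefix = f"edge:{caller:016x}:".encode()
    with dbm.open(db_path, "r") as db:
        return sorted(
            (key[len(prefix):].decode(), int(db[key]))
            for key in db.keys()
            if key.startswith(prefix)
        )


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    raw = os.path.join(workdir, "profile.raw")
    db_path = os.path.join(workdir, "profile")
    write_raw(raw, [(0x1000, 0x2000, 50), (0x1000, 0x2000, 30),
                    (0x1000, 0x3000, 5), (0x2000, 0x3000, 100)])
    build_db(raw, db_path)
    print(top_n(db_path, 2))
    print(callees(db_path, 0x1000))
```

A B-tree file in BDB could answer the prefix queries with a positioned cursor; Python's dbm has no ordered cursor, so this sketch scans all keys instead.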

The final GUI was implemented in Visual Basic since other developers would not use the TUI/CLI based tools in console.

The next project used BDB to store file system metadata on embedded NAS storage. We implemented a fast ‘find’-like service based on file metadata (stat fields) stored in BDB, with support for user-defined file metadata.
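A rough sketch of that kind of metadata index follows, under assumptions the comment doesn't spell out: each regular file's path is the key, a JSON record of a few stat fields plus a slot for user-defined tags is the value, and queries are Python predicates evaluated against the index rather than the live file system. Python's dbm module again stands in for BDB, and index_tree, tag and find are hypothetical names.

```python
import dbm
import json
import os
import stat
import tempfile


def index_tree(root, db_path):
    """Record a few stat fields for every regular file under root."""
    with dbm.open(db_path, "c") as db:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, devices and the like
                db[path] = json.dumps({
                    "size": st.st_size,
                    "mtime": st.st_mtime,
                    "mode": st.st_mode,
                    "uid": st.st_uid,
                    "user": {},  # slot for user-defined metadata
                })


def tag(db_path, path, key, value):
    """Attach a user-defined metadata field to an indexed file."""
    with dbm.open(db_path, "w") as db:
        record = json.loads(db[path])
        record["user"][key] = value
        db[path] = json.dumps(record)


def find(db_path, predicate):
    """find(1)-style query answered from the index, not the file system."""
    with dbm.open(db_path, "r") as db:
        return sorted(
            key.decode()
            for key in db.keys()
            if predicate(json.loads(db[key]))
        )


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "sub"))
    with open(os.path.join(root, "small.txt"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(root, "sub", "big.bin"), "wb") as f:
        f.write(b"x" * 5000)
    db_path = os.path.join(tempfile.mkdtemp(), "meta")  # kept outside the tree
    index_tree(root, db_path)
    tag(db_path, os.path.join(root, "small.txt"), "project", "demo")
    print(find(db_path, lambda r: r["size"] > 1000))
    print(find(db_path, lambda r: r["user"].get("project") == "demo"))
```

Keeping the stat fields and user tags in one record means a single lookup serves both kinds of query; a real service would also need to update the index as files change.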


I always found "Sleepycat" kind of charming as a company name.


It's interesting that various folks decided over history to re-write or clone BDB, I suspect sometimes to avoid dealing with Oracle. e.g. LevelDB, RocksDB, Tkrzw.


Fond memories of using BDB with Software.com/Openwave's InterMail software, which powered a large portion of the world's email back in the day. The name raised a few eyebrows during meetings with customers.


I would have given almost anything to be there and work on this with them, or with Sleepycat's successor, WiredTiger.

So much fun to implement this stuff and see it provide so much value.


It was the best of times.


How do BDB and Microsoft's Extensible Storage Engine compare?


I don't know a great deal about JET Blue, but they were developed to serve roughly identical needs (e.g. one is under Active Directory, the other is or was under all the other LDAP servers). I think MS put more into that layer, though; for example, it can do tuple indexing. With BDB that kind of thing was done in the layer above, in the application.
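For readers unfamiliar with doing "tuple indexing" in the application layer, here is a hypothetical sketch of the usual approach over an ordered key/value store: encode the tuple into bytes whose order matches the tuple's order, store those keys in a separate index that points at primary keys, and answer queries with a prefix range scan. A sorted Python list stands in for a B-tree file; this is not the API of BDB or ESE, and the encoding assumes NUL-free strings and non-negative 64-bit integers.

```python
import bisect
import struct


def encode_key(*parts):
    """Encode a tuple so that byte-wise key order matches tuple order.

    Assumptions: strings contain no NUL characters, and integers are
    non-negative and fit in 64 bits.
    """
    out = bytearray()
    for part in parts:
        if isinstance(part, str):
            # UTF-8 preserves code-point order; the NUL terminator makes a
            # shorter string sort before any longer string it prefixes.
            out += part.encode("utf-8") + b"\x00"
        elif isinstance(part, int):
            # Big-endian, fixed width: byte order equals numeric order.
            out += struct.pack(">Q", part)
        else:
            raise TypeError(f"unsupported key part: {part!r}")
    return bytes(out)


class OrderedStore:
    """Toy stand-in for a B-tree key/value file (keys kept sorted)."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


# Hypothetical schema: books keyed by id, plus an application-maintained
# secondary index on (author, year, id) whose value is the primary key.
primary = OrderedStore()
by_author_year = OrderedStore()


def insert_book(book_id, author, year, title):
    pk = encode_key(book_id)
    primary.put(pk, (author, year, title))
    # The application must update every index itself on each write.
    by_author_year.put(encode_key(author, year, book_id), pk)


def titles_by_author(author):
    """All titles by one author, oldest first, via a prefix range scan."""
    return [
        primary.get(pk)[2]
        for _key, pk in by_author_year.scan_prefix(encode_key(author))
    ]


if __name__ == "__main__":
    insert_book("b1", "Ann", 2001, "Title 1")
    insert_book("b2", "Anna", 1999, "Title 2")
    insert_book("b3", "Ann", 1995, "Title 3")
    print(titles_by_author("Ann"))  # ['Title 3', 'Title 1']
```

The NUL terminator is what keeps "Ann" from matching "Anna" in a prefix scan, and fixed-width big-endian integers make byte order agree with numeric order, so the store's plain byte comparison does the tuple comparison for free.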


Berkeley DB is such an important piece of software today. It's one of the essential parts of Bitcoin.


Initially it was the wallet and block store, until the block storage caused a fork; the block store now uses LevelDB. The wallet store is migrating away from BDB, as it cannot move past BDB 4.8, and the newer wallet format is more expressive.



