I am more than happy to answer any questions. It has mostly been an academic project that I worked on during my undergrad, but I am now looking at continuing it as part of a Ph.D. Of course, I would love to coordinate a broader development of this project. :)
You may also be interested in another project some lab mates and I have been working on over the last couple of years, IonDB: https://github.com/iondbproject/iondb.
EDIT: You might also be interested in the initial paper, which can be found here: https://people.ok.ubc.ca/rlawrenc/research/Papers/LittleD.pd...
I investigated query precompilation in another paper I am waiting to hear back on, and once everything has been a little better tested, that code will get pushed out as well. :)
And seriously, if anybody is interested in contributing, I would love to have some help. Get in contact with me!
(1) Ramon's (excellent!) lecture notes themselves are currently (though not permanently) available on his website: https://people.ok.ubc.ca/rlawrenc/teaching/404/notes/index.h...
(2) Garcia-Molina, Ullman and Widom's book and courses taught on their work:
Is it comparable with binary trees [O(log n) lookups] and hashtables [O(1) on average]?
That said, IonDB was in part motivated to be a storage engine for LittleD. Then you basically get exactly what you described: your choice of underlying dictionary data structure for the performance characteristics you need. It will also make UPDATE/DELETE a breeze, once we incorporate the two codebases!
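The rough shape of that idea (a hypothetical sketch only; IonDB's actual API differs):

    /* Hypothetical sketch of a pluggable dictionary interface; not
       IonDB's actual API. Each backing structure (skip list, hash
       table, B-tree, ...) supplies its own function pointers, so the
       query engine above it never changes. */
    typedef struct dictionary {
        void *state;    /* backend-specific bookkeeping */
        int (*insert)(struct dictionary *d, const void *key, const void *value);
        int (*get)(struct dictionary *d, const void *key, void *value_out);
        int (*remove)(struct dictionary *d, const void *key);
    } dictionary_t;

Swapping the performance profile then just means swapping which constructor fills in those pointers.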
LittleD gives you the bare basics, using well under 100K of compiled code, and under 1K of RAM for average queries, period. You will soon be able to strip out the query translator and "precompile" your queries with parameters to bring the total compiled code down to under 35K on device. This is not always desirable, but in some cases can be very useful.
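To illustrate what precompilation means here (purely illustrative code, not LittleD's actual generated output), a query like SELECT id FROM readings WHERE moisture < ? essentially becomes plain C with the parser gone:

    #include <stddef.h>

    /* Toy illustration of a "precompiled" query: the SQL front end is
       stripped out and the plan is baked into C. The in-memory table
       is just for the example. */
    struct reading { int id; int moisture; };

    static const struct reading readings[] = { {1, 150}, {2, 420}, {3, 90} };

    void query_dry_sensors(int threshold, void (*emit)(int id))
    {
        size_t i;
        for (i = 0; i < sizeof readings / sizeof readings[0]; i++)
            if (readings[i].moisture < threshold)   /* the WHERE clause */
                emit(readings[i].id);
    }

All that is left on the device is the plan itself, which is where the savings come from.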
I understand programming in general and microcontrollers; I was paid to program one. Still, surely nowadays a system with a tiny 8-bit microcontroller is probably considered auxiliary to some kind of larger machine, which should be far better equipped to do heavy lifting such as SQL databases?
/* Advance a pointer num_times times. Notice, this does no checking, so
you better know what YOU ARE DOING!!!! */
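Presumably the body under that comment is just unchecked pointer arithmetic, along these lines (a guessed reconstruction, not the actual source):

    /* Guessed reconstruction of the function that comment guards; the
       real LittleD code may differ. It really is just unchecked
       pointer math. */
    void advancepointer(unsigned char **pointer, int num_times)
    {
        *pointer += num_times;    /* no bounds checking whatsoever */
    }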
I'm reminded of a commercial home automation platform that had agonizingly slow response times because the LCD controllers were sending raw, unauthenticated Python code to the 70MHz central controller.
While I applaud the effort, and it will certainly be useful for many people for many other reasons, I don't think IoT is the best use case here. If your data is at all valuable/useful, you don't want it sitting out somewhere on some device that will hopefully/maybe be online when you go to query it. Plus now just to be able to query it remotely you will still need to develop some sort of API that lives on the device that can talk to the LittleD database.
Finally, if you really are doing 'IoT', you clearly have a bunch of things that you want to view/control from a centralized platform. When you have the devices talking to a server, you can do this. When you have to ask each device individually what is going on with it, this becomes much harder.
As you noted, it's not really an IoT database. It could be used as such, but that's not the reason I built it.
I have also been working on the data transfer problem, because without the ability to share the data it is in fact kind of a useless platform in many applications. I have a job manager in development that will allow for scheduled or ad-hoc execution of functions. I've also written a small library to encode LittleD results for network transmission. Using some basic networking stack, it would be easy to assemble these pieces into something that could be viewed/controlled from a centralized platform. LittleD could even be modified with relative ease to query over other LittleD instances!
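For a concrete picture, the encoding boils down to something like this (an assumed wire format for illustration; not necessarily the actual library's format):

    #include <stdint.h>
    #include <string.h>

    /* Illustrative sketch of serializing one result row for network
       transmission: a one-byte field count, then length-prefixed field
       values. The format itself is an assumption for this example. */
    size_t encode_row(uint8_t *buf, size_t buflen,
                      const char **fields, uint8_t nfields)
    {
        size_t pos = 0;
        uint8_t i;
        if (buflen < 1)
            return 0;
        buf[pos++] = nfields;                /* field count */
        for (i = 0; i < nfields; i++) {
            size_t len = strlen(fields[i]);
            if (len > 255 || pos + 1 + len > buflen)
                return 0;                    /* would overflow the buffer */
            buf[pos++] = (uint8_t)len;       /* length prefix */
            memcpy(buf + pos, fields[i], len);
            pos += len;
        }
        return pos;                          /* bytes written; 0 on error */
    }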
EDIT: I would actually like to encourage you to share some specific criticisms of the IoT application. Are you speaking specifically to your automation system, or to IoT at large? It seems difficult to predict exactly how any one person might apply any given technology.
Open source databases: http://developer.couchbase.com/mobile
Info about how GE is using it: http://www.couchbase.com/nosql-resources/presentations/offli...
Long story short: when you get the network stuff figured out, it's gonna be an interesting product.
Feel free to contact me (info in profile) if you want to chat about how this can fit into the industry.
Or is it possible to run or communicate/sync with Couchbase Mobile directly from a "thing" itself?
Frequently it is exactly when connectivity is gone that you want some intelligence combined with data upon which to act, or a buffer in which to spool data. Otherwise you've just got a pile of sensors that are slightly easier to deploy than the ones with long extension cords.
Put another way,
> If your data is at all valuable/useful, you don't want it sitting out somewhere on some device that will hopefully/maybe be online when you go to query it.
So what do you do with data on your device when it is offline? If it is at all valuable/useful, hopefully you're not just dropping it on the floor when Daddy-node is unreachable. Sounds like a job for which some sort of structured data store might be useful...
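Even something as small as this beats dropping data on the floor (a RAM-only sketch for brevity; a real device would spool to flash, or to a store like LittleD):

    /* Minimal sketch of spooling samples while the uplink is down: a
       fixed-size ring buffer that overwrites the oldest reading when
       full. */
    #define SPOOL_SLOTS 64

    struct sample { unsigned long ts; short value; };

    static struct sample spool[SPOOL_SLOTS];
    static unsigned head, count;

    void spool_push(struct sample s)
    {
        spool[(head + count) % SPOOL_SLOTS] = s;
        if (count < SPOOL_SLOTS)
            count++;
        else
            head = (head + 1) % SPOOL_SLOTS;    /* drop the oldest */
    }

    int spool_pop(struct sample *out)
    {
        if (count == 0)
            return 0;                           /* nothing buffered */
        *out = spool[head];
        head = (head + 1) % SPOOL_SLOTS;
        count--;
        return 1;
    }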
What if these IoT devices aren't just dumb sensors, and need to control some process? Then your argument is reversed; you want the data to be available locally, instead of having to contact a remote server that "will hopefully/maybe be online".
Any time you want to store historical data, and query that data later.
Very similar to how you might imagine using a database for a website, just at a smaller scale. The project that inspired this work was a water metering project in Kelowna, BC. Basically, a friend of mine took a bunch of microcontrollers with soil moisture sensors, shoved them in the ground, and tried to come up with better demand-based watering schedules. It took them two months to develop the data management code, and I knew it could have been days or hours with a proper database.
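To give a rough picture, the data layer for a project like that is a handful of statements (illustrative SQL only; I'm not promising this matches LittleD's exact supported subset character for character):

    /* Roughly the data layer the soil moisture project needed. */
    const char *create_sql =
        "CREATE TABLE readings (sensor INT, ts INT, moisture INT);";
    const char *insert_sql =
        "INSERT INTO readings VALUES (3, 1448928000, 187);";
    const char *query_sql =
        "SELECT sensor, moisture FROM readings WHERE moisture < 200;";

Everything else (storage layout, scanning, filtering) becomes the database's problem rather than two months of hand-rolled code.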
(I have an ulterior motive for asking, which is that I will probably end up shoplifting this code and embedding it into the next set of levels for our CTF game, which is serverside-emulated AVR).
EDIT: Please let me know how/if you end up using it! Or if you run into bugs! :)
So rather than 1M soil moisture sensors continuously sending records to a central repo, you're giving the mesh network something exciting to do by distributing "select * from db where moisture<200" or whatever then collating all the responses, if any...
There's a classic crypto thought experiment illustrating SIMD for cracking keys, where the Chinese distribute 100M boom boxes (well, it's an old thought experiment) that randomly test keys; when the red light turns on, indicating a factor found, the owner of the radio turns it in for a substantial reward. And that's how SIMD key cracking gets a 27-bit parallelism speedup over one box. In the IoT era I assume it's going to be a normal thing for lightbulbs and toasters to get pwned to mine bitcoins and the like.
You could probably extend both the real story and the old crypto thought experiment to help provide some docs. Then again I read your unit tests and they kind of document the system pretty well. Nice tests.
In true supercomputer fashion, by going SIMD you've taken a formerly CPU-bound problem and turned it into an IO-bound problem, but for little microcontrollers instead of classic supercomputer hardware, which I thought was pretty funny.
Love the name, too, haha.
Note: how much of a need is there for a SQL database on 8-bitters, etc.? Can't one do that in a front end on the client side and just send simple commands over the network to the device? That's what I always did for limited or security-critical devices. No way I'd put a whole 4GL on them, lol.
As for the name, I cannot take credit. There is a good story behind it though. ;)
As for the need, I sort of explained the motivation in this comment: https://news.ycombinator.com/item?id=10622675. Data management would drastically reduce the development effort associated with data-intensive applications for smaller devices, in an IoT setting or otherwise. I've actually recently travelled to the University of Michigan to talk about this work, and got lots of good feedback. One of the professors there has asked me to come study with him, because he feels there is a need. One of his graduate students is working on integrating LittleD into some of his work already!
Is there a public copy of your paper for people here to read? Might increase interest and contributions.
I actually have a lot of thoughts on ways to improve this too, which is why I am applying to do a PhD in this area.
(In other words, how big is SQLite and what are the size restrictions for IoT devices – I haven't built one before)
For example, https://www.sqlite.org/about.html claims that SQLite can run in "very little heap (100KiB)"; this project targets an Atmel ATmega2560 which has only 8KiB of RAM.
(I doubt "5 times smaller than SQLite" is a useful comparison; the limiting factor is probably RAM consumption, as opposed to executable code which can be stored in cheaper non-volatile memory.)
(Submitted title was "LittleD – SQL database for IoT, 5 times smaller than SQLite").
Defining a few C structs with standard read()/write() function calls is most likely enough, and it is easy to design/test/debug.
One has a lot more control over RAM/ROM/code space with that approach. You can easily find out where/how every single byte/bit of memory is used.
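A sketch of what I mean (POSIX-style I/O assumed; swap in your platform's flash API):

    #include <unistd.h>

    /* Fixed-size records appended to a file and fetched back by index.
       Every byte of RAM/ROM this uses is visible right here. */
    struct record {
        unsigned short sensor;
        unsigned long  ts;
        short          value;
    };

    int append_record(int fd, const struct record *r)
    {
        return write(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
    }

    int read_record(int fd, long index, struct record *r)
    {
        if (lseek(fd, index * (long)sizeof *r, SEEK_SET) < 0)
            return -1;
        return read(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
    }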
The corresponding paper claims "SQLite requires at least 200 KB of code space and cannot run on popular platforms such as the Arduino." For comparison, Arduinos based on ATmega microcontrollers offer at most 256 KB of flash memory.
I have spent a lot of time with it on ARM Cortex-M-class devices on small RTOSes or even bare metal, and you can adapt it to almost anything by plugging in new VFS interfaces (http://www.sqlite.org/vfs.html).
It does require at least tens of KB of RAM as a heap to do pretty much anything, as others have stated.
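Concretely, the two hooks look roughly like this (assuming SQLite is built with SQLITE_ENABLE_MEMSYS5, and a custom VFS has already been implemented and registered under the hypothetical name "bare-metal"):

    #include "sqlite3.h"

    /* Give SQLite a static heap instead of malloc() (requires a build
       with SQLITE_ENABLE_MEMSYS5), then open a database through a
       custom VFS registered elsewhere via sqlite3_vfs_register(). */
    static char sqlite_heap[48 * 1024];    /* the "tens of KB" heap */

    int setup_and_open(sqlite3 **db)
    {
        sqlite3_config(SQLITE_CONFIG_HEAP, sqlite_heap,
                       (int)sizeof sqlite_heap, 64 /* min alloc size */);
        sqlite3_initialize();
        return sqlite3_open_v2("data.db", db,
                               SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                               "bare-metal");
    }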
It's rather unusual by today's standards. Certainly quite different from an ARM.
But that's not the biggest problem. SQLite, like most programs, relies on having a C library like glibc available. This is pretty much the intermediary to the operating system: you can ask it for memory and it will allocate memory for you; you can ask it to open a file and it will facilitate that. Most programs assume it's always present, and so the dependency isn't properly abstracted. This is a problem when you don't have an operating system.
Notice also that you can't just go and implement or stub everything that glibc does. To do that, you would just end up having to add an operating system.
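You can stub the easy parts, though. For instance, the classic newlib-style _sbrk that gives malloc() somewhere to grow on bare metal (a sketch; "end" is the end-of-data symbol the linker script must provide):

    #include <stddef.h>

    /* The kind of stub you write on bare metal so that a C library's
       malloc() has memory to hand out. `end` comes from the linker
       script. */
    extern char end;
    static char *heap_end;

    void *_sbrk(ptrdiff_t incr)
    {
        char *prev;
        if (heap_end == NULL)
            heap_end = &end;
        prev = heap_end;
        heap_end += incr;    /* note: no collision check with the stack */
        return prev;
    }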
(This particular rabbit hole goes much deeper. A lot of programs sadly don't depend on a C library; they depend on glibc in particular. This causes massive problems for systems that run Linux yet can't accommodate glibc, like OpenWRT. They use an alternative C library that works fine 99% of the time, and for the remaining 1% it will cause programs to silently misbehave or crash because of an implicit dependency on whatever glibc does. I had this problem with the hwclock utility, which started segfaulting when OpenWRT switched to musl libc.)
SQLite itself requires at least 100KiB of heap space, sometimes many times more memory than the device has available: http://www.sqlite.org/about.html
Granted. That was 10 years ago, but it was also quite a beefy device for its time. So while the devices have naturally become quite a bit more powerful in the last 10 years, current IoT devices are also much more compact than the old ck1.
As such I would say it's conceivable that you don't have the memory to easily run current SQLite on any of these devices.
SQLite is probably one of the most well-tested pieces of software on the globe. I do not know if this space saving is worth it.
EDIT to add two links by Ganssle to explain it better:
I think Microchip's 8-bitters doing $1+ billion in business, with better profits and dividends versus the troubles of many in the 32/64-bit markets, says a lot.
Bet you didn't know they were even still around, eh? Gets better:
With manuals and Verilog included. :)
The basic computer has so little memory (~3KB total for code/data, IIRC) that anything sophisticated is impossible, but there are extensions that keep the thing much smaller than a typical embedded device (like orders of magnitude smaller) while providing enough for there to be interesting work.
My last idea was several hundred 8-bit processors in a multi-core config that could handle many data-processing problems. Turns out it wasn't entirely original:
I doubted its marketability, but it supports my assertion that these little CPUs and such can go much further on modern nodes. Just gotta experiment. Your work is another example.
Interesting details on how 8-bit companies are doing that from John Donovan the year before:
We'd need a price list for the other thing. Can you get 32's with decent memory & low watts for a few bucks a CPU in low volume? Aside from that, do you still think they're dead and useless given the new articles?
I'll add that many chips from old nodes survived for decades due to stability. The new process nodes have all kinds of issues and break faster. So, there is that to consider if the application is long-term, safety-critical, or security-critical. Many in high assurance design stayed with the old stuff (esp. on SOI) because there were fewer manufacturing bugs and interference/wear issues are lower.
 http://bit.ly/1NbLYov redirects to digikey search.
Also, thanks for the mention of Cortex-R as I hadn't heard of it. A quick glance at their description looks good. About $8 on low end to $50 on high-end. Getting quite cheap indeed for what 8-bits would be required for. Way cheaper than the old champ (RCA 1802 MCU) is today ($150 w/ 1k units).
Ok. So, I'll probably not have to go to 8-bit or drop large $$$ if I take-on certain projects. Good to know. Dirt cheap chips with large ecosystem, lock-step, real-time, and MPU's... it's like the golden era in tech for embedded start-ups, eh? :)
My argument is that the economic cost of production between 8 and 32 bit MCUs is negligible. There is a _need_ for market segmentation, etc. But the cost of the package makes up for the cost in gates. It really is a wash. Much of what we still see in 8 bit designs is because of (no insult intended) 8 bit designers. I love me some 8 bit RISC, but Atmel is having to fully embrace ARM even though it has its own AVR32 ISA.
I don't think they are dead and useless, but they are definitely on the way out for most designs. Totally agree on old stable designs. Engineering tolerances are "better" now. Something to be said for the Brooklyn Bridge and DC3s.
Cheap and low volume? Yes.
I saw that. They still had almost 10% of a huge market, though. My point was that Ganssle's point, that they're not gone from most cost-sensitive markets, still seemed accurate.
"My argument is that the economic cost of production between 8 and 32 bit MCUs is negligible. There is a _need_ for market segmentation, etc. But the cost of the package makes up for the cost in gates."
That does seem to be true. Especially package vs gates.
"I don't think they are dead and useless, but they are definitely on the out for most designs."
I won't argue there. 32-bit space has improved significantly on cost and efficiency. Dropping down process nodes helped a lot, there. ;)