I want to clarify a couple of things. I'm not saying that Paul Graham invented this pattern. Actually, after he mentioned it, I remembered a friend of my father implementing exactly that in QuickBASIC in the late 80s :-) The point is that maybe the Redis design was already inside me, but I needed a trigger: I often think of good things after being triggered. And smart people are more likely to talk about good ideas, old and new. That was the point. Similarly, I believe there are a lot of simple fundamental ideas that can be re-applied to today's technology; as the landscape changes, many things become relevant again.
I'll just take the opportunity to say how grateful I am that the idea of inventing Redis struck you, regardless of how it originated. I use it all the time, both professionally and in my free time. Awesome piece of software. An idea is worthless by itself; execution is everything.
Just the documentation, to be honest. The basic functions of Redis are quite simple to learn and use, either via the redis-cli client or a language binding. Basically you SET keyname value, and GET keyname to retrieve the value. There's a ton of additional features and data structure types, but the basic use of it is as a key/value store.
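For instance, with the node-redis client (a minimal sketch; the key name and value are made up):

    // npm install redis — then, in an ES module:
    import { createClient } from 'redis';

    const client = createClient(); // localhost:6379 by default
    await client.connect();

    await client.set('greeting', 'hello');     // SET keyname value
    const value = await client.get('greeting'); // GET keyname
    console.log(value); // "hello"

    await client.quit();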
Let's say you get an idea—or, as Pooh would more accurately say, it gets you. Where did it come from? From this something, which came from that something? If you are able to trace it all the way back to its source, you will discover that it came from Nothing. And chances are, the greater the idea, the more directly it came from there. "A stroke of genius! Completely unheard of! A revolutionary new approach!" Practically everyone has gotten some sort of an idea like that sometime, most likely after a sound sleep when everything was so clear and filled with Nothing that an Idea suddenly appeared in it.
> I quite like the image of an idea floating somewhere near me so that I can 'think at' it!
For languages like Italian (as well as my own native language), the image of 'thinking at' makes sense as an analogue to 'looking at', which you'd indeed do if the idea were floating by in your vicinity.
I think the closest English analogue is 'think about', which taken extremely literally does also place the thinker _about_ (in the vicinity of, around) the idea.
Man, some of the comments on here are really disheartening. What's the deal with trying to humble people with long-winded, tangential counterpoints, gotchas, and condescending questions? Constantly proving one's intellect seems to be a prevailing theme on HN and I'm not seeing how it adds to the quality of the content. I'm sure someone will unearth the irony in this and let me know soon enough.
Hmm, I thought this pattern was really common? That is, appending everything to a file and reading back from the file when there's a reboot. I constantly use it when a database (or redis for that matter) is simply overkill for my use case.
Here's a 34-line implementation I use on a Node production system. It writes struct-like JavaScript objects that represent events to disk. When reading it back, I do a fold (or .reduce) to build the state.
And yes, it could be way smarter (writing to memory and disk), but YAGNI has been working out pretty well so far.
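For reference, a minimal sketch of the pattern (not the actual 34-line implementation, which isn't shown here; the event shape is made up):

    const fs = require('fs');

    const LOG_PATH = 'events.log';

    // Append one event per line as JSON.
    function appendEvent(event) {
      fs.appendFileSync(LOG_PATH, JSON.stringify(event) + '\n');
    }

    // On startup, fold (.reduce) the whole log back into the state.
    function loadState() {
      if (!fs.existsSync(LOG_PATH)) return {};
      return fs.readFileSync(LOG_PATH, 'utf8')
        .split('\n')
        .filter(Boolean)
        .map((line) => JSON.parse(line))
        .reduce(applyEvent, {});
    }

    function applyEvent(state, event) {
      switch (event.type) {
        case 'set':
          return { ...state, [event.key]: event.value };
        default:
          return state;
      }
    }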
It's pretty common. All the wonderful proprietary file formats from the 90s (and probably before, and probably after) basically boil down to writing raw C structs to disk.
You can try it yourself: mmap a file, memcpy some structs there, do the reverse... and enjoy!
(Obviously depending on the memory layout of one C compiler on one architecture does not make for portable files. But that was never a design goal of this system.)
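(A rough Node flavor of the same trick, with a fixed byte layout standing in for the C struct; the same caveat applies, since this sketch assumes little-endian byte order.)

    const fs = require('fs');

    // "struct record { uint32_t id; double score; }" -> 12 bytes
    function writeRecord(path, { id, score }) {
      const buf = Buffer.alloc(12);
      buf.writeUInt32LE(id, 0);
      buf.writeDoubleLE(score, 4);
      fs.writeFileSync(path, buf);
    }

    function readRecord(path) {
      const buf = fs.readFileSync(path);
      return { id: buf.readUInt32LE(0), score: buf.readDoubleLE(4) };
    }

    writeRecord('record.bin', { id: 7, score: 0.5 });
    console.log(readRecord('record.bin')); // { id: 7, score: 0.5 }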
I've been using (and developing a fork of) NeDB [0] that does exactly what you describe: an in-memory database with append-only logs of changes for file-system persistence.
On startup, and optionally at regular intervals, it "compacts" the database by reducing all events to a single JSON string.
The README links to an article by @antirez, Redis Persistence Demystified [1]. It's been educational studying how it works.
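The compaction step can be sketched in a few lines (my sketch, not NeDB's actual code; it assumes a JSON-lines log and a caller-supplied (state, event) => state reducer):

    const fs = require('fs');

    // Fold every logged event into one state object, then rewrite
    // the file as a single snapshot line.
    function compact(logPath, applyEvent, initialState = {}) {
      const state = fs.readFileSync(logPath, 'utf8')
        .split('\n')
        .filter(Boolean)
        .map((line) => JSON.parse(line))
        .reduce(applyEvent, initialState);
      fs.writeFileSync(logPath, JSON.stringify(state) + '\n');
    }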
To offer a slightly alternative perspective on this, I actually think this type of "article" ("listicle"? "tweetacle"?) can have negative effects. In my mind, it lends credence to the very toxic notion of "value of ideas" over "value of execution".
The former is something that has all sorts of knock-on effects: backward IP laws "protecting" ideas, perverse incentives within large corporations with outspoken "idea men" being promoted ahead of doers, non-technical founders with "high-potential ideas" sucking up investment and expecting to execute with technical hires on untested theories.
I'm not saying any of the above applies in this case, of course, but the fact is that it is you who built Redis, not pg, nor the many others who've had similar ideas, and I think the above tweet thread lends undue weight to many of the above negative trends in our industry (and, in general, in the recorded history of IP/invention-credit battles).
It's a matter of interpretation. IMHO the tweet shows how valuable Hacker News itself is, not Paul Graham's ideas (but then HN was created by pg, so, yep, it also gives him credit). If you take a number of people who are good at doing things and put them together, more things will get created, because of a natural process of idea exchange and triggering. I'm an example of a very isolated programmer, so this applies especially to folks in my condition, but at this point I guess there are quite a few of "us".
> In my mind, it lends credence to the very toxic notion of "value of ideas" over "value of execution".
Ideas are required for execution; they don't have independent value (an idea without execution delivers nothing), but neither does execution (you have to have something to execute).
Of course. I didn't say otherwise: what I'm talking about is value. Ideas are often (usually?) assigned greater value than execution itself, which is absurd.
In actual fact, even beyond "inception", most execution requires ongoing iteration and innovation. No final product is solely the result of its inspiration.
We might have a biased opinion on the subject, at least those of us heavily involved on the execution side. Execution is much more subject to available resources.
Irrespective of what the perception of pg is, the tweet was pretty clear that pg only "inspired" or "triggered" the creation of Redis. That gives pg no more credibility than the moon gets for inspiring Van Gogh to paint Starry Night.
Attributing inspiration takes nothing away from an artist, nor from the work. I don't even think it meaningfully changes the credibility of the inspiring object.
I could randomly paraphrase and quote lines from The Art of Computer Programming, for instance, and if I had a wide enough audience I'd probably bring about a Great Renaissance in the field of computer science and programming. History might remember me as the Greatest Idea Man of all time, but that seems unlikely.
“Look at how awesome I am for inventing Redis” wouldn’t have been as interesting, though.
Ideas are cheap and plentiful if you have the eye for them, and learning the sources of inspiration other makers had can be informative if you also want to make things.
> In my mind, it lends credence to the very toxic notion of "value of ideas" over "value of execution".
On the spectrum of sh#t HN believes, this one, along with meritocracy, has got to go. No one is arguing that ideas have _more_ value than execution. But ideas clearly have value: we have entire buildings devoted to them. The internet was built to transport them. Getting exposure to them is deemed critical for the development of our young and the future of humanity.
You might have some points here (I certainly agree about meritocracy), but the examples are pretty bizarre. How does devoting buildings to ideas prove their inherent value? The internet was built to transport information: knowledge, education, documentation, history, insight. "Ideas" are none of these things: ideas are inspirational and transient in nature.
(Sure, the internet's communication protocols do transmit "ideas" in numerous ways: mainly in terms of shared experience and evolving iterative conclusions from shared knowledge, but it's hardly the intent of its creation).
Getting exposure to past knowledge and experience is critical to our young and the future of humanity, but, frankly, the glorified fantasy surrounding some popular histories does more to further the idea of a privileged "celebrated few" blessed with the genius of inspiration than to instill any sense of the true study, investigation, and work put in by those who have achieved great things in the past (think of old stories like Archimedes' bath, Edison's bulb, or Newton's apple, or whichever more modern example: placing individuals on a sort of divine pedestal, to be blessed with such ideas).
You're calling out meritocracy, but it's exactly this kind of championing of ideas that leads to it. Salvatore has put the long, quiet, sometimes relatively thankless hours into making this a reality, and it could seem to some that pg is being put on a pedestal for being momentarily inspirational: is that not the very thing you're calling out?
> pg is being put on a pedestal for being momentarily inspirational
I am certainly not doing that. I don't think what happened wrt pg and antirez even qualifies as an idea or inspiration. A chain of events? A catalyst? Memcached was already being used like a data-structure server before Redis came along. A lot of deep thought went into Redis's design, which is why it works so well for so many applications.
Ideas in and of themselves have value and are needed. Really good ones take a long time to create and hone. Edison's bulb is a great example of brute force. Newton's Apple always seemed like a creation myth. History goes a lot deeper than pull quotes.
"System prevalence[1] is a simple software architectural pattern that combines system images (snapshots) and transaction journaling to provide speed, performance scalability, transparent persistence and transparent live mirroring of computer system state." — https://en.m.wikipedia.org/wiki/System_prevalence
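In code, recovery under this pattern is just "load the last snapshot, then replay the journal written since it". A sketch in Node (the file names and the applyEvent reducer are made up):

    const fs = require('fs');

    function recover(applyEvent) {
      // Start from the last snapshot, if any...
      let state = fs.existsSync('snapshot.json')
        ? JSON.parse(fs.readFileSync('snapshot.json', 'utf8'))
        : {};
      // ...then replay the journal of transactions recorded after it.
      if (fs.existsSync('journal.log')) {
        for (const line of fs.readFileSync('journal.log', 'utf8').split('\n')) {
          if (line) state = applyEvent(state, JSON.parse(line));
        }
      }
      return state;
    }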
PG was also involved in the inception of Reddit: It was PG who gave Alexis and Steve the idea to make something like reddit, and also gave them the tagline "the front page of the internet".[0]
PG had vetoed their initial idea to create a food-delivery app and then called them back and asked them to come up with something new.
[0]: https://www.youtube.com/watch?v=5rZ8f3Bx6Po
Unisys/Burroughs mainframes running COBOL ca. 1992 (and probably earlier) had this as a feature by default. You could walk up to one of these boxes right in the middle of a scary finance, payroll, whatever job, yank the cord out (not recommended), plug it back in, and the machine would boot up and return to what it was doing, usually with no ill effect.
Despite switching to a dozen new fad languages since then, programmers have yet to get around to replicating this in any broadly adopted modern system. If you want to put in the effort to engineer it yourself, of course, you can. But it was nice not to have to engineer it at all.
When greybeards seem cranky, it's because of stuff like that.
If I recall correctly, Emacs currently does this. During the build it (slowly) loads all of the elisp into memory and then dumps out the memory image of it after it’s all been compiled.
Some Lisps do this. SBCL's sb-ext:save-lisp-and-die function, for example, is a "quit" that also persists the memory state. When you load this image later, you get it back as it was when you "quit" last time.
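Node has nothing like a full image dump, but for plain data the built-in v8 module's structured serialization gives a rough taste of "save the state and get it back as it was" (a sketch; it handles data such as Maps, Sets, and Dates, not code or closures):

    const v8 = require('v8');
    const fs = require('fs');

    // Save a structure as a binary blob...
    const state = { visits: new Map([['/', 3]]), since: new Date() };
    fs.writeFileSync('state.bin', v8.serialize(state));

    // ...and on a later run, restore it exactly as it was.
    const restored = v8.deserialize(fs.readFileSync('state.bin'));
    console.log(restored.visits.get('/')); // 3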
"Actually, I wonder why we dont have yet a programming language and runtime which, after a shutdown, reload exactly like it was."
It's been tried many times, but you die the death of a million cuts. (Not just a thousand.) After you're shut down, the world moves on, and then you get restarted. Now your hardware has changed, your network connections have all changed, your version may have changed so the stored data may be all different, and, perhaps surprisingly, worst of all: once something gets corrupted, it's corrupted forever. No "reboot" for you.
It turns out that the "reboot" step is inconvenient in the short term, but in the long term it enforced a minimum amount of discipline on programmers, making sure they don't get into an unrecoverable state.
You'll note that, if anything, the trend continues in that direction. All the recent operational work, in Docker, in things like Ansible and Chef and Puppet, in "serverless", in reproducible builds for binaries, etc., can be read through the lens of taking things that were previously images of unknown provenance and ensuring that we can always rebuild them from an initial definition. We're in fact headed even farther away from image-based systems that reload in their previous state.
What strategy do you use to migrate old data to a new codebase? Different business logic would probably want different answers in this regard, so I doubt there's a one-size-fits-all solution.
I use it by default for new Node.js projects (most of which are experiments but some end up in production, and have been running without problems for years).
It's a simple pattern to implement. To make it a bit easier to use repeatedly, I've got a small helper class called JournaledCollection. You pass it serialize+deserialize callbacks for your item type, and it takes care of persistence in event logs.
For a while I was thinking about releasing my helpers as a project called LAUF, short for "Lame-Ass Un-Framework". Then one could say: "Most of my projects are LAUFable, I don't need anything more serious." (Awful dad jokes are a solid reason to publish open source, right?)
I never got around to it, though. But if you're interested, I could put together an example.
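(In the meantime, a hypothetical sketch of such a helper; the real JournaledCollection isn't shown, so the names and shape here are guesses:)

    const fs = require('fs');

    // Items are persisted as an append-only event log; the caller
    // supplies serialize/deserialize callbacks for the item type.
    class JournaledCollection {
      constructor(path, { serialize, deserialize }) {
        this.path = path;
        this.serialize = serialize;
        this.items = [];
        if (fs.existsSync(path)) {
          for (const line of fs.readFileSync(path, 'utf8').split('\n')) {
            if (line) this.items.push(deserialize(JSON.parse(line)));
          }
        }
      }

      add(item) {
        this.items.push(item);
        fs.appendFileSync(this.path, JSON.stringify(this.serialize(item)) + '\n');
      }
    }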
When the application was restarted, reading back the log would recreate the in-memory data structures. I thought that was cool, and that databases themselves could work that way, instead of just using programming-language data structures without a networked API.
That is literally how databases work: in-memory state + WAL + data files on disk. You could, in theory, live without the data files and keep just a big WAL.
Relational databases (except maybe MemSQL) treat the file system as the source of truth. And usually the file system is bigger than the memory, so they need to constantly update their cache with more relevant data.
Redis loads everything into memory and doesn't keep the structure in the file system, only the log, recreating the state from log + snapshot.
Except that most databases don't store their data in anything resembling "programming language data structures". You get tables, rows, and columns (or maybe a bit of JSON if you're lucky) instead of native integers, strings, lists, sets, and dictionaries.
The primary purpose of an ORM is to overcome the "impedance mismatch" between relational databases and programming language data structures. There's no need for an ORM if you can store your data structures directly in the database.
> ... if you can store your data structures directly in the database.
ABSTRACT. Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). It provides a means of describing data with its natural structure only—that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.
E.F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387.
You propose to reintroduce a problem that they absolutely wanted to get rid of 40 years ago. Just imagine that you first have to figure out how to painstakingly parse serialized Python dictionaries before you can access the data in another program written in, e.g., Rust.
It clearly amounts to UNSOLVING a problem that is now SOLVED ALREADY.
Well, Redis allows me to store a JavaScript array, add some more with a Ruby client, remove some items with a PHP client, and finally read it back as a Python list just fine. What's the problem that has been unsolved? :)
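A sketch of the JavaScript side with node-redis (the list name is made up; the Ruby, PHP, and Python clients speak the same protocol against the same list):

    import { createClient } from 'redis';

    const client = createClient();
    await client.connect();

    // Store a JavaScript array as a native Redis list...
    await client.rPush('todo', ['write docs', 'ship it']);
    // ...and read it back as a JavaScript array.
    const items = await client.lRange('todo', 0, -1);
    console.log(items); // [ 'write docs', 'ship it' ]

    await client.quit();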
https://msgpack.org is absolutely fine. I wasn't criticizing msgpack. You can also serialize the bytes of an array of C structs, if you see what I mean.
If this is a primary purpose of an ORM, then I wish I knew of one that isn't utterly failing at that.
One thing I've learned about this "impedance mismatch" is that it isn't a syntax thing, it's a fundamental difference in the way of viewing the world. The way you store data about the world is different from the way you model that world dynamically, with objects. I find it safer to always split out the "business model" from the storage layer, so that those different views don't interfere - and once you do that, you may as well implement the storage layer in a relational way.
.. and the code you implement to connect that business layer to that storage layer is an ORM.
The idea that the ORM forces the storage layer to a particular representation of the business layer is only true if it implements the ActiveRecord pattern, which isn't universal.
Another way to say it is that an ORM is a workaround for the fact that most languages are VERY poor at manipulating data.
There are two main reasons for the "impedance mismatch":
- Paradigms: two different paradigms will be at odds (for example, functional and OO). This is OK.
- Limitations: the relational model is absolutely superior to and more expressive at manipulating data than OO/functional, and you need A LOT of machinery to recover that power.
This is not OK.
However, this doesn't change the fact that OO is OK. Similar to how a KV store is fine, but an RDBMS is certainly much more capable.
> Another way to say it is that an ORM is a workaround for the fact that most languages are VERY poor at manipulating data.
This is why I love Clojure (and a particular style of Javascript). Destructuring and a good library of object/array manipulation functions make an environment well-suited to transforming data structures (which is precisely what I want to do in most programs I write), and I find I do not need an ORM where this type of data-focused programming is supported.
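A small example of that style in JavaScript (made-up data):

    // Destructuring plus plain array/object transforms; no ORM needed
    // when the program is mostly reshaping data structures.
    const users = [
      { id: 1, name: 'Ada', roles: ['admin'] },
      { id: 2, name: 'Grace', roles: [] },
    ];

    const admins = users
      .filter(({ roles }) => roles.includes('admin'))
      .map(({ id, name }) => ({ id, name }));

    console.log(admins); // [ { id: 1, name: 'Ada' } ]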
You could say OO/functional lean more to the "algorithms" side and RM to the "data structures" side. OO/functional don't say much about how to operate on data; most of it is an exercise for the reader.
RM, instead, gives a clear answer and defined operations for that.
You need to "spice things up" to make each one more useful for the other part of the equation. RM in its pure form is certainly incredibly limited (it can't even print "hello world!"), but that's because it is only there to give a solution for how to transform data...
I think you're slightly missing the point -- for me, a certain unique "awesomeness" lies in specifically being able to literally pipe stdout back to stdin and get back the exact same data structures.
Of course Erlang has had ETS (and of course you can use gen_servers of various kinds for this) built in forever. Redis is fantastic but I think there have been many examples of prior art before this tweet!
Perl's `Storable` springs to mind here - I know many places who have mini-"databases" that are effectively straight dumps of Perl variables to disk that get loaded in, worked on, then saved back out.
I guess Smalltalk's images are the ur-example here?
I was talking to a database teacher, and I tried arguing about ACID on an in-memory database: there is only a minimal window for database corruption in such a transaction system, since the worst-case scenario would seem to be the loss of very few transactions.
He was not really listening to what I was saying or to my arguments, even though a system like Redis seems like a more than acceptable compromise.
It still seems a few people are reluctant to trust an in-memory database.
VoltDB is probably a better example of an in-memory database with ACID semantics. Redis usually won't be deployed in a way where it fsyncs after each write operation, because that runs too slowly. You need to do some tricks like batching fsyncs to get decent performance, and I don't think Redis has support for this. However, SSD fsync performance seems very high: I remember benchmarking an EC2 i3 instance that was giving 20,000 fsyncs/s, but if Redis is giving you something like 80k writes/s, then even fsyncs that fast are going to be a bottleneck. [https://redis.io/topics/benchmarks]
I think people who deploy Redis are willing to tolerate lower durability guarantees for the extra performance. A lot of the time Redis is some kind of cache, and there is a way of reconstructing the real data from more durable storage in the case of failure. Or people are willing to lose a second of data, or whatever fsync interval they're using.
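A sketch of that batching trick (often called group commit): writes are acknowledged only after a periodic fsync covers them, so one fsync is amortized over the whole batch.

    const fs = require('fs');

    const fd = fs.openSync('wal.log', 'a');
    let pending = [];

    // Append an entry; onDurable fires once an fsync has covered it.
    function write(entry, onDurable) {
      fs.writeSync(fd, JSON.stringify(entry) + '\n');
      pending.push(onDurable);
    }

    // One fsync per interval makes the whole batch durable at once.
    setInterval(() => {
      if (pending.length === 0) return;
      fs.fsyncSync(fd);
      const batch = pending;
      pending = [];
      for (const cb of batch) if (cb) cb();
    }, 100);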
I've had plenty of good ideas that I haven't bothered to implement because I have no need for the solution. That doesn't mean if someone else implements the idea that I'll have hindsight bias about its value. It means it wasn't valuable enough to me to direct my behavior.
What I am saying is that this idea is totally obvious, but until somebody implements it to show it works, nobody dares to try it because it sounds like a bad idea.