
GNU Recutils - carlesfe
https://labs.tomasino.org/gnu-recutils/
======
tptacek
This is really neat. But before you integrate it into anything else you build,
consider that it appears to be a very old blob of C code. It took afl-fuzz
something like 2 minutes to start finding wild free crashes†. This seems like
a worthwhile thing to reimplement in a higher-level language.

† _I don't know, maybe they've already done this work and ruled out anything
bad from the dozens of unique crashes AFL trivially finds? I wouldn't want to
pretend that I've done any serious inspection here._

~~~
Qasaur
I give it a couple of months before it is rewritten in Rust, just like bat and
ripgrep.

~~~
mongol
There is something here:
[https://github.com/aisamanra/rrecutils](https://github.com/aisamanra/rrecutils)

------
jamestomasino
Blog author here. Glad to see someone submitted this and you all liked it.

Recutils has MUCH more to offer beyond the basic intro I gave here. It has
wonderful org-mode integration for you emacs people.

Here's a recfile of my read books for reference. I generated this from my
Goodreads export csv and a few recutils calls:
[https://ttm.sh/Equ.rec](https://ttm.sh/Equ.rec)

~~~
anthk
A tip: on accented characters such as Spanish names:

    
    
        setxkbmap us -option compose:rwin &
    

Then just press [Right windows key] + [' ] , [a] in order to type in an 'á'.

~~~
andrepd
You can also use an international layout that uses AltGr

~~~
anthk
I forgot the "us" switch. Now it will work fine.

------
ymse
Recutils is really handy when coupled with command-line tools that return
structured data.

`guix search`[^] outputs data in recutils format, so if you are searching for
a database driver for Python, but want to filter out "python2" variants, and
ignore uninteresting fields such as versions or dependencies, you can do:

    
    
      $ guix search python mysql | recsel -q 'python-' -p name,synopsis,homepage
      name: python-mysqlclient
      synopsis: MySQLdb is an interface to the popular MySQL database server for Python
      homepage: https://github.com/PyMySQL/mysqlclient-python
    
      name: python-pymysql
      synopsis: Pure-Python MySQL driver
      homepage: https://github.com/PyMySQL/PyMySQL/
    
      name: python-peewee
      synopsis: Small object-relational mapping utility
      homepage: https://github.com/coleifer/peewee/
    

Without recsel, the output is 100 lines long, with lots of duplication between
the Python 2 and 3 variants.

[^] GNU Guix is a package manager that works on top of any GNU/Linux
distribution.
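
As a rough illustration of what the `recsel -q ... -p ...` call above does, here is a minimal Python sketch. It is not how recsel is implemented; it assumes `-q` behaves as a case-sensitive substring match against any field value, and `quick_select` is a hypothetical helper name. The sample records, including the `python2-pymysql` entry and the placeholder versions, are made up for illustration.

```python
# Sample records; the python2 entry and the version strings are placeholders.
RECORDS = [
    {"name": "python-pymysql", "version": "0.0.0",
     "synopsis": "Pure-Python MySQL driver"},
    {"name": "python2-pymysql", "version": "0.0.0",
     "synopsis": "Pure-Python MySQL driver"},
]

def quick_select(records, needle, fields):
    """Keep records where any value contains `needle`, then project `fields`."""
    selected = []
    for record in records:
        if any(needle in value for value in record.values()):
            # Projection: keep only the requested fields, in the requested order.
            selected.append({name: record[name] for name in fields if name in record})
    return selected

selected = quick_select(RECORDS, "python-", ["name", "synopsis"])
for record in selected:
    for name, value in record.items():
        print(f"{name}: {value}")
    print()
```

The `python2-` variant drops out because none of its values contain the string `python-`, and the version field drops out because it is not in the projection list.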

~~~
bjoli
Brilliant! Thanks. I will start outputting recfile format from now on.

------
rasengan0
Ah the joy of something old as a new discovery! What a great feeling, where
has this been all this time?

OP/blog author: Great post, thank you for sharing your experiences

I've been sloshing in the text soup of Tiddlywiki Sqlite3(csv) VimWiki OrgMode
Mediawiki Freemind(xml) and now it looks like Rec is the next ingredient to
experiment with.

FZF and moreso Ripgrep
[https://github.com/BurntSushi/ripgrep](https://github.com/BurntSushi/ripgrep)
has been really great to add to the mix.

~~~
PandaWhisperer
Inb4 someone makes or mentions a JavaScript-based version of this, which
operates on JSON files.

In fact, you can already do the querying part using
[https://stedolan.github.io/jq/](https://stedolan.github.io/jq/), I believe
you can make modifications with it as well, but a different front end a la
recins/recdel would make that a bit more convenient.

~~~
kragen
Sure, but do you want to write

    
    
        [{"title": "Opening",
          "lede": "What happens when we \"open\" a file?"},
         {"title": "ltrace",
          "lede": "It turns out that ltrace uses the \"ptrace\" system call."}]
    

or

    
    
        title: Opening
        lede: What happens when we "open" a file?
    
        title: ltrace
        lede: It turns out that ltrace uses the "ptrace" system call.
    

? When you get a parsing failure after you edit it (or after a sector on your
SD card gets a read error), which one do you think will be easier to fix?
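
To make the comparison concrete, here is a minimal Python sketch that reads the second form into the same structure as the first. It is an illustration only, not the recutils parser: it assumes one `name: value` field per line and blank lines between records, and it does not handle comments or continuation lines. `parse_records` is a hypothetical helper name.

```python
import json

REC_TEXT = '''\
title: Opening
lede: What happens when we "open" a file?

title: ltrace
lede: It turns out that ltrace uses the "ptrace" system call.
'''

def parse_records(text):
    """Parse simple rec-style text into a list of dicts.

    Assumes one "name: value" field per line and blank lines between
    records. A repeated field name overwrites the earlier value.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records

records = parse_records(REC_TEXT)
# The JSON form has to escape every embedded double quote.
print(json.dumps(records, indent=1))
```

The rec text needs no escaping for the embedded quotes, while the JSON output does, which is the readability difference being pointed at.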

~~~
chpatrick
There's also jq for yaml:
[https://github.com/kislyuk/yq](https://github.com/kislyuk/yq)

------
bluenose69
In my (oceanographic) research area, some formats are binary and others are
text-based.

Binary tends to be used for big datasets recorded by instruments that are left
in the field, unattended, for months to years. Since every byte counts, these
instruments cram information in very tightly. The binary nature of the files
makes them a pain to deal with, but it also confers an advantage: the files
are very seldom corrupted by a person who thinks they are benignly viewing the
data.

The text files, on the other hand, sometimes get extra junk inserted because
someone in the data analysis pipeline thinks it's OK to look at information in
MSword or MSexcel.

Sometimes, opaque binary data formats are superior, in terms of data
integrity.

I thought of this, while reading about recutils and thinking of a contrast
with sqlite. Recutils looks great, but if I started sharing data in that
format, I bet it wouldn't be long before derived versions of the files had
become corrupted, as someone edited with MSword.

In this forum, people will sniff at people who use MSword, and I have done so,
myself. But, it's a simple fact that some people who are good at one thing are
not good at another. Some of my colleagues who use MSword for every silly
thing (e.g. seminar announcements) are actually very good at their subject
matter (e.g. the science talked about in those seminars).

------
simonw
I've been experimenting recently with YAML for this kind of thing, and it's
working out really well for me so far.

I have a tool called yaml-to-sqlite (
[https://github.com/simonw/yaml-to-sqlite](https://github.com/simonw/yaml-to-sqlite) )
which converts a YAML file
into a SQLite database, which I can then use with Datasette (
[https://github.com/simonw/datasette](https://github.com/simonw/datasette) )

My biggest project with it so far has been my site
[https://www.niche-museums.com/](https://www.niche-museums.com/) \- a guide to small and niche
museums. The museums themselves live in a single ~100KB YAML file in GitHub:
[https://github.com/simonw/museums/blob/master/museums.yaml](https://github.com/simonw/museums/blob/master/museums.yaml)

I have a CI script which builds that YAML file into a SQLite database and
deploys it + Datasette + custom templates to
[https://www.niche-museums.com/](https://www.niche-museums.com/)

I've been running the site like this for a few months now and I really like
it. I love having my content in source control, I find editing the YAML to be
reasonably pleasant (I even edit it on my iPhone sometimes using the Working
Copy app) and any YAML errors are caught by CI before they are deployed.

~~~
mongol
This looks at least as nice as the recutils format. But I like it better
because there are more tools around to work with yaml.

------
sj4nz
I especially like this idea where you distill data into its most primitive
form. It isn't quite as hairy as using n3-notation, but the ability to just
collect the data without regard to being CSV or SQL gets you to an interesting
space where tracking these changes with GIT works better. I've worked with CSV
with 400+ columns (don't ask it was a horror) and SQL is really too verbose
for these kinds of collections.

GNU Recutils is a nice metalanguage for data. You can churn out CSV from it
which in turn can be \copy loaded into PostgreSQL. Maybe it's too many steps,
but I found it an important format for creating the logical model of
representing data without predisposing it to any other technology or encoding.

Unfortunately its org-mode integration with emacs is broken when using
spacemacs-- too far for me to fix as I'm no Elisp wizard.

~~~
juliend2
> tracking these changes with GIT works better

That's actually the main selling point I am seeing here.

Is there any better format (apart from yaml maybe?) to collaborate on
datasets, that can also immediately be used with code?

------
smabie
Why haven’t I ever encountered this before? Seems cool and better than all the
bespoke config files lying around on any given unix machine (passwd,
hostnames, fstab, etc).

~~~
DoofusOfDeath
I wonder sometimes if, every 1-2 years, I should review the full catalog of
well-maintained libraries and utilities available on Linux / GCC / LLVM /
Python. I forget about many of them because they weren't relevant at the time
I came across them.

Then I forget about that idea 10 minutes later when a new episode of my anime
watchlist gets dubbed. I wonder how much that has cost me.

~~~
ShamelessC
Off topic, but do you have any good anime suggestions? My favorites are
Deathnote, Attack on Titan, Full Metal Alchemist, etc. Trying to find
something as amazing as Deathnote is challenging.

~~~
severine
Planetes:
[https://myanimelist.net/anime/329/Planetes](https://myanimelist.net/anime/329/Planetes)

Paranoia Agent:
[https://myanimelist.net/anime/323/Mousou_Dairinin?q=paranoia](https://myanimelist.net/anime/323/Mousou_Dairinin?q=paranoia)
(this is amazing)

Samurai Champloo:
[https://myanimelist.net/anime/205/Samurai_Champloo](https://myanimelist.net/anime/205/Samurai_Champloo)

And of course, GITS:
[https://myanimelist.net/anime/467/Koukaku_Kidoutai__Stand_Al...](https://myanimelist.net/anime/467/Koukaku_Kidoutai__Stand_Alone_Complex)

~~~
ShamelessC
Thanks I'll check out all of these!

~~~
severine
> _a cat /mouse detective story with well defined rules and Sherlock vs.
> Moriarty levels of intelligence that _actually convince_ you that the
> characters are geniuses_

Ah, then you'll love Monster:
[https://myanimelist.net/anime/19/Monster](https://myanimelist.net/anime/19/Monster)

------
capableweb
Looks really interesting. Could imagine someone can build an open source Notion
alternative with this and offer the Recfiles as export option, would enable
all types of crazy use cases. Add a read/write API and now you can have bots
acting on your knowledge base. Basically MediaWiki but with a long-standing
underlying format and compatibility with some unix tools already.

Does anyone have any more resources (besides the manual of course:
[https://www.gnu.org/software/recutils/manual](https://www.gnu.org/software/recutils/manual))
they can recommend about Recutils and Recfiles?

~~~
eitland
Earlier this weekend I was doing experimental performance testing of an idea I
have for a wiki solution[0].

This, or SQLite could be really useful to embed data into pages.

[0]: yeah, I've had to dig beneath the surface of Confluence lately and
Confluence has this weird property of immediately making me motivated to try
writing a decent wiki

~~~
cpach
FWIW, if you want a ready-made solution, wiki.js might be worth looking in to.

~~~
eitland
Looks really nice except for the AGPL license, which I _used_ to think meant that any
integration towards any existing solution would have to become AGPL as well.

I've recently read someone from the FSF(?) saying this is only
MongoDB's and others' misinterpretation, so at this point I am just utterly
confused.

Also, when somebody uses AGPL that usually means: we found the scariest
license we could find while still calling it open source, _but, we have a
commercial license to sell you._

However I couldn't find any licensing option. Does this mean it isn't just a
way to sell commercial licenses?

I'm completely honest here. I really don't get it, but then again it took a
while before I really understood the GPL as well so I'm ready to be
enlightened :-)

~~~
cpach
TBH I hadn’t noticed that it’s licensed under the AGPL. I haven’t studied that
license closely but yeah, I guess for many companies it’s considered a
liability.

~~~
eitland
Thanks anyway!

I went ahead to study the FSF FAQ but they don't really answer it completely
as far as I can see. The clear cut answers are:

\- if you combine your program with an AGPL program then your program has to
become AGPL as well, just like the GPL.

\- if you use an AGPL program unmodified it doesn't seem like you have to
distribute it to users who use it over the network

But as far as I can see the FAQ doesn't say what happens if the AGPL program
reaches out to other applications to get data. For some reason I always thought
anything that was touched by the AGPL program, either over the network or
otherwise would have to become AGPL.

If that isn't the case - and there is more and more to suggest that, then I
think FSF should point that out clearly.

------
smartmic
Just for answering how and where it could be used: GNU Guix uses recutils
format to display search results and more, for example use the recsel command
to select sessions of interest

    
    
       $ sudo guix processes | \
        recsel -p ClientPID,ClientCommand -e 'LockHeld ~ "perl"'
        ClientPID: 19419
        ClientCommand: cuirass --cache-directory /var/cache/cuirass …

~~~
qbaqbaqba
That's awesome!

------
gavinray
This is so interesting, how is it that nobody seems to have heard of this?

There are so many usecases where this would fit much better than traditional
approaches.

Thank you for sharing!

------
fooblat
I think this gets really interesting when combined with the bash builtin that
supports reading records into variables[0].

    
    
        recsel contacts.rec | while readrec
        do
           if [ "$Checked" = "no" ]
           then
              mail -s "You are being checked." "${Email[0]}" < email.txt
              recset -e "Email = '$Email'" -f Checked -S yes contacts.rec
              sleep 1
           fi
        done
    

0\. [https://www.gnu.org/software/recutils/manual/Bash-Builtins.h...](https://www.gnu.org/software/recutils/manual/Bash-Builtins.html#Bash-Builtins)

------
ctz
> and even field-level crypto

Like you might expect, this is quite poor. CRC32 for authenticity, mac-then-
encrypt, no binding between keys and values, using low-entropy passwords
directly as AES keys, and a fairly trivial looking read overflow in the
decrypt function. That's just two minutes looking at one source file.
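
One of the points above, using a low-entropy password directly as an AES key, has a well-known remedy: run the password through a key derivation function with a random salt first. The following is a minimal sketch of that idea using Python's standard `hashlib.pbkdf2_hmac`. It is not recutils code and does not address the other issues listed; the iteration count and the `derive_key` name are illustrative assumptions.

```python
import hashlib
import os

def derive_key(password, salt, iterations=200_000):
    """Stretch a password into a 32-byte key with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative assumption, not a recommendation.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

# The salt is random per encrypted value and stored next to the ciphertext.
salt = os.urandom(16)
key = derive_key("correct horse", salt)
print(len(key))
```

The same password with a different salt yields a different key, so identical passwords do not produce identical keys across fields or files.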

------
mongol
I like the idea. But I love sqlite. I think the perfect match of this idea and
sqlite is a serialization format to and from Sqlite, in a nice text format. It
could look something like this, but probably closer to the "line mode" that
the sqlite3 tool implements.
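
A minimal sketch of what such a text serialization could look like, using Python's standard `sqlite3` module. This is only an illustration of the idea, not an existing tool: `dump_table` is a hypothetical helper, the sample rows are made up, and values containing newlines are not handled.

```python
import sqlite3

def dump_table(conn, table):
    """Render each row as "column: value" lines, with a blank line between rows.

    NULL columns are simply left out of the record.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    blocks = []
    for row in cursor:
        lines = [f"{col}: {val}" for col, val in zip(columns, row) if val is not None]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

# Sample data, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("Dune", "Frank Herbert", 1965), ("Solaris", "Stanislaw Lem", None)],
)
text = dump_table(conn, "books")
print(text)
```

Leaving out NULL columns keeps the text short, at the cost of losing the distinction between a missing column and a NULL one when reading the text back.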

------
zmix
I never used 'recutils' before, so, before I start investigating, did anyone
try to pair this with Markdown? Could this be made an extension to Markdown,
so I can encode data fields within markdown?

~~~
fictorial
I had an idea a while back to this end. Markdown files as a database of
sorts with queries, validations, actions/plugins, etc. written in Javascript.
I documented the interfaces but haven't had time to implement it.

[https://github.com/fictorial/gg](https://github.com/fictorial/gg)

[https://github.com/fictorial/gg/tree/master/doc](https://github.com/fictorial/gg/tree/master/doc)

------
amelius
Are some/most of the operations O(n)? How practical is this database then?

~~~
capableweb
The purpose behind the software (from the author's point of view) can be found
here:
[http://www.gnu.org/software/recutils/manual/Purpose.html#Pur...](http://www.gnu.org/software/recutils/manual/Purpose.html#Purpose)

------
mark_l_watson
That is really cool.

An alternative is using JSON files with tools like jq, but Recutils looks much
more powerful.

EDIT: nice, Python bindings [https://github.com/maninya/python-
recutils](https://github.com/maninya/python-recutils)

~~~
rolandog
I think the most current version is this one [0]; I submitted a bug report to
request the GitHub repository be updated to point to the Savannah repository.
Also, the Python bindings are here [1]

[0]
[https://savannah.gnu.org/projects/recutils/](https://savannah.gnu.org/projects/recutils/)

[1]
[https://git.savannah.gnu.org/cgit/recutils.git/tree/python](https://git.savannah.gnu.org/cgit/recutils.git/tree/python)

------
JdeBP
I do not think that there was ever a period when "the only option for
computing for quite a while" was text files. It certainly wasn't the 1960s,
1970s, or 1980s. dBase databases were not text files, for just one example.

~~~
jamestomasino
I was referring to plain text as the interface to the machine, not the file
formats. When your primary interface is through text, it makes sense that good
tooling would develop.

------
ahnick
What I don't understand about recutils is which category of problems
it really excels at over other solutions.

If I want human readable files I think I would opt for just a bunch of
markdown files in a directory structure before I go the recutils route. If I
want a lightweight database I think I would go for SQLite like others have
already mentioned. In what situation do you really need
human readable/editable referential integrity?

~~~
yuonotthat
It's like markdown, but for data.

Just like markdown provides a quick way to render something pretty
while still being readable in plain text, recfiles do the same for data: they have
data types, keys, and integrity checks, are easily queryable with the tools, and are also
easily read in plain text.

Sure, you could use some type of CSV, but that is not as pretty to read as recfiles.
The point is the same as markdown: usable as just text, but with good tooling
around it.

With markdown files in a directory structure, you would have to provide your own tooling.

~~~
ahnick
Okay, but markdown is data (perhaps not well structured, but data nonetheless)
+ formatting. If I'm editing and going into the files directly anyway why do I
need to be able to do queries? I guess I'm asking what's the point? Can't I
find the info I want just using find and grep on a bunch of markdown files?
Maybe I don't even have to do that if I've organized my markdown files
sufficiently well.

Let me be clear, I get that you don't have db operations with a bunch of
markdown files with a directory structure, but on small scale data
repositories do you really need that? For example, if I have a bunch of
recipes in my "personal database of markdown files" I can quickly find the
chili recipe I'm looking for by going to the directory "Soups & Salads > Chili
> Slow Cooker Chili Recipe" or something along those lines. Or I could have
grepped for "slow cooker chili". Either way I'm going to find that recipe with
no extra tooling on my part.

Where do rec files' features add value? Plus, if I use rec files now it seems I
have to build my own formatting because I can't rely on markdown editors to
build it automatically for me. Or is there a way to specify formatting for rec
files?

~~~
GoblinSlayer
Recfile is for tabular data, markdown is for text documents.

~~~
rolandog
Agreed. A recfile is to CSV what YAML is to JSON (and what MarkDown is to
HTML).

------
tangus
They made the multiline field value format incompatible with the
email/HTTP/etc headers one...

Why?

------
pasokan
In the 90's there was rdb aka nosql. It was packaged in debian too at that
time. I was a happy user. Then lost track of it. Thanks to this post I tracked
it down again

[http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20P...](http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page)

------
herdrick

        your database is a human-readable text file that you can grep/awk/sed freely, and a line-oriented structure makes it perfect for version control systems. 
    

But in fact it's not great for grep/awk/sed, etc. For those tools to work
well, you'll find you need to keep each record on its own line.

~~~
dredmorbius
Not for awk:

    
    
        BEGIN {
            RS=""
            FS="\n"
        }
    
        {
            for( i=1; i<=NF; i++ ) {
                # Split each line into a field name and a value.
                nfields = split($i, afields, ": ")
                if( nfields != 2 ) { printf( "Bad fields count: %s %s %s\n", NR, i, $i ) > "/dev/stderr"; exit 1 }
                field[afields[1]] = afields[2]
            }
        }
    

(Untested, but should be generally accurate.)

That parses _records_ based on blank lines, into _fields_ based on lines,
splitting the fields into individual data records based on ": " as a regex.

For more, see the GNU Awk User's Guide:

[https://www.gnu.org/software/gawk/manual/html_node/Multiple-...](https://www.gnu.org/software/gawk/manual/html_node/Multiple-Line.html#Multiple-Line)

------
frafra
Previous discussion (September 2017): "Recutils – Tools and libraries to
access plain text databases called Recfiles (gnu.org)"
[https://news.ycombinator.com/item?id=15302035](https://news.ycombinator.com/item?id=15302035)

------
jhoechtl
Is recutils available as a library? Are there language bindings? Specifically
asking for Go support.

Never heard of recutils before but the on-disk format looks compelling. There
has been limited (no?) takeup however which makes me fear recutils are
intrinsically broken

~~~
jamestomasino
Yes, recutils comes in 3 forms: a C library, command line utilities, and an
org-mode plugin.

------
sundarurfriend
> (yes, the offical package image is two gay turtles)

It's weird that the FAQ page linked there [1] doesn't seem to link back in any
way to recutils' own page. The only way to get there [2] seems to be to click
on the "Software" header link and Ctrl-F for recutils.

[1]
[https://www.gnu.org/software/recutils/faq.html#whyturtles](https://www.gnu.org/software/recutils/faq.html#whyturtles)
[2]
[https://www.gnu.org/software/recutils/](https://www.gnu.org/software/recutils/)

------
layoutIfNeeded
Does it support values with newlines?

~~~
jamestomasino
Yes, you can do \n, or a literal newline and start the next line with a +. The
second technique is wonderfully readable.
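
A minimal Python sketch of the second technique, based only on the description above: a line that starts with `+` continues the previous field's value on a new line. The exact whitespace rules are an assumption here (the sketch drops the `+` and at most one following space); the recutils manual is the authority. `parse_fields` is a hypothetical helper name and the sample record is made up.

```python
def parse_fields(record_text):
    """Parse one record, joining "+" continuation lines onto the previous field."""
    fields = []  # list of [name, value] pairs, so repeated names survive
    for line in record_text.splitlines():
        if line.startswith("+"):
            if not fields:
                raise ValueError("continuation line before any field")
            # Drop the "+" and at most one following space, keep the rest.
            rest = line[1:]
            if rest.startswith(" "):
                rest = rest[1:]
            fields[-1][1] += "\n" + rest
        else:
            name, _, value = line.partition(":")
            fields.append([name.strip(), value.strip()])
    return [(name, value) for name, value in fields]

# Sample record, made up for illustration.
record = "Name: Jane\nAddress: 1 Main Street\n+ Springfield\n+ Nowhere"
fields = parse_fields(record)
for name, value in fields:
    print(repr(name), repr(value))
```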

------
haolez
Is there a GUI for viewing and editing recutils files? Or at least a generic
GUI that's easy enough to add new backends?

~~~
GordonS
Given this is human-readable, I'm curious to learn what you're looking for in
a specialised GUI that any text editor wouldn't provide? (vim, nano, notepad++
VS Code, whatever)

~~~
haolez
I'm thinking of using this myself, while still being able to interact with
non-programmers used to Excel and similar GUIs. But I enjoy working only with the
text formats :)

~~~
GordonS
Ah, that makes sense now!

I was so focused on "human readable" == "text editor" that I hadn't thought
about GUIs in the more "G" (graphical) sense.

------
jsilence
With the ubiquity and general awesomeness of sqlite I see no real
use case for this toolset over sqlite.

~~~
rabidrat
sqlite databases are binary blobs which don't have meaningful diffs if you
check them into source control.

~~~
typon
I wrote a SQLite <-> Json converter for this reason. Ended up coming up with a
weird convention in the Json to represent relations. It might be cool to
instead write a recfiles <-> SQLite converter. I want to use SQLite from
within my app, but want to check in readable data to a source control system.
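
The text-to-SQLite direction of such a converter could be sketched roughly as follows, using only Python's standard library. This is an illustration under simplifying assumptions, not an existing tool: records are separated by exactly one blank line, every field is stored as TEXT, and relations, types and continuation lines are ignored. `load_records` and the sample contacts are hypothetical.

```python
import sqlite3

# Sample data, made up for illustration.
REC_TEXT = """\
Name: Ada
Email: ada@example.org

Name: Bob
Email: bob@example.org
"""

def load_records(conn, table, text):
    """Create `table` from rec-style text and return the number of rows loaded."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            name, _, value = line.partition(":")
            record[name.strip()] = value.strip()
        records.append(record)
    # Columns are the union of field names, in first-seen order.
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [[record.get(c) for c in columns] for record in records],
    )
    return len(records)

conn = sqlite3.connect(":memory:")
count = load_records(conn, "contacts", REC_TEXT)
rows = conn.execute('SELECT "Name", "Email" FROM contacts ORDER BY "Name"').fetchall()
print(count, rows)
```

With this split, the text file is what gets checked into source control and the SQLite database is rebuilt from it whenever the app needs it.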

~~~
svnpenn
[https://sqlite.org/cli.html#csv_export](https://sqlite.org/cli.html#csv_export)

~~~
typon
The CSV format is really ugly - i barely consider it human readable.

------
mongol
I recall a type of XML-representation that looked like this, i.e. was line
based. And there was a Python (?) tool to convert XML to and from the line
based format. Would be nice to compare with recutils but I can't find it now

~~~
msla
Sounds like xml2, which is in the Debian and Ubuntu repos:

[https://web.archive.org/web/20160510001507/http://dan.egnor....](https://web.archive.org/web/20160510001507/http://dan.egnor.name/xml2/ref)

[https://web.archive.org/web/20100113205649/http://dan.egnor....](https://web.archive.org/web/20100113205649/http://dan.egnor.name/xml2/examples)

[https://packages.debian.org/search?keywords=xml2](https://packages.debian.org/search?keywords=xml2)

[https://packages.debian.org/sid/xml2](https://packages.debian.org/sid/xml2)

[https://packages.ubuntu.com/focal/utils/xml2](https://packages.ubuntu.com/focal/utils/xml2)

------
OJFord
I'm not sure exactly what I'm imagining or how it would work, but reading this
post I can't help thinking some kind of Wikidata integration could be really
interesting.

------
setheron
I remember at Amazon they open sourced something in a similar vein, but it worked
with a bunch of various key value formats.

We used it heavily for triaging into log files and such.

------
kurisukun
Am I the only one who has problems with record insertion?

------
IshKebab
This is cool in a kind of "neat idea" way but please god nobody use this for
anything that anybody else might ever use! I feel like it's almost
irresponsible to tell people that this exists.

