Storing knowledge in a single long plain text file (breckyunits.com)
320 points by breck 26 days ago | 120 comments



It feels like you left a chapter or two out. You mention in the citations that "Hierarchies are painless in our system through nested parsers, parser inheritance, parser mixins, and nested measurements." Nothing else in the article gives any hint as to what those things are or how your system implements them, except nested measurements. It's entirely unclear what a parser is in your system. It is, however, clear that what you call "parsers" aren't parsers. Is the list of "parsers" a schema definition?

Overall it seems like your ideas would make more sense if you used more widely adopted language to describe them. "Concepts" are records, "measurements" are fields.


> It feels like you left a chapter or two out.

I agree with you. More details will come out over time but I wanted to keep yesterday's paper a single page.

> You mention in the citations that "Hierarchies are painless in our system through nested parsers, parser inheritance, parser mixins, and nested measurements." Nothing else in the article gives any hint as to what those things are or how your system implements them, except nested measurements. It's entirely unclear what a parser is in your system.

Below is a link to a web IDE we built. You can see parsers (on the left) and concepts (on the right). Nested parsers and parser inheritance are demonstrated. Mixins are not in that branch yet. Ignore the "cells" stuff at the top (that turned out to be an unneeded division between line parsers and word parsers).

https://jtree.treenotation.org/designer#url%20https%3A%2F%2F...

> Overall it seems like your ideas would make more sense if you used more widely adopted language to describe them. "Concepts" are records, "measurements" are fields.

Yes, concepts often map to records or rows. Measures to fields or columns. Measurements to the cells in a spreadsheet.
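To make the mapping concrete, here is an illustrative sketch (the measure names `id` and `appeared` are invented for this example, not taken from the paper). A concept in the plain text file:

```
id python
appeared 1991
```

compiles to one row of a table whose columns (measures) are `id` and `appeared`, and whose cells (measurements) are `python` and `1991`.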

There are reasons for my terminology that should become clearer over time.


From a quick scan, it sounds like you re-invented a lot of the concepts of semantic data, just with different terminology and a different text format. (RDF, triples, ...)


It would certainly be fair to add RDF/triples/semantic web to the prior work. I spent many years exploring that stuff.

We are aiming at roughly the same problem. Our implementation has solved some important details.


Might be worth highlighting some of the problems solved (particularly those that earlier ideas haven't).


I wish there was a defacto/canonical site that housed free papers that people could go search before embarking on these types of efforts. Perhaps there is but when I attempt these types of searches I get directed to pay walled ACM type links or Github "Papers We Love" type links.


You might enjoy https://pldb.io/, a paywall-free, open source, public domain database you can browse completely locally, with information on all of these kinds of prior languages.

:)


Funny, someone linked me here. I just posted a project that got a bit of traction related to this: papertalk.xyz


sci-hub


I've built a web-based tool for myself that has similar philosophy: https://edna.arslexis.io/

It does support multiple pages but you can use just one.

It has a nifty feature in that you can divide the single file into virtual parts. They just have alternate backgrounds to tell them apart. And each virtual part can have a type for syntax highlighting (plain text, markdown or a programming language).

I've been using it for a few months now and it's my primary note taking / knowledge recording thing.

Even though it's web based, on Chrome you can save notes on disk so it works like a desktop app.

Each note is a plain text file so you can edit them in any text editor.

If you put notes on a shared drive (Dropbox, OneDrive, Google Drive etc.) you can work on notes on multiple computers.

It's also open-source: https://github.com/kjk/edna


EDIT: Originally I just looked at the website. Looking at the GitHub repo, I see it's a fork, which makes sense (I also didn't notice the other replies!) Either way, it's cool. I'll probably end up using this myself. I was unable to find a way to store notes in a folder or in encrypted Gists though.

This seems nearly identical to Heynote[0], which was also on HN[1]. Even the example blocks share some content with the example in the screenshot on the Heynote homepage (and, I think, in the app too).

[0]: https://heynote.com/
[1]: https://news.ycombinator.com/item?id=38733968


To save to disk you must use Chrome or Edge because only they support the necessary APIs.

Initial note storage is in localStorage. To switch to disk: right-click for context menu, `Notes storage` / `Move notes from browser to directory`.

Then choose a directory on disk and we will do a one-time migration from localStorage => disk.

You can then switch to another directory (some apps call it a "workspace"). Because why not.

Encryption is probably the next feature I'll add because I want to store secrets in my notes and I'll feel better if those notes are encrypted.

More docs: https://edna.arslexis.io/help

Multiple notes is a pretty big addition. I loved the concept and implementation of blocks in Heynote, but a single note was a deal breaker for me.

I've also added some UI, like a right-click context menu for discoverability and the ability to enable spell checking.

And I'm really trying to optimize for speed of use, including speed of switching between notes.

For example you can assign Alt + 0 .. Alt + 9 as note quick access shortcuts.

By default I create 3 notes: scratchpad, daily journal and inbox and they get Alt + 1, Alt + 2, Alt + 3 quick access shortcuts but you can assign them to any page you want.


Jonatan Heyman produces some pretty awesome and useful apps/tools. One should check out his work - https://heyman.info


Looks like that's built on the CodeMirror framework? Any good resources you could share on wiring up a custom language and view? I've managed to kinda get something working with Lezer, but the docs aren't great and I want to set up some pretty specific behaviour in the view, with folding and validation etc.


Yes, Codemirror.

What I know about CodeMirror I mostly learned by reading other people's code, so I suggest that.

Specifically code of silverbullet: https://github.com/silverbulletmd/silverbullet/tree/main/web... (and a few other directories there).

It implements a very advanced Markdown mode, with lots of code to learn from.


Looks like a CLI version of a tiddlywiki


Just found out this is a fork of heynote! Was looking for one of these with web support


Heynote exists as a web app as well :)

https://app.heynote.com/


Yeah, I loved the simplicity and speed of Heynote and math mode.

I wanted multiple notes, and I didn't get why it was made as a desktop app first, given that all the functionality needed to implement it is available in a browser (well, Chrome).

So I forked it and added those features.

Been using it daily so it was worth it.


How does saving notes on disk work? Do you mean just downloading them? Or is the content synced? If so, how does that work?


Chrome implements APIs that allow accessing files on the disk.

So Edna either stores notes in localStorage or in a directory of your choosing on disk.

In Edna you can right-click for context menu to switch between localStorage and disk.

If you're asking "how do the browser APIs work", you can look at https://github.com/kjk/edna/blob/main/src/fileutil.js

Basically, there's `window.showDirectoryPicker()` to ask the user for permission to access a directory (either read-only or read-write). Using that directory handle you can then read the list of files, read/write files, or create new files.
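A minimal sketch of that flow (the function names here are illustrative; `showDirectoryPicker`, `getFileHandle`, and `createWritable` are the actual File System Access API calls, available in Chrome/Edge):

```javascript
// Ask the user to pick a notes directory (prompts for permission).
async function openNotesDir() {
  return await window.showDirectoryPicker({ mode: "readwrite" });
}

// Write a note; the file is created if it doesn't exist yet.
async function saveNote(dirHandle, name, text) {
  const fileHandle = await dirHandle.getFileHandle(name, { create: true });
  const writable = await fileHandle.createWritable();
  await writable.write(text);
  await writable.close();
}

// Read a note back as plain text.
async function loadNote(dirHandle, name) {
  const fileHandle = await dirHandle.getFileHandle(name);
  const file = await fileHandle.getFile();
  return await file.text();
}
```

These APIs are browser-only, which is why localStorage remains the fallback elsewhere.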


Oh, man, many years ago I used Tiddlywiki (and later Wiki-On-A-Stick) as a browser-based note taking app, but stopped using it because the API they used to save the file to disk got deprecated and removed.

History not repeating but rhyming, I suppose...

Anyway, thanks for this. I've just added it to my bookmarks.


This is great. Any plans to add image support? (For screenshots, in my case.) I use OneNote extensively because it's free-form like a whiteboard and allows pasting images (which I often do while debugging).


Probably not to Edna. It's focused on being fast and lightweight.

I've been thinking about more featureful markdown note taker that would support images and more.

I've started on such a thing but stalled. It's way more work. The good thing about Edna is that I spent less than a month adding the features I wanted to the Heynote fork.

The current version is at https://notedapp.dev/ but don't use it for actual notes.


sounds like Obsidian's canvas: https://obsidian.md/canvas


inkscape and prezi should already do some of this


A thing I used OneNote for was easy OCR.


Very cool!

I love the math block. Is there a way to reference a variable elsewhere, or fetch data online? Then you could build a little personal dashboard with it.


Not at the moment.

I was thinking about making math more like a mode, i.e. making it available in every block type, as opposed to its own block type.

Then it would be active in plain text, markdown and even code blocks.

As for data fetching: that falls a bit outside the scope.


Heynote is similar


Edna is a fork of Heynote with a bunch of changes.

Mostly it supports multiple notes and it's a web app, not a desktop app.

I could build a desktop app, but it would offer almost no advantages, given that Edna can also save notes to disk (that's how I use it).

You can use Chrome's "Install" feature to make it look and act like a native app (it opens in its own window and acts independently of the browser).


Heynote also exists as a web app: https://app.heynote.com/


Can I self-host this?


Looking at the GitHub repo[0], I don't see why you wouldn't be able to host it yourself (extra configuration may be required). In the package.json, there is a script for running the web app `npm run webapp:build`, so I'd assume you could do that and then host the built web app in ./webapp/dist however you'd like.

[0]: https://github.com/heyman/heynote


Yep, that should work!


Will you provide an official Docker image in the GitHub Registry or Docker Hub? Thank you.


I don’t get it. How do I know that something is a data definition and not just more data?

Is “>” a special character together with space and new lines? He calls it a trick, why?

How do I add data with spaces and new lines?

Is “Parser” a keyword that you postfix to names of values? He writes “idParser” and then has a value in each observation that is named “id”.


> I don’t get it. How do I know that something is a data definition and not just more data?

In our ScrollSet implementation, a measure definition (what you call a "data definition") is a subset of a parser. You will know something is a measure definition when you see a line starting with a word with a "Parser" postfix, and nested inside that definition is a line like "extends abstractMeasureParser".
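As a hedged illustration of what that looks like (the measure name `appearedParser` is invented for this example; `abstractMeasureParser` is from the description above):

```
appearedParser
 extends abstractMeasureParser
```

A concept ("more data") would then simply contain a line like `appeared 1991`, which that parser knows how to parse and type-check.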

Below is a link to a web IDE we built. You can see all of the measure definitions currently powering PLDB on the left. On the right, you can see a concept ("more data", in your terms).

https://jtree.treenotation.org/designer#url%20https%3A%2F%2F...

> He calls it a trick, why?

The current term of art is the "off-side rule" (https://en.wikipedia.org/wiki/Off-side_rule). I never liked that term. I call it the indentation trick, but I am referring to the off-side rule.


XML is too bulky, let's do CSV. CSV is too limited, too strongly typed, let's do JSON. JSON is too heavily punctuated, let's do YAML. YAML is too YAML-y, let's do this instead.


Let’s just do json5 after json. https://json5.org/


I'll wait for version 6


Eventually everything becomes toml.

All jokes aside, I think the equivalent for this would be Markdown, not XML/CSV/JSON/YAML.


In case you would like to be less (or more) confused: this is an application of Tree Notation, by the same author. https://treenotation.org/

I suffer from the same flaw as the author, a tendency toward grandiosity and fervor in describing my good ideas. So I'm in a good position to advise that he knock it off: people don't like that, and it will keep them from using his stuff even if it's good.

Which it might be, actually. The extreme simplicity of the foundation is laudable.


The brevity and grandiosity is not for marketing the idea, it is so the idea can be attacked. I don't want to waste my working hours building a factory out of the wrong materials. If I've made a mistake, I want to know.

If the idea is truly good, the products built on the idea should do just fine.


It's your project to run as you please, of course.

My guess is that the attacks you draw will skip any basis in technical merit and land directly on the tone, proceeding on an emotional basis. We have an n=1 here with plenty of that behavior on display.

You'd like to believe that someone proposing Tree Notation for a project wouldn't be dismissed with "isn't that, like, the YAML for TimeCube guy?". But this is, in large part, how the world actually functions.


It's been a slog, but I'm very happy with how the ideas in Scroll (for all intents and purposes, Tree Notation and Grammar are Scroll; 99% of usage is Scroll) and PLDB have evolved.

I don't mind the pushback.

If it wasn't for the pushback against Tree Notation, I never would have started PLDB. ("Learn to research properly", one commenter once said. And he was right. I think PLDB is the proper way to do research).

It's much nicer to get pushback than crickets. That means people are generously giving their time to consider the ideas.

Crickets are the worst. I should know, I mostly get crickets.


Nested Markdown with code fences is plain text.

Alas, vscode will choke on it.

I have a project where a thin wrapper loads from a giant markdown file into a sandboxed iframe. That way you can paste code from an unknown source into it and play with the output, and paste private data into it without it being encoded into a URL sent to a server, since making network requests and following links are blocked.

https://codeberg.org/ristretto/pages

notebook.md is huge; the output is on the project website, and a link to the source is in the README.


This is _so much more_ than the title suggests - this is not about making notes in text files.


You get it ;).


Caveat from article:

  > For pragmatic reasons, it is best to split your data into 1 file per concept and combine concept files at runtime.


All separate things should be in different files.

And files are just key/values anyway.


I wouldn't say "should" but I agree.

A file is a very abstract concept, and technically it can mean a lot of different things depending on the file system.

However, it is a very good abstraction that's nearly universal, and practically there is little to no reason not to use files to organize things.


If we forgo human read-write-ability to gain some interactivity, we get https://tiddlywiki.com/, a single long HTML file.


Not sure why the "fast filesystem" links to the M1 processor.


You are right, that is not clear. I've added a note and link (https://github.com/breck7/breckyunits.com/commit/61792237c0b...)

"The M1 laptop was the first consumer machine I tried where the performance of this system wasn't abysmal." - https://breckyunits.com/building-a-treebase-with-6.5-million...

Thank you!


Thanks for the explanation. :)


Reminds me of the Canon Cat. You put a disk in and it would store everything you typed as a single, long document on the disk. You could put dividers in the document to separate sections. Parsers in the Cat's system software allowed for specific actions to be taken on parts of the document; for example, tabular numeric data could be identified and spreadsheet-like functionality could be enabled over that data. The whole document was searchable via a pair of LEAP keys which, when held down while typing, would search for what was typed. Jef Raskin of Macintosh fame was responsible for this UI.

https://en.wikipedia.org/wiki/Canon_Cat


This is fascinating. I don’t think I’ve seen this before. Thank you.


When you still haven't emerged from the COVID pandemic and your lockdown project has taken root deep in your mind.

It reminds me of that scene from The Shining where the character writes the same sentence over and over again.


All text and no syntax makes Breck a dull boy. All text and no syntax makes Breck a dull boy. All text and no syntax makes Breck a dull boy.


I'm so excited for this kind of work. I think there is an alternate history where Emacs or an Emacs equivalent became the dominant OS, but the onboarding process was too onerous, and the community has focused on technical integrations instead of integrating a larger, less technical community of people into a sane but simpler default.

With AI I think interfaces will further bifurcate between "users" and "creators" and pretty much all of our "desktop" ui paradigms will be consigned to history in favor of structured collaborative text interfaces.


I thought I wasn't alone, but perhaps I live in a sparsely populated alternative history where Emacs gets simpler over time. Once you know the basics they don't change. Some more advanced tools gradually simplify or improve, but it takes years. Various ideas are explored by users around the globe and the simplest and best ones survive: we now have magit and eglot and tree-sitter support. And org, but also markup. The shells are true shells with unlimited context and full access to the OS. Similarly for the REPLs. The only thing I miss is changing tools all the time and losing history, which felt like a refreshing excuse to start over when I was younger; these days I don't have the patience and time.


“Every program attempts to get simpler until it can no longer read email. Those programs which cannot simplify are replaced by ones which can.”


I feel like the focus on the language's appearance is taking too much precedence over other aspects like parser composition. For example, mentioning the 'indentation trick' feels like a deviation and distraction from the actual point you're trying to convey. The idea here isn't actually dependent on the exact presentation style of the format...

To comment on the appearance, though, since it seems a focus nonetheless... I appreciate the ideal of syntax sparseness, but in this case I feel like it loses visual salience in plain text after looking at some of the .scroll files. It's difficult to recognize the shape and proportion of what the content will be when rendered. The applied meta content lacks visual differentiation in plain text from the content itself. I don't think total sparseness should be the sole goal here; Markdown, for example, isn't strong in plain text just because it is syntactically sparse, but because it is sparse in tandem with not supporting extensible meta content applied to the content, which this format does.


I understand the author's point, but I think this over-complicates a database table while losing most of the features a database can give you.

This is not a new concept, however. I stumbled upon it two years ago when someone was promoting a "Vault" architecture, where you use a single notion.so table to store all your data. You create views from this data to separate topics. You're then able to "centralize" all your Notion stuff in a single file, while still being able to link any two or more topics together.

What hit me is that I could export the Notion table to CSV and feed it into an AI pipeline that might be able to predict my tasks better (like code). The only problem was, a couple of months into this the Notion interface became completely unusable.

This can be done with a regular database, though the views/interfaces to interact with it are not that easy to create. I didn't find an alternative (I tried Airtable too).


One big text file (OBTF) for the win? I'm curious about the pros and cons people think of.

https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....


plain text has so many advantages.

then you need some syntax for the strictures of your use case.

and /etc is reborn.


Then you want to be able to access old copies, and RCS [0] is reborn...

[0] https://en.wikipedia.org/wiki/Revision_Control_System


It is surprisingly common for very good bug bounty hunters to rely on stuff.txt as their major "knowledge base". At least I've heard this from a couple of high earning guys in interviews. They usually just grep through it or roughly remember where things are. I was quite surprised to hear that.


(Excuse me if this is obvious, I have limited time, but this article grabbed my attention; Fascinating).

How do you handle writes? It seems like an interrupted write process could corrupt a section of text, which could be difficult to recover from.

Given the example at "breckyunits.com", I don't see hashing information associated with each item.

Are you depending on git to prevent such errors from corrupting individual items? If so, I would be concerned about git's propensity for data corruption [1, 2, 3, 4].

I wonder if adding some ZFS-like hashing and integrity checks would be helpful. Then, as it's one big file, it seems to act like a TAR archive [5], where you append to the end but have to scan through the previous content to find what you want. If that's the case, then it may be viable to do copy-on-write [6], where information is never modified but instead referenced by a key, and later modifications supersede older versions.

(Again apologies if this is redundant, I just had the thought and had to get it down. XD)

[1] https://superuser.com/questions/1253830/does-git-prevent-dat...
[2] https://superuser.com/questions/1635797/what-if-git-reposito...
[3] https://stackoverflow.com/questions/tagged/corruption?tab=Fr...
[4] https://www.reddit.com/r/git/comments/oq9wph/power_outage_in...
[5] https://en.wikipedia.org/wiki/Tar_(computing)
[6] https://en.wikipedia.org/wiki/Copy-on-write


Educate me if this is not reinventing the wheel of YAML, TOML, XML, etc.


One would rather say that YAML, TOML, XML, etc. are wheel reinventions of plain text files.


Too many people don't know about Wikidata.


Can you elaborate?


Glad recutils got a mention. I find it superior to the proposed concept. Sadly, recutils never caught on.


This reminds me of Org Mode[1]

https://orgmode.org


this is solved better by obsidian


Indeed. Markdown files separated into folders. I organize them into topics. Easy to search, with a lot of possible customization. And even the setup without customizations is optically pleasing and functional.


Can you explain why?


Obsidian is a bit closer to an unstructured TreeBase than this single file TreeBase imo.


I did this for decades (using Emacs) but finally gave up and am using Notes.


Interestingly, I am moving more towards plain text notes because they are easier to ingest for LLMs.


Any useful tricks or techniques you picked up along the way?


There are two hard problems in computer science: namespacing and caching.

This... is namespace hell. And if you squint at the caching problem, it's actually an indexing problem, which is also related to this.


The aphorism typically says cache invalidation is hard. Not because you don't know what index to invalidate but because it's hard to invalidate the thing at the right time.

Caching itself is quite easy, just ask the designers of speculative execution at Intel :)


If a cache doesn't have cache invalidation, it isn't a cache, it's a database.


I like the blog site, it's using the Scroll language, cool!


What I want to know is: what is the maximum size of a PHP file that can be loaded?

I guess I can TIAS but is it documented anywhere?



I know what my rabbit hole of the day will be now.

Thanks! (https://github.com/breck7/pldb/commit/83ba14454ed80fa682c85d...)


I have no idea what the visualization means.


See also - Ask HN: How do you store your knowledge? - https://news.ycombinator.com/item?id=40131689


Is this a serious article? The state of the art of knowledge?

There are at least two (2) existing prior-art implementations that have done this for years, only better (as in, with better tools):

- recutils: https://www.gnu.org/software/recutils/

- ndb: https://9fans.github.io/plan9port/man/man7/ndb.html

Please, developers everywhere, I beg you: please learn what came before you before reinventing the wheel, only triangular this time. Please take the time to appreciate that if it's so obvious, maybe it's because of your ignorance and not your genius.


Maybe the author just independently came up with this, thought it was cool, and wanted to share?

I don't know if "please do a thorough literature review before showing me things" is the right sentiment here.


Considering the article has a “prior art” section, I assume a literature review would be appropriate.

My confidence is shaken considering the sparse “prior art” section links to Apple M1 as an example of “fast file systems”.


It wasn't clear why I mentioned M1. I updated that. Thank you.

https://github.com/breck7/breckyunits.com/commit/61792237c0b...

A number of things have to be fast for this system to be enjoyable to use (at scale), and before the M1 no personal machine I ever tried came close.


Yeah, this is really weird.


> Maybe the author just independently came up with this, thought it was cool, and wanted to share?

That's perfectly fine, but that's beside the point.

The point is that between coming up with something and implementing it, there should be a step to check if anyone already did something similar.

The whole point of researching prior work is to a) avoid wasting time reinventing the wheel, b) leverage prior work to improve your own ideas, and c) make better use of your time by making meaningful contributions instead of taking a risk on whether you're ripping off someone else's work.

That's the absolute basic standard in scientific publishing, for example. If you pick up any paper at all, you'll notice that right after the introduction and summary you get a bibliographical review listing any relevant work that your peers have already contributed. When anyone submits a paper, the reviewers can and outright do reject the submission if it fails to adequately contextualize the paper with regard to prior art and related work. One of the points is to ensure the author is not wasting everyone's time with a novel approach to the wheel.

More importantly, if an author fails to know what's already there, how can they tell their idea is any good?


I'm not sure the paper-like presentation of the article shows that the author was in pure discovery mode, eager to share something new and interesting.

My message is an echo to earlier comments of earlier posts that talked about a similar point: nothing is ever new, everything has already been done before. If we tell ourselves we're engineers, we should be studying what came before in order to prove that the new thing is indeed better.

That being said, recutils is the standard method of recording data in GNU, and ndb is the standard method of configuring stuff in Plan 9, a system that any proponent of UNIX mindset should know about. I'm not exactly talking about obscure stuff here.


> I'm not sure the paper-like presentation of the article shows that the author was in pure discovery mode, eager to share something new and interesting.

If the author was following a paper-like presentation, the author somehow skipped the section listing relevant prior work. This is something every single journal enforces, as researching prior work is the very first step any author does when they come up with something.


> Maybe the author just independently came up with this, thought it was cool, and wanted to share?

Except that the title ("A New Way to Store Knowledge") is leaning heavily on NEW.


Thank you for bringing up recutils and ndb. I had seen them years ago but didn't make the connection when writing this paper. But there are some great connections, and I will definitely be updating the post with a section and links to them.

I am reading through the source and will have more to say soon. If anyone has any links to massive plain text datasets based on these (or other similar tools), I would appreciate more pointers.

I can tell you now (subject to change), based on my preliminary read through the source, that the two systems you mentioned missed some highly important details that I have presented in my paper, with order-of-magnitude impacts. Not to discredit them at all; rather, I think my work gives them credit, in that they were on the right track, and we just have the benefit of some recent innovations and get to stand on their shoulders (and the shoulders of others).

Edit:

I have updated the paper with a reference to Recutils. Thanks rakoo! https://github.com/breck7/breckyunits.com/commit/71b706d296e...

The added text:

    GNU Recutils^recutils deserves credit as the closest precursor to our system. If Recutils were to adopt some designs from our system it would be capable of supporting larger databases.
     https://www.gnu.org/software/recutils/

    ^recutils: GNU Recutils: Jose E. Marchesi
     https://www.gnu.org/software/recutils/
    - Recutils and our system have debatable syntactic differences, but our system solves a few clear problems described in the Recutils docs:
     - "difficult to manage hierarchies". Hierarchies are painless in our system through nested parsers, parser inheritance, parser mixins, and nested measurements.
     - "tedious to manually encode...several lines". No encoding is needed in our system thanks to the indentation trick.
     - In Recutils comments are "completely ignored by processing tools and can only be seen by looking at the recfile itself". Our system supports first class comments which are bound to measurements using the indentation trick.
     - "It is difficult to manually maintain the integrity of data stored in the data base." In our system advanced parsers provide unlimited capabilities for maintaining data integrity.


If your research has added value over the existing state of the art, I would love to read more about it!


Thanks to your pointer, I was able to explain a bit more about the advances over the SOTA. Thank you! This is the speed at which peer review should happen.

If I'm lucky, I'll wake up tomorrow to someone else pointing out another precursor I overlooked.


Thanks for the comparison! I see there are some advances compared to recutils; those would benefit from being put front and center!


The title, "A New Way to Store Knowledge", indicates this is a joke.


The moment I read the text I knew the title was satirical.

You know it is when it starts like this: "...All tabular knowledge can be stored in a single long plain text file. The only syntax characters needed are spaces and newlines."

That's fundamentally the simplest way of storing text. And it's nothing new, yet people have long ignored that simplicity in favor of much more complicated ways of storing text.


I suspect it refers to Wolfram's "A New Kind of Science".

I don't see it as a this-is-all-a-joke thing though, more tongue in cheek.

also I think one-big-text-file has a certain simplicity, like everything-is-a-file on Unix (or, more properly, Plan 9)


Is there some context you're leaving unsaid?


a plain text file is the oldest idea for storing knowledge. see unix philosophy: "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."


Did you read past the title? The main point of the article is a syntax for knowledge bases - plain text is just an implementation detail.


If you take out plain text from this presentation, what's left? The tree structure? The log aspect? In order to claim any of this is remotely novel, you have to first ignore the whole body of work built around information systems.


Maybe you missed the link in the "Evidence" section to a 7-year-old open source project containing 172,162 lines of code, and a compiler-compiler.

;)


That doesn't answer my question.


> If you take out plain text from this presentation, what's left? The tree structure? The log aspect? In order to claim any of this is remotely novel, you have to first ignore the whole body of work built around information systems.

Thank you for the feedback. I've updated the paper with some more links.

The language in which the measures are written (currently called Grammar; I will likely rename it to something like Parsers) is quite advanced.

The improvements over Recutils, the closest precursor I am aware of, have now been added.

The PLDB ScrollSet is now about 500,000 cells of information. Each cell is strongly typed and fully auditable via git. There is a high amount of signal in that dataset. It is an intelligent set of weights, and it is continually getting more intelligent. And it is read at runtime as a single plain text file and compiled to a single CSV (or TSV, JSON, etc.).

All from using the system documented in the paper (and the advanced language for Parsers).

If you can point me to a similar database of similar scale anywhere in the world (plain text based, >10e5 size, git backed, strongly typed, hierarchical and graphical), I would be grateful, as I might learn something.


It must be, right? The whole thing reads like a satire of the exact kind of thing HN would fawn over. Just look at the current comments!


I'm not sure myself. I didn't want this to be the second comment on the submission so I'll say it now: I got TimeCube vibes from this.


I hope so...


This gives off a TimeCube vibe.


Facts



