
I’m looking forward to the post mentioned towards the end, detailing the cutting of features. I’ve been following the work on this for ages and was looking forward to typed data as a way to guard against accidental mistakes, so I’m keen to read the author’s reasoning for switching to strings and arrays of strings. I’m sure there’s a non-trivial reason; his work has always felt careful, measured and well thought out to me.

Thanks for the support! Since I may not get to those blog posts in a timely fashion, here's a little outline (and maybe this comment will form the seed of them).


I should be more specific: I am ruling out reusing Python data structures, but I'm not ruling out typed data forever. I'm leaving space for types.

Though given all the work I see in the next year, I don't see a ton of work on them happening. If they happen, they won't literally be Python types, because those don't seem like a great fit.

Background: A major motivation for using Python was that I wanted JSON in shell. JSON is a "versionless" textual interchange format that's ubiquitous on the web, and I see the web as an extension of Unix.

And there's an obvious correspondence between Python data structures and JSON: dicts, lists, string, float/int, bool.

But those structures are still a "model", and I've had many programming experiences that led me to believe that "bytes on disk/network -> data structure in memory -> bytes" is something of a fallacy.

Or at least it's a coarse, limited model, and not very shell-like. (For the purposes of a blog post I should find a name for that fallacy.)
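
As a minimal sketch of why that round trip is not an identity, here is Python's stdlib json module applied to a made-up input (the string below is hypothetical, chosen only to show the effect). The parsed value is the expected dict, but serializing it again does not give back the original text:

```python
import json

# Hypothetical input: valid JSON with a duplicate key, an exponent-style
# number, and hand-chosen spacing.
original = '{"a": 1, "a": 2,   "n": 1e2}'

model = json.loads(original)        # text -> dict (the last duplicate key wins)
round_tripped = json.dumps(model)   # dict -> text

print(type(model).__name__)         # dict
print(round_tripped)                # {"a": 2, "n": 100.0}
print(round_tripped == original)    # False
```

The first "a", the original spacing, and the spelling of the number are all gone, so the in-memory model is a lossy view of the text rather than an equivalent of it.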

It very much relates to these comments:

- https://lobste.rs/s/uizrgy/making_light_website_02_design#c_... -- a different (old) way of HTML processing I encountered and used that's not (text -> DOM -> text)

- https://news.ycombinator.com/item?id=22156950 -- Making the analogy to the flawed Python 3 string type, which is (bytes -> code points -> bytes)

- https://news.ycombinator.com/item?id=22111403 -- Kragen's "memory models" post got me thinking about these things.
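
One way to see the (bytes -> code points -> bytes) problem concretely is the sketch below. It assumes a made-up file name that is legal on Unix but is not valid UTF-8, and it uses only Python 3 built-ins:

```python
# Hypothetical file name as raw bytes: fine for Unix, but not valid UTF-8.
raw = b"report-\xff.txt"

# A strict decode refuses the input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode succeeds, but the round trip changes the bytes.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Python's escape hatch: smuggle the bad byte through as a lone surrogate.
preserved = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)

print(strict_ok)         # False
print(lossy == raw)      # False
print(preserved == raw)  # True
```

The round trip is only lossless when every step opts into surrogateescape, which is a sign that the code point model doesn't quite fit the data a shell actually sees.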

Another way to say it would be: instead of "dict, list, string" as the model, I think a better model for a shell is "JSON, CSV/TSV, and HTML". In other words: (1) tree shaped data, (2) table shaped data, and (3) documents. [1]

That is, the shell would deal with these formats in a more literal and concrete fashion -- NOT through high-level abstractions for them. Maybe another slogan could be "Bijections that aren't really bijections".

However I'm not promising to get to this in the next year :)


But practically speaking, a lot of programming is spent shuffling between representations. And a lot of the time that's not only inefficient, but also incorrect. It relates strongly to this post:

How to Quickly and Correctly Generate a Git Log in HTML


If you want a buzzword I would say shell is a language where you deal with "concretions" -- that is, you deal with reality rather than abstractions. But a crucial point is that a language can still help you avoid mistakes when programming concretely -- in a large part with principled parsing and quoting/dequoting.

You're not going to be "groveling through backslashes and braces one at a time", as I call bash's style. The language can provide sharper tools than (JSON -> dict/list -> JSON) or (HTML -> DOM -> HTML).
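
To make "principled quoting" a bit more concrete, here is a rough Python sketch (not the method from the post above, and not anything in the shell itself). It emits an HTML table row directly as text and escapes each field at the boundary, with no DOM in between. The function name `commit_row` and the sample commit are made up for illustration:

```python
import html

def commit_row(sha, subject):
    """Format one table row, escaping each field as it enters the HTML."""
    return "<tr><td>{}</td><td>{}</td></tr>".format(
        html.escape(sha), html.escape(subject)
    )

# Hypothetical commit whose subject contains HTML metacharacters.
subject = 'Fix <b> & "quotes" in parser'
row = commit_row("a1b2c3d", subject)

print(row)
# <tr><td>a1b2c3d</td><td>Fix &lt;b&gt; &amp; &quot;quotes&quot; in parser</td></tr>

# Dequoting is the inverse step at the other boundary.
print(html.unescape(html.escape(subject)) == subject)  # True
```

The data stays a string the whole way through; correctness comes from quoting at the one place where text crosses into HTML.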


A related thought is that I noticed that procs and funcs don't really compose (procs being shell-like functions, and funcs being Python/JS-like functions).

As an analogy: If you've ever written a Python/C extension or a Node/C++ extension, you will have seen that those models compose pretty poorly too. Despite the similar syntax, there's a surprising amount of semantic friction (types and GC being the two main sources).

So adding typed data and Python-like functions was almost tantamount to adding a "separate" language within shell itself. The language loses its coherence when the primitives don't compose. There can be too much shuffling.


So that's a bunch of thoughts that may go into a blog post. Hopefully at least a few parts made sense :)

[1] This reminds me that there's a paper, "Unifying tables, objects, and documents", that I think influenced C#? I should go re-read it. An important point is that objects are often a poor fit for representing tables and documents (e.g. ORM problem, awkward DOMs), and tables and documents are more prevalent in the wild!

You can deal with tables and documents "directly", rather than awkwardly manipulating object/tree representations of them. CSS selectors are a way of doing this, i.e. you're not "shuffling objects". A comparison I would draw is 2004-style DOM API vs. jQuery / HTML5 DOM.

