Hacker News
Cicada – Unix shell written in Rust (github.com)
244 points by mitnk on July 1, 2017 | 166 comments



Amazing work! Between this, Alacritty [1], and coreutils [2], we're getting pretty close to a plausible all-Rust CLI stack.

While Cicada is pretty clearly modeled on the "old" generation of shells (sh, Bash, Zsh, etc.), one has to wonder what a more modern shell might look like that addresses some of the problems of its predecessors, like pipes that are essentially text-only, poor composability (see `cut` or `xargs`), and terribly obscure syntax.

PowerShell seems to have been a flop outside of the Windows ecosystem, and between overly elaborate syntax and COM baggage, was probably rightfully passed over, but it probably had the right idea. Imagine a future where we're piping structured objects between programs, scripting in a modern language, and having IntelliSense-quality auto-completion for every command.

---

[1] https://github.com/jwilm/alacritty

[2] https://github.com/uutils/coreutils


My structured objects aren't your structured objects. In that use case, why not use a dedicated program (like, say, a Python/JS/Clojure interpreter) to handle more complex pipes and keep the shell-level primitives... Well, primitive.

Text is compatible with all systems, past and present, and can be used to model more complex objects. Let's not add features to the core on a "why not" please.


The so-called "plain text formats" are also just representations of complex objects. Often these formats are ad-hoc and hard to parse correctly. Multiple text files in a file system hierarchy (e.g. /etc) are also just nested structures.

So in principle, treating those as complex object structures is the right way to go. Also, I believe this is the idea behind D-Bus and similar modern Unix developments.

However, what's hard is providing good tooling with a syntax that is simple to understand:

* Windows Registry and PowerShell show how not to do it.

* The XML toolchain demonstrates that any unnecessary complexity in the meta-structure will haunt you through every bit of the toolchain (querying, validation/schema, etc.).

* "jq" goes somewhat into the right direction for JSON files, but still hasn't found wide adoption.

This appears to be a really hard design issue. Also, while deep hierarchies are easier for scripts to process, a human overview is mostly achieved by advanced searching and tags rather than hierarchies.

A shell needs to accommodate both, but maybe we really just need better command line tools for simple hierarchical processing of arbitrary text file formats (ad-hoc configs, CSV, INI, JSON, YAML, XML, etc.).


I lean towards wanting all my tools to support at least one common structured format (and JSON generally would seem the best supported) these days. It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps. If you did that, then e.g. a "structured" shell could just default to turning that on, try detecting JSON on output, and offer additional functionality accordingly.

That'd be a tolerable graduated approach where you get benefits even if not every tool supports it.
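A minimal sketch of that graduated approach, assuming a hypothetical `OUTPUT_FORMAT` environment variable (no such standard exists; the name is made up here). A tool emits one compact JSON object per line when the variable is `json`, and falls back to its usual whitespace-separated text otherwise, so tools that ignore the variable keep working:

```python
import json
import os


def emit(records, env=None):
    """Render records as text or JSON Lines depending on a hypothetical env var.

    OUTPUT_FORMAT is an illustrative name, not an existing convention.
    """
    env = os.environ if env is None else env
    if env.get("OUTPUT_FORMAT") == "json":
        # One compact JSON object per line, so line-based tools still work.
        return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    # Default: the traditional human-oriented, whitespace-separated text.
    return "\n".join(" ".join(str(v) for v in r.values()) for r in records)


if __name__ == "__main__":
    files = [{"name": "a.txt", "size": 12}, {"name": "b.txt", "size": 7}]
    print(emit(files))
```

A "structured" shell would then simply export the variable for every child process it starts.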

With "jq", I'm already increasingly leaning on JSON for this type of role.

It'd need to be very non-invasive, though, or all kinds of things are likely to break.


I wish there was a way for pipes to carry metadata about the format of the data being passed over the pipe, like a MIME type (plain text, CSV, JSON, XML, etc). Ideally with some way for the two ends of the pipe to negotiate with each other about what types each end supports. I think that sort of out-of-band communication would most likely need some kernel support.

Maybe an IOCTL to put a pipe into "packet mode" in which, rather than just reading/writing raw bytes, you send/receive Type-Length-Value packets... If you put your pipe in packet mode, and the other end then reads or writes without enabling packet mode, the kernel can send a control message to you saying "other end doesn't support packet mode", and then you can send/receive data in your default format. Whereas, if both ends enter packet mode before reading/writing, then the kernel sends a control message to one end saying "packet mode negotiated", which triggers the two ends to exchange further control messages to negotiate a data format before actually sending the data. (This implies pipes must be made bidirectional, at least for control packets.)
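The packet-mode ioctl is hypothetical, but the Type-Length-Value framing it would rely on is easy to sketch in user space. This assumes a made-up layout (a 1-byte type tag followed by a 4-byte big-endian length) and illustrative type codes; it is not an existing kernel or library format:

```python
import struct

# Hypothetical type tags for control and data packets (illustrative only).
CONTROL, DATA = 0x01, 0x02
HEADER = struct.Struct(">BI")  # 1-byte type, 4-byte big-endian length


def encode(ptype, payload):
    """Frame one packet as type + length + value."""
    return HEADER.pack(ptype, len(payload)) + payload


def decode(stream):
    """Split a byte string into a list of (type, payload) packets."""
    packets, offset = [], 0
    while offset < len(stream):
        ptype, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated packet")
        packets.append((ptype, payload))
        offset += length
    return packets


if __name__ == "__main__":
    # e.g. announce a format during negotiation, then send data in it
    wire = encode(CONTROL, b"accept: application/json") + encode(DATA, b'{"n":1}')
    print(decode(wire))
```

Because every packet carries its own length, the reader never has to scan the payload for delimiters, which is what makes mixing control and data messages on one channel workable.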


As a counterpoint, if only for the sake of my wetware memory, I'd rather have a few pipes that are as agnostic and generally capable as possible than complex plumbing.

I don't mean to argue there is no value in specialised adapters, but I believe the default should be as general as can be. Let me worry about parsing/formats at the application level and give me simple underlying pipes. If I need something specific, I should be prepared to dig into docs and find out what I need anyway, so defaulting to text, versus installing or explicitly configuring your system to use something else, seems like a sane separation of features and complexity in the general case.

EDIT: a good quote from another thread to illustrate my point:

> Removing the responsibility for reliable communication from the packet transport mechanism allows us to tailor reliability to the application and to place error recovery where it will do the most good. This policy becomes more important as Ethernets are interconnected in a hierarchy of networks through which packets must travel farther and suffer greater risks.

Replace Ethernet with pipes and the point still has merit, IMO. Lifted from https://news.ycombinator.com/item?id=14675115


Programs can already determine whether their output is a TTY and enable, e.g., colored output accordingly; this sounds like something similar.
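The usual check is `isatty(3)` on the output file descriptor; Python exposes it as the `isatty()` method on file objects. A minimal sketch of the "color only when writing to a terminal" pattern (the `colorize` helper is illustrative):

```python
import sys

RED, RESET = "\x1b[31m", "\x1b[0m"  # ANSI escape sequences


def colorize(text, stream=sys.stdout):
    """Wrap text in ANSI color codes only if stream is an interactive terminal."""
    if stream.isatty():
        return f"{RED}{text}{RESET}"
    # Piped or redirected output: emit plain text so downstream tools see no escapes.
    return text


if __name__ == "__main__":
    print(colorize("error: something failed"))
```

Run directly in a terminal, the message comes out red; piped through `cat`, it comes out plain.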


This is called a "Unix domain socket"; they're 20+ years old.
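Unix domain sockets don't negotiate data formats by themselves, but unlike a pipe they offer message-oriented socket types (`SOCK_DGRAM`, and on Linux also `SOCK_SEQPACKET`), which covers the "packet" half of the packet-mode idea. A minimal sketch using `socket.socketpair` (Unix-only; an illustration, not the kernel-level negotiation proposed upthread):

```python
import socket


def roundtrip(messages):
    """Send each message as one datagram and read them back one by one."""
    # Everything is sent before anything is read, so keep the list short
    # enough to fit in the socket's receive queue.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for msg in messages:
            left.send(msg)
        # Each recv() returns exactly one datagram: boundaries are preserved,
        # unlike a byte-stream pipe where writes can merge or split.
        return [right.recv(65536) for _ in messages]
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(roundtrip([b"type=application/json", b'{"n": 1}']))
```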


It's an innovative and interesting idea. But just to play devil's advocate for a minute, for this to get widespread uptake, the way it is implemented would have to be agreed on as being the right way, by a lot of people (i.e. users). And there could be lots of different ways of implementing what you describe, with variations, which makes it more difficult (though not impossible) to get accepted widely, compared to the existing plain pipes and passing data as text around between commands, because not much variation is possible there.


I'd prefer a "crash if the receiver doesn't support the sender's MIME type" model for debuggability.

Which could still be ergonomic if all the common piping programs supported some specific collection of formats and there was a relatively robust format-converting program.
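A minimal sketch of that "crash on mismatch" model, assuming each stream arrives with a declared MIME-style type (how the type would be transported is left open; `application/jsonl` is an illustrative label, not a registered type). The receiver raises immediately on an unsupported type, and a small converter registry plays the role of the format-converting program; only CSV to JSON Lines is implemented:

```python
import csv
import io
import json

SUPPORTED = {"application/jsonl"}  # what this hypothetical receiver accepts


def csv_to_jsonl(text):
    """Convert CSV with a header row into JSON Lines (values stay strings)."""
    rows = csv.DictReader(io.StringIO(text))
    return "\n".join(json.dumps(row) for row in rows)


CONVERTERS = {("text/csv", "application/jsonl"): csv_to_jsonl}


def receive(mime, payload):
    """Fail fast if the sender's declared type is not supported."""
    if mime not in SUPPORTED:
        raise TypeError(f"unsupported input type {mime!r}")
    return [json.loads(line) for line in payload.splitlines() if line]


def convert(mime_from, mime_to, payload):
    """Explicit conversion step that the user places in the pipeline."""
    converter = CONVERTERS.get((mime_from, mime_to))
    if converter is None:
        raise TypeError(f"no converter from {mime_from} to {mime_to}")
    return converter(payload)


if __name__ == "__main__":
    data = "name,size\na.txt,12\n"
    print(receive("application/jsonl", convert("text/csv", "application/jsonl", data)))
```

The error surfaces at the first stage that gets the wrong type, instead of as garbled output several stages later.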


> It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps.

Look at FreeBSD's libxo. It's supported by most of the base system.


It looks like libxo does the output in structured format, but what about input? (For chaining commands together with pipes.) Did the FreeBSD porting work[1] implement the input side with libxo, independently of libxo, or not at all?

(Not to diminish libxo, which looks pretty cool and which I didn't know about before. Just curious.)

[1]: https://wiki.freebsd.org/LibXo


On the input side, FreeBSD prefers libucl https://github.com/vstakhov/libucl


I don't think input meant configuration file syntax here, but actually feeding structured data to a command, e.g.

    echo "$somejson_jobspec" | ifconfig

instead of

    ifconfig em0 inet 192.0.2.1/24 alias
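To make the input side concrete, a minimal sketch assuming a hypothetical JSON job spec with `interface`, `family`, `address` and `alias` keys (the schema is made up for illustration). It only translates the spec into the equivalent argument vector rather than configuring anything:

```python
import json


def jobspec_to_argv(spec_text):
    """Translate a hypothetical JSON job spec into an ifconfig argv list."""
    spec = json.loads(spec_text)
    argv = ["ifconfig", spec["interface"], spec.get("family", "inet"), spec["address"]]
    if spec.get("alias"):
        argv.append("alias")
    return argv


if __name__ == "__main__":
    spec = '{"interface": "em0", "address": "192.0.2.1/24", "alias": true}'
    print(jobspec_to_argv(spec))
```

A real tool would read the spec from stdin (e.g. `json.load(sys.stdin)`) so it could sit at the end of a pipe.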


That looks very interesting, though I actually think that standardising on the command-line/env variable API is more important than an implementation. It's the moment you can "automagically" "upgrade" a pipe to contain structured data for apps that support it that you get the most value.

But given that it's there, I'll certainly consider it when writing tools, and consider supporting the command-line option/env var even if/when I don't use the implementation...


Amazing! I did not know that.


Nice idea! Maybe this issue is similar to that of command completion, but could also result in a similar mess. Every command has its "-h" or "--help", showing the syntax, available options and so on. But since even that is not really standardized, separate command-line completion rules are written for every command. And for every shell.


Define a standard format for describing command syntax. I'd suggest basing it on JSON, but you could use XML or Protobuf or Thrift or ASN.1 or Sexprs or whatever. Embed in executables a section containing the command syntax description object. Now when I type in a command, the shell finds the executable on the path, opens it, locates the command description syntax section (if present), and if found uses that to provide tab completion, online help, smarter syntax checking, whatever. If you've ever used OS/400, it has a feature where it constructs a fill-in-form dialog based on the command syntax descriptions, which you can call up at the press of a function key. (Obviously the shell should cache this data; it shouldn't reread /bin/ls every time I type "ls".)
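A minimal sketch of how a shell might use such a description for tab completion, assuming a made-up JSON schema (a `name` plus a list of `options`, each with a `flag` and `help`). Nothing here reflects an existing standard, and locating the section inside an executable is left out:

```python
import json

# Hypothetical embedded description for an ls-like command (illustrative schema).
LS_SYNTAX = json.loads("""
{
  "name": "ls",
  "options": [
    {"flag": "-a", "help": "include entries starting with ."},
    {"flag": "-l", "help": "use a long listing format"},
    {"flag": "--all", "help": "same as -a"},
    {"flag": "--color", "help": "colorize the output"}
  ]
}
""")


def complete(syntax, prefix):
    """Return (flag, help) pairs whose flag starts with the typed prefix."""
    return [(o["flag"], o["help"]) for o in syntax["options"] if o["flag"].startswith(prefix)]


if __name__ == "__main__":
    for flag, help_text in complete(LS_SYNTAX, "--"):
        print(f"{flag:10} {help_text}")
```

The same object could drive online help or a fill-in form, since it is data rather than a per-shell completion script.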


There isn't a 1:1 mapping of binaries to documentation or completion scripts. That idea won't work out.

"Defining a standard format" hasn't been a successful practice in the Unix world. Many standards consist essentially of the bare minimum that everybody can agree with.

Unix command-line interfaces are already structured (as a list of strings), and more structure would be hard to support at the binary level. There are too many possibilities of doing that, and most CLI programs wouldn't even need that.


If the binary is using one of a few common libraries, e.g. GNU getopt, to read its command-line parameters, it should be possible to extract somewhat useful completions automatically.

Then you are essentially relying on the de-facto standard of not rolling your own argument parsing, as opposed to some product of a standards committee.
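Python's `getopt` module uses a spec modeled on C `getopt`: a short-option string where a trailing `:` marks an option that takes an argument, plus (its own convention) a list of long names where a trailing `=` does the same. Assuming a completer could recover those specs from a program (the genuinely hard part, which this sketch skips), generating candidates is mechanical:

```python
import getopt


def candidates(shortopts, longopts, prefix):
    """List option strings matching prefix, given getopt-style specs."""
    shorts = ["-" + c for c in shortopts if c != ":"]
    longs = ["--" + name.rstrip("=") for name in longopts]
    return [opt for opt in shorts + longs if opt.startswith(prefix)]


if __name__ == "__main__":
    SHORT, LONG = "vo:", ["verbose", "output="]
    print(candidates(SHORT, LONG, "--"))
    # The same specs drive real parsing, so completion and parsing agree.
    print(getopt.getopt(["-v", "--output", "out.txt", "file"], SHORT, LONG))
```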



Getopt is just barely flexible enough to be applicable to many projects while still encoding some conventions (dash-prefixed options and optional arguments).

Still, it's too opinionated to be applicable to all programs. And it's not nearly structured enough to support automated completions.

Every programmer goes through this phase of trying to abstract what can't be abstracted. Essentially for non-trivial programs you need to encode what you want yourself. You can't write a library that is suitable for the infinitely many thinkable programs.

By the way, even handcoded completions are, often enough, more confusing than helpful to me (and not because they are handcoded). I've disabled them entirely.


> By the way, even handcoded completions are, often enough, more confusing than helpful to me (and not because they are handcoded). I've disabled them entirely.

Huh? Can you give some examples? I'm on zsh, and except for broken auto-generated shell completions like those provided by ripgrep, I've found zsh completions to be incredibly competent. Much more competent than myself, usually.


I only know bash, but I doubt zsh can be so much better (yes, I know, they have colorful suggestions and better interactivity for selection).

The context sensitivity is just very confusing. Sometimes completion hangs, maybe because an SSH connection is made in the background, or the completion is just not very efficient.

Then sometimes I have a typo and get the weirdest suggestions, and it's much harder to track down the error statically than to just run the program and get a nice error message.

Then sometimes I am more intimate with the complexities of the command-line invocation than the completion script can appreciate, and get no completion, when all I need is just a stupid local file completion. This results in the (untrue) assumption a file wasn't there, which easily leads to further confusion.

No, thanks. I know how to use the tools that I use. Dear completion, just help me with the files on my disk (which is the fast-changing variable in the game) and let me type commands as I see fit.


> I know how to use the tools that I use.

Interesting. I use too many different tools too rarely to remember all their options, so being able to type -, hit tab and see a list of possibilities with their descriptions is simply more frictionless than having to pull up the command's man page and search through it. The cases where the suggestions don't work don't bother me, because it just means I don't save time compared to no completions at all.


This is actually a very interesting problem. The format would be an IDL, since you're describing a "struct" of options, array-valued fields, and "typed" fields that are constrained to certain grammars (numbers, &c.).


This sort of exists. The command syntax format is the standard man page "Usage" and "Options" sections, and an implementation is docopt (http://docopt.org/)


I still miss AmigaOS, where command options parsing was largely standardised ca. 1987 via ReadArgs() (barring the occasional poor/direct port from other OS's) [1]. It wasn't perfect (it only provided a very basic usage summary), but it was typed, and it was there.

[1] http://www.pjhutchison.org/tutorial/cmdline_arrgs.html


If it was only that, what about libraries for handling all kinds of file formats via the OS plugin system?


There are lots of things I miss. Datatypes is definitively one of them. AREXX ports everywhere (I hate the language, but you don't need the language to make use of AREXX ports; Dbus is like what you get if you take AREXX and give it to a committee; the beauty of AREXX was the sheer simplicity that made it so trivial to add support for it to "everything" to the extent that many tools built their central message loop around checking for AREXX commands and built their application around dispatching AREXX like commands between different parts of the app).

Assigns is another one (for the non-Amiga aware that stumble on this, on the Amiga, I'd refer to my files with "Home:", but unlike on a Unix/Linux, Home: is not an environment variable that needs to be interpreted like $HOME or something that needs to be expanded like "~" - it is recognised by the OS as a valid filesystem path, assigned at runtime; similarly, instead of having a $PATH that needs to be interpreted, all my binaries would be accessible via "C:" for "command" - "C:" is an assign made up of multiple actual directories, one of which would typically be Sys:C, where Sys: is the System volume, and Sys: itself is an assign pointing to whatever drive you booted; Assigns are like dynamic symlinks with multiple targets, or like $PATH's interpreted by the OS)


jq is awesome. "json lines" data format (where each record is json without literal newlines, and newline separates records) nicely upgrades unix pipes to structured output, so you can mix old unix tools, like head, tail etc with jq. maybe what's needed is some set of simple json wrappers around core utils to increase adoption.


> "json lines" data format (where each record is json without literal newlines, and newline separates records)

Good point, but jq handles this already. If the payload is an array of objects, simply use `jq -c '.[]'` to get one object per line.
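For comparison, roughly what `jq -c '.[]'` does, sketched in Python: take a JSON array and write each element as compact JSON on its own line. `json.dumps` escapes newlines inside strings, so every record stays on one line (the output may not match jq's byte for byte):

```python
import json


def explode(array_text):
    """Yield one compact JSON line per element of a top-level JSON array."""
    for item in json.loads(array_text):
        yield json.dumps(item, separators=(",", ":"))


if __name__ == "__main__":
    for line in explode('[{"id": 1, "msg": "two\\nlines"}, {"id": 2}]'):
        print(line)
```

Reading `sys.stdin.read()` instead of the literal turns it into a pipe stage that head, tail, grep and friends can consume.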


jq can't handle big numbers; it silently mangles them, so beware! I've been bitten by this twice already.


That's per the spec[1], enjoyably:

    Note that when such software is used, numbers that are
    integers and are in the range [-(2**53)+1, (2**53)-1]
    are interoperable in the sense that implementations will
    agree exactly on their numeric values.
2^53 is only 9007199254740992, so it's not too hard to exceed that, particularly with things like Twitter status IDs.

The recommended use is to have big numbers as strings, since it's the only way to reliably pass them around. (Yes, this is kind of horrible.)

[1]: https://tools.ietf.org/html/rfc7159#section-6
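The failure is easy to reproduce: any parser that stores JSON numbers as IEEE 754 doubles (as JavaScript does) cannot represent every integer above 2^53. Python's `json` keeps integers exact by default, but its `parse_int` hook lets us mimic a double-only parser:

```python
import json

BIG = "9007199254740993"  # 2**53 + 1


def as_double(text):
    """Parse like a double-only implementation would: integers become floats."""
    return json.loads(text, parse_int=float)


if __name__ == "__main__":
    print(json.loads(BIG))      # exact: Python ints are arbitrary precision
    print(int(as_double(BIG)))  # rounded to the nearest representable double
    # The workaround from the spec discussion: carry big numbers as strings.
    print(json.loads('{"id": "%s"}' % BIG)["id"])
```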


I have to admit, I was not familiar with this limitation. Thanks for pointing it out.

And I'd go from "kind of horrible" to just "horrible".

Integral bit limits are an implementation detail that every language has a way to overcome - they shouldn't be built into an encoding spec.


Again, not saying this is a good thing, but it's an understandable consequence of the standardization partially involving "let's document how the browsers' existing JS implementations deal with running eval on a string". It's easier to write the standard so that existing implementations fit it, rather than requiring changes from everyone.


I imagine it depends on your aims. If your aim is just to document existing behavior, that's one thing. But if your aim is to create a new, simple data interchange format, documenting existing implementations is only going to limit its usefulness. It could have forced the current implementations to be fixed instead of letting them set the bar.

As pointed out upstream, we're dealing with a lot of data these days, and such limitations will only cause JSON to become marginalized or complicated with implementation-dependent workarounds, like integers in strings.


The point is that JSON started out as "data you can eval() in Javascript" pretty much. It gained the traction it did because it is literally a subset of javascript object notation so it was trivial to support.


By the way, is this where the "stringly typed" phrase originated, or are there earlier instances?


> A shell needs to accommodate both, but maybe we really just need better command line tools for simple hierarchical processing of arbitrary text file formats (ad-hoc configs, CSV, INI, JSON, YAML, XML, etc.).

Start with record-based streams first. Text streams require [buggy] parsing to be implemented everywhere. It should be possible to have escaped record formats that allow the right side of the pipe to use AWK-style $1, $2, $3, etc.
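A minimal sketch of such an escaped record format, assuming tab-separated fields and newline-terminated records with backslash escapes (`\t`, `\n`, `\\`) for those characters inside a field. This is one possible convention, not an existing standard; the reader exposes fields by position like AWK's `$1`, `$2`:

```python
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n"}
UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n"}


def write_record(fields):
    """Join fields with tabs, escaping tabs, newlines and backslashes."""
    return "\t".join("".join(ESCAPES.get(c, c) for c in f) for f in fields) + "\n"


def read_record(line):
    """Split one record back into fields, undoing the escapes."""
    if line.endswith("\n"):
        line = line[:-1]  # drop the record terminator
    fields, current = [], []
    chars = iter(line)
    for c in chars:
        if c == "\\":
            escaped = next(chars, None)
            if escaped not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            current.append(UNESCAPES[escaped])
        elif c == "\t":
            fields.append("".join(current))
            current = []
        else:
            current.append(c)
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    fields = read_record(write_record(["my file.txt", "has\ttab", "42"]))
    # Roughly awk '{print $1, $3}', but safe for fields containing spaces or tabs.
    print(fields[0], fields[2])
```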

After removing the need to parse the fields, the next priority, imo, would be to introduce integral types so the actual data itself doesn't need to be parsed. u32 on the LHS can just be 4 bytes and then read out on the RHS as 4 bytes. This could save a lot of overhead when processing large files.
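A minimal sketch of that idea using Python's `struct` module: the producer writes each `u32` as exactly 4 bytes and the consumer reads them back without any text parsing. Little-endian is an arbitrary choice here; both ends just have to agree on it:

```python
import struct

U32 = struct.Struct("<I")  # unsigned 32-bit, little-endian, no padding


def pack_u32s(values):
    """Encode integers as back-to-back 4-byte records."""
    return b"".join(U32.pack(v) for v in values)


def unpack_u32s(data):
    """Decode a buffer of 4-byte records; the length must be a multiple of 4."""
    if len(data) % U32.size:
        raise ValueError("trailing partial record")
    return [v for (v,) in U32.iter_unpack(data)]


if __name__ == "__main__":
    wire = pack_u32s([1, 256, 4294967295])
    print(len(wire), unpack_u32s(wire))  # 12 bytes, no delimiters or digits to parse
```

Fixed-width fields also make seeking trivial: record `i` starts at byte `4 * i`.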

Only then would I want to get into hierarchies, product types, sum types, etc.


This is dead on.

Half of all parsing work consists of splitting things into records, by lines, delimiters or whitespace. That's where the great escaping headache begins.

In a better shell the following command would Just Work™:

    find | rm %path/%filename


Powershell does two things right: it uses structured output, and it separates producing the data from rendering the data.

It also does one thing wrong: it uses objects (i.e. the stuff that carries behavior, not just state). This ties it to a particular object model, and the framework that supports that model.

What's really needed is something simple that's data-centric, like JSON, but with a complete toolchain to define schemas and perform transformations, like XML (but without the warts and overengineering).


> What's really needed is something simple that's data-centric, like JSON, but with a complete toolchain to define schemas and perform transformations, like XML (but without the warts and overengineering).

What would it be? Is there a real alternative to XML with those features out there? I don't think so.

If you wanted the features of XML and designed it from scratch, I'm quite sure it would end up with the complexity of XML again.

Implementing a given piece of functionality usually yields the same level of complexity regardless of how you implement it (provided none of the implementations is outright stupid, of course).


> If you wanted the features of XML and designed it from scratch, I'm quite sure it would end up with the complexity of XML again.

I don't think so. The problem with the XML stack is that it has been designed with some very "enterprisey" (for lack of a better term) scenarios in mind, stuff like SOAP. Consequently, it was all design by committee in the worst possible sense of the word, and it shows.

To see what I mean, take a look at XML Schema W3C specs. That's probably the worst part of it, so it should be readily apparent what I mean:

https://www.w3.org/TR/xmlschema11-1/ https://www.w3.org/TR/xmlschema11-2/

The other problem with XML is that it's rooted in SGML, and inherited a lot of its syntax and semantics, which were designed for a completely different use case - marking up documents. Consequently, the syntax is overly verbose, and some features are inconsistent for other scenarios - for example, if you use XML to describe structured data, when do you use attributes, and when do you use child elements? Don't forget that attributes are semantically unordered in XDM, while elements are ordered, but also that attributes cannot contain anything but scalar values and arrays thereof.

Oh, and then don't forget all the legacy stuff like DTD, which is mostly redundant in the face of XML Schema and XInclude, except it's still a required part of the spec.

I guess the TL;DR version of it is that XML today is kinda like Java - it was there for too long, including periods when our ideas of best practices were radically different, and all that was enshrined in the design, and then fossilized in the name of backwards compatibility.

One important takeaway from XML, and why it was so successful, IMO, is that having a coherent, tightly bound spec stack is a good thing. For example, with XML, when someone is talking about schemas, you can pretty much assume it's XML Schema by default (yes, there's also RELAX NG, but I think calling it a schema is a misnomer, because it doesn't delve much into the semantics of what it describes; it's more of a grammar definition language for XML). To transform XML, you use XSLT. To query it, you use XPath or XQuery (which is a strict superset of XPath). And so on. With JSON, there's no such certainty.

The other thing that the XML stack didn't quite see fully through, but showed that it could be a nice thing, is its homoiconicity: e.g. XML Schema and XSLT being XML. Less so with XPath and XQuery, but there they had at least defined a canonical XML representation for it, which gives you most of the same advantages. Unfortunately, with XML it was just as often a curse as it was a blessing, because of how verbose and sometimes awkward its syntax is - anyone who wrote large amounts of XSLT especially knows what I'm talking about. On the other hand, at least XML had comments, unlike JSON!

Hey, maybe that's actually the test case? A data representation language must be concise enough, powerful enough, and flexible enough to make it possible to use it to define its own schema and transformations, without it being a painful experience, while also being simple enough that a single person can write a parser for it in a reasonable amount of time.


> A data representation language must be concise enough, powerful enough, and flexible enough to make it possible to use it to define its own schema and transformations, without it being a painful experience, while also being simple enough that a single person can write a parser for it in a reasonable amount of time.

Seems like a standardized S-expression format would fit the bill.

You could even try to make it work with existing XML tools by specifying a way to generate XML SAX events from the S-expressions.


I'd prefer a format that distinguishes between sequences and associative arrays a bit more clearly. You can do that with S-exprs with some additional structure imposed on top, but then that gets more verbose than it has to be.

JSON is actually pretty decent, if only it had comments, richer data types, and some relaxed rules around syntax (e.g. allow trailing commas and unquoted keys).


Maybe something like EDN? https://github.com/edn-format/edn


The brilliant thing about the status quo is that it already does accommodate both line-based and structured pipelines. There's no reason we should expect to use the same set of commands for both purposes. In fact, we most definitely shouldn't. A tool like jq is a great example of this: it works great for JSON, and it doesn't attempt to do more. When we want to work with JSON in a pipeline, we pull out jq. When we want to work with XML, we pull out xmlstarlet or the like. When we want to work with delimited columnar data, we can use awk or cut. I'm failing to see what's missing from an architectural point of view.


> in principle, treating those as complex object structures is the right way to go

I do not see this as a given. It's a matter of an abstraction-vs-performance tradeoff, and that is highly subjective. Unless you pioneer a new standard form for complex object notation, this'll just end up back in a format flamewar.

(And if we go that way, I'd argue for s-exprs or, dare I say it, XML.)


On my phone, so unsure if it's been mentioned in the thread already, but I've been enjoying learning a JS library, JSONata [0], after finding it included in Node-RED.

I remember finding jq a few weeks ago and thinking "wow, this will probably come in handy for a specific kind of situation" and filing it mentally for later use, but I haven't used it for anything yet, so I'm not super familiar with the extent of its features.

I have been using a lot of JSONata one-liners to replace several procedural functions that were doing data transforms on JSON objects, and I'm very impressed. It's a querying library, but it's Turing complete: it has lambdas, it can save references to data and functions as variables, etc.

It also seems relatively new/unknown; I've found hardly any blogs or forums mentioning it. The developer is active; he fixed a bug report I submitted in less than a day.

I'd love to have that kind of functionality in a CLI tool. Maybe jq is equally powerful, I don't know.

I haven't had time to run any performance analysis on JSONata and haven't found anyone else online who's done any yet. I'm very curious how its queries compare to efficiently implemented procedural approaches.

0: http://jsonata.org/


Thanks! Will have to try out `jsonata`, as `jq` never clicked for me (too complex). Alternatively, I've been using a node.js program `json` [1] which has a basic but straightforward CLI that covers 95% of my daily needs. I believe I found it after trying to figure out the tool Joyent.com uses in their SmartOS sysadmin setup. Not sure if it's the exact same tool, but it's very similar.

As an example, basic manipulation is just:

`echo '{"age":10}' | json -e 'this.age++' #=> {"age": 11}`

1: https://github.com/trentm/json


For XML there are xml-coreutils[1].

[1] http://www.lbreyer.com/xml-coreutils.html


BTW, one of my favorite config formats is HOCON. It allows both INI style and JSON style. https://github.com/typesafehub/config/blob/master/HOCON.md


sexp?

PS: I kinda like PowerShell (the few hours I toyed with it).


S-expressions


Likewise, your text objects aren't my text objects. For a fair comparison I think we should first assume compatibility for both unstructured, and structural data. Otherwise I would challenge that your bash script is not my bash script.

The sole point of structural data is to turn the storage into a formal system, and enable more math operations on the entries, which align to the desired semantics. Python/js/clojure scripts extract some structural aspects from text, and then the story becomes the same as structural data processing.


This hits the nail right on the head. Instead of ruining the shell with unnecessary complexity because it doesn't fit every use case, how about knowing when not to use the shell?


You're right, I consider Cicada an "old"-generation shell. I intend to keep it simple (for speed, etc.). As mentioned in the README, it won't introduce functions or other complex stuff. But I may still add some features for my own needs.

For modern shell, please check out xonsh: http://xon.sh/ - powered by Python - It's super cool!


> like pipes that are essentially text-only

Pipes work fine with binary data; you just need to be piping to a tool that works with binary data rather than text, e.g.:

    cat <somebinaryfile> | openssl base64


The trouble with that example is that it could more simply be written:

    openssl base64 <somebinaryfile

On the plus side, you do get a prize for suggesting it [1]

Something involving bulk binary data moving over a pipe which isn't an indirect redirection would be:

    gzip -c somefile | ssh user@host "gunzip >somefile"

Although, again, there's a perfectly good -C flag to SSH that would do the same thing.

Probably the most common example of tool-to-tool binary communication over pipes is:

    find -print0 | xargs -0

[1] http://porkmail.org/era/unix/award.html
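The reason `find -print0 | xargs -0` is robust is that NUL is the one byte that cannot appear in a Unix path, so it works as a delimiter that never needs escaping (unlike newlines, which filenames may contain). A minimal sketch of the consumer side, splitting such a stream back into paths, kept as bytes since paths need not be valid UTF-8:

```python
def split_nul(data):
    """Split NUL-terminated records (as written by find -print0) into paths."""
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()  # -print0 terminates every record, so drop the empty tail
    return records


if __name__ == "__main__":
    stream = b"./a file\0./new\nline\0"
    for path in split_nul(stream):
        print(repr(path))
```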


> On the plus side, you do get a prize for suggesting it

Good catch! I actually use another command that generates binary output that I pipe to OpenSSL to convert to base64 but rather than type out the full command I thought 'what's the easiest way to get some binary data on stdout' and 'cat binaryfile' was it.


The problem with modernising the shell stack is that there are a lot of things that depends on the current stack, be it the shell (e.g. see the amount of work it's taken for Ubuntu and Debian to switch to dash as /bin/sh; and dash has a pedigree going back to the 80's), or the upper layers (try to alias "ls" to something that acts differently, and see how many tools depend on parsing its output).

You'd either have to be very careful (e.g. feature flags guarding every change) or expect a surprising amount of things to fail.


This matches my experience trying to switch my login shell from bash to fish: a surprising number of things failed. macOS users won't have this problem, as the login shell is only started when Terminal/iTerm is started, but on Linux pretty much every process relies on your $SHELL being POSIX-compliant.

It seems the way to use a non-POSIX shell as your default shell without changing the login shell is to have .bashrc `exec` that shell when it is running in interactive mode. Regardless, I would really love to see a bash-like shell that provides an out-of-the-box experience on the same level as fish.


How come? Having your user's shell be something non-compliant should not affect other scripts, as they are supposed to specify which interpreter they want via shebang lines; and as long as programs are run with the correct variant of the exec* functions, they should be fine.


I meant the login shell, the one that gets executed during console login (i.e. the "-bash" process).

The login shell needs to source /etc/profile and /etc/profile.d during logins to populate environment variables. Some apps that expect those environment variables to be present write to /etc/profile.d, expecting that a POSIX-compatible shell will read the directory (by sourcing those files). The environment variables set there range from PATH and LANG to application/distro-specific ones like MOZ_PLUGIN_PATH.

So when you change the login shell to something that no longer reads /etc/profile.d (like fish, whose man page actually calls it an "interactive shell"), applications that rely on those variables will start to behave in funny ways, which was the point I was trying to make.


> what a more modern shell might look like that addresses some of the problems of its predecessors, like pipes that are essentially text-only, poor composability (see `cut` or `xargs`)

A new system and set of utilities. None of the Unix environment works in that world, and shoehorning it in would feel incredibly clunky.


If you use bytes that are invalid in UTF-8 (e.g. 0xF5-0xFF) as delimiters / structure characters, you can use text-only tools to do structured operations on UTF-8 strings without ever having to escape anything -- the structural "characters" you would need to escape can never appear in the encoded bytes.
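A quick way to check that claim: encode text fields as UTF-8 and join them with a byte that never occurs in well-formed UTF-8 (0xFF here, an illustrative choice). Splitting on that byte is then unambiguous without any escaping, whatever the fields contain:

```python
SEP = b"\xff"  # never appears in well-formed UTF-8


def join_fields(fields):
    """Encode text fields as UTF-8 and separate them with 0xFF."""
    return SEP.join(f.encode("utf-8") for f in fields)


def split_fields(data):
    """Recover the original strings; no unescaping step is needed."""
    return [part.decode("utf-8") for part in data.split(SEP)]


if __name__ == "__main__":
    fields = ["tab\there", "newline\nhere", "naïve", "日本語"]
    print(split_fields(join_fields(fields)) == fields)
```

One caveat of this sketch: an empty list and a list holding one empty string encode identically, so a real format would still need a record-count or terminator convention.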


> If you use bytes that are invalid in UTF-8 (e.g. 0xF5-0xFF) as delimiters / structure characters

No need for that. UTF-8 is a superset of ASCII, and ASCII already includes a handful of control codes that could be used here.
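For comparison, a sketch of the control-code approach using ASCII Record Separator (0x1E) and Unit Separator (0x1F), the conventional pairing; the function names are hypothetical. The trade-off is that these codes are valid UTF-8, so they can occur inside data, and this version simply rejects such fields rather than defining an escape scheme.

```python
RS = "\x1e"  # ASCII Record Separator (0x1E)
US = "\x1f"  # ASCII Unit Separator (0x1F)


def to_rs_us(records: list[list[str]]) -> str:
    """Serialize records with the ASCII separator control codes."""
    for record in records:
        for field in record:
            # These codes are legal UTF-8, so they *can* occur in data; this
            # sketch refuses such fields instead of inventing an escape scheme.
            if RS in field or US in field:
                raise ValueError(f"field contains a separator: {field!r}")
    return RS.join(US.join(record) for record in records)


def from_rs_us(text: str) -> list[list[str]]:
    """Inverse of to_rs_us for data that passed its check."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


encoded = to_rs_us([["Makefile", "6.5K"], ["README.md", "1.6K"]])
print(repr(encoded))  # 'Makefile\x1f6.5K\x1eREADME.md\x1f1.6K'
```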


The Unix shell operates on binary streams, not on text.


Ah, I thought this was about structured text (like JSON) vs. plain text -- binary content is a bigger issue!


If you want to replace the shell with a structured one, the change needs to be pervasive, down to replacing the typical file formats, imo.


The approach I took to my shell was to pass JSON objects about as the primary data type but fall back to text streams when the data doesn't look like JSON.

The problem with passing objects around, though, is that it can interfere with the parallel nature of pipes: the programs in a pipeline can run concurrently because the data being passed is just a stream. With objects, you need to read each object in its entirety to ensure it is valid, which means each process waits for the previous process to finish outputting its object before it can start processing it. While this is less of an issue with smaller chunks of data in a chain, it becomes noticeable very quickly as your data scales up.
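One common workaround (an assumption here, not necessarily what this shell does) is newline-delimited JSON: emit one complete value per line, so each object can be parsed as soon as its line arrives instead of after the whole output. A Python sketch that also falls back to plain text when a line doesn't parse:

```python
import io
import json
from typing import Any, Iterable, Iterator


def read_items(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per line, or the raw line if it isn't JSON.

    Each line is handled as soon as it arrives, so a downstream stage can start
    working before the upstream program has finished writing.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            yield line  # fall back to treating the line as plain text


stream = io.StringIO('{"name": "Makefile", "size": 6656}\nnot json at all\n[1, 2]\n')
print(list(read_items(stream)))
# [{'name': 'Makefile', 'size': 6656}, 'not json at all', [1, 2]]
```

A caveat of the fallback: a plain-text line that happens to be valid JSON, such as `42` or `true`, will be parsed as JSON.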

I'm sure there will be workarounds for the above problem, but my project is still young, so there are other bugs I'm more focused on. (For me, I'm more interested in getting IDE-level auto-complete implemented, so that writing one-liners in the shell is as natural as writing your favourite language in your preferred development tool.)


The problems could be tackled nicely by a clean functional solution like Iteratees [1], maybe with a high-level API like Reactive Streams [2].

[1] https://github.com/playframework/play-iteratees
[2] http://www.reactive-streams.org/
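Iteratees proper are a Scala/Play abstraction; as a rough Python analogue (generators, not the iteratee API), each stage below pulls the next object from upstream only when it needs one, so stages interleave and nothing buffers a whole stream. The stage helpers are hypothetical:

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable], Iterator]


def mapping(fn: Callable) -> Stage:
    """Stage that transforms each object as soon as it is pulled."""
    def stage(items: Iterable) -> Iterator:
        for item in items:
            yield fn(item)
    return stage


def filtering(pred: Callable) -> Stage:
    """Stage that passes through only the objects matching `pred`."""
    def stage(items: Iterable) -> Iterator:
        for item in items:
            if pred(item):
                yield item
    return stage


def take(n: int) -> Stage:
    """Stage that stops pulling from upstream after `n` objects."""
    def stage(items: Iterable) -> Iterator:
        return islice(items, n)
    return stage


def pipe(source: Iterable, *stages: Stage) -> Iterator:
    """Compose stages left to right, like `source | a | b | c`."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


produced = []  # records how far the producer actually got


def numbers() -> Iterator[int]:
    """An unbounded producer; laziness is what lets the pipeline finish."""
    for i in count():
        produced.append(i)
        yield i


result = list(pipe(numbers(), mapping(lambda x: x * x), filtering(lambda x: x % 2), take(3)))
print(result, produced)  # [1, 9, 25] [0, 1, 2, 3, 4, 5]
```

Because `take(3)` stops pulling after three results, the unbounded producer only ever generates six numbers.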


Thank you, I will take a read of them.


What sort of features do you feel the shell needs to support structured data? I've toyed with this idea, and in my view the shell itself has a relatively minor role here. What is needed are utilities that produce and process structured data, and a terminal that can display it. The shell just glues the pieces together and doesn't really need to be aware of the data or its structure.


Here's a very simple example that I'd consider the holy grail. Say I want to run `ls -lh` on a directory:

    -rw-r--r--   1 brandur  staff   6.5K Mar 25 15:37 Makefile
    -rw-r--r--   1 brandur  staff    63B Jul 10  2016 Procfile
    -rw-r--r--   1 brandur  staff   1.6K Mar 11 07:20 README.md
    drwxr-xr-x   4 brandur  staff   136B Oct 16  2016 assets/
    drwxr-xr-x   4 brandur  staff   136B Jul 15  2016 atom/
    drwxr-xr-x   4 brandur  staff   136B Apr 26  2016 cmd/
I'm interested in getting the date fields out of this structure. Currently that's possible with some disgusting use of `awk` or maybe `cut`.

What if I could do something like `ls -lh | select 6` (select the 6th column of data):

    Mar 25 15:37
    Jul 10  2016
    Mar 11 07:20
    Oct 16  2016
    Jul 15  2016
    Apr 26  2016
It doesn't matter that the date comes in three parts and would act as three separate tokens for `awk` and `cut` because in my structured object backend, it's all just one logical timestamp.

Now say I want to sort these dates and pick the earliest. Once again, shells make this extremely difficult. Because the dates are not lexicographically orderable, I'd have to use some fancy parsing, or try to step back to `ls` and have it somehow order by date.

Imagine if I could instead do something like `ls -lh | select 6 | sort | first`:

    Apr 26  2016
In this case, although the shell is still displaying a string to me that's not lexicographically orderable, the data is being passed around in the backend in a structured way and these timestamps are actually real timestamps. It's then trivial for the next utility to apply ordering.

The part that the shell would have to provide (as opposed to the programs) is a "smart pipe" that understands more than just a basic text stream. It would also have to know when there is no pipe connected, and print the objects as pretty strings when it knows the output is going straight to the terminal via `STDOUT`.
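A Python sketch of that idea, where the "pipe" carries objects instead of bytes. `select`, `first`, and `render` are hypothetical stand-ins for the utilities described above; the rows are the `ls -lh` listing typed in by hand, assuming the entries without a year are from 2017 and giving the 2016 entries midnight, since `ls` doesn't show their time.

```python
from datetime import datetime

# Rows as a structured `ls -lh` might emit them: the timestamp is ONE value.
ENTRIES = [
    ("-rw-r--r--", 1, "brandur", "staff", "6.5K", datetime(2017, 3, 25, 15, 37), "Makefile"),
    ("-rw-r--r--", 1, "brandur", "staff", "63B", datetime(2016, 7, 10), "Procfile"),
    ("-rw-r--r--", 1, "brandur", "staff", "1.6K", datetime(2017, 3, 11, 7, 20), "README.md"),
    ("drwxr-xr-x", 4, "brandur", "staff", "136B", datetime(2016, 10, 16), "assets/"),
    ("drwxr-xr-x", 4, "brandur", "staff", "136B", datetime(2016, 7, 15), "atom/"),
    ("drwxr-xr-x", 4, "brandur", "staff", "136B", datetime(2016, 4, 26), "cmd/"),
]


def select(rows, n):
    """Hypothetical `select N`: the Nth (1-based) column of every row."""
    return [row[n - 1] for row in rows]


def first(values):
    """Hypothetical `first`: the head of the stream."""
    return values[0]


def render(value):
    """Only the last stage, writing to a terminal, turns objects into text."""
    if isinstance(value, datetime):
        return value.strftime("%b %d  %Y")
    return str(value)


# ls -lh | select 6 | sort | first
earliest = first(sorted(select(ENTRIES, 6)))
print(render(earliest))  # Apr 26  2016
```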


Not quite a "Unix" shell, but the [Ammonite Scala Shell](http://ammonite.io/#Ammonite-Shell) lets you do this trivially:

    lihaoyi Ammonite$ amm
    @ import ammonite.ops._
    import ammonite.ops._
    @ ls! pwd
    res1: LsSeq =
    ".git"              'LICENSE            'ci                 'project            'sshd
    ".gitignore"        'amm                'integration        'readme             'target
    ".idea"             "appveyor.yml"      "internals-docs"    "readme.md"         'terminal
    ".travis.yml"       "build.sbt"         'ops                'shell
    
    @ ls! pwd | (_.mtime)
    res2: Seq[java.nio.file.attribute.FileTime] = List(
      2017-06-21T12:24:41Z,
      2017-06-11T13:48:45Z,
      2017-06-21T12:29:05Z,
      2017-06-11T13:48:45Z,
      2017-05-01T03:41:14Z,
      2017-06-18T14:06:33Z,
      2017-06-11T13:48:45Z,
      2017-06-18T08:01:20Z,
      2017-06-18T12:34:00Z,
      2017-06-19T06:05:51Z,
      2017-06-18T09:06:07Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:15Z,
      2017-06-18T14:07:20Z,
      2017-06-11T13:48:45Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:33Z,
      2017-06-19T06:06:03Z,
      2017-06-18T14:06:33Z
    )
    @ ls! pwd | (_.mtime) sorted
    res3: Seq[java.nio.file.attribute.FileTime] = List(
      2017-05-01T03:41:14Z,
      2017-06-11T13:48:45Z,
      2017-06-11T13:48:45Z,
      2017-06-11T13:48:45Z,
      2017-06-11T13:48:45Z,
      2017-06-18T08:01:20Z,
      2017-06-18T09:06:07Z,
      2017-06-18T12:34:00Z,
      2017-06-18T14:06:15Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:06:33Z,
      2017-06-18T14:07:20Z,
      2017-06-19T06:05:51Z,
      2017-06-19T06:06:03Z,
      2017-06-21T12:24:41Z,
      2017-06-21T12:29:05Z
    )
    @ ls! pwd | (_.mtime) min
    res4: java.nio.file.attribute.FileTime = 2017-05-01T03:41:14Z
    @ ls! pwd | (_.mtime) max
    res5: java.nio.file.attribute.FileTime = 2017-06-21T12:29:05Z

    @ ls! pwd maxBy (_.mtime)
    res6: Path = root/'Users/'lihaoyi/'Dropbox/'Github/'Ammonite/".idea"
Ammonite certainly has problems around JVM memory usage and startup times, but it has a sweet spot for this sort of not-quite-trivial filesystem operation that is a pain to do in Bash but too small to be worth futzing with a Python script.


Doesn't Ammonite suffer from the same "problem" as PowerShell, in that it is limited to a single runtime (namely the JVM) and a single process? In contrast, in traditional Unixy pipelines each component runs in its own process and can be programmed in any language.


You can also do it in PowerShell, which is on Linux now.

  PS C:\Users\erk> Get-ChildItem | Sort-Object LastWriteTime | Select-Object -first 1
  
  
      Directory: C:\Users\erk
  
  
  Mode                LastWriteTime         Length Name
  ----                -------------         ------ ----
  d-----       17-06-2016     16:08                Tracing


Part of me thinks that PowerShell has seen less adoption in the Unix world because of the prevalence of camel case. The few times I used PowerShell, the discoverability of what I could do was low and the verbosity of examples was high. It took me too long to do my most common tasks.


I've seen several people who like PS say its discoverability is actually pretty great compared to Unix shells - for instance, the convention for cmdlet names is always "verb-noun" ("action-object") and (maybe, don't quote me on this) the set of blessed verbs people are supposed to use in cmdlet names is fairly small. So, when you want to do something, you try the verbs that seem closest to what you want to do and the names of the (usually application specific) objects you're working with and that gets you quite a long way down the road.

Also I think PS commands aren't case sensitive, though for readability one would probably want to keep the camelcase anyway.


Yeah, PowerShell is not case-sensitive, and it also has a lot of Unix-like aliases you can use. Another way to write the command from my post is:

  dir | sort LastWriteTime | select -first 1
It gives the same output.


But if we have a program that produces structured data ("ls"/"get-childitem") and a program that similarly consumes structured data ("select-column"), then I'm not sure what sort of "smartness" you would need from the pipe in between? Why would the shell need to understand that the data is somehow more than just stream of bytes?

As far as pretty-printing the final output to the user, in my opinion that is something that is better handled at terminal level instead of the shell.

Of course these are just comments based on how I imagine a reasonable object shell would function, and I really would like to hear more views on this subject because it is very likely that I might have overlooked something essential.
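A quick way to test that intuition in Python: two hypothetical programs agree on newline-delimited JSON, and the "shell" (here `subprocess`) connects them with an ordinary OS byte pipe without ever looking at the data. The file sizes are illustrative.

```python
import subprocess
import sys

# Hypothetical producer: emits one JSON object per line on stdout.
PRODUCER = r'''
import json
for name, size in [("Makefile", 6656), ("Procfile", 63), ("README.md", 1638)]:
    print(json.dumps({"name": name, "size": size}))
'''

# Hypothetical consumer: reads objects from stdin and keeps the large ones.
CONSUMER = r'''
import json, sys
for line in sys.stdin:
    obj = json.loads(line)
    if obj["size"] > 1000:
        print(obj["name"])
'''


def run_pipeline() -> str:
    """Connect producer | consumer with a plain byte pipe, as a shell would."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    producer.stdout.close()  # so the producer sees a broken pipe if the consumer exits
    output, _ = consumer.communicate()
    producer.wait()
    return output


print(run_pipeline(), end="")  # Makefile, README.md (one per line)
```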


'ls -lh | column --table --output-separator "," | cut --delimiter="," --fields=6-8 --output-delimiter=" " | sort --month-sort'

I started this to show how easy it was, but it quickly became pretty hacky.


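Your point is taken, but in this particular example, you should just use the right tool for the job: in this case, the "stat" utility. For example (GNU stat; `%z` is the time of last status change, while `%y` gives the modification time that `ls -l` shows):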

  > stat --format='%z' *
  2016-05-27 12:39:46.559137232 -0300
  2015-07-26 14:37:51.714193856 -0300
  2016-04-15 19:55:33.329346654 -0300
  2016-03-21 02:59:05.377041620 -0300
  2015-11-22 19:27:56.868541801 -0300
Sorting them and picking the earliest would just be:

  > stat --format='%z' * | sort -n | head -n 1
  2015-07-26 14:37:51.714193856 -0300


Powershell?


A modern shell with properly structured data streams would look like caml-shcaml:

http://users.eecs.northwestern.edu/~jesse/pubs/caml-shcaml/

PowerShell definitely made a lot of mistakes they should have known needed fixing. Like the need for signed scripts.


>Imagine a future where we're piping structured objects between programs, scripting in a modern language

May not be quite the same as what you mean, but check out osh (a Python tool, a sort of shell, for doing distributed command-line and scripting operations on clusters of nodes). I had blogged about osh briefly here:

Some ways of doing UNIX-style pipes in Python:

https://jugad2.blogspot.in/2011/09/some-ways-of-doing-unix-s...

Interestingly, later on, when I blogged this:

Akiban, new database, supported by SQLAlchemy:

https://jugad2.blogspot.in/2012/10/akiban-new-database-suppo...

the creator of osh, Jack Orenstein, who was then at Akiban (the company behind the Akiban product), commented on that post, giving more details about both osh and what it was used for at Archivas, later acquired by Hitachi Data Systems, and also said something about Akiban. And now I just saw by googling that it (Akiban) was acquired by FoundationDB.


I certainly did like the idea behind Powershell's structured-object pipelines, but it relies on that "COM baggage" and "elaborate syntax" that you say are its downsides. I don't think you can achieve the goal behind Powershell of a universal standard binary object format without an elaborate infrastructure like COM and without making a bunch of assumptions and requirements on how those objects must look and behave, which will vastly limit the usefulness of the pipeline.

All of that said... nothing is stopping you from piping binary objects between tools in Unix shells. In fact, it works great, and I'm curious what functionality you are looking for that doesn't exist. The standard tools like cut and grep don't work great with binary data, but they aren't meant to. There are format-specific tools aplenty, and stuff like bbe exists for generic work. And for structured text, tools like `jq` are phenomenal.
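As one illustration of binary records through an ordinary pipe, here is a sketch with an assumed (not standardized) framing: each record is preceded by its length as a 4-byte big-endian integer, so the consumer never has to scan the payload for delimiters.

```python
import io
import struct
from typing import BinaryIO, Iterator


def write_frame(out: BinaryIO, payload: bytes) -> None:
    """Prefix each binary record with its length as a 4-byte big-endian int."""
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, looping over short reads (pipes return them)."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream ended mid-frame")
        data += chunk
    return data


def read_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield records one at a time until the stream is cleanly exhausted."""
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            header += _read_exact(stream, 4 - len(header))
        (length,) = struct.unpack(">I", header)
        yield _read_exact(stream, length)


buf = io.BytesIO()
for record in [b"\x00\x01binary\xff", b"", b"text too"]:
    write_frame(buf, record)
buf.seek(0)
frames = list(read_frames(buf))
print(frames)  # [b'\x00\x01binary\xff', b'', b'text too']
```

With a real pipe, `sys.stdin.buffer` and `sys.stdout.buffer` would stand in for the `BytesIO` object.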


PowerShell is available on Linux these days; it was in 'alpha' for a long time, and apparently it's now 'beta'.

https://github.com/PowerShell/PowerShell


You may be interested in https://github.com/xonsh/xonsh


Personally, the only structured data I have an urge to work with in pipes is JSON, via jq.

However, you should check out fish; it improves on bash/zsh/whatever syntax in sane ways.


You've reminded me of this talk: https://www.destroyallsoftware.com/talks/a-whole-new-world

Near the end, he starts talking about terminals and making a new terminal standard (~ 17:30 mins in).


In terms of interactive sessions, I’ve always wanted an asynchronous shell that gave me a prompt back as soon as it started executing the command. It would have real job control, letting me see what was running and page through the output separately. Idk.


It seems a bit premature to say that PowerShell was passed over. The journey outside Windows has just started.

Indeed, PowerShell is onto something. It's such a delight to work with, especially when you start to build your own tools for it.


PowerShell actually made me switch to Windows. I was playing with WSL when it came out and learned of PowerShell. WSL has some warts, but PowerShell means I cannot use Linux anymore :(

I'm hoping PowerShell on Linux will soon be on par with the Windows counterpart (right now it's pretty buggy).


It's been 16 years.


16 years that it has only been available for Windows, the parent says.


Not to be condescending, but looking at the feature set, this has a long way to go before becoming a viable bash replacement. I don't see the amazing aspect to be frank.


Security? I would love to see whole classes of bug eradicated (stack overflow, use after free, memory leaks,...)


Hotwire tried it too. It's been dead for a while, though. https://code.google.com/archive/p/hotwire-shell/wikis


One thing lacking in traditional shell utilities is a universally understood way of passing metadata about the content of pipes/streams. Other commenters have (rightly!) pointed out the difficulties of requiring a single "structured object" format at the core of the ecosystem... but what if there were a way to gracefully support multiple structured object formats and let tooling build on top of that?

It got me wondering what would happen if we embedded a modified form of HTTP response headers at the beginning of every UNIX command-line stream. Let's tag this version `HTTP/UNIX.1`, to make clear that this HTTP isn't a response to a request but part of a standard 'UNIX' command response. Otherwise, programs could follow the RFC 2616 section on HTTP response formats [1].

As for the existing ecosystem of tooling, it'd be straightforward (trivial?) to add native support for this to existing shells. For example, when you run `ls`, your shell settings could include `SHELL_DISPLAY_HTTP_UNIX_PRETTY="terminal/text:terminal/json"`. Programs without native support could be handled by adding an "http_strip" program. Otherwise, adding support to command-line programs would be simple, as libraries that handle HTTP/1.1 exist for almost any language/platform, and only the response header section of those would be needed. Alternatively, a generic wrapper could be created to handle the HTTP response info.

Examples:

    HTTP/UNIX.1 200 OK
    Date: Mon, 27 Jul 2009 12:28:53 GMT
    Server: ifconfig lo0
    Content-Length: 248
    Content-Type: terminal/text
    Connection: Closed
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    	options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
    	inet 127.0.0.1 netmask 0xff000000
    	inet6 ::1 prefixlen 128
    	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    	nd6 options=201<PERFORMNUD,DAD>
Next:

    HTTP/UNIX.1 200 OK
    Date: Mon, 27 Jul 2009 12:28:53 GMT
    Server: sysinfo 
    Content-Length: 107
    Content-Type: terminal/json
    Connection: Closed
    {
      "aggr0": {
        "LACP mode": "active",
        "Interfaces": [
          "e1000g1",
          "e1000g0"
        ]
      }
    }
    
With this system in place, many tools could start adopting JSON, MessagePack, or other structured formats, and generic tools could be built to translate between formats. The problem today is that there's no way for programs to say "I'm outputting JSON" and "My JSON is using this json-schema://unix/some/random/filedescription/system". Tools like `xargs` and `find` could also attach header metadata such as "Null-Terminator: \0" or "Null-Terminator: \n". Currently shell tools rely on command flags to set that, but if the same setting could also come from "Null-Terminator" headers in the input/output streams, it'd make the shell a lot more intuitive without reducing visibility. For example, a common `find /etc -print0 | header_inspect` could be used to override shell settings and print out the header.

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
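A consumer-side sketch of the proposal in Python. Everything about `HTTP/UNIX.1` is hypothetical, and this version assumes, as in RFC 2616, that an empty line separates the headers from the body (without one, a body line such as `lo0: flags=...` would be indistinguishable from a header).

```python
import io
import json
from typing import Any, BinaryIO


def read_unix_http_header(stream: BinaryIO) -> tuple[str, dict[str, str]]:
    """Parse a hypothetical HTTP/UNIX.1 status line plus headers.

    Header names are lower-cased (HTTP treats them case-insensitively). Stops at
    the first empty line, leaving `stream` positioned at the start of the body.
    """
    status = stream.readline().decode("utf-8").rstrip("\r\n")
    if not status.startswith("HTTP/UNIX."):
        raise ValueError(f"not an HTTP/UNIX stream: {status!r}")
    headers: dict[str, str] = {}
    for raw in iter(stream.readline, b""):
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line:
            break  # empty line: end of headers, body follows
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def read_body(stream: BinaryIO, headers: dict[str, str]) -> Any:
    """Read the body and decode it according to the Content-Type header."""
    length = headers.get("content-length")
    body = stream.read(int(length)) if length is not None else stream.read()
    if headers.get("content-type") == "terminal/json":
        return json.loads(body)
    return body.decode("utf-8")


message = (
    b"HTTP/UNIX.1 200 OK\r\n"
    b"Server: sysinfo\r\n"
    b"Content-Type: terminal/json\r\n"
    b"\r\n"
    b'{"aggr0": {"LACP mode": "active", "Interfaces": ["e1000g1", "e1000g0"]}}'
)
stream = io.BytesIO(message)
status, headers = read_unix_http_header(stream)
print(status, headers["content-type"], read_body(stream, headers))
```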

edit: formatting whitespace?


One might not need native support to get started: with a wrapper that recognized the coreutils, for example, one could translate the data appropriately.

    corehttp ls -l ...
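A Python sketch of such a wrapper. `corehttp` is the hypothetical name from the comment above; the header layout follows the `HTTP/UNIX.1` examples, and mapping a non-zero exit status to `500 Command Failed` is an invented convention.

```python
import subprocess
import sys
from email.utils import formatdate


def corehttp(argv: list[str], content_type: str = "terminal/text") -> bytes:
    """Run a legacy command and prefix its output with an HTTP/UNIX.1 header."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=False)
    # Invented convention: exit status 0 -> 200, anything else -> 500.
    status = "200 OK" if result.returncode == 0 else "500 Command Failed"
    header = (
        f"HTTP/UNIX.1 {status}\r\n"
        f"Date: {formatdate(usegmt=True)}\r\n"
        f"Server: {' '.join(argv)}\r\n"
        f"Content-Length: {len(result.stdout)}\r\n"
        f"Content-Type: {content_type}\r\n"
        "\r\n"  # empty line: end of headers
    )
    return header.encode("utf-8") + result.stdout


# Roughly what `corehttp echo hello` would emit, using Python as the child.
sys.stdout.buffer.write(corehttp([sys.executable, "-c", "print('hello')"]))
```

Because `Content-Length` must be known up front, this version buffers the command's whole output; a streaming variant would have to omit that header.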


Seems like it'd be worth trying. Been wanting an excuse to try Rust.


There's also this project:

https://github.com/redox-os/ion

It was initially built for Redox OS (also written in Rust), but now also runs on Linux. It seems to be a bit more mature.

I wonder what sets these two projects apart, and/or if the devs of Cicada know of Ion's existence.


Cicada seems to be more POSIX-y, while Ion states that "it is not, nor will it ever be, compliant with POSIX".

What they have in common though… is that job control (Ctrl+Z) is not implemented yet :(


I've been giving thought lately to writing my own shell, because there's stuff I want to fix that no current shell does right, but job control is one of the big reasons why I haven't actually started writing code yet (namely, I haven't yet figured out how to reconcile the stuff I want to do with the need to let the user suspend or background tasks including shell scripts).


Apologies for the off-topic nature of this, but:

Something that's been capturing my imagination recently is sshing into a machine, pushing an appropriate shell binary to /tmp, and then executing it. If you have an exotic preference for shell (or vim config, or whatever), we've presumably generally got sufficient bandwidth these days to do a virtually instant user-space install and execution, no?


With judicious copying of certain dotfiles and directories, this should be quite possible. Just put your shell in ${HOME}/bin, and set your ${HOME}/.ssh/authorized_keys to execute that shell on login, and you should be in good shape.

Just expect a few questions from the admins when you do this.


I wouldn't copy binaries between arbitrary systems (specific systems is fine), but copying configs is pretty easy with SSH as-is:

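  #!/bin/bash -x
  HOST="$1"
  # ControlMaster only shares a connection if ControlPath names a socket.
  SSH_OPTS=(-o ControlMaster=auto -o ControlPath="$HOME/.ssh/cm-%r@%h:%p" -o ControlPersist=30)
  scp "${SSH_OPTS[@]}" ~/.vimrc "$HOST":~/.vimrc
  ssh "${SSH_OPTS[@]}" "$HOST"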
With the above, you will only create one SSH connection, and you will only authenticate once.


The team I work on had similar needs†, so we built a tool, atop the Python library fabric, to sync a project to (multiple) remote machines, build it, and execute it with configurable command line arguments.

http://github.com/Oblong/obi

†: the specific use-case was for development of multi-machine GUI applications that run in configurations like "display walls" or "immersion rooms". for every developer to have their own "display wall" or "immersion room" would cost lots of $$$, so many of the design decisions were based around the needs of a team of N developers that "shares" these systems.


Arx is intended to allow exactly this instant user-space install and execution: https://github.com/solidsnack/arx


That's a great link, thanks


Looks fully featured! Cool stuff.

I think people are going to ask "Why?" and "What's the difference between this and bash?" Might want to cover that in your README.


The fact that there is a "Won't do list" stating that functions won't be supported...

I don't think this is anything more than a toy to play with rust.


Generally speaking, any time you start to break out bash/zsh/sh/csh functions, you've already entered a zone where just writing a Python/Perl script would be the easier and more maintainable solution in the long term.


Not if I want to actually affect the current shell environment.

For example:

    export MARKPATH=$HOME/.marks
    function jump {
        cd -P "$MARKPATH/$1" 2> /dev/null || (echo "No such mark: $1" && marks)
    }
    function mark {
        mkdir -p "$MARKPATH"; ln -s "$(pwd)" "$MARKPATH/$1"
    }
    function unmark {
        rm -i "$MARKPATH/$1"
    }
    function marks {
        ls -l "$MARKPATH" | sed 's/  / /g' | cut -d' ' -f9- && echo
    }
    _jump()
    {
        local cur=${COMP_WORDS[COMP_CWORD]}
        COMPREPLY=( $(compgen -W "$( ls "$MARKPATH" )" -- "$cur") )
    }
    complete -F _jump jump
This would be much harder to do in a script.
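The underlying reason is that the working directory, like the environment, belongs to each process, and a script runs as a child process. A small Python demonstration of the same effect (the helper name is hypothetical): the child's `chdir()` changes only the child, which is why `jump` has to run inside the shell as a function.

```python
import os
import subprocess
import sys
import tempfile


def cwd_after_child_chdir(target: str) -> str:
    """Start a child process that changes directory, then return our own cwd."""
    subprocess.run([sys.executable, "-c", f"import os; os.chdir({target!r})"], check=True)
    return os.getcwd()


before = os.getcwd()
after = cwd_after_child_chdir(tempfile.gettempdir())
print(before == after)  # True: the child's chdir() never reaches the parent
```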


I have a dozen functions in my bashrc that are basically slightly more sophisticated aliases. Why would I want to load a whole interpreter to run them when bash can easily do that?


> I have a dozen functions in my bashrc that are basically slightly more sophisticated aliases.

Me too. Looking at them, they all start with something like

    test $# -eq 2 || return
...since in sh/bash you cannot even name your function's arguments or indicate how many you expect to receive.

Speaking of primitiveness, in strict POSIX sh, there's no such thing as local variables in functions!!


Shell functions are useful for much more than scripting. They are like aliases++, letting you add small commands to your shell where aliases don't quite cut it. They let you handle arguments, export environment variables, use conditionals, etc. I have 100 or so that I've added over the years. An interactive shell without functions would not cut it.


Or the author justifiably thinks that a shell shouldn't support functions?


Why shouldn't it? The shell is arguably the most convenient way to interact with your system's programs, seconded only by something like Perl. That level of interaction ought to mean you can write functions for common tasks. Or should I now have to use Perl for that?


Because fragile shell scripts have been an endless source of security vulnerabilities and bugs.

I would probably use Python, Go or Powershell.


You can use functions outside of shell scripts - they can serve as a better kind of alias, or as a way of modifying the current shell without having to type the '.' before the script.


Shell languages arguably should be usable programming languages. So start with a good, expressive one and add scripting as a library:

http://users.eecs.northwestern.edu/~jesse/pubs/caml-shcaml/

Incidentally, that would also dramatically decrease security and maintainability problems.


This.


Escaping and passing strings as arguments (without losing e.g. " characters) is often a problem. It would be so nice if this were solved in a shell the way it works for function calls in proper programming languages like Python or Rust (i.e. various ways to specify strings with special characters, and a good format method for string composition), with Haskell-like function call syntax.
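Python's standard `shlex` module gives a feel for that kind of string handling against POSIX shell syntax: `shlex.join()` (Python 3.8+) quotes each argument with `shlex.quote()`, and `shlex.split()` parses the line back the way a POSIX shell splits words. This is only a sketch of the round trip, not a proposal for shell syntax.

```python
import shlex

tricky = 'He said "hi" and it\'s $HOME; rm -rf *'
command = ["printf", r"%s\n", tricky]

# shlex.join() applies shlex.quote() to every argument.
line = shlex.join(command)
print(line)

# shlex.split() parses the line back the way a POSIX shell splits words,
# so nothing is lost, not even the embedded quotes.
assert shlex.split(line) == command
```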


And lists, and maths, and error handling, and ...

To fix all those things you wouldn't end up with a traditional Unix shell, which I think would be a great project, but there are clearly two mutually exclusive projects to do:

1. A Unix-like shell (this)

2. A robust shell suitable for writing scripts


Note: Rust environment is needed for installation.

Why?


A Rust installation is needed because you have to compile it from source yourself.

If binaries were distributed, a Rust installation would not be needed.


Yes, I haven't provided the pre-built binaries yet. Maybe I should..


Check out snapcraft, it can build rust binaries!


Oh, bummer, I was really hoping it would support Windows.


You need xonsh - http://xon.sh/ :)


" Won't do list * functions * Windows support "

/cry


Being able to do arithmetic right in the shell is handy.


  $ echo $((1 + 2 * 3 - 4))
  3


With zsh, you can even save a few keystrokes.

  function c() {
    echo $(($@))
  }
  alias c='noglob c'


zsh can handle floating point arithmetic too, while bash can't.


The way they chose to do it, though, seems to create some ambiguity. Like, what does this do? I can think of 3 distinctly different things it could do.

  $ echo test 1 + 2>err

Or what if there are files with these names?

  $ cp /bin/grep 42
  $ 42 + 1


I just have to say, while I respect the work and have been looking forward to nix tools being ported to Rust, the first thing I do is see if it's GPL or not, and if it's not it immediately loses points (not all of them), because I have been working hard to free up my stacks. I wish people would consider making core system tools like this GPL instead of MIT/BSD-style licenses. Tivoization is a real threat to the freedom of users and devs (and it's what the MIT license enables, despite being listed as "GPL-compatible").

For example, I forced myself to learn screen really well because even though I like tmux and it has some features lacking in screen, I would rather support a tool I know is more aligned with freedom for the user. Same with i3 vs awesome, etc.

An interesting side benefit of this is I have noticed the complexity of my stack has been reduced because of this selectiveness.


I just have to say, while I respect the work put into tools of the sort that you use, the first thing I do is see if it's GPL or not and if it is it immediately loses points (very nearly all of them), because GPL is awful. If I'm going to use a tool like this, I'm going to want to be able to check out the code, perhaps submit PRs, etc, but if it's GPL then I don't want to so much as look at the code because GPL is viral and aggressively removes so many developer freedoms.

In fact, I genuinely don't understand the "freedom for the user" aspect. The only "freedoms" the GPL cares about are freedoms for developers (end users don't care in the slightest about whether they can get the source code, etc), but GPL is far and away the most restrictive license I've ever seen in terms of taking away developer freedoms. So when people say the GPL provides freedom for the user, it makes no sense, because what it's doing is taking away freedoms from anyone who actually cares about source access. In fact, all the GPL really seems to do is protect the "freedom" of the original developer to ensure access to any changes made by other people, at the cost of taking away the freedoms of all of these other people.


There are two points I want to address.

1. "the GPL really seems to do is protect the "freedom" of the original developer to ensure access to any changes made by other people" The GPL literally does not do that, the "other people" only have to give the modified sources to people who get the binary from them; they don't have to give the modified sources back to the original developer. As a non-theoretical example, the Grsec guys only give their Linux kernel modifications to their customers, and do not give the modifications back upstream to Linux. It's about making sure that the end user gets access to the code, which I know you said is silly because the users "don't care in the slightest", but...

2. You re-defining user to be someone who doesn't look at the code is silly. Given a piece of software I have absolutely no intention of being a "developer" for, I would still like to receive the source code. I would like to be able to study it to figure out how it works, the same as when I took apart and studied clocks as a kid (an activity that didn't magically transmute me from a "clock user" into a "clock maker"!). When some software on my computer (or the computer of a friend or family member) breaks, I would like to be able to pop open the source and see what's going on, the same as when I pop the hood when something goes wrong with a car--I'm not interacting with that software as a "developer", just as I'm not interacting with the car as a "car maker". Sure, the fact that I am a developer means I'm pretty qualified to know what I'm looking at when I look at the source, just as being an engineer at an auto company would make me pretty qualified to know what I'm looking at when I pop the hood of my car. But fundamentally, the relationship I have with that software/car is that of a user, not that of a developer. And even if I weren't a programmer, I would want to receive the source, so that when it breaks, and I ask my nephew or whoever to look at it, that he can see the source and isn't locked out from helping me out. We don't call farmers "engineers" for fighting for the "right to repair" their own farm equipment; and we shouldn't call computer users "developers" for fighting for the right to repair their own computers.


No, "freedom for the user" is exactly what it says. The GPL cares about the end user, not so much about the developer. The original developer just gets to decide which group is more important in his particular case.


Read what I said. The "end user" doesn't care in the slightest about source access. Everything the GPL is concerned about only matters to other developers, not to end users.


As an end-user, I do. Being prevented from fixing my own (expensive) device is not something I would voluntarily subject myself to. For context, here's a comment I posted a while ago (https://news.ycombinator.com/item?id=13527205):

I used to be a big fan of permissive licenses until I bought a $700+ android phone a couple of years back and discovered that it did not "support" my native language (it could render the glyphs but system-wide support was not enabled).

Having extensive experience with unicode and how text is usually rendered, I knew exactly how to fix the issue; the fix was likely as simple as injecting an SO that hijacks a specific system library function. However, because the phone was locked down, I was unable to fix the problem myself. All important system apps including SMS and the browser displayed gibberish.

It was the most expensive brick I ever bought. This experience taught me the true value of the GPL and why user freedom far outweighs the freedom of developers.


> Being prevented from fixing my own (expensive) device is not something I would voluntarily subject myself to

If you can fix it, you're a developer, not a plain old user.


You're suggesting that developers can't be users? Also, end users don't have to be the ones to fix things to benefit. If they're unhappy with something, they would have the freedom to pay a developer to customize their software however they like, or even get a skilled friend to work on it.

Being able to work on my own car is an important freedom, even if I don't know anything about cars. I can get a knowledgeable friend to look at it, or I can hire a mechanic, and it doesn't have to be a mechanic from the manufacturer.

Also, it doesn't matter how much of the population uses their freedom for it to be important.


In fact, the complexity of your system has probably increased. I find the quality of software generally available under the GPL is inferior to that available under more permissive licenses, and in most cases that's due to huge amounts of unnecessary complexity. This may just be bias introduced by GNU, though. In some cases the GPL gets in the way of progress, such as the debacle over GCC being able to export its AST for other tools to interact with, a feature RMS rejected on the grounds that it would make it possible to make nonfree tools that integrate with GCC.


The GCC AST thing was due to politics inside GNU, not explicitly related to the GPL.


I'm not too familiar with licensing; can someone ELI5? E.g., normally I just choose MIT because licensing is not a concern of mine. I'd like credit, but beyond that, do whatever the hell you want with my work. Now, I'm not making core *nix tools, but should I be choosing a different license?

Man do I hate licensing.


BSD-style licences don't require source code changes to be released. For example, the PS4 OS, which is distributed as binaries, is based in part on FreeBSD. Sony is under no obligation to release any source code changes or contribute anything upstream. The people who wrote the FreeBSD code, as copyright holders, have no legal standing to sue for access to these changes either (I think, but I am not a lawyer).

By contrast, Android devices all run the Linux kernel, which is licensed under the GPLv2. Android device makers are obligated to release the kernel source they use, so users or upstream developers can use it. It's a bit more complicated than that, because the released code doesn't necessarily have to be loadable on the device (GPLv2 doesn't say anything about locked bootloaders or cryptographic signing, for example, and binary blobs that work in tandem with GPL code are a bit of a grey area, as far as I know).

Parent commenter prefers the latter style, or possibly even GPLv3, which imposes additional restrictions on what you can do. See Tivoization [0].

[0] https://en.wikipedia.org/wiki/Tivoization


"The Software shall be used for Good, not Evil." clause made JSLint's license incompatible with MIT [1].

"He has, however, granted "IBM, its customers, partners, and minions" permission "to use JSLint for evil", a solution which appeared to satisfy IBM's lawyers."

:D

Given IBM's history of making Jew-counting machines for the Germans during WWII, that exception is particularly disturbing.

Also JSON [2].

[1] https://en.wikipedia.org/wiki/Douglas_Crockford

[2] http://www.json.org/license.html


> "The Software shall be used for Good, not Evil." clause made JSLint's license incompatible with MIT [1].

This is so frustrating. One of Crockford's most persuasive points in his writing and lectures about JavaScript is that computer programs are no place for self-expression. When he explains what is wrong with the language, he gives astute suggestions on foolproof ways to work around the language's weak points by avoiding ambiguity. He even wrote JSLint to help the programmer do just that without having to research every idiosyncratic pitfall that would come from just winging it.

Yet here he is, in a domain in which he has absolutely no expertise, expressing himself in a licensing clause. This causes measurable hours of waste, because Debian, IBM, or insert-org-here has no experience with his idiosyncratic license and no easy way to predict its effects.


Any recommendation of a "default" license to use if you don't care too much about licensing will always carry the strong opinions of whoever is recommending it. So, instead, here's a decision tree of good recommendations based on the two biggest decisions that factor into choosing a license:

                          start
                            |
                            V
           if I modify it, and give someone a
          compiled binary of that, should I also
         have to give (or offer) them the source?
                       |        |
                   yes |        | no
                       |        V
                       |      MIT/X11 license
                       |     (alternatively,
                       |      Apache 2.0, or
                       |      2- or 3-clause BSD)
                       V
                is it a library
                 or a program?
                   /      \
          library /        `----, program
                 V               \
    can it be used by programs    \
     that DON'T give/offer the     \
       source with binaries?        \
           |         \               \
       yes |          \ no            V
           V           `-----------> GNU GPL
        GNU LGPL                    (alternatively, GNU AGPL)

Whether or not you think the "default" choice to each question should be "yes" or "no" is something that people argue about plenty; whichever you choose, someone will tell you you chose wrong.

As for the "alternatives" in parentheses at the leaves of the chart: if you don't care about licenses, the default is probably fine, but if someone told you that you should use one of the alternatives at that leaf, I wouldn't argue. (Apache 2.0 comes in over MIT if you care about patents; AGPL comes in over GPL if you care about SaaSS.)
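Since the chart's library branch loops back to GPL in a slightly tangled way, here is one reading of it written as a tiny Rust function, sketched under my own assumptions. The names `recommend`, `Kind`, `License`, and the `allow_closed_linking` flag are hypothetical and only illustrate the branching; the alternatives at each leaf (Apache 2.0, BSD, AGPL) appear only as comments.

```rust
/// Whether the code is a library or a standalone program (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Library,
    Program,
}

/// The default license at each leaf of the chart (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum License {
    Mit,  // alternatives: Apache 2.0, 2- or 3-clause BSD
    Lgpl,
    Gpl,  // alternative: AGPL
}

/// One reading of the decision tree:
/// - `require_source`: must someone who ships a modified binary also offer the source?
/// - `kind`: library or program?
/// - `allow_closed_linking`: (libraries only) may programs that don't offer
///   their source use it?
fn recommend(require_source: bool, kind: Kind, allow_closed_linking: bool) -> License {
    if !require_source {
        return License::Mit;
    }
    match kind {
        Kind::Program => License::Gpl,
        Kind::Library if allow_closed_linking => License::Lgpl,
        Kind::Library => License::Gpl,
    }
}

fn main() {
    println!("{:?}", recommend(false, Kind::Program, false)); // prints "Mit"
    println!("{:?}", recommend(true, Kind::Library, true)); // prints "Lgpl"
    println!("{:?}", recommend(true, Kind::Library, false)); // prints "Gpl"
}
```

Note that the third flag only matters on the library branch, which is exactly where the chart asks the extra question.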


It's a good chart, but I'd add one more question: is the source code meant as an example or for education, and/or is it trivial [1]? If so, CC0, somewhere on the top right.

[1] By definition, trivial code isn't copyrightable, but "trivial" is subjective. If you, as the author, feel it is, you might as well signal that clearly and use CC0.


If you don't care about licensing, MIT or BSD is a fantastic choice. Definitely don't choose GPL. The only people who should ever choose GPL are people who do care about licensing and have specifically made the decision that they like what the GPL does. But if you don't care about licensing, then you can't make an informed choice about whether GPL is appropriate, and it's far better to err on the side of using a permissive license (e.g. MIT or BSD) than using a restrictive viral license like GPL.


If you don't care about licensing, GPL is a fantastic choice. Definitely don't choose MIT or BSD. The only people who should ever choose MIT are people who do care about licensing and have specifically made the decision that they like what the MIT license does. But if you don't care about licensing, then you can't make an informed choice about whether MIT is appropriate, and it's far better to err on the side of using a copyleft license (e.g. the GNU GPL) than using a permissive license like MIT.

You've just said "if you can't make an informed decision, then you should make the choice that I agree with" (from other comments, you clearly are someone who does care about licensing, and don't like the GPL). It works just as well both ways. You even slipped in some loaded propaganda words ("restrictive viral"), just in case anyone had any doubt that you weren't offering objective advice.

Fundamentally, there's a choice, and there's no getting around that with "if you don't care" defaults.


No, I said if you can't make an informed decision, go with the licenses that are widely regarded as being the least restrictive.

> You even slipped in some loaded propaganda words

GPL proponents describe the GPL as viral, and I'd be surprised if anyone tried to argue that it's not a restrictive license, so I'd hardly call that "propaganda".

> If you don't care about licensing, GPL is a fantastic choice.

This is incredibly wrong, and it's kind of horrifying to me to see you try and trick uninformed people into picking something viral and restrictive like the GPL. You should be ashamed of yourself.

If you don't care about licensing, do not pick GPL. The license exists specifically for people that believe in what that license does, and it's literally the worst possible license choice for someone who doesn't care about the license.


> it's literally the worst possible license choice for someone who doesn't care about the license.

Why is that? Because with the GPL you can't as easily capitalize commercially on the work of others at no cost? Or because picking a more permissive license means that other people can keep their improvements to the source you provided proprietary?

The GPL seems like the safest choice if you don't know what to pick.


It's only "safest" in that it imposes a bunch of restrictions that the original developer hopefully cares about. If the original developer doesn't care about licensing, then they don't care about those restrictions, and so having them in place is counterproductive.

GPL is the worst choice because it's basically the most restrictive license I can think of, and it's intentionally viral, which makes the source of GPL-licensed software dangerous to work with for anyone who hasn't already bought into the GPL ecosystem.


I'm a big proponent of/believer in copyleft, to the point that I would often recommend GPLv3 or AGPL. But I also think you're right: if you don't know or care, permissive is the way to go.

But I would actually recommend CC0, especially for small, trivial projects. I guess I can see that people want attribution, but I think CC0 can sometimes make things much easier. This is especially true for projects that are meant to be educational/example code.

It clears up any and all confusion about copy-pasting and so on. The main counterpoint is that MIT, Apache (Apl), and 2/3-clause BSD are all well-known.

If I had to recommend just one licence it would probably be Apl, due to the patent grant.


I agree that something like CC0 is a good idea for sample code, or code that is otherwise intended to be copy&pasted. I don't think libraries are usually intended to be copy&pasted, though, so a license like MIT or BSD is more appropriate.

I'm not familiar with the Apl license.

My personal preference at this point is to dual-license under both The Apache License, Version 2.0 and MIT. The Apache license has a patent grant and it also has the nice property where it doesn't include the name of the copyright holder so you only need one copy of the license even if you're using 20 different Apache-licensed libraries. And the dual license under MIT is just because MIT is a simpler and more well-known license, so this is to avoid scaring off anyone who isn't familiar with the Apache license.


Apl (or rather "APL"): apache public licence :)


AFAIK there is no license called the "apache public license". There is the "Apache License, Version 2.0" and the older "Apache License, Version 1.1". I actually searched DDG for "APL license" and came up with something I'd never seen before called the "Adaptive Public License".


Might be an outdated acronym:

https://opensource.org/licenses/apachepl.php (Uses APL)

https://en.wikipedia.org/wiki/Apache_License (Uses AL)

[Ed: Or indeed, just an old misspelling/error that I've repeated here. Anyway - I meant the Apache License v2]



