The Bash Hackers Wiki (bash-hackers.org)
304 points by dvfjsdhgfv 39 days ago | 88 comments



Bash was the first scripting language I ever got "good" with.

(I put "good" in quotes because while I could get basically anything done, I promise that it was not good looking).

However, I feel like it left me a bit brain damaged: I still think in terms of quoting quirks and "sub-shells" even in proper languages such as Python or Rust.

It also left me a bit weird when thinking about concurrency, because I never consider IPC; I just think in terms of jobs+join, so my code ends up performing as a series of bottlenecks as everything rejoins the main thread.

That said, getting good with bash has helped my career a lot as a sysadmin; but maybe this is part of why sysadmins struggle with real programming.


I've come to use Python more and more for tasks where I would have used shell in the past, because of shell's weird syntax with anything but simple variables and the endless fiddling with quoting.

You can get a lot done quite easily and concisely in Python 3, for instance with the pathlib module, which has methods for recursive globbing and the like.

https://docs.python.org/3.7/library/pathlib.html
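For instance, recursive globbing is a one-liner with pathlib (a sketch; the "*.log" pattern is just an example):

```python
from pathlib import Path

# Find every .log file under the current directory, recursively
logs = sorted(Path(".").rglob("*.log"))
print(logs)
```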


The nice thing about Bash which can obviously not be said about Python is that Bash scripts will keep working (assuming whatever they call is also there, of course), since Bash goes out of its way to preserve backwards compatibility. Considering its legacy, Bash can still run scripts written before Python even existed.

The only other scripting language I know of with similar backwards compatibility is Tcl. Perhaps Perl too (ignoring Perl 6, which was renamed to something else again due to it being practically a different language).


I guess it's the trade-off between backwards compatibility and readability/correctness/maintainability. Both languages accumulate technical debt over time. In my opinion, I would much rather have to occasionally update Python scripts in exchange for the peace of mind that I can fix/update the script as need be. In my experience, *sh scripts can be fragile and hard to verify for correctness.


Having migrated something recently from python2 to python3, I'll take bash. I actually ported two of the three scripts to bash, and the third one I reluctantly moved to python3 (I needed python-pil). String handling in python3 in my opinion is a tire fire compared to bash.


> Perl 6 that was renamed to something else

Indeed, it has been renamed to Raku (https://raku.org using the #rakulang tag on social media).


Python needs to be installed, maintained, and every single library will have its list of dependencies (along with breaking API changes once in a while). There's a reason Bash exists: it's everywhere by default and you know it will still work as it works now in x years.


I'm sorry, but for the type of work he is suggesting (bash script equivalent), you don't need anything besides the standard library. I cannot think of a single system I have used in recent decades that didn't come with python installed by default. For all intents and purposes, Python is as ubiquitous as bash at this point.


Balderdash and hokum.

    z=1 ; while test ${z} -lt 4 ; do a=${RANDOM:1:3} ; dd if=/dev/urandom count=12 bs=4 of=${a} && b=$(cat ${a}) && echo ${b:$(expr 1 + ${RANDOM} % 4):12} ; ((z++)); done

You have one tool that you know well. I have thousands of tools that I've learned daily and that I wrap with a bash script, except when I need python/tcl or C. Most of what I do in python/tcl/C is truly complicated, needs to be fast, or uses a data structure other than an array: as it should be. Most of what I do in the shell is as simple as the userspace tools and bash will let it be. Write the one-liner above in python without importing a library.


Pretty sure it's just this:

> python -c "for i in range(4): print(__import__('os').urandom(12)[__import__('random').randint(1,4):12])"

Or something like it. That bash one-liner is pretty unreadable and full of side effects that may or may not be needed. A small 3-line python file would be cleaner and more readable.
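Something like this, as a sketch (mirroring the three iterations of the bash loop and the slice bounds used above):

```python
import os
import random

for _ in range(3):
    data = os.urandom(12)          # 12 random bytes, like the dd call
    start = random.randint(1, 4)   # random slice offset, like the ${b:...} expansion
    print(data[start:12])
```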


TIL about Python __import__ function. Thanks!


What's the point of not importing a library/module? It's not like cheating or something.


Eh. Pros and cons.

I think most people are spoiled by being able to install anything; I work in a locked down environment and getting anything non-sanctioned in is very difficult, and the sanctioning process is tedious if nothing else.

That, and the fact that libraries are evolving- which is why we end up with horrible things like package/library version pinning.


Sure, but we're talking (with the problem at hand) about Python standard libraries that get shipped with the Python distribution and not some NPM or pip universe.

For instance the "os" module has an "urandom" function that does essentially the same as the "dd" in the bash script. Of course you could also read from the device so you wouldn't need to import "os". You would need to import "random" to get access to the random number generator, though, but it's still part of the standard library.


We use AIX at work and it didn't have Python installed.

We had to install it for a code review tool I built.

But despite Bash's weird syntax, I love how we can do very complex tasks within a few lines.


My phone and router have bash.

Even if Python is installed, it isn't certain whether it is version 2 or 3.


My router has busybox (ash), but not bash.

Neither bash nor python seem to be preinstalled on BSDs.


Sounds like Stockholm syndrome to me!


That's great, as long as you're happy with the POSIX Unix utilities (probably, wouldn't bet my life on it) and whatever Bash 3.x provides.

This can break quite fast if you rely on something from Bash 4 (released 11 years ago, still not on OS X by default), or on some command-line tool that's either not installed or uses switches/functions only available in its GNU incarnation.

For anything that involves more than one file or has a line count going into triple digits, I wouldn't recommend bash. And heck, for those kinds of scripts, I'd go one step further and stick with plain 'sh', just to be really sure. Bash adds little. GNU missed the train of transitioning to something like ksh93, probably because RMS thought that everyone would be using Guile once Hurd was out. (I once worked with a pretty large piece of software for packaging/administration mostly written in ksh93, as this was more palatable for commercial Unix users -- not as important these days, when you won't run into AIX, HP-UX and Irix anymore.)

For anything more involved than a few straightforward lines, I'd probably still pick Perl 5. Very common, especially as some operating systems/distros have it in their core tooling (e.g. CentOS) and because it's included in git, so it's even available on Windows. Standard functionality is good enough, and it's no big jump from sh/awk/sed. I'd rather do that than depend on some obscure gawk function or gsed switch, especially as it's really hard to tell whether that's proprietary or in what version it was first introduced. Cross-browser checking is easier these days than cross-distro/unix.

Python does have a better/bigger standard library, but isn't as common and you might even run into Python 2/3 shenanigans these days.

Still, the situations where you have something moderately complex and can't install stuff should be pretty rare. Might not be able to do it for the whole system, but things like perlbrew/rbenv/nvm/pyenv-du-jour solve that mostly. At that level, Go might be an option, too.


Actually, I recently analyzed some Python code that makes shell calls, to see if I could eliminate them with pure Python.

It was quite frustrating: a lot of stuff that the shell has simple functionality for can't be done easily in Python.

Examples:

* Call to chown -R. Sure, Python can change file owners (os.chown()). But it has no parameter to do this recursively; you need to build a loop around it over some file-iteration function.

* Call to setfacl. Can't be done with the python standard library, needs pylibacl and you can't rely on it being installed.

* Call to killall -HUP [processname]. You can send kill signals with Python, but only to pids, not to process names. You'd have to parse the system process list or something like that in order to simulate it.

In all cases I decided to keep the shell call, it just wasn't worth it converting that to less readable python code.
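For the record, a rough stdlib-only approximation of the killall case might look like this (a sketch: Linux-only, parses /proc, and arguably proof that keeping the shell call is simpler):

```python
import os
import signal

def killall_hup(name):
    """Rough stdlib-only stand-in for `killall -HUP name` (Linux /proc; a sketch)."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited between listdir and open, or not readable
        if comm == name:
            os.kill(int(pid), signal.SIGHUP)
```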


Fair enough, choosing the right tool for the job. On my own servers I can take care that the required modules are installed, and even use virtual envs for stable library versioning. This is how we do Let's Encrypt renewal, because I can use a Python library for Route53 authentication that works.

Your first example could be solved with something like

    [os.chown(p, uid, gid) for p in pathlib.Path('.').rglob('*')]

Obviously this is more verbose than chown but I think it's ok in a script.


Having watched people hack together utterly bizarre bash scripts with awk and grep to handle json responses I'd very much agree here.

And before someone raises jq, what % of servers already have python installed vs jq?
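The stdlib version those servers can already run is short (a sketch; the JSON document and field names are made up):

```python
import json

# Stand-in for an API response you'd otherwise feed to jq
response = '{"status": "ok", "results": [{"id": 1}, {"id": 2}]}'
doc = json.loads(response)
print(doc["status"], [r["id"] for r in doc["results"]])
```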


> And before someone raises jq, what % of servers already have python installed vs jq?

How many of those servers have something like requests installed out of the box vs needing to fetch it via a package manager or PyPI?

jq is available in Ubuntu's universe repository along with requests.


It's Python; requests is just one of its many libraries for the purpose. I would actually be quite concerned hiring someone who doesn't know the multiple stdlib solutions for this and still claims ability in Python.

    import json
    import urllib.request

Tough, hey?


I'd be as concerned with someone not opting to use what the documentation recommends[1] in lieu of the stdlib, as I would someone who puts together an awkward Bash script to parse JSON instead of just installing the 50kb package for jq.

> The Requests package is recommended for a higher-level HTTP client interface.

[1] https://docs.python.org/3/library/urllib.request.html


It's a lot easier to hire people than to fire them, friend.

The documentation recommends nothing of the sort; it's a suggestion for consumers. We all live in varied and different worlds: how many varied server environments have you been responsible for all at once? Is it really beyond your amazing programming skills to use the tools at hand in every circumstance? Perhaps it's a chance to demonstrate yet another 3rd-party tool which fixes non-existent problems.

It's a few bytes difference in your script that is using requests over the numerous and plentiful stdlib options there.


Would never use python in lieu of curl/jq for json handling unless I am writing something more complicated than a shell script. catch 22.


If it gets the job done good on you.

Each to their own. Simply saying what I would look for.

Have any examples of your apt install jq script? How would you install jq on a few hundred machines also?

I'd be happy to share an example of an out of the box python script that does the same job and works virtually everywhere.
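Something in that spirit, as a sketch (the URL is a placeholder, not a real endpoint):

```python
import json
import urllib.request

def fetch_json(url):
    """GET a URL and decode its JSON body, stdlib only."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# e.g. doc = fetch_json("https://example.com/api/status")  # placeholder URL
```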


Bash was the first scripting language I became “good” in too! The skill seems to be regularly devalued in this community, and there is no surprise why (what is collaboration and maintenance for $500).

However, it’s hard to ignore why bash masters are born in the first place if better tools exist - shouldn’t the best tool win in the long run?

My take is lone wolf hackers working in corporate /_/nix constrained environment who become proficient, prolific and efficient win the corporate version of “survival of the fittest”.


I used to know someone who worked for a hugeco with really ancient proprietary unix systems. He wrote huge, complicated things in awk.

I think the experience did things to him, but he was the guy to go to for awk questions.


Google's perforce client wrapper was thousands of lines of bash for a long time. It was both amazing and terrible. It stood up to thousands of engineers and was easy to customize, if you were good in bash. Fun times...


I love Python; I am a Python trainer and wrote an intro book on it.

But a few years ago, I finally learnt how to use the shell, and it blew my mind.

With a single command, I can split a really large file into 100 pieces based on a particular column, or aggregate and do whatever I want with a few piped commands, or preview any line of a file with sed -n '3p' file.

Sure, I do not write 100 lines of awk code, but I believe I have started utilizing some of the potential of bash, and I no longer fire up a Python shell and write ten lines which someone then has to maintain!
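The column-based split, for instance, can be a single awk command (a sketch; the file, columns and names are made up):

```shell
# Stand-in for a really large file; the "city" is column 2
printf 'alice,nyc,2020\nbob,sfo,2021\ncarol,nyc,2019\n' > data.csv

# One command: one output file per distinct value of column 2
awk -F, '{ print > ("by_city_" $2 ".csv") }' data.csv
```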

In my org, our program manager discovered Python, and he is looking to port to Python entire unix scripts which do exactly what they are required to do, never fail, and would keep working without any changes.

I don't understand that.

I don't want Python to "win" because there are things you can do in bash, if you learn it correctly


Every language has its place.

I have a workflow I maintain: it takes an image from a file path or a URL, converts the image to 2-bit color, adds control codes, and sends the job to the printer; there were also two other utilities that did various calibration operations for the printer.

So, originally it was written in Python 2 and used an IPP library that didn't make the leap to Python 3, so I ported the 'core' part of the job (downloading the image, processing it, and adding the needed control codes) to Python 3, and then wrote a bash wrapper that runs this script and pipes the output to lp; it also uses $PIPESTATUS to provide intelligent error codes back to the calling application.
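The $PIPESTATUS trick, in miniature (a sketch: false stands in for the python step, cat for lp):

```shell
# PIPESTATUS[0] is the upstream exit code, which plain $? would hide
bash -c '
  false | cat
  status=${PIPESTATUS[0]}
  echo "upstream exited with $status"
'
# prints: upstream exited with 1
```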

When I approached this, I realized that I could have ported the whole thing to bash, but the more I thought about it, the more I realized it'd probably have greater overhead: first I'd have to run the script, then curl, then imagemagick (which has its own issues), then insert the formatting around the image, and then run lp. I looked at it and figured Python has lower overhead to do all of these things, as everything but the image handling is part of the Python standard library.

There were two other scripts however, that all they do is output a bunch of strings, and send them to be printed, these were also python2 and used IPP, these however I just reimplemented in bash, and send the jobs directly to cups.

My attitude is: use whatever language you feel comfortable with to accomplish the task. You can do serious programming in bash; people are incredulous at this, but the basic unix toolset is so powerful that you can do almost anything with it.


I think in bash still. I know python and php, but I still think in bash.


> Bash was the first scripting language I ever got "good" with.

Same here. However, it wasn't until I used other languages to manage systems, or glue utilities together, that I really started appreciating Bash.

Bash can be ugly and pretty gross at times, but there are some problems that can be solved quite elegantly with it.

> It also left me a bit weird when thinking about concurrency

Funny, one of the things that I like the most about Bash is how simple it makes concurrency, especially when you pair it with GNU parallel.


> Funny, one of the things that I like the most about Bash is how simple it makes concurrency, especially when you pair it with GNU parallel.

Or xargs :-)


Sub-shells are something I don't think get used enough. I would love it if it were easier to spawn extra processes, particularly if those processes could be strongly sandboxed.

Who cares about overflows in your video encoder, if all it can do is make you output broken / different video?


side note -- jobs+join is not a bad thing in many cases !


It would be remiss to talk of bash and not bring up Greg's Bash Wiki (https://mywiki.wooledge.org/BashFAQ)


This and the bash man page are all I need in general.



I recently came across this static site generator written in Bash: https://github.com/hmngwy/jenny

It's kind of cool, but what makes it really nice is that it's going to work on any computer I use, there are no concerns about dependencies, and no concerns about future compatibility. (Obviously not 100%, but I'm not worried in the least.)


It seems like there is potential for making bash faster.

1. JIT compilation, caching and invalidating on updated source files

3. Rearchitecting, a la busybox, all of coreutils, sharutils, grep, awk and sed as internal components to avoid forking as much as possible, with the tool names symlinked back to the shell binary for compatibility.

4. Rearchitecting to simulate sub-shells with threads and local contexts, rather than globals/singular locals. With formally-proven safety methodologies, there only needs to be one "shell" process running, a server: all other invocations become thread children if called directly, or interact as a client if exec()ed. Sub-shells become just lightweight threads with copy-on-write contexts. No more shell fork-bombs, and scripts get crazy fast while using far fewer PIDs.


Regarding 4, it's an interesting point, and this actually exists: few people know that ksh93 (which was a big inspiration for bash and zsh) does not fork when it doesn't really need to. For instance, if you strace the following:

    echo $(echo $(echo echo))

Due to those command substitutions, on zsh/bash/mksh/dash/yash/busybox/osh you'll see 2 forks/clones, but on ksh93 there will be none, and this allows ksh93 to be insanely fast on "pure" shell scripts that have little use of external binaries and pipes, even faster than dash which is usually known to outperform most shells.


Shell has a DSL-like nature to it, so sh actually beats any other language here. But it is not easy to do introspection or FFI, so issues develop when you call a lot of programs with pipes and environment variables and also need unshare or poll.

Like one program I had: basically calling a bunch of programs in a pipeline, but needing to poll the output of some shell asynchronously. C or Python makes the pipe code complicated, while emulating async I/O with tmp files is also complicated. In a few cases, the answer is to have a second program in Python, C or Perl and send messages over a pipe.

3. and 4. are helpful. JIT not so much, since the bottleneck seems to be parsing rather than executing.


I think there is a really good reason why you don't want to do any of that: bash is a shell and not a general-purpose language/VM. And I say that as the author of what is essentially a point-of-sale system implemented in bash, with a UI based on dialog/whiptail...


If you really care about speed, you probably shouldn't be using Bash.


Also true of Python, but I almost never hear people say that. Strange.


Python's a bit faster than Bash.


For what? Is it faster than (g)awk? How about sed? How about dd? How about any of the thousands of tools in userspace that you wrap in bash and that are faster than python at the same task? 'Use the right tool' is being lost in the 'you are too stupid to make a good decision, do it this way' generation.


It depends, of course! But usually the issue with shell scripts is that they create many new processes, which has high overhead. In the instances you're spending more time in dd than bash, in Python you're probably inside an in-process memcpy.


Hi, murex author here. Murex is not a POSIX compliant shell but there are some lessons I've learned developing murex which relate to your points:

1. I do this in murex but actually the performance improvement barely registers in benchmarks because that's not where the real bottleneck is. It's forking and it's passing data via standard streams (stdout et al) rather than sharing memory. There's lots of expensive syscalls to do stuff that you wouldn't need to do in a monolithic service. It's also worth noting that shell scripting is designed to be a highly dynamic language - even command names can be variables(!) - so there is a limit to just how much up-front compilation and caching you do (at least if you want to retain POSIX compatibility).

2. No 2?

3. I also do this to some extent in murex too, except it's not coreutils but rather a set of additional utils specific for more complex data wrangling (JSON, YAML, TOML, CSV, etc). Unfortunately it comes with a set of additional complexities, any of which are deal breakers:

i) If you don't fork then processes don't register PIDs so you can't easily kill a long running process or command run in error. To get around this in murex I had to create a new, separate, function list. This list only includes processes launched from murex but it has both external files (awk et al) as well as builtins. This works for murex because murex isn't going for POSIX compatibility but this would never fly with Bash because you then need to train users to use different `kill`, `ps`, etc commands.

ii) Many users take advantage of job control, which is where you can stop, resume and background already running processes. This works via signals (like SIGINT, which you'd be familiar with) and is controlled by UNIX/Linux and not the shell. To get this working with non-forked builtins takes a hell of a lot of hacking and, once again, breaks POSIX compatibility. In fact I ended up removing support for SIGTSTP from murex builtins because it wasn't reliable (though I do plan on adding it again at some point). I also have a bug where external commands don't get sent a SIGTTIN (the signal that tells background processes to pause if they're reading from stdin), and that's largely due to some of the hacks I've made regarding your first point.

iii) re-implementing coreutils is non-trivial. In fact calling it "non-trivial" is a massive understatement. And that's without touching the other utils you've highlighted. This effort alone is enough to be a deal breaker without even taking i and ii into account.

The reason busybox works is because it's a subset of coreutils. In fact busybox can be a little jarring at times because it is missing so many common flags. However it does just enough to be a useful emergency / embedded shell.

4. This is how murex approaches sub-shells but you're still limited by the caching problems (raised to point 1) plus the forking requirements (addressed in point 3) in terms of just how much additional performance it will squeeze. However out of all the points you have discussed, this is the easiest to implement.

It's also worth noting that fork-bombs (or equivalent exploits) will always be possible as long as your code has functions. Removing sub-shells wouldn't prevent someone from writing a self-referencing function.


I like bash, but I sort of have a rule...

If I am thinking of using advanced features of bash, or I'm over-using grep/sed/awk within the script.... switch to python (formerly perl)

Mostly it's a win with respect to quoting, path manipulation and proper data structures.


Python's major glaring problem as a systems language is dealing with processes. It is just verbose, tricky and annoying.

I have my other issues with it as a systems language that are more idiosyncratic, but for replacing bash, process management is just way too much hassle.


I agree, I basically wrote the same thing in the FAQ for Oil:

http://www.oilshell.org/blog/2018/01/28.html#shouldnt-script...


I agree. I basically fall back to a subroutine with try and subprocess.check_output() as shorthand.

However I like this lots:

   subprocess.call(['cmd','with','precise','args',file1])


99.9% of all process management can be done with a one-liner subprocess.check_output. Not verbose at all, and much safer.

The biggest mistake people make is bringing bash idioms into Python, like trying to pipe output through head/cut/tail/grep/sed instead of just using string operations like endswith/in/startswith/re. If you do that, process management will be more verbose, but it is in bash too, unless you ignore decent error handling like ${PIPESTATUS[0]}, which many, many scripts actually do unknowingly. For the other 0.1% of places where you actually need a pipe, just accept that you have to write 4 lines instead of 2; you will save thousands of lines in other places. For parallel running processes, &, jobs, nohup and wait in bash aren't exactly convenient either if you are interested in the results, so there it's a tie vs Popen, if I'm being generous.


Do you really think the Python popen() workalike with optional shell inclusion is better than running in a shell? More secure, perhaps? Easier to read? I once had to force a developer on my team, who would not stop using subprocess calls (including $SHELL) with hard-coded arguments in random new Python scripts, to write a single shell script that could be reused. subprocess is a tarpit and is more misused than any other Python function.


My bar for switching to my scripting language of choice is much lower; as soon as I need any sort of control flow, I switch to Ruby.


Clicking around, I am glad to see portability to non-bash shells covered in some topics, despite the name of the site.

It seems to me like sometime in the last 10-15 years, what we used to call "shell scripting" became "bash" in popular discourse. I myself learned to write shell scripts in times and places where bash was the most popular default, but I never thought of it as writing intentional bashisms, but a set of skills applicable to multiple possible implementations, so I find that shift in jargon a little disappointing or incorrect.


An even more important point, since the two mainstream desktop Unixen (Mac OS and Ubuntu) ship with zsh and dash, respectively, by default.


dash is shipped as a replacement for the Bourne shell, not bash; Ubuntu also ships bash.

Alpine doesn't have bash installed. It's also pretty common not to have bash as part of the base image on non-Linux systems. You've already mentioned macOS, but the same is true for many of the BSDs.


I think bash has become the new name for "shell", in the same way kleenex and hoover became names for product categories, rather than brands.


I think the biggest distinction between shells is whether they are C-shell derivative (csh/tcsh) or Bourne shell derivative (sh/ksh/bash).


This, Greg's wiki and #bash@freenode is how I became a bash nazi to my peers.

Just hanging out in #bash@freenode you learn so much best practice: sub-shell quoting, variable capitalization, parameter expansion, read usage and much more.


I clicked the config article and it reminded me of 2 major pet peeves:

1) No good way to store config values, nor ones that can be shared with other languages. I might have a set of Python applications running in the cloud and a shell script to deploy them, and guess what, I'd like them to share the same config so I don't have to update resource names in two places.

2) No simple way of templating files. A script that replaces {var} with the value of $var in a file seems like an obvious thing for the shell to do for you, but I haven't found a good way of doing it.


1) There's a few ways you could skin this proverbial cat:

option 1:

A bit python specific but you can use python-dotenv[1] for Python; and `source` that file in bash for shell support.

option 2:

Alternatively you could just pass environmental variables between applications (that's kind of the point of them).

option 3:

You could still use a dot env file and import that in other languages as TOML. This is a really nasty hacky way of doing things though because you'd need to read that file in, then prefix a key at the start of the file (eg "[vars]") to adhere to the TOML standard before you could run it through a TOML parser. I wouldn't personally recommend this approach but it would work as a lazy solution.

2) Sure there is, `envsubst`[2]

hello.tmpl:

    This is an example
    Hello, ${HELLO_NAME}!
hello.env:

    HELLO_NAME="wodenokoto"
hello.sh:

    #!/bin/sh

    source hello.env
    envsubst < hello.tmpl > hello.txt
Run hello.sh and it will create a file called hello.txt:

    This is an example
    Hello, wodenokoto!
One caveat is that this is part of the gettext package which doesn't always ship with every distro. But it's a pretty small package (small enough that I don't have any issue installing it into a docker container for one CI pipeline).

[1] https://pypi.org/project/python-dotenv/

[2] https://linux.die.net/man/1/envsubst


> just pass environmental variables between applications

That would require me to 1) somehow ensure that the shell is sourcing the environment variables from a file, and 2) we are back to all the hackery mentioned in the article [1]

[1] https://wiki.bash-hackers.org/howto/conffile


You don't need a shell script to initialise an application with specific environmental variables. systemd and docker will both do this (and both support the same env-file format too). AWS Lambda supports environmental variables, and so do most CI/CD solutions. It's a pretty standard way to do things in UNIX/Linux. In fact, if you're having to resort to "hacks" to get environmental variables loaded, then that might be a symptom of a bigger problem with your orchestration rather than an issue with Bash / Python.

Also that was just one of three options I listed; and there will be a plethora of other alternatives I've not mentioned too.


Thank you for your patience in answering my woes.

You’ve given me a lot of food for thought and some good ideas on how to attack some of my problems.


Maybe I'm misunderstanding the points here, but shouldn't point 1 be covered by environment variables, mostly?

And for point 2, basically the same thing: you can set variables to have a default value and then try to read the environment.

export variable=${variable:-"some_default"}


This website has been my go-to reference when hacking together Bash scripts. I can never remember the exact syntax of parameter substitution, and how do I do arithmetic expansion again? What are all the 'set -o' options?

I've since switched to Zsh, and while its documentation is extensive, it's basically one fat book and is far from being as accessible or navigable as this excellent wiki.


https://xkcd.com/927

Is there room for a lightweight scriptable language that takes the place of, say, bash/zsh/korn/fish shell scripts?

Please don't say perl/python or sed/awk.


That's what Oil is, except it also runs bash scripts and lets you upgrade seamlessly.

That is, bin/oil is the same as bin/osh with a bunch of shell options on, including 'shopt -s simple_word_eval' which eliminates a lot of the quoting hassles.

You can try it now, but some things will be cut out after optimizing the prototype interpreter (in the name of time):

You Can Now Try the Oil Language http://www.oilshell.org/blog/2019/10/04.html

I'm looking for help too: http://www.oilshell.org/blog/2019/12/09.html#help-wanted

----

Also, I think there is more room than ever, because both Python 3 and Perl 6/Raku are worse for shell-like tasks than their predecessors! Mainly because of startup time and the string abstraction.

We discussed that a couple weeks ago here, and I think it's on the blog somewhere too:

https://news.ycombinator.com/item?id=22156151


> Python 3 and Perl 6/Raku are worse for shell-like tasks than their predecessors! Mainly because of startup time and the string abstraction

Startup time can always be better, agree. But full support for Unicode is needed in this day and age, and that brings overhead, whichever way you do that, and especially so if you want to do it 100% correctly. If you're still living in an ASCII world, then by all means, go for it.

Additionally, in this world of source version control and virtually unlimited disk space for source code, I think using scripts rather than one-liners will help maintainability.


"Full support" for Unicode is almost never what you want unless your program literally interacts with font rendering, and then it's not even close to enough. Most of the time the important requirement is that you don't mangle data as it passes through your program, something early py3 spectacularly failed at, and it is still no better than py2 at this in the very common case where your input is not actually necessarily well-formed Unicode, but just an unrestricted byte array that by convention usually contains UTF-8.

Most of what people hope for Unicode support to achieve beyond that (case-folding, homoglyph elimination, &c.) is not only not achieved, but not even possible.


Oil has better Unicode support than Python 2 or 3 for shell-like tasks, because of the way file systems, libc, and the kernel work.

Explained here:

http://www.oilshell.org/blog/2018/03/04.html#faq

which links:

http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/ (by Armin Ronacher)

The summary is that there are two kinds of Unicode support:

- UTF-8 based: Go, Rust, and Oil (and Perl 5 it seems, not sure about Raku)

- array-of-codepoint based: bash/zsh/libc, Python, Java, JavaScript, Windows (JavaScript notably requires surrogate pairs, Python used to have build time configuration, now has complex storage heuristics, etc.)

This isn't a theoretical problem -- the Unicode problems in the other comment thread I linked are real and show up in practice.


Perl (5) is array-of-codepoint based, at the logical level. Those codepoints might be internally stored as their encoded-to-UTF-8-bytes, or they might not, but this does not affect the usage of the string.

Many don't really understand the string model (because many don't really understand character encoding) but it comes down to: all input and output is going to be bytes, which by default is stored as the codepoints sharing the ordinals of those bytes, and there are several mechanisms by which you can manually or automatically decode/encode those byte ordinals to the represented characters; for most text processing, you do this on STDIN and STDOUT.


Unicode support of Raku is grapheme based (NFG or Normalization Form Grapheme). The unicode theory: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundarie... , about the implementation: https://6guts.wordpress.com/2015/04/12/this-week-unicode-nor...


Once Oil has better documentation I'll probably give it a try. The documents outlined in "Docs I Want to Write" seem promising and are still only visible as drafts in Zulip which requires logging in.


That's fair, in the meantime this post gives some more examples:

http://www.oilshell.org/blog/2020/01/simplest-explanation.ht...

as well as other posts tagged #oil-language:

http://www.oilshell.org/blog/tags.html?tag=oil-language#oil-...


I use POSIX /bin/sh exclusively. It's reasonably lightweight, but because of this you lose a lot of the "bash-ism"s. There are ways to work around it though, and the pure-sh-bible[1] is great.

1: https://github.com/dylanaraps/pure-sh-bible


Your use case is unclear, but you may want to look at Tcl?


Lua. Very small runtime, fast, powerful.


Not installed anywhere by default


If that is a requirement, then you have restricted yourself to shell and bash.


Ruby? It's installed by default on Mac OS X and many Linux distributions.


From now on, neither perl, ruby nor python will be installed by default on OS X

https://news.ycombinator.com/item?id=20109469



