Hacker News
Things Every Hacker Once Knew (catb.org)
539 points by ingve on Jan 27, 2017 | 321 comments

I always thought it was a shame the ascii table is rarely shown in columns (or rows) of 32, as it makes a lot of this quite obvious. eg, http://pastebin.com/cdaga5i1

It becomes immediately obvious why, eg, ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position of the letter (or 60h for lower-case). Or that we shift between upper & lower-case with a single bit.
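
For anyone who wants to regenerate that layout locally, here is a minimal sketch that prints the table as four groups of 32, one row per value of the low 5 bits. The function names (`label`, `ascii_rows`) are my own and not from the article.

```python
# Names for the control characters (codes 0x00-0x1F).
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def label(code: int) -> str:
    """Return a printable label for a 7-bit ASCII code."""
    if code < 0x20:
        return CONTROL_NAMES[code]
    if code == 0x20:
        return "SP"
    if code == 0x7F:
        return "DEL"
    return chr(code)


def ascii_rows() -> list[str]:
    """Build 32 rows; each row holds the same low 5 bits in all four groups."""
    rows = []
    for low in range(32):
        cells = [label(group * 32 + low) for group in range(4)]
        rows.append(f"{low:05b}  " + "  ".join(f"{c:<3}" for c in cells))
    return rows


if __name__ == "__main__":
    print("       00   01   10   11")
    print("\n".join(ascii_rows()))
```

Row 11011 comes out as ESC, `;`, `[`, `{`, which is the ^[ relationship in a single line.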

esr's rendering of the table - forcing it to fit hexadecimal as eight groups of 4 bits, rather than four groups of 5 bits, makes the relationship between ^I and tab, or ^[ and escape, nearly invisible.

It's like making the periodic table 16 elements wide because we're partial to hex, and then wondering why no-one can spot the relationships anymore.

The 4-bit columns were actually meaningful in the design of ASCII. The original influence was https://en.wikipedia.org/wiki/Binary-coded_decimal and one of the major design choices involved which column should contain the decimal digits. ASCII was very carefully designed; essentially no character has its code by accident. Everything had a reason, although some of those reasons are long obsolete. For instance, '9' is followed by ':' and ';' because those two were considered the most expendable for base-12 numeric processing, where they could be substituted by '10' and '11' characters. (Why base 12? Shillings.)

The original 1963 version of ASCII covers some of this; a scan is available online. See also The Evolution of Character Codes, 1874-1968 by Eric Fischer, also easily found.

I stumbled across the history of the ASCII "delete" character recently: It's character 127, which means it's 1111111 in binary. On paper tape, that translates into 7 holes, meaning any other character can be "deleted" on the tape by punching out its remaining holes.
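
The paper-tape trick is just a bitwise OR: holes can be added but never removed. A tiny sketch, where `overpunch` is my own name for the operation:

```python
DEL = 0x7F  # 1111111: all seven holes punched


def overpunch(code: int) -> int:
    """Punch every remaining hole in a 7-bit tape row.

    Holes can only be added, so the operation is a bitwise OR.
    """
    return code | DEL


# Every 7-bit code collapses to DEL once all holes are punched.
assert all(overpunch(code) == DEL for code in range(128))
print(f"{ord('X'):07b} -> {overpunch(ord('X')):07b}")  # prints 1011000 -> 1111111
```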

(It's also the only non-control ASCII character that can't be typed on an English keyboard, so it's good for creating WiFi passwords that your kid can't trivially steal.)

> It's also the only non-control ASCII character that can't be typed on an English keyboard

Don't count on it. There's a fairly long standing convention in some countries with some keyboard layouts that Control+Backspace is DEL. This is the case for Microsoft Windows' UK Extended layout, for example.

    [C:\]inkey Press Control+Backspace %%i & echo %@ascii[%i]
    Press Control+Backspace⌂
This is also the case for the UK keyboard maps on FreeBSD/TrueOS. (For syscons/vt at least. X11 is a different ballgame, and the nosh user-space virtual terminal subsystem has the DEC VT programmable backspace key mechanism.)

It's actually easier to add two spaces at both ends of the password :)

Wow, I never knew it was an actual character.

Sure, think of it this way: you're sitting at a terminal connected to a mainframe and press the "X" key; what bits get sent over the wire? The ones corresponding to that letter on the ASCII chart.

Now replace "X" with "Delete".

(too late for me to edit; took me a while to find online)

Another good source on the design of ASCII is Inside ASCII by Bob Bemer, one of the committee members, in three parts in Interface Age May through July 1978.




That Fischer paper does look interesting - Thanks!

I do understand that I've probably simplified "how I understand it" vs "how/why it was designed that way". This is pretty much intentional - I try to find patterns to things to help me remember them, rather than to explain any intent.

Yeah, there's not much 4-bit-ness that's an aid to understanding what it is today. One is that NUL, space, and '0' all have the low 4 bits zero because they're all in some sense ‘nothing’.

I started programming BASIC and assembly at 10 years old on a Vic-20, so I don't qualify as a wizened Unix graybeard, but I've still had plenty of cause to look up the ASCII codes, and I've never seen the chart laid out that way. Brilliant.

  > on a Vic-20
Which, weirdly, used the long-obsolete ASCII characters of 1963–1967, with '↑' and '←' in place of '^' and '_'.

PETSCII was a thing throughout the 8-bit Commodore line of products. It was based on the 1963 standard, but added various drawing primitives. I spent a lot of time drawing things with PETSCII for the BBS I ran from my bedroom.


Going on my deep (and probably fallible) memory; I remember seeing the ASCII set laid out like this on an Amoeba OS man page (circa 1990).


Had one as well, how did asm work?

>"It becomes immediately obvious why, eg, ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position of the letter (or 60h for lower-case). Or that we shift between upper & lower-case with a single bit."

I am not following, can you explain why ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position? Can you elaborate? I feel like I am missing the elegance you are pointing out.

If you look at each byte as being 2 bits of 'group' and 5 bits of 'character';

    00 11011 is Escape
    10 11011 is [
So when we do ctrl+[ for escape (eg, in old ansi 'escape sequences', or in more recent discussions about the vim escape key on the 'touchbar' macbooks) - you're asking for the character 11011 ([) out of the control (00) set.

Any time you see \r represented as ^M, it's the same thing - 01101 (M) in the control (00) set is Carriage Return.
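
In code, the group/character split looks roughly like this. It's a sketch with helper names of my own (`ctrl`, `caret`); keeping the low 5 bits is the same as selecting group 00.

```python
def ctrl(key: str) -> int:
    """Return the code Ctrl+key selects: same low 5 bits, group 00."""
    return ord(key) & 0x1F


def caret(code: int) -> str:
    """Return caret notation for a control code (0x00-0x1F), e.g. 27 -> '^['."""
    return "^" + chr(code | 0x40)


print(ctrl("["), caret(27))  # prints 27 ^[
print(ctrl("M"), ctrl("J"))  # prints 13 10 (carriage return, line feed)
```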

Likewise, when you realise that the relationship between upper-case and lower-case is just the same character from sets 10 & 11, it becomes obvious that you can, eg, translate upper case to lower case by just doing a bitwise or against 32 (0100000).
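
A quick sketch of the single-bit case shift. Note that the bit in question, 0100000, is 32 (20h). The helper names are mine, and the range checks matter because setting the bit on '@' would give '`'.

```python
CASE_BIT = 0x20  # 0100000: the bit separating group 10 (upper) from group 11 (lower)


def to_lower(c: str) -> str:
    """Set the case bit on an ASCII upper-case letter; leave anything else alone."""
    return chr(ord(c) | CASE_BIT) if "A" <= c <= "Z" else c


def to_upper(c: str) -> str:
    """Clear the case bit on an ASCII lower-case letter; leave anything else alone."""
    return chr(ord(c) & ~CASE_BIT) if "a" <= c <= "z" else c


def swap_case(c: str) -> str:
    """Flip the case bit on any ASCII letter."""
    return chr(ord(c) ^ CASE_BIT) if c.isascii() and c.isalpha() else c
```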

And 40h & 60h .. having a nice round number for the offset mostly just means you can 'read' ascii from binary by only paying attention to the last 5 bits. A is 1 (00001), Z is 26 (11010), leaving us something we can more comfortably manipulate in our heads.

I won't claim any of this is useful. But in the context of understanding why the ascii table looks the way it does, I do find four sets of 32 makes it much simpler in my head. I find it much easier to remember that A=65 (41h) and a=97 (61h) when I'm simply visualizing that A is the 1st character of the uppercase(40h) or lowercase(60h) set.
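
The "read only the last 5 bits" habit, written out as a sketch (function names are my own):

```python
def letter_position(c: str) -> int:
    """Return 1 for 'A' or 'a', 26 for 'Z' or 'z': just the low 5 bits."""
    if not (c.isascii() and c.isalpha()):
        raise ValueError(f"not an ASCII letter: {c!r}")
    return ord(c) & 0x1F


def letter_at(position: int, lower: bool = False) -> str:
    """Inverse: 40h (or 60h for lower-case) plus the position in the alphabet."""
    if not 1 <= position <= 26:
        raise ValueError("position must be 1..26")
    return chr((0x60 if lower else 0x40) + position)


print(letter_position("Z"), ord(letter_at(1)), ord(letter_at(1, lower=True)))
# prints 26 65 97
```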

This single comment has cleared up so much magic voodoo. I feel like everything fell into place a little more cleanly, and that the world makes a little bit more sense.

Thank you!

I can't believe I've only just realised where the Control key gets its name from. Thank you!

The article linked mentions that the ctrl key (back then?) just clears the top 3 bits of the octet.

Awesome, yes this makes total sense. I'm glad I asked. Cheers.

Basically, that modifier keys are just flags/mask e.g. ESC is 00011011, [ is 01011011. CTRL just unsets the second MSB and shifts the column without changing the row.

Physically it might have been as simple as a press-open switch on the original hardware, each bit would be a circuit which the key would connect, the SHIFT and CONTROL keys would force specific circuits open or closed.

if you press a letter and control, it generates the control character in the left-hand column.

the letters in the third column are A = 1, B = 2 etc: 40h + the position in the alphabet.

Awesome to see ^@ as null and laying it out this way makes it easier to see ^L (form-feed, as the article says: control-L will clear your terminal screen), ^G (bell), ^D, ^C etc etc

This is so that control characters (and shifted characters — see https://en.wikipedia.org/wiki/Bit-paired_keyboard) could be generated electromechanically. Remember a teletype of the era (e.g. Model 33) has no processing power.
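
As I understand the bit-paired layout, Shift on the digit and punctuation keys flips a single bit (10h), the same way Ctrl clears the top bits. A small sketch under that assumption; `shifted` is a name I made up:

```python
SHIFT_BIT = 0x10  # bit flipped by Shift on the digit/punctuation keys


def shifted(key: str) -> str:
    """Return the character paired with `key` on a bit-paired keyboard."""
    return chr(ord(key) ^ SHIFT_BIT)


print(shifted("1"), shifted("2"), shifted(";"))  # prints ! " +
```

That is why Shift-2 gave a double quote on those keyboards rather than the @ found on modern US layouts.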

There's a longer explanation on Wikipedia: https://en.wikipedia.org/wiki/Caret_notation

ESC is on the same row as [, just in another column. So Ctrl ends up being a modifier just like Shift, in that it changes column but not row.

The 40h offset is 2 columns' worth.

I made and printed out a nicely-formatted table that's adorning my office wall right now, for when I was trying to debug some terminal issues a while back (App UART->Telnet->Terminal is an interesting pipeline[1]), because I was frustrated with the readability of the tables I could quickly find online, and they didn't have the caret notation that so many terminal apps still use (quick, what's the ASCII value for ^[ and what's its name?[2]).

Cool story bro, I know, but I meant to put the file online in response here, but I can't find the source doc anymore >_< Edit: actually, I found an old incomplete version as an Apple Numbers file. If there's interest I can whip it back up into shape and post it as PDF.

[1] For example, when a unix C program outputs "\n", it's the terminal device between the program and the user TTY that translates it into \r\n. You can control that behavior with stty. I know this is something ESR would laugh at being novel to me. On bare-metal, you have no terminal device between your program and the UART output, so you need to add those \r\n yourself.
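
A rough sketch of what the tty layer does when the `onlcr` output flag is set (the one you toggle with `stty onlcr` / `stty -onlcr`), and what you end up doing by hand on bare metal. Real firmware would be C; this is Python only for illustration, and `send_byte` is a hypothetical stand-in for the actual UART register write.

```python
def onlcr(data: bytes) -> bytes:
    """Mimic the tty ONLCR output flag: every LF goes out as CR LF."""
    return data.replace(b"\n", b"\r\n")


def uart_write(send_byte, text: str) -> None:
    """Write text through a byte-at-a-time UART routine, adding the CRs ourselves.

    `send_byte` is a hypothetical callable standing in for the real
    register write on the target.
    """
    for byte in onlcr(text.encode("ascii")):
        send_byte(byte)


sent = []
uart_write(sent.append, "boot ok\n")
print(bytes(sent))  # prints b'boot ok\r\n'
```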

[2] That's ESC "escape" at decimal 26/hex 1B, and you can generate it in a terminal by pressing the Escape key or Ctrl-[

Consider just taking a photo of the table you found.

It's entirely possible that someone reading this thread will be able to source it.

But it's just a text table, right? It should be fairly trivial to reproduce it from a decent picture.

By the way, thanks for clarifying the existence and purpose of the now-in-kernel "terminal device." I've understood the Linux PTS mechanism and am aware of the Unix98 pty thing and all of that, but identifying it like that helps me mentally map it better.

The name that you need to know is line discipline.

Right... and that's the collective name for the terminal device's settings. I see, thanks!

Minor nitpick: isn't ESC decimal 27? I distinctly remember comparing char codes to 27 in Turbo Pascal to detect when user pressed Esc key... :)

LOL you're right. I guess I still managed to misread my table ^_^;

The 8x16 layout is more compact and fits on a single screen or book page, so it became the standard. You're absolutely right that the 4x32 layout makes the relationships more obvious. But once you've learned those relationships, you've lost the incentive to change the table.

Oh, so that's why C-m and C-j make a line break in Emacs, and C-g will make a sound if you press it one extra time.

^V will put DOS (and cmd.exe) into "next character is literal" mode, so "echo ^V^G" will make a beep. I think you needed ANSI.SYS loaded for it to work though (?).

Speaking of DOS, I've never forgotten that BEL is ASCII 7, so ALT+007 (ALT+7?) will happily insert it into things like text editors. I remember it showed up as a •. I'm not quite sure why.

Amazing. That is a thing of beauty. Can't believe I've never seen it like that before.

Thanks. With a bit of deduction from your post, I just figured out why ^H is backspace.

And ^W deletes words.

Both shortcuts work in terminal emulators.

(As an aside ^W is much easier to input^Wtype as a "fake backspace" thingamajig^H^H^H^H^H^H^H^H^H^H^Hmnemonic than ^H is.)
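
For fun, a sketch that "renders" text containing literal ^H and ^W markers the way a terminal's erase and word-erase keys would. The function name is mine, and the word-erase rule (drop trailing blanks, then the word) is an approximation of the tty behaviour.

```python
def apply_carets(text: str) -> str:
    """Apply literal '^H' (erase character) and '^W' (erase word) markers."""
    out: list[str] = []
    i = 0
    while i < len(text):
        token = text[i:i + 2]
        if token == "^H":
            if out:
                out.pop()
            i += 2
        elif token == "^W":
            # Like the tty word-erase: drop trailing blanks, then the word itself.
            while out and out[-1] == " ":
                out.pop()
            while out and out[-1] != " ":
                out.pop()
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_carets("easier to input^Wtype"))  # prints easier to type
```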

I didn't realize that ^W did words (and I didn't know what "ETB" was either). But that's a useful one to know!

A lot of hardware still uses serial, and not just industrial stuff. Everything from sewing machines to remote controlled cameras.

If you work on embedded devices you will still encounter serial/RS-232 all the time. Often through USB-to-serial chips, which only adds to the challenge because they are mostly unreliable crap. Then there are about 30 parameters to configure on a TTY. About half do absolutely nothing, a quarter completely breaks the signal giving you silence or line noise, the final quarter only subtly breaks the signal, occasionally corrupting your data.

Still, there is nothing like injecting a bootloader directly to RAM over JTAG, running that to get serial and upload a better bootloader, writing it to flash and finally getting ethernet and TCP/IP up.

Arduino got so popular because it addressed those exact problems: reliable USB to serial built in, foolproof bootloader.

> Still, there is nothing like injecting a bootloader directly to RAM over JTAG, running that to get serial and upload a better bootloader, writing it to flash and finally getting ethernet and TCP/IP up.

I'm happy to have gotten mostly rid of this. Gone are the days of choosing motherboards based on the LPT support, and praying that the JTAG drivers would work on an OS upgrade.

> I'm happy to have gotten mostly rid of this. Gone are the days of choosing motherboards based on the LPT support, and praying that the JTAG drivers would work on an OS upgrade.

It's still there, and not going anywhere - the only thing that's past is LPT. Last time I used a Linux-grade Atmel SoC, it had a USB-CDC interface but the chain was still the same: boot from mask ROM, get minimal USB bootloader, load a bootstrap binary to SRAM, use that to initialize external DRAM, then load a flashing applet to DRAM, run it, use that to burn u-boot to flash, and then fire up u-boot's Ethernet & TFTP client to start a kernel from an external server and mount rootfs over NFS. Considering the amount of magic, it worked amazingly well. The whole shebang was packaged into a zip file with a single BAT to double-click and let it do the magic.

As for COM and LPT - FTDI and J-Link changed the embedded landscape forever, and thanks for that.

Get yourself a Black Magic Probe. It's JTAG/SWD on the device end, USB GDB server on the host end. https://hackaday.com/2016/12/02/black-magic-probe-the-best-a...

disclaimer: I'm friends with the BMP manufacturer.

Arduino got so popular because it does everything the Atmel STK does at a fraction of the price.

Yes! I'm currently working on a Windows based building access system dating back to the early 90s. Most of the codebase in MFC C++. The hardware/firmware that runs the door controls used to talk RS232 to the Windows boxes, but now use TCP/IP. STX is used as the start char for a msg. The system is used by some very prominent govt and biz orgs.

One pleasant upside of such systems is that the protocols are usually so simple you can reverse engineer them in a couple evenings. I can't count the times I ended up deriving the serial protocol from some undocumented, baroque source code (or sniffer dumps) and writing a quick & dirty implementation in a few dozen lines of Python.

That's exactly what I've done :) The checksum algorithm that suffixes each msg with a 2 byte check took a little bit of figuring out. I'm using tornado.tcpserver.TCPServer and tornado.iostream. Another plus is generating logs of the msg flow; useful for diagnostics, and the after the fact specs you often need with crufty old systems.
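
To give a flavour of how small these implementations can be, here is a sketch of an STX-framed parser. Everything about the layout is invented for illustration: the one-byte length field and the 16-bit additive checksum are assumptions of mine, not the actual door-controller protocol (whose checksum isn't described here).

```python
STX = 0x02  # ASCII "start of text"


def checksum(payload: bytes) -> bytes:
    """Hypothetical check: 16-bit sum of the payload bytes, big-endian."""
    return (sum(payload) & 0xFFFF).to_bytes(2, "big")


def encode(payload: bytes) -> bytes:
    """Build STX + length + payload + check (an invented layout, for illustration)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length")
    return bytes([STX, len(payload)]) + payload + checksum(payload)


def decode_stream(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Extract valid payloads; return them plus any unconsumed trailing bytes."""
    messages = []
    i = 0
    while i < len(buffer):
        if buffer[i] != STX:
            i += 1  # resynchronise: skip noise until the next STX
            continue
        if i + 2 > len(buffer):
            break  # length byte not here yet
        length = buffer[i + 1]
        end = i + 2 + length + 2
        if end > len(buffer):
            break  # incomplete frame: wait for more bytes
        payload = buffer[i + 2:i + 2 + length]
        if buffer[end - 2:end] == checksum(payload):
            messages.append(payload)
            i = end
        else:
            i += 1  # bad check: this STX was not a real frame start
    return messages, buffer[i:]
```

Feed it whatever arrives from the socket or serial port, keep the returned remainder, and prepend it to the next chunk. Logging each decoded payload gives the message-flow trace mentioned above.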

Just about every professional display or projector has a RS-232 control interface. In arenas, museums, airports, restaurants, any place you see higher end digital signage, you'll likely see a RS-232 serial cable connected to the display device for control and maintenance.

Definitely. Just this week actually I'm writing an interface to a _new_ RS232 device for work.

Serial is so easy to program for, and really has first class support on the most common/classic SOCs. USB is kind of tough TBH.

On Plan9 you can run TCP/IP over RS232 and RS232 over TCP/IP with ~zero effort too.

So your sensors can talk across the internet.

I wish Plan9 were practically usable (good graphics support, modern chipset support, etc); it's the lack of those types of critical features that limit its widespread use at the end of the day.

From `man socat', which runs on Linux:

  (socat PTY,link=$HOME/dev/vmodem0,raw,echo=0,wait-slave
  EXEC:'"ssh modemserver.us.org socat - /dev/ttyS0,non-block,raw,
  generates  a pseudo terminal device (PTY) on the client that can
  be reached under the symbolic link $HOME/dev/vmodem0.  An appli-
  cation  that expects a serial line or modem can be configured to
  use $HOME/dev/vmodem0; its traffic will be directed to a  modem-
  server  via  ssh  where  another  socat  instance  links it with

import /n/kremvax/net /net.alt

Now kremvax' network stack is bound over my own, so no need for NAT.

Okay that's really impressive.

By "bound over", do you mean tunneled, or "made available alongside"?

In Plan9 the file system is per process so I can choose to bind kremvax' network over mine in this shell window, then pentvax' network over mine in another.

I think I'm right in saying that if one has /net and /net.alt then /net is asked first and then /net.alt

I'm a bit rusty on the details

To make a connection a process opens its /net/tcp/ctrl file, writes a connection string, and gets back a response. If that response is a number (say 1) then it opens /net/tcp/1/ctrl and /net/tcp/1/data, reads and writes data on the data file, and sees out-of-stream messages on ctrl. To close the connection one closes /net/tcp/1/ctrl

Everything is done via the 9p protocol. So if I can write code on my Arduino that understands 9p and a way to send/receive that data (e.g. serial but even SPI or JTAG) to a Plan9 machine locally, then the Arduino itself could just open a tcp connection on kremvax.

Once you start layering these things on top of each other it gets a bit mind blowing and you get drunk on power. Then when you are forced back to Linux or Windows you realise how dumb they are even with their fancy application software.

Okay, wow. That is really awesome.

And nice. You talk to the kernel via plain text. No ioctls! :D It's like they made an OS designed to make bash happy.

And I can sum up my conclusions of 'modern' Linux in one word: systemd. It's almost worse than Windows now. I'm not surprised this kind of thing doesn't work on Linux :P

...But I'm sad now, that I can't do all of this on Linux.

Hrm. Maybe someone should pull the Plan9 kernel-level stuff out and make a FUSE-based thing (or even a kernel driver!) that emulates all of this on Linux. There's already plan9port for the utilities... okay, they'd need to be modified back to being Plan9-ey again, but it could be really interesting.

Here's an irc bot written in shell


which shows some of the concepts.

That's almost alien. rc is quite different, wow.

It's really sad that this isn't available for Linux. :(

Because this just makes so many cool things possible.

Like I said, between the 100 of us we only scratched the surface of the power of that idea.

It is heartbreaking to know it sits there, unloved outside of our small community.

Very curious who you mean by "us." Are you part of a Plan9 user community? Interesting.

I remember firing Inferno up inside a Java applet a few years ago. I think something that might generate some interest is a) compiling Plan9 (kernel + rootfs) to JavaScript, and b) interfacing that boots-inside-JS environment to Web APIs for mouse, audio, graphics, etc. It would essentially be [all the work of] a new platform port (albeit one without a CPU architecture (!), and using JavaScript hardware/"peripherals" :P), but I think that making the effort needed to implement this would send a good message that Plan9 is built by people who are interested in keeping their work relevant, and that they think it's worth that (which of course it is ^^, but - you know, advertising). Yeah, it won't significantly sway the status quo (sadly), but I'd definitely argue that the lack of a JS port of Plan9 is - as crazy as it sounds - a real hole nowadays.

Here's the NetBSD kernel running in JS, although it doesn't have a proper shell/tty interface (it's just a proof-of-concept): https://ftp.netbsd.org/pub/NetBSD/misc/pooka/rump.js/

Browsers will never do raw TCP/IP due to security concerns and that is a fairly noteworthy roadblock. WebSockets goes over HTTP+TCP, and WebRTC data channels are SCTP. But if you wrote a Plan9 driver of some sort that handled WebSockets/WebRTC (WebSockets would be significantly easier) and let you talk 9P over that, and then wrote an appropriate driver (that listened on a local port) for native Plan9, you could talk from an in-browser Plan9 instance to a native instance running on real hardware or a VM.

I'm not sure how far plan9port goes in terms of 9P emulation, but it might be worth figuring out how to make the native (aka non-browser) side of the WebSocket/WebRTC<->9P transport work with plan9port as well as "real" native Plan9. That way users who are running just the one copy of Plan9 in browser can talk to it with plan9port tools. If plan9port can't already do that though (can it?) then maybe that would be asking a bit much (adding functionality to plan9port is a bit of a tall order).

This would make firing the system up and playing with it a whole lot easier, akin to how you can run DOS games from the Web Archive in-browser because they compiled DOSBox to JS.

While I expect that the most straightforward (and non-insane) path to doing this would be Emscripten, it may be a very fun research topic to make the Plan9 C compiler and/or the bytecode generator JIT directly compile/generate WebAssembly code :> - with a fully capable toolchain you could even compile the kernel directly to wasm, skipping the translation overhead of Emscripten!

WebAssembly is still in the early stages, although there is some active (albeit early) browser support for it floating around, and you can build and test things without having to recompile your entire browser AFAIK (yes, if I understand correctly, browsers already include alpha-stage wasm parsing code). I suspect your best bet is likely to be to skip the likely-outdated noise on websites and just chase the Chromium/Firefox/WebKit/Emscripten teams (particularly Emscripten, which is spearheading compile-to-wasm) for the latest advice.

It would be especially awesome for you to be able to immediately follow the "Chrome officially supports WebAssembly" HN post (maybe a few months away) with "Plan9 runs in-browser and compiles directly to wasm" :D - while most people will just punt and follow the "put your .js on your site, run it through $compiler and put the .wasm file next to it like this" tutorials, you'd probably get a bit of traffic and exposure from directly generating wasm. A LOT (LOT) of people - mostly hacker types - are going to want to know how wasm works, and... showing off Plan9 as the tech demo of the fact that you're directly generating wasm... well......

It's not yet perfect; see https://github.com/WebAssembly/design/blob/master/FAQ.md#is-.... At this point wasm specifies no way to talk to the DOM but only allows you to invoke high-speed code execution from JavaScript. It may be worth waiting for DOM access capabilities, although as that link mentions, it's already viable to compile language VMs to wasm, so OS kernels sound reasonable too.

Us - yes I am part of the plan9 community - though somewhat inactive at the moment - I'm doing a degree in Supply Chain Management so my focus is on that.

Porting plan9 to JS is all well and good, but it's the kernel services that make plan9 different.

"Everything is a file" is the core concept. The rest is just things built on top of that.

We have a boot CD, it boots in Qemu, we have an emulated environment that runs on Linux/BSD in 9vx, we have mounting 9p in the Linux kernel, we have plan9ports in Linux/BSD.

If you want plan9 it's not hard to get!

I hope to explore Plan 9 in greater depth than I can right now too.

I agree that the kernel services are what make Plan 9 different. I think the "everything is a file" concept is something that has not been explored nearly as much as it could be.

Thanks for mentioning 9vx, I completely forgot about that project! Just downloaded it and fired it up (managed to figure out the correct invocation without needing the (nonexistent) manual :P). That would be a really interesting port to JS.

I agree that Plan9 is not at all hard to get at - but if you can run it just by visiting a website, a LOT more people will play with it. Even if they just fire it up, go "huh, neat" (or more likely "how on earth do I use it") and close it, that does increase platform exposure.

I can't help but remember all the DOS games that are directly playable on the Web Archive. Inferno used to run in-browser because it's the perfect tech demo of the platform. Plan 9 should be able to too IMHO.

At this point in time Web browsers are not the perfect environment for general-purpose x86 emulation, I can't argue that. If things were different we'd run everything inside the browser. But I think Plan9 is sufficiently resource-light, fast, and cleanly-designed enough that it would make an excellent candidate.


I seem to recall that Linux has an implementation of 9P these days. Can't say I have taken it for a spin though.

That's only for mounting 9p file systems; it doesn't offer kernel services.

It wouldn't surprise me if RS-232 will still be around long after USB falls out of use.

Does anyone know the reasons why USB was not made backward compatible with RS-232? It would only take a very short negotiation to determine if both endpoints support USB.

I don't know but my guess is to add compatibility for RS232 voltage levels would add significant cost and complexity to the USB PHY, the thing that actually drives and measures voltages on the wire. USB was specifically designed to have a simple and cheap PHY so that it could be used in very cheap peripherals.

To expand a bit, the typical fully compliant RS232 setup has a special level translation IC (for example the MAX232) to deal with the huge voltage range and convert it to something more low voltage digital logic friendly. And that IC usually requires power supply levels that the typical cheap USB device would not have, which means it would have to add more circuits to generate them, adding even more cost to the hardware.

And the USB committee probably figured to be not fully compliant with RS232 would just confuse people, and damage hardware, so it was better to be not compliant at all.

It would have been really awesome if USB was compatible with TTL-level (5v, maybe 3.3v) "serial". This nonstandard variant of definitely-not-RS232 is everywhere.


> And the USB committee probably figured to be not fully compliant with RS232 would just confuse people, and damage hardware, so it was better to be not compliant at all.

...you are sadly right.

USB controllers have to support 3.3 V anyway (on the D+/D- pair, not on the SS pairs), for supporting USB 1.x modes.

I used to work on x-ray equipment that still used serial/rs-232 ~4 years ago. I'm sure they are still around. I had to set up multiboot laptops that ran win 95/98/xp, because we couldn't virtualize the serial connection properly. And yes, the USB-to-serial rarely worked either.

Man that job was frustrating, but it sure was a lot of fun!

Isn't there something like, just one company in the world that makes 95% of all USB to Serial chips?

There are several. FTDI, Prolific, WCH (CH340)...others as well.

It's a mess at the moment, because there are unauthorized clones of both FTDI and Prolific, and both companies release drivers that purposefully don't work (or worse...brick them) on the clones. But, there's not really a way for the end buyer to know for sure they are buying the real thing.

The SiLabs CP2102N is useful for USB to serial. It will talk to Linux with the standard Linux serial driver, although you need a free Windows program from SiLabs if you want to reconfigure it. (This is needed only for unusual applications.)

I use them because they'll go down to 45 baud for antique Teletype machines. They're popular for Arduino applications, and there are lots of cheap breakout boards with 0.100 pins for Arduino interfacing.

Interesting. I'd gotten the impression the FT232 was "rock solid" and filed them away as "they're good, use them", but that was before the bricking incident, and now I'm not really sure anymore.

I guess on the surface the big thing I really like is device differentiation. Do CP2102Ns have unique serial numbers, or can that free utility burn in info I can use to differentiate?

Going a bit deeper, can I bitbang with it?

You can set vendor ID, product ID, product string, serial string, release version, and max power requested. "Manufacturer string" is set to "Silicon Labs". After doing that, you can lock the device against further changes, if you want. This is all done via SiLabs "Simplicity Studio", which is a big IDE for their microcontrollers into which they wrapped up some of the device-specific tools for their simpler devices.

Thus, you can force the host machine to demand a device-specific driver if you need to. By default, it appears to the OS as a USB to serial port device. Linux and Windows recognize it as such, without special drivers. Linux mounts it starting at /dev/ttyUSB0; Windows mounts it starting at COM3.

No bit-banging, though; it doesn't have the hardware.

Ah, thanks! That's pretty cool.

Very nice that I can change the serial number! That's actually kind of better than the FTDI route, where the serial numbers are hardcoded; I get to use my own serial numbering scheme.

I kinda expected no bit-banging. FWIW, if I really needed that I could probably build something with an Arduino (or similar microcontroller), and there are probably devices out there that do offer that functionality. I've never practically needed it; it's just my catalyst.

I can confirm the FTDI drivers are total shit. We had to obtain hardware info under NDA, and write our own driver in order to get a reliable solution built using their parts.

It used to be "just" FTDI... but then other manufacturers and endless copycats jumped in, which resulted in a XKCD #927 situation. As if we didn't already have a USB-CDC standard...

(For an interesting side story, search for "ftdigate". At some point FTDI decided they're fed up with copycats and released a new driver pack (through automatic Windows Update) that bricked counterfeit chips. This led to a lot of angry people and a number of amusing situations, including someone jokingly submitting a patch to Linux to do the same.)

FTDI is not and never was especially big in the USB-to-Serial market, and their chips tended to implement CDC in variously broken ways not to mention that their windows drivers are major PITA. What they are big in is market for "just add USB to this custom device" because they supply mostly NDA-free documentation since forever and produce various ready-made modules.

They also put the legwork into getting their drivers into nearly everything, which makes them great to use in a custom or small batch product that has users that aren't technically oriented or won't be installed by a knowledgeable installer. If you already have a client side install (like a software package) that can bundle the drivers for you, it's not a big deal.

And with the emergence of USB-CDC, it's not nearly as necessary as it used to be, since most modern OSes support that now.

And then killed that convenience by literally bricking counterfeit chips. Most users have no way of knowing whether or not their chips are genuine - and, what's worse, most electronics manufacturers have no way of knowing either, because entire supply chains are rotten. These days we have our trusted supplier of FTDI chips but we still check every single shipment. We already had one supplier send us fake chips and then flat out exclaim they did it because "no one would pay premium to get the real thing".

We would.. but since they already cheated, we had to switch suppliers.

Can you share which suppliers have you been burned by? We've seen counterfeit Prolific parts, but I'm not aware if we've had similar problems with FTDI hardware.

I don't remember, unfortunately, as I wasn't the one handling logistics, I just spec'd out parts we needed. If anything, here's a bit of advice: don't ever assume your supply chain is 100% clean - more than once we've been sold thousands of dumpster-grade components by a seemingly very reputable company. And don't ever assume suppliers won't try a bait-and-switch; we've been bitten by this more times than I can count.

Can the bigger names like Digikey be relied on? Or do their suppliers sometimes give them bad parts, too?

My point is that even if your supplier plays fair, the manufacturer may change the designs slightly and you either don't get the PCN or fail to understand the implications of a change. We had a case once where a manufacturer added a minuscule hole on the case of a relay, presumably a pressure vent. Our devices are potted in resin in order to withstand rough conditions; in this particular case the resin penetrated the relay and blocked the contacts. We were halfway through a production run before anyone noticed. Needless to say, the run had to be scrapped because there's no economical way to remove the resin once it's hardened. All for a saving of a few cents per relay.

QA lesson: new batch of anything = tight quality control. When you go into manufacturing, it becomes a game of numbers. Manufacturing 10000 of whatever is statistically bound to generate some duds and failures.
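To put rough numbers on "statistically bound", here is a minimal Python sketch. The 0.1% per-unit defect rate and the assumption that failures are independent are made up for illustration; they are not figures from any real production line. Under those assumptions, a run of 10000 units is expected to contain about 10 duds, and the chance of a completely clean run is negligible.

```python
def expected_duds(units: int, defect_rate: float) -> float:
    """Expected number of defective units in a run."""
    return units * defect_rate


def prob_at_least_one_dud(units: int, defect_rate: float) -> float:
    """Chance of one or more duds, assuming independent failures.

    Computed as the complement of "every single unit is good".
    """
    return 1.0 - (1.0 - defect_rate) ** units


if __name__ == "__main__":
    units = 10_000
    rate = 0.001  # hypothetical 0.1% defect rate, an assumption for this sketch
    print("expected duds:", expected_duds(units, rate))
    print("P(at least one dud):", prob_at_least_one_dud(units, rate))
```

The same two functions make it easy to see why incoming inspection matters more as volume grows: even a defect rate that looks tiny per unit stops being tiny once it is multiplied by the run size.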

Speaking of prototype/hobbyist amounts: stick to the big suppliers and you should be safe, but if something feels fishy, don't be shy to order the part from a different supplier. Generally, identifying counterfeit parts in prototype amounts can be tricky without prior experience because your manual soldering skills will always be sub-par to a pick'n'placer; same goes for ESD precautions, accidental shocks applied to the circuit, passerby finger-pokers and your curious boss.

Thanks so much for clarifying the specifics of what happened to you; this is one of those situations that's hard to imagine, making it that much harder to envisage exactly what can go wrong, where and how.

What I'm wondering now is whether it's possible to somehow financially insure a specific level of quality control (translation: sticking to the blueprints!) in a way that doesn't scare everyone off. I'm guessing the scrapped run had to be eaten on your side? :/

Whenever I see an inexperienced young hardware startup striving to build even just 1000 of some widget, I feel sorry for them. They often go twice over the planned budget both in terms of time and money... and only after several screwups they hire some experienced guys to help them with the process.

The answer to your question? It's simply experience. Start small, try to build bigger things, and put in actual time and effort to learn. Don't cut corners; learn how the big industry does things (and especially: why). Don't guess tolerances and sizes: find relevant standards. Read datasheets thoroughly and with comprehension. Ask your assembly house for guidance. Get a book or two on process & quality control.

The effort you invest will pay itself off sooner than you think. Not in revenue, mind you - but greatly reduced losses and delays.

PS. The world of manufacturing is wonderful. It's vastly different from programming, as it involves much more interaction with suppliers, vendors, teams and assembly line employees - but the feeling of holding a finished product in your hand is worth it.

Thanks very much for this info.

For a few years now I've wanted to build a handheld device that captures the essence of Lisp machines, Forth, and systems like the Commodore 64 and Canon Cat, in a portably accessible/usable form, wrapped in a highly pocketable but ruggedized enclosure similar to the old Nokias that lasted forever. I envisage it primarily as a teaching device and something people could hack on for fun, but the whole idea has never been especially practical or marketable. Now I know what it might be for (when I have a bit of money) - manufacturing education :) since device production has always been something I'm interested in and I do want some experience.

I also want to build a handheld device with a 2G+ baseband, secure boot, and an open-source firmware (perhaps seL4, most definitely not Android). The possibilities start with end-to-end encrypted SMS and trail off infinitely. I haven't really thought of what might be possible; I'm just stuck on the academic problem of secure boot - which is quite an issue, as not even Apple (just checked, $641B valuation right now) seem to be able to get this right: https://ramtin-amin.fr/#nvmedma. I'm saddened by the fact that all secure boot implementations seem to either be NDA-laden, based on security by obscurity, or both. I'm yet to find something I feel would be hard for even someone with a very very large pile of money (for arbitrary scaling of "very very large") to break. I realize that given infinite money everything is breakable, but current "secure" defenses seem to fall over much too readily IMO. (Eg, secure boot implementations have to have test modes; have these passed stringent code verification? A properly-formed legal case could subpoena any secure boot implementation's source code. This is assuming the likely-overly-idealistic case where there are no deliberate backdoors.)

My advice? Start small. You might want to build a car, but in order to do this, you first need to build a dozen skateboards before moving on to bicycles. As for Secure Boot - it takes expertise in breaking things to build something unbreakable. I have broken commercial copy protection using nothing but an adjustable power supply and some assembly code - protection that, on paper, seemed "good enough". If you think you can build something secure without decades of experience, you haven't really understood how much power a determined engineer equipped with a fast FPGA wields over your puny electronics.

I was curious what you'd say :)

As for the first idea I mentioned, disasters there could be easily tolerated since it's just a side-project thing, so using it as something to (slowly!) work towards could be interesting.

With the Secure Boot idea, I now understand that this would absolutely need to be a group effort, and I'd need the (great) assistance of others with significant experience in security for it to work. That makes perfectly logical sense now I think about it (I'm crazy for thinking I could manage it on my own...) - now I know what direction to go in! (And also the fact that I need to do quite a bit of thinking about this.)

I must confess my curiosity at the type of copy protection you referred to. I was thinking you electrically glitched the EEPROM in a dongle, but that doesn't explain the asm code.

Thanks again.

And what you say about fast FPGAs vs puny electronics is very true :D - and in all fairness, Apple haven't been in security for very long.

PS. You seem to me like a person bent on building The Next Big Thing, reading up on things, accumulating knowledge, having big expectations... I used to be like this most of my life. If you want to actually cash in on that knowledge, you need to BUILD THINGS.

Want to get into hardware security? Buy a hardware glitcher, break some chips, write up on it. Find out it's been done before. Feel confident that you can now break harder, better protected chips. Try it. Succeed or fail. Repeat.

Thinking about secure enclaves? Implement one for some hardware of your choice. Document it. Put it up on Github. Submit to Hacker News. Get feedback. Repeat.

Dreaming about a C64-style machine? Get a devkit for a suitable platform. Write some kernel code. Write examples. Breadboard a second prototype. Design a PCB and an enclosure, have it 3D printed. Heck, get a 3D printer yourself and use it all the time. Write a game. Play it until your fingers hurt. Find out how to build a better keyboard that doesn't hurt your fingers. Get a graphic designer to make some advertising templates. Ship one piece of it and bask in glory for five minutes. Pack the whole thing in a cardboard box, stash it in the attic and go thinking about the next one. Repeat.

The important part? Get something from the idea to a finished thing, repeatedly. It doesn't have to be big - but it has to be 100%. Getting it "just working" and moving to the next big thing won't cut it. There's no other way.

(Took me a bit to get back to this)

I used to have a really bad case of The Next Big Thing, but I've slowly started to come round to the idea of taking the time to study what already exists and consider where I might be the one who needs to learn and adapt. I've only just started with this train of thought, but I think this mindset is a critical part of the process of doing things that are accessible and successful.

Someone once told me that to get anywhere you have to come up with a pie-in-the-sky idea that's absolutely crazy and then go for it. While taking that literally is a recipe for superfast burnout, it seems to me that that mindset tends toward system-based rather than goal-based motivation so might have some reasonable benefits for creativity and creative discipline. Not sure. Still figuring it out.

I definitely am interested in absorbing as much as I can. I've been figuring out how to build a tab/bookmarks/history-management extension for a while now, hopefully I get the courage to start (Chrome's APIs are so verbose and complex, and JavaScript requires so much boilerplate, I can't say I like it). But I currently have 652 tabs open that I need to bookmark and close, and something like 20k bookmarks that I need to tidy up (!), so it's on the todo list. Heh.

The first time I heard about hardware glitching was "Many Tamagotchis Were Harmed in the Making of this Presentation", https://youtu.be/WOJfUcCOhJ0. (I also just discovered and watched the update, http://youtu.be/mCt5U5ssbGU.) That was fun to learn about, but now I realize this sort of thing is widely applicable it's even more interesting. Thanks for the headsup! The concept of hardware glitching is something I've been interested in for a while actually.

It never occurred to me that I could implement a secure enclave myself. I thought you needed a secure processor for the design to even be viable. I'm only aware of things like the ORWL secure PC (https://www.crowdsupply.com/design-shift/orwl, https://www.orwl.org/wiki/index.php?title=Main_Page), which does use a secure processor (http://www.st.com/en/secure-mcus/st33g1m2.html) to manage the system's various security features.

It's mostly my complete domain ignorance, but I can't envisage a way to build a truly secure processor setup, mostly because of limited access to secure parts. A fast OTP microcontroller with enough space for a burnt-in key and the ability to interface with external Flash could work, but if I just used this for storage I/O going to another CPU, you could simply tap the bus lines to achieve untraceable information leakage.

The secure processor would need to deal with everything between keyboard input and LCD update, and only output encrypted data to the 2G radio. The chip I linked is only 25MHz, which would make for quite a limited device. It most definitely would work - I have an Ericsson MC218 that's that fast, and the EPOC OS (forerunner of Symbian) on it is incredible - but it would be much more accessible if the CPU were 250MHz or so instead. I'm not aware of secure processors that are that fast - and does the chip I linked even require an NDA to use? It doesn't look like it but I wouldn't be surprised if it did.

Ideally, I'd love for a way to use one of SiFive's RISC-V chips as the secure processor when they release their high-speed designs. But implementing secure boot on one of those would depend on both how the chip is designed (eg, boot sequence sensitivity considerations) and how the chip is physically constructed (I expect RISC-V chips with active meshes etc will eventually exist).

My pie-in-the-sky step-up from this basic concept would be to make a dual-chip system, with a secure CPU sitting alongside an off-the-shelf ARM CPU running Android. The secure CPU can take control of the screen and keyboard in such a way that the ARM CPU cannot attack (the keyboard would be easy - just route all keyboard I/O through the secure processor - but I fear that wielding the video side of things would be incredibly nontrivial to implement securely). Then when you wanted to do secure tasks you can simply tell the system to switch to the secure processor, which takes over the screen and keyboard until you tell it to return control to Android.

My ultimate goal would be a secure processor fast enough to capture medium-resolution video (something like 640x360 max, to begin with) from a camera module, and then play it back on an LCD, all without any sensitive data leakage (or depending on external processors that would require that). Ideally I'd like to go higher, but I think these are reasonable (beginning) expectations for a device that I would rather not put a GPU in. (Yes, crazy, but GPUs require NDA'd firmware, so the best ever case scenario I could manage is getting access to the BSP source and looking it over, but I'm most definitely not a security researcher, so I don't consider it viable. I can get away with Wi-Fi+cellular because the data going over that would already be fully encrypted with keys those chipsets cannot access, regardless of how malicious they are.)

Regarding the handheld not-quite-sure-what-it'll-be-yet thingy, the keyboard has been my biggest perplexion for a while. :) Tactile switches with low force actuation are one simple solution, but will never feel as professional as a proper rubber-dome actuator setup or similar. I've never used one, but the original Blackberry (pager) looks really close to what I want (in fact I've heard that thing runs on an 80386-compatible CPU - not sure of the manufacturer - and that there was once a devkit for it floating around and generally available). I wonder whether it uses a rubber actuator system or tactile buttons.

I completely understand your closing key point about actually manufacturing stuff though. I've gone for a very long time with just pondering and wondering, and no actual iteration, and I can't help but acknowledge that there is a lot of truth in the idea of "quantity over quality" - or more accurately our brain's ideas of "quality."

The idea that studying a subject with the notion that improving our understanding of that subject will make us better at it does hold true for a lot of areas and domains, but I think it tends to break down in a lot of the creative process. The process of making - whether that thing is something intangible like a piece of software, or a physical product - is generally something that must always be learned as a discrete subject nowadays. Unfortunately, this seems to be a rather hard idea to grasp, and there's a bit of a learning curve to it.

We depend on so many tools now, and those tools have developmental and process histories of their own that we need to appreciate in order to take the best advantage of those processes.

But our brains are likewise tools, and to use them most effectively we have to figure out how they work best. That process is a bit like jumping off a philosophical/psychological cliff :)

As for practically running off with any of these ideas and actually getting started with them, that's a ways off yet. I unfortunately don't have the budget for those things right now due to fun, expensive medical issues that make it impossible for me to get a job (yeah).

I'm going to keep it short and sweet - not because I don't care; on the contrary: I do and I want to get the message through. This is an "I'm an old hacker and I'm here to set you straight" message and it's not gonna be pretty. You're free to disagree; I'm not here to argue, only to offer a heavy bit of advice.

1. Your P/PC balance is completely lopsided. You seem to be focusing only on acquiring ideas and knowledge but not actually using them.

2. 500+ tabs, 20k bookmarks? Are you aware that at this rate you'll never get anything done because consumption of information will take 100% of your life, with an ever-growing TODO list of things to read? This is borderline addiction.

3. Execution is a skill. If you ever tried to do any of the things you read so much about (as opposed to just reading & talking), you'd find you completely lack experience in doing. You seem to be living under an illusion that you're acquiring skills. You are not.

You sound smart. Very smart. Almost too smart for your own good. But intelligence - and knowledge - is not enough. You need to jump off that psychological cliff before you build it up so high the fear will stop you from ever making the first step.

Having said that: close your web browser. Open your editor. You already have enough inspiration; now you need code. That's all you'll hear from me.


PS. If you follow my advice and start building things instead of just thinking about it, you'll find your creations don't even begin to live up to your expectations. That's normal; it simply shows the discrepancy between what you are and what you could be.

First time I've ever seen "That comment was too long." on HN.

This is part 1 of 2.


Mentoring is something I'm admittedly a bit lacking in, so this is highly appreciated! The to-the-point approach is an even bigger benefit.

I'm not sure if you'll respond to this - it isn't needed, unless you want to continue this conversation (even in a few weeks or months, maybe) - but I actually agree with most of what you've said.

> 1. Your P/PC balance is completely lopsided. You seem to be focusing only on acquiring ideas and knowledge but not actually using them.

Ah, production vs. production capability. Very interesting concept.

Quite some time ago, when I didn't have a mental map of a new thing, I would glitch out and keep trying to find the edges of that thing so I could conceptualize it, predictably and consistently getting stuck in infinite loops until I'd explode from stress. My ability to summarize has historically been horribly broken, and the side effect of that here was that it took me way too long to realize that a lot of things cannot be summed up without relevant mental mnemonics already in place - so mental-mapping must be multi-pass.

This meant that I was atrociously imbalanced (like, practically vertically so) toward acquisition/observation/spectation over participation. In my case I did want to participate, but my attention span didn't permit me the mental stack space to automatically create and interconnect component details as I went along, making me simply unable to parse some subjects.

The sub-problem was my lack of a toolkit to use to get past the "bah, that particular detail is BORING" phase with certain things. I have quite a backlog of things I need but don't have available because of this...

For example, I still don't know assembly language (I only just recently realized that I saw learning a language as learning its grammar, while asm is all about CPU architecture, which I was never looking at) and I also don't know basic math.

Also, I was standing in a store a while ago completely stumped about what buttons to push on my calculator to figure out how many grams of X I could get because I had $Y to spend. I did figure it out in the end but I don't have any sort of mental map of how to do these tasks because my brain doesn't find them interesting enough to focus on.

An aside: I tried to optimize my (re)typing so typed "production{,} /capability" before. That didn't really work; a) bash doesn't let you remove the space in comma expansion so this canonically doesn't work, and b) typed out like that it isn't very clear and visually looks terrible. I think I inadvertently proved your point before I got 4 words out. lol

> 2. 500+ tabs, 20k bookmarks? Are you aware that at this rate you'll never get anything done because consumption of information will take 100% of your life, with an ever-growing TODO list of things to read? This is borderline addiction.

It definitely looks like that, yes. Some clarification!

This is actually because I'm using a ThinkPad T43, and Chrome on 2GB RAM and a single-core <2GHz CPU doesn't tolerate hundreds of tabs very well. I think my real maximum working tab count is around 50-100 tabs or so, but what ends up happening is that bookmarking those tabs gets uncomfortable after only about 10 tabs are open, because opening the bookmark folder selection popup (I use Better Bookmark) means Chrome has to spawn a new renderer, an operation that makes the system swap to death and can routinely take 10-15 seconds (sometimes 30+ seconds or more). Unfortunately it's easier to just suspend the tab (with The Great Suspender) than do this.... oops, now I have 731 tabs open. Except 680 of those tabs are actually Sad Tabs now because Chrome's broken malloc decided it didn't have enough memory (with only ~1.3GB of my 7.8GB of swap in use...) and it killed all my extensions, and The Great Suspender has no functionality to detect and reload "crashed" tabs when it restarts, and fixing it manually makes the system swap to death easily for 10 minutes (yep).

TL;DR: Chrome encourages me to suspend and forget about tabs rather than get back to them and sort them out. I argue that because no work is being done to fix this, it IS kind of deliberate. But would there be a way to fit this into a bug report? No. :(

The real issue is that The Great Suspender is easily 1k+ SLOC because JavaScript, "modern" OOP, and edge-case management immediately lead to verbose, hard-to-learn code. I've looked at the code and it would be quite outside my comfort zone to maintain it.

So, in the end, I'd need to make my own extension - which would need to be a rewrite, since I kinda dislike the GPLv2 for productivity stuff like this, I also don't want to wind up as the maintainer for this extension, and I need an integrated bookmark manager+tab manager+tab suspender, so I can do things like bookmark suspended tabs and get the right thing, unload/close a tab but keep it in a "read later" list, bookmark things out of that list, etc etc.

I'm at the point where I can't deny that I need to do it. I'm working on a crawler for a website that's technically already shut down so I can try and get the data off it - or, more accurately, going round in circles where I can't focus because I don't know whether the site will really shut down in 10 minutes or next week or whatever, and it's messing with my motivation - but once that's done I think I'll be starting on this.

First time I've ever seen "That comment was too long." on HN.

This is part 2 of 2.


> 3. Execution is a skill. If you ever tried to do any of the things you read so much about (as opposed to just reading & talking), you'd find you completely lack experience in doing. You seem to be living under an illusion that you're acquiring skills. You are not.

This was actually exactly what I was trying to say before. You said it a lot more succinctly than I did:

> The idea that studying a subject with the notion that improving our understanding of that subject will make us better at it does hold true for a lot of areas and domains, but I think it tends to break down in a lot of the creative process.

You make an undeniable point. I also noted that:

> Unfortunately, this seems to be a rather hard idea to grasp, and there's a bit of a learning curve to it.

and I wish I was making faster progress...

> You sound smart. Very smart. Almost too smart for your own good. But intelligence - and knowledge - is not enough. You need to jump off that psychological cliff before you build it up so high the fear will stop you from ever making the first step.

Thanks. I've had exactly this problem for quite some time. It actually got to a point where I nearly became fully mentally detached and went off the deep end - I was thinking about ideas I had until I'd find a hole somewhere, then scrabble around frantically until I found the first thing that sounded like it would fix that problem, at least in theory. Do that for long enough, without any groundedness, going entirely off of "reasonable guesses".... welp. :D I've thankfully moved past those anxiety issues!!

In my case the psychological wall is built up as a side effect of another process: the fact that my attention span is like a broken bicycle that I can be pedaling as fast as humanly possible, but which will gradually slow down halfway up the hill, stop, and begin rolling backwards (all while I'm pedalling at crazy speed). So no matter how much interest I have and no matter how much effort I invest (my current project, the crawler, being a textbook-for-me case in point) I always roll to a stop.

This has perplexed me for years - depression/mood doesn't quite nail it, since I can crack up at stuff on Imgur and Reddit all day (well, not all day, those websites are like chewing gum, they dry out after an hour or so at the most), and my perspective is not predominantly dark/black, which I would think is a prerequisite for behavior that could be argued looks like "giving up."

I've learned a bit about the foundational health issues behind my autism, OCD, nutrition absorption problems, brain fog, etc etc, and made some good progress with correcting those problems - particularly issues with mental clarity - but I still have quite a ways to go, as I've noted above.

> Having said that: close your web browser. Open your editor. You already have enough inspiration; now you need code. That's all you'll hear from me.

Oh yeah, I've been thinking of writing a text editor for a while now... :P

In all seriousness, my motor coordination is terrible (I use two fingers to type with, and sometimes my muscles jump) so text editors with complex shortcuts involving multiple keys or key sequences that must be executed perfectly are a deal-breaker for me. Stuff like CTRL+S is my current comfort-zone limit for keyboard shortcut complexity, although I wouldn't mind something like making the Shift or Ctrl key itself save too. If I don't use a function as frequently then I don't mind, but I save almost obsessively (I use file alteration watching to rerun my code) - I actually just hit ^S while typing that :D (I don't usually do that in Chrome, lol) - so I prefer "single-chord" or single-step keyboard shortcuts. I never used WordStar when I was younger, I guess?

I don't like that it's impossible to completely filter out the religious pretentiousness of emacs and vim, which both have their pros and cons. But vim is installed by default in most places, and I can see effort was made to give it user-friendly default keybindings, so it's what I learned (or more accurately, know I'll be able to use without learning :P). emacs is essentially where all IDEs got their inspiration, so is associated with carefully-finetuned installation and configuration, and (arguably) associated themes of fragility. I get a very "this UI is a carefully designed optical illusion" vibe from emacs, like the last time I ran it and played with the package installer I discovered that the entire UI locks up while it's doing network requests (IIRC). Fun.

So yeah, I want a simple editor that follows widespread traditions, but also one that offers some obscure things like realtime syntax highlighting/formatting similar to QBasic's editor, which I've not found in any other environment (!).

> PS. If you follow my advice and start building things instead of just thinking about it, you'll find your creations don't even begin to live up to your expectations. That's normal; it simply shows the discrepancy between what you are and what you could be.

I really really like this way of interpreting this. It's very motivating. Thanks!

Btw, I followed you on tumblr. :P

> I must confess my curiosity at the type of copy protection you referred to. I was thinking you electrically glitched the EEPROM in a dongle, but that doesn't explain the asm code.

Load exploit/dumper code over JTAG, then glitch the CPU into thinking there's no JTAG connected, making it run the code with full permission level. As simple as that. It was all written in the datasheets and reference manuals - if you knew what to look for and how to combine the knowledge.

Ah, I almost figured out what the variable power supply was for :)

So searching "CPU glitching" didn't do much, but "CPU voltage glitching" found me lots of results.

I realize all you need is a variable voltage supply, and maybe (?) something to easily inject voltage +/- pulses (within a given voltage limit) and that a lot of glitching stuff is probably unnecessary, but it might be useful for learning.

And yeah, making (often lateral) connections between disparate pieces of information is often what makes the difference. I think it's mostly about exposure to a given field to get really good at that. Guess I should get started :) ...soon.

Speaking out of curiosity, can you name some companies that provide NDA-laden USB-serial bridges? I'm painfully aware that it's hard to even know such products exist, especially when your production volume barely runs into hundreds or thousands and you're not established in the industry... and from my limited interactions with even the smaller players like SIMCom or SiRF, it takes an NDA just to talk to someone who can get you to sign a second NDA so you can see datasheets. Even if you have tons of money, giving it to such a company in return for their product can be a challenge.

> FTDI is not and never was especially big in the USB-to-Serial market

... whaaa? They're probably the biggest company in that market, closely followed by Prolific (PL2303).

> ...and their chips tended to implement CDC in variously broken ways

No, their chips implement a proprietary protocol. Not CDC at all.
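The CDC-versus-proprietary distinction shows up directly in the class codes a device advertises in its USB interface descriptors. Below is a minimal Python sketch of that check. The class code values (0x02, 0x0A, 0xFF) are the standard USB-IF ones; the function name and the two example descriptor lists are hypothetical and only stand in for what a tool like lsusb would report for a generic CDC-ACM device and for a vendor-protocol bridge.

```python
# Standard USB-IF interface class codes.
USB_CLASS_CDC_CONTROL = 0x02      # Communications and CDC Control
USB_CLASS_CDC_DATA = 0x0A         # CDC-Data
USB_CLASS_VENDOR_SPECIFIC = 0xFF  # Vendor-specific (proprietary protocol)


def driver_hint(interface_classes):
    """Guess which kind of driver a serial bridge needs.

    interface_classes: iterable of bInterfaceClass values for the device.
    A device exposing a CDC control interface can be picked up by the
    OS's generic CDC driver; one that only exposes vendor-specific
    interfaces needs the vendor's own driver.
    """
    classes = set(interface_classes)
    if USB_CLASS_CDC_CONTROL in classes:
        return "generic CDC driver"
    if USB_CLASS_VENDOR_SPECIFIC in classes:
        return "vendor driver"
    return "unknown"


if __name__ == "__main__":
    # Illustrative descriptor values, assumed for this sketch.
    cdc_acm_device = [USB_CLASS_CDC_CONTROL, USB_CLASS_CDC_DATA]
    vendor_protocol_bridge = [USB_CLASS_VENDOR_SPECIFIC]
    print(driver_hint(cdc_acm_device))
    print(driver_hint(vendor_protocol_bridge))
```

This is why a vendor-protocol chip depends on its driver being shipped with the OS, while a CDC device does not.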

Isn't one reason that RS-232 is an open protocol but USB requires licensing? I thought it cost money to get an official VID/PID from the USB organization?

Funny, had to learn all this stuff for my Master's thesis as it was a crucial part of my project to provide reliable shell command exchange via serial connection. It was really really hard to find anybody who knows anything about this lower network level and terminals.

What I can add for everybody who feels the same disappointment as ESR: It's very common for a growing community that three things happen.

A) The number of people with just a little knowledge over the holy grail of your community increases.

B) The popular communication is taken over by great communicators who care more about their publicity than your holy grail.

C) This gives the impression that the number of really cool people decreases. And that is depressing to old timers. But it's in fact often not true. Actually most often the number of cool people increases too! It's just that their voices are drowned in all the spam of what I like to call the "Party People" (see B).

So yes, you can actually cheer. It's harder to find the other dudes, but there are more of them! Trust me, I'm not the oldest guy here but I've seen some communities grow and die till now, and it's nearly always like that.

And these days

D) the B)s use various "social" tactics to tar and feather A)s that get in their way...

I've seen this kill entire companies in an extremely slow, painful death. I used to fight this... These days, as soon as I see a Chief Architect or a CTO start to social their way to some technical goal, I just leave the company - for it's a very clear signal that an engineer is a second-class citizen there. No point in waging that battle.

As someone still learning about the finer points of social interaction (and sorely lacking in experience), what would you say are some of the signs that this sort of thing is happening?

Having seen and been part of the start, rise and fall of a certain scene in a genre of music, I found this article a great piece for reflecting on how and what really did happen :)

This was so entertaining & insightful to read I ended up buying the actual book [1]. Thank you.

[1] https://www.amazon.com/dp/B00F9IV64W/

Burning Man ?

When you find swaths of knowledge that younger people don't know, you've found success in the overall human goal of abstracting concepts and building on the shoulders of those who came before us.

I'm not suggesting the article is a "Gosh, Millennials!" conversation. I just get a warm tingle when reminded that I have absolutely no clue how to do something people did just a generation ago, and I don't need to. It's success!

Then you'd probably enjoy watching two TV series by James Burke: Connections and The Day the Universe Changed. They explore the driving forces behind many of the major technical inventions in the past 800 years -- how the status quo created a void that invention arose to fill.

The videos may look a bit dated now, but the content is amazing and Burke is terrific.


Every time I think about this series that used to air on TLC I just get so depressed at what TLC has become.

It once really was The Learning Channel

I was confused on why there were funniest home videos on National Geographic, too...

Commercial television for you.

I recall running into those on late night BBC sat broadcasts.

The best part is perhaps Burke himself, and his very very British presentation style.

Still, I have a strange feeling that our whole technological edifice is standing on the head of a pin. When the folks pass who know this stuff, and something breaks, it'll be a while before things get running again. Hopefully before the food riots.

This stuff won't be forgotten. It just doesn't need to be known by everyone wanting to do something with software.

It's really no different than COBOL. As long as there's value in knowing it, there will be a small number of people who can command a large paycheck for that rarely-useful knowledge.

At least that's what I comfort myself with as I gaze over my stash of DB25 connectors and Z80-SIO chips...

I'm curious what these are used for.


For those curious like myself (heavily elided for smaller wall of text):

The Z80-SIO (Serial Input/Output) ... basic function is a serial-to-parallel, parallel-to-serial converter/controller ... configurable ... "personality" ... optimized for a given serial data communications application.

The Z80-SIO ... asynchronous and synchronous byte-oriented ... IBM Bisync ... synchronous bit-oriented protocols ... HDLC and IBM SDLC ... virtually any other serial protocol for applications other than data communications (cassette or floppy disk interfaces, for example).

The Z80-SIO can generate and check CRC codes in any synchronous mode and can be programmed to check data integrity in various modes. The device also has facilities for modem controls in both channels, in applications where these controls are not needed, the modem controls can be used for general-purpose I/O.

What's really interesting is this bit:

• 0-550K bits/second with 2.5 MHz system clock rate

• 0-880K bits/second with 4.0 MHz system clock rate

880K bits/second is 110,000 bytes/sec at 4MHz. That's more than seven times the 115200 baud we're all familiar with. At 4MHz! (2.5MHz yields 68750 bytes/sec, or about 67.1 KiB/s.)
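
A quick sanity check of that arithmetic, as a rough sketch. The two ceilings are the datasheet figures quoted above; dividing by 8 assumes no framing overhead (no start/stop bits), which is my simplification and not something the datasheet excerpt states.

```python
# Datasheet ceilings quoted above, in bits per second.
MAX_BITS_PER_SEC = {"2.5 MHz": 550_000, "4.0 MHz": 880_000}


def bytes_per_sec(bits_per_sec: int) -> float:
    """Raw byte rate, assuming 8 bits per byte and no framing overhead."""
    return bits_per_sec / 8


for clock, bps in MAX_BITS_PER_SEC.items():
    rate = bytes_per_sec(bps)
    print(
        f"{clock}: {bps} bit/s = {rate:.0f} bytes/s "
        f"= {rate / 1024:.2f} KiB/s, {bps / 115_200:.1f}x 115200 baud"
    )
```

With async framing (start and stop bits) the usable byte rate would be lower, so treat these as upper bounds.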


Also - rather amusingly, I discovered that the MK68564 datasheets ripped off the Z80-SIO's intro text verbatim. Are these compatibles or something completely different? https://www.digchip.com/datasheets/parts/datasheet/456/MK685...

Not familiar with those, but it's likely they licensed Zilog's design.

Just like when all the horse-tenders passed and nobody knew how to plow fields with horses anymore

We're in a different situation than the horse-tenders. Much of our technology is built on top of this older tech.

It'd be like if we moved from using horses to using giant machines made out of glued-together living horses, and then all the horse-tenders died.

It can take weeks or months to figure out low-level stuff. Even a college education. If something critical depends upon it (and pretty much all critical systems do these days?) then we won't have time. Before the meltdown/sewage backup/food riots.

> When you find swaths of knowledge that younger people don't know, you've found success in the overall human goal of abstracting concepts and building on the shoulders of those who came before us.

Abstracting, yes, but I don't know about building-upon. The thing is, a lot of this stuff is still sitting around beneath the covers, and someone needs to understand it.

Even worse, sometimes there's stuff that's abstracted over that's important, e.g. if the Excel or 1-2-3 teams had known about Field/Group/Record/Unit Separators, would they have ever come up with CSV?

Or the fact that SYN can be used to self-synchronise, due to its bit pattern …

> Even worse, sometimes there's stuff that's abstracted over that's important, e.g. if the Excel or 1-2-3 teams had known about Field/Group/Record/Unit Separators, would they have ever come up with CSV?

Neither team had anything to do with CSV, which originates in Fortran's list-directed I/O. And there is no field separator (which would be redundant with unit separator); FS is the file separator. These codes were intended for data and databases over sequential IO; in modern parlance a group is a table.

This exactly! I like to make the car analogy:

I have no idea how my car works. I mean, I more or less understand the principles underlying the internal combustion engine, but I wouldn't be able to service one, much less assemble one. But I don't need to. Typically, the only indication made available to me that something is wrong is a single bit of information ("check engine light"), but that is enough. You don't have to be a "car person" to make effective use of a car. I get in, I go, and well over 99% of the time that's the end of the story.

Compare this with computers. When something goes wrong, it's usually vital that you (or your users) relay the precise error message (and God help you if there isn't one). You generally have to be a "computer person" to some degree to make effective use of a computer. If you are unconvinced by this comparison, contrast how often your family asks you to perform [computer task] versus how often you would approach a mechanic family member to perform [car task]; Contrast how often you hear "I can't do this, I'm not a car person" versus "I can't do this, I'm not a computer person".

I consider swaths of modern hackers who simply don't know about much of ASCII as evidence of babysteps towards computers maturing as a technology.

I understand your argument, but I would disagree with a few points:

First, we use computers for so many more things than cars. The average user does really well with basic tasks like checking their email and simple word processing. This would be daily driving in your car analogy. Occasionally things blow up, but that isn't too different from a major problem with a car. However users are constantly trying new things with computers, new programs, websites, and tasks. Car-owners who are constantly trying new things with their cars have as many problems, if not more, than the average computer user. The difference is that the people who use the full range of their car's capabilities are deeply interested in their vehicles.

Second, abstractions like the check engine light are far from perfect. How do you know whether the light signals imminent failure or a minor inconvenience? What additional information is needed for the mechanic to diagnose the problem? I recently chased down a problem in my car that caused the check engine light to come on with a code that was physically impossible. It took a few weeks of careful experimentation and instrumentation before I was able to figure out what it thought was going on. This was a case where I absolutely needed more than a cursory knowledge of how my car works.

I also think that a hacker should be similar to an amateur mechanic: although their car might be fuel-injected, they have a cursory knowledge of how a carburetor works. They may have an automatic transmission, but they understand what a clutch is. Compare that to many developers who have never set foot outside their niche. They have never used a radically different programming language or a different OS. They've never taken the time to dig into the layers beneath the one they use. I would argue that is a weakness. How will you ever debug a problem when it inevitably occurs in the layers beneath you?

> I have no idea how my car works.

I find this weird. As I proceeded through Comp sci in high school, going from Pascal to C to assembler, I was always troubled by my lack of understanding, "but why does it work?" That anxiety finally disappeared in college when I learned how logic gates are constructed and went through the exercise of implementing multiply as a series of logical operations.

Similarly, I find it strange that someone would be comfortable driving a car without fairly deep knowledge of how it functions and how to repair it. I don't understand how you're not plagued with anxiety.

> If you are unconvinced by this comparison, contrast how often your family asks you to perform [computer task] versus how often you would approach a mechanic family member to perform [car task]; Contrast how often you hear "I can't do this, I'm not a car person" versus "I can't do this, I'm not a computer person".

I think this is more about social norms/conventions than anything. I would never ask a family member who happens to be a surgeon to remove my gall bladder for me, or a family member who happens to be a mechanic to replace my clutch over the weekend. But for some crazy reason, it's perfectly acceptable to ask your "computer person" family members to spend hours removing the 1200 malware infections you got by installing that cute puppy toolbar you downloaded. The complexity of the tasks doesn't have anything to do with it.

> I consider swaths of modern hackers who simply don't know about much of ASCII as evidence of babysteps towards computers maturing as a technology.

Nobody else has argued on this point yet so I'll throw my 2¢ in. (An aside: I had to look up the codepoint for ¢ - A2 - so I could use it. I haven't memorized ASCII yet, let alone Unicode.)

In my opinion, things like the first 32 characters of ASCII, line discipline, NIC PHY AUIs, the difference between RS-232 (point-to-point) and RS-485 (differential, multidrop), how to make your Classic Mac show a photo of the developer team (hit the Interrupt key then input "G 41D89A"), or how to play notes on period VT100s (set the keyboard repeat rate really high) are just the old idiosyncrasies. We've moved into an era where Web stacks reveal hard-to-diagnose bugs in nearly-40-year-old runtimes (Erlang), Apple will give you $200k if you extract the Secure Boot ROM out of your iPhone (in one person's case via a bespoke tool that attached to the board and talked PCIe), the UEFI in Intel NUCs is such a close match for the open-source BSP that Intel releases that it's a lot easier than everyone would like for you to make UEFI modules that step on things that shouldn't be step-on-able and let you fall through holes into SMM (Ring -2), and few people care that sudo on macOS doesn't really give you root-level privileges anymore.

We've just replaced all the old idiosyncrasies with a bunch of more modern idio[syn]crasies. You're right that technology has matured, but this has unfortunately meant that a lot of the innocence we took for granted has been lost. Things aren't an absolute disaster, but it's more political now, and we have to keep on our toes. Computers aren't universally somewhere we can go to have fun; we have to work to find the fun now.

(Also, I just did that thing I often do with forums - I expect my reply to appear at the end of the thread, so I go to click on the reply button at the end. But that would reply to the comment at the end of the thread, not yours. This is a problem endemic to forum UI and not a HN issue. Not all aspects of computers have matured yet, not by a long shot.)

> how often you would approach a mechanic family member to perform [car task];

I guess it depends on where one lives, as I see that happen all the time (either family, friends, or neighbors).

Many of the control codes are still in active use today in the air-ground communications protocol spoken between airplanes and Air Traffic Control.

The ACARS[0] protocol I work with every day starts each transmission with an SOH, then some header data, then an STX to start the payload, then ends with either an ETX or an ETB depending on whether the original payload had to be fragmented into multiple transmissions or fits entirely into one.

These codes aren't archaic and obsolete in the embedded avionics world.

[0] ACARS: "Aircraft Communications Addressing and Reporting System" - see ARINC specification 618[1]
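
To make the framing described above concrete, here is a minimal sketch of just the control-character envelope. It is not an ARINC 618 implementation: the header is treated as opaque bytes, the `max_block` size is an illustrative parameter of mine, and real ACARS blocks carry further fields and a block check that are left out here.

```python
SOH, STX, ETX, ETB = b"\x01", b"\x02", b"\x03", b"\x17"


def frame_blocks(header: bytes, payload: bytes, max_block: int = 220) -> list[bytes]:
    """Split payload into blocks and wrap each as SOH header STX text ETB/ETX.

    max_block is an illustrative limit, not a value taken from the spec.
    """
    chunks = [payload[i:i + max_block] for i in range(0, len(payload), max_block)]
    if not chunks:
        chunks = [b""]  # an empty payload still produces one (empty) block
    frames = []
    for n, chunk in enumerate(chunks):
        is_last = n == len(chunks) - 1
        # ETB: more blocks follow. ETX: this block completes the message.
        frames.append(SOH + header + STX + chunk + (ETX if is_last else ETB))
    return frames


for frame in frame_blocks(b"HDR", b"HELLO WORLD", max_block=4):
    print(frame)
```

A receiver would keep appending text until it sees a block ending in ETX.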


Here's the origin of that - The Teletype Model 28 "stunt box".[1] This was a mechanical state machine, programmable by adding and removing metal levers with tines that could be broken off to encode bit patterns. These were used in early systems where a central computer polled mechanical Teletype machines in the field, and started and stopped their paper tape readers and other devices. Remote stations would punch a query on paper tape and put it in the reader, then wait until the central computer would poll them and read their tape. This was addressable, so many machines could be on the same circuit. Used in 1950s to 1970s, when mainframes were available but small computers were not.

[1] https://www.smecc.org/teleprinters/28stuntbox001.pdf

Thanks so much for sharing this. This is exactly the kind of TTY history I've always been looking for.

Those DB9 and DB25 connectors are still kicking around the bottom of my toolbox.

Why is DEL's bit value 0x7f (or 0177)? Because there was a gadget out there for editing paper tape. Yes. You could delete a character by punching out the rest of the holes in the tape frame. I used it once. It was ridiculous.

And don't forget the lace card - a card punched full of DEL. Having every single hole punched, it was so fragile it would crumple up and jam the reader. People today think they're smart 'cause they invent things like DRAM rowhammer... it's all been done before, kids. ;)

Ohh no.

Was there anything that created these lace cards as part of normal operation? I'm guessing not, considering the ramifications.

What about programs that did this when they encountered bugs?

LOL, I swear it feels like the answer to every "why is this weird computer thing this way?" question I see is "because we used to do it this way on punch cards."

Wait until you find out that "coder" originally meant "someone who encodes messages into Morse". And if you start digging into the word "code", you'll find it comes from Latin "codex" which is a mutation of "caudex" meaning, literally, "tree trunk".

Because back then, people would write on wooden tablets covered with wax.

So.. next time you see a kludge and think of "historical reasons", consider that "historical" goes back much farther than 20th century. :)

Huh. I remembered from school that "caudex" was a reasonable translation for the insult "blockhead"... now I know why!

And, what did the word "computer" mean back in the day? Hint: they were mostly women. And, they defeated the Nazis in World War Two, with the help of Alan Turing and his crew.

Not just a gadget, but teletypes in general. (See e.g. https://en.wikipedia.org/wiki/Teletype_Model_33) You press a key, it punches the code on paper tape. If there are already holes, the new holes get punched over top. All holes is the only thing that can be punched over anything else with a consistent result.

The intent was that, when reading tape, DEL is ignored, because it's a position that was punched over with DEL, i.e. deleted.

Also, when punching tape, a key will punch its holes and (naturally) advance to the next character position. For DEL, that means you erase the character under the cursor, and then the next character is under the cursor. That is, it's a ‘forward’ delete. (I think it was DEC's VT2x0 series terminals that screwed that up for everyone.)
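
The punch-over behaviour described above is just a bitwise OR on the 7-bit frame, which is why all-ones is the only code that wins over anything. A small sketch (the function names are mine, purely for illustration):

```python
DEL = 0x7F  # all seven holes punched


def overpunch(existing: int, new: int) -> int:
    """Punching over a frame can only add holes, so the result is a bitwise OR."""
    return (existing | new) & 0x7F


def read_tape(frames: list[int]) -> str:
    """Decode 7-bit frames, skipping DEL the way a tape reader would."""
    return "".join(chr(f) for f in frames if f != DEL)


tape = [ord(c) for c in "HELXLO"]
tape[3] = overpunch(tape[3], DEL)  # rub out the stray 'X'
print(read_tape(tape))
```

Overpunching with anything other than DEL gives a result that depends on what was already there, e.g. 'A' punched over with 'B' reads back as 'C'.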

And note also that Teletypes and paper tape were the rationale for the ultimate "you asked for it, you got it" text editor - TECO. The basic idea was, if you had a paper tape with typographical errors, you could feed that and a correction tape into TECO, and it would apply the corrections. It eventually morphed into an all-purpose editor; DEC's VTEDIT was a screen-oriented editor written in TECO macros, and Emacs was originally implemented in TECO as well.

It's kind of appropriate that a typical TECO command line looks a lot like transmission line noise. :)

TECO = Tape Editor and COrrector

It's still a forward delete in Windows.

Nobody has mentioned the smell of the data.

The archive tapes were saturated with insecticide, so bugs would not be inclined to chew up your stored info.

There were also separate mechanical duplicators, plus multi-layer tape so that ordinary terminals could make more than one copy in real time.

For RS-232 electrical reliability, it's hard to beat a design intended to allow any pins to be connected to + or - 25 volts or ground in any combination without doing any damage to the equipment at either end.

Plus it's not restricted to the minuscule cable-length specifications of USB, and to reach the rest of the connected (by phone) world, the same EIA open-source digital protocol was just modulated/demodulated to analog audio upon send/receive.

Remember RS-232 was always expected to be at least building-wide if not site-wide, depending on the size of the site.

Ordinary data communication at relatively slow speeds has benefits that might as well be taken advantage of when they are needed.

Of course no code was absolutely required for any of these processes, but you could still seamlessly share ASCII files between Apple, Commodore, DOS PC's etc. using native COM port commands.

To get more speed between two points, on early PC's you could get software to multiplex more than one COM port to handle a single data stream over multiple signal pairs.

When needed, this would require and tie up multiple phone lines to reach off-site but it worked, plus it was the same technique on your own local copper but then it was more feasible to be always on.

Many buildings were originally equipped with top-quality AT&T/Bell copper pairs each dedicated to a separate signal for each (prospective) phone line to each office through its on-site relay box. At the time many of these pairs were rapidly becoming idle with the arrival of the modern office multiline phone which ran on fewer pairs, or had its own dedicated wiring installed at deployment.

With Windows 9x, COM port multiplexing was built into Windows, and with the arrival of the 115Kbaud UART's you could theoretically get 460Kbaud between offices by running four 3-conductor DB9 cables from the phone access plate on the nearest office wall, and using a PC having connectors for the full 4 COM ports which had become standard on motherboards.

You mean DE9.

DB series are the size of old 'parallel' ports. DE are the common width, like DE15 for VGA.

From the article:

>Standard RS-232 as defined in 1962 used a roughly D-shaped shell with 25 physical pins (DB-25), way more than the physical protocol actually required (you can support a minimal version with just three wires, and this was actually common). Twenty years later, after the IBM PC-AT introduced it in 1984, most manufacturers switched to using a smaller DB-9 connector (which is technically a DE-9 but almost nobody ever called it that)

>Almost nobody ever called it that

I'm sad that "[FGRU]S ({Field|Group|Record|Unit} Separator)" didn't get much use, and instead we have to rely on tabs or commas (TSV / CSV), and suffer from the problem of quoting / escaping.

BTW, I use Form Feed (CTRL+L) character in my code to divide sections, and have configured Emacs to display them as a buffer-wide horizontal line.

It seems that so many programming headaches have the same root cause: the set of characters that compose "text" is the same set that we use to talk about text. Hence the nightmares with levels of quoting and escaping. The use of out-of-band characters like NULLs to separate pieces of text does help, but I don't think there is a complete solution. Because, eventually, we want to explain how to use these special characters, which means we must talk about them, by including them in text....

> Hence the nightmares with levels of quoting and escaping.

PostgreSQL has an interesting approach to this problem that I've found really straightforward and that allows me to express text as text without getting into strange characters. What they've done is allow a character sequence for quoting rather than relying on a single character. They start with a character sequence that is unlikely to appear in actual text: $$, it's called dollar quoting. Beyond just $$, you can insert a word between the $$ to allow for nesting. Better explained in the docs:


The key here is that I am able to express string literals in PostgreSQL code (SQL & PL/pgSQL) using all of the normal text characters without escaping and the $$ quoting hasn't come with any additional cognitive load like complex escaping can (and before dollar quoting, PostgreSQL had nightmarish escaping issues). I wish other languages had this basic approach.

Perl's had something like that for a long time: quote operators. You can quote a string using " or ' (which mean different things), and you can quote a regex using /. But for each of these you can change the quote character by using a quote operator: qq for the double-quote behavior, q for the single-quote behavior, and qr for the regex behavior. (There are a few others too, but I used these most often.)

    my $str1 = qq!This is "my" string.!;
    my $str2 = qq(Auto-use of matching pairs);
    $str2 =~ qr{/url/match/made/easy};
The work I did with Perl included a LOT of url manipulation, so that qr{} syntax was really helpful in avoiding ugly /\/url\/match\/made\/hard/ style escaping.

Perl is still, I think, the gold standard for quoting and string manipulation syntax. I am to this day routinely perplexed by the verbosity and ugliness of simple operations on strings in other languages.

(Of course, this may also be one of the reasons that programmers in its broad language family have a pronounced tendency to shoehorn too many problems into complex string manipulation, but I suppose no capability comes without its psychological costs.)

Yup, the 8085 CPU emulator in VT102.pl[1] uses a JIT which is essentially a string-replacement engine.

[1]: http://cvs.schmorp.de/vt102/vt102 (note - contains VT100 ROM as binary data, but opens in browser as text)

Perl also supports heredocs — blocks of full lines with explicit terminator-line:

  print '-', substr(<<EOT, 0, -1), "!\n";
  Hello, World
  EOT

  -Hello, World!
iirc sh-shells also have that.

This seems like an awesome feature. I wish Python had something like it.

Python has triple-quoted strings which generally do the trick, and uses prefixes for "non-standard" string behaviours (though it doesn't have a regex version IIRC, Python 3.6 adds interpolation via f-strings)

    str1 = f"""This is "my" string."""
    str2 = """Auto-use of matching pairs"""
    str3 = r"""/url/match/made/easy"""

Yes, I've belatedly caught on to using triple-quotes to avoid some escaping. But I didn't know about the f-strings - thanks! (I'll be using those when I start using 3.6.)

Interesting, especially as I use PostgreSQL. Unfortunately, "$$" is very common in actual text (millions of TeX documents, for example) as is $TAG$. But this could still work if you were careful to use TAGs that would never be found in text. But what if the document that you linked to itself had to be quoted? Would that lead to a problem?

I think it would be wrong to call it a perfect system or one created with the intention of so being. I'm sure in some disciplines, especially technical disciplines, you may well come across those sequences on a much more common basis... which sounds like your experience. Most of what I do is in mid-range business systems, after 20 years of professional life, it's something I've never come across. I suspect those sequences are fairly rare outside of specific domains and thus why that choice was made by the PostgreSQL developers.

Your question about self-referential documents and linking I don't understand; maybe an example. The PostgreSQL dollar sign quoting feature is simply a way to use single quotes (important in SQL) without having as many escaping issues. So instead of:

  SELECT 'That''s all folks!';
You could write:

  SELECT $$That's all folks!$$;

  SELECT $BUGS$That's all folks!$BUGS$;
And where it starts to save you in PostgreSQL is with something like (PL/pgSQL):

          DO $$ BEGIN
              PERFORM $quote$That's all folks!$quote$;
              PERFORM 'Without special quoting';
          END $$;
Note: this code produces nothing, it just should run without error (I ran it on PostgreSQL 9.4). In PL/pgSQL, the body of the procedural code is simply a string literal... but that means any SQL related single quoting would have to be escaped if we used single quotes. So using normal single quotes the previous code example would look something like:

          DO ' BEGIN
              PERFORM ''That''''s all folks!'';
              PERFORM ''Without special quoting'';
          END ';
And it gets worse as you get into less trivial scenarios... which is why I suspect this dollar quoting system was created to begin with.

I agree this solution handles a lot of common cases and makes the code easier to read than when forced to escape everything. I wasn't clear in my comment about self-reference. I meant that, suppose you are storing the text of articles in the DB (not a great idea, but it happens). The article (the one that you linked to) explains the $$ mechanism by showing how it works, so it's full of $$ sequences - the very sequences that we are assuming won't be encountered in normal text. That's what I meant in my beginning comment when I said that handling text that talks about our quoting conventions will lead to problems.

Ah, that's clearer for me.

There are a couple of ways to handle this depending on the scenario. If I were dealing with a static text under my control, say a direct insert of the text, I would either just enclose it all in traditional ' characters or come up with some unique quote text between the $$.

If I'm dealing with arbitrary text coming from, say a blogging website, I would either handle traditional SQL escaping in my input sanitizing code (or thereabouts) since I have to do that anyway ($$ is great for handwritten code where escaping introduces cognitive load, but not necessarily important for machine generated code) or I might create an inserting PL/pgSQL function with the article text as a parameter... that will get escaped without my having to do anything assuming I simply insert the text directly from the parameter.
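
For the static-text case, "come up with some unique quote text between the $$" can be mechanised. This is only a sketch for generating hand-readable SQL; `dollar_quote` and the `base` tag are names I made up, and for untrusted input bound parameters remain the safer route, as said above.

```python
import itertools


def dollar_quote(text: str, base: str = "q") -> str:
    """Wrap text in a PostgreSQL dollar-quoted literal whose tag cannot collide with it."""
    for n in itertools.count():
        tag = base if n == 0 else f"{base}{n}"
        # Rejecting "$tag" (without the trailing "$") also covers text that
        # merely ends in "$tag", which would merge with the closing delimiter.
        if f"${tag}" not in text:
            return f"${tag}${text}${tag}$"


print(dollar_quote("That's all folks!"))
print(dollar_quote("Use $q$...$q$ for quoting"))
```

The check is deliberately conservative: it may skip a tag that would have been fine, but it never picks one that the text can terminate early.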

> The use of out-of-band characters like NULLs to separate pieces of text does help, but I don't think there is a complete solution.

NULL is actually in-band, not out-of-band, and in fact it illustrates the issues with in-band communication you mention. That's what, presumably, ESC was for: a way to signal that the following character was raw and did not hold its normal meaning.

You can still devise a pretty good protocol with straight ASCII over a wire, using SYN to synchronise the signal, the separator characters to separate data values and ESC to escape the following character (like '\' is used in many programming languages).
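
Roughly what I have in mind, as a sketch rather than a finished protocol: US separates fields, RS terminates records, and ESC means "take the next character literally". SYN-based synchronisation is left out, and the choice of US/RS for the two levels is my own assumption.

```python
US, RS, ESC = "\x1f", "\x1e", "\x1b"
SPECIAL = {US, RS, ESC}


def encode(records: list[list[str]]) -> str:
    """Join fields with US, end each record with RS, ESC-prefix any special character."""
    def escape(field: str) -> str:
        return "".join(ESC + ch if ch in SPECIAL else ch for ch in field)

    return "".join(US.join(escape(f) for f in record) + RS for record in records)


def decode(stream: str) -> list[list[str]]:
    """Inverse of encode for records that have at least one field."""
    records, fields, buf = [], [], []
    chars = iter(stream)
    for ch in chars:
        if ch == ESC:
            buf.append(next(chars, ""))  # the escaped character is plain data
        elif ch in (US, RS):
            fields.append("".join(buf))
            buf = []
            if ch == RS:
                records.append(fields)
                fields = []
        else:
            buf.append(ch)
    return records


rows = [["a,b", 'say "hi"'], ["x" + US + "y", ESC]]
assert decode(encode(rows)) == rows
```

Commas and quotes need no treatment at all; only the three control characters ever get escaped.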

> That's what, presumably, ESC was for: a way to signal that the following character was raw and did not hold its normal meaning.

Yes, ESC is a code extension mechanism: it means that the following character(s) are not to be interpreted according to their plain ASCII meaning, but some other pre-arranged meaning. Ultimately a shared alternate meaning for terminal control was standardized as ISO 6429 aka ECMA-48 aka “ANSI”. Free reading: https://www.ecma-international.org/publications/standards/Ec...

That this gave us keyboards with an Escape key that GUIs would repurpose to mean ‘get me out of here’ is a coincidence. (Plain ASCII had Cancel = 0x18 = ^X for that.)

MIT culture for historical non-ASCII reasons also referred to Escape as ‘Altmode’, which is ultimately how EMACS and xterms ended up with their Alt-key/ESC-prefix clusterfjord.

DLE not ESC https://en.wikipedia.org/wiki/C0_and_C1_control_codes#DLE

ESC is used for introducing C1 control sequences

We recently had to deal with an issue like this. My decision was to just sort of punt on the issue, and just base-64 encode the text. So there would be no shenanigans with escape character processing and such. The loss in efficiency was considered acceptable.
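
A minimal sketch of that punt, using only the standard library (`wrap` and `unwrap` are illustrative names). The encoded form contains only letters, digits, '+', '/' and '=', so no delimiter or escape character of the outer format can appear in it; the cost is roughly a third more bytes.

```python
import base64


def wrap(text: str) -> str:
    """Encode arbitrary text into the base-64 alphabet."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def unwrap(blob: str) -> str:
    """Recover the original text."""
    return base64.b64decode(blob).decode("utf-8")


tricky = 'He said "it\'s $$fine$$", then\\n,\t;'
blob = wrap(tricky)
print(blob)
print(f"size ratio: {len(blob) / len(tricky.encode('utf-8')):.2f}")
assert unwrap(blob) == tricky
```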

Form Feed would actually cause the printer to feed to the top of the next page. Original line printers had a carriage control tape that indicated to the printer where the top of the page was.

Also line printers using standard paper were 132 columns across and 66 lines down, which was 11 inches at 6 LPI. This matched the US portrait paper height and allowed for about 10 characters per inch plus margins for the tractor feed and perforations.

I remember when a common (and relatively harmless) prank was to send a file with several thousand form feeds to the high speed "tree killer" printer at a remote site. Newbie operators would have a coronary when the fan-fold paper started spewing across the room.

("relatively harmless" because the paper wasn't actually printed on or otherwise damaged -- the operator just had to refold it and move it back to the input side).

Agreed, if there was just a little more editor support and use early on the whole CSV mess could have been avoided.

They actually display fairly well in vim

Perl has built-in support for these characters, sort-of, for use when reading in text files. Several of Perl's operators pay attention to certain global variables, like $INPUT_RECORD_SEPARATOR.

    while (my $item = <>) {
By default this will read lines from STDIN, but if you set $INPUT_RECORD_SEPARATOR to the RS character, it'll read a whole record at a time. You can also set $LIST_SEPARATOR to the FS character, which you can use with the split function to divide your record into fields, and interpolating the list into a double-quoted string will use it automatically to turn your list of fields back into a record.

I used these characters and Perl features in the early 2000s for managing a data processing workflow. The data was well-suited to being processed as a stream of text, and by using these separator characters I was able to avoid the overhead of quoting and escaping which made the processing MUCH more efficient.
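
For comparison, a rough Python equivalent of reading one record at a time with a custom record separator. This is a sketch, not the code I ran back then: `read_records` is a made-up helper, and it uses US rather than FS for the fields, which is an assumption on my part.

```python
import io
from typing import Iterator, TextIO

RS, US = "\x1e", "\x1f"


def read_records(stream: TextIO, sep: str = RS, chunk_size: int = 4096) -> Iterator[str]:
    """Yield sep-terminated records from stream without loading it all at once."""
    pending = ""
    while chunk := stream.read(chunk_size):
        pending += chunk
        # Everything before the last separator is complete; the tail may not be.
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:
        yield pending  # final record with no trailing separator


data = io.StringIO(f"alice{US}42{RS}bob, jr.{US}7{RS}")
for record in read_records(data):
    print(record.split(US))
```

Note that the comma in "bob, jr." passes through untouched, with no quoting step anywhere.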

There's also the $IFS (Internal Field Separator) for Bourne-like shells.

Oh, and Ruby copied Perl's `$/` and `$,` for record and field separators. (The "English" module provides $INPUT_RECORD_SEPARATOR and $FIELD_SEPARATOR.)

I worked on a database whose format was derived from Pick [1], that used 0xFE, 0xFD, 0xFC as record, field, sub-field separators - which was great until we had customers that needed those code-points for their data (i.e. in needing to enter data that was not all in English) and we had no escape mechanism. Using characters from the control set would have made so much more sense.

[1] https://en.wikipedia.org/wiki/Pick_operating_system

> I use Form Feed (CTRL+L) character in my code to divide sections, and have configured Emacs to display them as a buffer-wide horizontal line.

I've seen formfeeds used in source files like that before; it's a simple way of getting some primitive wider navigation (Emacs has built in commands for navigating by page), but I don't know why you'd limit yourself to them when there are much smarter options available now.

I don't limit myself to them, I don't really even use page navigation in Emacs. It's just that I like a nice, solid, horizontal line I have displayed instead of this character, and I kind of like the feeling of doing the old-school ^L thing :).

Things that I still find hard to forget these days:

* ASCII codes for those single and double box characters, so I could draw a fancy GUI on those old IBM text monitors

* Escape codes for HP Laserjets and Epson printers for bold, compressed character sizes etc.

* Batch file commands

* Essential commands for CONFIG.SYS

* Hayes modem control codes

* Wordstar dot format commands

* WordPerfect and DisplayWriter function keys

* dBaseII commands for creating, updating and manipulating records

I wish they would all move out of my head and leave room for me to learn some new stuff quicker!

Fun fact: Your smartphone is still using (extended) Hayes commands to control the cellular modem inside.

LTE smartphones too?

You'd be surprised how little has changed in that regard. Establishing a data session, from the early GPRS days until today's LTE, is still:

Even if the underlying technology has completely changed, the interface has not.

Whoa! I just dialed that on my Nexus 5x and got

> USSD code running...


> Connection problem or invalid MMI code.

So what was my phone just trying to do?

You are not supposed to dial 99# from the phone UI. It's a special number that gets interpreted by the phone hardware itself and switches it to essentially router/access-server mode (the 1 at the end is an optional parameter: the index of the configuration you want, with the configurations being defined by AT+CGDCONT). The most common thing that happens is that the phone establishes a PPP session with whatever device sent ATD*99# and then routes it through the packet-based cellular network (i.e. GPRS/EDGE/UMTS/LTE). The mechanism is not specific to IP (although the actual network-side transport is always over IP), and L2 protocols other than PPP are sometimes supported. There is no reason why a smartphone has to use the Hayes protocol to communicate with the baseband (and thus this mechanism), but many do.

Replying to OP because it's the most logical place to put this -

I'm curious what a "USSD code" is, and what kinds of codes I could input into the dialpad that do interesting things.

(I'm already aware of standard telco features like call forwarding, and I know iOS and Android both have their own set of "easter eggs" accessible via the dialer. I'm talking specifically about non-secret codes that talk to the baseband and stuff like that, if this sort of thing exists.)

The wikipedia page answers all your questions https://en.wikipedia.org/wiki/Unstructured_Supplementary_Ser...

Thanks so much! That was a really interesting read.

Hook your phone to your computer via USB and try to invoke that command on the serial port that appears; you'll get a PPP connection right away. It won't work with the dialpad app.

Yes, definitely. Most LTE modems are derived from 3G modems, and support 3G and 2G as fallback. That is not great for security, because 2G encryption is weak and broken.

It was such an odd joy when I had a Nokia n900, and I realized that you could tether by using wvdial, the same program I used in the 90s to dial into my ISP when I couldn't be bothered to figure out pppd chat scripts...

I love how the modem control codes leaked across layers. So many scripted IRC clients had an option to send a quick "+++ATH0" to everyone and hopefully disconnect their crappy modems.

Lol talk about a walk down amnesia lane. I'm now remembering a time I embedded a couple hundred ASCII bell characters in a war script to knock other highschool kids offline. Nothing would clear a chat faster at 2am than computers making a really loud noise that unplugging the speakers wouldn't stop. Ahh nostalgia.

All the cool kids attach the power LED to the speaker headers, to stop your bell script :P

Nice try. BITD the bell triggered a piezo buzzer soldered directly to the motherboard. Speakers had nuthin to do with it.

By the time I was on IRC, most people were using PC speakers, which were real speakers at the time. These days, you get a garbage piezo with your case if you're lucky.

Either way, the power led is useless, so better to hook it up to the speaker header ;)

> * ASCII codes for those single and double box characters, so I could draw a fancy GUI on those old IBM text monitors

Presumably not ASCII - they'll be from some extended IBM character set (CP437?), I would think.

You are indeed correct. I had an old IBM PC/XT manual I believe, which had the charts at the back that I always used to refer to. There were the standard ASCII chart, and another extended chart with the IBM specific set.

dBase III blew my young mind with how it would let you teach it new words for the data you wanted to get out of it. The UX still stands out in my mind as unbelievably good. I should probably see if I can dig out an old copy and find if it's as good as I remember.

> Escape codes for HP Laserjets

<esc>&l1O to switch to landscape :)

I'm just about to write some code to parse HP PJL so you never know when ...

PJL is easier to parse than it looks at first glance. I worked on a laser printer controller back in the late '80s that for convenience we made mostly HP compatible. Have fun!

Back in 1985 I wrote a HP PCL to HPGL translator in C. Fun stuff!

Add to that the z80 assembly language instructions I used most often - for patching infinite lives in games. (JMP, JZ, NOP, and RET).

Great list. How about...

* Lotus 1-2-3 "slash" commands

Fun fact about octal: every commercial and most non-commercial aircraft have a transponder to identify with Air Traffic Control. The squawk code is in octal.


And the civilian SSR system was built on the released frequencies and protocols of Mode 3 of the military Mark X IFF system, which was adopted by civilian agencies in the mid-1950s after a particularly nasty airborne collision.

Sixty years later and we're still bounded by legacy. As a result of shortage the general-use squawk codes are namespaced into each national ATC region, so aircraft have to constantly change squawks even in supposedly contiguous regions such as Eurocontrol. Squawk 4463 means a different thing in UK airspace than in French.
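
To put a number on that shortage, here is a minimal Python sketch. The helper name `squawk_to_int` is made up for illustration and not taken from any aviation library. A squawk is four octal digits, so there are only 8^4 distinct codes to share out.

```python
def squawk_to_int(code: str) -> int:
    """Parse a 4-digit octal squawk code into its 12-bit integer value."""
    if len(code) != 4 or any(ch not in "01234567" for ch in code):
        raise ValueError(f"not a valid squawk code: {code!r}")
    return int(code, 8)


print(squawk_to_int("4463"))  # the example squawk mentioned above, as an integer
print(8 ** 4)                 # total number of codes that fit in four octal digits
```

Digits 8 and 9 can never appear in a squawk, which is an easy sanity check on anything claiming to be one.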

Ironically, military aircraft still support Mode 3 in order to integrate with civilian ATC, who call it Mode-A, but all their special don't-shoot-me-I'm-friendly stuff is handled by more modern encrypted protocols.

And the squawk codes are broadcast as part of the ADS-B system[0]. These packets can be received with the inexpensive USB software-defined radios (e.g., [1]).

[0] https://en.wikipedia.org/wiki/Automatic_dependent_surveillan...

[1] http://www.rtl-sdr.com/adsb-aircraft-radar-with-rtl-sdr/

A lot of the data buses on aircraft also use octal to identify the data words.

The fact that Windows uses CR-LF as a line separator baffles me to this day (and I am not old enough to have ever used or even seen a teletype terminal!) - for a system that was developed on actual teletype terminals, it would have made perfect sense: To start a new line, you move the carriage back to the beginning of the line and advance the paper by one line.

But DOS was developed on/for systems with CRT displays.

It doesn't really bother me, but every now and then this strikes me as peculiar.

The thing to remember is that the CRs are never really gone, they're just silently inserted for you by the tty.

Dealing with raw vs cooked (where LF is automatically translated to CRLF) ttys in UNIX is also a giant pain when you have to do it, so in a way it's not surprising that they decided to leave that out. The original DOS kernel was very minimal compared to UNIX even of the same era. Of course, it turns out that having to write CRLF into files is also a pain - Windows has binary and text mode files instead of raw and cooked mode ttys - and one that you encounter much more often.

When you think about how a typewriter works it's actually correct. The "newline" handle does both a carriage return and a line feed. But you could conceivably do a carriage return without a line feed (and type over what you already have), or a line feed without a carriage return (which might have some actual use).

It's not just conceivable. It's how the output of grotty in TTY-37 mode works to this day. When you are reading a manual with underlining and boldface, and you haven't brought your manual system into the bold new GNU world of the 1990s where it actually uses ECMA-48 instead of TTY-37, the tty output driver is using carriage returns and backspaces to enact overstrikes.

* http://jdebp.eu./Softwares/nosh/italics-in-manuals.html

I wrote a better manual page for ul(1) that explains some of this. ul is basically a TTY-37 to your-terminal-type converter, and it implements a lot of the effects that one would see on a real teletype. Unfortunately, the old manual hasn't progressed much beyond the original 1970s one and doesn't explain a lot of the functionality that the program actually has.

* http://jdebp.eu./Proposals/ul.1

One use for CR without LF was to overstrike passwords on printing terminals. You'd enter your password (which would appear on the paper) then the system would go through several rounds of issuing a CR followed by overstriking the password with random characters.

I still use CR without LF all the time. If I'm writing something that features a long loop, I might want status updates, so printf("%d \r", percentage); helps tremendously. You'll need to fflush(stdout) too, since usually flush is triggered by "\n".
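
For anyone who wants to try the same trick outside C, here is a rough Python equivalent. It is only a sketch: `progress_line` and `run` are names invented for this example, and the loop body stands in for real work.

```python
import sys
import time


def progress_line(percent: int) -> str:
    """Build a status string ending in CR so the next write overprints it."""
    return f"{percent:3d}%\r"


def run(steps: int = 5, delay: float = 0.0) -> None:
    for i in range(steps + 1):
        sys.stdout.write(progress_line(i * 100 // steps))
        sys.stdout.flush()  # no "\n" is written, so flush explicitly
        time.sleep(delay)   # placeholder for the real work of the loop
    sys.stdout.write("\n")  # finally move down to a fresh line


if __name__ == "__main__":
    run(delay=0.2)
```

The fixed-width `%3d` matters: without it a shorter number would leave stale digits from the previous, longer one on the line.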

On the other hand, there are far fewer use cases for LF without CR, certainly nothing that isn't better done using ANSI codes.

This is the use of CR without LF on a virtual TTY. On a real TTY or typewriter there is less use, but as one user pointed out one could remove information like a password which was printed.

LF without CR is something that one would do on a typewriter for typing tabulated data or mathematical formulae. It's just a way to go "down" but stay at the horizontal position you were previously at.

Aaaah, yes, that makes sense. I remember setting up LPD on NetBSD years ago, and part of that was setting up a filter to prevent the staircase effect. Those were the days... ;-)

and without the CR you could never have made the spinning line \|/-\|/-... :-)

I call BS ;)

AIUI it's just that DOS was complying with the standard, while Unix had decided to break it. The standard was designed for physical teletypes and has a bunch of issues (like, what do you do with LF-CR?) but it's not unreasonable of DOS to have stayed with it.

Apparently the LF vs CR+LF printer control code standard arose ~1965 as ASA (pre-ANSI). Neither Unix nor Microsoft created the standard, and of course ASCII didn't either.


CR/LF always struck me as the logical choice. And according to the SO link posted by 'chadcmulligan, it turns out that pure LF separator is actually a Unix hack that somehow got later accepted as the Right Way...

It wasn't a Unix invention; see https://en.wikipedia.org/wiki/Newline#History (as posted by randcraw nearby); for confirmation of the story there, see the Euro version of the standard for free at https://www.ecma-international.org/publications/standards/Ec...

It is the right way. Different terminals require different line endings and abstracting that to a single new line character makes sense. The UNIX tty driver then translates that new line character to what is appropriate to start a new line on a given terminal. In most cases CRLF is enough but older devices also needed additional delays.

If you are going with the abstract and translate approach, wouldn't the right way be to abstract to RS?

An interesting thing to note is that if you put a unix terminal into raw mode, you also have to use both CR and LF, because LF only moves the cursor down, but doesn't move it to the start of the line.

Windows is backwards compatible with DOS. DOS was backwards compatible with CP/M. It wasn't at all uncommon to see CP/M used with a printer as the primary interface.

And helpfully enough, EPS files can contain \r and \n as line feeds, singly or both, in either order.

Related, Python has universal newlines support:


Edit: added the PEP (it's from 2002) and excerpt from it:


This PEP discusses a way in which Python can support I/O on files which have a newline format that is not the native format on the platform, so that Python on each platform can read and import files with CR (Macintosh), LF (Unix) or CR LF (Windows) line endings.

Yeah, but it doesn't do \n \r, and iirc, it's not too happy when the line endings change midstream.

Interesting, I had not checked it for those cases. Maybe it only reads a few lines at the start and assumes from the line endings for those, what it is going to be for the rest.
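
This is easy enough to check. The sketch below assumes Python 3 and its `io.TextIOWrapper` with `newline=None` (universal newlines mode); the sample bytes are invented for the test. As far as I can tell, each line ending is translated on its own rather than guessed from the first few lines, and `\n\r` simply counts as two line endings.

```python
import io

# One stream mixing LF, CR, CR LF and the reversed LF CR pair.
raw = b"unix\nmac\rdos\r\nacorn\n\rend"

reader = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
lines = reader.read().split("\n")

print(lines)
```

The empty string that shows up after "acorn" is the `\n\r` case: the LF ends one line and the CR ends an empty one.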

I seem to remember the Acorn BBC used \n\r for line endings.

I think it's mainly to maintain backwards compatibility.


That seems to be the reason for a lot of odd design choices. ;-)

> It is now possible that the user has never seen a typewriter, so this needs explanation […]

Aw man… I'm only 36, but now I feel old for growing up in a time where a typewriter was still common enough to run into (even if they were rapidly being displaced by personal computers).

They still exist in the wild though as a hipster accessory — they probably do well on Instagram too I suppose.

I'm 36... back in the mid-00s when I was in the Army we were still using typewriters, mostly to fill in pre-printed government forms (and award certificates, etc). I even had to deal with carbon paper.

My grandmother had an old manual typewriter, which I had to use once to type up some homework when I was in High School.

I do not miss them.

I work at a global logistics company. Manual typewriters are still somehow part of our workflow for filling in forms, not even carbon backed.

It boggles my mind why that hasn't been replaced by a PDF form. Perhaps IT being siloed in another building keeps such legacy going.

With many of these things I suspect it comes down to laws and regulations more than anything else.

Meaning that your typewritten document will be accepted as evidence during a lawsuit or similar, while a PDF of same may not.

I gave a talk on the origins of Unicode a while ago (now published on InfoQ at https://www.infoq.com/presentations/unicode-history if you're interested) where I talked about ASCII, and where that came from in the past (including Baudot code and teletype).

The slide pertaining to ASCII is here:


Ah the good old days, when hackers were hackers and quiches were quiches.

Oh wait, this article is 'man ascii' & 'man kermit'.

Although interestingly enough from 'man ascii', it's clear why ^C is ETX:

  >         Oct   Dec   Hex   Char                        Oct   Dec   Hex   Char
  >         ────────────────────────────────────────────────────────────────────────
  >         000   0     00    NUL '\0'                    100   64    40    @
  >         001   1     01    SOH (start of heading)      101   65    41    A
  >         002   2     02    STX (start of text)         102   66    42    B
  >         003   3     03    ETX (end of text)           103   67    43    C
  >         004   4     04    EOT (end of transmission)   104   68    44    D
  >         005   5     05    ENQ (enquiry)               105   69    45    E

Holding Ctrl clears bits 6 and 7 (counting from 1 at the least significant bit), which turns 'C' (0x43) into ETX (0x03). 'C' and 'c' differ by bit 6 only ('1' for 'c').

It was just a month or two ago I saw an ASCII table laid out in a way so that it clicked that "oh, so THAT'S why backspace is ^H" and the other control chars you end up using like ^D, ^C, ^G, ^[ suddenly made sense after that
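
The same relationship in a few lines of Python, as a sketch. `ctrl` is a hypothetical helper, and it models the traditional terminal convention of keeping only the low five bits, not what any particular keyboard driver does.

```python
def ctrl(key: str) -> int:
    """Return the control code traditionally sent for Ctrl+key."""
    return ord(key.upper()) & 0x1F  # keep only the low five bits


for key in "CDGHI[":
    print(f"^{key} -> 0x{ctrl(key):02X}")

# Upper and lower case differ in exactly one bit (0x20).
print(hex(ord("C") ^ ord("c")))
```

Masking with 0x1F is why ^[ lands on ESC (0x1B) and ^H on backspace (0x08): each control character sits directly "under" its letter in the 32-wide table.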

I remember CLU said, "Acknowledge" but I can't remember if the MCP said, "End of transmission" or "End transmission."

From now on every time I Ctrl-d I want to think of the voice of the Master Control Program.

I'm pretty sure the MCP says "end of line", not "End of transmission".

It's been a while though, maybe he says both.

Ah, right. I thought there was at least one place where he said, "End transmission," but I'm probably wrong.

Then there's "End Of User" which makes their terminal explode.


The MCP said "END OF LINE"

> quiches were quiches

Kind of interesting how remnants of culture wars of 35+ years ago linger today, and from the perspective of 2017 how anybody could have gotten annoyed at people who eat egg-and-cheese pies.

kbdmap(5) had a fictional character in its ASCII control character list for 17 years. ascii(7) used non-ASCII names for quite a while, too.

* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205776

* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205778

I think you mean "quiche eaters". :)

I uh was too hungry to type it like that (it's lunchtime)

Does anyone remember EBCDIC? IBM defined EBCDIC for the same purposes as ASCII, but ASCII took off with newer generations of machines. The last time I wrote an ASCII-EBCDIC conversion routine was the late 90's, part of generating a file for upload to a vendor's mainframe.
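
These days the conversion is usually a codec lookup rather than a hand-written table. A minimal Python sketch, assuming code page 037 (one of several EBCDIC variants; which one a given mainframe expects is something you would have to find out, and `to_ebcdic` / `from_ebcdic` are names made up here):

```python
def to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode text using one of Python's built-in EBCDIC code pages."""
    return text.encode(codepage)


def from_ebcdic(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes back into a Python string."""
    return data.decode(codepage)


encoded = to_ebcdic("HELLO")
print(encoded.hex())
print(from_ebcdic(encoded))
```

Note that the letters are not contiguous in EBCDIC (there are gaps after I and R), which is one reason arithmetic tricks that work on ASCII letters break there.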

Some popular projects still have to support it: https://github.com/apache/httpd/blob/2.4.x/include/util_ebcd...

The .NET CLR has an EBCDIC text encoder/decoder in the base class library. I had to research this once when it was asked of me how easy it was to ingest some COBOL-built files in C#. We ultimately didn't need to use that as the COBOL side had switched to ASCII at some point and its owners had forgotten, but I suppose it is good to know that even modern .NET code can speak EBCDIC if need be.

As a young student of Electronics, we had to religiously perform conversions between EBCDIC and ASCII.

ESR forgot to give the reason for XON/XOFF: physical terminals often couldn't keep up with an output stream even at a low 9600 baud, so they'd have to back-pressure the sender (usually a dial-up or direct-connect host) to let them know when to stop and when to start.

Plus, people used them manually (control-S/control-Q) on systems to stop output scrolling by, and restart it when they've read what's on the screen, before built-in pagination filters (e.g., more(1) or less(1)) became common. (Specially back in the DECsystem-10/-20 days.)

Not to mention how the truly ubiquitous HTTP protocol uses CRLF in the headers. The mechanical origins of the CR/LF combo were so strongly ingrained in developer culture that line oriented protocols like HTTP inexplicably continued to use it - either that or Tim Berners Lee just copied it from earlier line oriented protocols like SMTP because it just seemed like the "right way to do things".

It also doesn't help that most modern web servers also include logic to handle a single LF character to terminate lines anyway.

Well it meant you could interact with a web server via telnet in a pinch.
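
For illustration, this is roughly what you end up typing by hand over telnet, written out in Python. It is a sketch only: `build_request` and `split_header_lines` are invented names, and the lenient splitter imitates the "accept a bare LF too" behaviour mentioned above rather than any specific server's parser.

```python
from typing import List


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request with CRLF line endings."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", ""]
    # The empty last element plus the final CRLF gives the blank line
    # that terminates the header block.
    return "\r\n".join(lines).encode("ascii") + b"\r\n"


def split_header_lines(raw: bytes) -> List[bytes]:
    """Split a header block into lines, tolerating bare LF as well as CRLF."""
    head = raw.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    return head.split(b"\n")


request = build_request("example.com")
print(request)
print(split_header_lines(request))
```

Normalising CRLF to LF before splitting is the lazy way to be lenient; it is fine for a demo, though a real parser would be stricter about where a lone CR may appear.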

RS-232 is invaluable. Give me a new piece of hardware and as long as there are 2 wires for the serial port I can port the Linux kernel to it.

I have ported Linux to a custom ARM board many years ago. Started with a boot loader written in assembly and writing a single char into serial port for a debug console. It takes a single line of assembly or C. Infinitely easier than USB. From there on, I was able to unwind the whole system, develop USB drivers, TCP tunnels, etc.

> The following table describes ASCII-1965, the version in use today. It used to be common knowledge that the original 1963 ASCII had been slightly different (lacking tilde and vertical bar) but that version went completely extinct after ASCII-1965 was promulgated.

Not quite true - early adopters like DEC kept using the 1963 version for a very long time, which prompted others to follow them. When the Smalltalk group decided to replace their own characters for ASCII in Smalltalk-80 to be compatible with the rest of the world, it was the 1963 version that they used.

Due to this, since I use the Celeste program in Squeak Smalltalk to read my email, I see a left arrow whenever someone wrote an underscore. The other difference is that I have an up arrow instead of ^. But it did adopt the vertical bar and tilde from 1967 ASCII, so it was a mix.

For anyone similarly curious what Celeste is, there are some screenshots at http://wiki.squeak.org/squeak/1467, and it's included in http://files.squeak.org/3.7/unix-linux/. No, it doesn't work in Squeak 5.1 :)

To the OP, I'm very curious why you're using Celeste. I take it you've been using it for years?

When switching from a KDE based Linux to a Gnome based one many years ago I had to find a replacement for KMail. Since I design Smalltalk computers, it seemed silly not to use Celeste even if I had to put up with some limitations and add a bunch of bug fixes (all dealing with tolerating broken emails).

It is not something I would give to a "normal" person to use, but it is more than good enough to me. The main problem is that it does a reasonable job of showing incoming emails (though attachments show up as links at the end of the text) but when editing an email to send it shows the raw headers and MIME for any attachment.

Welp. I didn't see your comment until now. I wonder if you'll see/get this reply.

What do you mean by "design Smalltalk computers"? That sounds really interesting. Do you mean you configure them to autoboot Linux into Squeak or similar? What are they used for? (...I'm guessing education-type environments...?)

I can understand what you mean. I tried to get it working, as I said, and while I got the main window open I had no idea how to configure it (and I must admit I don't have much incentive to.)

And yet he doesn't know the PDP-11 is an octal machine even though it's 16 bits.

This deserves a little more explanation. The PDP-11 had 8 registers and 8 addressing modes so in the days before debuggers you could dump the binary and pretty much debug straight from the octal without so much as referring to your handy-dandy pocket reference card. http://www.montagar.com/~patj/dec/pocket/pdp11_programmingca...

EDIT: Also, the most significant bit was used for 3/4 of the instruction space as a flag for a byte vs. word operation (or add vs subtract) so having it alone in its own octal digit (0/1) made perfect sense. For example 01ssdd was MOV (word) and 11ssdd was MOVB. 06ssdd was ADD and 16ssdd was SUB.
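
A small Python sketch of why the octal dump reads so directly. It covers only the four double-operand instructions named above; `DOUBLE_OPS` and `decode` are hypothetical names for this example, not part of any PDP-11 toolchain.

```python
# (top digit, opcode digit) -> mnemonic, for the instructions named above only
DOUBLE_OPS = {(0, 1): "MOV", (1, 1): "MOVB", (0, 6): "ADD", (1, 6): "SUB"}


def decode(word: int):
    """Split a 16-bit word into its six octal digits and name the fields."""
    octal = format(word & 0xFFFF, "06o")
    top, opcode = int(octal[0]), int(octal[1])
    src_mode, src_reg, dst_mode, dst_reg = (int(d) for d in octal[2:])
    name = DOUBLE_OPS.get((top, opcode), "?")
    return name, (src_mode, src_reg), (dst_mode, dst_reg)


for word in (0o010102, 0o110102, 0o060102, 0o160102):
    print(format(word, "06o"), decode(word))
```

Every field boundary falls on an octal digit boundary, so no shifting or masking is needed beyond reading the digits off the dump; in hex the same fields would straddle nibbles.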

> ENQ (Enquiry), ACK (Acknowledge) In the days of hardware serial terminals, there was a convention that if a computer sent ENQ to a terminal, it should reply with terminal type identification followed by ACK. While this was not universal, it at least gave computers a fighting chance of autoconfiguring what capabilities it could assume the character to have.

That's not quite right (the ACK part isn't right at all). See <https://en.wikipedia.org/wiki/Enquiry_character>.

Why did the end-of-line indicator settle on LF as opposed to CRLF? Naively, the latter makes more sense to me by virtue of it being more explicit. Do Unix-alikes always inject a CR after a LF?

This is mostly coincidental history. A lot of teleprinters used to require that CR+LF was sent for largely mechanical reasons, and Windows was made similar to MS-DOS which was made similar to CP/M which had been built for these kinds of devices.

Unix, on the other hand, was made similar to Multics which had the clever idea of on-the-fly replacing a line feed with whatever the printer required. So text files needed only the LF, and a CR was added automatically if the printer required it. This had its upsides and downsides, but the major upside was that you could print a text file on two different systems and reliably have it come out the same way!

This is one of the reasons that in e.g. C, the '\n' character is somewhat awkwardly defined as "a one-byte number that will move the cursor to the start of the next line", when on some operating systems (Windows) this will end up actually being the two-byte CR+LF sequence (well, when output is in text mode...). Even around the time of C's development this was already an issue and newline translation magic was already required, the ancestors of Linux and the ancestors of Windows just happened to put that magic in a different place.
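
The "magic in a different place" idea can be seen from Python as well, which does its own text-mode translation. A minimal sketch, assuming Python 3's `io.TextIOWrapper`; `write_text` is a name invented for this example, and the in-memory buffer stands in for a file opened in text mode.

```python
import io


def write_text(text: str, newline: str) -> bytes:
    """Write text through a text-mode wrapper and return the raw bytes produced."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = buf.getvalue()
    wrapper.detach()  # leave the underlying buffer alone when the wrapper goes away
    return data


print(write_text("one\ntwo\n", "\r\n"))  # each "\n" is expanded to CR LF on the way out
print(write_text("one\ntwo\n", "\n"))    # no translation: bytes match the string
```

The program only ever deals in "\n"; whether one or two bytes hit the disk is decided by the I/O layer, which is the same division of labour described above.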

It was complicated. Some early printer devices had automatic-CR (think of a typewriter). Also, 'lineprinters' had a chain torus of letters and hammers at every position on the line, so there was no carriage at all.

I always thought that instead of these lamentations about lost knowledge, people should just put together resource guides to maintain these skills. Hackers used to know these things? If they're still useful to know, how can I learn about them today? Otherwise, it just sounds like the worst combination of geek posturing and "kids these days."

It is more complex than that.

Even if it isn't (and cannot be) literally true, I think Eliezer Yudkowsky was on to something when he invented the "Merlin Interdict" in his fantasy work: Harry Potter and the Methods of Rationality.

Every highly systemized art form requires a living tradition to pass on its most advanced achievements. Documentation, no matter how extensive, cannot convey all the subtleties, mostly because the experts are not 100% aware of all the little details that set them apart from the merely competent.

Some human endeavours are best learned as a long process of deliberate practice under a wise mentor, who can help you steer the path and point your attention back to one apparently irrelevant detail or another. Self study is possible and effective, but it will only take you so far. And to rediscover some lost art from first principles requires a level of genius on par with that of the original creators of the art.

> If they're still useful to know, how can I learn about them today?

By reading the article instead of lamenting lamentations ;)

Yeah, no, I read it :)

Some of that stuff piques my interest, but then it can be difficult to get started understanding it relative to other stuff that's more widely covered on the web. I guess I am just a little pampered when it's comparatively much easier to learn about a web framework. But that's probably part of the old school hacker ethos as well.

I was actually pleasantly surprised by how little lamenting esr did in this piece. I'd sort of expected it to be "Things Every Hacker Once Knew [But Kids Today Don't Learn Them And That's Awful And So Are The Kids]."

The article recalls teletype terminals. Here's a video showing up close how one worked: https://www.youtube.com/watch?v=MikoF6KZjm0 . It's hypnotic, IMO :)

I'd like to see what editing and running BCPL/C programs using ed looks like on such a terminal.

I went from amazed ("it's like a mechanical watch...") to bored/irritated ("oh, so THAT's what 110 baud was like... ouch") to amused ("...but I could put up with this, I guess.") as I watched this. Thanks so much.

My conclusion is that I could probably put up with using one of these, but that I'd feel quite cramped if it was my main workstation.

The mechanics of `ed' suddenly make a lot more sense now.

The dialing bit at the end was awesome. Reminds me of relay-based lift motor rooms! (There are videos of those on YouTube.)

Watching that video suddenly made it clear to me where the word "teletype" came from.

Now I want one of those for my desktop.

Wasn't "{Field|Group|Record|Unit} Separator" meant to allow an alternative to using commas in CSV data?

It has two more levels than CSV (which only has units and records) so no. And FS is file separator not field.

These were used for serial data sources, not just network but punch cards or drums or magnetic tapes. GS, RS and US were intended for databases on serial data sources, "group" is a modern-day table. Lammert Bies has more: https://www.lammertbies.nl/comm/info/ascii-characters.html

You can repurpose the final two (record and unit) for CSV, but that's not their original role, and you'll have to make sure they're never opened via user-controlled anything as these control codes are non-printable.
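
A minimal Python sketch of that repurposing, using US (0x1F) between fields and RS (0x1E) between records. `encode_records` and `decode_records` are made-up names, and the sketch assumes the data itself never contains those two control characters.

```python
US = "\x1f"  # unit separator: between fields
RS = "\x1e"  # record separator: between records


def encode_records(records):
    """Join fields with US and records with RS; no quoting or escaping needed."""
    return RS.join(US.join(fields) for fields in records)


def decode_records(blob):
    """Inverse of encode_records."""
    return [record.split(US) for record in blob.split(RS)]


rows = [["name", "note"], ["Smith, J.", 'said "hi"\nthen left']]
blob = encode_records(rows)
print(repr(blob))
print(decode_records(blob) == rows)
```

Commas, quotes and even embedded newlines pass through untouched, which is the whole appeal; the price is that the file is awkward to read or edit in an ordinary text editor.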

Even in the last few years I've seen FileMaker systems that used this set of ASCII characters to delimit fields in plaintext data exports. Initially baffling since I was expecting CSV, it seemed like a sensible choice once I figured out what they were doing.

Since it predates CSV, it wasn't an alternative to it: it was the original.

I suppose the Excel or 1-2-3 team either didn't know about the separator characters, or they did but then-current text editors weren't useful enough to do anything with them (and user-editability was important). It's a real shame: with a more integrated system, they could have added functionality to the editors to understand ASCII record encoding, and CSV need never have existed.

I use them whenever I can :)

Maybe a simple viewer would be nice.

Seeing this reminds me of how things were and to be thankful for how far things have come.

Read this. Or at least take a look for some historical insight. It is both dry and interesting.

An annotated history of some character codes or ASCII: American Standard Code for Information Infiltration


It describes how we got here in excruciating detail, starting with Morse. Military communication systems had great influence. Before the ASCII, as we know it today, there was an "ASCII-1963" that was a bit different.

The long and winding path: Morse Baudot Murray ITA2 FIELDATA ASCII-1963 ASCII-1967

> SO (Shift Out), SI (Shift In): Escapes to and from an alternate character set. These are never interpreted in this fashion by Unix or any other software I know of [...]

Aren't SO and SI used by the ISO-2022-JP character encoding?

They're used by ISO 2022 in general. Free version: https://www.ecma-international.org/publications/standards/Ec...

Of course Unix didn't interpret them; the terminal did.

Linux does do something with SI and SO, at least on the actual framebuffer console. It enables the VT100 alternate character set, for drawing things.

The k,l,m,n,o,p,q,r,s,t,u,v,w keys then become useful for drawing lines

If you have a linux desktop, switch to an actual console (Ctl-Alt-F1), then:

echo -e \\x0E

And type a bit of lowercase characters between k and w. To get the terminal back to normal:

echo -e \\x0F

That's what the VT100 did with SI and SO. It's been 30? 35? years since I touched a VT100, but I think I recall there being a configuration bit to enable alternate character set on SI/SO.

Here you go: http://www.pcjs.org/devices/pcx86/machine/5170/ega/2048kb/re...

Probably not what you used to use, but "ctty com2" (and a couple of Returns directed at the VT100) will make it "go." I'm not 100% on what the keymapping is when you're in the setup screen (virtual Setup key at bottom-left).

Isn't BS used in combination with other characters to encode non-character terminal information, like text color changes? In some programs, `]\b` for example will change the text color.

The `]` character itself will not be printed, since `\b` will delete it from the visual line, and this effectively creates a side-channel for communicating "invisible" information within the regular character stream.

It's a pain to work with, though, since it makes things like `strlen` behave in very non-intuitive ways. Just imagine a string becoming longer when you delete the BS character. That's no fun.

^[ (control-[) is how escape is rendered. VT100/ANSI uses escape sequences for terminal control sequences (like positioning the cursor or changing the color).
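
A quick Python sketch of what that looks like in a string. It assumes an ANSI/VT100-compatible terminal, and `colored` is a hypothetical helper; 31 is the SGR code for a red foreground and 0 resets attributes.

```python
ESC = "\x1b"  # the character that terminals display as ^[


def colored(text: str, code: int = 31) -> str:
    """Wrap text in an SGR colour sequence followed by a reset."""
    return f"{ESC}[{code}m{text}{ESC}[0m"


message = colored("hi")
print(message)                    # shows just "hi", coloured, on an ANSI terminal
print(repr(message))              # the raw characters actually in the string
print(len(message), len("hi"))    # string length vs. visible length
```

The gap between the two lengths is the `strlen` headache mentioned above: the control sequences occupy bytes but no columns.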

Ha! The first technical book I ever bought was The RS-232 Solution. It is not impossible that I bought it at an airport bookstore--this was in the mid-1980s.

Hm, maybe that's how we can try fixing the "programmers don't know their ancestors' discoveries" issue? By "older" programmers blogging about things that are obvious to them, but apparently no longer obvious to other people? In a somewhat loose "old folk stories" style, but slightly more dense than the pure "funny folklore tales" like what Jobs said to Woz, or what esr quipped to ken?

Blogging and story telling isn't enough, if the audience doesn't care.

For example, there are lots of old papers from Xerox PARC, Burroughs, AT&T and many others freely accessible on the Internet.

For some of them, we have to thank the laborious work of people who bothered to digitize their original form, produced on plain typewriters.

Yet, I doubt many youngsters bother to read them.

I'm 21 years old, and the way Alan Kay speaks about the Xerox PARC work, Douglas Engelbart, Burroughs etc intrigues me. Then I try reading, for example, Engelbart's paper, and it's fucking huge! I just get bored so quickly with the way they're written, especially because they're so out of context. Many things were very different back then, so a lot of what I know right now either didn't exist yet, or will only confuse me due to my assumptions about how things were done back then.

Then again, I have read a few papers, it's just highly inconvenient. And in what kind of setting would you take the time to really read these anyway? Work setting? No, just get your work done. At home? I don't mind reading a bit during my own time, but I don't want to spend hours upon hours trying to understand something from a very different context. Academic? Yeah sure why not. But I'm not in academia!

If you have the resources, find out what museums are in your area, or plan trips to the museums that contain things you'd be interested in.

For example https://www.youtube.com/watch?v=MikoF6KZjm0 is in a museum, as is https://www.youtube.com/watch?v=NEbMksxQAgs. You can go and check them out and poke them (to some extent).

While watching videos is really awesome, actually being able to watch these things in action provides a level of context that is impossible to convey digitally.

I do agree that they might be a bit dry and it is hard to understand some of the magic when not having been in touch with it.

As for when to read them. I read them while commuting to clients, train and plane travels.

The vast majority of people don't need to care--nor should they. The intricacies of the ASCII character set are completely irrelevant for pretty much all computer work done today.

However, a small percentage of people will care, and it is that small group which will carry on this knowledge. Bemoaning that the "youngsters" don't bother is like complaining that most people don't bother learning Old English--there are some people who do learn it, presumably because they find it interesting, and those are the people that everyone else relies on for their occasional Old-English-reading needs.

In my opinion, the problem with papers is different: that of high cognitive barrier to entry (they're too dense for casual reading, and tend to build on silent assumptions well known only in narrow circles of specialists), and number overload. That's extrapolated (sorry!) from my personal attitude and experience with them. For this part, maybe a kind of "pop-science" (i.e. popularization) article could help, too. In fact, some such articles do appear on HN occasionally (e.g. some data structure akin to bloom filters recently, from what I recall). But actually that's not exactly what I attempted to express in my comment above.

What I tried to talk about was the other (if related) often repeated lament: that youngsters "forget" (in reality, never did know for starters) and reinvent stuff that was common knowledge/tech not long ago. I see this as something else than papers: I assume papers were more of a "bleeding edge" at the time they were written, and not all of them did spread to become well known (not to mention commonly understood) even at their time. For me (a tech mid-age?), the first realization of the trend came with some recent stories (a year or two ago?) about "a new app" for "woohoo, communication between mobile phones over audio!" I.e. modems, reinvented (not to mention the irony of phones now being more digital than analog, though still modem-ing over radio waves at lowest level... but I digress...).

That's the thing I believe old masters (but also commoners, like me! or you, whichever category you find yourself in) could try to popularize, so it doesn't get forgotten. And that's what I found brilliant in the original article. Re-pollination of "age"-old ideas, once common and obvious, now forgotten because out of use. In a lightweight, easily consumable and approachable tale.

That's the way of a wise old sage, telling young ones some lightweight and amusing stories by the campfire, but secretly hiding in them the good stuff he knew, or that he learnt the hard way in his own time. Engaging, amazing, and playing on curiosity, for the benefit of the new ones; not complaining and deriding. Playing with how to make the young ones care; tricking them into caring. That takes more effort, true; but then it may become a good proof whether one really cares about this stuff being remembered.

[edit] By the way, thanks for writing down your argument, especially because it made me flesh out some of my vague thoughts better, and explore them further even for my own understanding.

Agreed, that is why I advocate so much for better systems programming languages, as new generations should learn there were other options, even full OSes used for actual productive daily work, not just closed in some lab.

Raymond Chen at Microsoft has been doing exactly this for nearly 15 years (!) in his blog The Old New Thing: https://blogs.msdn.microsoft.com/oldnewthing/

It's an invaluable source of insight into why various weird things in Windows are the way they are.


Freed from such constraints, the Windows console subsystem metaphor is a CGA-style text mode display from the early 1980s. That's the kind of progress you just don't ever get with POSIX.

In 32-bit Windows (which you can still get). The NTVDM was removed from 64-bit Windows, and presumably the graphics hardware emulation was too since it's not used by anything else. On x86_64 cmd.exe is purely a console. [EDIT: I misunderstood; the exact component the parent commentator was referring to was not removed.]

I say "presumably" because cmd.exe is a fair bit of a rats' nest due to its organic development over the years. One of its biggest issues is that you must use cmd.exe in order to read/write stdio while a program is running; there's no API to do it any other way. (So you have to open a hidden window then use a bunch of fancy tricks to poll it every so often per second. Yup.)

However, this is being tracked and will (at last!) be fixed (apparently soonish): https://github.com/Microsoft/BashOnWindows/issues/111#issuec... (that's the relevant comment, the whole thread is interesting - there's actually a carefully-shot photo of the whiteboard all of this is on!)

The console window is a grid of (char,attr) - see ReadConsoleOutput/WriteConsoleOutput - i.e., CGA text mode. It's only a metaphor.
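As a rough illustration of the "grid of (char, attr)" idea, here is a Python sketch of such a screen buffer. This is not the Windows API: `Cell`, `TextScreen` and `make_attr` are hypothetical names, and the attribute layout (foreground in the low nibble, background in the high nibble) is an assumption modeled on CGA-style text mode.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    char: str = " "
    attr: int = 0x07  # assumed default: light grey on black


def make_attr(fg: int, bg: int) -> int:
    """Pack 4-bit foreground and background colors into one attribute byte."""
    return ((bg & 0x0F) << 4) | (fg & 0x0F)


class TextScreen:
    """A fixed grid of (char, attr) cells, 80x25 by default."""

    def __init__(self, cols: int = 80, rows: int = 25):
        self.cols, self.rows = cols, rows
        self.cells = [[Cell() for _ in range(cols)] for _ in range(rows)]

    def write(self, row: int, col: int, text: str, attr: int = 0x07) -> None:
        """Store text into the grid, clipping at the right edge."""
        for offset, ch in enumerate(text):
            if col + offset >= self.cols:
                break
            self.cells[row][col + offset] = Cell(ch, attr)

    def row_text(self, row: int) -> str:
        """Return only the characters of a row, ignoring attributes."""
        return "".join(cell.char for cell in self.cells[row])


screen = TextScreen()
screen.write(0, 0, "C:\\>", make_attr(fg=0x0E, bg=0x01))  # yellow on blue
print(screen.row_text(0).rstrip())
print(hex(screen.cells[0][0].attr))
```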

(As for standard handles, you can redirect them without needing cmd, but there's some fiddly stuff involved. I don't remember the exact details, but it's something like: redirect parent process's handles to appropriate pipes, spawn the child, retract the original redirection, access pipes as appropriate. The API implies that you can just supply the pipes directly, but it doesn't work that way, because Windows is stupid. Which everybody knows. (But what not everybody knows: it's still not as stupid as POSIX.))

Ah, I see.

I finally get it: cmd.exe must be running for the *ConsoleOutput API calls to work.

I don't have the specifics for POSIX pipe redirection, but I do know that you use dup2() to redirect everything appropriately. Oh... as for actually launching a terminal, yeah, that's insane. Or rather it has an insane history.
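For what it's worth, the dup2() approach on the POSIX side might look roughly like the following Python sketch (Unix only). `run_and_capture` is a hypothetical helper, only stdout is redirected, and error handling is minimal.

```python
import os
import sys


def run_and_capture(argv):
    """Fork, point the child's stdout at a pipe with dup2(), then exec argv."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make fd 1 (stdout) refer to the pipe's write end.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: close our copy of the write end so read() can see EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return b"".join(chunks), exit_code


output, code = run_and_capture([sys.executable, "-c", "print('hello from child')"])
print(output, code)
```

The important detail is that the parent closes its own write end; otherwise the read loop never sees end-of-file.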

No, the console window is separate from cmd - it's a builtin Windows thing. If your Windows program (that isn't cmd) needs a console, and it doesn't have one, it can call AllocConsole(), and one will be added for it. (A new console window will appear with your program's EXE's path in the title bar.)

If your program is tagged as a console program, Windows will usually open a console for it when run - or, at its parent process's discretion, it can be made to use the parent's console. This is how cmd.exe gets a window when you run it from Explorer, and it's how cmd /k runs the new cmd inside the same window when you run it from cmd itself.

No. A command interpreter is not a console.

* http://jdebp.eu./FGA/a-command-interpreter-is-not-a-console....

If it ain't baroque, don't fix it!

In fact maybe even if it is baroque don't fix it... https://www.youtube.com/watch?v=uKiCphIiO6c

I'm working on a greenfield project at a hotel where we've had to interface with a Mitel PBX over RS-232.

I was shocked how literally the ASCII codes were followed by the PBX. It sent an "ENQ" before each command, we had to send an ACK back, and then it sent us STX/ETX-delimited records.
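A minimal Python sketch of that kind of exchange, for the curious. The control code values are standard ASCII, but everything else is an assumption: this is not the actual Mitel protocol, `handle_stream` is a made-up name, and the record contents are invented.

```python
ENQ, ACK, STX, ETX = 0x05, 0x06, 0x02, 0x03  # standard ASCII control codes


def handle_stream(incoming: bytes):
    """Return (records, replies): STX/ETX-framed payloads and one ACK per ENQ."""
    records = []
    replies = bytearray()
    current = None  # bytearray while inside a record, None otherwise
    for byte in incoming:
        if current is not None:
            if byte == ETX:
                records.append(bytes(current))
                current = None
            else:
                current.append(byte)
        elif byte == ENQ:
            replies.append(ACK)  # answer each enquiry with an acknowledgement
        elif byte == STX:
            current = bytearray()
        # any other byte outside a record is ignored in this sketch
    return records, bytes(replies)


stream = b"\x05\x02ROOM 101\x03\x05\x02ROOM 102\x03"  # invented sample data
records, replies = handle_stream(stream)
print(records)
print(replies)
```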

I'm 32 and working today, in 2017. I hope I make stuff that lasts this long.

It's harder now, because any developer can make a fancy framework, draw some graphics, make a website, find a domain (on the .io TLD!) no one else has thought of yet, and look as good as (or better than) multimillion dollar companies. Anybody can compete now. And so, for the sake of landing a job, or networking/personal publicity, or even just for the experience, people do.

The problem is, frameworks and libraries a) are both easy and fun to write, b) are really hard to comprehensively test with full architectural coverage, and c) suggest some form of standardized behavior. While (b) creates technical debt, the main issue is (c) vs (a): we're flooded with "do it this way!" from a thousand groups, even in situations where the developer(s) didn't really intend for that to be their predominant statement.

With all this noise and chaos, it almost feels awkward to stick to old, icky, widely-hated legacy standards in the face of all this innovation. Or at least that's what it's felt like to me. Objectively thinking about it and comparing everything, though, nothing's perfect, but what's been around for a while has the combined benefit of a) having a fairly widespread mindshare, and b) having known solution patterns for a wide range of issues.

I guess what I'm trying to say is that building software on top of tried-and-tested methodologies is likely to produce long-lived results (which is fairly logical).

I'm also reminded of http://qdb.us/53151 :D

- I say "or better than" because large corporations often have standardized internal Web style guidelines and rendering toolkits/templating engines, and making sweeping changes in those is harder than for websites that are little more than a landing page and some documentation - so the bigger an enterprise is, the likelier it is that its website might look mildly dated.

If you want to geek-out further on ASCII have a look at Bob Bemer's site: https://www.bobbemer.com/ Bob is colloquially known as the "father of ASCII" (among other things) and his writing is fun to read and interesting.

Good article, brought back memories! A small correction:

> 56 kilobits per second just before the technology was effectively wiped out by wide-area Internet around the end of the 1990s, which brought in speeds of a megabit per second and more (200 times faster)

That should read 20 times faster?

> and more...

probably refers to the first widely installed (with coaxial cables) 10 MBit Ethernet networks...
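The arithmetic behind both readings, as a quick Python check. It assumes decimal units (1 kilobit = 1,000 bits, 1 megabit = 1,000,000 bits) and nominal line rates, not real-world throughput.

```python
modem_bps = 56_000          # 56 kilobits per second
megabit_bps = 1_000_000     # "a megabit per second"
ethernet_bps = 10_000_000   # 10 Mbit Ethernet

print(round(megabit_bps / modem_bps, 1))   # 17.9
print(round(ethernet_bps / modem_bps, 1))  # 178.6
```

So a single megabit is a bit under 20 times faster than a 56k modem, while 10 Mbit is closer to the article's 200 times.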

I wrote code on a teletype connected to an HP3000 in the mid-70's, so it's always a kick to see that technology mentioned here. Especially when it's described as "really old" which obviously it, and I, are :).

RS-232 is alive and well in the microcontroller space, thank you very much.

Pretty soon we'll have to start telling them what a command line interface is, commonly known today as a foreign and abstract concept defined by an acronym, "CLI".

I don't think so: just as reading, writing & speaking are still our primary forms of communication (rather than pointing & grunting), so too will CLIs last.

It could be deprecated fast though if either big manufacturers decide to ship GUI only systems, or if GUI systems became way more reliable. Or could live forever.

For example, we ship a certain device that can be configured fully from the CLI, from a web GUI and from a Windows application. Customers internally have teams that are 100% polarized: some teams use GUI stuff only and some teams use the CLI only. Both refuse to switch on principle. And in our case I think the CLI will die in 5-10 years. It is just a mess: you can do fast configuration with user-friendly extras in the CLI, but only on a small scale. It just doesn't allow user-friendly editing of kilometer-long configs, especially on 100s and 1000s of devices. So if we continue to improve GUI configuration, the CLI will eventually die, I think. The same could happen in general IT systems, provided there is a better alternative (or a forced alternative).

Right, that makes a lot of sense. If you developed the CLI version more and made it easier to (for example) batch-apply patterns against many datasets, and added sufficient facility to fix the other configuration gripes that are currently harder than in the GUI, well, the CLI would win some more.

I do have to agree, GUIs can be easier to use if they're well-designed. It's also arguably easier to build GUIs than CLIs in some situations, particularly where you don't need something to be fully Turing-complete.

The FIX protocol uses the SOH control character to separate message fields.
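A short Python sketch of what that looks like in the tag=value encoding. `parse_fix` is a hypothetical helper and the sample is illustrative only: it leaves out fields a real message would need, such as BodyLength and CheckSum, and the SENDER/TARGET values are invented.

```python
SOH = "\x01"  # ASCII Start of Heading, used here as the field delimiter


def parse_fix(message: str) -> list:
    """Split a tag=value message on SOH into (tag, value) pairs."""
    fields = []
    for part in message.split(SOH):
        if not part:
            continue  # skip the empty piece after the trailing SOH
        tag, _, value = part.partition("=")
        fields.append((int(tag), value))
    return fields


# Illustrative fragment: BeginString, MsgType (0 = Heartbeat), sender, target.
sample = SOH.join(["8=FIX.4.2", "35=0", "49=SENDER", "56=TARGET"]) + SOH
for tag, value in parse_fix(sample):
    print(tag, value)
```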

copy con com2
atz
atdt
ath0



True, but the tone in this piece is less noxious than average.

Yeah. If he stayed away from politics, race, sexuality, pop culture, and guns (and toned down the hyperbolic "community luminary" pose) he'd be in general a fairly entertaining writer. I'm certainly glad I read this bit and the attendant HN thread.
