Any interesting stories to share? I think we'd all be interested.
Mind if I ask a question directly related to one of my comments a few days ago? I was lamenting the fact that the ASCII codes 28–31 (the File, Group, Record, and Unit Separators) never became widely implemented, even though they were designed specifically to delimit data. That is, one could include commas, newlines, carriage returns, etc. in data cells without any clash. But instead CSV seems to be the most common standard for tabular data.
Were these ASCII codes ever considered for tabular files?
At Yahoo, the typical delimiters for logs and whatnot were ctrl+a, ctrl+b, etc. It was slightly nicer than CSV, but only slightly. It was mostly nicer when manually inspecting files with columns that had embedded commas (that otherwise would have been escaped). The machines don't care, and for any interesting processing you'd often end up with escaping anyway.
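For anyone curious what the separator-control scheme looks like in practice, here is a minimal sketch in Python (the sample rows are invented): because US and RS never appear in ordinary text, no quoting or escaping layer is needed at all.

```python
# Sketch: tabular data delimited with the ASCII separator controls
# US (0x1F, unit/field separator) and RS (0x1E, record separator)
# instead of CSV. Field values may then contain commas and newlines
# freely, with no quoting or escaping.
US = "\x1f"   # between fields
RS = "\x1e"   # between records

def encode(rows):
    """Join each row's fields with US, and the rows with RS."""
    return RS.join(US.join(fields) for fields in rows)

def decode(text):
    """Inverse of encode: split on RS, then on US."""
    return [record.split(US) for record in text.split(RS)]

rows = [["name", "notes"],
        ["Smith, J.", "first line\nsecond line"]]
assert decode(encode(rows)) == rows
```

The round trip is exact as long as the data itself never contains US or RS, which is the whole point of reserving them as control codes.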
Emacs can edit files in this format without any extra work (it displays them as ^\, ^_, etc., with a different text color so that you can easily distinguish them from character sequences like "^" followed by "\") but maybe you mean to say that Emacs by itself doesn't understand the hierarchical structure of such a file.
This is easily fixed. You can get Emacs forms-mode for a file with these delimiters as follows:
(setq forms-field-sep "\036")
(setq forms-multi-line "\037")
(setq forms-read-file-filter 'forms-replace-gs-with-newlines)
(setq forms-write-file-filter 'forms-replace-newlines-with-gs)
(setq forms-file "fsgsusrs.data")
(setq forms-number-of-fields (forms-enumerate '(name aliases wikipedia employer notes)))
"Project for a New American Century conspirators\n\n"
"\n Name: " name
"\n Aliases: " aliases
"\n Employer: " employer
"\nWikipedia URL: " wikipedia
(defun forms-replace-newlines-with-gs ()
  (goto-char (point-min))
  (while (search-forward "\n" nil t)
    (replace-match "\035" nil t)))

(defun forms-replace-gs-with-newlines ()
  (goto-char (point-min))
  (while (search-forward "\035" nil t)
    (replace-match "\n" nil t)))
You might further argue that this would create incompatibilities between different systems, and so of course everyone would just use the same data file format. Even today that seems implausible: JSON, various dialects of CSV (with tabs, commas, doublequoted commas, pipes, and colons as the most common delimiter conventions), SQL dumps, and HTML are all in common use, and in the context of the 1960s and 1970s it seems even less plausible. Remember that there were at least five widely used conventions for how to separate lines in ASCII text files up to the 1980s: \r\n (from teletypes), fixed-width 80-byte records (from punched cards), \n (from Unix), \r (from PARC, used in Smalltalk, Oberon, and the Macintosh), and \xfe (Pick, see below). And the PDP-10 used a six-bit variant of ASCII called SIXBIT, the PDP-11 used ASCII, IBM used EBCDIC, and UNIVAC used FIELDATA.
This is a time when even computers from the same manufacturer couldn't agree on how many bits were in a character, much less how to delimit fields in data files. Thus even my attempt to save your argument is invalid.
Interestingly, there was a popular system that worked this way, with non-printable delimiters to divide up different levels of a hierarchical data structure represented as a string: Pick. But Pick didn't use FS, GS, RS, and US in storage either, and although it did use them in the user interface, it used them backwards. Pick's "items" (usually used like records in a database, but accessed like files in a directory) were divided into "attributes", corresponding to database fields, by the "attribute mark", byte 254, displayed as "^" (or often as a line break) and entered as control-^ (RS); the attributes were divided into "values" by the "value mark", byte 253, displayed as "]" and entered as control-] (GS); and values could be divided into "sub-values" by the "sub-value mark", byte 252, displayed as "\" and entered as control-\ (FS).
Pick also reserved byte 255 to mark the ends of items, like NUL (\0) in C, or ^Z (\x1A) in CP/M and early MS-DOS. It called it the "segment mark"; it was displayed as "_" and entered as control-_ (US).
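The three-level Pick scheme described above can be sketched in a few lines of Python. The delimiter byte values are the ones given in the comment (0xFE, 0xFD, 0xFC, treated here as single characters for simplicity); the sample item is invented.

```python
# Pick's hierarchical delimiting: the attribute mark (0xFE) splits an
# item into attributes, the value mark (0xFD) splits an attribute into
# values, and the sub-value mark (0xFC) splits a value into sub-values.
AM, VM, SVM = "\xfe", "\xfd", "\xfc"

def parse_item(item):
    """Parse a Pick item string into a list of attributes,
    each a list of values, each a list of sub-values."""
    return [[value.split(SVM) for value in attribute.split(VM)]
            for attribute in item.split(AM)]

item = "SMITH" + AM + "555-1234" + VM + "555-5678" + AM + "A" + SVM + "B"
assert parse_item(item) == [[["SMITH"]],
                            [["555-1234"], ["555-5678"]],
                            [["A", "B"]]]
```

Note that, unlike a schema-based format, the nesting is purely positional: every item decodes into the same three-level shape, and the application decides what each attribute means.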
Note that ASCII-1963 had eight separator control characters instead of just four: http://worldpowersystems.com/J/codes/#S0
I wish we had more of the former and less of the latter. I suspect no forum is ever safe from eternal September.
Something like this could filter out the "random geek wanna-be with an axe to grind" type post.
How would, say, something like Thompson's "Trusting Trust" (though I suppose that was published in ACM or IEEE), or a Dijkstra or Pike blog post, rate?
Comments from, say, Linus Torvalds on the LKML, or Lennart Poettering on systemd, or Bill Gates' various book recommendations, etc.?
Google Sheets is surprisingly the worst in this department. It's pretty much impossible to prevent Google Sheets from converting 5/7 to a date and 0123 to a number (losing the leading 0, of course, and rendering the data invalid). No, ' is not the answer.
"If you move your mouse pointer continuously while the data is being returned to Microsoft Excel, the query may not fail. Do not stop moving the mouse until all the data has been returned to Microsoft Excel."
I've always wanted to know why Method 2 works!
On G+, Noah Friedman, who was part of the team that worked on the code, has occasionally inquired about the availability of some early Emacs code. Pre-1990, if I recall.
I'm not sure he ever turned that up even as a standalone tarball, let alone from a revision control repository.
It seems like if parsing fails it should throw an exception and fall back to regular CSV parsing.
Am I missing something?
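A minimal sketch of the fallback being suggested, in Python. Here `parse_fancy` is a hypothetical stand-in for whatever the stricter specialized parser is; the only point is the try/except structure around it.

```python
import csv
import io

def parse_fancy(text):
    # Hypothetical strict parser: here it demands ASCII unit-separator
    # (0x1F) delimited fields and raises ValueError otherwise.
    if "\x1f" not in text:
        raise ValueError("not separator-delimited")
    return [line.split("\x1f") for line in text.splitlines()]

def parse_with_fallback(text):
    """Try the strict parser first; fall back to plain CSV on failure."""
    try:
        return parse_fancy(text)
    except ValueError:
        return list(csv.reader(io.StringIO(text)))

assert parse_with_fallback("a,b\nc,d") == [["a", "b"], ["c", "d"]]
```

The usual caveat with this pattern is deciding what counts as "failure": a strict parser that silently accepts malformed input (rather than raising) will never trigger the fallback.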