The newline cat mystery (petefreitag.com)
46 points by rmason 6 months ago | 44 comments



"cat wasn't displaying anything" ?

perhaps: "the terminal correctly displayed the file, overwriting the printed line at each CR, with the last line blank"

https://unix.stackexchange.com/questions/355559/bash-and-car...


That overwrites, though. It doesn't delete.

Foofoo\rbar\r

Should show as

barfoo

With the cursor over the start of that line.
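A minimal Python sketch of that overwrite behavior, assuming a toy single-line terminal in which "\r" only moves the cursor back to column 0. The name `render_line` is made up for illustration and is not taken from any real terminal emulator:

```python
def render_line(text):
    """Return what a single terminal line would show after printing text.

    Assumption: "\r" moves the cursor to column 0 and erases nothing;
    every other character overwrites the cell under the cursor.
    """
    cells = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0  # carriage return: move the cursor, keep the cells
            continue
        if col < len(cells):
            cells[col] = ch  # overwrite an existing cell
        else:
            cells.append(ch)  # extend the line
        col += 1
    return "".join(cells)


print(render_line("Foofoo\rbar\r"))
```

Nothing is erased here: the trailing "foo" survives only because nothing was written over it.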


If there's no newline, cat won't output one, and bash will put the prompt right after:

    [dale@host ~]$ echo -n "hi" > test
    [dale@host ~]$ cat test
    hi[dale@host ~]$

If you add a \r, then the cursor goes back to the start of the line, and the prompt overwrites the "hi".

    [dale@host ~]$ echo -en "hi\r" > test
    [dale@host ~]$ cat test
    [dale@host ~]$


Yeah except the file ends in \r so the OP may have had their prompt longer than any of the lines (e.g. [user@hostname cwd] $) such that they couldn’t tell.


depends on the terminal


What terminal interprets "\r" as "clear the line"? I would have thought that is considered a bug.


most of them do. it's what a /r means. it's how cli programs can easily output a spinning wheel/similar for busy status.


You want to know how I know you never wrote your own C program or terminal emulator?

/r means "slash, then the letter r". \r means carriage return, which doesn't clear the line, just moves the cursor to the left margin. Unless you're on an Apple ][, in which case CHR$(13) moves to the beginning of the next line, but still doesn't clear anything. And CHR$(4) is where the real magic happens, but it has to follow a CHR$(13)!

]PRINT CHR$(4) + "INIT HELLO"


you're absolutely right. It's impossible that I've ever written any C, and just fat fingered the / and \.

...

Point is, \r goes to the beginning of the line, without going to the next one, as you reiterated, so you can output a spinner using

    -
    /
    -
    \
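A rough Python sketch of such a spinner, using the four frames listed above. The function names and the `delay` parameter are invented for illustration. Since "\r" clears nothing, the sketch overwrites the last frame with a space when it finishes:

```python
import itertools
import sys
import time

FRAMES = "-/-\\"  # the four frames listed above


def spinner_frames(steps):
    """Yield `steps` strings, each a carriage return followed by one frame."""
    for _, frame in zip(range(steps), itertools.cycle(FRAMES)):
        yield "\r" + frame


def spin(steps, delay=0.1, out=None):
    """Draw the spinner in place on one line (hypothetical helper)."""
    out = out or sys.stdout
    for chunk in spinner_frames(steps):
        out.write(chunk)  # "\r" returns to column 0, the frame overwrites
        out.flush()
        time.sleep(delay)
    out.write("\r \r")  # overwrite the last frame with a space
    out.flush()
```

Calling `spin(20)` in a terminal should animate a single character for about two seconds.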


No, "\r" means return the cursor to the start of the line without clearing the line.

Try this:

  $ echo -ne "ooo\rf\n"
  foo
If a "\r" cleared the line, you would just see an "f", not a "foo".


isn't that what I said?


You want to know how I know you never wrote your own text editor?

Carriage return is a carriage return, not a delete or backspace or clear line.


>Carriage return is a carriage return

To be fair it isn't.

On typewriters it also adds a new line. On Mac, it is (used to be?) the new line character.

I don't know how non vt terminals handled it.


who said anything about clearing? I just meant that \r goes to the beginning of the line, so you can output a busy spinner


> What terminal interprets "\r" as "clear the line"? I would have thought that is considered a bug.

>> most of them do. it's what a /r means. it's how cli programs can easily output a spinning wheel/similar for busy status.

You did.


Output that doesn't end in LF is pretty annoying in general, since the shell prompt gets visually tacked on to the end. I have this as part of my PROMPT_COMMAND:

    printf "%%%$((COLUMNS - 1))s\\r%1s\\r"
Effectively, it adds the missing newline in those cases and puts a '%' at the end of the output to let you know. The implementation is a tad bit of printf trickery that leverages your bash's line wrapping.
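To see why the trick works, here is a toy Python model of it. It assumes a terminal with automatic margins that defers the wrap until the next printable character arrives, which is the usual VT100-style behavior but is not guaranteed everywhere. `simulate` and `prompt_trick` are made-up names, and `columns` stands in for `$COLUMNS`:

```python
def prompt_trick(columns):
    """The characters the printf format above expands to."""
    return "%" + " " * (columns - 1) + "\r" + " " + "\r"


def simulate(data, columns=10):
    """Feed data to a toy terminal and return its rows, right-stripped."""
    rows = [[" "] * columns]
    row = col = 0
    pending_wrap = False  # True right after a character lands in the last column
    for ch in data:
        if ch == "\r":
            col, pending_wrap = 0, False
        elif ch == "\n":
            # Treated as CR+LF, as a tty normally does on output.
            row, col, pending_wrap = row + 1, 0, False
        else:
            if pending_wrap:  # deferred wrap happens only now
                row, col, pending_wrap = row + 1, 0, False
            while len(rows) <= row:
                rows.append([" "] * columns)
            rows[row][col] = ch
            if col == columns - 1:
                pending_wrap = True
            else:
                col += 1
    return ["".join(r).rstrip() for r in rows]


print(simulate("hi" + prompt_trick(10) + "$ "))    # partial line
print(simulate("hi\n" + prompt_trick(10) + "$ "))  # complete line
```

In this model, when the output ends without a newline, the padding spills onto the next row, so the '%' stays visible and the prompt starts on a fresh line. When the cursor is already at column 0, the padding exactly fills one row, and the single space then overwrites the '%'.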


IIRC ZSH auto-fixes partial lines, and can display a marker if you enable PROMPT_SP (I don't think it's enabled by default):

> Attempt to preserve a partial line (i.e. a line that did not end with a newline) that would otherwise be covered up by the command prompt due to the PROMPT_CR option. This works by outputting some cursor-control characters, including a series of spaces, that should make the terminal wrap to the next line when a partial line is present (note that this is only successful if your terminal has automatic margins, which is typical).

> When a partial line is preserved, by default you will see an inverse+bold character at the end of the partial line: a ‘%’ for a normal user or a ‘#’ for root. If set, the shell parameter PROMPT_EOL_MARK can be used to customize how the end of partial lines are shown.


I used to leave a file called README on my public ftp directory, which contained only:

cat: README: No such file or directory

I'd occasionally get email from frustrated people who had trouble trying to read the README file, so I'd tell them to simply run "emacs README", and emacs would solve all of their problems. I don't know if my passive aggressive emacs evangelism ever worked, because I never heard back from them.


> Still I find it very odd that cat didn't show anything, perhaps it is because the carriage return, brings the cursor back to the beginning of the line, effectively erasing the line that was just output.

Bringing the cursor back doesn't delete what was output. It's writing over it with the second line and the shell prompt that deletes it.

If you set `PS1='$ '`, one should be able to see part of both lines:

  $ printf '"order_id","date"\r"1","2023-01-01"\r'
  $ ","2023-01-01""


I don’t understand this part:

> Now one thing that I should have noticed was that the file size was 9 bytes

The file contains more than 9 characters. What was the author trying to say?


Probably not this simplified example.


He saw nothing print when he used 'cat' on the file. Seeing it be 9 bytes was strange because it just printed out nothing.


But it contained more than 9 bytes if you look at his example.


I usually pipe to od (od -c) whenever I'm not sure about stuff like that.


or hd (hexdump -C)


I'm a big fan of `sed -nel` for an unambiguous listing of the contents of a file. The only case that sed doesn't show is a missing trailing linefeed.


I use "cat --show-all" at least partly because it's more teachable: when I run into this I'm usually helping untangle someone who hasn't run into the problem before, and it's more explicitly about the problem instead of being a related thing that happens to expose it (like this, or hexdump).
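For anyone without GNU cat at hand, a small Python approximation of that display: control bytes in caret notation and a "$" at each line end. `show_all` is a made-up helper, and bytes above 0x7F are rendered as hex escapes here, which is a simplification rather than what cat prints:

```python
def show_all(data):
    """Render bytes roughly the way `cat --show-all` would."""
    out = []
    for b in data:
        if b == 0x0A:
            out.append("$\n")  # mark the end of each line
        elif b < 0x20:
            out.append("^" + chr(b + 0x40))  # e.g. CR -> ^M, TAB -> ^I
        elif b == 0x7F:
            out.append("^?")
        elif b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)  # simplification, not cat's notation
    return "".join(out)


print(show_all(b'"order_id","date"\r"1","2023-01-01"\r'))
```

The stray carriage returns show up as ^M instead of silently moving the cursor.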


why would you show a missing trailing linefeed? it's missing... do you mean it will do a single CR-LF whether there was a NL there or not?


hexdump -C

...this is the only way.


This awakened dim memories of trying to duplicate curses functionality and reading sources for that and termcap.

share and enjoy

https://www.gnu.org/software/termutils/manual/termcap-1.3/ht...


I’m impressed that there are people who mix what I think is Visual Basic (right?) with UNIX command line tools. Hadn’t seen that one coming!


cat does have parameters to show nonprinting characters.


  cat -v
For a very short file, I might reach for hexdump -C


xxd


This is one of the places that "do one thing" principle breaks down. Or a bad habit became a bad definition for the use case of a command and it stuck for newbies.

`cat` usually does one thing: concatenate the contents of the inputs you give in the order you give them. Of course, if the user gives only one input, it will only print that.

It is not a text viewer even though many *nix users use it that way because they learned it that way, because a series of tutorials taught the wrong thing to generations. The very first one probably came from a very limited and very bad Unix environment like a server that is hostile to active development. So the tutorial used whatever core utility that's available: cat, beginning the multi-generational bad habit.


The "do one thing" is failed by text itself - text encodes backspacing and such. It even has a bell character. This has no reason to be built into our text format.

Saying "cat is not a text viewer" is absurd, though - the history of Unix is that the "print" command was removed when the Unix devs realized that cat did the same thing already, and so told their users to just use cat with a single argument.

I'm not saying it was good design, but it was intended design.


> It even has a bell character. This has no reason to be built into our text format.

it's not built into our text format (our being us,-unix-before-unicode). It's built into the terminal.

The file format is 8-bit clean as it should be, any other values are going to have to be out-of-band. The unix stream format follows the file format, with the exception of

the terminal format (and the comms format) is 7-bit ascii with varying amounts of ANSI control, but that's on you, you bought the terminal...


ASCII characters below 0x20 (32 decimal) are all control characters. Backspace is a control character. Carriage return is technically a control character. There are CSV-like field and record separator characters. Escape is another non-printable ASCII character.

ANSI escape sequences do also include control codes but there’s plenty in ASCII too.

Frankly, you only need to spend 30 seconds looking at an ASCII table to see this.

Source: I’m an author of an alt $SHELL
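If you don't have an ASCII table handy, a few lines of Python make the same point. The names are picked for illustration, and only the control characters mentioned in this thread are labeled:

```python
# Mnemonics for a handful of ASCII control codes discussed here.
CONTROL_NAMES = {
    0x07: "BEL",  # bell
    0x08: "BS",   # backspace
    0x0A: "LF",   # line feed
    0x0D: "CR",   # carriage return
    0x1B: "ESC",  # escape
    0x1C: "FS",   # file separator
    0x1D: "GS",   # group separator
    0x1E: "RS",   # record separator
    0x1F: "US",   # unit separator
    0x7F: "DEL",  # delete
}


def is_ascii_control(code):
    """True for the ASCII control codes: 0x00-0x1F and 0x7F."""
    return 0 <= code < 0x20 or code == 0x7F


CONTROL_CODES = [c for c in range(128) if is_ascii_control(c)]
print(len(CONTROL_CODES), "of 128 ASCII codes are control characters")
```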


sorry, but ASCII is ANSI, so the ASCII control characters are ANSI characters, and I didn't even mention "ANSI's escape sequences" which are an escape from ASCII's single character meanings, but use the ASCII ESC character as it was intended, to escape within the ASCII framework. ANSI makes people think about controlling devices and ASCII also controls devices (not to mention bidirectional comms) so I think it's a good way to think of it.

Since we've all read the docs, but a bunch of you are laboring under misimpressions, I'm trying to say things in an unusual but truthful way to be the iceberg to the titanic of misunderstanding that thinks the BEL character doesn't belong in ANSI's ASCII protocol


I'm not sure what you mean: ASCII text encoding is full of control characters, most of them being only historical and not used by anything, including terminals.


ASCII, an ANSI code, includes the concept "information interchange" in the name, i.e. it is also a communication protocol, and also a terminal device control, inasmuch as it is also a character set. That's why OP was making a mistake claiming that "BEL" doesn't belong in there.

ANSI's escape sequences came later to control more functions for glass teletypes.


I would like to counter-argue that it's the author's fault for doing things the wrong way.

While I'm sure the actual code looked different, the "spirit" of the code builds a CSV file by concatenating strings and specifying raw ASCII codes. Building a CSV file like this in 2023 is guaranteed to cause bugs down the line.

Had the code used the Python CSV primitives, or at least had the author let Python convert the "\n" character to the OS-dependent representation, the bug would have been avoided entirely. The author just happened to run into it with `cat` instead of, say, a CSV parser confused about the unusual choice.
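As a sketch of what that could look like, assuming Python and its standard csv module (the sample rows are the ones quoted elsewhere in this thread, and `build_csv` is a made-up helper):

```python
import csv
import io


def build_csv(rows, lineterminator="\n"):
    """Serialize rows with the csv module instead of joining strings by hand."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [["order_id", "date"], ["1", "2023-01-01"]]
print(repr(build_csv(rows)))
```

Note that csv.writer terminates rows with "\r\n" unless told otherwise, so the line ending is still an explicit choice. Either way, every row ends with a line feed, so `cat` would display the file normally.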


It's ColdFusion, not Python. A language that forces you to program like it's the 60s because character escapes are too advanced a concept to implement.


The author says it was a simplified example. They may well have implemented a fully featured CSV emitter but got the line separator wrong. Either way, whether or not their CSV was valid is irrelevant to the story. The point is that some characters in a file may obscure the contents of that file in surprising ways.



