
What happens when you grep the file you've redirected grep to? - luu
https://anniecherkaev.com/grep-your-way-to-freedom
======
userbinator
_which means that when we do the write in that for-loop, we’re guaranteed that
the read which happens after in the while-loop must see the result of that
write. So: we read an “a”, write an “a”, and when we get back to the while-
loop we’re guaranteed that the next read will see that “a” we just wrote._

In other words, it's behaving like a FIFO that persists a history of
everything that's been written to it, which also suggests that there are _two_
pointers, one for the read position and another for the write position; all
becomes clear when we realise that it's not reading and writing with the same
file _descriptor_, but rather _two_ file descriptors of _one_ file, opened
twice.

The tl;dr of why small sizes terminate and large sizes loop infinitely is
this: when the size of data output is not enough to fill the write buffer, the
read can reach the end of file (0 bytes read), and then it flushes the write
buffer before terminating; but when the output size fills the write buffer, it
gets flushed and ends up back in the input, causing the infinite loop.

The C stdio buffering layer doesn't guarantee that data written with a
fwrite() will show up in a fread() from a different file handle that
references the same file (and now that I think about it, that would seem to be
quite difficult to guarantee.)
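
A minimal shell sketch of the two-descriptor picture, assuming append-mode output as in the article's example. The descriptor numbers 3 and 4 and the scratch file are arbitrary choices for illustration. The shell keeps no long-lived stdio buffer between these commands, so the buffer-size effect described above should not appear here; the sketch only shows that the read position and the write position are independent.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
f="$dir/test.txt"
printf 'a\n' > "$f"

exec 3< "$f"     # descriptor 3: read position, starts at offset 0
exec 4>> "$f"    # descriptor 4: write position, always the end of the file

IFS= read -r first <&3        # consumes the original "a"
printf '%s\n' "$first" >&4    # appends a copy, like grep printing a match
IFS= read -r second <&3       # the reader now sees the line just written

exec 3<&- 4>&-                # close both descriptors
lines=$(wc -l < "$f" | tr -d ' ')
rm -rf "$dir"
echo "first=$first second=$second lines=$lines"
```

The second read succeeds only because the write landed in the same file the reader is still walking through.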

~~~
tankenmate
It's not overly difficult to guarantee, the UNIX/Linux kernel does it just
fine. It does require more work in that the libc file cache will need to
invalidate itself if the source file / socket changes state (either via atime
if the filesystem supports it or via inotify/fnotify). Obviously invalidating
the cache will lead to lower performance but more correct behaviour.

------
NiceGuy_Ty
I did this before when using ripgrep, and accidentally created a several
hundred GB file. Now ripgrep checks if this is the case and handles it
gracefully:
[https://github.com/BurntSushi/ripgrep/pull/310](https://github.com/BurntSushi/ripgrep/pull/310)

------
repsilat
A really fun one is having `bash` execute its output using a FIFO. It worked
when I tried it last, but it'd obviously be more than crazy to rely on it (for
more than one reason.)

~~~
userbinator
Or for even more fun, self-modifying shell scripts:

[https://stackoverflow.com/questions/3398258/edit-shell-scrip...](https://stackoverflow.com/questions/3398258/edit-shell-script-while-its-running)
(curiously, the accepted answer is incorrect and heavily downvoted)

...and Windows batch files:

[https://stackoverflow.com/questions/906586/changing-a-batch-...](https://stackoverflow.com/questions/906586/changing-a-batch-file-when-its-running)

~~~
jwbensley
Self modifying code is wandering into the (very interesting IMO) realms of
metamorphic and polymorphic code. I started an SO question here:
[https://stackoverflow.com/questions/10113254/metamorphic-cod...](https://stackoverflow.com/questions/10113254/metamorphic-code-examples)

I'll try to find some time soon to port that C+inline assembly example to
Linux; I didn't know much C or ASM back when I originally asked the question.

------
ScottBurson
A variation you can run into with GNU grep:

    
    
      grep -r [something] . >foo
    

I've done this by accident once or twice :-)

~~~
AnimalMuppet
The way I've done it:

    
    
      grep -r [something] * > foo
    

What I now do instead:

    
    
      grep -r [something] * > .foo
      mv .foo foo
    

Why it works: At least on bash, "*" does not match hidden files (files that
begin with "." such as ".foo").
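
A small sketch of that trick in a scratch directory, assuming default globbing (no `dotglob` or similar option set). The file names `a.txt`, `b.txt` and the pattern `needle` are made up for illustration.

```shell
dir=$(mktemp -d)
printf 'needle\n' > "$dir/a.txt"
printf 'needle\n' > "$dir/b.txt"

matched=$(
  cd "$dir" || exit 1
  # ".foo" is created by the redirection, but "*" expands only to
  # a.txt and b.txt, so grep never reads its own output.
  grep -r needle * > .foo
  mv .foo foo
  wc -l < foo | tr -d ' '
)
rm -rf "$dir"
echo "matched=$matched"
```

Running the same command a second time would pick up `foo` itself as an input, so the trick only protects the file currently being written.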

------
JepZ
Reminds me of how easy it is to destroy a file with sed:

    
    
      sed 's/a/b/' test.txt > test.txt
    

After that command the file is empty no matter what was in there before.
Instead you have to use the -i flag like:

    
    
      sed 's/a/b/' -i test.txt
    

And different operating systems seem to behave differently:

[https://stackoverflow.com/questions/5171901/sed-command-find...](https://stackoverflow.com/questions/5171901/sed-command-find-and-replace-in-file-and-overwrite-file-doesnt-work-it-empties)
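
Where `-i` is unavailable or behaves differently, one common workaround is a temporary file followed by a rename. This is a sketch under that assumption, not a claim about what any particular sed does internally; the `tmp.XXXXXX` template name is arbitrary.

```shell
dir=$(mktemp -d)
f="$dir/test.txt"
printf 'a\n' > "$f"

# Unsafe: the shell truncates the file before sed ever reads it.
sed 's/a/b/' "$f" > "$f"
unsafe=$(cat "$f")

# Safer: write the result elsewhere, then rename it over the original.
printf 'a\n' > "$f"
tmp=$(mktemp "$dir/tmp.XXXXXX")
sed 's/a/b/' "$f" > "$tmp" && mv "$tmp" "$f"
safe=$(cat "$f")

rm -rf "$dir"
echo "unsafe='$unsafe' safe='$safe'"
```

The `&&` matters: if sed fails, the original file is left alone.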

~~~
contingencies
My habit is to type _sed -i -e 's/pattern/replacement/' file_; adding _-e_ was
perhaps a subconscious way to protect against this eventuality, by mentally
separating it from my standard pipeline invocation, which takes neither
argument.

Anyway _sed_ is so fast you should definitely check the output before
overwriting stuff. Same goes for anything on the command line.

Get ZFS. Do snapshots.

~~~
ramshorns
> _you should definitely check the output before overwriting stuff_

That wouldn't really help in the case of redirecting sed to the same file.

    
    
        $ echo "a" > test.txt
        $ sed s/a/b/ test.txt
        b
        $ sed s/a/b/ test.txt > test.txt
        $ cat test.txt
        $

~~~
contingencies
Indeed. Use >> in preference to > at all times ;)

------
dima55
"apt install moreutils". Then look up the "sponge" tool

------
IgorPartola
Stupid question: in the cases where the input is a regular file, why doesn’t
grep check the size of the file at the beginning? Is it actually desired
behavior to search through stuff added to the file while we are processing the
beginning of it?

~~~
rocqua
In Unix we have the idiom of "Everything Is A File". As such, when grep gets a
file descriptor, it cannot always get the file-size, the file-size might be
infinite, or might vary over time. Instead, like others have said, the only way
to know you've read 'until the end' is for read to report that you've reached
the end of the file.

Consider for example what should happen when grepping /dev/null/? Or, a more
sensible case, piping the output of some command to grep. Grep will read from
'standard in' which is "Just A File" so it just calls read until it reports
end of file.

~~~
IgorPartola
> Consider for example what should happen when grepping /dev/null/

In the solution I gave, same thing as before because I said _regular file_.
That is if you can open /dev/null for reading, which I thought you couldn't.

I think I'd be fine with saying that the infinite loop is the behavior that
happens. GNU's grep obviously is doing some janky check which it shouldn't be.
Consider what it will do when you get just slightly more clever and it can't
determine the input/output types. The one on macOS has arbitrary file size-
dependent behavior, which is problematic. Doesn't seem like either of them
does a consistent thing.

------
raldi
> And about 15 up-arrow+enters later

Gonna blow your mind here: Press up _once_ and then Ctrl-O.

~~~
ship_it
Nice, works for `/bin/bash` but not `zsh`. Still a nice trick.

~~~
JdeBP
It works in at least version 5.2 of the Z shell, where Control-O is bound to
accept-line-and-down-history.

------
stormbrew
The section about the guarantees of read/write doesn't seem entirely correct
to me. For sure it's relevant to why the data is there, but the reason it
doesn't _terminate_ is just because read doesn't tell you there's an eof until
you try to read again past the end. It would be entirely possible to construct
a version of this loop that would terminate. Though it would be awkward.
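
A rough shell imitation of the loop under discussion, assuming append-mode output. The reader never gets an end-of-file result because every successful read is followed by a write that extends the file. The `limit` variable is a hypothetical safety cap added for this sketch; without it the loop would not stop.

```shell
dir=$(mktemp -d)
f="$dir/test.txt"
printf 'a\n' > "$f"

exec 3< "$f" 4>> "$f"    # 3 reads from the start, 4 appends at the end
count=0
limit=5                  # safety cap, not part of the original behaviour

# read only reports end of file when asked for data past the end,
# and here there is always one more line waiting.
while [ "$count" -lt "$limit" ] && IFS= read -r line <&3; do
  printf '%s\n' "$line" >&4
  count=$((count + 1))
done

exec 3<&- 4>&-
lines=$(wc -l < "$f" | tr -d ' ')
rm -rf "$dir"
echo "count=$count lines=$lines"
```

The loop exits because of the cap, never because read hit the end of the file.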

------
emilfihlman
>echo "a" > test.txt

test.txt will contain "a\n" not just "a"

-n to disable adding \n
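
A quick way to see the difference is to count bytes. This sketch uses `printf` for the newline-free case because the handling of `echo -n` varies between shells.

```shell
# echo appends a newline, so "a" becomes two bytes.
with_newline=$(echo "a" | wc -c | tr -d ' ')

# printf writes exactly what the format string says: one byte.
without_newline=$(printf 'a' | wc -c | tr -d ' ')

echo "with=$with_newline without=$without_newline"
```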

------
asicsp
good one..

when I checked on Linux, both `cat` and `grep` give an error when the input file
is the same as the output... but not `sed/awk/head/tail/sort/etc`..

~~~
goialoq
If you redirect output, how can the command know the name of the destination?

~~~
Karliss
Somewhat more common use of this is to check if output goes to terminal so
that terminal color codes don't get printed to files.
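
A sketch of that kind of check in shell, using the standard `-t` test on file descriptor 1. The function name `colorize` is made up for illustration.

```shell
# Emit color codes only when stdout is a terminal.
colorize() {
  if [ -t 1 ]; then
    printf '\033[31m%s\033[0m\n' "$1"   # red text, then reset
  else
    printf '%s\n' "$1"                  # plain text for files and pipes
  fi
}

# Inside a command substitution stdout is a pipe, so no color is emitted.
plain=$(colorize match)
echo "plain=$plain"
```

The program never learns the destination's name here; it only asks what kind of thing descriptor 1 is attached to.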

~~~
rocqua
Which is really annoying when you want to pipe grep to less, and need to add
--color=always to get grep to understand less takes color codes (when using
less -R).

------
JdeBP
I did enjoy how about halfway down the author bemoaned the fact that the C
language does not have line numbers. (-:

~~~
jwilk
It was phrased poorly, but I think they meant that Apple's source browser
doesn't show line numbers.

------
vijaybritto
Brilliant write up. Was very easy to grasp!

------
starpilot
Some things are not meant to be questioned.

~~~
chickenthief
Agreed, this analysis helps nobody.

------
ourmandave
Ugh. Flashbacks of my older brother's torment when he wanted to use the
computer and I wouldn't get out of the way fast enough.

"Quit grepping yourself! Quit grepping yourself!"

#IKnowYouAreButWhoAmI?

~~~
taneq
Ah, the oldschool version of "quit googling yourself."

