ls | grep "echo ${data}" – Why/how does this work? (zerobin.net)
76 points by indigodaddy on May 8, 2018 | 46 comments



  grep "echo ${data}"
means that when grep is exec-ed, it receives these arguments:

   argv[0] = "grep" /* possibly full path, irrelevant here */
   argv[1] = "echo <contents of data variable>"
   argv[2] = NULL 
If this does what you want, it means that you wanted to look for strings starting with "echo ". All of argv[1] is the pattern to look for (because it doesn't look like a command line option). Since there are no other arguments, grep scans its standard input.
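One way to check this is to swap grep for a small function that prints the arguments it was handed. This is only a sketch: show_args is a made-up helper, and the value of data is an assumption for illustration.

```shell
#!/bin/sh
# show_args is a hypothetical helper (not from the thread): it prints each
# argument it receives on its own line, like a dump of argv[1..n].
show_args() {
    i=1
    for arg in "$@"; do
        printf 'argv[%d] = <%s>\n' "$i" "$arg"
        i=$((i + 1))
    done
}

data='testfile1 testfile2'   # assumed value, for illustration only

# Inside double quotes the shell expands ${data} but keeps the whole
# string together as ONE argument.
quoted=$(show_args "echo ${data}")
printf '%s\n' "$quoted"
```

The single line printed is the one pattern grep would receive, literal "echo " prefix included.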

OTOH

   grep `cat filelist.txt`
gets execed like this:

   argv[0] = "grep"
   argv[1] = "<first 'word' from filelist.txt>"
   argv[2] = "<second 'word' from filelist.txt>"
   ...
   argv[n] = "<last 'word' from filelist.txt>"
   argv[n+1] = NULL
Needless to say, grep doesn't work this way; it doesn't accept multiple patterns as arguments. We cannot use "grep foo bar baz file" to scan file for occurrences of foo, bar or baz. Rather, what this means is to scan bar, baz and file for occurrences of foo.

Also, there is the risk that material from filelist.txt looks like command line options for grep. If filelist.txt contains "-v foo bar" then grep `cat filelist.txt` means exactly the same thing as "grep -v foo bar".
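A rough sketch of that word splitting, using a throwaway directory. The file names and contents here are assumptions made up for the demonstration, not taken from the original paste.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'foo\nbar\nbaz\n' > "$tmp/filelist.txt"       # three "words"
printf 'a line with foo in it\n' > "$tmp/bar"        # a file named bar
printf 'nothing relevant\n' > "$tmp/baz"             # a file named baz

# Unquoted substitution: the output is split into three separate words.
nargs=$(cd "$tmp" && set -- $(cat filelist.txt) && echo "$#")
printf 'number of arguments: %s\n' "$nargs"

# So grep sees pattern "foo" plus files "bar" and "baz"; stdin is not read.
result=$(cd "$tmp" && echo 'foo on stdin' | grep $(cat filelist.txt))
printf '%s\n' "$result"

rm -rf "$tmp"
```

The match is reported with a file name prefix, which shows it came from the file bar and not from the pipe.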

grep has a -f option which can specify a file whose lines are patterns:

   grep -f filelist.txt
If we use the -F option, those patterns are interpreted as fixed strings, like the single pattern given on the command line.
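A small sketch of both options together. The pattern file and the list of names are invented for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'testfile1\ntestfile2\n' > "$tmp/patterns.txt"   # one pattern per line

names=$(printf 'nomatchfile1\ntestfile1\ntestfile10\ntestfile2\ntestfile3')

# -f reads the patterns from the file; -F treats each one as a fixed string.
matched=$(printf '%s\n' "$names" | grep -F -f "$tmp/patterns.txt")
printf '%s\n' "$matched"

# What -F changes: without it, "." is a regex wildcard.
regex_hits=$(printf 'a.c\nabc\n' | grep -c 'a.c')      # both lines match
fixed_hits=$(printf 'a.c\nabc\n' | grep -c -F 'a.c')   # only the literal a.c

rm -rf "$tmp"
```

testfile10 is in the output because it contains the string testfile1; adding -x would require whole-line matches.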


fgrep is the way to go here. It uses a different algorithm that makes only one pass through the file while searching for all strings simultaneously. An extension of trie search.


fgrep is an obsolete command that didn't make it into POSIX.

I already mentioned the -F option of grep.

"fgrep is the same as grep -F" -- GNU grep man page.

I would expect that when the patterns in -f <patternfile> are treated as regular expressions, they are also combined into one regex.


I know it's the same thing. I prefer the name fgrep because it does do a different thing. I write "grep -F" in scripts, but I say fgrep.


I don't understand how this is an answer. If I follow your argument correctly, and totally ignoring for a moment what ${data} might evaluate to, then for the first command to produce any output it is necessary for the output of ls to at least contain the string "echo". But we can see that it does not, yet the claim (which I haven't tested) is that it does produce output. Why is that?


It doesn't work, at least in a POSIX-like shell:

  0:terada:~$ mkdir testdir
  0:terada:~$ cd testdir
  0:terada:~/testdir$ data="testfile1 testfile2 testfile3"
  0:terada:~/testdir$ touch $data
  0:terada:~/testdir$ ls -1
  testfile1
  testfile2
  testfile3
  0:terada:~/testdir$ ls | grep "echo ${data}"
  1:terada:~/testdir$ 
I've seen this sort of shenanigan more than once in comp.unix.shell; usually followed by "oops, I somehow confused myself and can no longer reproduce this" type of backpedaling.

The submitted web page doesn't have a properly presented repro test case; it's writing about something which allegedly happened on some machine somewhere, possibly with inaccuracies and omissions.

We don't know what shell this is. Maybe the command that was actually run was ls | grep `echo ${data}`, data happened to be empty at the time, and the shell being used expanded that backtick expression to an empty argument instead of expanding to no argument. An empty regex matches everything.
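For comparison, here is a sketch of what a POSIX-like shell does in the two cases, with data deliberately set to the empty string (an assumption, to mimic the scenario guessed at above).

```shell
#!/bin/sh
data=''

# Quoted substitution: grep receives one empty argument, and the empty
# regex matches every input line.
all=$(printf 'one\ntwo\n' | grep "$(echo ${data})")
printf '%s\n' "$all"

# Unquoted substitution: the empty result vanishes, grep gets no pattern
# at all and fails with a usage error (exit status greater than 1).
printf 'one\ntwo\n' | grep $(echo ${data}) >/dev/null 2>&1
status=$?
printf 'exit status without a pattern: %s\n' "$status"
```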


This was done on CentOS 7.3.x (not CentOS 6 as I had incorrectly first noted), and the following versions of bash and grep:

"GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)"

"grep (GNU grep) 2.20"

So where $data is a newline delimited list of items/patterns:

  grep "${data}" 
..will indeed match each item/pattern individually and grep over each, eg, just as grep -f does against a file with a list of patterns. It doesn't work without the double quotes, and putting the echo in front was obvious derp in hindsight. See my comment where I replied to uraza and demonstrated this.


Aha! This is a POSIX-required behavior, not documented in the GNU "info grep" or man page.

According to POSIX, the syntax is:

  grep [-E|-F] [-c|-l|-q] [-insvx] -e pattern_list
What is pattern_list? "The pattern_list's value shall consist of one or more patterns separated by <newline> characters"
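A minimal sketch of a newline-separated pattern_list passed through -e. The sample input is made up.

```shell
#!/bin/sh
# Two patterns in one argument, separated by a newline.
patterns=$(printf 'foo\nbar')

# grep prints every line that matches ANY of the patterns.
out=$(printf 'foo\nbaz\nbar\n' | grep -e "$patterns")
printf '%s\n' "$out"
```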

In the GNU man page and info, the pattern argument is referred to as the PATTERN metavariable, a noun in singular form. The only hint I can find that it can contain a plurality is this, in the man page:

       -F, --fixed-strings
              Interpret PATTERN as a list of fixed strings (instead of regular
              expressions), separated by newlines,  any  of  which  is  to  be
              matched.
This drops the hint that there may be multiple regular expressions in PATTERN separated by newlines, which may instead be treated as fixed strings.

A similar description is in the Info manual.

Elsewhere in both documents, PATTERN is misleadingly referred to as "the pattern".

This may be worth a documentation patch.


Documentation patch logged against GNU Grep which thoroughly clarifies that the pattern argument is a newline-separated list.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31400


This only works accidentally.

If the output of ls is not a terminal, then it is single-filename-per-line (and sorted alphabetically):

  $ ls
  nomatchfile1 nomatchfile2 testfile1    testfile10   testfile2    testfile3    testfile4
  $ ls | cat
  nomatchfile1
  nomatchfile2
  testfile1
  testfile10
  testfile2
  testfile3
  testfile4
The first argument to grep is the pattern, the following arguments to grep are the files to search. If there are any files specified, stdin is not read/searched.

  $ grep testfile1 testfile2 testfile3
  $ ls | grep testfile1 testfile2 testfile3
Quoting affects whether you pass one or multiple arguments to grep:

  $ echo ${data}
  testfile1 testfile2
  $ echo "$data"
  testfile1
  testfile2
Finally, newlines in the pattern cause grep to treat each line as a separate pattern:

  $ ls | grep "$data"
  testfile1
  testfile10
  testfile2
Backticks alone do not quote their contents, so the expanded backticks output is treated as multiple arguments.
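To make the quoting difference concrete, this sketch just counts the arguments a command would receive. count_args is a hypothetical helper, and data is an assumed two-line value.

```shell
#!/bin/sh
# count_args is a made-up helper: it reports how many arguments it was given.
count_args() { echo "$#"; }

data=$(printf 'testfile1\ntestfile2')   # two names separated by a newline

unquoted=$(count_args $data)            # split on whitespace: 2 arguments
quoted=$(count_args "$data")            # kept whole: 1 argument with a newline in it
backticks=$(count_args `echo $data`)    # unquoted backticks are split as well

printf 'unquoted=%s quoted=%s backticks=%s\n' "$unquoted" "$quoted" "$backticks"
```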

etc. etc. lots of ways to mix this all up :)


This also means that "echo testfile1" is one of the patterns passed to grep here.


yup. which is why "testfile1" was not in the result of the original experiment:

  $ ls | grep "echo ${data}"
  testfile10
  testfile2
  ...


do not parse the output of ls

http://mywiki.wooledge.org/ParsingLs
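One possible alternative, sketched under assumptions: let the shell glob produce the names and test each one against a pattern file. The directory layout and the file names are invented for the example, and running grep once per file is slow for big directories.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/files"
touch "$tmp/files/nomatchfile1" "$tmp/files/testfile1" \
      "$tmp/files/testfile2" "$tmp/files/my testfile2 copy"
printf 'testfile1\ntestfile2\n' > "$tmp/patterns.txt"

count=0
for f in "$tmp"/files/*; do
    name=${f##*/}                     # strip the directory part
    # -q: no output, only the exit status; -F -f: fixed strings from a file
    if printf '%s\n' "$name" | grep -q -F -f "$tmp/patterns.txt"; then
        printf 'match: %s\n' "$name"
        count=$((count + 1))
    fi
done

rm -rf "$tmp"
```

Because each name stays in a quoted variable, names with spaces are handled as single items.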


Excellent link, ty. More things that I should not have been doing... :)


Would you mind telling us what it is you hope to accomplish? There's so much wrong with the paste. For instance:

  [jbrown@tools1 testdir1]$ echo ${data}
  testfile1 testfile2 testfile3 testfile4 ...
Is ambiguous. Is `data` an array? Is it a string with lots of spaces? Newlines? There's no way to know and I get the sense that we're being nerd sniped.


>There's no way to know and I get the sense that we're being nerd sniped.

I thought I was having a stroke while reading the OP (and some of the top comments)

can someone concisely summarize what's happening?


heh definitely nerd sniped ... compelled to answer when this gets up-voted so people don't think it's some special insightful question, rather than like 5 different critical errors all mixing together with strange results


I don't know what nerd sniping is, but I'm pretty sure I wasn't trying to do that, nor trick anyone. I'll try to better explain how I defined the variable when I get to a computer. Essentially I just copy pasted from the filelist.txt output, into data='copy pasted into here, which also appeared to clearly generate newlines'


I believe you it wasn't intentional :) explanation: https://www.xkcd.com/356/


Lol OK fine. :)

Look, I'm kind of at the level of the layman trying to decipher a Numberphile video. You guys are trying to distill it down for me, and I'm desperately trying to grasp at it. But hey, at some point I may grab hold.

I thought I was serviceable in bash/builtins, then I post this and realize just how much I don't know. But hey that just means there's that much more to know and learn.


"Javascript is required for ZeroBin.net to work. Sorry for the inconvenience."

Here's the contents of the paste:

I stumbled upon/essentially derped the:

  ls | grep "echo ${data}"
 
... line, saying oh this should work... and it actually worked.. I didn't think anymore about it...

... but later on, I thought about it more and decided echoing the variable shouldn't actually work, and you should have to do a for loop instead... and that it should fail in the same way that piping thru a cat of the file should fail (and does fail):

  ls | grep `cat ../filelist.txt`
 
...does not work/no output (as expected)

But, piping to grep and echoing the variable actually works and returns the wanted results/matches!! How is this possible?

  [jbrown@tools1 testdir1]$ ls | grep `cat ../filelist.txt`
  <<NO OUTPUT>>
   
  [jbrown@tools1 testdir1]$ ls | grep "echo ${data}"
  testfile10
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
   
  [jbrown@tools1 testdir1]$ echo ${data}
  testfile1 testfile2 testfile3 testfile4 testfile5 testfile6 testfile7 testfile8 testfile9 testfile10
   
  [jbrown@tools1 testdir1]$ cat ../filelist.txt
  testfile1
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
  testfile10
   
  [jbrown@tools1 testdir1]$ ls
  nomatchfile1   nomatchfile2  nomatchfile4  nomatchfile6  nomatchfile8  testfile1   testfile2  testfile4  testfile6  testfile8
  nomatchfile10  nomatchfile3  nomatchfile5  nomatchfile7  nomatchfile9  testfile10  testfile3  testfile5  testfile7  testfile9


It doesn't (bash, GNU grep). Shell expands inside double quotes. grep does not run commands in a re.


> It doesn't (bash, GNU grep).

It can, you just incorrectly guessed what's the required input for it to work (the author didn't do a very good job at showing the environment in line 29).

If you pass to grep a multi-line argument as the pattern (or a pattern, for -e option), grep will treat each line of this argument as a separate pattern. Or at least that's what GNU grep is doing.

"echo ${data}" worked (and only somewhat and accidentally) because while the first line of ${data} got a prefix of "echo ", the other lines stayed the same.

`cat filelist.txt` didn't work, because grep got the content of filelist.txt as separate arguments (one word per line, if I guess the content correctly) thanks to shell's word splitting, so grep treated the first filename as a pattern and the rest of the filenames as files to search for the pattern. If the filelist.txt contained a name that didn't exist, grep would print an error message for ENOENT. Note that in this scenario the output of `ls' was not read by grep at all.

If it was "`cat ...`" (double-quoted form), wordsplitting wouldn't take place and the grep call would work as the author was expecting.
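A sketch that replays the experiment in a scratch directory, assuming data holds one name per line as described above. The set of files is cut down from the original paste.

```shell
#!/bin/sh
tmp=$(mktemp -d)
( cd "$tmp" && touch nomatchfile1 testfile1 testfile2 testfile3 testfile10 )
printf 'testfile1\ntestfile2\ntestfile3\ntestfile10\n' > "$tmp/filelist.txt"
data=$(cat "$tmp/filelist.txt")       # newline-separated, as assumed above

# First pattern becomes "echo testfile1", so plain testfile1 is never matched.
with_echo=$(cd "$tmp" && ls | grep "echo ${data}")

# Double-quoted substitution: one argument, one pattern per line.
quoted_cat=$(cd "$tmp" && ls | grep "`cat filelist.txt`")

printf 'with echo:\n%s\n\nquoted cat:\n%s\n' "$with_echo" "$quoted_cat"
rm -rf "$tmp"
```

filelist.txt itself shows up in neither result, since none of the patterns occur in its name.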


Yep, this is the correct answer! I was just writing up something similar, but you beat me to it. This is the expected behavior if the `data` variable has newlines between every word:

  $ echo ${data} 
  testfile1 testfile2 testfile3 testfile4 testfile5 testfile6 testfile7 testfile8 testfile9 testfile10
  $ echo "${data}"
  testfile1
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
  testfile10
  $ echo "echo ${data}"
  echo testfile1
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
  testfile10


Thanks for all the helpful and insightful comments everyone.

I guess essentially what I was doing was really the same as the following (excepting the echo derp):

  [jbrown@tools1 testdir1]$ ls | 
  grep 'testfile1
  > testfile2
  > testfile3
  > testfile4
  > testfile5
  > testfile6
  > testfile7
  > testfile8
  > testfile9
  > testfile10'
  testfile1
  testfile10
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
...the above has the same/desired output (including the missing testfile1 that the echo derp had missed).

I guess though that I still don't fully understand how/why grep is able to interpret/execute each newline individually, eg kind of cycling through it.

EDIT - I do realize dozzie pretty much explained it, and what's happening here, however still sort of bewildered that grep is able to perform the operation sort of separately on each newline as an individual argument, and sort of parse through each one...


Grep treats the pattern as a newline-separated list of items to search for.

From https://www.gnu.org/software/grep/manual/grep.html#Introduct...: "Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text."


Right, and actually, it looks like only 'ls | grep "${data}" ' (eg, the data var double-quoted) works:

  [jbrown@tools1 testdir1]$ data='testfile1
  > testfile2
  > testfile3
  > testfile4
  > testfile5
  > testfile6
  > testfile7
  > testfile8
  > testfile9
  > testfile10'
  
  [jbrown@tools1 testdir1]$ echo $data
  testfile1 testfile2 testfile3 testfile4 testfile5 testfile6 testfile7 testfile8 testfile9 testfile10
  
  [jbrown@tools1 testdir1]$ ls | grep `echo ${data}`
  <<NO OUTPUT>>

  [jbrown@tools1 testdir1]$ ls | grep ${data}
  <<NO OUTPUT>>

  [jbrown@tools1 testdir1]$ ls | grep "${data}"
  testfile1
  testfile10
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9
I believe that using echo in the backticks may amount to the same effect as 'ls | grep `cat ../filelist.txt`' (e.g., results in ENOENT)

And of course, as many have mentioned grep -f does what I was looking for and would have prevented this whole exercise:

  [jbrown@tools1 testdir1]$ ls | grep -f ../filelist.txt
  testfile1
  testfile10
  testfile2
  testfile3
  testfile4
  testfile5
  testfile6
  testfile7
  testfile8
  testfile9


For multi-line greps (rare, I guess), I use: pcregrep -Mr


Hint: notice that “testfile1” is missing from grep’s output.


Another hint: set -x This will show you the expanded versions of your command.
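A sketch of how that might look, with the trace captured into a variable so it can be inspected. The value of data is assumed, and the exact trace format differs between shells.

```shell
#!/bin/sh
data=$(printf 'testfile1\ntestfile2')   # assumed newline-separated value

# set -x writes each command to stderr after expansion; 2>&1 captures it.
trace=$( { set -x
           printf 'testfile2\n' | grep "echo ${data}" >/dev/null
           set +x; } 2>&1 )

printf '%s\n' "$trace"
```

In the trace, the grep line shows a single argument that begins with "echo testfile1" and continues on the next line.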


Ok so just | $data

or

| "$data"

.. would/should have worked as well (and should have caught everything?) ?


Did not notice that testfile1 was missing, interesting


Hint2: create a file called “echo testfile1” and try again.
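A sketch of that hint in a scratch directory, assuming the same newline-separated data as elsewhere in the thread.

```shell
#!/bin/sh
tmp=$(mktemp -d)
( cd "$tmp" && touch testfile1 testfile2 'echo testfile1' )
data=$(printf 'testfile1\ntestfile2')

# The patterns are "echo testfile1" and "testfile2".
out=$(cd "$tmp" && ls | grep "echo ${data}")
printf '%s\n' "$out"

rm -rf "$tmp"
```

The oddly named file is matched by the first pattern, while plain testfile1 still is not.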


Btw, this was after-the-fact testing. We had a list of about 800 or so unique UUIDs (there was no pattern anywhere between the UUIDs), where each UUID appeared somewhere in the filename of a file in the working dir.

Example UUID in the list:

fd306bdfb115f1284655da6b06f826eb

There were thousands of other files in the directory that did not have any part/string of the filenames that matched the list. And using the echo $data technique found the matching files.

Is this still accidentally/happenstance matching here?


Realize now of course that 'grep -f uuidlist.txt' would have likely just done it, but kind of glad I've undertaken this exercise, even if it clearly shows my bash skills are subpar.

I've got plenty to learn no doubt, and my intent in posting this was to learn.


It probably missed the first UUID in the list, because the search term would have been "echo $UUID". grep would correctly parse the rest of the newline-delimited list.


Why wouldn't he just use `grep -f`?

EDIT: and about the question in the pastebin - I guess it has to do with ${var} showing it contains spaces as opposed to `cat file.txt` having newlines.


I'll have to experiment with the -f flag. Completely forgot about this, and would likely have done the trick without having to resort to the echo variable nonsense.


People who use 'ls | grep -f ../filelist.txt' don't see anything odd to write about. It's kind of like the anthropic principle.



Tested on a file with spaces instead of new lines and still doesn't work with the pipe thru cat file


I did not understand at all what this does.

It seems like

    ls | grep "echo ${data}"
    echo ${data}
    cat ../filelist.txt
all return `testfile1..10` with slightly different formatting/ordering.


Are the files all empty? Looks like echo is part of the search string and the expansion might be interpreted as the file list. ls is ignored. Just a guess...


Under what shell? If this works as described, then it's broke.


This is bash (CentOS 6x).. will have to check exact version whilst at next available computer..


Correction, this was done on CentOS 7.3.x (not CentOS 6 as I had incorrectly first noted), and the following versions of bash and grep:

"GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)"

"grep (GNU grep) 2.20"

So where $data is a newline delimited list of items/patterns:

  grep "${data}" 
..will indeed match each item/pattern individually and grep over each, eg, just as grep -f does against a file with a list of patterns. It doesn't work without the double quotes, and putting the echo in front was obvious derp in hindsight. See my comment where I replied to uraza and demonstrated this.



