
Can anyone explain how this regex [- ~] matches ASCII characters?



It's pretty simple. I'm going to assume you don't know regex, since you are asking.

The bracket expression [ ] defines a set of single characters to match. You can put more than one character inside, and any one of them will match.

  [a] matches a
  [ab] matches either a or b
  [abc] matches either a or b or c
  [a-c] matches either a or b or c. 
The - allows us to define a range. You could just as easily write [abc], but for long sequences such as [a-z] consider it shorthand.
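
If you want to check those four examples yourself, here is a minimal sketch in Python using the standard `re` module. The sample string "abcd" is just something I made up for illustration; other regex engines should behave the same way for these simple cases.

```python
import re

sample = "abcd"  # made-up test string, not from the article

# Each bracket expression matches exactly one character from its set,
# so findall returns the individual characters that matched.
single = re.findall(r"[a]", sample)
pair = re.findall(r"[ab]", sample)
triple = re.findall(r"[abc]", sample)
ranged = re.findall(r"[a-c]", sample)

print(single)  # ['a']
print(pair)    # ['a', 'b']
print(triple)  # ['a', 'b', 'c']
print(ranged)  # ['a', 'b', 'c']  (same set as [abc], 'd' is left out)
```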

In this case, [ -~] means every character from <space> (0x20) to <tilde> (0x7E) inclusive, which just happens to be all the printable ASCII characters (see chart in the article). The only bit you need to keep in mind is that <space> is a character as well, and hence you can match on it.
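
A rough way to convince yourself of that, sketched in Python (the handful of "should not match" characters is my own pick, not from the article):

```python
import re

# Code points 0x20 (space) through 0x7E (tilde) are the printable ASCII characters.
printable = "".join(chr(c) for c in range(0x20, 0x7F))

pattern = re.compile(r"[ -~]")

# Every printable ASCII character matches the class...
all_printable_match = all(pattern.fullmatch(ch) for ch in printable)

# ...while control characters, DEL and non-ASCII characters do not.
outsiders = ["\t", "\n", "\x7f", "\u00e9"]
wrongly_matched = [ch for ch in outsiders if pattern.fullmatch(ch)]

print(len(printable))       # 95
print(all_printable_match)  # True
print(wrongly_matched)      # []
```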

You could rewrite the regex like so (note I haven't escaped anything in this, so it's probably not valid):

  [ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
but that's not quite as clever or neat.
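
If you did want the spelled-out version to be safe, one option (a sketch, assuming Python's `re` module) is to let `re.escape` handle the escaping and then check that the long form and the short form agree on every character in the 0x00 to 0xFF range:

```python
import re

printable = "".join(chr(c) for c in range(0x20, 0x7F))

# re.escape backslash-escapes characters that are special in a pattern,
# including ']', '\\', '^' and '-', so the spelled-out class stays valid.
explicit = re.compile("[" + re.escape(printable) + "]")
short = re.compile(r"[ -~]")

# Compare the two patterns on every code point from 0x00 to 0xFF.
candidates = [chr(c) for c in range(0x100)]
same = all(
    bool(explicit.fullmatch(ch)) == bool(short.fullmatch(ch))
    for ch in candidates
)

print(same)  # True
```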


It doesn't. Space is significant here, and if '-' is at the front of the character class it matches a literal '-'. Your regex '[- ~]' matches either '-' or ' ' or '~'.
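
You can see the difference between the two spellings with a quick check. This is a minimal sketch in Python; the variable names are mine:

```python
import re

title_pattern = re.compile(r"[- ~]")  # '-' first: a literal hyphen, no range
range_pattern = re.compile(r"[ -~]")  # '-' between two characters: a range

printable = "".join(chr(c) for c in range(0x20, 0x7F))

# Collect which printable ASCII characters each class accepts.
title_matches = [ch for ch in printable if title_pattern.fullmatch(ch)]
range_matches = [ch for ch in printable if range_pattern.fullmatch(ch)]

print(title_matches)       # [' ', '-', '~']
print(len(range_matches))  # 95
```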



