
Incremental Regular Expressions (2012) - lelf
http://jkff.info/articles/ire/
======
oconnore
Similarly, I wish Regex engines had built in “partial match” support — i.e. is
there some suffix that can be added to this string that results in a match?

I wrote something like this in Haskell for doing path traversal (descend into
directory, or skip?). But it seems like a straightforward API addition to
expose valid/non-valid states.
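
A minimal sketch of what such an API could look like, using Brzozowski derivatives rather than any real engine's internals. All names here (`status`, `lit`, the node classes) are hypothetical, and the sketch assumes patterns are built through the `alt`/`seq` helpers so that an unmatchable pattern always collapses to `Empty`. It reports "dead" as soon as no suffix can lead to a match, which is the signal a directory walker would use to skip a subtree.

```python
from dataclasses import dataclass

class Re:
    """Base class for the toy regex nodes (hypothetical, for illustration)."""

@dataclass(frozen=True)
class Empty(Re):      # matches no string at all
    pass

@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Re):
    c: str

@dataclass(frozen=True)
class Alt(Re):
    a: Re
    b: Re

@dataclass(frozen=True)
class Seq(Re):
    a: Re
    b: Re

@dataclass(frozen=True)
class Star(Re):
    r: Re

def alt(a, b):
    # Smart constructor: drop branches that can never match.
    if isinstance(a, Empty): return b
    if isinstance(b, Empty): return a
    return a if a == b else Alt(a, b)

def seq(a, b):
    # Smart constructor: a dead half kills the whole sequence.
    if isinstance(a, Empty) or isinstance(b, Empty): return Empty()
    if isinstance(a, Eps): return b
    if isinstance(b, Eps): return a
    return Seq(a, b)

def lit(s):
    r = Eps()
    for c in s:
        r = seq(r, Chr(c))
    return r

def nullable(r):
    # True if r matches the empty string.
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, Alt): return nullable(r.a) or nullable(r.b)
    if isinstance(r, Seq): return nullable(r.a) and nullable(r.b)
    return False

def deriv(r, ch):
    # The regex matching whatever may follow `ch` in a string matched by r.
    if isinstance(r, Chr): return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt): return alt(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, Seq):
        first = seq(deriv(r.a, ch), r.b)
        return alt(first, deriv(r.b, ch)) if nullable(r.a) else first
    if isinstance(r, Star): return seq(deriv(r.r, ch), r)
    return Empty()  # Eps and Empty cannot consume a character

def status(r, text):
    """Return 'match', 'partial' (some suffix could still match), or 'dead'."""
    for ch in text:
        r = deriv(r, ch)
        if isinstance(r, Empty):
            return "dead"
    return "match" if nullable(r) else "partial"

# Pattern equivalent to src/(a|b)*
pattern = seq(lit("src/"), Star(alt(Chr("a"), Chr("b"))))
print(status(pattern, "sr"))      # partial
print(status(pattern, "src/ab"))  # match
print(status(pattern, "lib"))     # dead
```

A DFA-based engine could expose the same three-way answer by reporting whether the current state is accepting, live, or a dead state.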

~~~
wahern
The regex engine used by musl libc, TRE
([https://github.com/laurikari/tre](https://github.com/laurikari/tre)),
supports approximate matching. musl doesn't seem to expose that capability,
however.

------
apocalypstyx
On a tangentially related note, more and more I find myself wishing regex
syntax wasn't such a hodgepodge of literal/non-literal characters. It always
seems to end up in a mess of what is and isn't escaped and knowing what needs
to be in which context. And I keep thinking we really only need one special
character, namely '\', to prefix all non-literals. So \\ produces '\', \. is
dot match, \* is star match, etc. So instead of: .foo* we have \.foo\*,
which admittedly is going to end up more verbose in most situations, but still
strikes me as clearer as to what's intended.
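
One way to try this idea out is a small translator from the proposed syntax to the conventional one. The sketch below is only an illustration: the function name `to_conventional` and the supported operator set are assumptions, and character classes are left out. Every unescaped character is treated as a literal, and a backslash followed by an operator emits that operator.

```python
import re

# Operators recognized after a backslash in this sketch (an assumed subset).
OPERATORS = set(".*+?|()^$")

def to_conventional(pattern: str) -> str:
    """Translate 'backslash marks every operator' syntax to Python re syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "\\":
            # Unescaped characters are always literal.
            out.append(re.escape(ch))
            i += 1
            continue
        if i + 1 >= len(pattern):
            raise ValueError("dangling backslash at end of pattern")
        nxt = pattern[i + 1]
        if nxt == "\\":
            out.append(re.escape("\\"))   # a doubled backslash is a literal one
        elif nxt in OPERATORS:
            out.append(nxt)               # escaped operator becomes a real operator
        else:
            raise ValueError(f"unknown operator \\{nxt}")
        i += 2
    return "".join(out)

print(to_conventional(r"\.foo\*"))  # .foo*
print(to_conventional("a.b"))       # a\.b
```

With this scheme a plain string is always a valid pattern for itself, so no separate escaping function is needed for literals.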

~~~
chubot
The Oil shell fixes that, and it compiles back to the old syntax so it
seamlessly works with 'grep', etc.

[http://www.oilshell.org/release/latest/doc/eggex.html](http://www.oilshell.org/release/latest/doc/eggex.html)

Point 2: _Their syntax is vastly simpler because literal characters are
quoted, and operators are not. For example, ^ no longer means three totally
different things. See the critique at the end of this doc._

(In other words, it's similar to lex or re2c syntax)

----

Another example:

[http://www.oilshell.org/blog/2019/12/22.html](http://www.oilshell.org/blog/2019/12/22.html)

The Eggex syntax is independent of Oil, so you can implement it as a library in
another language. I'm interested in help too:

[https://github.com/oilshell/oil/issues?q=eggex+label%3Aeggex](https://github.com/oilshell/oil/issues?q=eggex+label%3Aeggex)

One easy contribution is to translate it back to PCRE syntax, because
that's a very common syntax that people care about. Right now it translates to
ERE, which works with egrep, awk, and GNU sed --regexp-extended.

It should be a trivial change to print the eggex AST in a hundred lines of
code or so (really the hard part is testing).
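
To give a rough idea of the shape of such a printer, here is a sketch over a toy AST. The node classes and the `render` function are invented for this example and are not Oil's actual data structures. It shows the two places where the dialects differ in this small subset: the digit class and grouping.

```python
import re
from dataclasses import dataclass

# Toy AST, hypothetical and much smaller than a real Eggex tree.
@dataclass(frozen=True)
class Lit:
    text: str

@dataclass(frozen=True)
class Digit:
    pass

@dataclass(frozen=True)
class Seq:
    parts: tuple

@dataclass(frozen=True)
class Alt:
    options: tuple

@dataclass(frozen=True)
class Repeat:
    node: object
    op: str  # one of "*", "+", "?"

META = set("\\.[]{}()*+?^$|")

def escape(text):
    # Backslash-escape metacharacters; valid in both ERE and PCRE.
    return "".join("\\" + c if c in META else c for c in text)

def render(node, dialect):
    """Print the tree as 'ere' or 'pcre' syntax."""
    group_open = "(" if dialect == "ere" else "(?:"  # ERE has no non-capturing group
    if isinstance(node, Lit):
        return escape(node.text)
    if isinstance(node, Digit):
        return "[[:digit:]]" if dialect == "ere" else r"\d"
    if isinstance(node, Seq):
        return "".join(render(p, dialect) for p in node.parts)
    if isinstance(node, Alt):
        return group_open + "|".join(render(o, dialect) for o in node.options) + ")"
    if isinstance(node, Repeat):
        return group_open + render(node.node, dialect) + ")" + node.op
    raise TypeError(f"unknown node {node!r}")

version = Seq((Lit("v"), Repeat(Digit(), "+"), Lit("."), Repeat(Digit(), "+")))
print(render(version, "ere"))   # v([[:digit:]])+\.([[:digit:]])+
print(render(version, "pcre"))  # v(?:\d)+\.(?:\d)+
```

Python's `re` accepts the PCRE-style output for this subset, which makes it a convenient way to spot-check the printer.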

~~~
j88439h84
Interesting idea. What are the benefits of defining a fancy new regex-like DSL
instead of just using context-free grammars specified in EBNF?

~~~
chubot
Eggex gives you a more EBNF-like syntax for regexes. CFGs are basically
regular languages with recursion, so there's no reason for them to have a
wildly different Perl-like syntax. That's basically a historical accident.

If you've written CFGs, then eggex will be very familiar. See the example:
[http://www.oilshell.org/blog/2019/12/22.html](http://www.oilshell.org/blog/2019/12/22.html)

-----

Also, regular languages are straightforward and tractable, and you should use
them where possible (aside from bad syntax, which Eggex is fixing).

In contrast, CFGs are useful, but if you start using them a lot, you will see
they're not really "one thing" as far as programming/engineering is concerned.
They sort of explode into a fractal of complexity -- there are many different
subsets of CFGs, some of which can be parsed efficiently with particular
algorithms.

And some people reject CFG in favor of PEG, etc.

[https://github.com/oilshell/oil/wiki/Parsing-Models-Cheatsheet](https://github.com/oilshell/oil/wiki/Parsing-Models-Cheatsheet)

----

That is, the recursive grammar issue is not "settled". Whereas regular
expressions have well known algorithms and are "settled" (except for syntax).

[https://swtch.com/~rsc/regexp/](https://swtch.com/~rsc/regexp/)

So basically regexes have bad syntax but relatively good semantics (libc
BRE/ERE doesn't have the Perl backtracking issue). CFGs don't have a
consistently bad syntax, but the semantics are much more complex when you're
talking about recognizing vs. generating. And of course in programming (rather than
math), we want to recognize, not generate.

Related:

[https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-Should-Be-Separate](https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-Should-Be-Separate)
(Do the easy thing with the fast algorithm, and the hard thing with the slow
algorithm. Don't do the easy thing with the slow algorithm.)
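
A small sketch of that split, assuming a toy arithmetic language with only `+`, `*`, and parentheses. The names `lex` and `parse` are made up for this example. Tokenizing is a regular problem and is handled by one regex; only the nesting needs the recursive part.

```python
import re

# Lexing: a regular language, so a single regex pass is enough.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<op>[+*()])|(?P<ws>\s+)")

def lex(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "ws":          # whitespace is dropped here, once
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Parsing: recursive descent over tokens, evaluating as it goes.
#   expr := term ('+' term)*
#   term := atom ('*' atom)*
#   atom := num | '(' expr ')'
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        kind, value = advance()
        if kind == "num":
            return int(value)
        if value == "(":
            result = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return result
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        result = atom()
        while peek()[1] == "*":
            advance()
            result *= atom()
        return result

    def expr():
        result = term()
        while peek()[1] == "+":
            advance()
            result += term()
        return result

    result = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return result

print(parse(lex("2 * (3 + 4) + 1")))  # 15
```

Because the parser never sees whitespace or individual digits, its grammar stays small, which is the practical payoff of keeping the two stages apart.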

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=4964496](https://news.ycombinator.com/item?id=4964496)

