

Frak takes an entirely different approach to generating regular expressions - joeyespo
http://thechangelog.com/frak-takes-an-entirely-different-approach-to-generating-regular-expressions/

======
brudgers
I get the motivation for writing it.

I am not sure I see wisdom in using it.

The typical use case for regular expressions is to find patterns in an
arbitrary unknown string.

There can be multiple regular expressions which match a particular string or
set of strings because of the Kleene star. For example, the strings

    
    
      "foo" "bar" "baz" "quux"
    

are all matched by

    
    
      (?:ba[rz]|foo|quux)
    

but also by

    
    
      [:alpha:]* | [:lower:]*  etc.
    

Having started the port of VerbalExpressions to Racket, the fundamental issue
is that the only practical way to express an automaton requires the use of one
of the forms which can express an automaton. TANSTAAFL.

~~~
noprompt
This isn't something I would recommend using for everyone. But it does have
use cases that make it appealing and useful. The same thing could be said
about VerbalExpressions.

While it's true `[:alpha:]*|[:lower:]*` (even better `([:alpha:]|[:lower:])*`
or just `[:alpha:]*`) would work to match "foo", "bar", "baz", etc., it
would also match "turduken" or epsilon (""). This isn't always what you want.
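To make the difference concrete, here is a minimal Python sketch (not frak itself, which is Clojure). Python's `re` module has no POSIX classes, so `[a-z]*` is used as an assumed stand-in for `[:alpha:]*`; the helper name `accepts` is made up for illustration.

```python
import re

# The four example strings from the thread.
words = ["foo", "bar", "baz", "quux"]

# The pattern quoted earlier in the thread as frak-style output.
narrow = re.compile(r"(?:ba[rz]|foo|quux)")

# Assumption: [a-z]* stands in for [:alpha:]*, since Python's re
# module does not support POSIX character classes.
broad = re.compile(r"[a-z]*")


def accepts(pattern, text):
    """Return True when the whole of `text` is in the pattern's language."""
    return pattern.fullmatch(text) is not None


for text in words + ["turduken", ""]:
    print(repr(text), accepts(narrow, text), accepts(broad, text))
```

Both patterns accept the four intended words, but only the broad one also accepts "turduken" and the empty string.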

One nice thing about generating a regular expression in this manner is that
it's arguably much easier to maintain. Instead of having a heavy pattern to
refactor or, god help you, append to, you simply add another string to your
list and are on your way. Plus you get the potential memory savings and
performance benefits.

In the case of VerbalExpressions, the `or` method could use the same technique
as frak to improve the quality of the expressions it emits. While on the
subject, VerbalExpressions could also borrow some good ideas from
Christophe Grand's regex library:
[https://github.com/cgrand/regex](https://github.com/cgrand/regex).

Edit: Forgot some colons.

~~~
brudgers
I don't disagree with any of your points. But I don't think they address the
problem inherent in Frak - there's no way to ensure that the expression gives
the intended result against an arbitrary input.

The results with "barber" are indeterminate since the input string "bar"
matches (^bar$) as well as (bar). In other words, who knows what language is
described by the output of Frak?

I am not in love with VerbalExpressions - porting it to Racket was a
reasonably sized project from which I felt I could learn something about
Racket, and I did. I also learned something about regular expressions.

I think VerbalExpressions suffers from a bit of the same conceptual problem as
Frak - though perhaps to a lesser degree. That problem is treating Regular
Expressions as something other than the formal description of a language.

However, I think that the transition from a muddled concept of regular
expressions to a correct one is more straightforward with VE - i.e. adding
(kleeneStar), (atLeastOne), and (exactlyOne) to VE doesn't break the bigger
idea.

The general problem with anything that attempts to implement regular
expressions informally can be summed up as:

    
    
      Pick any two:

       ( ) Kleene Star
       ( ) Union
       ( ) Concatenation
    

It's a "cheap, fast, correct" problem.

~~~
noprompt
> "But I don't think they address the problem inherent in Frak - there's no
> way to ensure that the expression gives the intended result against an
> arbitrary input."

I've made sure to thoroughly test the patterns produced by frak. They've also
passed muster with another project where frak has proven to be a good idea
(see this issue: [https://github.com/guns/vim-clojure-static/pull/28](https://github.com/guns/vim-clojure-static/pull/28)).
While they are not fully optimized (yet), I'm fairly sure they are correct. If
you have found a counterexample, please open an issue.

> "The results with "barber" are indeterminate since the input string "bar"
> matches (^bar$) as well as (bar)."

The next point release will allow users to specify an exact match. Initially,
this wasn't an option because Clojure has two functions, `re-find` and
`re-matches`, for relaxed and strict matches respectively.

    
    
      user> (re-find #"bar" "barber")
      "bar"
      user> (re-find #"^bar$" "barber")
      nil
    

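For readers outside Clojure, the same relaxed-versus-strict distinction can be sketched in Python, where `re.search` plays roughly the role of `re-find` and `re.fullmatch` roughly the role of `re-matches`. The helper names `find_anywhere` and `match_whole` are invented for this illustration.

```python
import re


def find_anywhere(pattern, text):
    """Relaxed match: return the first substring that matches, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def match_whole(pattern, text):
    """Strict match: return the text only if all of it matches, else None."""
    m = re.fullmatch(pattern, text)
    return m.group(0) if m else None


print(find_anywhere(r"bar", "barber"))  # relaxed: finds "bar" inside "barber"
print(match_whole(r"bar", "barber"))    # strict: None, "barber" is not "bar"
print(match_whole(r"bar", "bar"))       # strict: "bar"
```

The pattern is the same in every call; only the choice of matching function decides whether "barber" counts as a hit.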
> "I think VerbalExpressions suffers from a bit of the same conceptual problem
> as Frak - though perhaps to a lesser degree."

There is nothing conceptually wrong with what VerbalExpressions or frak
attempt to do. They may not provide value to everyone, but they are useful
contributions and have tangible benefits.

> "That problem is treating Regular Expressions as something other than the
> formal description of a language."

Why is that a problem?

It has been shown it's possible to use regular expressions as a means to
perform arithmetic. Is that too a problem?

I don't think so.

Formality is a great thing and necessary. But it can also be the opposite.
While I'll agree understanding it is important, I'll contend getting hung up
on it is dangerous.

------
crazygringo
I really like the idea... but so far, it seems limited to actual characters,
with no "*" or "+" expansion, no collapsing whitespace into "\s", etc.

So it doesn't seem like you could feed it "a", "aaa", "aaaaa" and then get
"a+".

But even if it did, it would seem there would be so many edge cases in what
the "correct" regular expression should be, that specifying them all would be
just as hard as writing the regular expression in the first place.

But nevertheless, definitely a cool little toy. Would be great to have a web
front end to test it out, as webjames said.

~~~
samatman
You're asking for implicit induction, which is not what the user would want in
the general case. "a" "aaa" "aaaaa" could be expanded to "a(aa){0,2}" through
deduction, and no more.

Since frak is using a trie, this kind of encoding should be a conceptually
straightforward extension of the existing parse structure.
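As a rough illustration of the trie idea, here is a minimal Python sketch. It is not frak's actual algorithm or API (frak is written in Clojure); the function names `build_trie` and `trie_to_regex` and the nested-dict representation are assumptions made for this example.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Render a trie node as a regex for exactly the suffixes below it."""
    terminal = "" in node  # a word may end at this node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not terminal:
        return branches[0]
    if len(branches) > 1 and all(len(b) == 1 for b in branches):
        # Several unescaped single characters collapse to a class.
        body = "[" + "".join(branches) + "]"
    else:
        body = "(?:" + "|".join(branches) + ")"
    # If a word can end here, everything below is optional.
    return body + "?" if terminal else body


print(trie_to_regex(build_trie(["foo", "bar", "baz", "quux"])))
print(trie_to_regex(build_trie(["a", "aaa", "aaaaa"])))
```

With this sketch the "a", "aaa", "aaaaa" input comes out as nested optional groups rather than `a(aa){0,2}`; folding repeated nesting into a counted repetition would be the extension described above.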

~~~
noprompt
These expansions are certainly possible and I'm planning to investigate
whether they have performance benefits. One interesting discovery we made last
week is that `(a|b|c)` actually produces a larger state table than `[abc]`.
This was the case for both Java and Vim. While the performance gains are
minute, they are gains nevertheless.

I haven't checked if "a{4}", for example, is faster than "aaaa" or produces
fewer states. But it's something I am interested in. I just haven't had enough
time this week.

------
samatman
For those who are missing wildcards:

Wildcards are not possible when going from a restricted input to a finite
automaton. However, if you aren't using say ø, just make your input to frak
"fooøbar" then add your wildcard to the generated regex.

I agree the output could be more compact, and collapse certain conventional
characters (\n, \t and the like). The tool itself has all the power one should
need.
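The placeholder trick above might look like this in Python. The `generated` string is a hand-written stand-in for generator output, not something frak actually produced, and `splice_wildcard` is a hypothetical helper name.

```python
import re

PLACEHOLDER = "ø"  # any character that never occurs in the real inputs


def splice_wildcard(generated_pattern, wildcard=".*"):
    """Swap the placeholder character for a wildcard after generation."""
    return generated_pattern.replace(PLACEHOLDER, wildcard)


# Assumption: a pattern a generator might emit for "fooøbar" and "baz".
generated = "(?:fooøbar|baz)"
pattern = re.compile(splice_wildcard(generated))

print(pattern.pattern)
print(bool(pattern.fullmatch("foo123bar")))
print(bool(pattern.fullmatch("foo")))
```

This only works when the placeholder survives generation as a literal character, so it should be one the generator has no reason to escape or merge into a character class.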

------
webjames
Great work - I'd love it if someone would write a web front end to this.

~~~
noprompt
There will be a JavaScript version available later this week, both for Node.js
and the browser. If you want to try out the command line version, see issue #2
for instructions.

------
Groxx
This is pretty cool - nice work! Can't say I've seen a trie-based construction
like this, but it's clever and looks like a near-perfect fit.

~~~
noprompt
Apparently the Perl community has been doing this for years. :)

------
crazygringo
GitHub is currently suffering a big DDoS; here's the Google Cache version:

[http://webcache.googleusercontent.com/search?q=cache:PXnt0AW...](http://webcache.googleusercontent.com/search?q=cache:PXnt0AWpMZAJ:https://github.com/noprompt/frak+&cd=1&hl=en&ct=clnk&gl=us&lr=lang_en%7Clang_pt)

------
gleb_bahmutov
Is there an analog implemented in JavaScript?

~~~
agumonkey
Someone told me it was present in Emacs; here's some old source code for it:
[http://stuff.mit.edu/afs/sipb/contrib/emacs/packages/w3m_el-...](http://stuff.mit.edu/afs/sipb/contrib/emacs/packages/w3m_el-1.2.8/attic/regexp-opt.el)

