
Inference of Regular Expressions for Text Extraction from Examples - fforflo
http://machinelearning.inginf.units.it/publications/international-journal-publications/inferenceofregularexpressionsfortextextractionfromexamples
======
bcherny
This is very cool, and is a very hard problem!

It reminds me of Hofstadter's Jumbo experiment from the '80s:
[https://books.google.com/books?id=somvbmHCaOEC&pg=PA97&lpg=P...](https://books.google.com/books?id=somvbmHCaOEC&pg=PA97&lpg=PA97&dq=hofstadter+jumbo)
(Clojure implementation here:
[https://github.com/vemv/jumbo](https://github.com/vemv/jumbo)). If I remember
right, his thesis was that this sort of pattern matching is at the core of
intelligence, and maybe even of consciousness.

For example, if I have "ABBC" and "BCCD", what is the rule? Is it any 4-letter
string? Is it the same character repeated twice? Is it the same character
repeated twice at a particular offset from the beginning of the string? Part
of the answer comes down to heuristics, and part to the "elegance" of a given
solution: what a human would judge to be the "right" solution is surely some
balance of specificity and simplicity.
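
To make the ambiguity concrete, here is a rough Python sketch. The three candidate patterns and their labels are my own illustration (not from the paper or from Jumbo), and I'm assuming uppercase ASCII letters only. All three are consistent with both examples, so the examples alone can't pick a winner; only a further string such as "AABC" separates them.

```python
import re

examples = ["ABBC", "BCCD"]

# Hypothetical candidate rules, ordered from most general to most specific.
candidates = {
    "any four letters": r"[A-Z]{4}",
    "a doubled letter somewhere": r"[A-Z]*([A-Z])\1[A-Z]*",
    "a doubled letter at offset 1": r"[A-Z]([A-Z])\1[A-Z]",
}

def consistent(pattern, strings):
    """True if the pattern matches every string in full."""
    return all(re.fullmatch(pattern, s) is not None for s in strings)

# Every candidate explains both examples equally well.
consistent_rules = [name for name, pat in candidates.items()
                    if consistent(pat, examples)]

# A new string is needed to tell the candidates apart.
probe = "AABC"
probe_verdicts = {name: re.fullmatch(pat, probe) is not None
                  for name, pat in candidates.items()}

for name, verdict in probe_verdicts.items():
    print(f"{name}: {verdict}")
```

The two looser rules accept "AABC" and the offset-specific one rejects it, which is the specificity versus simplicity trade-off in miniature.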

Also see these relevant Stack Overflow and CS Theory Stack Exchange posts:
[http://stackoverflow.com/questions/4880402/how-to-auto-gener...](http://stackoverflow.com/questions/4880402/how-to-auto-generate-regex-from-given-list-of-strings)
and
[http://cstheory.stackexchange.com/questions/1854/is-finding-...](http://cstheory.stackexchange.com/questions/1854/is-finding-the-minimum-regular-expression-an-np-complete-problem)

------
tyingq
Different approach, but I find this pretty handy for coming up with ideas for
parsing via regex: [https://txt2re.com/](https://txt2re.com/)

------
maraschino
Source code here:
[https://github.com/MaLeLabTs/RegexGenerator](https://github.com/MaLeLabTs/RegexGenerator)

------
brudgers
Demo: [http://regex.inginf.units.it/](http://regex.inginf.units.it/)

