
Parsing JSON with a single regex - kamaal
http://blogs.perl.org/users/brian_d_foy/2013/10/parsing-json-with-a-single-regex.html
======
ot
To whomever is thinking "but JSON is not regular!", note that that regex is
not a regular expression (Perl has a very liberal definition of regexes).

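Perl regexes can be recursive: the construct (?R) means "match here the whole
regex itself" (and (?&NAME) recurses into a named group), making the matching
algorithm a full recursive descent parser. ($^R is something different: it
holds the result of the last (?{ ... }) code block.)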
I suspect that parsing speed wouldn't be too bad because JSON is LL(1), so the
parser shouldn't backtrack.

I don't think the author is suggesting using this in any real code; he's just
showing a cool hack. It resonates well with the Perl aesthetic of short,
"clever" code with as few dependencies as possible (and a fascination with
regexes).
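
To make the LL(1) point concrete, here is a minimal sketch in Python (not Perl, and not the regex from the talk) of a recursive-descent JSON recognizer. Each rule is chosen by looking at a single character, so it never has to backtrack. It only validates and builds no data; `Recognizer` and `is_json` are hypothetical names used for illustration.

```python
# LL(1) recursive-descent JSON recognizer: validates, builds nothing.
HEX_DIGITS = "0123456789abcdefABCDEF"


class Recognizer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos:self.pos + 1]  # "" at end of input

    def accept(self, chars: str) -> bool:
        # Consume the next character if it is one of `chars`.
        hit = self.peek() != "" and self.peek() in chars
        self.pos += hit  # a bool adds as 0 or 1
        return hit

    def expect(self, token: str) -> None:
        if not self.text.startswith(token, self.pos):
            raise ValueError(f"expected {token!r} at offset {self.pos}")
        self.pos += len(token)

    def skip_ws(self) -> None:
        while self.accept(" \t\n\r"):
            pass

    def value(self) -> None:
        self.skip_ws()
        c = self.peek()  # this one character picks the rule: LL(1)
        if c in ("{", "["):
            self.sequence(keyed=(c == "{"))
        elif c == '"':
            self.string()
        elif c and c in "-0123456789":
            self.number()
        elif c and c in "tfn":
            self.expect({"t": "true", "f": "false", "n": "null"}[c])
        else:
            raise ValueError(f"unexpected {c!r} at offset {self.pos}")
        self.skip_ws()

    def sequence(self, keyed: bool) -> None:
        # Objects (keyed) and arrays share one loop: items split by ",".
        close = "}" if keyed else "]"
        self.pos += 1  # the "{" or "[" that value() just peeked at
        self.skip_ws()
        if self.accept(close):
            return
        while True:
            if keyed:  # an object member is "key" : value
                self.skip_ws()
                self.string()
                self.skip_ws()
                self.expect(":")
            self.value()
            if not self.accept(","):
                break
        self.expect(close)

    def string(self) -> None:
        self.expect('"')
        while (c := self.peek()) != '"':
            if c == "" or ord(c) < 0x20:
                raise ValueError(f"bad string character at offset {self.pos}")
            self.pos += 1
            if c == "\\" and not self.accept('"\\/bfnrt'):
                self.expect("u")  # the only other legal escape is \uXXXX
                hex4 = self.text[self.pos:self.pos + 4]
                if len(hex4) != 4 or not all(h in HEX_DIGITS for h in hex4):
                    raise ValueError(f"bad unicode escape at offset {self.pos}")
                self.pos += 4
        self.pos += 1  # the closing quote

    def number(self) -> None:
        # -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
        self.accept("-")
        if not self.accept("0"):  # a leading 0 takes no further digits
            self.digits()
        if self.accept("."):
            self.digits()
        if self.accept("eE"):
            self.accept("+-")
            self.digits()

    def digits(self) -> None:
        if not self.accept("0123456789"):
            raise ValueError(f"expected a digit at offset {self.pos}")
        while self.accept("0123456789"):
            pass


def is_json(text: str) -> bool:
    parser = Recognizer(text)
    try:
        parser.value()
    except ValueError:
        return False
    return parser.pos == len(text)
```

Because every branch in `value` is decided by the current character alone, a failed branch always means invalid input rather than "try the next alternative".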

~~~
fulafel
The original Randal Schwartz post
([http://www.perlmonks.org/?node_id=995856](http://www.perlmonks.org/?node_id=995856))
says it was in fact code for a customer app.

------
lazyjones
Running code in the regex with (?{ ... code ... }) is cheating, because you
can just put a whole Perl program in there. It's also one of the many badly
implemented features of Perl that could be more useful if the implementation
were any good. Using it will cause subtle bugs when you use regexps in the code
called from there (even if you just call a sub/function), because the context
(of regexp processing) isn't saved.

~~~
kamaal
>>Running code in the regex with (?{ ... code ... }) is cheating, because you
can just put a whole Perl program in there.

Would you like to call eval, recursion, macros, closures and many such things
cheating?

Sorry, but doing things without understanding or knowing the true consequences
of what you are actually writing _always_ causes bugs. And that's true in any
programming language.

Perl has many features designed to make the language powerful enough to
express solutions to very complex problems that would otherwise demand piles
and piles of code in other languages. That's the whole point of Perl: you
should be able to take a really hard problem on a Friday night, one you would
find very difficult to code up in a weekend, and have a finished script by
Sunday evening.

~~~
lazyjones
> Would you like to call eval, recursion, macros, closures and many such
> things cheating?

Yes, in the context of "I did this with just a regexp".

> _Sorry but doing things without understanding or knowing the true
> consequences of what you are actually writing, always causes bugs. And
> that's true with any programming language._

That's a cheap excuse for implementing this in a way that is just plain wrong,
or too limited to be useful, when it could have been implemented correctly (at
least if enough capable programmers still understood Perl's source code to
work on it).

> _Perl has many features which are designed to make the language powerful
> enough_

I never disputed that. I have used Perl almost exclusively for the past 14
years or so, so I know what I'm talking about. These broken language features
are the biggest PITA about the language. They are useful for some very
specific tasks, but useless for others where they could be used if they had
been implemented correctly. Adding a feature with more gotchas than useful use
cases is _not_ the way to make a language more useful and enjoyable.

------
pak
A great example of why you shouldn't, in fact, ever try doing this for JSON
parsing within something important.

~~~
yeukhon
or this: [http://stackoverflow.com/questions/827557/how-do-you-validat...](http://stackoverflow.com/questions/827557/how-do-you-validate-a-url-with-a-regular-expression-in-python)

~~~
gjm11
Or for that matter this: [http://stackoverflow.com/questions/1732348/regex-match-open-...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)

(which is what I was expecting yeukhon's link to go to; it comes up in every
discussion of abusing regular expressions to parse very non-regular languages,
for good reason).

~~~
yeukhon
What did I just read lol. I don't want to spoil my own fun... those fonts...

------
kamaal
Here is the original thread this talk is based on:
[http://www.perlmonks.org/?node_id=995856](http://www.perlmonks.org/?node_id=995856)

------
draegtun
Here's the video for this presentation: [http://perltv.org/v/parsing-json-with-a-regex](http://perltv.org/v/parsing-json-with-a-regex)

------
evandrix
save disabled by author @ [http://www.slideshare.net/brian_d_foy/json-regex](http://www.slideshare.net/brian_d_foy/json-regex)

------
meritt
.. but... JSON isn't regular. And neither is HTML :(

~~~
gpvos
Did you look at the regex? Perl regexes are also not regular; they're
basically grammars.

~~~
yeukhon
I might be off on this one, but aren't regular expressions, by definition,
equivalent to regular grammars?

~~~
Xylakant
If you use the strict term, yes. If you're looking at what Perl calls a
regexp, then no. Perl's regular expressions are a superset that can parse (at
least some) context-free languages.

~~~
yeukhon
What about Python's _re_? How do you determine whether an implementation of
"regex" is context-free or a regular expression as we know it? It has been a
long time since I had to deal with CFGs and formal CS. Thanks.

~~~
Xylakant
I don't know about Python, but a lot of languages support PCRE (Perl
compatible regular expressions) or at least a large subset of it. PCRE
sometimes supports less than current Perl does, but in any case it is a
superset of "traditional" regular expressions. For example, there's support
for recursive subpatterns [1], which clearly places it in context-free
territory [2]. Other regular expression engines support similar features, so
I'd say that most real-world implementations are not regular expressions in
the formal sense.

[1] PHP manual for PCRE recursive patterns
[http://www.php.net/manual/en/regexp.reference.recursive.php](http://www.php.net/manual/en/regexp.reference.recursive.php)

[2] Recursive subpatterns allow matching a language of the form a^n b a^n,
which is not regular.
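
The Python question can be probed directly. The sketch below (all function names are hypothetical) shows that the stdlib `re` module rejects PCRE's `(?R)` syntax, so it has no recursive subpatterns, and uses a hand-written recursive function in place of a pattern like `^(a(?1)a|b)$` to recognize a^n b a^n. It also shows that `re` already goes beyond formally regular languages through backreferences: `(a*)b\1` matches exactly the same language. The third-party `regex` package does support `(?R)`, but it is not used here.

```python
import re


def stdlib_re_supports_recursion() -> bool:
    # Try a PCRE-style recursive pattern; stdlib `re` has no (?R) syntax.
    try:
        re.compile(r"a(?R)?b")
    except re.error:
        return False
    return True


def is_anban(s: str) -> bool:
    # Recognize { a^n b a^n : n >= 0 } with the grammar S -> "a" S "a" | "b".
    # The self-call plays the role that (?1) plays in a recursive pattern
    # such as ^(a(?1)a|b)$ in Perl or PCRE.
    if s == "b":
        return True
    return len(s) >= 3 and s[0] == "a" and s[-1] == "a" and is_anban(s[1:-1])


def is_anban_backref(s: str) -> bool:
    # The same language via a backreference, which stdlib `re` does support:
    # group 1 must be repeated verbatim after the "b".
    return re.fullmatch(r"(a*)b\1", s) is not None
```

Backreferences only help for "repeat this exact text" languages like this one; they cannot express arbitrarily nested structure such as balanced brackets, which is where true recursion matters.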

~~~
kamaal
Actually, PCRE is the wrong term for Perl-like regular expressions in other
languages, because when you talk of PCRE you are essentially talking about
compatibility with Perl regexps circa 1995.

Present-day Perl regular expressions are far more powerful, and a very
different beast.

~~~
Xylakant
PCRE is what the library is called and it claims to offer full support for
what perl5 does, both syntax and feature-wise:
[http://www.pcre.org/](http://www.pcre.org/)

------
DanWaterworth
Somebody show this guy a parser combinator library.
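
A rough idea of what that looks like: the code below is a tiny hand-rolled parser-combinator toolkit in Python, not any particular library's API, so every name in it (`lit`, `pattern`, `seq`, `alt`, `fmap`, `lazy`, `sep_by`, `token`, `parse`) is hypothetical. It handles only a JSON subset (integers and nested arrays) and uses small regexes for flat tokens only; the nesting comes from `lazy`, which lets the `value` rule refer to `array` before it is defined.

```python
import re
from typing import Any, Callable, List, Optional, Tuple

# A parser maps (text, pos) to (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    # Match a fixed string.
    def p(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return p


def pattern(regex: str) -> Parser:
    # Match a regex at the current position (used only for flat tokens).
    compiled = re.compile(regex)

    def p(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return p


def seq(*parsers: Parser) -> Parser:
    # Run parsers in order; succeed with the list of their values.
    def p(text: str, pos: int):
        values: List[Any] = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            values.append(result[0])
            pos = result[1]
        return (values, pos)
    return p


def alt(*parsers: Parser) -> Parser:
    # Ordered choice: the first parser that succeeds wins.
    def p(text: str, pos: int):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return p


def fmap(f: Callable[[Any], Any], parser: Parser) -> Parser:
    # Transform the value of a successful parse.
    def p(text: str, pos: int):
        result = parser(text, pos)
        return None if result is None else (f(result[0]), result[1])
    return p


def lazy(thunk: Callable[[], Parser]) -> Parser:
    # Look the parser up only when it runs, so rules can be recursive.
    return lambda text, pos: thunk()(text, pos)


def sep_by(item: Parser, sep: Parser) -> Parser:
    # Zero or more `item`s separated by `sep`; succeed with their list.
    rest = fmap(lambda v: v[1], seq(sep, item))

    def p(text: str, pos: int):
        values: List[Any] = []
        result = item(text, pos)
        while result is not None:
            values.append(result[0])
            pos = result[1]
            result = rest(text, pos)
        return (values, pos)
    return p


def token(parser: Parser) -> Parser:
    # Allow JSON whitespace on both sides of a token.
    ws = pattern(r"[ \t\n\r]*")
    return fmap(lambda v: v[1], seq(ws, parser, ws))


# Grammar for a JSON subset: integers and (nested) arrays of them.
number = fmap(int, token(pattern(r"-?(?:0|[1-9][0-9]*)")))
value = lazy(lambda: alt(number, array))
array = fmap(
    lambda v: v[1],
    seq(token(lit("[")), sep_by(value, token(lit(","))), token(lit("]"))),
)


def parse(text: str) -> Any:
    result = value(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"not a valid document: {text!r}")
    return result[0]
```

For example, `parse("[1, [2, -3], []]")` returns `[1, [2, -3], []]`, while `parse("[1,]")` raises `ValueError`.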

------
riffraff
notice you can _match_ json with systems based on oniguruma/onigmo (ruby,
sublime text, textmate, julia etc), and the syntax is quite readable.

Sadly, I couldn't find a way to access the nested matches, so it's quite
pointless :)

------
Gigablah
Right, now parse HTML :D

~~~
sateesh
Oh that has already been done too, using perl :)
[http://stackoverflow.com/questions/4231382/regular-expressio...](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string) (Check the answer from tchrist)

------
andyidsinga
..because you can!!

------
duedl0r
very ugly presentation....

