
Go-Restructure: Sane regular expressions with struct fields - jdoliner
https://github.com/alexflint/go-restructure
======
muraiki
As someone who likes both Go and Perl 6, this reminds me a lot of Perl 6
grammars; I look forward to using this in Go. Here's a rough equivalent in P6,
although I don't think the Go one or mine are completely legit in terms of
internationalization and whatnot:

    
    
      grammar email-address {
          token TOP { ^ <user> '@' <hostname> $ }
          token user { <[ \w \d . _ % + - ]>+ }
          token hostname { <domain> '.' <TLD> }
          token domain { \w+ }
          token TLD { \w+ }
      }
    
      if (my $email = email-address.parse('joe@猫.com')) {
          say $email<user>;
          say $email<hostname><domain>;
          say $email<hostname><TLD>;
      }
    

It's also possible to turn the captured parts into their own objects ($email
has a type of Match; it's not just a hash).

------
inglor
Here's a 1-1 port in JavaScript:
[https://github.com/benjamingr/js-restructure](https://github.com/benjamingr/js-restructure)

    
    
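        function matcher(obj) { "use strict";
          let props = Object.getOwnPropertyNames(obj);
          // Keys starting with "_" are literal glue; every other key becomes a capture group.
          const re = new RegExp(props.reduce((p, c) => p + (c.startsWith("_") ? obj[c] : `(${obj[c]})`), ""));
          props = props.filter(x => !x.startsWith("_"));
          return function(input) {
            const res = re.exec(input);
            if (res === null) return null; // no match
            const o = {};
            // res[0] is the whole match, so group i lives at res[i + 1].
            for (let i = 0; i < props.length; i++) o[props[i]] = res[i + 1];
            return o;
          };
        }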
    

And example usage:

        matcher({ _ : "^", user : "\\w+", _2 : "@", host : "[^@]+", _3 : "$" })("user@ycombinator.com")

~~~
ajanuary
JavaScript doesn't guarantee the order of keys.

[edit] Also, the API makes less sense because JavaScript isn't statically
typed so you can make up the result object on the fly.

    
    
        match('^(?<user>\w+)@(?<host>[^@]+)$', 'joe@example.com') => {'user': 'joe', 'host': 'example.com'}

~~~
inglor
Yes it does - it does for getOwnPropertyNames but not for Object.keys.

~~~
ajanuary
Seems I might be outdated, and ES6 does define an order.

------
jtruk
Neat. It could adopt the key:"value" convention of other struct tag libraries
(e.g. json:"name" xml:"name") to make it more composable.

~~~
Spiritus
Agreed. I think "re" would be a good and succinct key. Or "regex" if you wanna
go more explicit.

~~~
endymi0n
[https://github.com/alexflint/go-restructure/issues/2](https://github.com/alexflint/go-restructure/issues/2)

------
andmarios
Very nice solution! I try to avoid using 3rd party libraries but this I would
use without hesitation.

Some benchmarks would be interesting.

~~~
elmin
Why do you avoid third-party libraries?

~~~
jernfrost
I don't know his reasons, but I also try to avoid third-party libraries
because:

1) Often you can do the same with the standard library with very little extra
code.

2) Using lots of third-party libraries makes your code less maintainable. A
lot of little libraries are not well maintained.

3) You are at the mercy of when the third-party library updates to fit, say,
the latest language version or standard libraries.

4) Third-party libraries can pull in a lot of unwanted complexity. You might
only use a tiny, tiny part of it.

5) By relying on third-party libraries you force other developers to know that
library in addition to the standard library.

There is a certain amount of trigger happiness around when it comes to third
party libraries. People tend to use third party libs WAY TOO much. Often they
don't even check what is in the standard library.

I've thrown out over 50% of the third party libraries used in iOS projects
I've taken over while simplifying and reducing the amount of code. The reason
why using the third party libs grew the code was that third party libraries
are usually quite generic, which requires more code to adapt to them. If your
needs are quite simple, custom tailored code can take less space than
utilising a third party library.

------
jerf
I recommend that the first example be changed to use struct{} as well. The
"strings" are still present in the struct footprint if you leave them there:
[http://play.golang.org/p/ZG1ULgzSwZ](http://play.golang.org/p/ZG1ULgzSwZ) And
people luuuuuv to pick up the one example where you did something a bit wrong
and copy paste like mad....
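A minimal sketch of the footprint point, assuming a 64-bit or 32-bit Go toolchain. The type names `withStrings` and `withEmpty` and the helper `footprints` are made up for illustration, and the struct tags are left out because only the field layout matters here. Exact byte counts depend on the platform, so the sketch only compares the two sizes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// withStrings pads the capturing fields with blank string fields.
// Struct tags are omitted here because only the layout matters.
type withStrings struct {
	_    string
	User string
	_    string
	Host string
	_    string
}

// withEmpty uses zero-size struct{} placeholders instead.
type withEmpty struct {
	_    struct{}
	User string
	_    struct{}
	Host string
	_    struct{}
}

// footprints reports the in-memory size of each layout in bytes.
func footprints() (stringsSize, emptySize uintptr) {
	return unsafe.Sizeof(withStrings{}), unsafe.Sizeof(withEmpty{})
}

func main() {
	s, e := footprints()
	fmt.Println("blank string fields:  ", s, "bytes")
	fmt.Println("blank struct{} fields:", e, "bytes")
}
```

Each blank `string` field still costs a full string header per value, while the `struct{}` placeholders add little or nothing beyond alignment padding.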

I'd really rather see something that generates marshaling code based on a
regex or something, though. This loads an awful lot of meaning onto things
that most code assumes doesn't have meaning, like struct field order. It looks
really clever in isolation but if you start playing multiple tricks like this
in one code base they'll start conflicting.

For example, note how you now can't use encoding/json on these objects as
currently written. Now, that's fixable... well... it's _probably_ fixable.
AFAIK the struct tagging system isn't actually specified, so, for instance, if
you try to put a tag

    
    
        regex:"[^\"]+",json:"quotefree"
    

there's no guarantee how anything will parse that. Should that backslash be
there to get the "character class of everything but double-quote"? Will the
regex code get the backslash? Will encoding/json see a field 'regex:"[^\"' and
a field ']+",json:"quotefree"' and then fail because there's only a field
named ']+",json' and not one named 'json'? Will you encounter one of those
situations where the backslash is simultaneously required and not required?
Beats me. Plus I don't guarantee stability on whatever the answer is between
versions, nor do I guarantee it if you reverse the order of the two things,
nor do I guarantee it if you try to add a third struct tag.

I would suggest taking this code and converting it into a function

    
    
        UnmarshalRegex(re *regexp.Regexp, val interface{}) error
    

with no use of struct tags, because you get 80-125% of the value, while
dodging all the previous problems. I'd go ahead and say the user of the code
is responsible for proper grouping, just describe how it needs to be, and
check it at runtime. Especially if you use named capture groups, further
removing issues of order.

~~~
arnehormann
It is specified - a struct tag is just a string literal after a struct field.
See
[https://golang.org/ref/spec#String_literals](https://golang.org/ref/spec#String_literals)

If you use backticks, the tag must not contain another backtick. On anything
but a struct tag, you could get by by concatenating literals, but that's not
allowed for struct tags:
[https://golang.org/ref/spec#Tag](https://golang.org/ref/spec#Tag) You could
use double quotes instead of backticks, but that leads to a dark corner of
escaping hell and an utterly unreadable mess.

Thankfully, the language strongly nudges you to very, very simplistic and
light uses of struct tags.

~~~
jerf
"It is specified - a struct tag is just a string literal after a struct
field."

I meant the internals of a struct tag, not the grammar. The standard library
implies some structure with things like `json:"name,omitempty"` but that
structure is not actually specified AFAIK. And I seem to recall finding a
github issue where the core team said they don't intend to specify one,
basically for the reason that they don't want struct tags to be used for
things like this systematically, but I couldn't google it up. If they fully
"specify" struct tags I think they fear massive metadata additions, instead of
little annotations here and there. I'm not _quite_ sure enough of this to
state it without qualification, but I am pretty sure it is accurate.

~~~
cypher543
The reflect package specifies a "convention" which it uses to extract key-
value pairs from a struct tag using the Get method:

[https://golang.org/pkg/reflect/#StructTag](https://golang.org/pkg/reflect/#StructTag)

There's no need to parse the entire tag string yourself and it's safe to
assume other libraries will work with the convention. If they don't, that's
not your fault and it would break with other tags like "json" and "xml"
anyway.
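A small sketch of that convention in use. The `re` key and the `Email` type are hypothetical, not something go-restructure is known to accept. Under the convention, pairs are separated by spaces (not commas), and each value is a Go-quoted string that `Get` unquotes, so backslashes and double quotes inside the pattern have to be escaped even in a backtick tag.

```go
package main

import (
	"fmt"
	"reflect"
)

// Email carries two independent tag keys per field. The "re" key is an
// assumption for this example; "json" is the usual encoding/json key.
type Email struct {
	User string `re:"\\w+" json:"user"`
	Host string `re:"[^@\"]+" json:"host"`
}

// tagsOf returns the unquoted "re" and "json" tag values for a field of
// Email, or empty strings if the field does not exist.
func tagsOf(field string) (pattern, jsonName string) {
	f, ok := reflect.TypeOf(Email{}).FieldByName(field)
	if !ok {
		return "", ""
	}
	return f.Tag.Get("re"), f.Tag.Get("json")
}

func main() {
	for _, name := range []string{"User", "Host"} {
		pattern, jsonName := tagsOf(name)
		fmt.Printf("%s: re=%s json=%s\n", name, pattern, jsonName)
	}
}
```

Each library reads only its own key, so the regex tag and the JSON tag do not interfere with each other as long as both follow the convention.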

------
fpoling
The example that is used to show the case for the library can be written as
[https://play.golang.org/p/tuhGjqZlu0](https://play.golang.org/p/tuhGjqZlu0).
This is simpler than the code with the library.

------
jernfrost
This is such a simple but beautiful idea. I love it! As a Swift developer I am
really envious of these tagged fields in Go and how they simplify e.g. JSON
parsing. And now I realise they can simplify regexp parsing as well. Damn!!

~~~
themartorana
Yeah this is really cool. But it's at least in part because the Go community
has grown exponentially over the past year or two. It seems like it's fair to
expect the same will befall the Swift community pretty shortly here.

------
steeve
Very very very elegant approach to the problem. I know I'll be using it.

~~~
jdoliner
Agreed, it minimizes the friction in going from regex to usable data structure
more than any previous solution I've seen.

Really makes me appreciate the value of Go's field tags.

------
iofj
What's wrong with named capture groups?

[http://www.regular-expressions.info/named.html](http://www.regular-expressions.info/named.html)

~~~
yalue
The support for named capture groups in Go's regular expressions is severely
lacking, in the sense that it may as well not even exist. There's an API
(SubexpNames) that returns an ordered list of the capture group names (if you
provided them), but it's up to the user to implement some scheme for mapping a
name to the contents of a capture group. At that point, it's probably going to
be easier to just use the capture group indices.

~~~
donatj
I actually use them quite extensively and I don't mind the implementation.

------
brightball
Very clean approach. Also does an excellent job of clearly communicating what
the regex is doing, which is saying a lot for regexes in general.

Great concept and implementation.

------
tptacek
This depends on reflect. What's the speed penalty?

~~~
laumars
I came to the comments section to ask the same question. I've always liked the
idea of reflection but avoided it because of performance scares (and usually I
can solve the problem another way).

Since we're discussing the performance footprint of regular expression
libraries, it's also worth mentioning how slow even just running the base
regexp library can be. For example, the email example could also be written
without regex:

    
    
        parts := strings.Split("joe@example.com", "@")
        fmt.Println("Name:", parts[0])
        fmt.Println("Domain:", parts[1])
    

([https://play.golang.org/p/ZezcoBjc9v](https://play.golang.org/p/ZezcoBjc9v))

Obviously the above code doesn't do any sanity checks - which is where regular
expressions can often make things easier. But the above would run a lot faster
than a regexp pattern match.
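As a sketch of what minimal sanity checks might look like without regexp, the hypothetical helper `splitEmail` below rejects input that has no `@`, more than one `@`, or an empty side. It only checks structure; unlike the regex version it does not restrict which characters appear in the user part.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEmail is an illustrative helper. It splits addr at its single "@"
// and reports ok=false if there is not exactly one "@" or if either side
// is empty.
func splitEmail(addr string) (user, host string, ok bool) {
	i := strings.Index(addr, "@")
	if i <= 0 || i == len(addr)-1 {
		return "", "", false // missing "@", empty user, or empty host
	}
	if strings.Contains(addr[i+1:], "@") {
		return "", "", false // more than one "@"
	}
	return addr[:i], addr[i+1:], true
}

func main() {
	for _, addr := range []string{"joe@example.com", "joe", "a@b@c"} {
		user, host, ok := splitEmail(addr)
		fmt.Printf("%q -> user=%q host=%q ok=%v\n", addr, user, host, ok)
	}
}
```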

------
inglor
I wrote a quick C# port PoC:
[https://gist.github.com/benjamingr/4de21494b3e76088e5f7](https://gist.github.com/benjamingr/4de21494b3e76088e5f7)

Nice idea.

~~~
masklinn
That's not as useful for C#, as you can get a match by name using
Regex.Match.Groups[String].

------
XorNot
This is neat but I don't love what it does to struct definitions. Would a
modified variant better make use of named capture groups?

------
doomrobo
Does this work with numerical fields as well? I only see strings in the
examples.

