ReX.js – Your RegEx companion (areknawo.github.io)
58 points by areknawo 4 months ago | 54 comments



I guess unpopular opinion: I think this library is excellent especially for making readable regex match/replace code. Even as a frontend dev with 5+ years of experience, it's difficult to read regex by skimming it (perhaps that's just b/c I don't spend much time with it at all outside of matching user inputs). Hopefully, if this library catches on, it may make JS code using regex slightly more readable. Regardless, props to the author for making this library.


Let me save you 8 kilobytes of dependency and a new API to learn (and debug, when you start wondering why your ReX built regex isn't working how you want). Apparently the main "feature" of ReX is you get to annotate the components of a regex string.

    // match something that looks like a date in the form 02-14-1924
    // (backslashes are doubled: in a plain string literal '\d' would become just 'd')
    let x = "";
    x += '\\d{2}'; // match the month
    x += '\\-';
    x += '\\d{2}'; // match the day
    x += '\\-';
    x += '\\d{4}'; // match the year
    const dateRegex = new RegExp(x);
Even this horribly overcomplicated example would just be `\d{2}\-\d{2}\-\d{4}`. The ReX library provides no useful abstractions to simplify vanilla JavaScript and regex.

If a (dynamically built) regex misbehaves, I'm only inspecting the lines of code that compose the string, not an entire library to understand what went wrong.


No need to escape the dashes btw:

    $ node
    > '02-14-1924'.match(/\d{2}-\d{2}-\d{4}/)
    [ '02-14-1924', index: 0, input: '02-14-1924', groups: undefined ]


That's poor usage of regex. Try this:

    $ node
    > let XRegExp = require('xregexp');
    > XRegExp.exec('02-14-1924', XRegExp(`
        (?<mon>  \\d{2} ) -
        (?<day>  \\d{2} ) -
        (?<year> \\d{4} )
    `, 'x'))
    [ '02-14-1924',
      '02',
      '14',
      '1924',
      index: 0,
      input: '02-14-1924',
      groups: undefined,
      mon: '02',
      day: '14',
      year: '1924' ]
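For comparison, a minimal sketch assuming Node 10+ (or any engine with ES2018 named capture groups): plain JavaScript handles the labelled-groups part without XRegExp, although it has no `x` flag, so the pattern stays on one line. The variable names here are illustrative.

```javascript
// ES2018 named capture groups: no library needed.
// Month first, matching the 02-14-1924 (MM-DD-YYYY) example.
const datePattern = /(?<mon>\d{2})-(?<day>\d{2})-(?<year>\d{4})/;

const match = '02-14-1924'.match(datePattern);
if (match) {
  // Named groups live on match.groups rather than on the array itself.
  const { mon, day, year } = match.groups;
  console.log(mon, day, year); // 02 14 1924
}
```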


You may want to check out my lib, compose-regexp, which serves a similar purpose with a smaller API and weight (< 1 KB min+gzipped).

Named captures can be created with the following helper (with an example that parses strings):

    import {avoid, capture, either, ref, sequence, suffix} from 'compose-regexp'

    function makeNamedCaptures(indices) {
      let i = 1
      return function namedCapture(name, exp) {
        if (indices.hasOwnProperty(name)) {
          throw new Error("\"" + name + "\" is already used as a capture name")
        }
        indices[name] = i ++
        return capture(exp)
      }
    }

    const captureIndices = {}, namedCapture = makeNamedCaptures(captureIndices)

    const zeroPlus = suffix('*')
    const any = /[\s\S]/
    const anyNonReturn = /./

    const stringMatcher = sequence(
      namedCapture("quote", /['"]/),
      zeroPlus(
        avoid(ref(captureIndices.quote)),
        either(
          sequence('\\', any),
          anyNonReturn
        )
      ),
      ref(captureIndices.quote)
    )


    // stringMatcher is actually a plain regexp:
    console.log(`"foo'\\"'bar"baz`.match(stringMatcher))
    // --> [ '"foo\'\\"\'bar"', '"', index: 0, input: '"foo\'\\"\'bar"baz' ]
You can of course parse strings without using backreferences; the goal here is to provide an example with a simple grammar that you can follow along.

For complex grammars with sub-parts, you can wrap the sub-patterns in functions and inject `namedCapture` as a parameter.

https://flems.io/#0=N4IgZglgNgpgziAXAbVAOwIYFsZJAOgAsAXLKEAG...
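For anyone following along, the same grammar written as a single native regex might look roughly like the sketch below. It is hand-written, assuming `avoid` is a negative lookahead and `ref(1)` is the backreference `\1`; the exact source that compose-regexp emits may differ.

```javascript
// (['"])            capture the opening quote as group 1
// (?:(?!\1)         at each position, refuse to consume the closing quote...
//   (?:\\[\s\S]|.)  ...and consume either an escape pair or one non-newline char
// )*                any number of times
// \1                then require the same quote character to close the string
const stringMatcher = /(['"])(?:(?!\1)(?:\\[\s\S]|.))*\1/;

const input = `"foo'\\"'bar"baz`;
const m = input.match(stringMatcher);
console.log(m[0]); // "foo'\"'bar"
console.log(m[1]); // "
```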


> Extreamly small (~8kB gzipped)

For a regex library, that strikes me as not only not small, but actually extremely big.

If you're building a project that really warrants 8+k just for an external dependency to abstract the regex logic, you're either overusing regex, or regex is such a central and fundamental part of your project that rolling your own non-generalised abstractions is going to be well worth your while.


So... this is regular expressions for people who want to spend time learning this library instead of just learning regular expressions directly?


Yeah I don't understand what value this library provides. It's just additional indirection, not a proper abstraction any different than the regex APIs it's calling under the hood.


Some people, when confronted with a problem, think "I know, I'll use ReX.js." Now they have three problems.


I know you're meme-ing, but it's a great way to put it tbh. This library struck a nerve with me today because it's a rare example of a library that's actually harmful to its intended beginner audience. The author invented a verbose DSL for what's already a DSL!

For the record, I'd never fault anyone for writing any code they want (unless maliciously ofc). "What would a method-based interface to regular expressions look like?" is a perfectly reasonable question! But the author is doing a disservice to anyone he gets to use this. Just use ''.match().


Yeah, it's like asking why use Python when assembler is just as obvious.

----

Regex syntax sucks; or more exactly, it looks like the bytecode for a programming language that is left as an exercise to the reader.


> Regex syntax sucks

Learn it, use it, accept it. Or are you trolling?

It is supported by most languages with special syntax because string manipulation is still important. The native support matters due to optimisations present that are peculiar to the runtime. Most any developer can understand your regular expressions, or they know who to ask. You need to know it if you expect to use shell tools.

It's just like SQL: sure SQL is missing modern features and modern syntax, but at least once you've learnt it you can use it for debugging, performance, administrative tasks, reporting systems, dev tools, and your future projects that use another language.


Why assume trolling? The sentiment that regex sucks is one shared with a very large population. At least to me, it's this kind of terrible "write-once" maintenance hell language that really ought to be wiped off the face of the planet.

Its only redeeming feature is that it's fast (assuming you don't use the wrong features - lookaheads can cripple performance). Looking for alternative abstractions is very worthwhile. I'm not suggesting throwing away the existing regex engines, just abstracting away the ugliness. Whether this solution is the right one is debatable.

You can draw a comparison with SQL. Lots of projects nowadays hide SQL behind an ORM, only using SQL when a very complex query needs to be written. I'd expect something similar for Regex would help maintainability of lots of projects.


This whole library is a troll and people are trolling with it.

This:

    new Matcher().find('Reg').anyWhitespace().capture((expr) => { expr.find('Exp') }).test('Reg Exp')
vs:

    'Reg Exp'.match(/Reg\s(Exp)/)


> Its only redeeming feature is that it's fast (assuming you don't use the wrong features - lookaheads can cripple performance). Looking for alternative abstractions is very worthwhile.

First, it's not particular features of regex that are "slow", but the whole notion of backtracking that's central to how they work! Further, regexes are just a translation of a fundamental mathematical model. In university they even had us translate between finite state machines and regular expressions on tests. In short, any "alternate abstraction" you're looking for is going to be literally the same thing, just wordier, because you're talking about something fundamental in computer science.

In case you're interested, here's a finite state machine to regex translator / simulator linked from the wikipedia page on finite state machines: http://ivanzuzak.info/noam/webapps/fsm_simulator/
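To make the regex/finite-state-machine correspondence concrete, here is a hand-written DFA for the date pattern from earlier in the thread, `^\d{2}-\d{2}-\d{4}$`, checked against the equivalent regex. The state encoding is purely illustrative; real engines compile patterns differently, and backtracking engines don't build a DFA at all.

```javascript
// Expected symbol class at each position of ^\d{2}-\d{2}-\d{4}$.
// State i means "i characters matched so far"; -1 is the dead state.
const SHAPE = ['d', 'd', '-', 'd', 'd', '-', 'd', 'd', 'd', 'd'];
const DEAD = -1;

function step(state, ch) {
  if (state === DEAD || state >= SHAPE.length) return DEAD; // already dead, or input too long
  const isDigit = ch >= '0' && ch <= '9';
  const expected = SHAPE[state];
  if (expected === 'd' && isDigit) return state + 1;
  if (expected === '-' && ch === '-') return state + 1;
  return DEAD;
}

function dfaAccepts(input) {
  let state = 0;
  for (const ch of input) state = step(state, ch);
  return state === SHAPE.length; // accepting state: all 10 symbols consumed
}

const dateRegex = /^\d{2}-\d{2}-\d{4}$/;

for (const s of ['02-14-1924', '2-14-1924', '02-14-19245', '02/14/1924', '']) {
  console.log(JSON.stringify(s), dfaAccepts(s), dateRegex.test(s));
}
```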


That’s literally why I said we shouldn’t trash regexes. It’s not the idea that sucks. It’s the syntax, the abstraction of the state machine. That’s what sucks. Shorter is not better. Something wordier that’s easier to understand would help alleviate the maintenance burden.


> Yeah, it's like asking why use Python when assembler is just as obvious.

This is a truly awful straw-man.


It is? Regexes have an arcane "syntax". That's fine from the point of view of an APL user, but as part of other languages it is not.

It is totally outside the experience of other DSLs. For example, if you are in Python, your DSLs are pythonic, except this one, which looks totally alien.


I think at this stage, the idea is to get feedback on how to improve it, or maybe what to change completely, not saying "recommend this to all newbies to learn instead of regular expressions". I'm just guessing because it seems to be 4 hours old.

Maybe for the purpose of a more interesting discussion, let's just assume everybody knows (how to look up the docs for) regular expressions, but sometimes they're still lazy or just feeling playful, and want to do the same stuff with a different syntax... and this is just one first prototype of how maybe to do that. No offense intended, no claim that this will supersede learning regular expressions implied.


The code written with this is probably going to be a lot easier for people to read than a raw regex. Never forget that code is read many times more often than it is written.


Potentially cool if you want to generate regular expressions dynamically at runtime


So, string concatenation?


Indeed! In fact, the library author cites this as one of the "main advantages" of their library (under "Getting Started"):

> As you can see one of main advantages of using this library is ability to document every line of code without any hassle.

In other words, the 'x' flag. JavaScript doesn't provide the x flag, but concatenating a regex string across different lines amounts to the same thing.

The fact that the author includes the warning "Never-ever add a semicolon if you want to continue expression" on a method chain shows the level of sophistication this library is intended for.
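JavaScript has no `x` flag, but its effect can be approximated with a small tagged template. The helper below, `x`, is hypothetical (not part of JavaScript or ReX.js), and it assumes the pattern contains no whitespace or `#` characters that are meant to be matched literally, since it strips both.

```javascript
// Hypothetical helper (not part of JavaScript or ReX.js): emulates the
// free-spacing "x" flag by stripping `# ...` comments and all whitespace.
// Limitation: literal spaces and `#` in the pattern are removed too.
function x(strings, ...values) {
  // String.raw keeps backslashes, so \d stays \d instead of becoming d.
  const source = String.raw(strings, ...values);
  const stripped = source
    .split('\n')
    .map((line) => line.replace(/#.*$/, '')) // drop end-of-line comments
    .join('')
    .replace(/\s+/g, ''); // drop all remaining whitespace
  return new RegExp(stripped);
}

const date = x`
  (?<month>\d{2}) -  # month
  (?<day>\d{2})   -  # day
  (?<year>\d{4})     # year
`;

console.log(date.source); // (?<month>\d{2})-(?<day>\d{2})-(?<year>\d{4})
```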


Actually, I think this library could provide a security benefit, since generating code by string concatenation is generally dangerous. It's analogous to generating HTML or SQL: safer to use a library than to construct it yourself.
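Whether or not a library is involved, the usual guard when splicing untrusted text into a pattern is to escape the regex metacharacters first. A minimal sketch using the escaping character class recommended on MDN; `escapeRegExp` and `anyOf` are illustrative names, not a standard API.

```javascript
// Escape every character that has special meaning in a RegExp source
// (outside of character classes), so the input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an alternation from a runtime list of literal strings.
// Each alternative is escaped, and the whole alternation is grouped so it
// composes safely with anything placed before or after it.
function anyOf(words) {
  return new RegExp('(?:' + words.map(escapeRegExp).join('|') + ')');
}

const price = anyOf(['$5.00', '(free)']);
console.log(price.source); // (?:\$5\.00|\(free\))
console.log(price.test('costs $5.00')); // true
console.log(price.test('costs $5x00')); // false
```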


OT: something you are doing on that page makes it not work with Edge. On my SP4 in landscape mode, I can scroll down to past the "Installation" section and just into "Basic Usage", and then the visible page goes blank white.

In portrait mode, it gets down to just showing the "const expr = new Matcher()" line in the first code sample before going blank white.

If I've got the zoom set to 125%, then portrait is blank white everywhere below the "Github" and "Get Started" buttons.

Reducing zoom to 50% in portrait allows all the page content to be scrolled into view, although continuing to attempt to scroll blanks it out.

This isn't some quirk of my SP4. I tried it with the same results on Microsoft's Win 10 MSEdge VM running under VMWare, available here: https://developer.microsoft.com/en-us/microsoft-edge/tools/v...

Edit: it seems to happen when the class for <body> is changed from "ready-transition ready ready-spinner ready-fix" to "ready-transition ready ready-spinner ready-fix sticky", which seems to happen when it is changing the thingy on the left from scrolling with the page to sticking in place.

When on the blank white page, removing "sticky" from the <body> class via the inspector makes the missing material materialize.


It is not OT. The site doesn't work in Edge, which indicates to me that they are not testing in all browsers. This makes me doubt whether they are doing testing at all, and ReX.js could be a low-quality solution.



Thanks for all the feedback. As a matter of fact:

- IE 10 should now work properly.
- Library size is 4 KB (min+gzip), controlled with size-limit.
- ReX.js is indeed intended for creating complex regexps, mainly for those who already know regexp constructs but want to write more readable code.
- A short ReX.js introduction is available at https://dev.to/areknawo/the-more-proper-introduction-to-rexj...



There is a saying (I can't remember the source) that I always thought described regex perfectly in my experience.

"Beginner programmers don't understand regex, good programmers understand regex, the best programmers don't use regex."

I've found regex is a poor choice in production systems because it is very hard to maintain (which this library tries to address) and very hard to test (which this library does not address).

However, regex is perfect for that complicated one-time find and replace.


Like anything in programming, regular expressions are a tool. Yes, it might be unusually easy to shoot yourself in the foot with this particular tool, but that doesn't make them not valuable at all. I'd argue that the best programmers use them judiciously and cautiously, as they would any fancy language feature.

Case in point: logfile analysis. Often times you might have a log file with many different kinds of messages formatted in many different ways, only some of which you care about. A quick regular expression can pull out the messages of interest, and extract out the bits you care about. Sure, in an ideal world you'd ingest the logging data in some structured format, but the real world simply does not hand you the luxury of controlling all your upstream data sources all the time.
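A minimal sketch of that kind of ad-hoc extraction, assuming a made-up log format (`date time LEVEL [component] message`) and Node 12+ for `matchAll`; the lines and field names are invented for illustration. Lines that don't fit the pattern are simply skipped, which is usually what you want when a file mixes formats.

```javascript
const logText = [
  '2019-03-01 12:00:00 INFO [web] GET /index.html 200',
  'some unrelated stack trace continuation line',
  '2019-03-01 12:00:05 ERROR [db] Connection lost',
  '2019-03-01 12:00:09 ERROR [web] Upstream timeout',
].join('\n');

// The m flag anchors ^ and $ per line; named groups document each field.
const entryPattern =
  /^(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2}) (?<level>[A-Z]+) \[(?<component>[^\]]+)\] (?<message>.*)$/gm;

// Keep only ERROR entries, and only the fields we care about.
const errors = [];
for (const m of logText.matchAll(entryPattern)) {
  if (m.groups.level === 'ERROR') {
    errors.push({ component: m.groups.component, message: m.groups.message });
  }
}
console.log(errors);
```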


I think logfile analysis is a perfect example. Regex could be the perfect tool for one-shot analysis of logs. But if you are writing an actual commercial application to do logfile analysis, regex is probably not the right tool.


What would you use in the actual commercial applications?

I have seen people try to replace regexes with sequences of splits and accessing hardcoded field numbers. This can improve readability in the simple cases, but makes code much more fragile in the complex cases.

Another approach I have seen is a full-featured lexer. It looks clean and very fast, but in my experience it is overkill for most tasks.
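Between those two extremes, a common middle ground is a small tokenizer built from regexes with the sticky (`y`) flag, which anchors each match at `lastIndex`. The sketch below is illustrative only; the token names and rules are made up for a `key=value` style log line.

```javascript
// Each rule is tried at the current position; the sticky flag means a rule
// can only match starting exactly at lastIndex, never further ahead.
const RULES = [
  { type: 'space', re: /\s+/y },
  { type: 'key', re: /[A-Za-z_]\w*(?==)/y }, // identifier followed by '='
  { type: 'equals', re: /=/y },
  { type: 'quoted', re: /"(?:\\.|[^"\\])*"/y }, // double-quoted, with \-escapes
  { type: 'word', re: /[^\s="]+/y },
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const { type, re } of RULES) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, text: m[0] });
        pos = re.lastIndex; // every rule consumes at least one character
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected character at ${pos}: ${input[pos]}`);
  }
  return tokens;
}

console.log(tokenize('level=error msg="disk full" host=db1'));
```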


Emacs has a great rx macro that does something like that and is widely used in elisp projects.

https://github.com/typester/emacs/blob/master/lisp/emacs-lis...

Though then again, the nice (best?) thing about macros is that it gets inlined as a string at compile time anyway.



I was prepared to be underwhelmed, but this looks like a reasonably good wrapper lib for regular expressions if you couldn't be bothered to learn RegExes. On the other hand, regular expressions really aren't that hard to learn.

Quick typo on the homepage: "extreamly small" should be "extremely small"


A far more sane approach, if you want to have this kind of API, is to use a Parsec-inspired parser combinator library instead, such as:

https://github.com/jneen/parsimmon
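For a flavour of the parser-combinator style, here is a deliberately tiny hand-rolled version. It is not Parsimmon's API (the names `regex`, `str`, `seq` and `map` are hypothetical), just the core idea: small parsers that consume input at a position and compose into bigger ones.

```javascript
// A parser is a function (input, pos) => { ok, value, pos }.
function regex(re) {
  const sticky = new RegExp(re.source, 'y'); // only match at `pos`
  return (input, pos) => {
    sticky.lastIndex = pos;
    const m = sticky.exec(input);
    return m ? { ok: true, value: m[0], pos: sticky.lastIndex } : { ok: false, pos };
  };
}

function str(text) {
  return (input, pos) =>
    input.startsWith(text, pos)
      ? { ok: true, value: text, pos: pos + text.length }
      : { ok: false, pos };
}

// Run parsers one after another; fail as soon as any of them fails.
function seq(...parsers) {
  return (input, pos) => {
    const values = [];
    for (const p of parsers) {
      const r = p(input, pos);
      if (!r.ok) return r;
      values.push(r.value);
      pos = r.pos;
    }
    return { ok: true, value: values, pos };
  };
}

// Transform the value of a successful parse.
function map(parser, fn) {
  return (input, pos) => {
    const r = parser(input, pos);
    return r.ok ? { ok: true, value: fn(r.value), pos: r.pos } : r;
  };
}

const twoDigits = map(regex(/\d{2}/), Number);
const fourDigits = map(regex(/\d{4}/), Number);
const date = map(
  seq(twoDigits, str('-'), twoDigits, str('-'), fourDigits),
  ([month, , day, , year]) => ({ month, day, year })
);

// ok: true, value: { month: 2, day: 14, year: 1924 }, pos: 10
console.log(date('02-14-1924', 0));
```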


Neat concept, I wonder if it'll scale to complex regexes.


Great sticky `<aside>` on the home page.


I had the same thought! It's so smooth.


If you have a single CS bone in your body, you should be able to grok at least the basics of regex.

If you don't, then go back to designing* I guess, and ask for a programmer's help

* I only say this because I recently saw a tweet from a frontend designer person who said she thought regex was the worst technology ever invented and she absolutely hated it.

Disclaimer: For whatever reason, I'm a big fan. Concisely describe patterns in text and match on them? Plus the puzzle-fun of figuring out how to match on what you're looking for? Yes, please. (And yet, on certain programmer teams, I've had to scale back my regex usage because of you folks who refuse to just learn it, sigh)


> If you have a single CS bone in your body, you should be able to grok at least the basics of regex.

Sure, but I don't think the existence of this lib implies the author disputes that, and I can only imagine they're pretty familiar with regex themselves. I think they're trying to fill a legitimate need in frontend dev, which is: if you're only touching regex a few times a year, it's pretty intimidating, and not at all maintainable. SQL can also be difficult to grok if you don't write it often, and ORM interfaces make it easier.

Would the world be a better place if everybody knew regex perfectly? Yeah ok it would, but investing a bunch of time learning something you will only occasionally use is a poor investment IMO.

> If you don't, then go back to designing* I guess, and ask for a programmer's help

This seems to be needlessly condescending? You can be an excellent programmer (particularly in frontend) for years while avoiding regex. It's not a make-or-break thing.


I agree with you. They can certainly be abused, like anything else ("now you have two problems!"). But crafting a beautiful regex when it's exactly the right tool for the job is extremely satisfying. I think if more people took the time to understand them and practice writing them (when appropriate), they wouldn't have such a stigma.


    I think if more people took the time to understand
    them and practice writing them (when appropriate),
    they wouldn't have such a stigma.
I think this is part of the movement to a more "IDE/web-centric" development model. The chances to interact with regexes on the CLI are endless: grep, git-grep, vim, sed, etc. Even bash has some limited regex integration. As a person who still does most of their development in the CLI with vim, regexes are a massive part of my workflow; I actively use them several times per day.

When people move away from this model, there are a lot fewer opportunities to practice regexes. I've noticed this in my own workflow when working in Java-centric languages (whose IDEs often have poor regex integration).


I think one reason why most people have a hard time reading regex is because they don't use any indentation or linebreaks. Honestly, if a buddy came to you and asked you to help him debug a javascript method and all 15 statements were on the same line, would you offer to help him, or tell him to fix his shit first so you can read it?

What if it was all on one line and his variable names were all "v1", "v2", etc. Would you help him then? fuck no. And yet, this is standard operating procedure with regex, except you don't even get "v1", "v2" because nothing is labeled at all. v1/v2/... would be an improvement!

This is how most people write a simple date regex:

    \d{1,2}/\d{1,2}/(\d{4}|\d{2})

And mind you, this is a very simple scenario. Here is how you would write it if you treated it like actual code:

    (?<month>\d{1,2})
    /
    (?<day>\d{1,2})
    /
    (?<year>\d{4}|\d{2})

First off, you can tell what my intent is when I'm capturing each group. Maybe this code gets used by a European, where the month and day switch places. They can figure out how to fix it in like two seconds. Secondly, the forward slashes are not lost in a sea of characters anymore, because we use whitespace like a civilized developer, not a regex savage.
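A short sketch of the point about intent: if the consuming code reads `groups.month` and `groups.day` rather than positional indices, supporting a day-first format only means swapping the groups in the pattern. `US_DATE`, `EU_DATE` and `toISODate` are illustrative names; note that plain JavaScript has no `x` flag, so here each pattern stays on one line.

```javascript
// Same group names, different order; consumers never see the difference.
const US_DATE = /^(?<month>\d{1,2})\/(?<day>\d{1,2})\/(?<year>\d{4}|\d{2})$/;
const EU_DATE = /^(?<day>\d{1,2})\/(?<month>\d{1,2})\/(?<year>\d{4}|\d{2})$/;

// Hypothetical consumer: relies only on group names, not positions.
// Two-digit years are naively assumed to be 20xx for this example.
function toISODate(text, pattern) {
  const m = pattern.exec(text);
  if (!m) return null;
  const { year, month, day } = m.groups;
  const fullYear = year.length === 2 ? '20' + year : year;
  return `${fullYear}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
}

console.log(toISODate('2/14/1924', US_DATE)); // 1924-02-14
console.log(toISODate('14/2/1924', EU_DATE)); // 1924-02-14
```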


I also use this form but it’s even more complex to some people than the one-liners, which I guess doesn’t help the impediment-factor


The regular formalism is very neat and I like to use it, but the RegExp syntax doesn't do it justice.

They are pretty much write-only, and there's no way to compose two regexps safely (`new RegExp(a.source + b.source)` breaks for `/a|A/` and `/b/`, for example).

Rex is a good attempt at solving this. I wrote a similar lib a while ago (more functional and tighter in design, see my other comment on this page).
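The composition problem mentioned in this comment is easy to demonstrate, and the usual fix is to wrap each operand in a non-capturing group before concatenating. A minimal sketch; `concatRegExps` is a hypothetical helper, and it deliberately ignores flags and numbered backreferences (a `\1` in a later operand would point at the wrong group once patterns with capture groups are combined).

```javascript
const a = /a|A/;
const b = /b/;

// Naive concatenation: the alternation in `a` swallows `b`,
// producing /a|Ab/ ("a" OR "Ab") instead of "(a or A) followed by b".
const naive = new RegExp(a.source + b.source);

// Hypothetical helper: wrap each operand in (?:...) so precedence is preserved.
function concatRegExps(...parts) {
  return new RegExp(parts.map((re) => '(?:' + re.source + ')').join(''));
}
const safe = concatRegExps(a, b);

console.log(naive.source, naive.test('a')); // a|Ab true
console.log(safe.source, safe.test('a')); // (?:a|A)(?:b) false
console.log(safe.test('Ab'), safe.test('ab')); // true true
```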


>If you don't, then go back to designing* I guess, and ask for a programmer's help

Was this actually necessary to say? It's not only insulting to people who don't need, care about, or currently know regex, but also insulting to designers' intelligence.


Probably not. As I said I recently read a tweet from a designer so it was fresh in my mind.


Why is this level of brevity desirable, and only with regular expressions?

Are you writing APL? Why not?


I'm a CS guy through and through. Regular languages, CFLs, ... NP-complete, what was the book's name again?

But I gotta say, regex isn't the best way to show things. Hard to verify, hard to maintain and what not. Computer sciency, sure. But not well "designed".

Now I have no love lost for people who talk about stuff without understanding the fundamentals or the basics even. So I don't doubt that the person was "whatever". But I do feel that regexes need an overhaul

PS: I'm a fan too when I have to do quick stuff in the shell. But not so much when I save it to a file to be read by me and multiple people weeks later. Regex is one thing that is way easier to write than read.


Which is why I use expanded regex with each section commented.


What is expanded regex?


They're likely referring to the /x modifier, also called extended regular expressions:

https://stackoverflow.com/questions/24642616/what-are-extend...


I had not known about this; it makes things so much easier to maintain.



