
Url Knife - extract-pattern
https://github.com/Andrew-Kang-G/url-knife
======
Hackbraten
Even after looking at the examples, I have no idea what this software does, what problem it is supposed to solve, what I would use it for, or what the input and output are supposed to be. The examples are not self-explanatory to me.

The live demo doesn’t work on my phone either: it only shows the JSFiddle header and menu, but no content.

The only thing I’ve understood is that this is about parsing and extracting URLs that may contain typos.

For starters, what does the first example try to achieve? What does the input
syntax mean? How do I interpret the output?

~~~
jpalomaki
It takes in text that contains malformed URLs and tries to make sense of what the valid URLs might be.

I think the purpose is to extract URLs from formatted text, for example from discussion forum posts, where they might be mixed up with HTML or other markup and contain typos (due to user errors, or to users deliberately trying to make the URLs undetectable).

It might also be useful if you want to extract URLs from PDFs, since PDF-to-text conversion can introduce all kinds of errors into the output.
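The kind of lenient extraction described above can be sketched roughly as follows. This is an illustrative example only, not url-knife's actual API; the loose regex and the repair rules are my own assumptions about what such a tool does.

```javascript
// Illustrative sketch only -- not url-knife's actual API. The regex and
// repair rules below are assumptions: find URL-like substrings in noisy
// text, then fix common damage before returning them.
function extractUrlCandidates(text) {
  // Loose pattern: the scheme may be mangled, so also accept "www." starts.
  const loose = /(?:https?\s*:\s*\/{0,2}|www\.)[^\s<>"']+/gi;
  return (text.match(loose) || []).map(normalizeCandidate);
}

function normalizeCandidate(raw) {
  let url = raw
    .replace(/\s+/g, '')                 // drop whitespace injected by markup/OCR
    .replace(/^https?:?\/{0,2}/i,        // repair "http:/", "http//", "http:" ...
      m => m.toLowerCase().startsWith('https') ? 'https://' : 'http://')
    .replace(/[).,;]+$/, '');            // trim trailing punctuation
  if (url.toLowerCase().startsWith('www.')) url = 'http://' + url;
  return url;
}

console.log(extractUrlCandidates('See http:/example.com/a, and www.test.org).'));
// -> [ 'http://example.com/a', 'http://www.test.org' ]
```

Real-world input (OCR output, forum markup) is far messier than this, which is presumably why a dedicated library with richer heuristics is worth having.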

------
sansnomme
It is disappointing that so many comments here dismiss the tool as useless.
The heuristics provided are priceless and worth days of engineering.

------
extract-pattern
Please open issues on GitHub if you run into an exception. As I indicated, the normalizer does not work for intranet URLs.

------
johnghanks
Looks interesting but why? Just fail whatever input is given if it's not in an
expected format - why even bother trying so hard to recover?

~~~
kapep
I would guess it could be used in situations where you can't just fail: for example, recovering URLs from erroneous OCR scans, or from unstructured text that might contain errors.

It could probably also be applied directly to user input, to suggest possible corrections for invalid URLs.
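A minimal sketch of that correction idea, assuming a hypothetical `suggestCorrection` helper rather than url-knife's real interface, could validate input with the standard `URL` constructor and retry with a few common typo fixes:

```javascript
// Hypothetical sketch, not url-knife's real interface: validate user input
// with the standard URL constructor and, on failure, retry a few common
// typo fixes. The rule set here is an assumption for illustration.
function isWebUrl(s) {
  try {
    const u = new URL(s);
    return u.protocol === 'http:' || u.protocol === 'https:';
  } catch {
    return false;
  }
}

function suggestCorrection(input) {
  const trimmed = input.trim();
  if (isWebUrl(trimmed)) return trimmed;                 // already valid
  const candidates = [
    // Mangled scheme: "htp:/", "https;//", "htps//", ...
    // (slice(2) skips "ht" so a leading "s" check isn't fooled by "https")
    trimmed.replace(/^htt?ps?[;:]?\/{0,2}/i,
      m => /s/i.test(m.slice(2)) ? 'https://' : 'http://'),
    'http://' + trimmed,                                 // missing scheme entirely
  ];
  return candidates.find(isWebUrl) ?? null;              // null: not salvageable
}

console.log(suggestCorrection('htp://example.com'));   // -> http://example.com
console.log(suggestCorrection('www.example.com'));     // -> http://www.example.com
```

Returning `null` rather than throwing fits the "can't fail" use case above: the caller can fall back to treating the input as plain text.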

