Except it isn’t supposed to log things that aren’t urls being pasted into the br...

Thorrez · on Nov 10, 2021

>being pasted into the browser.

Not just the browser:

>links that I send to people across all the different messenger apps I have installed.

The article does say it might be useful to log where the URL was copied from, and potentially also where it is pasted to:

> I would also expect to know where the URL was copied from. I.e., whatever application is in focus when the clipboard is modified. It probably isn't feasible though to track everywhere it is pasted (maybe it is actually...).

Jensson · on Nov 10, 2021

A naïve url detector isn't hard to write. A good url detector is difficult not because it is hard to code but because it is hard to understand what behaviour a user would expect.

marcellus23 · on Nov 10, 2021

So? In any decent programming environment, checking if a string is a URL is a one-liner.

dkdbejwi383 · on Nov 10, 2021

Not without false-positives.

E.g., is "foo.bar" a URL? Maybe. But it could also be a filename. How do you know if it's a "real" URL or not?
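To make the false-positive problem concrete, here is a minimal sketch of a naive detector. The pattern and the name `looks_like_url` are made up for illustration and not taken from any particular tool; the point is only that a "dotted name" pattern cannot tell hostnames from filenames.

```python
import re

# Hypothetical naive pattern: dot-separated labels ending in a label of
# two or more letters. This is an assumption, not a standard.
NAIVE_URL = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)

def looks_like_url(text: str) -> bool:
    """Return True if the whole string matches the naive pattern."""
    return NAIVE_URL.fullmatch(text.strip()) is not None

# All four of these match, although two are almost certainly filenames.
for candidate in ["example.com", "foo.bar", "report.pdf", "README.md"]:
    print(candidate, looks_like_url(candidate))
```

Tightening the pattern does not really help: "report.pdf" and "example.com" have the same shape, so the distinction has to come from somewhere other than the string.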

toyg · on Nov 10, 2021

In The Good Ol' Times, I would have replied "let's check the TLD", but now that list is basically trending to include the entire English dictionary... so I guess the only response these days is "ask DNS". So we've already gone from "pattern-match a string" to "pattern-match then make network calls", which (as anyone who's done any network work knows) also requires managing a bunch of possible/likely error states (offline, timeout, partial response, response format, etc etc). So yeah, nothing is as easy as it looks.
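A rough sketch of what "ask DNS" turns into once the error states are handled. The names `Lookup` and `check_host` are hypothetical, and the resolver is passed in as a parameter so the logic can be exercised without touching the network. Note that `socket.getaddrinfo` has no timeout argument, so a real version would presumably need a worker thread or an async resolver on top of this.

```python
import socket
from enum import Enum

class Lookup(Enum):
    RESOLVES = "resolves"
    NOT_FOUND = "not found"
    UNKNOWN = "unknown"  # offline, timeout, or any other failure

def check_host(host, resolve=socket.getaddrinfo):
    """Ask the resolver whether `host` exists; never raise."""
    try:
        resolve(host, None)
    except socket.gaierror as exc:
        # EAI_NONAME means the name does not exist. Other resolver errors
        # (e.g. a temporary failure) are treated as "could not tell".
        # Assumption: some platforms may report a missing name differently.
        if exc.errno == socket.EAI_NONAME:
            return Lookup.NOT_FOUND
        return Lookup.UNKNOWN
    except OSError:
        # Network unreachable, timeouts, and similar conditions.
        return Lookup.UNKNOWN
    return Lookup.RESOLVES
```

Even this small version has three outcomes instead of two, and the caller still has to decide what to do with `UNKNOWN`.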

AstralStorm · on Nov 10, 2021

Can't quite do this most of the time due to privacy concerns. Leaking random URL-looking text to the network is a big no.

toyg · on Nov 10, 2021

So now we have to ask for user consent (installation time? first run?) and respond accordingly, adding another piece of UI... but it will only take an hour, right...?

dkdbejwi383 · on Nov 10, 2021

You probably also want a setting to detect slow networks and disable it there in case the user is tethering etc.

quesera · on Nov 10, 2021

Strictly speaking, a URL begins with a scheme followed by a colon.

Schemes can be registered with IANA (or not), and everyone knows the most common half-dozen or so. People often forget "mailto:" and "tel:".
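As a sketch of the "scheme followed by a colon" rule: the regex below follows the scheme grammar in RFC 3986 section 3.1, while `KNOWN_SCHEMES` is a hypothetical allowlist chosen for illustration, not an official list.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*):")

# Hypothetical allowlist; a real tool would pick its own.
KNOWN_SCHEMES = {"http", "https", "ftp", "file", "mailto", "tel"}

def scheme_of(text):
    """Return the lowercased scheme prefix, or None if there is none."""
    match = SCHEME.match(text)
    return match.group(1).lower() if match else None

def has_known_scheme(text):
    """True only when the text starts with a scheme from the allowlist."""
    return scheme_of(text) in KNOWN_SCHEMES
```

The allowlist matters because the grammar alone is very permissive: `C:\temp` parses with scheme "c" and `localhost:8080` with scheme "localhost".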

The project brief asks for one thing, but the practical implementation probably requires something else.

This is a good lesson to learn, and this is how two hours become two days, then two weeks.

tremon · on Nov 10, 2021

It isn't a URL. As per RFC 3986 [0]:

> The term "Uniform Resource Locator" (URL) refers to the subset of URIs that [..] provide a means of locating the resource by describing its primary access mechanism

Since "foo.bar" does not describe an access mechanism, it is not a URL. Yes, you could make the argument that "foo.bar" is a relative-path reference as described in section 4.2, but that is only used to:

> express a URI reference relative to the name space of another hierarchical URI

So "foo.bar" can only be considered a URL in the context of another given URL, and in your example there is none.

[0] https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3
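The relative-reference case can be illustrated with Python's `urllib.parse.urljoin`, which performs this kind of resolution against a base. The base URL and the helper name `resolve` are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only to show the resolution step.
BASE = "https://example.com/docs/index.html"

def resolve(reference, base=BASE):
    """Resolve a URI reference against a base URL."""
    return urljoin(base, reference)

print(resolve("foo.bar"))      # -> https://example.com/docs/foo.bar
print(resolve("foo.bar", ""))  # -> foo.bar (no base, so nothing to locate)
```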

marcellus23 · on Nov 10, 2021

I don't have to worry about that, because I'll pick a language that offers a `URL` object or something similar, and which handles the validation for me.

Additionally, if foo.bar were a valid URL, then I would expect it to appear on the list. I can't read the user's mind as to whether the text should be treated as a URL or not.
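For comparison, a sketch of the library approach using Python's `urllib.parse.urlsplit` as a stand-in for "a `URL` object" (an assumption; other languages' URL types behave differently). `urlsplit` is a lenient parser rather than a validator, so the rule "must have a scheme and a host" in the hypothetical `is_absolute_url` below is a choice made here, not something the library enforces.

```python
from urllib.parse import urlsplit

def is_absolute_url(text):
    """True if the text parses with both a scheme and a network location."""
    try:
        parts = urlsplit(text.strip())
    except ValueError:
        # Raised for some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    return bool(parts.scheme and parts.netloc)

print(is_absolute_url("https://foo.bar"))        # True
print(is_absolute_url("foo.bar"))                # False: parsed as a bare path
print(is_absolute_url("mailto:a@example.com"))   # False: scheme but no host
```

Under this rule "foo.bar" stays off the list, but so does "mailto:", so the policy question does not go away; it just moves into the condition on the last line of the function.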