
Some problems of URLs - fanf2
https://noncombatant.org/2017/11/07/problems-of-urls/
======
userbinator
_But you can get a glimpse of a better naming future by trying out Safari._

 _First, we can remove parts of the URL we don’t need or which exacerbate our
problems._

I'll be the first to comment and say this: No, no, _NO!!!_

Hiding information from users is _never_ a good thing! All it does is make
them even _less_ aware or likely to learn, and that only perpetuates the
vicious cycle of ignorance and vulnerability to being tricked.

Incidentally, I was recently forced to use Safari briefly, and absolutely
hated the opaqueness --- imagine many tabs all with the same only-hostname in
the address bar --- the constant feeling of "where am I, really?" made it very
unpleasant and disorienting.

There is plenty of good discussion on this topic here several years ago:
[https://news.ycombinator.com/item?id=7677898](https://news.ycombinator.com/item?id=7677898)

~~~
JoshMnem
As far as I know, the only browser that shows real URLs is Firefox, and you
have to visit about:config and set browser.urlbar.trimURLs to false in order
to see them. Hiding how URLs work is a terrible trend.

Users should be educated, not treated like they are stupid.

~~~
majewsky
Huh? I'm seeing full URLs with browser.urlbar.trimURLs=true here (FF 56).

~~~
JoshMnem
I'm not sure what is happening there. When my FF 56 (Ubuntu 16.04) has it
set to true, I see: example.com

When false, I see [http://example.com/](http://example.com/)

~~~
majewsky
Ah, so this only refers to the trimming of "http://"? Maybe I
haven't noticed because more and more stuff goes over HTTPS.

~~~
JoshMnem
That's another problem with URL trimming by browsers -- users sometimes see
the protocol and sometimes don't. It doesn't make much sense.

------
no_protocol
It was initially difficult to understand the motivation because the first half
of the post failed to establish "the problem" clearly. I had to read the
entire thing before I had some idea who this was written for and regarding.

The second half clarified most of this but I think the article could do with
some revision.

Under any URL-ish scheme, including any simplifications suggested here,
nontechnical users will be unlikely to do anything except clicking and
copy/pasting. So it really doesn't matter at all to them what the actual text
string's structure is. It seems like maybe the article is trying to suggest
"advanced" users would benefit from a better system.

Notice that in the screenshots posted, each browser varies somewhat in how it
displays the current location. Regardless of the syntax used in a URL-ish
string, the browsers can certainly improve the display of this important
information.

What about something like this:

[https://i.imgur.com/PTs55Tl.png](https://i.imgur.com/PTs55Tl.png)

Where you only show "nonstandard" portions if they are present, and they can
be given special coloring to help alert the user.

When the user interacts with that portion of the UI, they can paste in a URL
like normal, just when it is displayed, the parsed version can be shown
instead to help them see clearly what they are looking at.

~~~
djacobs
I agree, the substance was there, but the story could use some editing and
organization. If it were me, it'd look something like:

    
    
      1) How people think URLs work, the good parts that people see every day
      2) Abstract problems with URLs, followed by concrete examples, maybe links to prominent related bugs
      3) Suggested fixes for the problems, including what browsers are doing at a UX level
      4) Honest look at tradeoffs involved for end users
      5) Remaining complications and unanswered questions

------
JoshMnem
Not showing complete URLs and de-emphasizing URLs are not good ideas. Safari's
way of doing things is terrible. Chrome hides the HTTP and chops off trailing
slashes. Firefox puts the <title> text before the URLs in the completion
dropdown now and it's easier for malicious sites to trick users that way.

The URL is not [https://www.tesla.com](https://www.tesla.com) it's
[https://www.tesla.com/](https://www.tesla.com/).

Instead of treating users like they are stupid, educate them. Show them the
entire URL and teach them how the WWW works. People are going to encounter
URLs anyway (when pasted), so trying to hide their structure is just going to
keep making things worse.

~~~
slaymaker1907
What's wrong with the first one? It seems perfectly legitimate since you can use
them interchangeably.

~~~
JoshMnem
The first one is not the real URL that you're visiting. Go to Firefox and turn
off URL trimming or open the browser console and type
window.location.pathname. The server will add the trailing slash, but the
browser will pretend that it isn't there.

Or go to [http://example.com#123](http://example.com#123) or
[http://example.com?123](http://example.com?123) and see where you end up.

It makes things more confusing, by hiding what is really going on.

It leads to things like otherwise technically-savvy people leaving off
trailing slashes on longer pathnames, which can cause other problems. For
example, [http://example.com/some-path](http://example.com/some-path) can be
much different than [http://example.com/some-path/](http://example.com/some-path/)
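
One place where the trailing slash changes behavior is relative link resolution. A minimal sketch using Python's standard `urllib.parse.urljoin`; the relative reference `page.html` is only an illustrative assumption:

```python
from urllib.parse import urljoin

# Without a trailing slash, "some-path" is the last path segment and is
# replaced when a relative reference is resolved against it.
without_slash = urljoin("http://example.com/some-path", "page.html")

# With a trailing slash, "some-path/" acts like a directory and is kept.
with_slash = urljoin("http://example.com/some-path/", "page.html")

print(without_slash)  # http://example.com/page.html
print(with_slash)     # http://example.com/some-path/page.html
```

So the same relative link on a page can point to two different places depending on whether the slash was there.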

Some browsers were also hiding trailing slashes on longer pathnames, which
seems to have stopped (possibly due to breaking sites).

By hiding the real URLs, even programmers get confused, not just the end
users.

------
lgierth
Very interesting concerns about URLs! I've looked into better ways of
addressing recently too, but coming from different problem spaces:

1. how to decouple content from location, i.e. location-addressing vs.
content-addressing

2. how to accept more network connection constructions than what URLs permit
(proxying, domain fronting, tunnels, etc.)

> We could also imagine a new URL syntax, with fewer and less ambiguous
> syntactic meta-characters.

That's part of what we're attempting for network addressing with the multiaddr
format:
[https://github.com/multiformats/multiaddr](https://github.com/multiformats/multiaddr)
If the author of the article is reading here, I'd love to hear your
thoughts!

We're going with encapsulation and a posix filesystem compatible plan9-ish
syntax and approach there. multiaddr describes the semantics of networking
namespaces, and then various content systems bring their own namespaces. IPFS
brings the /ipfs and /ipns namespaces for its content.
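
To make the path-like syntax concrete, here is a toy sketch that splits an address such as `/ip4/127.0.0.1/tcp/4001` into protocol/value pairs. The `VALUE_COUNT` table and the `parse_multiaddr` helper are hypothetical and cover only a handful of protocols; this is not the multiaddr library's API.

```python
# Hypothetical table: how many value components follow each protocol name.
VALUE_COUNT = {"ip4": 1, "ip6": 1, "tcp": 1, "udp": 1, "ipfs": 1, "ipns": 1}

def parse_multiaddr(addr):
    """Split a slash-delimited address into (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("address must start with '/'")
    tokens = addr[1:].split("/")
    pairs = []
    i = 0
    while i < len(tokens):
        proto = tokens[i]
        if proto not in VALUE_COUNT:
            raise ValueError(f"unknown protocol: {proto!r}")
        count = VALUE_COUNT[proto]
        values = tokens[i + 1 : i + 1 + count]
        if len(values) != count:
            raise ValueError(f"missing value for {proto!r}")
        pairs.append((proto, values[0] if values else None))
        i += 1 + count
    return pairs

# Each layer wraps the one before it: TCP port 4001 on top of an IPv4 host.
print(parse_multiaddr("/ip4/127.0.0.1/tcp/4001"))
```

Reading left to right gives the encapsulation order, so there is only one nesting direction to learn.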

------
lsc36
Can't agree more that URL syntax is overcomplicated. Even existing URL parsing
libraries behave inconsistently and lead to security vulnerabilities [0].

[0] [https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-Ne...](https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf)

~~~
jklinger410
Without commentary these slides are entirely unhelpful. Someone would have to
agree with you already and have a head start on the purpose of this presentation
in order to understand what is going on here.

Care to elaborate?

~~~
arkadiyt
Blackhat/Defcon videos aren't up yet but Orange Tsai gave the same talk at
HITBGSEC and that video is up:

[https://www.youtube.com/watch?v=D1S-G8rJrEk](https://www.youtube.com/watch?v=D1S-G8rJrEk)

He also has a blog post about it:

[http://blog.orange.tw/2017/07/how-i-chained-4-vulnerabilitie...](http://blog.orange.tw/2017/07/how-i-chained-4-vulnerabilities-on.html)

The premise is that URL parsing is complex and libraries get it wrong. This
problem is pervasive and leads to server side request forgery vulnerabilities,
which Orange was able to escalate to remote code execution on Github.
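
A minimal sketch of the kind of disagreement being described: a hand-rolled host extractor (the hypothetical `naive_host` below) and Python's `urllib.parse.urlsplit` read different hosts out of the same string. The URL is made up for illustration and is not taken from the talk.

```python
from urllib.parse import urlsplit

def naive_host(url):
    # Hypothetical buggy parser: assumes the authority ends only at "/",
    # forgetting that "?" and "#" also terminate it.
    authority = url.split("://", 1)[1].split("/", 1)[0]
    return authority.rpartition("@")[2]

url = "http://good.example#@evil.example/"

print(naive_host(url))         # evil.example
print(urlsplit(url).hostname)  # good.example
```

If an allow-list check uses one parser and the code that actually makes the request uses the other, the check and the request can end up talking about different hosts.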

------
xelxebar
Oh wow. I didn't realize that some browsers accept straight hex or decimal
representations of IP addresses.
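
For context, an IPv4 address is just a 32-bit number, and the dotted form is one way of writing it. A small sketch with Python's standard `ipaddress` module:

```python
import ipaddress

# The same address written as a decimal integer, a hex integer, and dotted quad.
from_decimal = ipaddress.IPv4Address(2130706433)
from_hex = ipaddress.IPv4Address(0x7F000001)
as_integer = int(ipaddress.IPv4Address("127.0.0.1"))

print(from_decimal)  # 127.0.0.1
print(from_hex)      # 127.0.0.1
print(as_integer)    # 2130706433
```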

It _does_ seem a bit silly to have two orthogonal directory structures
smooshed together in a single format, especially with different nesting
directions. In a way, URIs are a sort of serialization of the application
layer packet headers.

I'm curious what a redesigned "address" could look like when optimized for
user-friendliness.

~~~
parenthephobia
They shouldn't exactly be optimized for friendliness. In a sense, that's the
problem we have now. They should be optimized for security, but right now
they're optimized for developers and network administrators.

It'd be interesting to see research on how many laypersons understand which
way around the "chain of authority" goes for domain names. I can easily
imagine somebody not Internet savvy thinking that "facebook.hackable.org" was
a legitimate Facebook domain. (And it doesn't help that _some_ organizations
spread their site over many domains that - even to a skilled user - are
indistinguishable from phishing domains.)
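
As a rough illustration of which way the chain goes, the sketch below lists the zones from the most authoritative end (the right) to the least. Both helpers are hypothetical, and `naive_registrable_domain` assumes a single-label suffix such as "org"; a real implementation would need the Public Suffix List to handle suffixes like "co.uk".

```python
def authority_chain(hostname):
    """Return the zones of a hostname from the top-level domain downward."""
    labels = hostname.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

def naive_registrable_domain(hostname):
    """Assumption: the owner is identified by the last two labels."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

print(authority_chain("facebook.hackable.org"))
# ['org', 'hackable.org', 'facebook.hackable.org']
print(naive_registrable_domain("facebook.hackable.org"))
# hackable.org
```

The leftmost label, the part a novice reads first, is the one the owner of hackable.org is free to choose.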

It is unclear how a domain like that could be optimized to ensure a novice
user understands that what they're looking at isn't Facebook, even though the
domain starts with "facebook" and the page they're looking at looks exactly
like Facebook's login page.

One general, slightly off-topic, notion: there should be a protocol whereby a
password manager can ask facebook.com what sites are legitimately going to ask
for your Facebook credentials - not entirely unlike SPF for passwords. Then
even if you search for Facebook in your password manager, it should refuse to
automatically provide the credentials to the site.

(More off-topic aside: Password managers should be _properly_ integrated into
all browsers. Knowing your passwords should be considered unusual, and
actually choosing them yourself downright stupid.)

------
slaymaker1907
Why not allow spaces in the URL? Much easier to read than %20 and file systems
have been ok with spaces for decades.

~~~
paulryanrogers
Probably because spaces have long been an important delimiter in queries and
shells.
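
A short sketch of both sides, using only the Python standard library; the path and the `curl` command line are made-up examples:

```python
import shlex
from urllib.parse import quote, unquote

# Percent-encoding turns the space into %20 and back again.
encoded = quote("/docs/my file.txt")
decoded = unquote(encoded)
print(encoded)  # /docs/my%20file.txt
print(decoded)  # /docs/my file.txt

# A shell-style tokenizer treats an unescaped space as an argument separator,
# so a URL containing a raw space is cut in two.
tokens = shlex.split("curl http://example.com/my file.txt")
print(tokens)  # ['curl', 'http://example.com/my', 'file.txt']
```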

------
feelin_googley
Perhaps there are issues I have not considered, but I always appreciated the
fact that original netcat accepts "serialized" IP addresses, as the author
calls them. For example, 127.1 as shorthand for 127.0.0.1.

~~~
askvictor
Would 127.3.1 become 127.0.3.1 or 127.3.0.1?

~~~
feelin_googley
Working examples:

23.185.0.4 (www.usenix.org) can be abbreviated as 23.185.4

192.0.79.32 (hackaday.com) cannot be abbreviated as 192.79.32
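
The rule behind these examples, as far as I understand the classic BSD `inet_aton` behavior, is that every part except the last is one byte and the last part fills all the remaining bytes. A pure-Python sketch of that rule follows; `expand_ipv4_shorthand` is a hypothetical helper, it handles decimal parts only, and it is not netcat's actual code.

```python
def expand_ipv4_shorthand(text):
    """Expand a 1- to 4-part decimal IPv4 string to full dotted-quad form."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 parts")
    *leading, last = parts
    # Every part before the last is exactly one byte.
    if any(not 0 <= p <= 255 for p in leading):
        raise ValueError("leading part out of range")
    # The last part fills however many bytes are left over.
    last_bytes = 4 - len(leading)
    if not 0 <= last < 256 ** last_bytes:
        raise ValueError("last part out of range")
    octets = leading + list(last.to_bytes(last_bytes, "big"))
    return ".".join(str(o) for o in octets)

print(expand_ipv4_shorthand("127.1"))      # 127.0.0.1
print(expand_ipv4_shorthand("127.3.1"))    # 127.3.0.1
print(expand_ipv4_shorthand("23.185.4"))   # 23.185.0.4
print(expand_ipv4_shorthand("192.79.32"))  # 192.79.0.32
```

Under this rule the zeros are always inserted just before the last part, which is why 23.185.0.4 can be shortened but 192.0.79.32 cannot.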

