But somehow you need to share the link itself, right?
So instead of sharing the link couldn't you just... Share whatever the message or content is?
Edit: what I'm saying is that "activists" (which is an interesting concern btw) will pressure the platform that the link is being shared on because the link IS the content.
Which also means that everybody with the link can change its content, which means you cannot trust this content any more than you trust the person who sent it to you.
In this context, you are implying the platform is the browser? Assuming you produce your own content and own the domain for which these URLs are being produced, then yes, you are left to trust the clients that actually render the content.
> In this context, you are implying the platform is the browser?
No, the platform is wherever you distribute the URLs to other people. Browsers, for the most part, don't share URLs with each other without an intermediary server hosting the shared links, either as one or more traditional webpages with links or as something like a shared bookmark list.
I agree it's interesting. What follows is not criticism of the author, but criticism of our society.
Prediction: at some point someone will want to censor something and blame the host. The host or hostname will be taken offline, even though it's not storing or distributing the content.
I consciously accepted the risks of my host being blacklisted or taken down when I shared this.
Luckily, the benefit of open source is that anyone can clone the repository and host any number of similar services if they are serious about using them for content distribution.
I don't mind if there are problems with the host since it will mean people are actually making use of the project.
Sharing a snippet of text is much easier than sharing a file.
Telling a normal person to click that link (or, for a more distributed version, to go to decoder.com and paste this text in there) is much easier than getting them to view the HTML.
But why not just share the raw text then? In a situation where you would need to distribute web pages without a host, presumably to share vital information, couldn't you just share the raw text via the same medium you use to send the link?
Because you can distribute web forms that post data to locations for aggregation of standardized data. Useful for organizing strikes and other civil disobedience.
Think about the tools a modern political campaign uses, it would be somewhat similar.
Good point, I don't know what I was thinking when I wrote that. Perhaps the data from the form could be turned into an email or SMS draft instead of an HTTP action.
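Something like this, roughly (a Python sketch just to illustrate the idea; the recipient address and form fields are made up): flatten the form fields into a mailto: URL so the "submission" becomes a pre-filled email draft and no server is involved at all.

    from urllib.parse import quote

    def form_to_mailto(to_addr, subject, fields):
        """Flatten form fields into a mailto: URL that opens a pre-filled email draft."""
        body = "\n".join(f"{name}: {value}" for name, value in fields.items())
        return f"mailto:{to_addr}?subject={quote(subject)}&body={quote(body)}"

    # Hypothetical example: an RSVP form with two fields.
    print(form_to_mailto("organizer@example.org", "RSVP", {"name": "Alice", "attending": "yes"}))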
>If you remove the need to host, you remove a point of failure from publishing
I guess it's marginally better since there's no central "place" to take it down. However, the deplatforming concern is still there because you're still probably sharing those links on the major platforms. So they can still take those down. You can argue that having it distributed among random sites is better than having it centralized, but you can achieve the same thing by uploading to multiple sites.
The proper way to do this (that doesn't involve shifting the problem), is to use IPFS/tor. When you're sharing it, you can use a gateway server, but if that gets taken down or blocked, users can still access it using the underlying network if they really wanted.
How is IPFS better? To share a piece of content, you need some place to share a new link, just like with this mechanism. In fact, IPFS is more failure-prone, because besides removing the link from where it's posted, attackers can also try to identify and DDOS the nodes hosting the content, thereby making the links useless. With this mechanism, anyone with the link could decode the data using some offline tool/script, even if the whole Internet goes down.
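To illustrate that last point, assuming the page is simply base-64 encoded into the URL fragment (roughly how this project seems to work; the exact fragment encoding is an assumption here), an offline decoder is only a few lines of Python:

    import base64
    from urllib.parse import urlparse, unquote

    def decode_url_page(url):
        """Recover the HTML stored in the fragment of a URL-Pages-style link (assumed plain base-64)."""
        fragment = unquote(urlparse(url).fragment)  # everything after '#'
        return base64.b64decode(fragment).decode("utf-8")

    # Hypothetical link whose fragment carries base-64-encoded HTML.
    link = "https://example.github.io/urlpages/#" + base64.b64encode(b"<h1>Hello</h1>").decode()
    print(decode_url_page(link))  # -> <h1>Hello</h1>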
This and IPFS/Tor are not mutually exclusive. The one sticking point is that this requires JavaScript to work, which is disabled by default in most tor installations.
> How are web pages shared? By distributing links.
Mostly, by distributing links on other web pages.
> How are pages censored? By deplatforming. This usually happens by activists targeting the host right in their cashflow.
This does not avoid that problem, since you still need (for effective reach) hosting for the links.
It also adds new problems, like the length limit most browsers impose on URLs, which rather sharply limits the size of content that can be shared this way.
Also, some browsers (like Chrome) consider data URLs hostile and limit them further. A while ago I was messing around with some tiny JS decompression code I was stuffing into a data URL, which let me fit a bit more content into the payload, but Chrome wouldn't run the scripts at all.
> The length limit is high in most browsers (100k chars)
Huh, I just saw a 2018 post on a Chrome support forum indicating that Chrome had the same 2083 char limit as IE...
Checking more deeply, I've found more direct Chromium info implying either no limit or a 2^31-character limit, with a 32 kB display limit. So, yeah, the URL length limit may not be an issue.
They can still get that business to take it down. Another good strategy is simply hosting with businesses with a strong commitment to free speech. DreamHost is an example of one with this policy that actually fought the government in court to protect its users. They've been around a long time, too.
One can also host in a different jurisdiction that doesn't have laws against, or take action against, your content. Preferably one that can stand up to demands from a major country. Switzerland is popular for this. Example host that I haven't vetted but highlights common differentiators:
I originally conceived this as a simple, static CodePen clone, but I felt the "publishing" of pages as URLs was an interesting idea. So I decided to present that aspect of it front and center, even though it wasn't really the point of the project at the beginning.
About a year ago, I had a proof of concept version that I ended up using fairly frequently for sharing quick HTML/CSS/JavaScript experiments (never as a means of seriously publishing and sharing censorship-proof content). I found that if its use is limited to that case, it is actually very handy and robust!
Lots of interesting feedback in the comments, largely more positive than I expected. I was expecting criticism of code quality since I have minimal web development experience, but instead the (valid) criticism is coming from people who are taking the project's applications and use cases seriously, which I appreciate.
It's a very interesting concept in the current climate. The idea of sharing content as atomic and self-sufficient units of data, much like you would share the raw idea itself, but with the ability to have layout, rich content and possibly even interactivity. A web of ideas rather than ideas on pages with servers. The disintermediation of the web.
That's why I let that part of the project be the focus!
I also hope the interest it has generated will inspire others with more knowledge and experience to make better versions or variations that are more practical.
A deeper implementation would be a browser designed for this.
Some hard problems. E.g. URLs provide a consistent place to go to find varying content. If the URL is the content, then it can't vary. How does one solve content discovery without resorting to hosting?
PS: The first person who says blockchain buys the beers.
Blockchain would be silly, there's no need for a globally-consistent view just to have search.
Distributed search has been around for some time, e.g. eMule uses Kad[1], Tribler[2] has a solution based on Tor, but the most appropriate here is probably YaCy[3]. Of course, all of those are technically "hosted", insofar as p2p involves peers acting as servers. But I bet someone has made a f2f[4] version, meaning you'd only serve to a few, hopefully trusted peers.
I have an example that I haven't yet uploaded of an entire (small) site in one URL. If the links are all to be URL Pages, the site has to have a strict tree structure, otherwise there is a circular dependence/chicken-egg problem.
I could see this issue becoming problematic with a browser implemented based on this idea, unless I am misunderstanding something.
How is this bad? If the domain is only used to host the decoder, what bad stuff can a bad guy do by executing JS? I get how this would be a vulnerability on Facebook, but why is it bad here?
It's very bad, because cookies, permissions, and lots of other persisted data will be shared by all scripts on the domain.
Example:
Alice makes a benign page that uses location data and shares the link with her friends. The friends know what's up and grant location permissions to get the page to work.
What they actually did, however was grant location permissions to "https://jstrieb.github.io/urlpages" and any script served by that origin.
So when some of the friends later open Eve's URL that contains a location harvester, they don't get any prompt at all: Eve's link can just reuse the location permission given to Alice because, as far as the browser is concerned, both scripts belong to the same origin.
Have someone who controls the domain and is logged in to their website admin area in another tab click a link you've crafted to steal and send you their delicious cookies.
Or, any time this sort of thing comes up in a thread on a news website like this one, an opportunistic attacker could post a malicious payload in a link and watch as all the excited people blindly click away.
Just off the top of my head: a bad guy could distribute a "bugged" version of the URL that sets cookies and phones home.
It's very hard for a normal person to tell a good link from a bad one, and this removes the way most people decide whether a link should be trusted (by looking at the domain).
That's not true, service workers still obey the cross-origin policy.
The only thing I can think of is that a service worker has its own CSP, as opposed to obeying the CSP of the registering script, but this service doesn't use CSP anyway.
That said, I couldn't get it to work. You would need to be able to register a service worker file at something like https://jstrieb.github.io/urlpages/sw.js, but all the pages you have control over are served with an HTML MIME type and are rejected when you try to register them (and have too much actual HTML junk in them to run as JS files anyway).
Yeah, you'd probably want to sign the content in the URL and verify the signature before displaying. Sounds doable, if someone would actually take that idea seriously...
Added to the TODO section in the repo. I agree that it seems pretty doable to implement digital signatures, and similarly to support encrypting and password-protecting URL Pages.
Since this has received more attention than I expected, I'd be willing to put the time in to implement this.
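As a starting point, here's a minimal sketch of one possible scheme, not a commitment to this exact design: it uses Ed25519 via PyNaCl for illustration (in the browser this would presumably map to the Web Crypto API instead), and the fragment format and key handling are just placeholders.

    # Sketch only: sign page content before embedding it in a URL, verify before rendering.
    import base64
    from nacl.signing import SigningKey, VerifyKey
    from nacl.exceptions import BadSignatureError

    signing_key = SigningKey.generate()
    public_key_b64 = base64.b64encode(signing_key.verify_key.encode()).decode()

    def publish(html):
        """Return a fragment of the form base64(signature) + '.' + base64(content)."""
        sig = signing_key.sign(html.encode()).signature
        return base64.b64encode(sig).decode() + "." + base64.b64encode(html.encode()).decode()

    def render(fragment, public_key_b64):
        """Verify the signature before handing the HTML to the renderer."""
        sig_b64, content_b64 = fragment.split(".", 1)
        content = base64.b64decode(content_b64)
        try:
            VerifyKey(base64.b64decode(public_key_b64)).verify(content, base64.b64decode(sig_b64))
        except BadSignatureError:
            raise ValueError("signature check failed; refusing to render")
        return content.decode()

    print(render(publish("<h1>Signed page</h1>"), public_key_b64))

The open question is key distribution: the verifying public key has to reach readers through some channel that the URL itself can't tamper with.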
Maybe I don't get something here, but as I understand it, the URL contains all the data, so you are effectively providing the information wherever the link is posted. Thus, if the posted link is gone, the data is gone; the generator webpage doesn't really have anything to do with it.
But the way to "share" information this way is to just send it to someone else. You could instead just send a text file or an image; it's the same (except you could actually encrypt it). After all, this is exactly as censorship-resistant as direct communication today.
The, "there is no server" argument doesn't hold if you want someone else to ever get your message. It's just either your chat app or some website where the "link" is posted. I don't see any value added.
One idea I've thought about in this same area is 'compressing' the URLs via Unicode. Suppose you limit the page contents to just [a-zA-Z0-9] and about 15 special characters, and space/newline. Documents would be rendered without any formatting aside from what you can do with a monospace font (all caps, white space, tables, etc.).
This allows about 80 unique characters - about 6.5 bits per character. There are tens of thousands of visible Unicode characters, each worth at least 8 bits. Using modern compression techniques you can probably get around 4 or maybe even 5 characters of the real text into one visible Unicode character. This would compress the messages so that you can share them over SMS/Twitter (split into a few parts perhaps) or Facebook without posting giant pages of text.
The data could start with a 'normal' not-rare-Unicode character indicating the language, and the same character at the end of the stream to make it easy to copy paste:
a<unintelligible Unicode garbage>a
Multiple translation tables could exist for different languages. Japanese and Indic languages have ~50 alphabet characters but no capitals, so I think you would be able to get similar compression regardless of input language.
Users would go to a website that does decompression with JS and just paste in the compressed text to view the plaintext. And vice versa. If the compression format is open then anyone can make and host a [de]compressor.
Version codes may be useful to add, so that more efficient compression techniques don't break backwards compatibility. The first release of the English compression algorithm would start and end with 'aa'; a better algorithm released a few years later would use 'ab', etc. Similarly, the version code could be used to indicate a more restricted set of characters to allow for better compression - [a-z0-9] plus . , space, and newline is 40 characters, which is just over 5 bits per character before any further compression.
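As a very rough sketch of the pipeline (ignoring the per-language tables above): deflate the text, then spread the compressed bytes over a wide block of visible Unicode characters, one byte per character here; real schemes like base65536 pack two bytes per character with larger, carefully chosen alphabets.

    import zlib

    BASE = 0x4E00  # start of the CJK Unified Ideographs block (all assigned, all visible)

    def squeeze(text):
        """Compress, then map each compressed byte onto one visible CJK character."""
        return "".join(chr(BASE + b) for b in zlib.compress(text.encode("utf-8"), 9))

    def unsqueeze(blob):
        return zlib.decompress(bytes(ord(c) - BASE for c in blob)).decode("utf-8")

    msg = "MEET AT THE USUAL PLACE AT NOON. BRING THE PRINTED COPIES. " * 4
    packed = squeeze(msg)
    assert unsqueeze(packed) == msg
    print(len(msg), "characters in,", len(packed), "visible characters out")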
One of my toy projects: https://github.com/thanatos/baseunicode ; there's probably a lot that could be done better, and the alphabet it uses needs to be greatly expanded.
There are a few other implementations of the same idea out there, too. Here's another one: https://github.com/qntm/base65536 ; since it uses a power-of-two-sized alphabet, it avoids some of the work mine has to do around partial output bytes. That link also links to some other ideas and implementations.
The exact metric I cared about at the time was compressing for screen-space, not for actual byte-for-byte space. I wrote it after spending some time in a situation where doing a proper scp was just a PITA, unfortunately, but copy/pasting into a terminal is almost always a thing that works. But I was also dealing with screen/tmux, which makes scrolling difficult, so I wanted the output to fit in a screen. From that, you can implement a very poor man's scp with tar -cz FILES | base64 ; the base-unicode bit replaces the base64.
Mine doesn't use Huffman coding as it wants to stream data / it leans on gzip for doing that.
You don't even have to limit yourself to visible characters, unless whatever communication medium you use forces you to do so. Let's hope HN doesn't filter the 96 zero-width characters between these quotes: ' '. EDIT: Unfortunately it appears to have turned them into spaces.
Python code to reproduce:
zw = '\u200b\u200c'  # bit 0 -> zero-width space, bit 1 -> zero-width non-joiner
enc = lambda text: ''.join(zw[int(bit)] for byte in text.encode('utf-8') for bit in f'{byte:08b}')
dec = lambda code: b''.join(bytes([int(''.join(str(zw.index(bit)) for bit in code[i:i+8]), 2)]) for i in range(0, len(code), 8)).decode('utf-8')
>>> dec(' ')
'Hello world!'
Thanks for the input! Finding a more concise alternative to base-64 encoding is currently an item in TODO. I've also been thinking about other forms of compression.
Regarding your last point, my plan is to have the main page (index.html on the site) always decode from pure base-64, but to eventually have other "publish" targets or endpoints (e.g. /b65536/index.html#asdf) that decode in different ways. I can then add corresponding "publish" buttons to the editor.
Shameless self promotion, but if you're interested in this sort of thing and know a little about DNS, you can similarly host a website purely through DNS records; no hosting, no cost, etc. https://serv.from.zone
While I agree that this is technically correct, I feel it ignores the spirit of the project.
Since I released it open source, my expectation is that anyone who cares about safety from takedowns will clone and host one or more copies of their own. Or, for that matter, use the files offline with shared base-64 encoded URLs.
If your site goes down, all the links break. To use a clone, I'd need to manually copy part of a dead link and append it to the clone's URL. The links are really just data stores in need of a renderer. Why not use an HTML file instead?
Right, it took me a little while to realize that it still needs a host, since it talks about not needing a host and doesn't say "here is the bootstrap page that needs to be hosted."
Data URLs are truly entirely in the URL, with no hosting or dependencies.
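For reference, building one is a one-liner (hypothetical content; and note that some browsers restrict navigating to data: URLs, as mentioned elsewhere in the thread):

    import base64

    html = "<!doctype html><h1>No host needed</h1>"
    data_url = "data:text/html;base64," + base64.b64encode(html.encode("utf-8")).decode()
    print(data_url)  # nothing is fetched from any server when this is opened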
Similar to this [0] from a few years ago.
That one also had some basic encryption included.
Some previous discussions[1,2] of that tool highlight that this can easily be abused for XSS or distribution of illegal materials, which you may not want to find yourself the focus of.
I'm not sure I really understand the XSS risk that's discussed in the links you provided. Nobody cares about that origin because it has nothing of value hosted on it, and it's not like you'd be able to access cookies from a different origin.
The worst you could do is exploit a browser zero-day, but you can do that on any static hosting site already!
First, it's not hard to imagine that someone might try to get the author's account banned for a GitHub terms-of-service violation, keeping in mind that GitHub holds the account owner accountable for the content in their repository. This is true even if that content is from other account holders they've given access to their repository. In this case, anonymous access is intentionally being provided, which could of course go very, very, very wrong.
"You agree that you will not under any circumstances upload, post, host, or transmit any content that:
is unlawful or promotes unlawful activities;
is or contains sexually obscene content;
is libelous, defamatory, or fraudulent;
is discriminatory or abusive toward any individual or group;
gratuitously depicts or glorifies violence, including violent images;
contains or installs any active malware or exploits, or uses our platform for exploit delivery (such as part of a command and control system); or
infringes on any proprietary right of any party, including patent, trademark, trade secret, copyright, right of publicity, or other rights."
Understanding what the tool does, GitHub might be forgiving on the ToS violation front. The problem is with the second scenario: law enforcement. It's very likely that in a lot of jurisdictions, law enforcement, prosecutors, etc., wouldn't initially understand what's going on here and even if it can be explained to their satisfaction, I think very few of us would like to spend a night (or more) in jail while attempting to explain.
You are abusing trust: now it's going to be jstrieb.github.io that is serving malware, and since his system serves whatever JS I provide by design, it becomes a very effective XSS host.
It's not really jstrieb.github.io that's serving it: because the content is in the URL fragment, it is never sent to or from the server; it's handled entirely client-side.
>a very effective XSS host.
It can only do XSS against jstrieb.github.io, which has nothing valuable, so it's not useful for anything. It can't be used in a <script> tag to obfuscate XSS attacks against other websites either, because the response isn't formatted as JavaScript. I guess it could be used in <iframe>s on other websites to add obfuscation, but I think the use to attackers would be quite low.
You still need something to serve you the initial document.write js, unless you are going to convince people to open your links with locally saved "index.html". I called it "XSS" because you can execute arbitrary javascript, and I was trying to avoid bluntly calling it "malware".
Though I probably should have. Here is an example of a Hacker News login page served with jstrieb.github.io: https://tinyurl.com/yypvh3by. You can log in to news.ycombinator.com with it, but it easily could have been a phishing site.
My point is, this is a very good idea for offensive operations.
But someone could register the GitHub account newsycombinator and then serve an identical phishing page at newsycombinator.github.io.
I guess you're right that it's useful for takedown resistance in phishing attacks. It's useless for small, sophisticated, targeted phishing attacks, but for large blunt untargeted phishing attacks it could be useful to have a site that would be difficult to take down and censor.
This reminds me of https://itty.bitty.site/edit, which does the exact same thing but with a size limit, no in-browser editor, and, if I'm not mistaken, a compression algorithm.
So with both of these cases, the site itself (GitHub Pages or bitty.site) needs to be up in order to decode the content? It's not like the way images can be stored in a data URL, right?
I don't see any effective difference between putting the data in a URL and putting the data in a local HTML file and opening the file in a browser. (And if you're going to put the data in a URL, you should probably be using a data: URL.)
One difference is that when you visit/bookmark some normal webpage, you save on your computer only its title and address. With this, you are saving the page content as well, just by visiting the page.
With the data in a file you are also saving the page content (in a file). I don't see what difference it makes whether you're saving that content in a file or in a bookmark. (At the risk of stating the obvious, bookmarks are stored in a file.)
Discord messages aren't actually unlimited; they are limited to 2,000 characters, unless this restriction was recently removed. I have personally encountered this, mainly when using code blocks.
Can you tinyurl a url with more than a few thousand characters?
Even if this were possible (which, even if tinyurl supported it, would break most browsers) you would be effectively hosting your content on tinyurl.com. Again, this is no different from hosting the file on any of a zillion public data hosting services.
I see some potential for using this for content distribution via QR codes. Create a page, make a QR code sticker and you're now hosting a website on a wall.
A couple of notes: I tested doing this on a fork; QR codes have a ~3k upper limit (which is fine for some pages); and when using large amounts of data like this, they switch to a large-payload format, which seems to be blocked by the iPhone camera.
Either they're blocking large-payload QR codes, or they're blocking by URL length.
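If anyone else wants to try it, this is roughly all it takes (a sketch using the third-party qrcode package, pip install qrcode[pil]; the link is made up, and version-40 codes at low error correction top out around 2,953 bytes, which matches the ~3k limit above):

    import qrcode

    url = "https://example.github.io/urlpages/#" + "A" * 2500  # hypothetical URL-Pages link
    img = qrcode.make(url, error_correction=qrcode.constants.ERROR_CORRECT_L)
    img.save("page-on-a-wall.png")  # print it, stick it on a wall, and the wall is the host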
Name one modern browser (let alone “most”) that does that. I’ll wait.
PS: IE6-9 don’t count as modern browsers, and haven’t for a long time. To my knowledge, they were the only browsers to limit to exactly 2083 characters.
Saying that this "cannot be taken down" is quite stupid. It's all fine and dandy if you live in a place without strong censorship, but in a situation where information that cannot be taken down would be really necessary, how long do you think it will take for the interested parties to find out that the content is in the message?
The content can be hidden in a short url, so it is potentially a new thing that censors would have to deal with, unless they want to block all short URL services.
I wondered about this use case in the context of data URIs.
In theory, an alternative way to serve page content (instead via HTTP 200) should be to return a 3xx redirect and set the location header to a data URI.
This would indeed allow you to use URL shorteners or any kind of open redirect as a de-facto web host.
("In theory", in the sense that HTTP allow redirects to point to any kind of URL, so there is nothing saying it couldn't be a data uri)
I imagine this kind of situation is something that browser developers explicitly want to avoid, because the security implications would be a nightmare. So practically browsers do not follow redirects to data uris.
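A minimal way to check that locally (just a sketch, not a recommendation): a tiny Python server that answers every request with a 302 whose Location header is a data: URI; modern browsers should refuse to follow it.

    import base64
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = b"<!doctype html><h1>Served entirely from a Location header</h1>"

    class RedirectToData(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header("Location",
                             "data:text/html;base64," + base64.b64encode(PAGE).decode())
            self.end_headers()

    HTTPServer(("127.0.0.1", 8000), RedirectToData).serve_forever()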
This principle of storing content in a URL is also used at Falstad CircuitJS, where you can draw an electronic circuit diagram and export a link. Links get quite big for complex circuits, but it's very easy to store them. My project notes are full of links to the simulated circuits, example: http://www.falstad.com/circuit/circuitjs.html?cct=$+1+0.0000...
so instead of addressing the content you are sending the content directly. brilliant. who could come up with such an amazing idea...
seriously, though: this has some marginal advantages compared to just sending the sites, such as being clickable in almost every program and being allowed in size-restricted posts that still permit arbitrarily long links
This is really interesting. It strikes me as a modern encrypted communications method, except as designed currently everyone already has the key. I wonder how much could be shared this way, and whether it could be sufficiently secured to become a “private” means of distribution.
It seems like the main content bottleneck is the fact that they store it all in straight-up base64.
Trying to, say, share binaries across the net would become very unwieldy very quickly.
Now, if they were to implement some compression before URL encoding, it might then be feasible to break up large files into a series of links or something...
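Something along these lines, perhaps (a rough sketch; the prefix and the "NofM." chunk framing are made up): deflate the payload, base64url-encode it, and split it into pieces that stay under conservative URL-length limits.

    import base64, zlib

    def to_links(data, prefix="https://example.github.io/urlpages/#", size=2000):
        """Compress, base64url-encode, and split into URL-sized chunks (all parameters arbitrary)."""
        encoded = base64.urlsafe_b64encode(zlib.compress(data, 9)).decode()
        chunks = [encoded[i:i + size] for i in range(0, len(encoded), size)]
        return [f"{prefix}{n}of{len(chunks)}.{chunk}" for n, chunk in enumerate(chunks, 1)]

    links = to_links(open(__file__, "rb").read())  # e.g. chunk this very script
    print(len(links), "link(s) needed")

The receiving side would then need to reassemble the chunks in order before decompressing, which is why the part numbers are embedded in each fragment.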
Interesting! It's as if we'd had the same intent. Great minds think alike ;) I created writer.gorilla.moe and editor.gorilla.moe a while ago, with exactly the same intent :)
Perhaps I'm misreading / misunderstanding some of the comments, but long story short, this is effectively a .url file format (so to speak). Yes, it looks like a URL, but it's not uniform'ing, it's not resource'ing, and it's not locator'ing - in the traditional sense of those three letters.
It doesn't have to be a URL shortener, but I believe that you need some external link to get self-reference here. As a rough, handwavy analogy: linked lists use a pointer to the tail, not the value of the tail (here the value being the site-stored-in-a-URL, and the pointer being the shortened URL).
Sorry, I don't get this. The basic premise of REST is to access information stored elsewhere on the network. You are asking people to store information in their bookmarks.
How are web pages shared? By distributing links.
How are pages censored? By deplatforming. This usually happens by activists targeting the host right in their cashflow.
If you remove the need to host, you remove a point of failure from publishing. Now the link is the content.
It's an interesting concept whether or not you agree with this implementation.