How do dat:// sites interact with servers? (hashbase.io)
201 points by pfraze 9 months ago | 54 comments

"Cross-origin resource sharing (CORS) is a policy that prevents a webpage from connecting to a server, unless that server has given that webpage permission to connect."

CORS only applies to requests for restricted resources like fonts or XHR requests that aren't simple GETs and POSTs.

"So, while this is typically not possible: foo.com/index.html ---GET---> bar.com/pic.jpg"

Typically this is possible. Images, stylesheets, scripts, iframes, and videos aren't subject to CORS.

"You can solve it by routing the request through your host server: foo.com/index.html ---GET---> foo.com ---GET---> bar.com/pic.jpg"

Not necessary, the client's browser can get bar.com/pic.jpg just fine all by itself.

"Pinning tools like Hashbase and Homebase help keep dat:// sites online"

If you publish a dat archive, how do you notify Hashbase to pin it? Can you do it through dat?

To keep my dat alive with Hashbase, do I have to set up an account, provide and confirm an email address, link a credit card, etc.?

Are centralized servers, financial institutions and surveillance all required components of the anonymous, decentralized, peer-to-peer web?

> Typically this is possible. Images aren't subject to CORS.

You're right, that was a misleading example. I changed it to data.json to be more clear.

> If you publish a dat archive, how do you notify Hashbase to pin it? Can you do it through dat? Does it require setting up an account on Hashbase, providing an email address, linking a credit card, etc.?

The "Pinning" system is very similar to Git remotes. You can pin using any endpoint that complies with https://www.datprotocol.com/deps/0003-http-pinning-service-a.... So, similar to a git remote, you do need some kind of authentication with the pinning service - unless somebody writes one that's open for anybody to push to.

The UX flow will be similar to a git remote as well: you use an HTTPS request to tell the server to sync the dat. So, it's an explicit user action.

We've got two nodejs implementations of the pinning service API you can self-deploy, https://github.com/beakerbrowser/hashbase and https://github.com/beakerbrowser/homebase, and then we run a Hashbase instance at hashbase.io

>> Typically this is possible. Images aren't subject to CORS.

> You're right, that was a misleading example. I changed it to data.json to be more clear.

CORS doesn't preflight GET, POST or HEAD methods unless they have custom headers or content-type other than application/x-www-form-urlencoded, multipart/form-data or text/plain.

So a simple GET bar.com/data.json works just fine in today's browsers.
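
To make the rule above concrete, here is a rough sketch of the "does this request need a preflight?" check. It is a simplification, not the Fetch algorithm: the function name `needsPreflight` and its input shape are made up for illustration, and it only looks at the method, the header names, and the request Content-Type. Note that it says nothing about whether the page may read the response; that is a separate check.

```javascript
// Simplified sketch of the "no preflight needed" conditions.
// Assumption: only method, header names and request Content-Type are checked.
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SAFELISTED_HEADERS = new Set([
  'accept',
  'accept-language',
  'content-language',
  'content-type',
]);
const SIMPLE_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
]);

// Hypothetical helper: returns true when the browser would send an OPTIONS preflight first.
function needsPreflight({ method = 'GET', headers = {} }) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) return true; // custom header
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8".
      const essence = String(value).split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(essence)) return true;
    }
  }
  return false;
}

console.log(needsPreflight({ method: 'GET' })); // false
console.log(needsPreflight({ method: 'POST', headers: { 'Content-Type': 'application/json' } })); // true
console.log(needsPreflight({ method: 'PUT' })); // true
```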


In the absence of CORS header configuration on the target site, you can't use XHR or Fetch to get anything from a different domain. It doesn't matter if it's JSON, an image, or plain text.

To some extent, you're conflating some of the requirements to avoid preflighting with cross-origin requests simply being allowed, and they're not the same.

I'd be happy to be corrected on this, but here's what I understand:

While fetch doesn't preflight for GET, it does require an Access-Control-Allow-Origin header. You can specify `no-cors` in the mode to circumvent this, but then you can't access the response body (https://developer.mozilla.org/en-US/docs/Web/API/Request/mod...).
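
A toy model of the distinction described above, for a GET that needs no preflight. This is a sketch under simplifying assumptions, not the real Fetch algorithm: the function `fetchOutcome`, its parameter names, and the three outcome labels are invented here, credentials are ignored, and only the `cors` and `no-cors` modes are modelled. The origins are the foo.com / bar.com placeholders from the article.

```javascript
// Toy model: what can a page script do with the response to a simple cross-origin GET?
// Hypothetical helper; `allowOrigin` stands for the Access-Control-Allow-Origin response header.
function fetchOutcome({ pageOrigin, targetOrigin, mode = 'cors', allowOrigin = null }) {
  if (pageOrigin === targetOrigin) return 'readable'; // same-origin: CORS is not involved
  if (mode === 'no-cors') return 'opaque';            // request is sent, body is hidden from script
  if (allowOrigin === '*' || allowOrigin === pageOrigin) return 'readable';
  return 'blocked';                                   // the fetch promise rejects
}

const page = 'https://foo.com';
const target = 'https://bar.com';

console.log(fetchOutcome({ pageOrigin: page, targetOrigin: target }));                    // blocked
console.log(fetchOutcome({ pageOrigin: page, targetOrigin: target, allowOrigin: '*' }));  // readable
console.log(fetchOutcome({ pageOrigin: page, targetOrigin: target, mode: 'no-cors' }));   // opaque
```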

Here's an example using your own site.

Open the console in an empty Chrome window and paste this:

    .then(response => response.text())
    .then(str => console.log(str))

You'll get your RSS feed, no CORS tricks required.

My site has the Access-Control-Allow-Origin:* header set. Try the same thing on the Beaker site in the devtools for https://example.com.

    .then(response => response.text())
    .then(str => console.log(str))

It should fail.

It is failing for me because of the content security policy, which is a different thing.

That's the case on HN. Example.com doesn't have a CSP set up, so it's CORS that causes the issue.

or content-type other than application/x-www-form-urlencoded, multipart/form-data or text/plain.

application/json is a content-type other than application/x-www-form-urlencoded, multipart/form-data or text/plain, and thus a request for it will fail unless the required CORS headers are present

There's a couple issues with your comment: first, the restriction you're talking about is related to preflighting, not to whether requests are allowed at all.

Additionally, you're thinking about the "wrong" Content-type header: the limitation you're mentioning about urlencoded and so on is a limitation on request headers, not response headers.

The CORS headers are required for the GP's described request to succeed, but not for the reasons you give.

> Are centralized servers, financial institutions and surveillance all required components of the anonymous, decentralized, peer-to-peer web?

Servers and financial institutions will never go away, no matter how hard we try. I'm hoping that we can make surveillance go away by sharing much less data with organisations who rely on surveillance to survive.

Hashbase is not "centralized" in the sense that you are always free to choose a different provider of its hosting services. You can host your own Hashbase: https://github.com/beakerbrowser/hashbase

You can even choose multiple providers at once, providing you with resilience in case one of your chosen providers violates your trust, eg. by losing your data or using it to spy on you.

> CORS only applies to requests for restricted resources like fonts or XHR requests that aren't simple GETs and POSTs.

CORS applies to XHR. Including GETs and POSTs, and including fetching images over XHR.

There's some other sibling comments here discussing preflight requests, which it sounds like you might be referring to, but CORS is not limited to just preflight requests.

CORS applies to XHR but fetch can make opaque requests by setting {mode: 'no-cors'} as the second parameter.

> CORS only applies to requests for restricted resources like fonts or XHR requests that aren't simple GETs and POSTs.

This is completely incorrect. You cannot make any XHR GET or POST requests cross-origin without CORS.

Fonts also do not require CORS, you can link to them in your CSS/styles/etc without technical restrictions (might be legal restrictions of course).

Sure you can, you can make any request you want without CORS, the content is just opaque to you (for example you can display an image but not manipulate its pixels).

You do this for example with `fetch('./any-resource', {mode: 'no-cors'})`.

You can then for example do a `.then(x => x.blob())` and then use the resulting blob as an image.

> If you publish a dat archive, how do you notify Hashbase to pin it?

This is provided by hashbase in their ui. Sign up for an account and add your dat. It is email based, no credit card/address unless you go over the data cap.

If you are uncomfortable using the service you can run your own node, it is open source: https://github.com/beakerbrowser/hashbase

> Images, stylesheets, scripts, iframes, and videos aren't subject to CORS.

It depends on what you want to do with the image: https://developer.mozilla.org/en-US/docs/Web/HTML/CORS_enabl...


This is a little backwards, the goal of CORS isn't just to protect the _user_ it is also to protect the _third party website_.

All it takes for a website to opt-in to this is just adding a single header - it's possible for bar.com to allow the request from foo.com by opting into it.

>For each new origin that the site contacts, a permission prompt will be presented

I don't think this is an adequate approach to security. When the browser presents me with a prompt to load data from a third party site, I don't know what data is being loaded, what it's being used for, or whether this prompt is expected (as part of the regular functioning of the application) or unexpected (indicating that the application has been compromised, and I should navigate away from it).

In general, I've noticed that users react in one of two ways to these sorts of prompts. Naive users will blanket allow -- allowing all sites to access all of the capabilities of their browsers, regardless of the reason or necessity of that access. More sophisticated users will blanket deny. If it's not immediately apparent why a site needs the permission that it requests, that request will get denied, even if it's a valid requirement. Very very few users will think about why a site is requesting the permissions that it is requesting and consider those requests on a case by case basis.

I like the ideas of dat and IPFS, but I can't quite understand the difference.

What I can understand is that they use new protocols, and that is an issue on the Web today. I think the only way they can succeed would be through some laws or mistakes by big corps that drive customers away from them.

What I also liked was remoteStorage [0]. It is a bit like localStorage, but the data is managed independently from the application itself.

[0] https://remotestorage.io/

There's a lot of similarities, as they are both peer-to-peer and decentralized.

I've mostly done Dat. I want to do a bit more IPFS.

Dat feels a bit more like git for files - you can create a local file archive using the command line tools, and it's a separate step to sync it to the peer-to-peer network. There's a global discovery service for advertising archive keys, but it doesn't work at the level of single files. It's very lightweight.

IPFS supports many of the same operations, but you're mostly interacting with a local gateway server which is continuously connected to the network. I believe IPFS tries to content hash every single file so they are de-duplicated globally.
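
To illustrate what "content hash every file so they are de-duplicated" means in general, here is a minimal Node.js sketch of a content-addressed store. It is an assumption-laden toy: the `ContentStore` class is made up, it uses a plain SHA-256 hex digest as the key, and it does not reproduce IPFS's actual chunking or identifier format.

```javascript
const { createHash } = require('node:crypto');

// Toy content-addressed store: the key of a blob is the hash of its bytes,
// so storing identical content twice yields the same key and only one copy.
class ContentStore {
  constructor() {
    this.blobs = new Map();
  }

  put(data) {
    const key = createHash('sha256').update(data).digest('hex');
    if (!this.blobs.has(key)) this.blobs.set(key, Buffer.from(data));
    return key;
  }

  get(key) {
    return this.blobs.get(key);
  }
}

const store = new ContentStore();
const a = store.put('hello world');
const b = store.put('hello world'); // same content, same key
const c = store.put('hello dat');   // different content, different key
console.log(a === b, a === c, store.blobs.size); // true false 2
```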

Does either of them have built-in realtime features?

Yes, Dat's core data structure is an append-only log that is designed for extremely fast real-time replication. http://github.com/mafintosh/hypercore

If you use Dat through its command-line app, you get real-time updates to the files shared using it.
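
For readers wondering why an append-only log suits replication, here is a rough Node.js sketch. It is not hypercore's actual data structure or API: the `AppendOnlyLog` class, the `chainHash` helper, and the simple hash chain are assumptions made for illustration. The point is only that a follower needs nothing more than "the entries after the ones I already have", and can verify them as they arrive.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: hash of an entry, chained to the previous entry's hash.
function chainHash(prevHash, value) {
  return createHash('sha256').update(prevHash).update(JSON.stringify(value)).digest('hex');
}

class AppendOnlyLog {
  constructor() {
    this.entries = [];
  }

  lastHash() {
    return this.entries.length ? this.entries[this.entries.length - 1].hash : '';
  }

  // Writer side: add a value to the end of the log.
  append(value) {
    this.entries.push({ value, hash: chainHash(this.lastHash(), value) });
    return this.entries.length;
  }

  // Everything a follower that already holds `have` entries is missing.
  since(have) {
    return this.entries.slice(have);
  }

  // Follower side: verify each incoming entry against the chain, then store it.
  receive(incoming) {
    for (const { value, hash } of incoming) {
      if (chainHash(this.lastHash(), value) !== hash) throw new Error('hash mismatch');
      this.entries.push({ value, hash });
    }
  }
}

const writer = new AppendOnlyLog();
const follower = new AppendOnlyLog();
writer.append({ path: '/index.html', rev: 1 });
writer.append({ path: '/index.html', rev: 2 });
follower.receive(writer.since(follower.entries.length)); // initial sync
writer.append({ path: '/about.html', rev: 1 });
follower.receive(writer.since(follower.entries.length)); // incremental update
console.log(follower.entries.length, follower.lastHash() === writer.lastHash()); // 3 true
```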

Hashes and append-only logs are not very good for realtime data because of the extra overhead that has to be calculated, but CRDTs are.

CRDTs naturally fit with P2P/decentralized topologies, and we've generalized them in https://github.com/amark/gun which is the most popular (8K+ stars) open source (MIT/Zlib/Apache2) realtime decentralized database.

It is running in production on P2P alternatives to Reddit and other apps, that have pushed over half a terabyte in a day.

It's not incorrect to say that hashes and signatures add some overhead, but the question is whether the overhead is significant enough to matter for the usecase. Probably not.

Dat is realtime. You are notified about updates as soon as they are distributed. In Beaker, the files-archive API has a `watch()` method to do this. If you're accessing a dat files-archive, you're participating in the syncing swarm and so you'll receive those updates automatically.

You'll want to use a UDP socket if you're streaming a high volume of data with low latency requirements, for instance for a multiplayer FPS. But Dat has been used to stream live video, so it's probably real-time enough for most Web use-cases.

Small aside: Comparing Dat to CRDTs is apples to oranges. It's like comparing Javascript to B-Trees; they're not quite the same kind of technology. In fact, the new version of Dat uses CRDTs in order to allow multiple users to write to a files-archive.

When we met just a month or so ago you didn't tell me you were adding CRDTs!!! This is very exciting news. Dominic had mentioned a specific type of CRDT he had added (but wasn't generalized).

Append-only logs have overhead that would make them a poor choice for most realtime applications, like GPS tracking, yes FPS games, Google Docs, website creators, and many more. Basically any use case where data mutates.

In 2010 I built and used my own custom event-sourcing system, and I was the hugest proponent of this approach and of append-only logs. It was so futuristic. But 4 years in I hit all sorts of scaling problems and had to redesign everything from scratch, and that is when I found CRDTs. On all accounts they are superior, in a mathematical or logical manner, because they are a superset of DAGs, immutable/append-only logs, and many other popular data structures. Not apples and oranges.
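
For anyone following this subthread without a CRDT background, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is a textbook example, not code from gun or Dat, and the function names are made up. Each node only increments its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order or number of merges.

```javascript
// Grow-only counter (G-Counter): state is a map of nodeId -> count.
function increment(counter, nodeId, by = 1) {
  return { ...counter, [nodeId]: (counter[nodeId] || 0) + by };
}

// Merge is commutative, associative and idempotent: take the max per node.
function merge(a, b) {
  const out = { ...a };
  for (const [id, n] of Object.entries(b)) {
    out[id] = Math.max(out[id] || 0, n);
  }
  return out;
}

function value(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

let alice = increment({}, 'alice');
alice = increment(alice, 'alice');
const bob = increment({}, 'bob');

console.log(value(merge(alice, bob)), value(merge(bob, alice))); // 3 3
console.log(value(merge(merge(alice, bob), bob)));               // 3
```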

I built a little multiuser wiki side project that’s actually using two levels of CRDTs... hyperdb (underneath hyperdrive) and then automerge on top of that. Sort of hard to explain the full design in a short entry, but you can play with it here:


The UX definitely needs some improvement, but it’s just a personal project so far.

I think "laws or mistakes by big corps" are exactly what we are seeing happen #Facebook #GDPR

Does the dat:// scheme have any history or did the developers of this Beaker browser invent it? Just curious since I haven't seen the protocol before.

It's been around for a year or so. I was wondering when it'd make it to HN. It's very nicely done but unfortunately the only implementations are in javascript.

Rust implementation is in the works with grant funding, https://github.com/datrs

I'm not sure Rust is an especially great language for interop purposes or that the productivity will be high enough to keep up with the JavaScript implementation.

If you want to make a library with a C compatible API I'd be tempted to explore SubstrateVM. It can export C symbols to the generated standalone .so / .dll files, maybe you can even reuse some of the JS code, or failing that, a Java/Kotlin implementation would be compiled down to native code or be usable from other scripting languages like Ruby.

It's been around for more than 4 years.

I doubt it. datproject.org was registered two years ago, and their whitepaper was published last year https://github.com/datproject/docs/blob/master/papers/dat-pa...

Where are you getting 4 years from?

It's been almost 5 years at this point somehow!

Check out the first commit: https://github.com/datproject/dat/tree/464679267049899eafa34...

Bit more on the funding history, etc. : https://blog.datproject.org/2017/09/15/dat-funding-history/

I know the people involved and they were definitely working on it (with funding) a long time ago. I remember talking with them about it around the time of the io.js fork which was in 2015. The project was up and running at that point.

Really neat to see Beaker browser support native P2P with progressive enhancements like github.com/beakerbrowser/hashbase providing the cute URLs we all like.

I'm sure hashbase.io will be blocked really quick, so it's important that the core P2P address system stay in the forefront. Transports also need to find many ways to communicate over https, shadowsocks, tor, DNS, and others.

Doesn't answer your original question but I moved to syncthing from dat. It just works out of the box, no manual setup, and a very active community around it. Also opensource and in production for years with rave reviews, and being used in plenty of big scale production projects.

I never understood how Beaker browser can act as a server listening on a port. It sounds like you always need relays on the internet because your router and ISP are gonna block all ports unless requested not to.


Firewalls are a pain.

UDP hole punching (using the UTP protocol) and the discovery network works a lot of the time.

Many of the people publishing public content for access by Beaker are using hashbase.io to "pin" the content and to act as a public peer, and those ports aren't behind a firewall, so the data can be directly replicated easily.

Articles like these would benefit from a link summarizing what that technology is about...

At least add to the title on HN something about what a dat:// link is :)

I still wish these guys all the best, but if you want to start doing 3rd party like this then everything will eventually devolve like it did for the normal web. We do need a new web and way of moving information, but once you start directly connecting to servers they need to know who you are.

I don’t hold this binary view that a “decentralized Web” has to avoid certain technologies. I believe we should aim to use peer-to-peer systems where it’s impactful and practical, and in the rare case you do need a third-party server, there are things you can do to limit your dependence and make them easy to reconfigure.

That’s our approach with Beaker. We use peer-to-peer systems as much as possible, and then plug in servers as minimally as possible.

I upvoted you and wish you all the best, but the core problem I see with this is that it makes it hard to make policy around.

If I'm making a new system or setting policy for a government or other high-security minded client (like a political campaign, military contractor, activist group, or private intelligence corp) I need off the shelf stuff with zero known attack surface OR I need to individually vet every single offering within that protocol suite. This is why you can email members that work for The Government of Ontario, but they won't click on links to non-whitelisted places. The attack surface when clicking a link is fucking huuuuuge (pdf 0days anyone?), while the attack surface for loading an email is much smaller.

There are a ton of interesting web-replacements that hackers are playing around with right now, but the one that wins for the next web is the one that lets stupid people do whatever they want without worrying. In my opinion, 3rd party means worrying, and in an ideal world it would go away.

The irony of this whole thing is that I'm actively arguing against my own long-term interests. A structural change of the kind I advocate for would dramatically reduce the profitability of being in either data science or cybersecurity; both fields I have a foot in. But I don't care.

Securing the flow of information between people is too important to humanity's long term survival.

Yeah that's an interesting perspective. There are a lot of security issues that come into play when we start toying with how the Web platform works, and I'm somewhat curious whether all Websites should have a sort of "uninstalled" versus "installed" mode, where the uninstalled mode is basically able to do nothing. Then users have to go through an "install" flow to enable the riskier APIs.

I think one other area that the Web hasn't tapped into enough is using protocol/scheme identifiers to introduce strong guarantees to URLs. You can compose schemes with a '+', so I think if you wanted an "on click guarantee" that a site is going to have certain security properties, you might try something like:

http+safe://.../ dat+safe://.../

And then the site would load in a "safe mode" which, like the "uninstalled" mode, is extremely limited in what it can do.
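
A small sketch of how a client might read such a composed scheme. Everything here is hypothetical: no browser implements `+safe`, the function name `parseComposedScheme` and the returned shape are invented for illustration, and example.com is just a placeholder host.

```javascript
// Split a URL's scheme on '+' into a base protocol and modifiers, e.g. "dat+safe".
function parseComposedScheme(url) {
  const match = /^([a-z][a-z0-9.-]*(?:\+[a-z][a-z0-9.-]*)*):\/\//i.exec(url);
  if (!match) return null; // not a "scheme://" style URL
  const [base, ...modifiers] = match[1].toLowerCase().split('+');
  return { base, modifiers, safe: modifiers.includes('safe') };
}

console.log(parseComposedScheme('dat+safe://example.com/'));
// { base: 'dat', modifiers: [ 'safe' ], safe: true }
console.log(parseComposedScheme('http://example.com/'));
// { base: 'http', modifiers: [], safe: false }
```

A loader could then check the `safe` flag before deciding which APIs to expose to the page.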

Firefox is in the early phase of implementing support for non-HTTP protocols, starting with Tor. My understanding (from reading some Mozilla comments here maybe?) is that their goal is to make this pluggable so that eventually Firefox can support all kinds of non-standard transports, including p2p ones like dat://

We’re not there yet, but Mozilla is making a very positive step in that direction, and hopefully other browser vendors will follow. If I were creating a new protocol, I would be doing it now, to “skate where the puck is going” as they say.

I think the really interesting things here are more than just "support new p2p transports/protocols". The things Beaker is doing around changing the way we author the web are the real important pieces, and finding the "middle ground" or transition path is one of the keys to success. So, to answer the OP as well, supporting loading assets from traditional servers is pretty much a requirement for authors I think.

This is the whole point of 'Serverless', everything 3rd party!

I think you are missing the point here?

Everyone is a server is pretty much the opposite of "serverless". The article is just illustrating the migration path from traditional web assets from centralized servers and how you can use them in the distributed web.

A real life use case would be something like:

You want to build a financial tracking app in the distributed web. But your bank is the central source for your data. With the method shown in the article you could request the api endpoint from bank.com inside your p2p/Beaker app then load that data into local storage/a dat archive/memory and use it to track your data. Think mint but without giving your bank credentials to a 3rd party.

The name 'Serverless' really doesn't explain the concept. Basically it means you don't have to worry about managing servers, and you only do low-effort work by outsourcing most of your logic (authentication, db, etc.) to 3rd party APIs and vendors.


I had to read the title 10 times before I realized it wasn't someone asking: "how do dat?"
