
How do dat:// sites interact with servers? - pfraze
https://pfrazee.hashbase.io/blog/dat-and-servers
======
panarky
_" Cross-origin resource sharing (CORS) is a policy that prevents a webpage
from connecting to a server, unless that server has given that webpage
permission to connect."_

CORS only applies to requests for restricted resources like fonts or XHR
requests that aren't simple GETs and POSTs.

 _" So, while this is typically not possible: foo.com/index.html ---GET--->
bar.com/pic.jpg"_

Typically this is possible. Images, stylesheets, scripts, iframes, and videos
aren't subject to CORS.

 _" You can solve it by routing the request through your host server:
foo.com/index.html ---GET---> foo.com ---GET---> bar.com/pic.jpg_

Not necessary: the client's browser can fetch bar.com/pic.jpg just fine all
by itself.

 _" Pinning tools like Hashbase and Homebase help keep dat:// sites online"_

If you publish a dat archive, how do you notify Hashbase to pin it? Can you do
it through dat?

To keep my dat alive with Hashbase, do I have to set up an account, provide
and confirm an email address, link a credit card, etc.?

Are centralized servers, financial institutions and surveillance all required
components of the anonymous, decentralized, peer-to-peer web?

~~~
pfraze
> Typically this is possible. Images aren't subject to CORS.

You're right, that was a misleading example. I changed it to data.json to be
more clear.

> If you publish a dat archive, how do you notify Hashbase to pin it? Can you
> do it through dat? Does it require setting up an account on Hashbase,
> providing an email address, linking a credit card, etc.?

The "Pinning" system is very similar to Git remotes. You can pin using any
endpoint that complies with [https://www.datprotocol.com/deps/0003-http-
pinning-service-a...](https://www.datprotocol.com/deps/0003-http-pinning-
service-api/). So, similar to a git remote, you do need some kind of
authentication with the pinning service - unless somebody writes one that's
open for anybody to push to.

The UX flow will be similar to a git remote as well: you use an HTTPS request
to tell the server to sync the dat. So, it's an explicit user action.

We've got two Node.js implementations of the pinning service API you can
self-deploy,
[https://github.com/beakerbrowser/hashbase](https://github.com/beakerbrowser/hashbase)
and
[https://github.com/beakerbrowser/homebase](https://github.com/beakerbrowser/homebase),
and we run a Hashbase instance at hashbase.io.
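Roughly, the client side of that flow looks like this. The endpoint paths and
payload shapes below are illustrative stand-ins, not verbatim from the DEP, so
consult the spec for the real API; the point is just the shape: authenticate,
then ask the service to sync a dat URL, like pushing to a git remote.

```javascript
// Build the requests a client would send to a hypothetical pinning service.
// Paths and payloads are illustrative; see DEP-0003 for the actual API.

// Step 1: authenticate and obtain a session token.
function loginRequest (serviceOrigin, username, password) {
  return {
    url: `${serviceOrigin}/v1/accounts/login`, // illustrative path
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ username, password })
    }
  }
}

// Step 2: ask the service to sync ("pin") a dat URL.
function pinRequest (serviceOrigin, token, datUrl) {
  return {
    url: `${serviceOrigin}/v1/dats/add`, // illustrative path
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${token}`
      },
      body: JSON.stringify({ url: datUrl })
    }
  }
}

// A real client would then do:
//   const { url, options } = pinRequest(service, token, 'dat://...')
//   await fetch(url, options)
```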

~~~
panarky
>> Typically this is possible. Images aren't subject to CORS.

> You're right, that was a misleading example. I changed it to data.json to be
> more clear.

CORS doesn't preflight GET, POST or HEAD requests unless they have custom
headers or a Content-Type other than application/x-www-form-urlencoded,
multipart/form-data or text/plain.

So a simple GET bar.com/data.json works just fine in today's browsers.

[https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)
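Those criteria, roughly, as a predicate (simplified from the MDN rules; the
real header safelist has more nuance than this):

```javascript
// Rough check for whether a cross-origin request is "simple",
// i.e. sent without a CORS preflight. Simplified from the MDN/Fetch rules.
const SIMPLE_METHODS = ['GET', 'POST', 'HEAD']
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain'
]
// Headers a page may set without triggering preflight (subset of the safelist).
const SAFE_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type']

function isSimpleRequest (method, headers = {}) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return false
  for (const [name, value] of Object.entries(headers)) {
    if (!SAFE_HEADERS.includes(name.toLowerCase())) return false
    if (name.toLowerCase() === 'content-type' &&
        !SIMPLE_CONTENT_TYPES.includes(value.split(';')[0].trim())) return false
  }
  return true
}

// isSimpleRequest('GET', {})                                      → true (no preflight)
// isSimpleRequest('PUT', {})                                      → false
// isSimpleRequest('POST', { 'Content-Type': 'application/json' }) → false
```

Note that "no preflight" only means the request is sent; whether the page can
read the response is a separate question, governed by the
Access-Control-Allow-Origin response header.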

~~~
pfraze
I'd be happy to be corrected on this, but here's what I understand:

While fetch doesn't preflight a GET, it does require an Access-Control-Allow-
Origin header on the response. You can specify `no-cors` in the request mode
to circumvent this, but then you can't access the response body
([https://developer.mozilla.org/en-US/docs/Web/API/Request/mode](https://developer.mozilla.org/en-US/docs/Web/API/Request/mode))

~~~
panarky
Here's an example using your own site.

Open the console in an empty Chrome window and paste this:

    fetch('https://pfrazee.hashbase.io/feed.xml')
      .then(response => response.text())
      .then(str => console.log(str))

You'll get your RSS feed, no CORS tricks required.

~~~
pfraze
My site has the Access-Control-Allow-Origin: * header set. Try the same fetch
against the Beaker site, from the devtools on
[https://example.com](https://example.com):

    fetch('https://beakerbrowser.com/dat.json')
      .then(response => response.text())
      .then(str => console.log(str))

It should fail.

~~~
bhritchie
It is failing for me because of the content security policy, which is a
different thing.

~~~
pfraze
That's the case on HN. example.com doesn't have a CSP set up, so there it's
CORS that causes the issue.

------
quanticle
> For each new origin that the site contacts, a permission prompt will be
> presented

I don't think this is an adequate approach to security. When the browser
presents me with a prompt to load data from a third party site, I don't know
what data is being loaded, what it's being used for, or whether this prompt is
expected (as part of the regular functioning of the application) or unexpected
(indicating that the application has been compromised, and I should navigate
away from it).

In general, I've noticed that users react in one of two ways to these sorts of
prompts. Naive users will blanket allow -- allowing all sites to access all of
the capabilities of their browsers, regardless of the reason or necessity of
that access. More sophisticated users will blanket deny. If it's not
immediately apparent why a site needs the permission that it requests, that
request will get denied, even if it's a valid requirement. Very very few users
will think about why a site is requesting the permissions that it is
requesting and consider those requests on a case by case basis.

------
k__
I like the ideas of dat and IPFS, but I can't quite understand the difference.

What I do understand is that they use new protocols, and that's an issue on
the Web today. I think the only way they can succeed is through new laws, or
mistakes by big corporations that drive customers away from them.

I also liked remoteStorage [0]; it's a bit like localStorage, but the data is
managed independently of the application itself.

[0] [https://remotestorage.io/](https://remotestorage.io/)

~~~
jimpick
There are a lot of similarities, as both are peer-to-peer and decentralized.

I've mostly done Dat. I want to do a bit more IPFS.

Dat feels a bit more like git for files - you can create a local file archive
using the command line tools, and it's a separate step to sync it to the peer-
to-peer network. There's a global discovery service for advertising archive
keys, but it doesn't work at the level of single files. It's very lightweight.

IPFS supports many of the same operations, but you're mostly interacting with
a local gateway server which is continuously connected to the network. I
believe IPFS tries to content hash every single file so they are de-duplicated
globally.

~~~
k__
Does either have built-in realtime features?

~~~
marknadal
Hashes and append-only logs are not very good for realtime data because of
the extra overhead that has to be calculated, but CRDTs are.

CRDTs naturally fit with P2P/decentralized topologies, and we've generalized
them in [https://github.com/amark/gun](https://github.com/amark/gun) which is
the most popular (8K+ stars) open source (MIT/Zlib/Apache2) realtime
decentralized database.

It is running in production on P2P alternatives to Reddit and other apps, that
have pushed over half a terabyte in a day.
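To make that concrete, here's a minimal grow-only counter (G-Counter), one of
the simplest CRDTs. This is just an illustration, not gun's implementation:
each peer increments only its own slot, and merge takes the per-peer maximum,
so merging is commutative, associative, and idempotent - peers can sync in
any order, any number of times, without coordination.

```javascript
// Minimal G-Counter CRDT. State is a map of peerId -> count.
function increment (state, peerId, amount = 1) {
  return { ...state, [peerId]: (state[peerId] || 0) + amount }
}

// Merge two replicas by taking the per-peer maximum.
// Replicas converge regardless of the order (or repetition) of merges.
function merge (a, b) {
  const out = { ...a }
  for (const [peer, count] of Object.entries(b)) {
    out[peer] = Math.max(out[peer] || 0, count)
  }
  return out
}

// The counter's value is the sum over all peers.
function value (state) {
  return Object.values(state).reduce((sum, n) => sum + n, 0)
}

// Two peers update concurrently, then sync in either order:
const alice = increment({}, 'alice')   // { alice: 1 }
const bob = increment({}, 'bob', 2)    // { bob: 2 }
// merge(alice, bob) and merge(bob, alice) both total 3.
```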

~~~
pfraze
It's not incorrect to say that hashes and signatures add some overhead, but
the question is whether the overhead is significant enough to matter for the
usecase. Probably not.

Dat is realtime. You are notified about updates as soon as they are
distributed. In Beaker, the files-archive API has a `watch()` method to do
this. If you're accessing a dat files-archive, you're participating in the
syncing swarm and so you'll receive those updates automatically.

You'll want to use a UDP socket if you're streaming a high volume of data with
low latency requirements, for instance for a multiplayer FPS. But Dat has been
used to stream live video, so it's probably real-time enough for most Web use-
cases.

Small aside: Comparing Dat to CRDTs is apples to oranges. It's like comparing
Javascript to B-Trees; they're not quite the same kind of technology. In fact,
the new version of Dat uses CRDTs in order to allow multiple users to write to
a files-archive.

~~~
marknadal
When we met just a month or so ago you didn't tell me you were adding
CRDTs!!! This is very exciting news. Dominic had mentioned a specific type of
CRDT he had added (but it wasn't generalized).

Append-only logs have overhead that makes them a poor choice for most
realtime applications: GPS tracking, yes FPS games, Google Docs, website
builders, and many more. Basically any use case where data mutates.

In 2010 I built my own custom event-sourcing system; I was the hugest
proponent of this approach / append-only logs. It was so futuristic. But four
years in I hit all sorts of scaling problems and had to redesign everything
from scratch, and that is when I found CRDTs. On all accounts they are
superior, in a mathematical or logical sense, because they are a superset of
DAGs, immutable/append-only logs, and many other popular data structures. Not
apples and oranges.

~~~
jimpick
I built a little multiuser wiki side project that's actually using two levels
of CRDTs... hyperdb (underneath hyperdrive) and then automerge on top of
that. It's sort of hard to explain the full design in a short entry, but you
can play with it here:

[https://dat-tiddlywiki.glitch.me](https://dat-tiddlywiki.glitch.me)

The UX definitely needs some improvement, but it’s just a personal project so
far.

------
Sol-
Does the dat:// scheme have any history or did the developers of this Beaker
browser invent it? Just curious since I haven't seen the protocol before.

~~~
repolfx
It's been around for a year or so. I was wondering when it'd make it to HN.
It's very nicely done, but unfortunately the only implementations are in
JavaScript.

~~~
pfraze
Rust implementation is in the works with grant funding,
[https://github.com/datrs](https://github.com/datrs)

~~~
repolfx
I'm not sure Rust is an especially great language for interop purposes or that
the productivity will be high enough to keep up with the JavaScript
implementation.

If you want to make a library with a C compatible API I'd be tempted to
explore SubstrateVM. It can export C symbols to the generated standalone .so /
.dll files, maybe you can even reuse some of the JS code, or failing that, a
Java/Kotlin implementation would be compiled down to native code _or_ be
usable from other scripting languages like Ruby.

------
symlock
Really neat to see the Beaker browser support native P2P with progressive
enhancements like github.com/beakerbrowser/hashbase providing the cute URLs
we all like.

I'm sure hashbase.io will be blocked really quickly, so it's important that
the core P2P address system stays in the forefront. Transports also need to
find many ways to communicate: over https, shadowsocks, tor, DNS, and others.

------
fjabre
Doesn't answer your original question, but I moved from dat to syncthing. It
just works out of the box with no manual setup, and there's a very active
community around it. It's also open source, has been in production for years
with rave reviews, and is used in plenty of big-scale production projects.

------
EGreg
I never understood how the Beaker browser can act as a server listening on a
port. It sounds like you always need relays on the internet, because your
router and ISP are going to block all ports unless asked not to.

[https://www.scuttlebutt.nz/stories/design-challenge-avoid-centralization-and-singletons.html](https://www.scuttlebutt.nz/stories/design-challenge-avoid-centralization-and-singletons.html)

~~~
jimpick
Firewalls are a pain.

UDP hole punching (using the uTP protocol) and the discovery network work a
lot of the time.

Many of the people publishing public content for access by Beaker are using
hashbase.io to "pin" the content and to act as a public peer. Those ports
aren't behind a firewall, so the data can be replicated directly and easily.

------
nottorp
Articles like these would benefit from a link summarizing what that technology
is about...

At least add to the title on HN something about what a dat:// link is :)

------
3pt14159
I still wish these guys all the best, but if you start making third-party
connections like this, then everything will eventually devolve the way the
normal web did. We do need a new web and a new way of moving information, but
once you start directly connecting to servers, they need to know who you are.

~~~
pfraze
I don’t hold this binary view that a “decentralized Web” has to avoid certain
technologies. I believe we should aim to use peer-to-peer systems where it’s
impactful and practical, and in the rare case you do need a third-party
server, there are things you can do to limit your dependence and make them
easy to reconfigure.

That’s our approach with Beaker. We use peer-to-peer systems as much as
possible, and then plug in servers as minimally as possible.

~~~
3pt14159
I upvoted you and wish you all the best, but the core problem I see with this
is that it makes it hard to make policy around.

If I'm making a new system or setting policy for a government or other high-
security-minded client (like a political campaign, military contractor,
activist group, or private intelligence corp), I need off-the-shelf stuff
with zero known attack surface, OR I need to individually vet every single
offering within that protocol suite. This is why you can email members that
work for the Government of Ontario, but they won't click on links to
non-whitelisted places. The attack surface when clicking a link is fucking
huuuuuge (pdf 0days anyone?), while the attack surface for loading an email
is much smaller.

There are a ton of interesting web-replacements that hackers are playing
around with right now, but the one that wins for the next web is the one that
lets stupid people do whatever they want without worrying. In my opinion, 3rd
party means worrying, and in an ideal world it would go away.

The irony of this whole thing is that I'm actively arguing against my own
long-term interests. A structural change of the kind I advocate for would
dramatically reduce the profitability of being in either data science or
cybersecurity; both fields I have a foot in. But I don't care.

Securing the flow of information between people is too important to humanity's
long term survival.

~~~
pfraze
Yeah that's an interesting perspective. There are a lot of security issues
that come into play when we start toying with how the Web platform works, and
I'm somewhat curious whether all Websites should have a sort of "uninstalled"
versus "installed" mode, where the uninstalled mode is basically able to do
nothing. Then users have to go through an "install" flow to enable the riskier
APIs.

I think one other area that the Web hasn't tapped into enough is using
protocol/scheme identifiers to introduce strong guarantees to URLs. You can
compose schemes with a '+', so I think if you wanted an "on click guarantee"
that a site is going to have certain security properties, you might try
something like:

    http+safe://.../
    dat+safe://.../

And then the site would load in a "safe mode" which, like the "uninstalled"
mode, is extremely limited in what it can do.
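As a quick sanity check, such composed schemes at least parse today with the
WHATWG URL parser (available in browsers and Node), so a browser could key a
"safe mode" off the scheme suffix. The `+safe` convention itself is, again,
hypothetical:

```javascript
// '+safe' is a hypothetical convention, not an existing standard.
// The WHATWG URL parser handles custom (non-special) schemes fine.
function isSafeModeUrl (input) {
  const url = new URL(input)
  // url.protocol includes the trailing ':', e.g. 'http+safe:'
  return url.protocol.endsWith('+safe:')
}

// isSafeModeUrl('http+safe://example.com/page') → true
// isSafeModeUrl('http://example.com/')          → false
```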

------
wesleytodd
dat://pfrazee.hashbase.io/blog/dat-and-servers

------
seymour333
I had to read the title 10 times before I realized it wasn't someone asking:
"how do dat?"

