
Dat: A P2P hypermedia protocol with public-key-addressed file archives - pcr910303
https://www.datprotocol.com/
======
JakeAl
Kind of stagnant, but I liked the IDEA of what was being done with the Beaker
Browser using the DAT protocol to run a self-hosted P2P web. I just wish
someone would take the project to completion/usability. It would really
disrupt things. [https://beakerbrowser.com](https://beakerbrowser.com)

~~~
pfraze
We decided to take a year to go heads-down and rework a lot of stuff, so we
did go publicly stagnant, but we're full-time on the next version and should
release a public beta soon. I tweet a lot about our progress if you can sort
through my sillier tweets (@pfrazee).

Past year has had a lot of improvements

- Protocol moved to a hole-punching DHT for peer discovery (hyperswarm).

- Protocol now scales # of writes and # of files much better. We were able to
put a Wikipedia export, which is millions of files in 2 flat dirs, into a
single "drive" and get good read performance. This performance bump came from
an indexing structure that's built into every log entry (hypertrie).

- Protocol now supports "mounting" which is a way to basically symlink drives
onto each other. Good composition tool, esp useful for mounting deps in a
/vendor directory.

- Browser now has a builtin editor that splits the screen for live editing.
Feels similar to hackmd.

- Browser added a bash-like terminal for working with the protocol's
filespace. It's glued to the current page so you can drive around the web
using `cd`.

- Browser added a basic identity system. Every user has a profile site that's
created automatically and maintains an address book of other users.

- We built out application tooling a fair amount. It's fairly easy to build
multi-user apps now, where previously it was a bit of rocket surgery.

Some of the year was spent prototyping ideas and throwing them away as well. A
bit inefficient, but helped us learn.
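
The hypertrie idea mentioned above (an index structure carried in the log entries) can be sketched, very loosely, as a trie keyed on a hash of the path, so a lookup touches a fixed number of nodes regardless of how many entries share a "directory". This is a toy Python sketch of that general technique, not the actual hypertrie encoding:

```python
import hashlib

class Node:
    def __init__(self):
        self.children = {}  # hex nibble -> Node
        self.value = None

class ToyHashTrie:
    """Toy hash-trie index: each key is hashed, and the digest's hex
    nibbles pick the path through the trie. Lookup cost depends on the
    digest length, not on how many siblings a directory has."""

    def __init__(self):
        self.root = Node()

    def _path(self, key: str) -> str:
        return hashlib.blake2b(key.encode(), digest_size=8).hexdigest()

    def put(self, key: str, value) -> None:
        node = self.root
        for nibble in self._path(key):
            node = node.children.setdefault(nibble, Node())
        node.value = value

    def get(self, key: str):
        node = self.root
        for nibble in self._path(key):
            node = node.children.get(nibble)
            if node is None:
                return None
        return node.value

trie = ToyHashTrie()
for i in range(10000):  # a flat "directory" with many entries
    trie.put(f"/wiki/page-{i}", i)
print(trie.get("/wiki/page-1234"))  # -> 1234
```

Even with ten thousand entries in one flat namespace, the lookup walks only 16 nibbles, which is the flavor of scaling the Wikipedia-export test relies on.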

~~~
grng3r
I've been reading about dat and doing some basic projects in my spare time
for about a year, stagnating a bit lately as I'm finishing my degree. As I
remember there was some talk about Python and Rust implementations at the
time; are those still under development as well? I'm not as enthusiastic
about JS, but I really like the protocol and the idea behind it. Thanks!

~~~
pfraze
Yeah, I know there's a Rust implementation that people are working on.

------
rzzzt
"How Dat Works" linked from the page is still a great visualization of all the
bits that make up the protocol:

- [https://datprotocol.github.io/how-dat-works/](https://datprotocol.github.io/how-dat-works/)

- [https://news.ycombinator.com/item?id=20363813](https://news.ycombinator.com/item?id=20363813)

~~~
dang
Also
[https://news.ycombinator.com/item?id=20811936](https://news.ycombinator.com/item?id=20811936),
a related project from 2019.

------
duskwuff
Conceptually, this sounds very similar to Freenet; a DAT URL seems analogous
to a Freenet SSK.

What I don't see is anything analogous to a USK -- there's no obvious way for
an author to distribute an update to content they have published. It's also
unclear how much (if any!) privacy this protocol provides to content
publishers or requestors -- the use of discovery keys only provides protection
for users requesting content which an eavesdropper has no knowledge of.

~~~
rakoo
There is no equivalent to a USK because dat doesn't give direct access to old
revisions. What you get is always the latest.

The old content is still there though, and you can access it, just not in an
"easy" manner: [https://docs.dat.foundation/docs/faq#does-dat-store-
version-...](https://docs.dat.foundation/docs/faq#does-dat-store-version-
history).

The dat url references a given public key, and that's about it in terms of
privacy. Transfer is done between two endpoints on the normal internet, so
neither peer is "hidden".

It's a shame, but the dat content is spread across many places and it's hard
to get access to all the documentation. The most impressive and interesting
part of the project right now is probably Beaker; you should have a look:
[https://beakerbrowser.com/](https://beakerbrowser.com/)

~~~
duskwuff
> There is no equivalent to an USK because dat doesn't give direct access to
> old revisions. What you get is always the latest.

How do you guarantee that you're getting the _latest_ content, though? If a
peer has a cached copy of an older revision, wouldn't you end up with that
instead, since there's no way to distinguish between them?

> The dat url references a given public key, and that's about it in terms of
> privacy. Transfer is done between two endpoints on the normal internet, so
> neither peer is "hidden".

What concerns me here is that, from what I'm reading, it seems like any client
on the local network could eavesdrop on mDNS requests to determine what
content other clients are viewing. Worse, a client could announce itself with
the discovery key for a well-known piece of content to be notified when any
other client, anywhere, requests that content.

_This is a worse privacy model than unencrypted HTTP._ Are you aware of any
plans to mitigate this?

~~~
rakoo
> How do you guarantee that you're getting the latest content, though?

Only the original creator can update the content. You can never know you're at
the latest version until you've connected to them and they've told you "this
is the last I have".
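
That "only a lower bound" property can be illustrated with a toy signed append-only log. This is a hedged Python sketch: an HMAC stands in for the real asymmetric signature (Dat's feeds are signed with the author's key), and the names are made up. Verifying entries proves who wrote them, but never proves you have the newest one:

```python
import hmac, hashlib

AUTHOR_KEY = b"toy signing key"  # stand-in for the author's private key

def sign(seq: int, data: bytes) -> bytes:
    # HMAC stands in for a real signature in this sketch
    msg = seq.to_bytes(8, "big") + data
    return hmac.new(AUTHOR_KEY, msg, hashlib.sha256).digest()

def latest_verified(entries):
    """Return the highest-seq entry whose signature checks out.
    This is only a lower bound: a newer, equally valid entry may
    exist on some peer we haven't talked to yet."""
    valid = [(seq, data) for seq, data, sig in entries
             if hmac.compare_digest(sig, sign(seq, data))]
    return max(valid, default=None)

log = [(0, b"v1", sign(0, b"v1")),
       (1, b"v2", sign(1, b"v2")),
       (2, b"forged", b"\x00" * 32)]  # tampered entry is rejected
print(latest_verified(log))  # -> (1, b'v2')
```

A cached older copy passes verification just as well as the newest one, which is exactly why only the author can tell you you're current.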

> What concerns me here is that, from what I'm reading, it seems like any
> client on the local network could eavesdrop on mDNS requests to determine
> what content other clients are viewing. Worse, a client could announce
> itself with the discovery key for a well-known piece of content to be
> notified when any other client, anywhere, requests that content.

Disclaimer: I'm not an expert on the project, only following it because it's
cool.

As far as I know, the only obfuscation is that keys are hashed so that you
can't infer what content is being exchanged just by listening to the network.
However when you want to watch a specific key, you can get on the swarm and
see who's there.
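
The hashed-key scheme works roughly like this sketch (the exact label and parameters here are my assumption, not quoted from the spec). The discovery key is a keyed hash of the feed's public key, so an observer can't recover the read key from an announcement, but anyone who already knows a popular feed's public key can precompute its discovery key and watch for it:

```python
import hashlib

def discovery_key(public_key: bytes) -> bytes:
    # Keyed BLAKE2b of a fixed label, keyed with the feed's public key.
    # (Label and digest size are assumptions in this sketch.)
    return hashlib.blake2b(b"hypercore", key=public_key,
                           digest_size=32).digest()

# Pretend this well-known public key has been published somewhere:
wikipedia_feed = bytes(32)
watch_for = discovery_key(wikipedia_feed)

# An eavesdropper can't invert an announcement to learn an *unknown*
# feed's key, but matching against precomputed keys of known content
# is trivial:
announcement = discovery_key(wikipedia_feed)
print(announcement == watch_for)  # -> True
```

This is the asymmetry duskwuff is pointing at: the hash hides unknown content, but does nothing for content the observer already knows about.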

Note that dat doesn't attempt to solve the same problems as Freenet does. They
have different goals, and as such they can't be compared on something that
only one of them explicitly focuses on.

~~~
pfraze
You have the details right. I think it's fair to say that the protocol is very
leaky with its metadata right now. In a local network, it would be wise to
only exchange announcements with trusted devices. In the global network, it
would be wise to introduce some kind of proxy (distributed or not).

------
StavrosK
I have failed time and again to grasp how exactly Dat works. IPFS is easy for
me to grok; your content is hash-addressable, thus immutable. Can someone
explain in a sentence or two how Dat handles content?

~~~
rakoo
BitTorrent is immutable; Dat is the mutable version of BitTorrent.

You create an archive that is identified by a cryptographic public key. You
add files to it, and dat stores some metadata alongside them. You give the
archive's id to a friend, who starts retrieving the metadata and can then see
the files; they can download whichever files they please. Syncing can also be
realtime, so they get new content as soon as you add it.

Only you, the holder of the cryptographic private key, can add content to the
archive. Crypto is used to sign all content, so there's no doubt it was
legitimately written by you. Since the id doesn't change, multiple peers can
interconnect and exchange data as needed in a swarm fashion, even if you're
offline.
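
The contrast with BitTorrent can be sketched in a few lines of Python. A content hash pins the address to one exact set of bytes, while a key-based address stays stable across revisions. This is a toy illustration of the idea (an HMAC stands in for real signatures, and all names are made up):

```python
import hashlib, hmac

def content_address(data: bytes) -> str:
    # BitTorrent-style: the address commits to the bytes themselves
    return hashlib.sha256(data).hexdigest()

class ToyArchive:
    """Dat-style: the address derives from a key and never changes;
    every write is signed so readers can verify authorship."""

    def __init__(self, key: bytes):
        self._key = key  # stands in for the author's private key
        self.address = hashlib.sha256(key).hexdigest()  # stable id
        self.log = []  # append-only (seq, path, data, signature)

    def write(self, path: str, data: bytes) -> None:
        seq = len(self.log)
        sig = hmac.new(self._key, f"{seq}:{path}:".encode() + data,
                       hashlib.sha256).digest()
        self.log.append((seq, path, data, sig))

a = ToyArchive(b"author key")
addr_before = a.address
a.write("/hello.txt", b"world")
a.write("/hello.txt", b"world, revised")
print(a.address == addr_before)  # -> True: same address across revisions
print(content_address(b"world") == content_address(b"world, revised"))  # -> False
```

The stable address is what lets a swarm keep serving an archive under one id while the author keeps appending to it.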

~~~
StavrosK
That's a great explanation, thank you. How does this handle versioning? If I
add one file to a vast dataset, can people download just that file if they
already have the previous version? IPFS hashes at the chunk level, so even if
you append something to a large file, you can download just the new chunk and
be up to date.

~~~
rakoo
Yes, in fact dat was initially created for this use case: data analysts want
to exchange their data, and that data evolves over time, so it needs to be
transported "efficiently", as in, you don't need to redownload a full .zip
just for a single file change. The only metadata you'll receive concerns the
new file, and the old content is still valid. You can even seek inside any
file if you don't want to download the whole thing.

Changes _inside_ a file, though, are not handled. Today, if a file is
modified, dat considers all the bytes of the old version garbage and will not
reuse them.
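
A sketch of why this is cheap for new files but not for in-place edits: peers sync by exchanging log lengths and fetching only the missing suffix, and a modified file is simply appended again in full. Toy Python, not the real wire format:

```python
class ToyDrive:
    """Toy append-only file log: writing a path again appends a whole
    new entry, so old bytes are never reused (no intra-file dedup)."""

    def __init__(self):
        self.log = []    # list of (path, bytes)
        self.index = {}  # path -> position of latest entry

    def write(self, path: str, data: bytes) -> None:
        self.index[path] = len(self.log)
        self.log.append((path, data))

    def read(self, path: str) -> bytes:
        return self.log[self.index[path]][1]

def sync(local: ToyDrive, remote: ToyDrive) -> int:
    """Fetch only the log entries local is missing; returns how many."""
    missing = remote.log[len(local.log):]
    for path, data in missing:
        local.write(path, data)
    return len(missing)

author, reader = ToyDrive(), ToyDrive()
author.write("/data.csv", b"a,b\n1,2\n")
sync(reader, author)
author.write("/data.csv", b"a,b\n1,2\n3,4\n")  # an edit -> one new full entry
print(sync(reader, author))  # -> 1
print(reader.read("/data.csv"))  # -> b'a,b\n1,2\n3,4\n'
```

One changed file costs one new log entry on the wire, but the entry carries the whole file again, which is the missing intra-file dedup the comment describes.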

dat is a sexy frontend on top of hyperdrive
([https://github.com/mafintosh/hyperdrive](https://github.com/mafintosh/hyperdrive));
I personally think it's easier to see what dat can do by looking at what
hyperdrive does.

~~~
StavrosK
Thanks, that's very informative, and the intro in the Hyperdrive README
clarifies the goals very well. I have a much better idea now, thanks again.

------
mattlondon
When I last looked at dat some time ago I seem to recall that there was no way
to revoke a dat or set a TTL-type field.

 _Can dat forget?_

E.g. if I created a dat, that was it: the data in the dat was then
potentially out there forever and ever in the distributed network. There was
no way to tell a client "this dat is only good for 90
seconds/minutes/hours/days/years/decades/etc. - after that, please drop/delete it".

I know there is this desire in certain circles for a Blockchain-style
"everything is stored for ever and we can cryptographically prove every single
byte all the way back to the dinosaurs!" sentiment, but I am not sure that
really jibes with a distributed data/Web publishing protocol (at least in my
mind) - I want to be able to reliably "delete" something. If I know that
_anything_ I ever publish will be irrevocably around forever, it has a
chilling effect on what I choose to do and publish with dat (plus maybe legal
challenges? IANAL, but e.g. GDPR? Right to be forgotten?)

E.g. would we actually gain anything if, hypothetically, Google supported dat
and every single Google search result page ever generated was stored forever
in a dat? Would future users benefit from storing decades of archived
versions of Google search result pages for "Facebook" (because people search
for "Facebook" and then click the first result instead of just typing
"facebook.com") or "weather", which would then need to be endlessly duplicated
around the network for the rest of eternity? It seems unlikely to be of any
benefit - surely it's better to mark some of it as ephemeral and let the data
naturally time out and die?

Does anyone know if that has changed with dat and data can now die? Or have I
just misunderstood?

~~~
tangent128
Data in a dat, just like data in a torrent, only lives as long as there are
peers interested in seeding that particular dat; it's not a blockchain-style
"everybody has to replicate everything".

You can't unilaterally withdraw content somebody else is also seeding; the
best you can do to my knowledge is publish a new version that replaces all the
dat's content with a "please delete your history of this archive" message, but
as with any distributed system you have no real way of knowing if that was
respected.

------
karissa
Here are all the projects that have used components from the dat team over
the years:
[https://dat.foundation/explore/projects/](https://dat.foundation/explore/projects/)

and [https://cobox.cloud/](https://cobox.cloud/)

------
Already__Taken
Wouldn't a protocol like this be a killer feature in the next package
manager, to solve the problems npm had? Then you'd just have a
discoverability issue, without the whole infrastructure headache to maintain
on top.

------
aabbcc1241
DAT is not new; I've been building stuff on top of it. Good to see more
people are aware of it.

------
rapnie
Nice. Aral Balkan of small-tech.org has plans for Dat in Tincan and Site.js:

[https://small-tech.org/research-and-development/](https://small-tech.org/research-and-development/)

------
rasengan0
i swung and missed :-(
[https://github.com/datproject/dat/issues/1008](https://github.com/datproject/dat/issues/1008)

------
dangoor
Can Dat really be described as a "new" protocol 3 years in?

~~~
api
Yes, as it's still under heavy development and has not yet seen heavy usage.

This mentality that things either flip instantly to mass adoption or are "old"
needs to die. All the easy stuff that can be done in 6-12 months has already
been done. Anything worth doing today is going to take a minimum of 1-2 years
of R&D unless it's nothing more than a packaging and polish/branding of
something already in existence.

~~~
dangoor
> This mentality that things either flip instantly to mass adoption or are
> "old" needs to die.

That's honestly not where I'm coming from, though. I think Dat is a cool
project and I've been aware of it probably from close to the beginning. The
scale between "new" and "old" is not binary, nor is it necessarily related to
"mass adoption". There are people who still treat React as though it's "the
new hotness", despite the fact that it's been out for 6 years and is widely
adopted.

I'll grant that different people will have different thresholds for what they
see as "new". I just personally think Dat has gone beyond "new" and is in a
phase of maturation. It's even got a browser with deep support!

