So how does this compare with Quilt? From what I see:

1. Quilt is for-profit while Dat is non-profit.

2. Dat has ~20 public datasets; Quilt has 50+ public datasets.

3. Dat is on a shared peer-to-peer network while Quilt is hosted on a centralized server.

4. Both offer version control and hosting. Quilt has private hosting for a fee; Dat seems to offer only public hosting.

5. Quilt is funded by YC; Dat is funded by non-profits.

6. Quilt has a Python interface while Dat has one in JavaScript.

I understand who Quilt is targeting, but I'm having trouble understanding who Dat is targeting.




I'm one of the creators of the Beaker browser[1] and the reason we use Dat is that as a p2p protocol, it offers a lot of neat properties, including making datasets more resilient. As long as one peer on the network is hosting a dataset, it will be reachable, even if the original author has stopped hosting it.

I won't speak authoritatively on behalf of the Dat team, but I believe one of their goals is to make it difficult for public scientific datasets to be lost, and data living on a centralized server is particularly vulnerable to that.

1. https://github.com/beakerbrowser/beaker
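
To make the co-hosting point concrete, here is roughly what seeding a dataset looks like with the dat-node module (the API here is from memory of its README, so treat it as a sketch rather than gospel):

    var Dat = require('dat-node')

    // Share the contents of ./dataset on the Dat network
    Dat('./dataset', function (err, dat) {
      if (err) throw err

      dat.importFiles()   // add the folder's files to the archive
      dat.joinNetwork()   // announce ourselves so other peers can find and fetch it

      console.log('Serving dat://' + dat.key.toString('hex'))
    })

Any peer that has cloned the archive can run the same kind of process, and as long as one of them is online the dat:// link keeps resolving.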


The use case really speaks to me, but I'm not convinced that decentralization will actually keep datasets from getting lost.

I spent a while trying to download recent updates to the Reddit comment corpus [1], which is hosted on BitTorrent. The downloads never seem to finish.

It seems to me that decentralization means that, when a dataset stops being new and exciting, it will disappear. How will Dat counter this?

[1] https://www.reddit.com/r/datasets/comments/65o7py/updated_re...


Because Dat is just a protocol, decentralization is a choice. For quick, ephemeral exchanges, direct P2P works brilliantly. For longer-lived datasets, sharing them with a (commercial) mirror might make sense. Or perhaps you host them yourself. The beauty is that you, as a user of the protocol, get to decide what works best for you.


We have a few approaches to the disappearing data.

First, we are working with libraries, universities, and other groups with large amounts of storage and bandwidth. They'd help host datasets used within their institutions, as well as other essential datasets.

Second, we started to work on at-home data hosting with Project Svalbard[1]. This is kind of a SETI@home idea, where people could donate server space at home to help back up "unhealthy" data (data that doesn't have many peers).

Finally, for "published" data (such as data on Zenodo or Dataverse), we can use those sites as a permanent HTTP peer. So if no peers are available over p2p, you can still get the data directly from the published source (a rough sketch of that fallback follows below).

As others said, decentralization is an approach but not a solution. It gives you the flexibility to centralize or distribute data as necessary without being tied to a specific service. But we still need to solve the problem!

[1] https://medium.com/@maxogden/project-svalbard-a-metadata-vau...
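
To illustrate that third point: a published copy is just a normal HTTPS URL, so the fallback is an ordinary download. The sketch below uses a made-up URL and only Node's built-in modules; in the real tooling this would only kick in when the swarm reports no peers.

    var https = require('https')
    var fs = require('fs')

    // Hypothetical published copy of the same file (e.g. a Zenodo or Dataverse record)
    var FALLBACK_URL = 'https://example.org/record/1234/files/observations.csv'

    https.get(FALLBACK_URL, function (res) {
      res.pipe(fs.createWriteStream('observations.csv'))
        .on('finish', function () {
          console.log('fetched from the published HTTP source instead of the swarm')
        })
    }).on('error', console.error)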


That’s something we think about a lot, and decentralization isn’t a silver bullet solution to data loss, but I do think it’s more resilient than what we typically do now.

To counter that, you can take measures to mirror important datasets with a dedicated peer. It requires effort, but it at least makes it much, much harder, for example, for a government agency to take down public data without warning.
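
Concretely, a dedicated mirror peer can be very small: clone the archive by its key and just stay online. A rough sketch with the dat-node module (the key and paths are placeholders):

    var Dat = require('dat-node')

    // Placeholder: the 64-character hex key of the archive we want to keep alive
    var KEY = '<dataset-key>'

    Dat('./mirror', { key: KEY }, function (err, dat) {
      if (err) throw err
      dat.joinNetwork()   // download the data and keep seeding it to other peers
    })

Even if the original author (or their hosting provider) disappears, this peer keeps answering requests for that key.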


Why Dat or Quilt and not a blockchain?


A blockchain is an over-engineered solution to the problems we’re trying to solve. Blockchains provide shared global state. We don’t need that.

https://beakerbrowser.com/docs/inside-beaker/other-technolog...


A blockchain is a rather weak database in itself. However, using it to store pointers into a DHT like Dat would be fine.


A rather weak database? How do you figure?


This may not always be the case, but so far blockchains have low throughput and large ledgers that you have to sync in full. Compared to other databases, they don't perform that well, so if you don't need decentralized strict consensus, a blockchain isn't a good choice.


Ah cool! I hadn't seen Quilt before. You are spot on with the differences.

Dat is targeting similar users to Quilt. But we are also looking more broadly at libraries, labs, and other larger academic/gov't organizations managing data. There are a lot of data publishing tools in the sciences, such as Zenodo, and we'd love for it to be easier to download from and publish to those places. Because Dat is decentralized, it fits well when integrating with other data tools.

You can use Dat to replace file transfer software like rsync, so it is a bit more general purpose.

Another difference not mentioned is that Dat really starts at the protocol level, while Quilt is more software-focused. The Dat protocol is a peer-to-peer protocol for syncing files, modeled on Git and BitTorrent. We built the data management software on top of the protocol.
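
To give a feel for that layering: the protocol pieces are ordinary Node modules, and the dat CLI and desktop app sit on top of them. A rough sketch with hyperdrive, the file-archive module underneath Dat (API from memory of its README at the time, so details may differ):

    var hyperdrive = require('hyperdrive')

    // An append-only archive of files; its public key is what the dat:// link points to
    var archive = hyperdrive('./my-archive')

    archive.on('ready', function () {
      console.log('archive key:', archive.key.toString('hex'))

      archive.writeFile('/readme.txt', 'versioned like git, swarmed like bittorrent', function (err) {
        if (err) throw err
        console.log('archive version after write:', archive.version)
      })
    })

Peer discovery and syncing are handled by companion modules (the same ones dat-node wires together), which is why it's straightforward to build different tools on the same protocol.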

Edit: I should mention we don't offer any hosting right now; all the data up there is temporarily cached. There is public hosting via Hashbase[1] from the Beaker team. The cool part about Dat being p2p is that it's really easy to switch hosts or use multiple hosts.

[1] https://hashbase.io/


If I uploaded a bunch of data that was obtained illegally, there would be nothing stopping me from doing that, right?

Also, is your peer-to-peer network vulnerable to attacks by nefarious users, such as a Sybil attack? Is there a situation where I could alter or forge data?


> If I uploaded a bunch of data that was obtained illegally, there would be nothing stopping me from doing that, right?

The hosting provider is responsible for removing illegal content. Dat itself doesn't track any content; datproject.org is more of a registry than a host.

> Also, is your peer-to-peer network vulnerable to attacks by nefarious users, such as a Sybil attack? Is there a situation where I could alter or forge data?

No, only authorized people can write to each dat key (currently only the owner, but multi-writer is coming soon). All writes are signed with the writer's private key and then verified whenever content is downloaded.
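
If it helps, here's the shape of that check, schematically, using Node's built-in crypto (an illustration of the idea, not Dat's exact scheme; hypercore actually signs roots of a Merkle tree over the log):

    var crypto = require('crypto')

    // The writer's Ed25519 keypair; the public key plays the role of the dat key here
    var keys = crypto.generateKeyPairSync('ed25519')

    // The writer signs data before publishing it
    var chunk = Buffer.from('row 42: temperature=19.3C')
    var signature = crypto.sign(null, chunk, keys.privateKey)

    // Any downloader verifies the data against the public key
    console.log('valid:', crypto.verify(null, chunk, keys.publicKey, signature))           // true

    // Tampered data fails verification, so forged writes are rejected
    var forged = Buffer.from('row 42: temperature=99.9C')
    console.log('forged valid:', crypto.verify(null, forged, keys.publicKey, signature))   // false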


> I understand who Quilt is targeting, but I'm having trouble understanding who Dat is targeting

Academics, open data enthusiasts, and hackers.



