Show HN: Rarbg on IPFS

Zopieux · on June 4, 2023

The sqlite db contains a imdb column but it seems like the author forgot to include it in the fts4 index, meaning one cannot search for "tt9916362" even though it's right there in the database. It's a shame because this curated mapping is the most useful aspect of RARBG.

FYI, this dump has 826'201 magnets (torrents) with an associated imdb ID and 2'017'490 without, including lots of porn but also random music and software.

Category breakdown (careful though, most items aren't categorized or use a category code I couldn't interpret):

      XXX ( 1):     2,255
      XXX ( 4):       607
   Movies (14):     3,206
   Movies (17):   117,440
       TV (18):   198,314
    Music (23):    11,621
    Music (24):   471,161
    Music (25): 1,339,739

Avlin67 · on June 4, 2023

> including lots of porn

but not all

away_rar10000 · on June 5, 2023

that's right, it's missing all of the vr180 stuff, and seems like it might have been the only place to find it.

parakovsky · on June 4, 2023

so what's to change in index file?

rhqq2 · on June 4, 2023

Related: I just dropped a mega DB archive dump for everything I have with regard to RARBG. My hope is others will find this useful.

https://github.com/sleaze/rarbg-db-dumps

https://news.ycombinator.com/item?id=36187767

thyrox · on June 4, 2023

This is just amazing work!

I have a question for you as even playing with this or creating oss project using this sounds like inviting trouble. So if any programmer wants to create an open source project using this data (just for kicks) then apart from using a VPN and throwaway email do you need to be careful about anything else? Any tips?

thunkymonkey · on June 4, 2023

Never link discussion of it with your other alts. ;)

tough · on June 4, 2023

Oops too late.

Can ask dang to remove

dark-star · on June 5, 2023

How does this compare to the ipfs dump that someone claimed has over 2 million entries? Your DB dump has only around 1.6 million... so does ipfs have duplicates? or is there something substantial missing in your dump?

mmastrac · on June 4, 2023

I was curious how this works and then I saw the sqlite requests in the network tab. It's amazing to see what we have access to these days -- SQLite over HTTP over IPFS to provide a giant, censorship-resistant database!

remram · on June 4, 2023

IPFS is distributed but not anonymized, so you can send abuse requests to people's ISPs to take content down (like with torrents).

The HTTP part is also not censorship-resistant. ipfs.io has a DMCA process, and they could also be asked to reveal the IP of users.

bscphil · on June 4, 2023

> you can send abuse requests to people's ISPs

In practice it's very rare for me to see a direct ipfs protocol link, almost all the traffic goes through HTTP gateways (which frequently cache the content as well). Hard to imagine that they don't become a target for "hosting" pirated content if / when IPFS becomes more than a negligible platform for piracy. (A significant amount of Library Genesis traffic is already using IPFS via these same gateways.)

As you mention, there's a DMCA process for some of the gateways, but that might not be enough to ward off attention.

tylersmith · on June 4, 2023

IPFS can be accessed through Tor immediately through a gateway, and with a little work through the ipfs client directly.

remram · on June 5, 2023

Tor+IPFS is much more resistant yes.

activiation · on June 4, 2023

Use Tor?

qersist3nce · on June 4, 2023

Can you explain bit more? Isn't the dump of last RARBG magnet links on the order of MBs? We can just download it and grep in plain text. I don't get what is the role of SQLite or IPFS here.

phiresky · on June 4, 2023

This blog post explains how it works: https://phiresky.github.io/blog/2021/hosting-sqlite-database...

Disclaimer: I wrote that article and was somewhat involved in the other sqlite over ipfs project this is forked from.

Yes, for MB size files just downloading the whole thing will be faster and much easier - even if running in the browser. I'd say the boundary is somewhere around 10MB compressed (~20-200MB uncompressed). Looks like the sqlite dump used here is ~400MB in size, ~180MB compressed. Loading that in the browser would probably work but it wouldn't be too great.

runeks · on June 5, 2023

> Isn't the dump of last RARBG magnet links on the order of MBs?

As I understand it, the purpose of using SQLite is indexing and querying, ie. being able to efficiently search through the data via a website.

romnon · on June 4, 2023

this, anyone above 20 can operate.

TekMol · on June 4, 2023

It's censorship resistant if enough nodes mirror the sqlite file I guess.

Is there any incentive for nodes to do so?

Is it possible to see a statistic about how many nodes mirror it?

lyu07282 · on June 4, 2023

torrents work fine without having a monetary incentive, this misconception of the blockchain crowd really has to die its killing real decentral solutions

TekMol · on June 4, 2023

I said "incentive". You made it "monetary incentive".

What do you mean with "torrents work fine"? Do you have any statistics on the lifetime of torrents?

Everything ever written to the Bitcoin blockchain is still available and massively distributed. It seems very unlikely anything will ever get lost during our lifetime. So I would say that is an example that monetary incentives do work?

Barrin92 · on June 4, 2023

>Everything ever written to the Bitcoin blockchain is still available

This isn't useful for 99% of stuff and pushing every video and music file on earth through a global state machine with the performance characteristics of an Atari from the 80s would render the thing inoperable. The entire Bitcoin network has a bandwidth of like 1mb every 10 minutes, the total size of the blockchain is half a TB in total, people torrent more porn in the time it took me to type this.

Maintaining a global, complete history of transactions only ever made sense for one problem, double spending, and is utterly useless for sharing files.

sebmellen · on June 4, 2023

IPFS is not a blockchain in the traditional sense. You might be conflating it with Filecoin.

abwizz · on June 5, 2023

ok.

but nobody said otherwise

bakugo · on June 4, 2023

Public torrents die all the time.

xtracto · on June 5, 2023

They dont. Do a search of something older than 5 years in btdigg and you will most likely get only dead magnets.

xnyanta · on June 5, 2023

They do on private trackers where there is an incentive usually tied to ratio or seed time. It's not a monetary incentive and it works.

xtracto · on June 5, 2023

Aaah private trackers! I remember once I pondered to enter one of those "special groups" but bailed when I read that you had to study for some sort of interview that was required to try to join .

Lol no... I dont like interviewing for a new job, why would i agree to get tested for something as stupid as piracy? (Unless... mention me ONE private tracker that doesn't have piracy).

So for 99% of people, my point stands.

nephanth · on June 8, 2023

I remember what.cd had interviews like that. Tried and failed as a kid, that was a pain

There are definitely private trackers that don't have anything like that though (iirc they're called "semi-private trackers"). One i remember using is t411 when it was still running

WeylandYutani · on June 5, 2023

Sharing is caring!

jazzyjackson · on June 4, 2023

it's content addressed, if an ISP was so keen it would be simple to blackhole requests for a particular file (of course, one need only to change 1 bit to get a new content hash, but you have to redistribute the new file from scratch

dietr1ch · on June 4, 2023

I don't know about how IPFS is implemented, but you could use content-addressed blocks underneath every file too. This way flipping a bit means that only one underlying block changes, and that the new file would share N-1 blocks with the previous one, making redistribution only require sharing a single block.

SpaghettiCthulu · on June 5, 2023

Well, wouldn't they just blackhole all the relevant blocks then?

dietr1ch · on June 5, 2023

Yes. Probably there's no replacement for privacy and anonymity being baked in into the system.

lordofgibbons · on June 4, 2023

A note of caution for those unfamiliar with how IPFS works.

It's very similar to BitTorrent with how content distribution happens. Your local node will broadcast which content it has available (downloaded).

If you access a piece of content you automatically become a host for it so you still need to use a VPN if you live in a country where you can get sued.

malikNF · on June 4, 2023

I think your post needs to clarify this is true only if you use the ipfs client to access things hosted on ipfs.

Accessing ipfs.io/ipfs/ doesn’t do anything you mentioned. Its just a gateway.

You could take the link on the main submission. And replace ipfs.io with any (most, because some are offline) of the links here https://ipfs.github.io/public-gateway-checker/

And it will still work.

lordofgibbons · on June 4, 2023

That's true, and I should have clarified that.

However, is using a proxy like ipfs.io really using ipfs?

If everyone did that, there's no point to using this protocol. The strength of the network comes from the fact that the content gets replicated/distributed when accessed. That doesn't happen when accessed through a proxy.

lgats · on June 4, 2023

The benefit is ease of access, if content gets blocked on one mirror (ipfs.io), it may be available on another (cloudflare-ipfs.com/ipfs/, etc)

Saris · on June 4, 2023

>However, is using a proxy like ipfs.io really using ipfs?

Yes! Because the gateway can go down and you can still access the content via local IPFS node or by using another gateway.

RobotToaster · on June 4, 2023

Tangential, but does anyone know what's happening with the JS IPFS implementation? when I tried it a while ago it seemed broken.

Alphatx · on June 5, 2023

The implementation works well, but has limitations. It only works from browser to node and not browser to browser. Depending on what you are looking for it is more or less fast

Grimburger · on June 4, 2023

Using Brave it makes this very clear, the above link won't open by default without the user choosing to run a local node or use a public gateway instead. It explains the implications of both options.

Screenshot: https://i.imgur.com/ZP6AgPp.png

lgats · on June 4, 2023

Most browsers don't support the ipfs protocol and instead only use the gateway.

evilllkint · on June 4, 2023

I love Brave, so useful.

TekMol · on June 4, 2023

That might be how some client(s) work, but I don't think that is how IPFS works.

If you access a piece of content over ipfs.io for example, I would think you just make https requests like to every other website.

jacooper · on June 4, 2023

Because its using a gateway. I think if you are accessing it natively through IPFS, it will default to adding you as a peer.

TekMol · on June 4, 2023

That would hold true if IPFS would be a specific application.

But it is a protocol.

I would not expect that the protocol specifies that a complient client shares what it downloaded. That sounds more like a choice the client developers make on their own.

But I'm happy to be corrected if someone has a link to the protocol definition and it says different.

nightpool · on June 4, 2023

By default, what the GP says is true. If you can point to specific popular clients that do not have that behavior by default then I think that would be more convincing than saying "Well technically there's no requirement" when it's impossible to avoid in practice without writing your own custom implementation from scratch

wuiheerfoj · on June 4, 2023

Afaiu, [Lassie](https://github.com/filecoin-project/lassie/) is the latest and greatest of IPFS clients and it does not store/provide any data that you access

rahimnathwani · on June 4, 2023

AIUI Lassie uses indexers to locate content instead of using a distributed hash table (DHT). So a content identifier (CID) might be present on IPFS (and downloadable by a regular IPFS client), yet 'invisible' to Lassie.

If my understanding is correct, it can't be considered a complete IPFS client.

https://docs.filecoin.io/storage-provider/architecture/netwo...

https://en.wikipedia.org/wiki/Kademlia

wuiheerfoj · on June 4, 2023

I think it uses indexer and saturn (which ought to fall back to the DHT if it hasn’t cached the content of a CID already). That said they have a GitHub issue tracking CIDs that are retrievable via Kubo (fka go-ipfs) that aren’t yet retrievable via Lassie - it suggests to your point that there must be some difference; it also suggests they intend for retrievability parity though, so fingers crossed!!

rahimnathwani · on June 5, 2023

This issue, right?

https://github.com/filecoin-project/lassie/issues/136

kkielhofner · on June 4, 2023

Yes and the many IPFS implementations (I've tried them all) consume tremendous amounts of bandwidth and require shocking levels of system resources (considering what it does fundamentally).

There's an entire cottage industry of IPFS pinning and gateway providers that exist largely because of the challenges of running your own IPFS node for anything beyond casual use.

somat · on June 4, 2023

Ipfs is is a neat project with a lot of interesting ideas. but, yeah, the deamon is often way too resource hungry. I would like to see a lite/embedded mode that would better run in resource constrained or casual setups.

gkbrk · on June 4, 2023

If you access a piece of content over ipfs.io, and you don't have your browser set up to actually do those requests over a local IPFS daemon, you are not using IPFS. You are just using a normal centralized website.

cramjabsyn · on June 4, 2023

It depends if you run a node or are accessing via a proxy/gateway

robbintt · on June 4, 2023

Is it recommended to run my own proxy then, and is there any boilerplate project out there? I could also use OpenVPN, but seems like I just want to proxy ipfs, not my whole connection.

jacooper · on June 4, 2023

Also AFAIK if no one acceses the content for a long time, it can get lost, like torrents.

willsoon · on June 4, 2023

But now, I dont know why, theres a lot of IPFS pinning service I got HUGO hosting in GH and deploying in FTPS from fleek.com. It has nothing on it but it works like a charm.

willsoon · on June 4, 2023

No, you are actually accessing the web. But you have to download using torrent, so to some extent it is true what you note.

EmilStenstrom · on June 4, 2023

Relevant context: Rarbg just shut down (see message on https://rarbg.to/)

hannofcart · on June 4, 2023

Here's a fun way to spend the next 15 minutes on the site. Find search terms that satisfy both the above constraints:

1. Has at least 20 results

2. None of the first 20 results is porn content

I even tried with Math and Chess. No dice.

progbits · on June 4, 2023

"Quantum" works.

That was my second attempt, I tried "physics" first but oh boy was I naive.

no_time · on June 5, 2023

Easy. Pick a scenegroup that only releases SFW content. "CODEX" "FLT" and such.

primax · on June 5, 2023

Got it on the first try with Star Wars

Retr0id · on June 4, 2023

This works by doing range requests on an sqlite DB.

Very cool, but does ipfs support verification of partial reads like this? (or would I need to download the whole DB and check the hash?)

I can think of some ways it could work using merkle trees or similar, but I have no idea what ipfs does under the hood, if anything.

ianopolous · on June 4, 2023

IPFS doesn't currently. But there are some in the community working on exactly this using Blake3 and BAO.

whyrusleeping · on June 4, 2023

The incremental requests are verified if running through your own node as a side effect of loading the data

ianopolous · on June 6, 2023

G'day Why! This is operating through the gateway. So you have to trust the gateway currently. Yes, you can move the gateway to your machine and then the trust requirement doesn't extend over the network.

However with local ipfs, bitswap doesn't support range requests, so you're at least downloading the enclosing blocks which could be 2 MiB for 1 KiB requested data, or 2000X more data than you need.

rch · on June 4, 2023

Is there an IPFS dedicated to training data? A mirror of input datasets and fully open models resident on HuggingFace could endeavor to cut out onerous license agreements when possible.

radicalriddler · on June 5, 2023

Sigh... came straight off the weekend, onto my work pc, and clicked an ipfs link.

Got a message from the cybersecurity team being like, "pls don't do that". Thanks HN.

Pretty cool though, interested in how IPFS develops in the future.

hirako2000 · on June 4, 2023

How does it get updates?

willsoon · on June 4, 2023

Now IPNS is a thing. Months ago it just didn't work, at least for me, now something is happening in the way IPNS works.

ChadNauseam · on June 4, 2023

IPNS kind of sucks but it’s “worked” for me for years. (sucks = very slow name resolution and slow update propagation, at least when I was using it a couple years ago)

ikesau · on June 4, 2023

Do you have any advice on how to work out if an IPFS project is using IPNS? I can't see anything on this page that suggests it is, but maybe there's a better, more intentional technique over searching "ipns" in the inspect source and network tab : )

lyu07282 · on June 4, 2023

can you elaborate what IPNS is and how it helps with getting newer torrents up there?

tomodachi94 · on June 4, 2023

I'm not the person you replied to, but I'll answer anyway :)

IPNS is essentially DNS over the IPFS network. IPNS domains point to a specific IPFS file (or a set of files, like we see here). IPNS domains are signed with a private key; when you want to update your IPNS entry, you add your new content to a new IPFS file and then you update the IPNS entry by signing it with a public key.

mutant · on June 7, 2023

It don't. This is all archive.

Grimburger · on June 4, 2023

Looking for some recent (last week) content doesn't appear. What's the cutoff here for the DB?

Zopieux · on June 4, 2023

This seems to be derived from a dump that surfaced the internetz around June 1[0], even before the GitHub repo[1], which is probably the result of a random person/team's archiving/scraping effort. We won't be able to know the cutoff or the coverage percentage without that person's insights.

[0] "RARBG Torrent Files (QUICK SORTED) [v0.1].7z"

[1] https://github.com/2004content/rarbg

omginternets · on June 4, 2023

Is there any other interesting content on IPFS? Does HN have any favorite links?

ctrl-vvvvvvvvv · on June 5, 2023

Does anyone know if I can get sued for opening this up and searching "test"?

I live in Germany, and they are pretty harsh when it comes to downloading.

I opened this up while browsing HN without knowing what it was....

Udo · on June 5, 2023

I'm in Germany as well. You're fine. You didn't download anything, and more importantly, didn't UPLOAD anything. Accessing this index site is not fundamentally different from using any other search engine.

jesprenj · on June 5, 2023

How do IPFS HTTP mirrors deal with javascript origins? The origin here is always https://ipfs.io/, right? Or can a http server send a custom origin in the response headers?

lidel · on June 16, 2023

Web Browser use cases without native ipfs:// and ipns:// should use subdomain HTTP gateways like localhost in Kubo, or public dweb.link, cf-ipfs.com, which provide Origin per root CID, sandboxing web apps from each other.

See: - https://docs.ipfs.tech/how-to/address-ipfs-on-web/#subdomain... - https://specs.ipfs.tech/http-gateways/subdomain-gateway/

Alley33333 · on June 5, 2023

They put the database on IPFS network https://play.google.com/store/apps/details?id=app.magnetx.ma...

cookiengineer · on June 4, 2023

On a side note: The Eye is back up!

Fuck yeah <3

Saris · on June 4, 2023

I get an error on Firefox:

"DB worker could not be created

TypeError: second argument must be a function"

hacliff · on June 5, 2023

very cool! if anyones interested here's a generic implementation

https://github.com/hcliff/sqlite-ipfs

hacliff · on June 5, 2023

also, to the author, make sure you turn _off_ persistent local storage, the performance is horrific (https://github.com/ipfs/js-ipfs/issues/4045) - maybe helia is better?

Aachen · on June 4, 2023

I recently received a phishing email, hosted on IPFS. Seeing torrents on there too, now, makes me wonder how they deal with content that's illegal in whatever jurisdiction.

remram · on June 4, 2023

What is a RARBG?

edit: the "Getting Started ->" is actually a dropdown.

grubbs · on June 4, 2023

I just searched for that new Blackberry docu.

There is a lot of weird pornographic films with "Blackberry" in the name.

gigatexal · on June 4, 2023

Look if you want to find isos to Linux (wink wink) off the web just use Usenet.

bazmattaz · on June 4, 2023

Why when Torrents work fine and are free

anaganisk · on June 4, 2023

I think the best arguments I heard about it was, unlike torrents, the availability is longer. Because the "nodes" store them generally for years at minimum UpTo even decades.Downloads are as fast as your broadband connection, just regular SSL communication, hence no VPN. Don't have to join private trackers, and prove you are worthy enough to be allowed to download.

alt227 · on June 4, 2023

You dont have to join private trackers, but you do have to join a private usenet provider. The only companies holding years/decades long retention of files are the paid private ones.

metroholografix · on June 4, 2023

Breadth on Usenet is a lot worse though. One can find a private tracker to cover pretty much every -non mainstream- niche one can think of, and that also comes with a community attached.

mike31fr · on June 4, 2023

https://rarbg.tw still works for me. I don't get it, isn't Rarbg supposed to be down?

Zopieux · on June 4, 2023

Gotta love the deceptive patterns :-)

  Currently RARBG primary domain is:
  https://rarbg.to (actually links to https://rargb.to, note the bg/gb)

KomoD · on June 4, 2023

Because that's not rarbg, rarbg is "rarbg.to", not rarbg.tw, rargb.to, rarbggo.to, rarbgproxy.to, rarbgmirror.org, rarbgaccessed.org, rarbgget.org

Avlin67 · on June 4, 2023

but what's difference between rargb.to and rarbg.to ?

KomoD · on June 5, 2023

rarbg.to is the real rarbg, rargb.to isn't.

tigrezno · on June 4, 2023

The recommended torrents aren't actual though, some things are not working.

Avlin67 · on June 4, 2023

but why it cannot find results like original RARBG ? is it just a rarbg cache ? how can we fill up the cache then ?

VHRanger · on June 4, 2023

[deleted]

magikstm · on June 4, 2023

These are magnet links directly to torrents.

magikstm · on June 4, 2023

[flagged]

jakkos · on June 4, 2023

Why do you want rarbg to die?

magikstm · on June 4, 2023

[flagged]

ilaksh · on June 4, 2023

What do you mean ended their life? Like they killed themselves because someone pirated their software?

magikstm · on June 4, 2023

Yes. I know two persons that did.

jrflowers · on June 4, 2023

That’s interesting, it would be the first time I’ve ever heard of anyone committing suicide because of pirated software.

What was the precipitating cause? Lost revenue?

justinclift · on June 4, 2023

Evidence?

magikstm · on June 4, 2023

I can't share personal messages or such here. I don't feel it would be ok to share either of their obituary.

One of them was a very close friend. I knew the other one and there were articles on her published online.

Piracy could hurt someone several ways...

justinclift · on June 5, 2023

So "trust me bro" then.

Sure...

magikstm · on June 5, 2023

What's your email? I'll send you one.

isoprophlex · on June 4, 2023

Aren't these magnet links useless without trackers?

r3trohack3r · on June 4, 2023

No, most BitTorrent clients support a Kad based DHT for discovering peers.

tredre3 · on June 4, 2023

Pure DHT peer-discovery alone isn't that great but keep in mind that the magnet points to a .torrent file, which usually does contain a list of traditional trackers for the file.

phire · on June 5, 2023

Actually, a minimal magnet link only contains the infohash which "points" at just the info section of the torrent file. The info section contains the name, list of files and SHA1 hashes for all pieces, and the private flag.

The trackers are outside of info section, I assume so the list of trackers can be modified without effecting the info hash.

Though, some magnet links include an optional xs parameter which points at an http url containing the .torrent file, which would include the trackers.

In my experience, you don't receive the trackers from other peers (though maybe other bittorrent clients can?). However, if at least one peer you discover via DHT supports the peer exchange protocol, and has the trackers, your client can quickly query all the relevant peers.

somat · on June 5, 2023

fun, trivial trick, the magnet link is not necessary. all you need is the infohash. it is however very slow without the tracker hints.

    echo "magnet:?xt=urn:btih:${info_hash}"

ipfs basically feels like "this is it guys, our entire access pattern is going to be this"

npteljes · on June 5, 2023

The magnet doesn't point to a torrent file, but it can contain links to trackers (and other metadata as well).

r3trohack3r · on June 4, 2023

Why isn’t it great? You’re the first person I’ve met that has made this assertion.

npteljes · on June 5, 2023

Discovery via DHT is much slower than a tracker.

isoprophlex · on June 4, 2023

O wow, TIL... thanks!

SergeAx · on June 4, 2023

> SQLite compiled into WebAssembly fetches pages of the database hosted on IPFS through HTTP range requests using sql.js-httpvfs layer, and then evaluates your query in your browser.

Please don't do that with the non-static data. This is okay only for archived project, except the moment when my PC downloads SQLite indexes from some blockchain based paifully slow storage for the first time. BTW, how about a user-governed local data cache, is there any recent quirks in browsers for that? LocalStorage is still inconsistent and unreliable, right?

nl · on June 4, 2023

IPFS isn't blockchain based. It's slowness is mostly because it is badly implemented.

SergeAx · on June 12, 2023

TIL, thank you!