> How does this work?
> SQLite compiled into WebAssembly fetches pages of the database hosted on IPFS through HTTP range requests using sql.js-httpvfs layer, and then evaluates your query in your browser.
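The trick is that SQLite always reads in fixed-size pages, so a VFS shim can translate each page read into an HTTP range request. Here is a minimal Python sketch of the idea, using a throwaway local file in place of the remote database (no real HTTP involved; the schema is invented for illustration):

```python
import os
import sqlite3
import tempfile

# Build a tiny throwaway database standing in for the remote file.
tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
tmp.close()
conn = sqlite3.connect(tmp.name)
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO books VALUES (1, 'Example Title')")
conn.commit()
conn.close()

def fetch_range(path: str, start: int, length: int) -> bytes:
    """Stand-in for an HTTP GET with a 'Range: bytes=start-end' header."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)

# A SQLite file starts with a fixed 16-byte magic string; bytes 16-17
# hold the page size, big-endian. A VFS layer like sql.js-httpvfs reads
# small ranges like this instead of downloading the whole database.
header = fetch_range(tmp.name, 0, 100)
assert header[:16] == b"SQLite format 3\x00"
page_size = int.from_bytes(header[16:18], "big")
assert page_size & (page_size - 1) == 0  # always a power of two
os.remove(tmp.name)
```

Because each query touches only the pages its indexes need, a well-indexed search reads a few kilobytes instead of the whole multi-gigabyte file.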
The same guide, https://libgen-crypto.ipns.dweb.link/, also explains how you can download the page to search locally without constant internet access.
sql.js-httpvfs was previously discussed on HN here: Hosting SQLite databases on GitHub Pages or any static file hoster (1812 points) https://news.ycombinator.com/item?id=27016630
I think also with IPFS i can share files with peers pretty easily? It's nicer than uploading to a filesharing site, and easier than setting up a torrent.
So, what's next @sixtyfourbits? Is there a read-only wikipedia on ipfs yet?
edit: found it, but I think it's not searchable https://en.wikipedia-on-ipfs.org/wiki/
The torrents are already widely replicated, but whether or not IPFS is going to be able to scale to the level required for 85M articles is still an unknown.
Also, in future there can be other access paths to this search. All of them go through different infrastructures and may or may not work for a particular user.
I understand that this isn't really their fault, because they'd have to get a CA to issue a cert for the non-standard .crypto TLD, but the security claim has to be untrue, assuming I understand correctly that the HTTP version is just hosting a JS IPFS client? And therefore the non-IPFS link is susceptible to MITM attacks, if I understand correctly.
Where can this MITM stick his foot in? The blockchain domain record is signed and contains the publicly visible target IP address. Even if accessed via HTTP, it can be checked instantaneously. And upon visiting that IP address, the site forces HTTPS encryption. Nobody can tell from mere traffic analysis what exactly you are doing within the site, since it's encrypted by SSL.
IPFS encryption isn't necessary either, since content is addressed by its hash which automatically guarantees its authenticity. It doesn't matter if you use https or http.
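That authenticity guarantee is just content addressing: the address is a hash of the bytes, so any modification in transit changes the address and is detected. A sketch of the principle (real IPFS CIDs are multihash/multibase-encoded, not raw SHA-256 hex as here):

```python
import hashlib

def cid_like(data: bytes) -> str:
    # Toy stand-in for an IPFS CID: the address *is* a hash of the content.
    return hashlib.sha256(data).hexdigest()

original = b"some book contents"
address = cid_like(original)

def verify(received: bytes, expected: str) -> bool:
    # Anyone holding the address can check any copy, from any transport.
    return cid_like(received) == expected

assert verify(original, address)         # intact content checks out
assert not verify(b"tampered", address)  # in-transit modification is caught
```

This is why the transport (HTTP vs. HTTPS, or which gateway you use) doesn't affect integrity, only confidentiality.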
It's only imperfect in the transition during DNS resolution. A third party can learn what site you are visiting, but nothing more. Suspecting that a public generic blockchain-domain resolver was set up just to deface sites would be a bit much. Usually those are OpenNIC servers or huge DNS providers, unless you specify a custom DNS resolver in your settings.
For a book site it's a pretty decently protected transition, actually the maximum available for a non-expiring (unmanned) service, which this is. Legit certificates expire or cost money. The system behind libgen.crypto is fully unmanned, i.e. eternal, except for the files themselves, which need hosting.
This isn't true if there is a MITM attacker. When you visit an HTTP website, the website doesn't get to redirect you to HTTPS or anything else, because the game is already lost.
When you visit a website over HTTP the attacker goes first. The legit response never made it to the client, because it was replaced in transit with a redirect to the attacker's scam/phishing/malware website.
The random on the coffee shop wifi.
Or the hacker on your apartment/university's poorly configured network.
Or your shady ISP.
Or your own government (especially likely in authoritarian countries decentralized solutions are supposed to help).
I see elsewhere you mentioned certificates on the blockchain. That could work, but someone has to actually create a standard and write the code to validate the certificate and get other people to use it, which hasn't happened yet.
Have a glass of wine and relax. There is a long line of people to catch before you get on the list.
Story time: back when I was in college a few years ago, the university network had some weird configuration where everybody in my dorm was on a single large local network. Somebody thought it would be funny if they ARP poisoned the network and redirected all HTTP traffic to shock websites. This would last 2-3 weeks until either they decided to stop or University IT finally caught them.
Regardless, I'm glad we moved the goalposts to "you don't need privacy" and conceded that my original comment pointing out how insecure this was is correct.
Papers are distributed with IP addresses stamped into many PDF files when they are downloaded from publishers, and nobody seems to discuss it. This is incomparably more harmful than some random MITM somewhere, done by someone and requiring an infrastructure invasion. But even this has not yet posed a real threat.
BitTorrent: anybody can directly intercept the IP addresses of seeders, and again, not much worry. No need to hack in as with a MITM; the addresses are just there, go watch.
So, no problem with MITM in this project at all. People who want to steal the project's reputation or name simply squat domains or form various groups.
In my opinion a MITM is not much different from intercepting a phone conversation by tapping the physical wires going to your apartment. It's very localized.
It is not less secure, since there's no equivalent more secure option. Don't mix up problems with your network access and global decentralization. Decentralization alone is a path to better security by obscurity, but you should appreciate that whoever builds the project are volunteers with scarce resources who don't want to make a full-time job of perfecting it for infinitesimal concerns.
I have no idea what "original" you refer to in this context. If you think the Web is more secure with broken HTTPS here and there and fully centralized access, you probably didn't fully understand what the dWeb project is doing.
And yes, the equivalent more secure option is running a website on the boring old normal internet. This solution actually gives more power to centralized operators, allowing your ISP and government to take over the connection whenever they want, a problem that doesn't exist for normal HTTPS websites.
If your more secure alternative must be decentralized, then Tor hidden services are the go-to option, running on a decentralized network with actual working and battle tested security.
You can claim the problem is "infinitesimal" all you want, but until you point out a problem this solution solves that has more users being actively attacked than every person in China, I'll just assume you must be trolling.
What is the point of a decentralized solution that is less secure than the original and can be easily thwarted by more actors than the original (which we know happens to entire countries in the real world)?
About MITM I'd like to add that such an event is an exception even for a single person, since the same MITM cannot occur on the millions of different networks we all randomly switch between. Anybody would see that the target site doesn't behave as normal at some point, should such an event happen.
Indeed, malicious networks exist and the key points here about them would be:
1) the current libgen.crypto implementation is read-only and doesn't request anything of value to be transmitted over the network;
2) your personal visiting statistics would quickly reveal it if a MITM attack occurred. Ultimately a MITM is nothing more than site defacement. It's not going to go unnoticed in a read-only project if the site starts behaving suspiciously.
Everyone knows what results to expect from LG (remember, the original LG project sets reputation and ethics as the top priority), so there should be no issue with simply stopping browsing.
Also, to avoid local network tricks (which can be very harmful), use a VPN whenever possible. Nowadays it seems to be a universal tool everybody should have.
And never connect to random WiFi networks. Only to those which belong to organizations you visit and trust.
Your post was correct, yes, since it stems from a mere observation about the HTTP protocol, but it ignores why HTTP is the only way to access some systems with some features, and that the expected harm for an average individual is practically zero. All variations of LG have been running without SSL for more than a decade globally, with no problem. So, on a practical footing it's not a concern (take into account my other comments about the various issues of introducing HTTPS into every part of the system).
Let's quantify it somehow to actually see if this is a concern beyond an academic exercise:
Say 1 user out of a million users, on one of a million networks, gets a wrong forward in a given year due to a MITM attack on his network and notices that it is not the site he has seen a hundred times before. The probability of such an event for an average individual is on the order of 10^-12 per annum. I call that a practical zero.
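For what it's worth, the stated rates (one affected user per million, across a million networks, per year) multiply out as follows. These are the comment's assumed inputs, not measured data:

```python
# Back-of-the-envelope using the comment's assumed rates (not measured data):
# one affected user per million users, across a million networks, per year.
users_per_network = 1_000_000
networks = 1_000_000
p_per_year = 1 / (users_per_network * networks)
assert p_per_year == 1e-12  # 10^-12 per annum
```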
Should one take on a small permanent job servicing certificates for a dozen randomly expiring systems, paying money, with the risk that an expired certificate (should the person die) would practically block access to the resource, just to turn that practical zero into a real zero?
My answer would be definitely not; this would be a waste of life. We all know HTTP has this flaw, but return to that comment about using HTTP: it actually tells you that you may not have access at all if you use HTTPS (not always, but that comment is a hint, not a statement that you don't need security). Here's the choice: access over HTTP, or secure non-access over HTTPS? I think there is no real choice. Nor does that comment tell you more than to remember a pattern for dWeb domain names that reliably works.
Summarizing, your logic is correct but not practically helpful.
Story time: about 10 years ago a forker from ebookoid came into the LG forum and started aggressively promoting his site, an LG fork selling books, while pointing out how poor LG's security was since it had no SSL/HTTPS, while his site had it. A scammer with legit encryption was humiliating a legit project without encryption.
I hope you get my point: don't make a storm in a glass of water, because some less knowledgeable people may take it as a real breach, which it is not.
1. Affiliated sites listed by bookwarrior, the Founder of LG:
2. Blockchain records are viewable via blockchain explorers and similar public tools. E.g., you may check the libgen.lib record on https://peername.com/ (press the Whois button after the search). The Peername extension simply can't handle SSL, and EmerDNS domains such as .lib aren't yet supported over IPFS by browsers. It's being worked on, though. For now only IP address forwarding works, but you can choose another way, as below.
The libgen.crypto record is googlable and can be seen on OpenSea. I'm not finding the IPFS CID, though, but .crypto does support HTTPS, and so do IPFS gateways, after which the CID takes you to the correct location. You may use https://libgen.crypto/. However, in this case there can probably still be a MITM with legit SSL certificates. I'm not sure.
Concluding: once you learn a legit blockchain domain name, you can trust its record, since the record cannot be modified without the owner's direct intervention. It's cryptographically strong. That's not the case with conventional Web domains, which are fundamentally rented.
It's cool for preventing things like censorship. Something like SciHub would really benefit from it.
However, for "real world" use cases, many people want to be able to remove or modify what they've uploaded. With IPFS, as far as I'm aware, doing either doesn't really change the underlying data but just creates a new object in IPFS instead which you'd point to via IPNS. Anyone who still wanted to view the old content still could, provided they had the right content id.
God forbid you accidentally upload a "personal" photo, your only hope is that someone never comes across the content id of that image. There is no way to undo it!
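As far as I understand the model, it can be sketched as an immutable content-addressed store plus a mutable IPNS-style name pointing into it. The names and structure below are invented for illustration:

```python
import hashlib

def cid(data: bytes) -> str:
    # Toy stand-in for a real IPFS CID.
    return hashlib.sha256(data).hexdigest()

# Immutable content-addressed store: nothing is ever overwritten.
store: dict[str, bytes] = {}

def add(data: bytes) -> str:
    store[cid(data)] = data
    return cid(data)

v1 = add(b"photo, original")
v2 = add(b"photo, edited")

# Mutable IPNS-like pointer: the *name* can be moved to the new object...
ipns = {"my-site": v2}

# ...but the old object is still retrievable by anyone holding its CID.
assert store[v1] == b"photo, original"
assert ipns["my-site"] == v2 and v1 != v2
```

"Deleting" only moves the pointer; the old bytes persist as long as any node keeps a copy under the old CID.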
- Both use DHTs to search for sources of fingerprinted content.
- Both use nodes (seeds in BT terminology) that actually store the content.
- Both lack an "archive" system, so if no node has the file, it may as well not exist at all.
- Both can have content censored by going after the node operators.
Am I getting any of these wrong?
IPFS doesn't help much with censorship, as it has all the same issues as torrent in that area. It doesn't help much with privacy either, as it's all rather public. It's really for legitimate uses, not outside-the-law kind of stuff.
The benefit of IPFS is that its granularity makes it much more useful for smaller tasks. For example you can host Git repositories or source trees on there. And since IPFS on Linux can be mounted as a file system, you can just access them with a simple `cd` command, no manual download or extraction needed.
Edit: Actually BitTorrent v2 dedups files so it seems like IPFS and BitTorrent are now functionally identical.
This is much neater in IPFS. Files and data blocks are handled individually, and there is no situation where one hash covers fragments of several independent files. Adaptive block sizes are also supported, which is extremely useful for handling collections like LG (though I haven't checked whether that's actually used at present), instead of needing an extra layer of hashes to rehash files individually after the torrent has hashed its chopped "tape/tar" chunks. The forced BitTorrent serialization and subsequent fixed-size chunk chopping are basically absent in IPFS. This helps structure the search and facilitates deduplication too, through the strict Merkle-tree correspondence to files, as opposed to randomized data chunks of a fixed size chosen for no real necessity, with hashes meaningless for wider applications.
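The difference can be sketched in a few lines: per-file hashing deduplicates a file wherever it appears, while fixed-size pieces over a serialized archive depend on file order. This is a toy model, not the real BitTorrent or IPFS formats:

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

file_a = b"A" * 1000
file_b = b"B" * 1000

# Per-file hashing (IPFS-style): a file shared between two collections
# always hashes to the same root, so it deduplicates automatically.
assert sha(file_a) == sha(file_a)

# Fixed-size pieces over a serialized archive (BitTorrent v1-style):
# piece boundaries depend on where files land in the archive, so the
# same two files yield different piece hashes in a different order.
archive1 = file_a + file_b
archive2 = file_b + file_a
pieces1 = {sha(archive1[i:i + 512]) for i in range(0, len(archive1), 512)}
pieces2 = {sha(archive2[i:i + 512]) for i in range(0, len(archive2), 512)}
assert pieces1 != pieces2  # no piece-level dedup across orderings
```

Pieces that happen to fall entirely inside one file still match; it's the boundary-straddling pieces that break dedup.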
To me these are the key aspects, essentially torrent bug fixes, that IPFS possesses.
BitTorrent 2 fixes that: https://blog.libtorrent.org/2020/09/bittorrent-v2/ (hash trees section)
Who knows, BitTorrent might have never fixed this without seeing how IPFS works.
Multiple things make IPFS a more architecture-oriented solution than application-oriented BitTorrent.
There are various application-level features still missing from BitTorrent, and LG will utilize them in the future.
Permanence comes from distribution and from real hardware, not from IPFS itself. Its pinning lasts only a few days in reality. It's not really hosting; it's more of a sporadic buffer.
Wow, this is way more user friendly than normal! Nice work.
1. > Show HN
Did you @sixtyfourbits make this? Any stories about how you came to be involved in the project? IPFS seems like a pretty ideal way to handle sharing documents like this, I'm surprised LibGen hasn't used it before (previously, you would get redirected to one of many constantly dying domains that may have ads and frequently 404 on the actual book you're looking for).
2. Also, this interface frequently doesn't work in Firefox for me. It hangs while trying to load the file. Fortunately, you can check the browser dev tools and find the actual IPFS gateway link (which uses ipfs.io), and go to that link directly. My experience is that the direct link not only works far more frequently, it's actually faster as well. So this raises an obvious question: rather than load the file in a fancy interface, why doesn't the link just take you directly to the IPFS gateway?
3. Is there any concern that systematically using a legitimate service like IPFS to share illegal material will create a situation similar to that of Bittorrent, which is similarly often presumed illegal until proven otherwise? That seems like a shame. I suspect the only reason why rights holders have not cracked down on IPFS is that it's not yet big enough to be on their radar.
Everybody who makes a contribution, even a small one (no need to make revolutions every day), directly affects civilization through Library Genesis. LG is built of contributions the same way our body is made of cells. It's our heritage.
3. An arbitrary IPFS gateway can be set up or rented; it's not a taboo. They usually run about $10/month.
3. It's not about whether it's difficult to act as an IPFS node; it's about whether doing so will (in the future) bring you under legal scrutiny the same way running a node serving copyrighted content on the BitTorrent network does now. DMCA notices against the major gateways will probably work to make files difficult to access, and IPFS necessarily reveals the IP address of the node you connect to unless you go through a gateway. Similar techniques are used to collect the IP addresses of BitTorrent users and send them demands for financial compensation or sue them in court for distributing copyrighted material. If the same becomes common for IPFS, it would not be surprising to see college networks come under pressure to ban access to IPFS, and this would limit access to LibGen's database in a significant way.
3. An IPFS node and a gateway are different things. A gateway is vulnerable to takedowns; a node isn't nearly as vulnerable. And if you run IPFS Desktop or similar software over a VPN connection, what's really left to be afraid of, conceptually? Pretty much nothing. It may throttle the traffic a bit, but that's no problem. Pick Mullvad VPN or another like NordVPN, etc. Free trial days are usually available.
I don't know why exactly, but eDonkey was killed by intercepting participants' IPs, yet even without encryption the same has not yet happened to BitTorrent. With a VPN it's just unfeasible.
That might be true, but what I was saying in the OP was that the interface at libgen.fun usually does not work for me in Firefox, but the direct link to the gateway (the same one that the interface is using internally) almost always works for me.
> And if you run IPFS Desktop or similar software on a VPN connection, what's really left to be afraid of, conceptually?
Yes, but most users are going to download through an interface like this one. The concern is that this will put a legal burden on prominent public gateways once they become the targets of DMCAs and that rights holders may even be able to put pressure on university networks to block IPFS entirely, harming the whole network.
You seem sure that people downloading or pinning content on the IPFS network are all using a VPN. I'm not so certain of that. The situation is rather similar to Bittorrent, I expect. It's certainly true that Bittorrent as a whole hasn't been killed (nor could it plausibly be), but (a) it's routinely blocked on certain networks, making it harder for certain people to use it even for legal purposes, (b) most people in fact don't use VPNs and rights holders do send takedowns (via ISPs) threatening lawsuits or demanding payments, (c) even on private trackers, VPN use isn't ubiquitous. If any of these trackers reuse public torrent hashes, their users are at risk of being port scanned.
Real life has shown that users are never hunted; operators are, since they hold the content. A random user, one of a million such in a month, on a mostly protected connection, should really not think anybody will have any wish to find him. It's absolutely unfeasible and should only be mentioned as a joke.
This is fantastic yo, mad props.
Thanks for liberating our collective knowledge, ipfs style. Keep it up!
It makes use of the sql.js-httpvfs library, previously discussed on HN here: https://news.ycombinator.com/item?id=27016630
Loading Worker from “http://libgen.crypto.ipns.localhost:8080/dist/257fb50677e116... was blocked because of a disallowed MIME type (“text/plain”)
Follow this manual, if it's a bit new to you:
There are some good critiques and ideas in this thread that I won't go over, since it's been done. I have been putting off building something akin to this (not the same thing but somewhat similar), hoping someone more motivated would do it, and I'm very excited to see it happen.
I agree with you, though, that independent of the search time, it is valuable to have a linking standard like that.
LG should fill the niche for the poor, but let the business evolve. A rebalancing from the legal entities will be required, but then a global balance can be established. Even widely known, it should take its place, and businesses theirs. The two sides aren't mutually exclusive, but rather complementary.
Business cannot offer what LG does, and in this frame it is pointless to battle LG.
Compared to the usual website, it is a bit crude. A lot of metadata is missing, so it is hard to decide which book to download. Particularly annoying are the lack of ISBNs and the like, and the inability to click an author and see their other books.
The worst point is that files come with no proper filename (just a hash!), inviting everyone to rename them in a non-standard manner, rather than offering a filename people won't have to rename.
The clickability problem is explained further down this thread; it's the same as making a URL that addresses a book. Not that nice. It's a technological peculiarity.
The naming may have a solution a bit later. It wasn't clear initially how to approach it.
Maybe have a step of indirection (an extra page) showing the metadata, with the ability to download directly still in the index should the metadata "page" not be fetchable.
>The clickability problem is explained further down this thread; it's the same as making a URL that addresses a book. Not that nice. It's a technological peculiarity.
It's good as long as libgen is aware. To be clear, it's good to be up at all, and it doesn't need to be perfect on the first iteration.
>The naming may have a solution a bit later. It wasn't clear initially how to approach it.
Showing a filename somewhere to allow downloaders to manually rename the file to a standard form would be a step in the right direction.
My browsers actually do offer to rename the files. It might be that yours is set not to prompt.
... using multiple free opportunities (accounts)...
I love libgen will switch to your version for sure :)
EDIT: if you do have IPFS and the companion extension then libgen.crypto should resolve to something like `http://libgen.crypto.ipns.localhost:8080/` which currently works as advertised.
To access it, you need to use the software given in the description.
Also, consider pinning, as described on the libgen.crypto site itself. If you pin it, the search is going to be instantaneous.
I thought the web and the internet were decentralized already?
Interesting project, btw! Many thanks to the devs.
Additionally, IPFS's devs have already stated on their community forums that content like sci-hub is not welcome there.
From what I understand, the notion of sci-hub/libgen "not being welcome" was only about discussion on the official forums. See https://discuss.ipfs.io/t/mirror-of-sci-hub-in-ipfs/1613 and https://news.ycombinator.com/item?id=25209246. But IPFS is a protocol just like bittorrent or HTTP, and the software is open source; it doesn't and can't enforce copyright restrictions.
Yes, but it's a protocol with a single centralized group doing development, who can change whatever they want without the users' consent. Look at what is happening to the Tor ecosystem on Oct 15th this year: all Tor v2 routing support is being dropped from the main client and infrastructure (for security reasons). Entire communities built on onionland and other Tor v2 features, as well as all URLs/links, search engine databases, etc., will just go poof when the devs drop support.
Unfortunately, being a protocol isn't enough. It has to be a community protocol, not a proprietary one where everyone follows one group's code. HTTP and BitTorrent are safe from these kinds of attacks. IPFS isn't (yet), and that's why their butt-covering anti-sci-hub/libgen stance is worrying.
You are welcome to download the Tor source code and add v2 functionality back in, and you’ll be able to visit sites hosted by people who have done the same. No one is stopping you.
To very quickly summarize why we are deprecating, in one word: Safety. Onion service v2 uses RSA1024 and 80-bit (truncated) SHA1 addresses. It also still uses the TAP handshake, which has been entirely removed from Tor for many years now _except_ for v2 services. Its simplistic directory system exposes it to a variety of enumeration and location-prediction attacks that give HSDir relays too much power to enumerate or even block v2 services. Finally, v2 services are not being developed nor maintained anymore. Only the most severe security issues are being addressed.

That being said, the deprecation timeline is now quite simple because v3 has reached a good maturity level:

* v3 has been the default since Tor 0.3.5.1-alpha.
* v3 has feature parity with v2.
* v3 now has Onion Balance support.
* The entire network has supported v3 since the End-of-Life of the 0.2.9.x series earlier.
Citation (literally) needed.
Everything has its limits. Perfect decentralization of development led to a libgen collapse that not many know about, thinking that its bigger servers are still the real libgen. They are not; they are mimicking libgen after capturing its vital parts. This happened exactly due to idealistic views that development can be entrusted to various anonymous individuals without regard to centralized management.
Decentralization implies loss of control. While it's good for results, it's very destructive for development and team building.
I'm not familiar with IPFS: is content on IPFS actually moderated somehow?
There's nothing stopping you from actually doing that; just talk about it somewhere other than their official forums.
Have a look at what the fair use doctrine tells us.
Do you have a link?