Either way, nice work. I'm sure some future Tim Berners-Lee will do something crazy cool on top of this.
TorrentPeek goes one step further and puts this idea to use in websites, where interactions on an HTML page can be programmed to perform queries. Instead of AJAX calls, you make sqltorrent calls.
Otherwise, if the search results don't look like what the person wanted, how long should they wait to see if more results pop in? Until the whole DB is downloaded?
The insight here is that we can tell SQLite (via a VFS) to read torrent pieces rather than disk pages, so we can essentially query torrents (torrents containing an SQLite database) on demand.
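A minimal sketch of the arithmetic such a VFS would have to do (the piece size and function name here are my own illustrative assumptions, not sqltorrent's actual API):

```javascript
// Sketch: map a SQLite read request (byte offset + length) to the
// torrent pieces that must be downloaded before the read can complete.
// PIECE_SIZE is assumed; a real torrent declares it in its metainfo.
const PIECE_SIZE = 256 * 1024; // 256 KiB pieces (assumed)

function piecesForRead(offset, length) {
  const first = Math.floor(offset / PIECE_SIZE);
  const last = Math.floor((offset + length - 1) / PIECE_SIZE);
  const pieces = [];
  for (let p = first; p <= last; p++) pieces.push(p);
  return pieces;
}

// A 4 KiB SQLite page read at byte offset 1,000,000 touches one piece:
console.log(piecesForRead(1_000_000, 4096)); // [3]
```

The VFS blocks the read until those pieces arrive, prioritizing them over the torrent's normal rarest-first order.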
I know there are a ton of technical reasons why this is difficult, but web scraping and the like always seemed like problems that could be solved fairly elegantly with distributed systems.
It requires a lot of resources with little to show for it.
From the readme it sounds like there's a community of people working on this problem. How much of this is stuff you built from scratch versus libraries you used to create the search engine?
I'm wondering what's the minimum set of items I'd need to copy over to reproduce this technology :)
I'm hoping TorrentPeek can become something like ZeroNet eventually - or maybe the ZeroNet guys can incorporate this technique in their project.
Does this mean you (essentially) need to download the index before doing any searches? Or is there trickery to minimize even the FTS index pieces you need?
SQLite is already optimized to minimize disk seeks, so it already knows to do the bare minimum number of reads - which translates well to even slower I/O such as this (network access is much slower than disk access).
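A back-of-envelope illustration of why that matters (the fanout figure is an assumption, just in the right ballpark for a typical SQLite index): a B-tree point lookup touches only on the order of log_fanout(N) pages.

```javascript
// Rough estimate: pages touched by one B-tree point lookup.
// A fanout of ~300 keys per interior page is an assumed figure,
// plausible for a 4 KiB page size with smallish keys.
function pagesPerLookup(rows, fanout = 300) {
  return Math.ceil(Math.log(rows) / Math.log(fanout));
}

// Even a 100-million-row table needs only a handful of page reads,
// so a torrent-backed VFS fetches a few pieces, not the whole DB:
console.log(pagesPerLookup(100_000_000)); // 4
```

So the number of pieces you wait on per query stays tiny even as the database grows, which is what makes querying-before-download-completes viable at all.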
Imagine sharing Wikipedia dumps, along with an index.html page. TorrentPeek will render the index.html and expose a `sqlTorrentQuery(yourQuery)` method so that site owners can program the site around that.
You'd be able to do stuff like:
<form onsubmit="sqlTorrentQuery(
    'SELECT * FROM wikipedia WHERE wikipedia MATCH ' + this.q.value + ' LIMIT 50;',
    result => console.log(result)); return false;">
  <input name="q"> ...
</form>
This looks like a really cool project. Could you please clean up the repo so that other people can easily try it out?
But you're right, it does require more attention towards index construction for site owners.
What bothers me is that you've rounded the time down in your calculation despite ignoring throughput factors like packet overhead, piece-message metadata, other protocol messages, and processing time. I'd be interested to see some real-world time measurements. If you decide to do it (maybe for a research paper?), consider stating binary prefixes (KiB) explicitly - and mind capitalization (b for bits vs B for bytes).