
Show HN: TorrentPeek – Distributed search engines using BitTorrent and SQLite - sktrdie
https://github.com/lmatteis/torrent-peek/blob/master/README.md
======
MR4D
Do I understand this correctly - you're essentially replacing the file system
with BitTorrent, as opposed to replacing the database with BitTorrent?

Either way, nice work. I'm sure some future Tim Berners-Lee will do something
crazy cool on top of this.

~~~
sktrdie
Exactly. The database syntax is the same. You're still writing SQL queries
against the SQLite engine. I'm just replacing disk-access with network-piece-
access.

TorrentPeek goes one step further and puts this idea to use in sites, where
interactions on an HTML page can be programmed to perform queries. Instead of
AJAX calls, you get these sqltorrent calls.
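The swap can be pictured like this (a hedged sketch: `sqlTorrentQuery` is the method the author describes, but its behavior is stubbed here, since the real one resolves queries against pieces fetched from the swarm rather than returning canned rows):

```javascript
// Stub of the sqlTorrentQuery call described above; in TorrentPeek the
// query would run locally against torrent pieces instead of a server.
function sqlTorrentQuery(sql, callback) {
  // stand-in: a real implementation resolves the query via the torrent VFS
  callback([{ title: 'example row', sql }]);
}

// Page code keeps the same shape as an AJAX-based page, minus the server:
sqlTorrentQuery('SELECT title FROM pages LIMIT 1;', rows => {
  console.log(rows[0].title); // "example row"
});
```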

~~~
Sunset
This is interesting and reminds me of Zeronet.

------
skrowl
How does it only work on macOS if it's based on Node? Seems like they could
have gotten it to work on Windows and Linux.

~~~
sktrdie
The main reason there are no binaries for Windows/Linux is that I don't have
those machines to compile the project on. I will eventually create binaries
for all platforms. Letting node compile it (via node-gyp) would require me to
distribute all of the libtorrent/sqlite/boost sources, which could be
feasible - I'll look into that.

------
degenerate
Is there indication when the result list for a given query is 'complete'?

Otherwise, if the search results don't look like what the person wanted, how
long should they wait to see if more pop in? Until the whole DB is downloaded?

~~~
sktrdie
Author here. Yes, the SQLite engine returns an SQLITE_OK value when the query
is complete, just as if it were a database file on disk.

The insight here is that we can tell SQLite (via its VFS layer) to read
torrent pieces rather than disk pages, so we can essentially query torrents
(torrents containing an SQLite database) on demand.
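The translation step can be sketched roughly like this (a hypothetical illustration, not the project's code; `PIECE_SIZE` and `piecesForRead` are made-up names): when SQLite asks its VFS to read a byte range, the wrapper computes which torrent pieces cover that range and downloads only those.

```javascript
// Hypothetical sketch of the read-translation idea; not TorrentPeek's code.
const PIECE_SIZE = 32 * 1024; // 32 KiB pieces, matching the numbers in this thread

// Given the byte offset and length of a VFS read request, return the
// indexes of the torrent pieces that must be downloaded to satisfy it.
function piecesForRead(offset, length) {
  const first = Math.floor(offset / PIECE_SIZE);
  const last = Math.floor((offset + length - 1) / PIECE_SIZE);
  const pieces = [];
  for (let p = first; p <= last; p++) pieces.push(p);
  return pieces;
}

// A 4 KiB SQLite page starting at byte 65536 sits entirely in piece 2:
console.log(piecesForRead(65536, 4096)); // [ 2 ]
```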

------
gravypod
Do you think something like this could be used for some form of a distributed
blob store that you can query? Things like package managers could use this
effectively.

~~~
sktrdie
Yep. The SQLite database can virtually contain anything. TorrentPeek is about
sites, but there's nothing stopping you from using it to store information
about software packages.

------
tombert
This reminds me a lot of the (I think) very underrated YaCy project:
[http://yacy.net/en/index.html](http://yacy.net/en/index.html)

I know there are a ton of technical reasons why this is difficult, but web-
scraping and the like always seemed like a problem that could be done fairly
elegantly with distributed systems.

~~~
j3097736
AFAIK YaCy "peers" are just instances of Solr doing search/indexing on their
own (as configured by the user), and the "distributed search" page only
redirects the user to one of the available peers with an external port open -
there's not much p2p going on.

It requires a lot of resources with little to show for it.

------
ZainRiz
This is really cool.

From the readme it sounds like there's a community of people working on this
problem. How much of this is stuff you built from scratch versus libraries you
used to create the search engine?

I'm wondering what's the minimum set of items I'd need to copy over to
reproduce this technology :)

~~~
sktrdie
Actually I'm not sure about the "community". The credit for this idea goes
mostly to the person behind
[https://github.com/bittorrent/sqltorrent](https://github.com/bittorrent/sqltorrent)
- but I don't think there are projects using this technique just yet.

I'm hoping TorrentPeek can become something like ZeroNet eventually - or maybe
the ZeroNet guys can incorporate this technique in their project.

------
Groxx
Neat, both this and sqltorrent. I'll have to read up on this in more detail...

Does this mean you (essentially) need to download the index before doing any
searches? Or is there trickery to minimize even the FTS index pieces you need?

~~~
sktrdie
No, you don't need to download the indexes beforehand. The download always
starts from the first piece/page (0). From there it essentially performs a
binary search and downloads the fewest pieces possible in order to satisfy
the query.

SQLite is already optimized to minimize disk seeks, so it already knows to do
the bare minimum number of reads - which translates well to even slower I/O
such as this (network access is much slower than disk access).
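The effect can be illustrated with a toy binary search in which each probe stands in for downloading one piece (an illustrative sketch, not the project's code): out of many pieces, only about log2(N) ever need fetching to answer a point query.

```javascript
// Illustrative sketch: each probe of the sorted "pages" array stands in
// for downloading one torrent piece.
function searchPages(pages, target) {
  const fetched = [];                 // pieces we had to "download"
  let lo = 0, hi = pages.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    fetched.push(mid);                // fetch piece `mid`
    if (pages[mid] === target) return { index: mid, fetched };
    if (pages[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return { index: -1, fetched };
}

// 1024 sorted pages, but only 10 probes needed to find the target:
const pages = Array.from({ length: 1024 }, (_, i) => i * 2);
const { index, fetched } = searchPages(pages, 2000);
console.log(index, fetched.length); // 1000 10
```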

~~~
Groxx
Huh. So it's basically just a torrent-VFS, and sqlite already plays nicely?
Neat application though! Seems like it works really well :)

~~~
sktrdie
Right, and it also provides HTML/CSS/JS for site owners to program their
site's interactions using SQL queries that are prioritized against the
underlying torrent.

Imagine sharing wikipedia dumps, along with an index.html page. TorrentPeek
will render the index.html and expose a `sqlTorrentQuery(yourQuery)` method so
that site owners can program the site around that.

You'd be able to do stuff like:

    
    
      <form onSubmit={() =>
        sqlTorrentQuery('SELECT * FROM wikipedia WHERE wikipedia MATCH ' + this.value + ' LIMIT 50;',
          result => console.log(result))}> ... </form>

------
JD557
The repository seems to contain a lot of garbage (such as an `a.out` file and
a `reddit-browser` folder).

This looks like a really cool project. Could you please clean up the repo so
that other people can easily try it out?

~~~
sktrdie
Yes sorry about that. The project is still an experiment, mostly used to get
real world numbers on the feasibility of the idea. I intend to maintain it and
develop it much further so I will definitely clean it up.

------
chx
So if I am not mistaken, this only works well when a query is well indexed?

~~~
sktrdie
You mean when the database is well indexed? Yes. SQLite supports Full Text
Search ([https://sqlite.org/fts5.html](https://sqlite.org/fts5.html)) which
builds all the indexes for you, allowing you to perform search-engine-like
functionality.

But you're right, it does require more attention towards index construction
for site owners.
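For concreteness, here's a hedged sketch of how a site might build such an FTS5 MATCH query for the `sqlTorrentQuery` call described earlier (the `buildSearchQuery` helper and the `wikipedia` table name are illustrative, not part of the project):

```javascript
// Illustrative helper: build an FTS5 MATCH query string. Embedded double
// quotes are doubled so the search term stays inside one FTS5 string
// literal (a quoted multi-word term becomes a phrase query).
function buildSearchQuery(term, limit = 50) {
  const escaped = term.replace(/"/g, '""');
  return `SELECT * FROM wikipedia WHERE wikipedia MATCH "${escaped}" LIMIT ${limit};`;
}

console.log(buildSearchQuery('distributed search'));
// SELECT * FROM wikipedia WHERE wikipedia MATCH "distributed search" LIMIT 50;
```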

------
halflings
Cool idea! Surprised that SQLite can know exactly which pieces it will need to
answer any query!

~~~
misterdata
Without indexes, SQLite will do a table scan of sorts, which would still
require all pages (= torrent pieces) that hold data for a particular table.
With indexes, however, this can be a lot smarter (the client likely only
needs a few pages from the index for a single query).

~~~
tangent128
You still need to walk the tree, so you don't know which pages you need
immediately - but since the root pages for the indexes should be
well-distributed in the swarm, it shouldn't take very long to figure out
which leaves you need.

~~~
sktrdie
Yes. I ran real-world numbers. Before getting a result you still have to
download ~20 pieces on average (32KB * 20 = 640KB). With healthy swarms and a
500kb/s connection you should get results in as little as ~1 second.
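The back-of-envelope math can be checked directly (a quick sketch; it assumes the "500kb/s" figure means 500 kilobytes per second, which is only one reading of those units):

```javascript
// Quick check of the estimate above, assuming 32 KiB pieces and
// interpreting "500kb/s" as 500 KB/s (the units are ambiguous).
const pieceBytes = 32 * 1024;   // bytes per piece
const pieces = 20;              // average pieces before the first result
const totalKB = (pieceBytes * pieces) / 1024;
console.log(totalKB);           // 640
const seconds = totalKB / 500;  // at 500 KB/s
console.log(seconds);           // 1.28
```

So ~1 second is already rounded down from ~1.3s, before any protocol overhead or round-trip latency.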

~~~
paxcoder
Is 640KiB not the whole index? Because partial download would mean having to
request new pieces based on the previously-downloaded ones, adding latency to
the mix. Assuming just 50ms latency, for 20 packets that would account for a
whole second before seeing a result. And 20 decisions seems a lot, though I
don't know how the index works.

What bothers me is that you've rounded the time down in your calculation
despite ignoring throughput factors like packet overhead, piece message
metadata, other protocol messages and processing time. I'd be interested to
see some real time measurements. If you decide to do it (maybe for a research
paper?), consider using kibis explicitly (and mind capitalization - b vs B).

------
formula_ninguna
where is the source code?

