To respond to some of your ideas and feedback:
* I do plan to sell the service through some combination of a one-time charge for the native version and a monthly/yearly fee for the cloud version.
* By offering the native version I hope to assuage any privacy or legacy concerns -- all your data is on your machine (encrypted and backed up however you see fit). You'll even have access to a local API to extract it or do whatever you want with it.
* One idea I've had is to offer a cloud version / native version combo. You would sync to the cloud only your bookmarked sites -- all the other indexed pages you visit would stay on the local version. This way you control what gets put up on the servers but can still have access to your links from all your devices. Thoughts?
* I'd also consider open sourcing it (it's built on Meteor and ElasticSearch), but I really do need to get paid for my efforts (just had a baby) and am not familiar with all the ins and outs of open-source-based businesses. I'd love to hear ideas and advice!
* This has turned out to be quite a lot more difficult than I'd thought, but I'm really happy with how things are coming along. Two words: ElasticSearch rocks.
Update: Another idea -- maybe you could integrate with Pinboard or delicious.com (if anyone still uses it) to backfill-index all the links saved to those services. Maybe this could be a premium feature.
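If anyone's curious what that backfill might look like: here's a rough sketch that pulls saved links from the Pinboard posts/all API and pushes them into a local ElasticSearch index over plain HTTP. The index name, the field mapping, and the idea that the local app exposes ElasticSearch on localhost:9200 are all assumptions on my part, not anything fetching actually documents.

```javascript
// Hypothetical backfill: pull all saved links from Pinboard and index them
// into a local Elasticsearch instance (Node 18+, using the global fetch).
const PINBOARD_TOKEN = 'username:XXXXXXXX';        // from pinboard.in/settings/password
const ES_URL = 'http://localhost:9200/bookmarks';  // assumed local index

async function backfillFromPinboard() {
  const res = await fetch(
    `https://api.pinboard.in/v1/posts/all?format=json&auth_token=${PINBOARD_TOKEN}`
  );
  const posts = await res.json();
  for (const post of posts) {
    // Store URL, title and tags; fetching each page body for full-text
    // indexing would be a separate (and much slower) step.
    await fetch(`${ES_URL}/_doc`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        url: post.href,
        title: post.description,
        tags: post.tags,
        saved_at: post.time,
      }),
    });
  }
}

backfillFromPinboard().catch(console.error);
```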
Here's my feedback: I do want the native version for privacy reasons, but I also want the syncing. Why not offer a program (or Docker container?) that I could put on my cloud of choice? That would be real freedom. If people don't want to hassle with it, they'll just pay for your cloud offering.
I really value products that pay attention to this 'detail'.
So yeah, that's what I would pay for in this case as well.
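Just to make the Docker idea concrete, something this small would do -- a minimal sketch assuming the app talks to ElasticSearch over HTTP; the fetching/app image name and its environment variables are made up, only the elasticsearch image is real:

```yaml
# Hypothetical self-hosted bundle: the app plus its own Elasticsearch.
version: "3"
services:
  elasticsearch:
    image: elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
    volumes:
      - esdata:/usr/share/elasticsearch/data
  app:
    image: fetching/app:latest        # made-up image name
    ports:
      - "3000:3000"
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200   # assumed config knob
    depends_on:
      - elasticsearch
volumes:
  esdata:
```

Run it on any VPS with `docker compose up -d` and point the browser extension at that host instead of the hosted cloud.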
I'm otherwise a bit paranoid about letting all the text of every page I visit be captured by a closed-source plug-in. Still, amazing job, and technically very, very impressive.
I'm a regular user of Diigo, with about 10K links and 500 different tags. I don't like it being cloud only.
My favourite feature is the ability to annotate a link (mostly highlighting text), so it effectively creates a chronological and topical feed of the exact sentences I want to remember from a link. It's kind of a self-writing blog of what I read and experienced, complete with what stood out to me and any notes I wanted to make.
I find I remember a point or a sentence from a link more than the link itself, and having full-text search over the words I remember highlighting and saving is incredibly powerful. I actually end up revisiting those links.
I have some experience with research and filing large databases of articles and images at a job in another life.
Look forward to chatting :)
Does it index existing bookmarks? It seems it's not doing that now. The reason I wanted to build this myself is that my bookmarks have grown far too big, and I wanted a way to search them.
Please add this feature -- and index the history too, if possible.
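For what it's worth, the extension already has everything it needs for this. A rough sketch of the kind of backfill I mean, using Chrome's bookmarks/history extension APIs -- the indexPage() helper is hypothetical, standing in for whatever the extension does to fetch and index a URL:

```javascript
// Walk the whole bookmark tree and queue every URL for indexing.
function collectBookmarkUrls(nodes, urls = []) {
  for (const node of nodes) {
    if (node.url) urls.push(node.url);
    if (node.children) collectBookmarkUrls(node.children, urls);
  }
  return urls;
}

chrome.bookmarks.getTree((tree) => {
  collectBookmarkUrls(tree).forEach((url) => indexPage(url)); // indexPage() is hypothetical
});

// Same idea for history: everything visited in the last 90 days.
chrome.history.search(
  { text: '', startTime: Date.now() - 90 * 24 * 60 * 60 * 1000, maxResults: 10000 },
  (items) => items.forEach((item) => indexPage(item.url))
);
```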
This product is amazing -- I've had something like this in mind for a long time and couldn't find a proper implementation.
I would be glad to pay a reasonable price for such a service.
Localhost/native version is a killer feature. Don't drop it! If you open source the code I'll be glad to contribute...
I am wondering how the localhost version works. When the Linux version comes out, will it support storing the database on a remote host? In other words, using my own virtual host as the server.
btw, will it have a way to export the data too?
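(If the local version is just a Meteor app talking to ElasticSearch over HTTP -- which is my assumption, not something that's documented -- both of those would be pretty simple: point it at a remote node, and dump everything out with the standard scroll API. The "pages" index name here is made up.)

```javascript
// Assumed setup: the app reads its Elasticsearch endpoint from the environment,
// so "remote database" is just a different URL.
const ES = process.env.ELASTICSEARCH_URL || 'https://my-vps.example.com:9200';

// Export: page through every document with the scroll API (Node 18+ fetch).
async function exportAll() {
  let res = await fetch(`${ES}/pages/_search?scroll=1m`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ size: 500, query: { match_all: {} } }),
  }).then((r) => r.json());

  const docs = [];
  while (res.hits.hits.length > 0) {
    docs.push(...res.hits.hits.map((h) => h._source));
    res = await fetch(`${ES}/_search/scroll`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ scroll: '1m', scroll_id: res._scroll_id }),
    }).then((r) => r.json());
  }
  return docs; // write to JSON, re-import elsewhere, whatever you like
}
```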
I'm ready to pay for this service.
Keep me updated about the Linux version :)
bussiere AT gmail.com
Shameless plug: I'm attempting to do something similar specifically for science, but make those local results also available globally via a distributed network based on WebRTC. It's also a browser extension, which detects whether you're on a page containing a scientific article. If you are, it takes the body of the article and indexes it by putting its contents into a DHT. You can then use the extension to search through this distributed network.
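(To give a flavour of what "indexing into a DHT" means here -- a very rough sketch, where the dht object with put()/get() is a stand-in for the actual WebRTC DHT the extension uses; the real code is linked below.)

```javascript
// Build a tiny inverted index from the article body and publish
// (term -> URL) entries into the distributed hash table.
function indexArticle(dht, url, bodyText) {
  const terms = new Set(
    bodyText.toLowerCase().split(/\W+/).filter((t) => t.length > 3)
  );
  for (const term of terms) {
    dht.put(term, url); // hypothetical API: append this URL under the term's key
  }
}

// Search: look every query term up in the DHT and intersect the URL sets.
async function search(dht, query) {
  const perTerm = await Promise.all(
    query.toLowerCase().split(/\W+/).map((term) => dht.get(term)) // hypothetical API
  );
  return perTerm.reduce((a, b) => a.filter((url) => b.includes(url)));
}
```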
For those interested, the post back from June is available here: http://juretriglav.si/an-open-distributed-search-engine-for-... with the source code here: https://github.com/ScholarNinja/extension
The project will get a lot more love soon; it turned out it was a bit too early back then because WebRTC implementations were buggy (since fixed in Chrome, but at the time it resulted in 100% CPU usage after a short while and gigabytes of memory used).
Anyway, best of luck making Fetching.io sustainable, flippyhead!
I've been told (by the Slingbox folks some time ago) that the EFF argues that automatic updates are never (edit: generally not) a good idea: they can be used to add or remove functionality by court order.
Another scenario where automatic updates of a native app hurt users is when your company is purchased by a larger company who then shuts down the product. Please reconsider that feature for the localhost version.
BTW I'm just going by the green checkbox in the features comparison table to conclude that you have this ill-advised feature.
Sorry for latching on to that one thing, but it's important imho.
Other than that, this is something I've wished for many times, so great to see it becoming real. I loved clamprecht's suggestion of backfill from history -- that would be great!
BTW I am not affiliated in any form.
Without a clear alternative, the likely conclusion is that user data will be used for advertising some time in the future.
When it works, it's fast, clean, and really well integrated into the workflow of my browsing, since I use the address bar to control basically everything.
If you can figure out the Safari issue, I'd happily pay a few bucks a month for the cloud version.
Quick edit: it turns out the Safari extension is definitely indexing my browsing; it's just the keyword search that shows issues. Restarting the browser also kills the authentication every time. Latest Safari on OS X 10.10, if that helps.
1. Only fetched things available publicly
2. Was going to charge $5/mo
The rationale behind only fetching public things was to avoid indexing people's banking records or other sensitive information.
The $5/mo was because I wasn't looking for venture funding and I wanted to get paid.
Ultimately I gave up on the idea once it started to get difficult to implement. Probably my biggest failing; I'm easily distracted.
I hope these folks do something to address privacy concerns and make their business sustainable.
Is there any client-side encryption done? If so, where is the publicly auditable code? And how does search work -- does it fetch everything and decrypt it for each query?
The idea is very good, but this should not be done in the cloud; it has to be done locally, and potentially synchronized securely among different machines.
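(What I mean by "securely synchronized", roughly: keep the index and the searching local, and only ever upload ciphertext. A minimal sketch with Node's built-in crypto, key management omitted; searching still has to happen on a machine that holds the key.)

```javascript
const crypto = require('crypto');

// Encrypt one indexed record with AES-256-GCM before it leaves the machine.
function encryptRecord(key, record) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(record), 'utf8'),
    cipher.final(),
  ]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt on another machine that shares the key; the key never goes to the cloud.
function decryptRecord(key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return JSON.parse(
    Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8')
  );
}
```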
EDIT: Okay it's not mentioned on the landing page, but there is an option to use it locally. Cool!
EDIT2: hey downvoters, when I read "Your cloud data is visible only to you. You can optionally install fetching as an application on your computer." I assumed that the app was a client for the cloud service, distinct from the web interface usable in a browser. This is a totally legit interpretation, especially when the next title is "It's accessible from anywhere". I don't see why it is wrong in that case to raise the privacy concerns that I mentioned. I cared enough to continue investigating and found on another page that the product can actually be used locally. I then edited my comment (maybe 4 or 5 minutes later) in accordance with that new knowledge. Given that the concerns I raised are still valid for users who would choose to use the cloud version, what do your downvotes mean?
Basically this + all of the above + a browsable timeline interface was my idea. Please take it and build it if you have the time / motivation, and I'll subscribe to your service (or help hack on it if you open-source it). The best competitor I've found so far is Pocket with the premium features (I'm a subscriber).
Good luck, and great work. I also love meteor and ES :)
One of the biggest problems I can see is the increasing popularity of web apps that load as a single page and use JS to load/parse/display the data; only the browser can get the actual content in that case.
And you can observe all the requests/responses and wait for an HTTP 200 for the entire page (excluding intermediate 200s for things such as images). Example: https://github.com/MachinePublishers/ScreenSlicer/blob/maste...
The best approach for this would probably be to tie into a page unload event or some hybrid approach.
Getting started: https://developer.mozilla.org/en-US/Add-ons/SDK
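A bare-bones content-script version of that hybrid approach might look like this -- capture once after load and again on pagehide, and hand the rendered text to a local indexer (the localhost:3000/index endpoint is made up):

```javascript
function capturePage() {
  const payload = {
    url: location.href,
    title: document.title,
    text: document.body.innerText, // rendered text, so SPA content is included
    capturedAt: Date.now(),
  };
  // sendBeacon survives page unload, unlike an ordinary async request.
  navigator.sendBeacon('http://localhost:3000/index', JSON.stringify(payload));
}

// Capture shortly after load (static pages) and again when the user leaves
// (to catch content loaded later by client-side JS).
window.addEventListener('load', () => setTimeout(capturePage, 5000));
window.addEventListener('pagehide', capturePage);
```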
The difference in CPU time between downloading a page and rendering it (even virtually, as with, say, PhantomJS) was sufficiently large that running in the browser (not centrally) seemed to be the only general-purpose way, and maintaining _browser_ extensions is... a pretty major job. I was looking into writing a daemon to externally monitor the browser and its cache (such as Chrome's Current Session file) when I left off.
Hopefully the use of server-side prerendering will catch on, be it through Node or other systems...
Like others here, I'm very curious about your business model, especially since this is closed source.
If I'm running localhost, why do I need to create an account?
Interestingly I created it initially to deal with the anxiety of not being able to read all the great content available on the net.
Then the service broke (kind of) and I realised I didn't care that I couldn't search all the stuff I tweeted.
So even though it's kind of broken it still solved my problem :)
I have been meaning to fix it up and get it operational.... One of these days....
PS: If you ever decide to open source it I'd be happy to contribute.
Could the index data be added to http://commoncrawl.org?
Can't wait for Firefox support -- then I can start using it.