
Falcon – a Chrome extension for full text browsing history search - anishathalye
https://github.com/lengstrom/falcon-js
======
diggan
This and TreeStyleTab ([https://addons.mozilla.org/en-US/firefox/addon/tree-style-ta...](https://addons.mozilla.org/en-US/firefox/addon/tree-style-tab/))
are the main reasons I keep preferring Firefox over Chrome for leisure
browsing. Every time I use Chrome, it's impossible to find previous pages based
on the title or URL, while in Firefox it's super simple and works really well.

Which is kind of ironic since Google is all about search and data but can't
handle a browser address bar...

~~~
izacus
Yeah, it's incredible how search in Chrome's URL bar doesn't find URLs (or
page titles) I visited 15 minutes ago. It's utterly frustrating.

~~~
KirinDave
I'm not sure why you're experiencing that, but Chrome's bar certainly does let
you search by page title. Just type "Hacker" or "news" and see.

At least, that's what it's _supposed_ to do and is currently doing for me.

I just fired up Firefox, and neither actually seems to index the _content_ of
the page, which is what's so cool about this extension.

~~~
diggan
I'm on Ubuntu and just tried this in Chrome: I went manually to
news.ycombinator.com, then opened a new tab and typed "Hacker", and HN is
nowhere to be found...

~~~
KirinDave
There may be a regression in Chrome on that OS. It's a very tiny share of the
desktop market and the hardest of the three shells to maintain, so I'm not
surprised.

I have confirmed it works fine on Windows 10 and OSX variants both in Canary
and Stable.

You might want to report the bug. That's a major regression. I'd be annoyed if
it were missing too. Not enough to go to Firefox, mind (the speed loss is too
great once you have 20-40 tabs open), but it would definitely slow down how I
work.

~~~
diggan
> the speed loss is too great once you have 20-40 tabs open

That leads me to my second point, TreeStyleTab. I commonly have 50+ tabs open,
and they are nested nicely under each other so I can actually find them. In
Chrome, there are a bunch of half-baked add-ons for finding tabs more easily,
but nothing like TreeStyleTab for Firefox.

~~~
hs86
Have you tried Tab Outliner? [1]

Compared to TST it has two (major?) drawbacks upfront:

1\. It sits on a separate window next to your other browser windows.

2\. Some of its features are paid features.

I always thought that TST for Firefox had ruined all other browsers for me, but
I actually like Tab Outliner more than TST. Here are my reasons:

1\. It manages all my browser windows in a single tree view. Separate windows
are children of the current session.

2\. It has much better keyboard support, like rearranging tabs with the arrow
keys while holding CTRL, or indenting/unindenting a tab in the tree with the
usual Tab / Shift + Tab shortcuts.

2.1. Even rearranging your tabs with the mouse seems much more deterministic
than in TST. In TST I struggle with rearranging a tab as a new child vs. as a
new sibling tab.

3\. You can unload tabs or entire subtrees (collapse its children and then
press the green unload button) to free your RAM.

4\. Google Drive backup feature to sync your entire tree to your other
computers. You can restore your entire tree, or just a subtree, on another
device just via drag and drop.

I only use a subset of all Tab Outliner features but it is already giving me a
much saner tab management experience than TST could do. The better keyboard
support and syncing my tabs across devices while preserving the tree structure
are essential features that I can't have with TST on Firefox. Personally I
have more sympathy with the TST developer (always nice on GitHub) but Tab
Outliner is still the better browser extension.

[1] [https://chrome.google.com/webstore/detail/tabs-outliner/eggk...](https://chrome.google.com/webstore/detail/tabs-outliner/eggkanocgddhmamlbiijnphhppkpkmkl)

------
avian
> you can clone it on your local machine, read through our code to verify that
> it is not malicious, and then install it

I like that the authors share my concern about installing an extension that
would by design record every page I visit. However, the repository contains
several minified JavaScript files [1]. This somewhat contradicts their
invitation to read through the code.

[1]
[https://github.com/lengstrom/falcon/tree/master/extension/js...](https://github.com/lengstrom/falcon/tree/master/extension/js/lib)

~~~
mindcrash
I agree that the third-party JavaScript files should also be supplied in full
and minified during the build process.

However, I've found the originals so you can still check if they contain
'contaminated' code.

chrono: [https://www.npmjs.com/package/chrono-node](https://www.npmjs.com/package/chrono-node)
\- a natural language date parser for Node and Browserify

notie:
[https://www.npmjs.com/package/notie](https://www.npmjs.com/package/notie) \-
a clean and simple notification, input, and selection suite for JavaScript,
with no dependencies

readability: [https://github.com/arrix/node-readability](https://github.com/arrix/node-readability)
\- Node implementation of Arc90's Readability (though this copy seems to have
been slightly modified)

semantic: [https://github.com/Semantic-Org/Semantic-UI](https://github.com/Semantic-Org/Semantic-UI)
\- Semantic UI JS support

stopwords: a list of stopwords for the English language

~~~
diggan
There's no point reviewing third-party code when you don't even know it's the
same code that is distributed. Anyway, since Chrome auto-updates add-ons, it
doesn't matter whether you review the add-ons you install or not, because they
can change at any point.
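The first half of that objection can be narrowed in practice: before reading an upstream original, check that the vendored file is byte-identical to the same version published upstream. A minimal Python sketch, assuming you have already downloaded both copies yourself; it only proves identity, so a file that was re-minified or modified (as mindcrash suspects for readability) will simply not match, and it does nothing about later auto-updates.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(vendored: Path, upstream: Path) -> bool:
    """True only if the two files are byte-for-byte identical."""
    return sha256_of(vendored) == sha256_of(upstream)
```

Usage would be `same_file(vendored_path, upstream_path)`, where both paths point at files you supply; a `False` result means the vendored copy needs its own review.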

~~~
Falcon9
If you clone it on your local machine you also won't receive any automatic
updates.

------
alistproducer2
I wrote an extension that did the same thing a couple years ago.
[http://lifehacker.com/deeper-history-searches-the-contents-o...](http://lifehacker.com/deeper-history-searches-the-contents-of-visited-pages-1502340820)

I voluntarily removed it from the web store after realizing it was caching
lots of sensitive data. I eventually started encrypting the stored info but I
realized that if the extension ever became very successful, it would become a
target and I wasn't comfortable with that.

I hope the developer of this extension will invest more effort in their users'
security than a simple blacklist.
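For context, a blacklist of this kind typically means a check, run before a page is indexed, that skips URLs whose host matches a user-supplied list. A minimal Python sketch of such a check; the suffix-matching rule, the decision to skip non-HTTP(S) schemes, and the example domains are assumptions for illustration, not Falcon's actual logic. As the comment says, it only limits what gets stored; it does nothing to protect what is stored.

```python
from urllib.parse import urlsplit


def should_index(url: str, blacklist: set[str]) -> bool:
    """Return False if the URL's host is a blacklisted domain or a subdomain of one.

    Assumed rule: an entry like "bank.example" blocks "bank.example" and
    "www.bank.example", but not "notbank.example".
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # assumption: never index chrome://, file://, etc.
    host = parts.hostname or ""  # .hostname is already lowercased
    for entry in blacklist:
        domain = entry.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return False
    return True
```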

~~~
yoavm
Seems like this extension stores all the data locally, so it's probably much
less of a problem.

~~~
alistproducer2
As did mine. There are many well-known attacks that break locally stored data
out of its sandbox. If attackers are sure there are bank account numbers,
balances, email addresses, and other sensitive info stored in plaintext,
they'll come after it.

~~~
samplers
Do you still have the code somewhere? I would like to take a look at it.

------
cchan3141
Really really useful extension, whoa. Searching the _content_ of pages you've
browsed. I need it legitimately multiple times a day lol.

Two caveats though: 1) obviously it can't index pages you browsed before
installing the extension, and 2) it's a bit unclear how to use it (in the
address bar, type f and then press Tab).

I'm also interested to see info on storage usage after a long time using it.

~~~
jrowley
Upon installation, if the user opted in, couldn't it crawl the history back to
some specific date? Or does Chrome not allow extensions to access browser
history?

~~~
alistproducer2
My defunct extension did exactly that. So yes, Chrome extensions can access
history.

------
flippyhead
This is awesome! We develop a tool that does exactly this and found that
getting the search right can be really tricky given the very large volume of
data. Love the simplicity of making it a chrome extension. Excited to try it
out!

------
Raphmedia
Gifs are great to explain visually.

The gif on this page is really bad at it.

Slow it down. I have watched it loop 5 times in the last 30 seconds and I
still cannot tell what it is without reading the text. I feel dizzy.

------
kovek
Thank you for this! Many times have I wished the browser's history search
could provide this.

Now that school is starting again and I had some free time, I was thinking of
working on a project that would let you search through the websites you've
visited, the documents on your machine, and the photos and music on your
machine (if you can run some program that generates descriptions for your
photos and some mp3-to-lyrics program for the music), all across multiple
machines. I started looking at Elasticsearch, because that is what I found
while researching the search tool I would need for this project.

------
amckinlay
Kippt used to do this. Unfortunately, I never got the email that their service
was shutting down. And I lost all 500+ of my tagged bookmarks.

------
rl3
It would be great if this worked for bookmarks as well. On average I probably
accumulate about 10 new bookmarks per day of notable content. Over the years
that adds up.

Obviously searching pre-existing bookmarked content (and not just history)
would entail far more complexity, probably requiring a back-end service.

~~~
alistproducer2
This can be done entirely client side.

~~~
rl3
You're right. A cursory investigation suggests spinning up a bunch of client-
side HTTP requests from a Chrome background page should do the trick.
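That approach amounts to fetching each bookmarked URL and reducing the HTML to plain text for indexing. In an extension the fetching would be `fetch` calls from the background page; the Python sketch below only illustrates the text-extraction half with the standard library's `html.parser`. Its rules (drop `script`/`style` contents, collapse whitespace) are simplifying assumptions, far cruder than a Readability-style extractor.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the page's text with all runs of whitespace collapsed to one space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

One known limitation of joining text nodes with spaces: a word split by inline markup, such as `wor<b>ld</b>`, comes out as two tokens.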

------
soundoflight
This has been one of my favorite things in Opera since they introduced it in
the late 2000s. It seemed weird Chrome wasn't better at this. I'm going to
have to give this Falcon a try because it looks just like what I would want!

------
ysleepy
I built something like this as well!

I wonder how you solved data storage and indexing. Does it scale to
multi-month heavy usage? Does it deduplicate multiple visits?

Cool stuff, gotta put mine somewhere. Always planned to, but never got around
to it.

~~~
unclesaamm
Looking in the code, it loops over all the indexed text and does substring
matching on tokens from the search query. Good enough for everyone so far,
but...
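The scheme described above, a linear scan over every stored page checking whether each query token appears as a substring, can be sketched in a few lines of Python. This is an illustration under assumptions, not Falcon's code: the store is assumed to be a dict keyed by URL (so a revisit overwrites the earlier entry, one possible answer to the deduplication question), matching is assumed to be case-insensitive and to require every token, and the optional stopword filter only reflects that the repo bundles an English stopword list, not how Falcon uses it.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    text: str
    last_visit: float  # assumed: Unix timestamp of the most recent visit


def add_visit(store: dict[str, Page], page: Page) -> None:
    """Store a page keyed by URL, so a revisit replaces the older copy."""
    store[page.url] = page


def search(store: dict[str, Page], query: str,
           stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Return URLs whose title or text contains every query token, newest first.

    This is a linear scan: every search touches all stored text.
    """
    tokens = [t for t in query.lower().split() if t not in stopwords]
    if not tokens:
        return []
    matches = []
    for page in store.values():
        haystack = f"{page.title} {page.text}".lower()
        if all(token in haystack for token in tokens):
            matches.append(page)
    matches.sort(key=lambda p: p.last_visit, reverse=True)
    return [p.url for p in matches]
```

Substring matching is forgiving (a query for "borrow" also finds "borrowed"); the cost, and the crux of the scaling question for this design, is that search time grows linearly with the total amount of stored text.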

------
ilostmykeys
I made one a couple of years ago called All Seeing Eye... haven't had time to
update it since. It did screen capture too. It's on GitHub.

~~~
crawfordcomeaux
I've missed All Seeing Eye & haven't found another free solution till today.
fetching.io was nice, but I really wanted a local (or self-hosted) solution.

~~~
flippyhead
fetching.io is self-hosted on OSX at least

------
Globz
Very nice work, this will be useful in my day to day browsing for sure!

------
Shank
How similar is this to fetching.io?

~~~
flippyhead
I can't speak to the internals of this extension, but a few things jump to
mind: this works with Chrome, while fetching.io works with Safari, Firefox and
Chrome. This works on any OS that has Chrome, while fetching.io is only
self-hosted on OSX (otherwise there's a cloud version). This appears to
implement its own indexing scheme, while fetching.io uses Elasticsearch.
Fetching.io has its own search UI, tagging, notes, etc., whereas this is
integrated directly into Chrome. Which is best for you probably depends on your
needs. Oh, and fetching.io isn't open source ;)
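Since the indexing scheme is one of the technical differences listed, here is what the core of the Elasticsearch side looks like in miniature: Elasticsearch (through Lucene) answers queries from an inverted index, a map from each term to the documents containing it, so a lookup does not scan all stored text. A toy Python sketch; the regex tokenizer and AND-only semantics are simplifying assumptions, and real engines add analyzers, relevance scoring, and on-disk segments.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document URLs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        """Lowercase and split on anything that is not a letter or digit."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url: str, text: str) -> None:
        for term in self.tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> set[str]:
        """Return URLs containing every query term (AND semantics)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Unlike a substring scan, this matches whole terms only: a query for "borrowed" does not find a page that only says "borrow" unless the tokenizer adds stemming.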

