You'd need to provide at least the ability to selectively delete portions of the history. But you can selectively delete portions of your browser history too, and people don't - because it would be too easy to miss something. Instead, they just nuke the whole thing. How is your tool different?
Sure, people clear their browser history because they're embarrassed by their porn obsession, but I think this tool could be very useful for pornaholics too.
I was under the impression that redis wouldn't be all that useful to store a lot of data. Would be great if something as quick as redis could work with large data sets.
Curious how. I thought O(1) meant an array index lookup, not a string lookup.
O(1) implies that the location of the member in the list is already known, with no search required. I don't see how that could be the case when it's a key lookup. The key could be anywhere in the list, even if the list is sorted. The key would have to be searched for, it seems.
Think of it this way (this isn't literally what happens, but it's close).
1) Take the url you're looking for. Run it through a hash function. This takes an (amortized) constant amount of time.
2) Now you have an index to check (the return value from the hash function). So index into your actual table, and check to see what's stored there. If there's a value stored there, then the url is a member of the set. This also takes a constant amount of time.
Does this help?
But that check isn't a constant time lookup. The lookup time can vary. (Analogously, a lookup in a phone book can vary in time; we can't necessarily go to the exact spot the first time.) So the total time for both steps must vary as well. I think.
That's not what's going on here. Instead, you use the value as an input to a function that tells you where to look for it, then you look to see if it's there.
If it's not there, it won't be anywhere else, so you don't have to keep looking. Things get interesting with collisions but that's a subject for another time.
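To make the two steps above concrete, here is a minimal sketch in Python. It is not how any particular tool or database does it; the `UrlSet` class, its bucket count, and the example URLs are all made up for illustration. Collisions are handled by keeping a short list per slot, so a lookup only ever scans one bucket rather than the whole table.

```python
class UrlSet:
    """Toy hash set for URLs (hypothetical, for illustration only)."""

    def __init__(self, num_buckets=1024):
        # One small list per slot; colliding URLs share a slot.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, url):
        # Step 1: run the URL through a hash function to get a slot number.
        return hash(url) % len(self.buckets)

    def add(self, url):
        bucket = self.buckets[self._index(url)]
        if url not in bucket:
            bucket.append(url)

    def __contains__(self, url):
        # Step 2: look only in that one slot. If the URL is not there,
        # it is not anywhere else in the table.
        return url in self.buckets[self._index(url)]


seen = UrlSet()
seen.add("http://example.com/a")
found = "http://example.com/a" in seen
missing = "http://example.com/b" in seen
```

The phone book analogy breaks down because nothing is being searched in sorted order: the hash value says which slot to open, and as long as the buckets stay short the work does not grow with the number of URLs stored.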
The same could be done for a "Porn" profile too I guess, sandboxing any history, extensions and bookmarks to that profile. You could even tie it to a Google account for portability.
The noise introduced by phrasing my query differently is a real problem in search that Google hasn't fixed yet.
Whenever I have tech discussions with friends I recall something mentioned in an article I read via HN. But it takes me a whole lot of effort to find that link. Oftentimes I simply can't get hold of it even after an hour of searching.
Please do get the Firefox extension out. Would love to use it. Also, please do make sure the extensions/addons are stable. Have been facing problems with Annotary's extensions, for instance.
By the way, do you have a crawler fetch the link content or do you send it from the user's browser?
How do you deal with 2 users looking at the same URL but seeing different things? example.com/me would be different for user1 and user2.
Some pages would be very dynamic, e.g. Facebook. And not everyone browses Facebook/Twitter behind https (which you do not index). Do you not index social networks?
I like the fact that the extension requires no user input and works silently in the background. Has some trade-offs, but worth it. Cannot comment on the search quality yet because Chrome is only my secondary browser; not enough history to search for anything meaningful.
- How much data do you store per user?
- How do I delete certain results? (preferably after the search comes back)
- Another thing to consider is - After how much time does this just become as painful as finding that page through a search engine?
- What version of the page gets stored? The latest or the one that I saw?
I guess it's one step better than Evernoting a page and adding tags myself.
I'm a paying Xmarks user, but if you were to add a way of tagging sites or adding a note, I'd happily pay for this instead. Just a freeform text field that I could add some keywords into that gets treated as part of the search would actually be sufficient for me.
I'm sure there is a configuration setting somewhere to deal with that, but it would be yet another thing to take care of.
Without this, assuming this plugin is always on across all the computers one uses, breaking a user's privacy just becomes too easy.
And there's a lot of data besides porn that one might want to keep private (and usually wouldn't post on Facebook): medical issues, sexual issues, marriage and other relationship issues, drug issues and probably others.
Do you store just the URL and depend on Google returning the results? How does it work exactly?
But Seen Before requires less effort on my part as a user -> I am more likely to use it. I just continue to google as per normal and now I have an extra option on the right to filter results.
So no matter where you read it you can still search for it even when your browsing history has been deleted."
I think it's a good idea but I think many people would need convincing on the security front.
Specifically, I'm not comfortable with a big web company keeping the history of my web activity. So I made it work completely locally. My project did not get much uptake; my lackluster marketing and other assorted issues are probably to blame. So good luck on this one!
Yeah I could code/hack together something myself and have been thinking of doing it [for fun], but ya know :p
So, yeah, count me in as being interested.
They should link to the study.
> 40% of searches online are people simply looking for what they have already seen before.
Citation link needed.
Information Re-Retrieval: Repeat Queries in Yahoo’s Logs
Abstract: "This paper explores repeat search behavior through the analysis of a one-year Web query log of 114 anonymous users and a separate controlled survey of an additional 119 volunteers. Our study demonstrates that as many as 40% of all queries are re-finding queries. Re-finding appears to be an important behavior for search engines to explicitly support, and we explore how this can be done."
The other way of looking at it is that maybe it's actually 35% or 45% but either way, that's still interesting, even with a rougher approximation of the actual "answer". If, for some reason, you needed to know if it was 40% or 40.01% because that mattered to you then you would absolutely be annoyed at the small sample size.
If the finding was 2% then we would care about the uncertainty of +/- 5% since the finding is dwarfed by the error rate. That's a smaller effect size so you would need more samples to separate reality from the noise.
I am, by the way, pulling all of these numbers out of my ass. Your stats 101 class will teach you the formulas to calculate the actual error bars at work here as well as the assumptions you need to make about the distribution of the data to use those formulas.
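For anyone who wants the stats 101 version, here is a rough sketch of the usual normal-approximation margin of error for a proportion. The function name `margin_of_error` is made up, and plugging in n = 114 treats each user in the study as one independent sample, which is an assumption for illustration only (the study measured queries, not users), so take the resulting numbers as a feel for the formula, not as the paper's actual error bars.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    p: observed proportion (0..1)
    n: number of independent samples (an assumption; see note above)
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)


# A 40% finding with a small sample vs. a much larger one.
small_sample = margin_of_error(0.40, 114)
large_sample = margin_of_error(0.40, 11400)

# A 2% finding with the same small sample, relative to the finding itself.
small_effect = margin_of_error(0.02, 114)
relative_uncertainty = small_effect / 0.02
```

Under these assumptions the 40% figure comes with an uncertainty that is a modest fraction of the finding, while for a 2% finding the uncertainty is larger than the finding itself, which is the point about small effects needing more samples.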
I often find interesting articles on Hacker News while I'm at home that I want to find again when I'm at work. Being able to search by browser history across machines is fantastic for me.
Not ideal, several flaws, but works well enough for me so far.
I was going to hack it by making Chrome bookmark every site I visit with a tag:history. Then, when I wanted to search for a site I'd already visited, I was going to just search with that tag.
Show Search Tools -> All Results -> Visited Pages
In theory Chrome lets you search through your history for pages, but it doesn't seem to actually work very well for me.