

Chrome user? Want to help out an intrepid young researcher? - keeganpoppen

So I'm doing a research project on browser usage for a class on designing CS experiments, and I was hoping that, if you were so inclined, you'd download a Chrome extension (code-named "honeypot") to help me collect some usage data. The basic thrust of the research is to look into the limitations of the current window/tab paradigm as an imperfect mapping onto the way we try to (or would like to) use our browsers. While data sources such as Mozilla's Test Pilot (https://testpilot.mozillalabs.com/) or even plain dumps of Chrome's history are certainly helpful, they don't quite record the data I'm looking for.<p>What I'm recording is tab and window usage over time, along with the URLs visited and a snapshot of the text on each page (more specifically, the term frequencies of the text), so that I can do higher-resolution comparisons (such as cosine similarity) between documents within and across browsing sessions. All this data is anonymized before it's even sent to me, so I don't have any direct way (nor, frankly, any desire :) to figure out who you are.<p>Link to more info, download instructions, and actual download:
http://keegansawesomecs303project.posterous.com/the-chrome-extension-honeypot<p>tl;dr - I built a Chrome extension for a CS class project, and I'm running a bit behind, so I need help from the Hacker News crowd to bail me out and get me lots of data. I know a website that could help with that ;).
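For anyone curious what the term-frequency comparison mentioned above looks like, here is a minimal Python sketch. It is only an illustration, not the extension's actual code: the function names `term_frequencies` and `cosine_similarity` and the simple lowercase word tokenizer are assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens in a page's text.

    The tokenizer is an assumption for illustration; a real pipeline
    might also strip stop words or apply stemming.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors.

    Returns 0.0 when either document has no terms, so the
    division below is never by zero.
    """
    # Counter returns 0 for terms missing from tf_b.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two pages with the same word counts score 1.0, and pages with no words in common score 0.0, so the value gives a rough measure of how related two snapshots are without keeping the full page text.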
======
keeganpoppen
Actual download link: [http://keegansawesomecs303project.posterous.com/the-chrome-e...](http://keegansawesomecs303project.posterous.com/the-chrome-extension-honeypot)

Also, if anyone has suggestions on what I should look into, or knows of other
cool data sources, please let me know! :)

~~~
personalcompute
Yeah, a suggestion: instead of pussyfooting around (whether you mean to or
not), use concrete language about privacy. Yeah, yeah, I don't 'plan' to sell
any of my user data either, but nobody (especially the law) cares about my
wishy-washy 'plans'.

Tell me in no uncertain terms that you won't ever manually view the URLs,
won't ever manually view the page snapshots, and won't ever publish any
individual's data without having all parties involved agree to the same terms.

~~~
keeganpoppen
...ok. Thanks for the suggestion. I certainly don't mean to indicate anything
other than being serious about user privacy. I'm sorry that I come across as
cavalier. I just meant to show two things: that I will be the only person who
ever has control of this data, for all of eternity, and that it's unclear to
me how bad the absolute worst-case scenario even is. Even assuming it's
possible to find out who one of the users actually is (which, as I said
earlier, I will not be doing under any circumstances), about all I could say
is that I know what URLs they visited between yesterday and Friday morning.

But yeah, I will at no point manually read more than an incidental number of
any particular user's history entries and snapshots. By incidental I mean that
if I observe some sort of pattern I will probably look at the URLs in that
pattern, but I won't look at the rest of the user's data stream other than
via algorithm.

I will update the wording of the blog post right now. Thanks for the concern.
I don't want people to skip the download because they think I'm trying to do
something sketchy.

