
Idea dump, January 2011 edition - jacquesm
http://jacquesmattheij.com/Idea+dump+January+2011+edition
======
bambax
I don't know if this is the proper place (if not, sorry), but there's a device
I've been dreaming about and would gladly pay for: a home router / service
that also does antivirus, content filtering and VPN in a user-friendly,
turn-key way.

There are products that address some of these features (and yes, it could be
custom-built "by hand"), but I'm not aware of anything that does it all out
of the box. Here's a more detailed description.

1. Antivirus

1.1 At the gate

Since almost everything that reaches my computers comes in via the Web, it
doesn't seem efficient to run an AV on each one; this device would scan all
incoming traffic instead.

1.2 "AV DMZ"

Ideally, this device would also be able to run an AV check on hard disks and
the like when they are plugged into it via USB, so that if someone comes to my
house with a USB drive, I can check it right there without needing an
antivirus installed on any other machine.

2. Proxy / filter

2.1 AdBlock

This is currently even worse than antivirus software (except that it's free):
you need one AV per computer, but one AdBlock per _browser_ (and some browsers
can't have one at all, except via a proxy). Having it centralized would be so
much simpler.
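
To illustrate what "centralized" could look like, here is a toy sketch of a
blocking proxy in Python (standard library only; "blocklist.txt" is a
hypothetical one-domain-per-line file, HTTPS tunnelling is not handled, and a
real product would need far more than this):

    # Toy centralized blocking proxy; purely illustrative, not a product.
    # Point every browser on the LAN at this proxy (e.g. host:8080) and any
    # request to a listed domain (or a subdomain of one) gets a 403.
    import http.server
    import urllib.request
    from urllib.parse import urlsplit

    # Hypothetical crowd-sourced blocklist: one bare domain per line.
    BLOCKLIST = {line.strip() for line in open("blocklist.txt") if line.strip()}

    class BlockingProxy(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Browsers send the absolute URL when talking to a proxy.
            host = urlsplit(self.path).hostname or ""
            parts = host.split(".")
            suffixes = {".".join(parts[i:]) for i in range(len(parts))}
            if suffixes & BLOCKLIST:
                self.send_response(403)  # blocked: listed ad/malware domain
                self.end_headers()
                return
            try:
                # Relay anything else upstream, unchanged.
                with urllib.request.urlopen(self.path) as upstream:
                    body = upstream.read()
                    self.send_response(upstream.status)
                    for name, value in upstream.getheaders():
                        if name.lower() not in ("connection", "transfer-encoding"):
                            self.send_header(name, value)
                    self.end_headers()
                    self.wfile.write(body)
            except Exception:
                self.send_response(502)  # upstream failure
                self.end_headers()

    if __name__ == "__main__":
        http.server.HTTPServer(("0.0.0.0", 8080), BlockingProxy).serve_forever()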

2.2 Content filter / blocker

Most home routers have some sort of whitelist / blacklist mechanism, but the
lists have to be edited manually. This device would filter content based on
crowd-sourced lists, and there could be different lists of differing strength
(cf. all the recent talk about content farms).

3. VPN

Content is more and more tightly controlled, either at the source (Hulu) or by
governments, even in the West (France, Australia and others -- cf. Wikileaks).

A VPN service that would (a) encrypt traffic and (b) make it appear to
originate from different places in the world (user-selectable among the
options offered by the service, maybe at different prices) would alleviate all
of those problems.

~~~
IgorPartola
HTTPS will get in the way of most of these. If someone on your network decides
to search for content you have blocked, all they'd have to do is use HTTPS to
access Google, Yahoo, etc. Unless, of course, you require all browsers to go
through a proxy on the device that forces them down to plain HTTP, which is
more of a pain.

~~~
bambax
> _If someone on your network decides to search for content you have
> blocked..._

I don't mind; it's a device for the home, not for the office. I'm not trying
to enforce anything on my "users"; I'm just looking for convenience for
myself.

Content blocking is not a punishment that people would want to escape: it's a
reward! Use it if you want to; don't use it if you don't want to...

------
coderdude
I'm intrigued by the EditDistance idea. Right away I started trying to think
of how I could implement it. It seems like, for this to work, the edit
distance would have to be based on something more averaged than a single
pixel's color, due to the various types of image compression (JPEG being
particularly troublesome for this task, I would imagine). Perhaps the image
can be split up into four-pixel blocks whose constituent pixels are averaged
together to form a single color, which can then be used as part of the
matching.

I must admit, however, that I have doubts that an edit distance for images
would yield interesting results unless the pixels were split into a grid, or
maybe something like a quad-tree. Then the edit distance could be applied to
each portion of the grid to see how many adjacent sections of one grid are
also found adjacent in other images.
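
Something like this is roughly what I have in mind: a pure-Python sketch where
'pixels' is assumed to be a nested list of (r, g, b) tuples, and the block
size, quantization step and row-wise comparison are all just first guesses:

    # Rough sketch of the block-averaging idea; all parameters are guesses.

    def block_averages(pixels, block=2):
        """Average each block x block patch into a single (r, g, b) color."""
        grid = []
        for y in range(0, len(pixels) - block + 1, block):
            row = []
            for x in range(0, len(pixels[0]) - block + 1, block):
                patch = [pixels[y + dy][x + dx]
                         for dy in range(block) for dx in range(block)]
                row.append(tuple(sum(p[c] for p in patch) // len(patch)
                                 for c in range(3)))
            grid.append(row)
        return grid

    def quantize(grid, step=32):
        """Bucket the averaged colors coarsely, so that compression noise
        collapses to the same symbol."""
        return [[(r // step, g // step, b // step) for (r, g, b) in row]
                for row in grid]

    def edit_distance(a, b):
        """Plain Levenshtein distance over two sequences of block symbols."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1,              # deletion
                               cur[j - 1] + 1,           # insertion
                               prev[j - 1] + (x != y)))  # substitution
            prev = cur
        return prev[-1]

    def image_distance(pixels_a, pixels_b):
        """Sum per-row edit distances (rows are truncated to the shorter
        image; a real version would handle size differences properly)."""
        ga = quantize(block_averages(pixels_a))
        gb = quantize(block_averages(pixels_b))
        return sum(edit_distance(ra, rb) for ra, rb in zip(ga, gb))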

~~~
jacquesm
I figure that you need to do some filtering first.

I'd love to go and spend a few months building a prototype of 'spindisp'; if I
had the time right now, I'd do it.

~~~
coderdude
I don't doubt that you've tinkered with this idea in your head as well, and
where I went with it is the immediately obvious first place to start. ;)

RE: wishing you had more time -- I feel you on that one. It's taking a lot of
personal restraint to keep myself from deviating towards each interesting
project that springs up in my head. I never thought that time would end up
being such a valuable asset.

------
stcredzero
Instead of a '90s-style "search aggregator," how about an "SEO-less search"
website? Basically, it would be a search engine that aggressively quashes
SEO spam. It would be monetized up-front through micropayments, using services
like PayPal, Amazon Payments, and Google Checkout. There would also be ads
alongside the search results.

~~~
phpnode
Here's the problem: how do you (1) define SEO spam and (2) detect it? It's
not as simple as saying "deindex anything that has optimized title tags or
lots of links or shows AdSense", because plenty of useful websites optimise
for search engines, show ads and have lots of links. What you'd end up with is
the dregs of the web: you'd reduce the number of spam sites in the results,
but you'd likely also reduce the number of useful sites in the results,
because you're specifically targeting and removing the sites that optimise for
search engines, which is 99% of the most successful sites on the web today.

SEO is not an inherently bad thing; it's valueless, spammy content that's bad.
The real challenge is in determining whether a page is useful, original and
relevant to the search query, and that's where Google et al. are falling down
at the moment.

~~~
stcredzero
_It's not as simple as saying "deindex anything that has optimized title tags
or lots of links or shows AdSense"_

No, it's not. But undoubtedly, SEO spammers leave behind the same kinds of
clues that email spammers do, and techniques like Bayesian classifiers should
be able to pick such pages out. Just detecting all the pages that are slightly
edited, SEO-optimized Stack Overflow answers would be a good start for me.
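
To make that concrete, here is a toy version in the spirit of the classic
naive-Bayes email filters (pages are assumed to be pre-tokenized lists of
words; everything here is illustrative):

    import math
    from collections import Counter

    def train(spam_pages, useful_pages):
        """Count word frequencies in each corpus."""
        spam, useful = Counter(), Counter()
        for page in spam_pages:
            spam.update(page)
        for page in useful_pages:
            useful.update(page)
        return spam, useful

    def spam_log_odds(page, spam, useful):
        """Sum of per-word log-likelihood ratios, with add-one smoothing."""
        s_total, u_total = sum(spam.values()), sum(useful.values())
        vocab = len(set(spam) | set(useful))
        score = 0.0
        for word in page:
            p_spam = (spam[word] + 1) / (s_total + vocab)
            p_useful = (useful[word] + 1) / (u_total + vocab)
            score += math.log(p_spam / p_useful)
        return score  # positive: the page looks more like the spam corpus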

 _you'd likely also reduce the number of useful sites in the results because
you're specifically targeting and removing the sites that optimise for search
engines, which is 99% of the most successful sites on the web today._

I'd be putting useful sites in the "useful" corpus.

 _SEO is not an inherently bad thing, it's valueless, spammy content that's
bad._

Agreed, just as email marketing is not an inherently bad thing, just spammy
emails.

<http://xkcd.com/810/>

~~~
phpnode
There are billions and billions of pages in the index. Let's say they produce
a page classifier that can correctly identify spam an (unrealistic) 99% of the
time: they'd _still_ misidentify tens of millions of pages. Regarding your
second point, how do you determine usefulness in a way that can't be gamed? My
point is that this is a very, very, very hard problem to solve at their scale,
and a lot of posters on here are trivialising it.
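
To put rough numbers on it (the index size is made up, but the order of
magnitude is the point):

    # Back-of-the-envelope: even a 99%-accurate classifier misfires
    # massively on a web-scale index.
    index_size = 5_000_000_000           # "billions and billions" of pages
    error_rate = 0.01                    # the 1% the classifier gets wrong
    print(int(index_size * error_rate))  # 50,000,000 misclassified pages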

~~~
stcredzero
99% classifier? So I get to have a search that's like Google/Bing for all the
non-spammy searches, and it only gets 1/100th the spam for the spammy
searches? I think I could sell a proposition like that!

------
RiderOfGiraffes
Do we want some sort of Hacker News only half-bakery?

~~~
coderdude
I keep forgetting that site exists. Had to Google it to be reminded of what it
is. That isn't a bad idea at all -- just don't make it a public Google doc.
Then again, maybe we could just post all these ideas to half-bakery. I can't
resist the allure of consolidation.

~~~
jacquesm
> I can't resist the allure of consolidation.

Neither can I; that's why I posted this on HN :)

So many sites to keep track of...

------
revorad
Shoppert sounds like a shopping trolley* with bugs.

* [http://www.transport-impacts.com/wp-content/uploads/2008/05/...](http://www.transport-impacts.com/wp-content/uploads/2008/05/shopping_trolley_bag.jpg)

~~~
stcredzero
Shopping trolleys already have bugs.

