

Idea: Idiot Filter - atte

I think an "idiot filter" for search results would save me a lot of time.  Particularly when I'm digging through forums for information, I tend to skip over entries with poor grammar and spelling. Occasionally someone just speaks poor English, but more often this is an indicator that the submitter is unintelligent (or drunk) and their response will not be useful to me.<p>A simple idiot filter might just work as a layer above Google and snip out results with a high percentage of grammatical and spelling errors.  A more refined one (probably a browser plugin) would act on content within pages and hide or dim unintelligible blocks.  If the focus was on forums, I don't think it would be too hard to come up with an algorithm for guessing what encompasses a single user's submission by analyzing the structure of the page.<p>What do you guys think?  Anyone want to work on it with me?
======
ChuckMcM
It's an interesting concept, although the definition of 'idiot' is not very
precise. As others have pointed out, sometimes brilliant people can't compose
grammatically correct English. That being said ...

It's fairly easy to identify forums on the web (they have a form which is
generally very common, inspired by phpBB way back when). And you could
identify users, take the sum of all their contributions, and try to generate
some sort of 'evolved' karma score for their posts. You might consider the
signals academics use: how many times the post was referred to (similar to
citations in papers), what sort of traffic follows the posting (similar to
counterpoint papers), etc. But even if you end up with a perfect score, you
won't benefit until you've been able to process several postings. If
poor-quality posts are the norm in your particular research area, you will
still deal with a lot of junk while the algorithm is learning that it _is_
junk.
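A hypothetical sketch of how such a per-user score might be combined; the Post fields and the weights are invented for illustration, not taken from any real forum's data:

    # Hypothetical 'evolved karma' sketch: combine citation-like signals
    # into a per-user score. Field names and weights are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Post:
        inbound_links: int   # times other posts linked to or quoted this one
        follow_traffic: int  # visits that followed from the posting

    def karma_score(posts: list[Post]) -> float:
        if not posts:
            return 0.0  # new posters start unscored, per the caveat above
        total = sum(0.7 * p.inbound_links + 0.3 * p.follow_traffic
                    for p in posts)
        return total / len(posts)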

Finding a way to predict that a posting is going to score high on the
suppression scale as it's being posted would be helpful, but new posters
appear so rapidly that the benefit is significantly mitigated.

------
devs1010
Hey, I'm working on an open source project that I think could have an
application for this. It's something I've termed a "web gatherer": basically,
it provides a framework for crawling web pages and has workflows in which
custom code is written to determine certain things about each page. If a page
meets the criteria programmed for that workflow, it's added to the results
queue; the others are filtered out. I'm planning to implement an NLP component
at some point using one of the open source NLP libraries available. Overall, I
think of this project as sort of a web scraper / search engine that sits above
the base layer (such as Google) and can be used to refine results. Anyway, you
may be interested; if so, feel free to contact me:
[https://github.com/devs1010/WebGatherer---Scraper-and-Analyz...](https://github.com/devs1010/WebGatherer---Scraper-and-Analyzer)

------
dlitz
You'd end up filtering out really good blogs like ERV, because its author
objects to apostrophes and sometimes writes like a LOLcat:
[http://scienceblogs.com/erv/2009/12/drug_resistant_prions_vi...](http://scienceblogs.com/erv/2009/12/drug_resistant_prions_via_quas.php)

------
johnl
I search forums for DIY home projects and have found I need at least 10
responses to my question before I can arrive at a result I feel comfortable
with. Going back over the responses with the overview the search gave me, I
can now see that responses I originally thought were poor actually weren't. I
keep thinking something like a do-it-yourself thread builder that you build,
save, and share while you do your Google search, sort of a Tumblr except that
you access multiple sites, might be a better approach than an exclusion
approach.

------
glimcat
NLP is hard, particularly for highly general problems conducted on small
samples of text.

Here's a problem case that you will find to be very common: a 20-second post
with the right answer to a difficult problem by someone who's busy and typing
on their phone. Riddled with typos, weird corrections, transposition errors,
etc. - but still something you'd want to be a high-ranking result.

~~~
devs1010
You can use a large set of sample articles that are well written; the machine
learning is done on these, and then the search engine compares each web page
it finds against them when searching. You could easily have a search go a few
hundred pages deep, just by having it take 20+ pages of Google results. I
worked on a project that used NLP (for a startup); I wasn't the one doing most
of the NLP-related stuff, but it was definitely an interesting experience, and
the guy who was in charge of the NLP was able to get some pretty interesting
results, even from fairly small data sets.
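A minimal sketch of that compare-against-good-samples idea: build a reference vocabulary from the well-written corpus, then score each crawled page by how much of its text falls inside it. The corpus contents and any cutoff are placeholders:

    # Minimal sketch: score pages by overlap with a vocabulary built
    # from well-written sample articles. Corpus and cutoff are placeholders.
    import re
    from collections import Counter

    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z']+", text.lower())

    def build_reference(sample_articles: list[str]) -> set[str]:
        vocab = Counter()
        for article in sample_articles:
            vocab.update(tokenize(article))
        # Keep words seen more than once to trim one-off noise.
        return {w for w, n in vocab.items() if n > 1}

    def overlap_score(page_text: str, reference: set[str]) -> float:
        words = tokenize(page_text)
        if not words:
            return 0.0
        return sum(w in reference for w in words) / len(words)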

------
gujk
Done.

<http://www.chrisfinke.com/addons/youtube-comment-snob/>

------
meatsock
This could be accomplished more simply by counting the number and size of the
avatars on the forum in question.

~~~
atte
Haha, I do like that suggestion.

------
mrkmcknz
I know some highly intelligent people who are dyslexic. How would you tackle
that?

~~~
astrodust
Better auto-correct would be one idea.

~~~
glimcat
I've been turning autocorrect features off for ages, whenever possible. They
break more than they fix.

