
Twitter's garbage problem - jmillerinc
http://jmillerinc.com/2010/05/16/twitters-garbage-problem
======
sketerpot
I can't believe nobody has mentioned naive Bayesian text classification yet.
It sounds like it could work wonders for Twitter. I'm much more likely to be
interested in tweets with words like "hylomorphism" than tweets with words
like "omglol", and a text classification algorithm could learn that if you
trained it up some. It doesn't have to be perfect; it just has to improve the
signal-to-noise ratio significantly.
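
A rough sketch of what that could look like, assuming you hand-label a few tweets as "signal" or "noise" first. The class name, the labels, and the whitespace tokenizer are all made up for illustration; this is plain naive Bayes with add-one smoothing, not anything Twitter provides.

```python
import math
from collections import Counter


class TweetClassifier:
    """Toy naive Bayes over whitespace-separated, lowercased words."""

    LABELS = ("signal", "noise")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.tweet_counts = Counter()

    def train(self, text, label):
        self.tweet_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def log_odds_signal(self, text):
        """Positive means 'more likely signal', negative 'more likely noise'."""
        vocab = set(self.word_counts["signal"]) | set(self.word_counts["noise"])
        denom_extra = len(vocab) + 1  # +1 leaves room for unseen words
        totals = {l: sum(self.word_counts[l].values()) for l in self.LABELS}
        # Start from the (smoothed) prior odds, then add one term per word.
        score = math.log(
            (self.tweet_counts["signal"] + 1) / (self.tweet_counts["noise"] + 1)
        )
        for word in text.lower().split():
            p_signal = (self.word_counts["signal"][word] + 1) / (totals["signal"] + denom_extra)
            p_noise = (self.word_counts["noise"][word] + 1) / (totals["noise"] + denom_extra)
            score += math.log(p_signal / p_noise)
        return score
```

You would show a tweet when `log_odds_signal` is above some cutoff you pick; where to put the cutoff is a judgment call, not something the algorithm decides for you.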

~~~
kmavm
I've looked into this a bit, albeit more in a spam-filtering context; tweets
have very little text for naive Bayes to latch onto. 140 characters would be
20-30 words, tops. That is so few words that it is hard to move the prior very
much, unless there are blockbuster words that almost _always_ indicate a bad
tweet; as the article suggested, "breakfast", "beer", etc.

~~~
JCThoughtscream
Filter by account, then, instead of by post? With a sample body of, say, an
account's ten or twenty most recent posts.

~~~
kmavm
The article's whole premise was that tweet quality does not correlate well
within an account; e.g., some marvelous twitter streams include breakfast
tweets.

------
lftl
I've always thought of this problem in reverse for both Twitter and Facebook.
I have certain followers/friends that are interested in my thoughts about
programming and business and others who would care more about where I'm going
this afternoon. It would be nice if there were different publishing channels I
could publish to different friends.

~~~
_delirium
You can do that on Facebook: when posting a status update, the drop-down box
under the lock icon has a "Customize" option, which will pop up a dialog where
you can select specific people or pre-created lists of friends for it to be
visible to (the lists function essentially as channels, like "colleagues" or
"family"). You can also set one of those channels as the default. I know a few
people who do that fairly regularly to separate friends vs. colleagues vs.
family updates. But it's a bit of a hassle and sort of buried.

For Twitter, I know a few people who have two Twitter accounts, a "work" and a
"personal" one. Also somewhat of a hassle, though maybe actually less of one.

~~~
natrius
That's close, but not exactly the same thing as selecting people to publish
to. If I set a post to go to the "Tech People" friend list on Facebook, none
of my other friends can see it, nor can the general public. I'm okay with
everyone seeing it; I just don't want to pollute people's news feeds with
things I know they won't be interested in.

------
sliverstorm
In all honesty, I always got the impression that the point of twitter was to
tune into other people's noise, not to harvest useful information.

~~~
jmillerinc
That's how I view Facebook. Twitter on the other hand is rich with useful
information.

------
PanMan
Sometimes I really miss Jaiku. There you could add multiple streams to your
feed, but when following, could also decide to follow certain streams, or not.
So I could follow someone's posts, but not their flickr pics or delicious
bookmarks. Comments on a post were separate from posts as well. Twitter makes
this all one big mess. Let's hope annotations will be used to add this
filtering (although I haven't seen the posting client, which is an annotation
as well, used for this).

~~~
julio_the_squid
That's about what I envisioned while reading the article. All you'd need is
different sections on Twitter, and to choose which to subscribe to when
following someone. All the clients would need to know is that you're
subscribed to X's stream 0 but not their stream 1.
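
Something like this, as a minimal sketch. The `Post` and `Follower` types and the numbered `stream` field are hypothetical; Twitter has no such field, this just shows how little bookkeeping a client would need.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    author: str
    stream: int  # hypothetical per-author stream number, e.g. 0 = work, 1 = personal
    text: str


@dataclass
class Follower:
    # Maps an author to the set of that author's stream numbers we subscribed to.
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, author, *streams):
        self.subscriptions.setdefault(author, set()).update(streams)

    def timeline(self, posts):
        """Keep only posts from streams this follower subscribed to."""
        return [
            p for p in posts
            if p.stream in self.subscriptions.get(p.author, set())
        ]
```

Subscribing to X's stream 0 and nothing else means X's stream 1 posts simply never reach the timeline.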

------
pmcginn
I've never met the author of the article, but I assume he's the type of person
who is bothered when his Reader unread tally switches over to the plus mark.

I mean, honestly, tweets are 140 characters or less. The average tweet takes a
handful of seconds to read. Is my time so important that I can't spend a few
minutes of my day learning what my friends thought was important to share with
me? Must I tailor their interests down to only those that I deem relevant? Am
I so bad at skimming content or choosing which content is worthy of in-depth
inspection that I must have a computer do the editing for me?

Obviously, the answers to those questions are highly subjective and use-
dependent. Personally, the only editing I need is the unfollow button. If I
respect a person enough to want to hear what he has to say, I gladly take the
risk that sometimes his output won't be immediately relevant. (As an aside, a
year ago I met one of my favorite journalists. I asked how his dog was, since
he had been tweeting about his new puppy. It was a nice ice breaker. I didn't
follow him for dog-training updates.)

I should make a disclaimer: I'm not a heavy Twitter user. The ratio of
feeds:twitters followed for me is something like 10:1.

~~~
avk
I respect your sympathy for and willingness to read what those you follow deem
interesting. But I think the author of the blog post is speaking of a problem
that emerges at a larger scale when you follow more & more people. Your stream
becomes bigger while the amount of time you have stays the same so your
options are 1) unfollow people, 2) don't read everything, or 3) put in more
time. Personally, I don't like 1 or 2 because I feel like I'll miss out on
timely, relevant information and 3 isn't an option for most of us. Thus, the
need for a solution like the one the blog post describes. Shameless plug:
here's how I'm trying to fix the problem - <http://slipstre.am/>

~~~
pmcginn
I agree with all of that, except I've found 2 to be the most useful,
especially with regard to how I manage Reader. My time spent in feeds has
been a lot less hurried since I learned to stop worrying and love the Mark as
Read button.

I just feel like I personally get information in so many different ways, that
if something is big enough, I'll see it in many places and am pretty much
bound to read one of them. I can count on seeing any big tech story on
five different feeds, Google News (my go-to time-killer on my phone on the rare
occasion my feeds are empty), several Twitter accounts, and probably here on
HN (more like 15 feeds if it has Apple in the title).

I went to your site but you lost me with the lack of content--maybe at this
stage that's important to weed out the laziest of the "testers." Have you
thought of doing just a quick before/after screenshot to show default twitter
vs. your application, or is it too early for that?

Edited to add: er, my fault. My brain skips over Flash almost automatically. I
didn't watch your video.

------
petercooper
The author misses a trick (or I missed a mention of it). Filtering out by the
_client used_ by the third party helps a lot. You can immediately filter out
tweets coming automatically or semi-automatically from systems like Gowalla,
Foursquare, blip.fm, Sharefeed, last.fm, auto news posting services, or even
just through the Twitter API. Anyone else who posts crap in a manual,
deliberate way should just be unfollowed.
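
A minimal sketch of the idea, assuming each tweet is a dict carrying the name of the posting client under a `client` key. That key name and the blocklist contents are assumptions for illustration; a real client would map whatever field its API library exposes.

```python
# Hypothetical blocklist of automated or semi-automated posting clients.
AUTOMATED_CLIENTS = {"foursquare", "gowalla", "blip.fm", "last.fm"}


def drop_automated(tweets, blocked=AUTOMATED_CLIENTS):
    """Return the tweets whose posting client is not on the blocklist."""
    blocked = {name.lower() for name in blocked}
    return [t for t in tweets if t["client"].strip().lower() not in blocked]
```

The comparison is case-insensitive so "Foursquare" and "foursquare" are treated the same.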

------
pskettiwestern
This isn't _Twitter's_ garbage problem. It's the garbage problem of the people
this guy follows. Seems to me that building out a complicated system for
channeling different tweets would hardly be worth the resultant complexity to
Twitter and their users.

Is it so much to ask to employ a little restraint in publishing, and on the
other hand, a little taste in following?

~~~
jmillerinc
I disagree. Some of the tweets I subjectively label as garbage might be legit.
Hey, maybe some guy's mom is on twitter and wants to know what he had for
lunch. It's not a people problem, it's a drawback of the platform that all of
this different subject matter has to be broadcast in the same stream.

~~~
rogerclark
It's a people problem, but the problem is you. If you don't care about what
he's having for lunch, then I'm not so sure what's so hard about ignoring his
tweet. It's only going to be on your screen for a split second as you're
scrolling through other stuff. Presumably anyone you follow would post more
things that are worthwhile than not; otherwise, you should just find new
people to follow.

One of my friends tweets quite a bit about Magic: The Gathering. I don't care
about that. Somehow I get along just fine by not paying much attention to the
posts I don't care about. I don't quite understand your dilemma.

~~~
jmillerinc
I value my time and believe in improving things. What more can I say?

~~~
rogerclark
What I intended to convey was that your mind already possesses a filter for
this type of thing that you could never hope to reproduce in software. I just
don't understand what's preventing you from unconsciously applying it, since
plenty of other people seem to be doing perfectly fine at doing that.

------
jpcx01
I would post much more "garbage" if there was a decent filtering system. I
only post substantive tweets as I'm worried I'll annoy people with useless
stuff. Why would someone care about where I'm eating or what movie I'm
watching if they don't even know me? So I keep it focused on purely technical
topics.

------
mgrouchy
I honestly don't see this as a problem. I don't read every tweet that comes
through my stream; it's kind of random-access information. I look at Twitter
every now and then, and if something interesting strikes me I look into it.

While filtering would be a good feature, I think saying it's a killer feature
is going a bit far. I don't think the reasoning that you could follow twice as
many people makes much sense from an ROI perspective. If you're interested
in ROI, you wouldn't even worry about following people; you would create
custom searches about topics you are interested in and deal with those. You
would certainly get more info for your time spent on Twitter that way, and it
would be much more focused. You can do this with basic tools like TweetDeck
search columns or even saved searches in the standard Twitter web interface.

~~~
jmillerinc
Custom searches don't capture all of the good information that's out there
because I may not specify the right keywords. It's ANTI-search that I'm
personally looking for.

~~~
mgrouchy
You are not going to catch all the good information that is out there with
your follow list, so searching fills in the gaps (if you're interested in a
subject area or something). Twitter lists help too (i.e., gather up all the
people who are known to talk about subject X and put them in a list).

------
techiferous
I have the same problem with Hacker News. I would love to be able to filter it
based on my preferences.

~~~
mistermann
If you caught the thread on whether _search_ should be available on HN, and
pg's response, I feel pretty safe saying that ain't going to happen, ever.

~~~
techiferous
Thanks. I'm okay with that. Hacker News seems more like a labor of love than a
monetizable product, so I don't expect pg to focus on it.

------
teaspoon
Anyone else find the "noise" posts less banal than the "signal"?

~~~
jrockway
My thoughts exactly. "quote from a blog post that everyone read three weeks
ago" <http://shrt.rl/omgaweomse> #imanentrepeneuroneoneone!!

I read Twitter because I like to see what "meaningless" activities my friends
are up to, not because I want a rehash of HN.

------
PanMan
BTW, twitstat mobile, <http://m.twitstat.com/> has some of this filtering in
place: don't show 4sq, etc, if you want.

------
Super74
I totally agree and was just contemplating the same topic. Spooky.

What gets me is even some of Twitter's own employees don't know how to use the
service in a 100% useful manner. And you would be surprised who... Tweets
like: "He said that?" or "Can't wait to see it" without references are
GARBAGE. I also don't care that you are currently eating or thinking, in
general, without learning SOMETHING in the process. They don't teach that at
the company? In this age of videos, location-based tagging and pics, we have
the power to do so much more than just talk.

Finally, I think this is an author-controlled issue. We have the power to
dictate what we write/tweet and should respect our audience for listening/
following us. Unless we don't understand the tool, then it's a training
problem.

I, @super74, try to direct message any personal or conversational tweets to
only those who know what I'm talking about. My public messages tend to lean
towards disseminating information, sharing creative ideas and offering my
opinion on public topics.

Although I'm not perfect and may err from time to time, I have occasionally
looked back and have been somewhat pleased with the bulk of my tweets.

------
avk
Excellent post! I came to the same conclusions in February and am working on
fixing the Twitter garbage problem (what I see more broadly as information
overload) with Slipstream: <http://slipstre.am/>

Would love your thoughts here or over email: arthur@slipstre.am

------
bartl
>A first approach is to simply filter out tweets by keyword. I think of this
as anti-search: specify a keyword, and never see any updates containing that
word.

>Keyword filtering alone could probably solve 1/3 of the Twitter garbage
problem

This sounds to me like an old problem, restated, with working solutions:
[kill files](<http://en.wikipedia.org/wiki/Kill_file>) in news reader
programs, especially [score files](<http://en.wikipedia.org/wiki/Score_file>).
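
For anyone who hasn't used one, here is a rough sketch of score-file style filtering applied to tweets. The rules, point values, and threshold are invented for illustration and do not follow the syntax of any particular news reader.

```python
import re

# Each rule is (pattern, points). Negative points push a tweet toward hiding.
SCORE_RULES = [
    (re.compile(r"\bbreakfast\b", re.IGNORECASE), -50),
    (re.compile(r"\bbeer\b", re.IGNORECASE), -50),
    (re.compile(r"\bstartup\b", re.IGNORECASE), 25),
]
HIDE_BELOW = -40  # hypothetical threshold; tweets scoring lower are hidden


def score(text, rules=SCORE_RULES):
    """Sum the points of every rule whose pattern appears in the text."""
    return sum(points for pattern, points in rules if pattern.search(text))


def visible(tweets, rules=SCORE_RULES, hide_below=HIDE_BELOW):
    """Keep tweets whose total score is at or above the threshold."""
    return [t for t in tweets if score(t, rules) >= hide_below]
```

A plain kill file is the special case where any single match hides the item. Scoring lets a "good" word offset a "bad" one, so a tweet about breakfast with a startup founder can still get through.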

------
macemoneta
If you are interested in particular subject matter, as opposed to genuinely
interested in the people you follow, just run a search or advanced search.
Both auto-update and provide filtering functionality.

------
Tichy
I can't tell which of the examples are supposed to be the valuable ones...

~~~
nandemo
Oh, come on:

 _"A startup is a human institution designed to deliver a new product or
service under conditions of extreme uncertainty."_

 _"I just ousted Owen S. as the mayor of Samovar Tea Lounge on @foursquare! "_

Clearly, the _second_ tweet can possibly be informative to _some_ people.

------
robgough
Would be nice too if I only saw an RT once, not each time someone else on my
list RTs it.

Heck, why not update the RT'd by info to include several people - to make it
clear. But please, only show me the thing once!
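
A minimal sketch of how a client might do that, assuming each timeline item is a dict with an `original_id` for the underlying tweet and an optional `retweeted_by` name. Both field names are hypothetical stand-ins for whatever the client's API library actually returns.

```python
def collapse_retweets(timeline):
    """Show each underlying tweet once, listing everyone who retweeted it."""
    merged = {}  # original_id -> collapsed entry, in first-seen order
    for item in timeline:
        entry = merged.setdefault(
            item["original_id"],
            {
                "original_id": item["original_id"],
                "text": item["text"],
                "retweeted_by": [],
            },
        )
        retweeter = item.get("retweeted_by")
        if retweeter and retweeter not in entry["retweeted_by"]:
            entry["retweeted_by"].append(retweeter)
    return list(merged.values())
```

Each tweet keeps the position of its first appearance, and the `retweeted_by` list can be rendered as "RT'd by alice, bob".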

On a slightly different note, it would be nice if Twitter would store the
"read" status - so that my different clients know what I've read and what I
haven't. If it was core twitter functionality then different clients could all
sync together nicely.

