

5000 human feelings in real time - from blogs - khmel
http://wefeelfine.org/

======
khmel
At the core of We Feel Fine is a data collection engine that automatically
scours the Internet every ten minutes, harvesting human feelings from a large
number of blogs. Blog data comes from a variety of online sources, including
LiveJournal, MSN Spaces, MySpace, Blogger, Flickr, Technorati, Feedster, Ice
Rocket, and Google.

We Feel Fine scans blog posts for occurrences of the phrases "I feel" and "I
am feeling".

Once a sentence containing "I feel" or "I am feeling" is found, the system
looks backward to the beginning of the sentence, and forward to the end of the
sentence, and then saves the full sentence in a database.
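
The look-backward, look-forward step could be sketched roughly as follows. This is only an illustration, not the site's actual code: the function name `extract_feeling_sentences` is made up, and the rule that a sentence ends at `.`, `!`, or `?` is an assumption, since the description does not say how sentence boundaries are detected.

```python
import re

# The two trigger phrases named in the description (matched case-sensitively here).
TRIGGER = re.compile(r"\bI (?:feel|am feeling)\b")

# Assumption: a sentence ends at one of these characters.
SENTENCE_END = ".!?"


def extract_feeling_sentences(text):
    """Return each full sentence that contains a trigger phrase."""
    sentences = []
    last_end = 0
    for match in TRIGGER.finditer(text):
        if match.start() < last_end:
            continue  # this trigger sits inside a sentence already captured
        # Look backward: the sentence starts just after the previous terminator
        # (rfind returns -1 when there is none, which gives a start of 0).
        start = max(text.rfind(ch, 0, match.start()) for ch in SENTENCE_END) + 1
        # Look forward: the sentence ends at the next terminator, or at end of text.
        ends = [i for i in (text.find(ch, match.end()) for ch in SENTENCE_END) if i != -1]
        end = min(ends) + 1 if ends else len(text)
        sentences.append(text[start:end].strip())
        last_end = end
    return sentences
```

A real crawler would also need to handle abbreviations, ellipses, and HTML markup, none of which this sketch attempts.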

Once saved, the sentence is scanned to see if it includes one of about 5,000
pre-identified "feelings". This list of valid feelings was constructed by
hand and consists mostly of adjectives, along with some adverbs. The full list of
valid feelings, along with the total count of each feeling, and the color
assigned to each feeling, is here.

If a valid feeling is found, the sentence is said to represent one person who
feels that way.
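
The matching step might look something like the sketch below. The five-word `VALID_FEELINGS` set is a hypothetical stand-in for the hand-built list of about 5,000 feelings, and returning the first match is an assumption; the description only says the sentence is checked for "one of" the feelings.

```python
import re

# Hypothetical stand-in for the hand-built list of about 5,000 feelings.
VALID_FEELINGS = {"happy", "sad", "depressed", "tired", "better"}


def find_feeling(sentence, feelings=VALID_FEELINGS):
    """Return the first valid feeling word in the sentence, or None."""
    # Lowercase the sentence and split it into simple word tokens.
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in feelings:
            return word
    return None
```

A sentence such as "I feel like pizza." contains a trigger phrase but no listed feeling, so under this sketch it would not be counted.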

If an image is found in the post, the image is saved along with the sentence,
and the image is said to represent one person who feels the feeling expressed
in the sentence.

Because a high percentage of all blogs are hosted by one of several large
blogging companies (Blogger, MySpace, MSN Spaces, LiveJournal, etc.), the URL
format of many blog posts can be used to extract the username of the post's
author. Given the author's username, we can automatically traverse the given
blogging site to find that user's profile page. From the profile page, we can
often extract the age, gender, country, state, and city of the blog's owner.
Given the country, state, and city, we can then retrieve the local weather
conditions for that city at the time the post was written. We extract and save
as much of this information as we can, along with the post.
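
Pulling a username out of a post URL could be sketched like this. The URL shapes are assumptions chosen for illustration (a username subdomain for some hosts, a first path segment for others); the description does not say which patterns the system actually recognizes, and `extract_username` is a made-up name.

```python
from urllib.parse import urlparse

# Assumed URL shapes, for illustration only.
SUBDOMAIN_HOSTS = ("livejournal.com", "blogspot.com")  # e.g. alice.livejournal.com
PATH_HOSTS = ("myspace.com",)                          # e.g. www.myspace.com/bob


def extract_username(url):
    """Return the author's username if the URL matches a known shape, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Shape 1: the username is the subdomain in front of the blogging host.
    for base in SUBDOMAIN_HOSTS:
        suffix = "." + base
        if host.endswith(suffix):
            name = host[: -len(suffix)]
            if name and name != "www":
                return name
    # Shape 2: the username is the first segment of the path.
    for base in PATH_HOSTS:
        if host == base or host == "www." + base:
            parts = [p for p in parsed.path.split("/") if p]
            if parts:
                return parts[0]
    return None
```

Fetching the profile page and looking up the weather would be separate steps built on top of the returned username; they are left out here because the description gives no detail about them.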

This process is repeated automatically every ten minutes, generally
identifying and saving between 15,000 and 20,000 feelings per day.

------
khmel
When the applet is first opened, the initial dataset consists of the most
recent 1,500 feelings collected by our system. The applet's panel can then be
used to arbitrarily specify different populations, constrained by any
combination of:

- Feeling (happy, sad, depressed, etc.)
- Age (in ten-year increments: 20s, 30s, etc.)
- Gender (male or female)
- Weather (sunny, cloudy, rainy, or snowy)
- Location (country, state, and/or city)
- Date (year, month, and/or day)
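
Combining these constraints amounts to filtering the saved records on whichever fields are specified. A minimal sketch, assuming a hypothetical `FeelingRecord` shape (the real schema is not described) and leaving out state, city, and date for brevity:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeelingRecord:
    # Hypothetical record shape; only the feeling itself is always known.
    feeling: str
    age: Optional[int] = None
    gender: Optional[str] = None
    weather: Optional[str] = None
    country: Optional[str] = None


def select_population(records, feeling=None, decade=None, gender=None,
                      weather=None, country=None):
    """Keep the records that satisfy every constraint that was given."""
    selected = []
    for r in records:
        if feeling is not None and r.feeling != feeling:
            continue
        # Ages are bucketed into ten-year increments: 24 falls in the 20s.
        if decade is not None and (r.age is None or (r.age // 10) * 10 != decade):
            continue
        if gender is not None and r.gender != gender:
            continue
        if weather is not None and r.weather != weather:
            continue
        if country is not None and r.country != country:
            continue
        selected.append(r)
    return selected
```

For example, `select_population(records, feeling="happy", decade=20)` would keep only happy people in their 20s; records with no known age are excluded whenever an age constraint is set, which is a design choice of this sketch rather than documented behavior.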

