Friday, July 20th, 2012
Sentiment analysis isn’t perfect, and anyone who has tried to do it with social media data will confirm that. The nuances of language, including sarcasm, emoticons, slang, spelling errors, creative grammar, and more, mean that 100% accuracy is simply unattainable. But in market research, we aren’t looking for 100% accuracy, or even 90% accuracy. We know those kinds of numbers are unrealistic. What we do expect is to see that social media data has some relationship with real-world data. And that is what we investigated here.
This project began with finding a third-party source of fuel prices, and we turned to Gasbuddy for average monthly US gas prices. Because we estimated the data points by carefully eyeballing a chart on the screen, our Gasbuddy numbers aren’t accurate to the last decimal place. But if you compare our Gasbuddy chart with the official chart, you’ll see that the trend is accurate. This is our criterion dataset.
The second dataset came from Conversition’s Evolisten database. We collected hundreds of thousands of verbatims from thousands of websites, all of which in some way referenced fuel or gas prices or costs. Twitter, Facebook, YouTube, Flickr: any type of website where people felt like sharing their opinions about gas prices was our target. After cleaning out the spam, we measured the sentiment of the remaining opinions. Then we reversed the sentiment scale so that higher scores meant more negative opinions. For example, a score of 5 (very positive) was changed to 1 (very negative), and a score of 1 was changed to 5.
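A minimal sketch of the reversal step, assuming integer scores on a 1-to-5 scale. The post gives only the endpoints (5 becomes 1, 1 becomes 5); the rule `low + high - score` (that is, `6 - score` on a 1-to-5 scale) is an assumption that matches them and leaves the neutral midpoint 3 unchanged. The function name `reverse_score` is hypothetical and not part of Evolisten.

```python
def reverse_score(score: int, low: int = 1, high: int = 5) -> int:
    """Reverse a score on a [low, high] scale (assumed 1-5 here)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} outside {low}-{high}")
    # On a 1-5 scale low + high == 6, so 5 -> 1, 1 -> 5, and 3 stays 3.
    return low + high - score


scores = [5, 4, 3, 2, 1]
print([reverse_score(s) for s in scores])  # [1, 2, 3, 4, 5]
```

Reversing the scale changes only the direction of the numbers, not the information in them, which is why a positive correlation with the reversed scores reads as a negative relationship with the original ones.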
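What you see in this chart is a correlation of 0.65 between gas prices and the inverted sentiment scores. Because the sentiment scores were inverted, a positive correlation means that as the price of gas increases, sentiment becomes more negative.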
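The post doesn’t say which correlation coefficient was used. Assuming Pearson’s r computed over matched monthly pairs (average gas price, average sentiment), a minimal sketch looks like this. It also shows why the inversion matters: reversing a scale is a linear transformation with a negative slope, so it flips the sign of r without changing its magnitude. The numbers below are made-up toy values, not the study’s data, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances without the shared 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy illustrative values (NOT the study's data): monthly price and mean sentiment.
prices = [3.40, 3.55, 3.70, 3.90, 3.80]
sentiment = [3.1, 3.0, 2.7, 2.5, 2.6]   # 1 = very negative, 5 = very positive
inverted = [6 - s for s in sentiment]    # 5 -> 1, 1 -> 5

r_raw = pearson_r(prices, sentiment)
r_inv = pearson_r(prices, inverted)
print(round(r_raw, 3), round(r_inv, 3))  # same magnitude, opposite signs
```

With real data, each pair would be one month’s average gas price and that month’s average sentiment score.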
It just makes me think… what if everyone tweeted and messaged that the price of gas was really low? Could we turn this correlation into causation? It’s worth a try!

Category conversition | Tags: conversition, evolisten, fuel price, gas price, sentiment
Sunday, October 18th, 2009
tweetfeel/biz is ready to leave the nest, and we’re ready to announce it at two upcoming conferences.

First off is TWTRCON in DC on October 22, where we are a sponsor and will be running demos of tweetfeel/biz. Then, on October 27, we will be at the 140 conference in LA, again as a sponsor running demos for attendees. Both Jean and Tessie will be there signing autographs as well, so be sure to stop by our booth! They’d also love to give you a personal demonstration of how tweetfeel/biz can help your business. See you there!


Category conversition, tweetfeel | Tags: 140, 140conf, sentiment, tweetfeel, twitter, twtrcon
Friday, July 17th, 2009
tweetfeel gives you a taste of it, but really, what is sentiment analysis all about?
At its most basic level, sentiment analysis involves reviewing messages or conversations and evaluating the writer’s opinion toward the topic. For instance, someone who tweets a message such as “I like Chuck Norris” is telling people they have a positive opinion of Chuck Norris. On the other hand, someone who writes “Chuck Norris sucks” clearly has a negative opinion. After assembling all of the messages that mention Chuck Norris, one can easily bucket them into messages with positive opinions and messages with negative opinions.
But the easy part isn’t so easy. First, one needs to determine which expressions are positive or negative. Since we’re talking about automated sentiment analysis, we need solid indicators of positive opinions, such as the words happy, love, or delightful. Solid indicators of negative opinions would be words such as hate, stupid, or ugly. Simply coming up with that list is difficult enough, but some words just aren’t easy to assign to buckets. For instance, is “Way to go” positive or negative? People often use this phrase in a positive way, but in recent years it has also become a sarcastic remark used in a negative fashion. The written word is full of words and phrases with contradictory, ambiguous, or sarcastic meanings. Humans can only catch about 85% of those, so it’s pretty much impossible for an automated process to catch all of them.
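Here is a minimal lexicon-based sketch of the bucketing described above. The toy lexicon uses only the indicator words and examples mentioned in this post (a real lexicon would be far larger and often weighted); it counts positive and negative indicators and buckets each message by the sign of the difference. The word lists, the tokenizer, and the function name `bucket` are illustrative assumptions, not tweetfeel’s actual method. Note that it scores a sarcastic “Way to go” as positive, exactly the trap the paragraph describes, because it has no notion of tone.

```python
import re

# Toy lexicon from the words mentioned in the post (an assumption, not tweetfeel's).
POSITIVE = {"happy", "love", "delightful", "like"}
NEGATIVE = {"hate", "stupid", "ugly", "sucks"}
POSITIVE_PHRASES = {"way to go"}  # usually positive, but often sarcastic


def bucket(message: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for one message."""
    text = message.lower()
    words = re.findall(r"[a-z']+", text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Multi-word phrases need a substring check rather than a word check.
    score += sum(phrase in text for phrase in POSITIVE_PHRASES)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "I like Chuck Norris",
    "Chuck Norris sucks",
    "Missed the bus again. Way to go, me.",  # sarcastic, but scored positive
]
for m in messages:
    print(bucket(m), "|", m)
```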
Another problem with bucketing messages is that people don’t think linearly. If I say “I love Chuck Norris and football sucks,” it’s clear to people that I’ve expressed two distinct opinions about two distinct topics. Automated evaluations of the message have a much harder time telling the two apart, and once the grammar gets more complicated, it can become impossible to tell which topic was rated which way. It’s a topic of great interest to academics, and eventually we’ll figure it out.
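One naive workaround, sketched here under loud assumptions, is to split a message into clauses at punctuation and a few conjunctions, score each clause on its own, and credit that score to whichever topics the clause mentions. The conjunction list, topic list, tiny lexicon, and function names are all hypothetical. It handles the two-topic example above, but as the paragraph warns, it breaks down quickly once the grammar gets more complicated.

```python
import re

# Hypothetical toy lexicon and topic list, for illustration only.
POSITIVE = {"love", "like", "happy"}
NEGATIVE = {"hate", "sucks", "ugly"}
TOPICS = ["chuck norris", "football"]

# Naive clause boundaries: commas, semicolons, and a few conjunctions.
CLAUSE_SPLIT = re.compile(r"[,;]|\b(?:and|but|while)\b")


def clause_score(clause: str) -> int:
    """Positive minus negative indicator words in one clause."""
    words = re.findall(r"[a-z']+", clause)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def topic_sentiment(message: str) -> dict[str, int]:
    """Map each topic found in the message to the score of the clause(s) mentioning it."""
    results: dict[str, int] = {}
    for clause in CLAUSE_SPLIT.split(message.lower()):
        score = clause_score(clause)
        for topic in TOPICS:
            if topic in clause:
                results[topic] = results.get(topic, 0) + score
    return results


print(topic_sentiment("I love Chuck Norris and football sucks"))
# {'chuck norris': 1, 'football': -1}
```

A message such as “Football is fine, but nothing beats Chuck Norris” comes out neutral for both topics, because the praise is carried by phrasing the lexicon doesn’t know.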
In the end, though, it’s not about individual messages. It’s not about me and what I have to say. It doesn’t matter that your Uncle Bob is always wrong and that your Aunt Mary doesn’t know who Chuck Norris is. It doesn’t matter that 5% or 10% of the messages are in the wrong bucket. What matters is the collective wisdom, the wisdom that comes from large sample sizes. When you average opinions across hundreds or thousands of people, the final answer is usually the right one.
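To see why a few misbucketed messages matter less than sample size, here is a sketch under one strong assumption: every message lands in the wrong bucket independently with the same probability e, whatever its true label. The expected observed share of positive messages is then p(1 - e) + (1 - p)e, which rewrites as e + p(1 - 2e), where p is the true positive share. That pulls the estimate somewhat toward 50%, but for e below 0.5 it still rises and falls with p, so trends and comparisons survive, and with large samples the random noise around that value shrinks. Systematic errors, such as consistently missed sarcasm, don’t fit this assumption. The function names and the simulation are illustrative only.

```python
import random


def expected_observed_share(p: float, e: float) -> float:
    """Expected positive share when each label is flipped with probability e."""
    return p * (1 - e) + (1 - p) * e


def simulate_observed_share(p: float, e: float, n: int, seed: int = 0) -> float:
    """Simulate n messages with true positive share p and flip rate e."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n):
        truly_positive = rng.random() < p
        flipped = rng.random() < e
        # A message is counted positive if it is truly positive and not flipped,
        # or truly negative and flipped into the positive bucket.
        if truly_positive != flipped:
            positives += 1
    return positives / n


for p in (0.3, 0.5, 0.7):
    print(p, expected_observed_share(p, 0.1), simulate_observed_share(p, 0.1, 100_000))
```

With a 10% error rate, a true 70% positive share shows up as roughly 66%, and a true 30% share as roughly 34%: compressed, but still clearly ordered.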
Category tweetfeel | Tags: chuck norris, conversition, emotions, feelings, sentiment, tweetfeel