Monday, November 8th, 2010
MRA recently released version 1 of the MRA/IMRO Guide to the Top 16 Social Media Research Questions, a tool to help newcomers and vendors communicate with each other about this new datasource and method. Conversition was a key contributor to this document, which is now available on the MRA website.
This blog is #4 in a series of 16, each one addressing Conversition’s viewpoint on one of the items in the guidelines. We welcome your questions and comments, and look forward to further discussions on this exciting new trend in the market research industry.
How reliable are SMR results?
Reliability and validity are topics of particular interest to Conversition because they are staples of the market research industry that are not always well understood by market researchers and users. These quality measures apply to all methods of research, including survey and focus group research, and for best results must be considered jointly.
To begin, reliability refers to results that can be replicated across numerous occasions. For instance, if several different people were to conduct the same research, each person would achieve the same results. Or, if the same study were conducted across several time periods, each study would achieve the same results. No matter who or where or when, every time the study is performed, the results should be the same. (Unless of course you are conducting a pre/post or time series study where you are looking for different results each time.)
Because different survey panels have different incentive and recruitment strategies, high reliability across survey panels is generally neither expected nor appropriate. The same follows for social media research. Every vendor has different methods of data collection and treatment, and each follows more or less stringent standards and processes. But within each social media research vendor, an acceptable level of reliability over time and among product categories should be achievable.
Now, remember that reliable results do not quality make. One can easily achieve the same wrong results over and over again by using the same bad survey or bad focus group or bad SMR over and over again. This brings us to validity, the significant other of reliability. Validity refers to results which reflect exactly what was intended to be measured.
When we ask people to select their favourite item from a list, we achieve validity when people read the entire list instead of choosing an item at the top of the list. We achieve validity when people tell us which political candidate they honestly plan to vote for instead of giving us the name of the most socially desirable candidate. Of course, people are not robots and these validity issues pop up all the time, but we have learned many research techniques to solve these problems.
Validity in social media research comes down to the treatment of data. Data quality measures must ensure that the right data is being selected for analysis. As such, data for Apple Computers must not include data for apple pie. Similarly, data for British Petroleum (BP) must not include data for Basis Points or Blood Pressure or Boston Pizza.
Data quality practices extend beyond simply gathering the right set of data. They must be applied to other data treatments as well, including sentiment analysis and content analysis. Thus, sentiment analysis must distinguish between dope that is smoked illegally and dope that is hip, cool, and totally rad. And content analysis must distinguish between the orange fruit and the orange color and the Planet Orange charity so lovingly built by the folks at ING.
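To make the BP example concrete, here is a minimal sketch of a keyword-based filter. It is only an illustration, not a description of how any vendor actually cleans data: the function name `is_about_brand`, the exclusion list, and the sample records are all hypothetical, and real systems would need far richer rules than three phrases.

```python
import re

# Hypothetical exclusion phrases for the brand "BP". A real list would be
# built and maintained by analysts rather than hard-coded like this.
EXCLUDE_PATTERNS = [
    r"\bbasis points?\b",
    r"\bblood pressure\b",
    r"\bboston pizza\b",
]


def is_about_brand(text, brand_pattern=r"\bBP\b", exclude_patterns=EXCLUDE_PATTERNS):
    """Return True if the text mentions the brand and none of the excluded senses."""
    if not re.search(brand_pattern, text):
        return False
    lowered = text.lower()
    return not any(re.search(pattern, lowered) for pattern in exclude_patterns)


# Made-up records, for illustration only.
records = [
    "BP announced a new offshore project today",
    "My BP was high at the doctor, blood pressure meds again",
    "Rates rose 25 BP (basis points) this quarter",
]

# Keep only the records that appear to be about the company.
kept = [record for record in records if is_about_brand(record)]
```

A rule this crude will still miss cases where "BP" means blood pressure but the words "blood pressure" never appear, which is exactly why the results need to be validated against human scoring.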
For all of these purposes, validity can be evaluated with a fairly simple process.
- Randomly select 1000 records from across different topics, dates, and data sources.
- Score each record yourself. a) What brand name does it reflect? b) What sentiment score does it deserve? c) What variable does it reflect?
- Run the data through a second system whether it be your automated processes or a second person.
- Match the two sets of results together.
- Calculate the percentage of results that agree. a) What percentage of the data was actually about the intended brand name? b) What percentage of sentiment scores matched? c) What percentage of variables were correct?
The most important component of this validation work is that the two sets of data are scored blindly. In other words, I don’t know how you scored them, and you don’t know how I scored them.
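The steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the function names, the dictionary layout (record id mapped to `brand`, `sentiment`, and `variable` scores), and the four toy records are all made up for the example, and each scorer is assumed to have worked without seeing the other's results.

```python
import random


def sample_records(record_ids, n=1000, seed=0):
    """Randomly select up to n record ids for validation."""
    rng = random.Random(seed)
    ids = list(record_ids)
    return rng.sample(ids, min(n, len(ids)))


def percent_agreement(scores_a, scores_b, field):
    """Percentage of shared records on which two blind scorers agree for one field."""
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        raise ValueError("no records were scored by both scorers")
    matches = sum(1 for rid in shared if scores_a[rid][field] == scores_b[rid][field])
    return 100.0 * matches / len(shared)


# Toy scores, for illustration only: a human scorer and an automated system.
human = {
    1: {"brand": "Apple", "sentiment": "positive", "variable": "price"},
    2: {"brand": "Apple", "sentiment": "negative", "variable": "design"},
    3: {"brand": "other", "sentiment": "neutral", "variable": "price"},
    4: {"brand": "Apple", "sentiment": "positive", "variable": "battery"},
}
system = {
    1: {"brand": "Apple", "sentiment": "positive", "variable": "price"},
    2: {"brand": "Apple", "sentiment": "neutral", "variable": "design"},
    3: {"brand": "Apple", "sentiment": "neutral", "variable": "price"},
    4: {"brand": "Apple", "sentiment": "positive", "variable": "battery"},
}

# Match the two sets of results and compute agreement for each question.
agreement = {
    field: percent_agreement(human, system, field)
    for field in ("brand", "sentiment", "variable")
}
```

With real data, `sample_records` would be called on the full set of record ids first, and only the sampled records would be scored by both parties.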
Reliability and validity are essential components of all quality market research methods, including social media research. You can have one without the other, but without both you really have nothing. You need to ask your social media research provider how they address validity and reliability. Are these words essential components of their work? Do they have processes in place? Are those processes grounded in solid research standards?
Go forth and inquire. It’s time.
Related links
MRA IMRO Guide #1: Advantages and Disadvantages of SMR
MRA IMRO Guide #2: Datasources of SMR
MRA IMRO Guide #3: Data Fusion and SMR
MRA IMRO Guide #4: Reliability of SMR
Category: conversition | Tags: conversition, guide, guidelines, imro, market research, mra, mrx, navigating, reliability, smr, social media research, validity
Sunday, July 25th, 2010
An article in the New York Times this week discussed a research project that is attempting to track the mood of Americans using Twitter as the data source. The project involves researchers from Northeastern University College of Computer and Information Sciences and Harvard Medical School. It is certainly reasonable that a group of scientists can develop algorithms that accurately predict the mood of Americans. However, Twitter data is not simply and instantly predictive of the general population of Americans. Given that only 7% of people who are online even use Twitter, generalizing from Twitter to all Americans is risky and can easily lead to wrong conclusions.
Want to see a real example? No problem.
Let’s look at consumer opinions related to one specific product, the iPad.
- First, we gathered thousands of opinions from across the internet, from blogs, microblogs, forums, question and answer sites, personal sites, all of which mentioned the iPad. Sites like YouTube, Blogger, Twitter, and thousands more were included.
- Then, we categorized all of the conversations into two groups, 1) everything from Twitter and 2) the entire internet space.
- Next, we determined the level of emotion for every online conversation. Specifically, we determined whether the emotion of the conversations was extremely happy, somewhat happy, neutral, somewhat unhappy, or extremely unhappy.
- Finally, we created the pretty little charts that you see on the right of this page.
What’s the first thing you notice from these charts?
Not one single chart has two bars that look the same. What is the percentage of tweets that reflect an extremely happy opinion? 15%. What is the comparable number for the entire internet? 5.6%. I hope it’s not just me, but 15% doesn’t look like 5.6%, not even if the 5.6% is rounded up to 6%. There is a big difference in the percentage of people who have extremely happy opinions on Twitter vs the entire Internet.
The same trend is apparent when we look at the percentage of people who are extremely unhappy with the iPad. 11.3% of tweeple are extremely unhappy compared to just 1.9% of the entire internet space. All five of the charts lead to the same conclusion: Twitter results do not equal Internet results.
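For readers who want to run the same kind of comparison on their own data, here is a minimal sketch. It assumes each conversation has already been reduced to a (source, sentiment) pair; the function names and the ten toy records are hypothetical and are not the iPad data described above.

```python
from collections import Counter

LEVELS = [
    "extremely happy",
    "somewhat happy",
    "neutral",
    "somewhat unhappy",
    "extremely unhappy",
]


def sentiment_distribution(records):
    """Percentage of (source, sentiment) records at each sentiment level."""
    counts = Counter(sentiment for _, sentiment in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to summarize")
    return {level: 100.0 * counts[level] / total for level in LEVELS}


def twitter_vs_internet_gap(records):
    """Percentage-point gap per level: Twitter-only minus the entire data set."""
    twitter_only = [record for record in records if record[0] == "twitter"]
    twitter_dist = sentiment_distribution(twitter_only)
    overall_dist = sentiment_distribution(records)
    return {level: twitter_dist[level] - overall_dist[level] for level in LEVELS}


# Toy records, for illustration only.
records = [
    ("twitter", "extremely happy"),
    ("twitter", "extremely happy"),
    ("twitter", "neutral"),
    ("twitter", "extremely unhappy"),
    ("blog", "extremely happy"),
    ("blog", "neutral"),
    ("forum", "neutral"),
    ("forum", "neutral"),
    ("forum", "somewhat happy"),
    ("blog", "somewhat happy"),
]

gap = twitter_vs_internet_gap(records)
```

As in the charts, the second group is the entire data set, Twitter included, so a gap near zero at every level would suggest that Twitter looks like the rest of the internet for that topic.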
It’s not 1 to 1
Clearly, the relationship between Twitter data and total internet data is not 1 to 1. It’s impossible to gather Twitter data, analyze the sentiment, and be confident that it represents a wider, more general audience.
Perhaps people on Twitter have more extreme opinions than everyone else; perhaps they are less likely to guard their remarks so that the more extreme opinions are shared; perhaps Twitter opinions are in fact the closest to the average American opinion. Whatever the reason, it is undeniable that the mood on Twitter is unlike anywhere else.
Prepare to be wrong. Prepare to explain contradictions. Generalize Twitter mood at your own risk.
Links that might interest you:
iPad on EvoPlay
New York Times article
Conversition on Facebook
Category: conversition | Tags: business research, harvard, internet research, invalid, mood of americans, new york times, qualitative research, research examples, sampling, twitter, twitter mood, validity, weighting
Thursday, June 3rd, 2010
In the right hands, text analytics can turn a nightmare into a dream come true. With the increasing popularity of social media research, companies are regularly collecting thousands, and even millions, of verbatims that require analysis. On the other hand, human coders have been carrying out text analytics for decades now, so why use automated systems when humans are doing the job so well?
Here are some guidelines to help you decide which method is right for you.
- Sample sizes – Sample size will likely be the most prominent variable in choosing a method. If you’re working with thousands or millions of verbatims, automated systems are your best friend. On the other hand, databases of several hundred verbatims are best coded by hand. Remember, even if an automated system is used on a small dataset, you would still end up reading every verbatim to get a human flavor for the data. If you’re going to read every verbatim, you might as well do the analysis by hand.
- Number of constructs – If you normally use only a small number of predefined constructs, the human method works great. Coders can easily remember all the intricacies of the coding scheme if it is strict and well-defined. And of course, it’s fun and interesting to get your hands right in there. But if the research plan uses coding systems with hundreds or thousands of constructs, it is simply impossible for coders to remember all of them with sufficient within-rater or between-rater reliability. Automated systems can really ease this process.
- New constructs – Are you open to discovering and implementing any number of new constructs? If you’re only open to adding a handful of new constructs, then automated systems won’t make it much easier for you and you will be happy with your standard manual processes. But if you want to be surprised and see where the data takes you, automated systems can provide that.
- Timing – This is the business world, after all. Are you in a rush? Are the results required yesterday? Well, if the data is already in a clean, computerized format, an automated system will work nicely. But, if your data consists of 20 sets of handwritten notes, most of which are barely legible, you might prefer the brain power of human coders who can turn scribbles into codes without any intervening translations.
- Coder reliability – Are you able to train and retain enough reliable coders? If you have a good team of trusty reliable coders, then keep them happy. They are valuable people who should be treated with kid gloves! But, if you’re having trouble finding those gems, an automated system will ensure that a high level of within-rater and between-rater reliability is maintained. It will even eliminate within-pair compromise.
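If you want a number to put on between-rater reliability, one common statistic is Cohen's kappa, which adjusts simple percent agreement for the agreement two coders would reach by chance. The guidelines above do not name a specific statistic, so treat this as one reasonable option rather than the recommended one; the function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per verbatim."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of verbatims")
    n = len(codes_a)
    # Observed agreement: share of verbatims given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: based on how often each coder used each code.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical code throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up codes for four verbatims, for illustration only.
coder_1 = ["price", "price", "design", "design"]
coder_2 = ["price", "price", "design", "price"]
kappa = cohens_kappa(coder_1, coder_2)
```

The same function can be used for within-rater reliability by passing in the codes one person assigned to the same verbatims on two different occasions.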
In the end, you must choose the system that works best for you. Whether automated or human, one method will have the pros and cons that suit your specific needs. Choose well!
Category: conversition | Tags: constructs, content analysis, conversition, evolisten, market research, rater, reliability, sample size, sentiment analysis, speed, text analysis, validity