How important is sampling? Well, how important is gay marriage?

August 8, 2010

Are you in favor of gay marriage? Are you against gay marriage? One way or the other, many people have extremely strong feelings about this topic. It’s a topic that can quickly divide us and turn normally civil people into very angry people. When you measure opinions on a topic like this, there is almost no room for error.

But let’s take a quick step backwards. When a data provider gathers online conversations, or crawls the internet searching for conversations about a topic, it is impossible for them to gather every single conversation. We can’t all be Google but we can gather very large  samples of conversations. We can crawl blogs and microblogs and forums and consumer websites, searching as much as we can for relevant conversations.

Let’s say I gather my sample of gay marriage conversations by focusing on certain websites. You gather your sample by focusing on certain websites. Someone else gathers their sample by focusing on certain websites. Though we MAY all touch the same websites, we touch them all in very different ways. The internet is one single giant database, but we now have three completely different collections of conversations about the identical topic. What are the consequences?

In a previous blog post, we saw that Twitter data is unlike other online data. Twitter has much higher highs and lower lows, likely because Twitter more closely resembles mouthing off: spur-of-the-moment, off-the-cuff remarks.

Twitter is just one of several very popular microblogs represented by the red “Micro” line in the chart. Look at how the positive emotions towards gay marriage range from a high of about 22% of conversations in April to a low of about 1% in May.  If you had gathered a sample of conversations about gay marriage that focused heavily on microblog data, you would think people’s online opinions about gay marriage are all over the place.

(Please note: This blog only shows positive emotions. It does NOT show the % of opinions that fall in the neutral range or the negative range. Do not interpret the 1% positive to mean the other 99% is negative.)

We know that just looking at Twitter or microblog data is not a fair measure of online opinions. So how should we measure online opinions towards gay marriage? Should we let opinions from blogs count a lot more because they are well thought out (light blue line), or should we let blogs and microblogs contribute an equal amount toward the overall opinion (green line)? Should we let the data fall however it wants to fall (purple line), or should we make sure that each website contributes an even or consistent amount of results each month (black line)?
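To make the weighting question concrete, here is a minimal Python sketch of three of those choices. The counts are invented for illustration and are not the data behind the chart, the 3-to-1 blog weight is an arbitrary assumption, and `positive_share` is a hypothetical helper, not part of any tool mentioned here.

```python
# Hypothetical counts for a single month. These numbers are made up
# for illustration only; they are NOT taken from the chart.
counts = {
    "blog": {"total": 200, "positive": 30},   # 15% positive
    "micro": {"total": 800, "positive": 40},  # 5% positive
}


def positive_share(weights, counts):
    """Weighted average of each source's positive share.

    Weights are normalized so they sum to 1 before averaging.
    """
    total_weight = sum(weights.values())
    return sum(
        (weights[src] / total_weight)
        * (counts[src]["positive"] / counts[src]["total"])
        for src in counts
    )


# Let the data fall however it wants: weight each source by its volume.
natural = positive_share({src: c["total"] for src, c in counts.items()}, counts)

# Blogs and microblogs contribute an equal amount.
equal = positive_share({"blog": 1, "micro": 1}, counts)

# Blogs count more (the factor of 3 is an arbitrary assumption).
blog_heavy = positive_share({"blog": 3, "micro": 1}, counts)

print(f"natural: {natural:.1%}, equal: {equal:.1%}, blog-heavy: {blog_heavy:.1%}")
```

With these made-up counts, the same conversations come out at roughly 7%, 10%, or 12.5% positive depending only on the weights. The fourth option, holding each website to a consistent monthly contribution, would need per-site counts and is not sketched here.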

What is clear from this chart is that the way you gather conversations from the internet determines what your results will be. Careless or thoughtless sampling can easily produce a more positive or more negative average opinion.

We’ll let you ponder which method of sampling produced the correct answer, and whether we’ve even provided the correct answer here. (We haven’t.) But the conclusion to draw from this demonstration could not be more serious or important. If you’re going to tackle social media research from a social policy point of view, you had better be an expert in sampling.

[Method: Over 30,000 opinions gathered from thousands of websites, then processed, cleaned, and validated through Evolisten]


Related links
Quirks Magazine: Thoughts on sampling and weighting in social media research
Tracking the mood of Americans: Use Twitter if you want to prove they’re happy

