Data Mining Flaws Every Business Needs to Be Aware Of

Rick Dane
Much is made of the proliferation of data available on the internet and its increasing use in data mining, whether for business, for politics, or for illicit purposes by cyber criminals. Even a fairly small operation can scrape a great deal of data from various Web sites and piece together a picture of different segments of the population. For example, a company that sells video games can scrape profiles on MySpace and Facebook to build a profile of its average user. In this instance, let's say the company finds that 75 percent of the users who mentioned its product on one of the social networking Web sites were between the ages of 13 and 28. This surprises the company executives, who had initially thought that their main user base was aged 13 to 19, a much narrower range than 13 to 28. In this hypothetical example, the company decides to shift its focus and launches a new advertising campaign aimed at adults in their mid-20s. What the executives haven't taken into account, however, may come back to hurt them.

I use social networking sites on occasion. One thing I don't do, however, is take them seriously or treat them as a place where I need to be honest about the information I post about myself. As someone who runs business ventures online, I know of many people and businesses who create fake profiles for their business. These profiles are made to look like any other profile, but the data in them is obviously of no value. Adding to this "watering down" of the data is the fact that many profiles created by real people are made for their own amusement and the amusement of others, and are not necessarily a reflection of the actual person. If you want to test this, just browse through MySpace profiles. You will undoubtedly come across a few that show a young-looking person who has listed an age much older than the picture would indicate, for example a teenager who lists their age as 50.

Why people use fake ages and other information on their profiles is anyone's guess, as it varies from circumstance to circumstance. The one thing that is clear, however, is that companies need to be careful when using this data. It is almost certainly not completely accurate and, in some cases, could be laughably inaccurate: a warped picture based on completely false data that, if used to guide a business, could lead to catastrophe. In the earlier example, if the company had counted profiles from people listing their ages as "60" when they are really "17," it wouldn't take long for its statistics to go way off into fantasy land. One might say that you could just throw out ages over a certain number, since the company knows these aren't accurate, but even then it only takes people exaggerating their age by a few years to throw off the data once again.
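To make the arithmetic concrete, here is a rough sketch in Python. Every number in it is an assumption made up for illustration, not data from any real site: 100 hypothetical users, 70 of them truly aged 13 to 19, with every tenth profile listing a joke age of 60 and some of the rest padding their age by five years. The function names (share_in_range, misreport) are likewise just illustrative.

```python
from statistics import mean

def share_in_range(ages, low, high):
    """Return the fraction of ages that fall within [low, high]."""
    return sum(low <= a <= high for a in ages) / len(ages)

def misreport(ages, joke_every=10, pad_every=4, pad_years=5, joke_age=60):
    """Simulate profile ages: some users list a joke age, some pad by a few years.

    All parameters are hypothetical knobs chosen for illustration.
    """
    reported = []
    for i, age in enumerate(ages):
        if i % joke_every == 0:
            reported.append(joke_age)          # e.g. a 17-year-old listing "60"
        elif i % pad_every == 0:
            reported.append(age + pad_years)   # modest exaggeration
        else:
            reported.append(age)               # honest profile
    return reported

# Hypothetical true user base: 70 users aged 13-19 and 30 users aged 20-29.
true_ages = [a for a in range(13, 20) for _ in range(10)]
true_ages += [a for a in range(20, 30) for _ in range(3)]

reported_ages = misreport(true_ages)

# The "just throw out the silly ages" fix: drop anything over 40.
filtered_ages = [a for a in reported_ages if a <= 40]

true_share = share_in_range(true_ages, 13, 19)
reported_share = share_in_range(reported_ages, 13, 19)
filtered_share = share_in_range(filtered_ages, 13, 19)

print(f"True share aged 13-19:      {true_share:.2f}")
print(f"Scraped share aged 13-19:   {reported_share:.2f}")
print(f"After dropping ages > 40:   {filtered_share:.2f}")
print(f"True mean age:              {mean(true_ages):.2f}")
print(f"Scraped mean age:           {mean(reported_ages):.2f}")
```

Under these made-up assumptions, the scraped share of 13-to-19-year-olds comes out well below the true share, and discarding the obvious joke ages only recovers part of the gap, because the users who padded their age by a few years still look like plausible adults.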

Certainly there is benefit to be gained by companies from using data scraped from social networks. But to take the gathered data as absolute would be foolish at best; at worst, it may lead your business down a dark unpaved road and off a cliff.

Sources:

http://www.thearling.com/text/dmwhite/dmwhite.htm

http://www.statsoft.com/textbook/stdatmin.html

