‘Big Data’ is a term for the large amounts of data that are collected and then sorted through for patterns. Sometimes the results can be very intrusive. A well-known example is the purchasing history data that Target used to determine whether a shopper was likely pregnant, and therefore likely to be interested in baby products. A father was upset when Target started sending his teenage daughter coupons for baby products.
He complained to Target that his daughter was only a teenager. It turned out that she was pregnant, and Target’s big data algorithm knew before he did.
For all the successes of big data algorithms, there have been a number of failures. In 2008, Google Flu Trends debuted. It was supposed to determine where cases of the flu were occurring from how often people searched Google for flu-related terms, and then use that information to predict flu outbreaks. Its predictions turned out to be inaccurate; in some seasons it substantially overestimated flu levels.
There have been many attempts to use big data to play the stock market. Usually, although the model fits historical patterns well, it fails to predict future behavior. Big data has been applied to company traits to try to identify the next Google or Amazon, to consumer buying habits to decide which companies to invest in, and to other characteristics thought to signal lucrative investments. In general, these big data algorithms have not been wildly successful.
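One way to see why a good fit to history can mean so little is a toy simulation. The sketch below is purely illustrative and rests on loud assumptions: the "market" is a series of independent coin flips, and the 200 "strategies" are hypothetical random guesses, not real trading rules. It picks whichever strategy looked best on the first half of the data and then scores it on the second half.

```python
import random


def hit_rate(rule, moves):
    """Fraction of days on which the rule's up/down guess matches the market."""
    hits = sum(1 for guess, move in zip(rule, moves) if guess == move)
    return hits / len(moves)


rng = random.Random(42)  # fixed seed so the run is repeatable

DAYS = 500
HALF = DAYS // 2

# Assumed toy market: each day is an independent coin flip (+1 up, -1 down).
market = [rng.choice((1, -1)) for _ in range(DAYS)]
history, future = market[:HALF], market[HALF:]

# 200 hypothetical "strategies", each just a fixed random guess for every day.
rules = [[rng.choice((1, -1)) for _ in range(DAYS)] for _ in range(200)]

# Pick whichever rule looks best on the historical half...
best = max(rules, key=lambda rule: hit_rate(rule[:HALF], history))
in_sample = hit_rate(best[:HALF], history)

# ...then score that same rule on the half it has never seen.
out_of_sample = hit_rate(best[HALF:], future)

print(f"best rule on history: {in_sample:.1%}")
print(f"same rule on future:  {out_of_sample:.1%}")
```

Because the winner is chosen after looking at the historical half, its score there is inflated by selection alone. Nothing in the setup gives it any real edge on the unseen half, so in this toy world its future score should hover around a coin flip.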
Big data can be used to determine information about people, which can lead to privacy issues. It can also lead to completely erroneous correlations. In the US, June, July, and August are a popular time to purchase swimsuits. In Australia, those months are winter, so they are not a good time to sell swimsuits. If a dataset includes both the US and Australia, it is necessary to understand that swimsuit sales correlate with warm weather, not with a certain time of year. Determining whether a correlation is actually meaningful is a bit of a black art.
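The swimsuit example can be sketched with a few made-up numbers. Everything in the block below is an assumption for illustration: the monthly temperatures are invented, the Australian series is simply the US series shifted by six months, and sales are assumed to rise linearly with temperature. Under those assumptions, the June-to-August signal that looks strong in the US data alone nearly vanishes once both countries are pooled, while temperature still tracks sales.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented average monthly temperatures (deg C), January through December.
US_TEMPS = [2, 4, 9, 15, 20, 25, 28, 27, 23, 16, 9, 4]
# Assumed southern-hemisphere pattern: the same curve shifted by six months.
AU_TEMPS = US_TEMPS[6:] + US_TEMPS[:6]


def sales_from_temp(temp):
    # Assumed toy rule: swimsuit sales rise linearly with temperature.
    return 100 + 10 * temp


months = list(range(1, 13))
# 1 for June, July, August; 0 for every other month.
is_jja = [1 if m in (6, 7, 8) else 0 for m in months]

us_sales = [sales_from_temp(t) for t in US_TEMPS]
au_sales = [sales_from_temp(t) for t in AU_TEMPS]

# US only: "it is June-August" looks like a good predictor of sales.
r_us_month = pearson(is_jja, us_sales)
# US and Australia pooled: the calendar signal cancels out.
r_pooled_month = pearson(is_jja + is_jja, us_sales + au_sales)
# Pooled, but using temperature: the relationship survives.
r_pooled_temp = pearson(US_TEMPS + AU_TEMPS, us_sales + au_sales)

print(f"US only, month vs sales:      {r_us_month:.2f}")
print(f"pooled, month vs sales:       {r_pooled_month:.2f}")
print(f"pooled, temperature vs sales: {r_pooled_temp:.2f}")
```

Real sales data would be far noisier than this, and a correlation with temperature would not by itself prove that weather drives sales. The sketch only shows how a pattern found in one subset of the data can disappear, or mislead, once the dataset is widened.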