Theme 3: Integrating third-party data into predictive analysis (see also part 1 and part 2).
Perhaps one of the most significant opportunities for organizations using predictive analytics is incorporating new, relevant third-party data into their analysis and decision-making process. Investing in a targeted and relevant dataset generates far greater returns than spending time developing sophisticated models without the right data.
Suppose a wedding gown retailer wants to pursue a geographical expansion strategy. How would it determine where to open new stores? For that matter, how should it evaluate the performance of existing stores? Should a store in the Chicago suburbs produce the same volume of business as a store in Austin?
To answer these questions, you need a lot of data that lives outside the organization's firewalls. You need to know where people are getting married (demand in a market), how many competitor stores sell wedding gowns in the same area (competitive intensity in a market), how far potential brides are willing to travel to buy a wedding gown (real estate costs in the city vs. the suburbs will be vastly different), and the income and spending profile of people in the market (how much customers are willing to spend).
Marriage registration data from NCHS, socio-demographic data from a company like Claritas or from the US Census, business data from Dun & Bradstreet or InfoUSA, real estate cost data, and perhaps custom survey data on potential brides should all be input variables in the store location analysis. Data about existing store sales and the customer base are important, but they tell only part of the story and do not provide the full context needed to make the right decisions.
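As a rough illustration of what "input variables" means in practice, the sketch below combines several per-market sources into a single table with one row per market. The market names, the figures, and the `build_market_table` helper are all hypothetical placeholders, not real NCHS, Census, or Dun & Bradstreet data; the point is only that each external source is keyed by market and that gaps in coverage stay visible.

```python
# All values are made-up placeholders for illustration only.
marriages = {"Austin": 9500, "Chicago suburbs": 14000, "Boise": 4200}
median_income = {"Austin": 71000, "Chicago suburbs": 82000}  # no Boise figure yet
competitor_stores = {"Austin": 12, "Chicago suburbs": 35, "Boise": 3}


def build_market_table(**sources):
    """Combine per-market sources into one row per market.

    Each keyword argument is a dict mapping market name to a value.
    A market missing from a source gets None for that column, so gaps
    in external data coverage are explicit rather than silently dropped.
    """
    markets = sorted(set().union(*(src.keys() for src in sources.values())))
    return {
        market: {name: src.get(market) for name, src in sources.items()}
        for market in markets
    }


table = build_market_table(
    marriages=marriages,
    median_income=median_income,
    competitor_stores=competitor_stores,
)

for market, row in table.items():
    print(market, row)
```

Keeping missing values as `None` rather than dropping the market makes it easy to see which datasets still need to be acquired or cleansed before the analysis can run.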
Using the above data, the retailer will be able to identify favorable markets with higher volumes and growth in marriages and appropriate competitive profiles. It can also use existing store performance data to rank the favorable markets using a regression or cluster analysis, and then corroborate the insights using mystery shopping or a survey. Such a data-driven methodology represents a step-change improvement over how new store locations are usually identified and evaluated. While the datasets are unique to the problem, I find that such opportunities exist in every organization. Framing the problem clearly, thinking creatively about the available internal and external data, and running targeted analysis that leads to significantly better solutions: that is what information advantage is all about.
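The screening and ranking steps described above can be sketched in a few lines. This is a minimal example under stated assumptions, not a recommended model: it assumes a single predictor (marriages per competitor store), fits a one-variable least-squares line to existing store sales, and uses it to rank candidate markets that pass a growth filter. The store figures, market names, and the `min_growth` threshold are all invented for illustration.

```python
# Hypothetical existing stores: (marriages per competitor store, annual sales in $k).
existing_stores = [(300, 900), (400, 1100), (500, 1300), (800, 1900)]

# Hypothetical candidate markets built from external data.
candidates = {
    "Market A": {"marriages": 9000, "growth": 0.03, "competitors": 10},
    "Market B": {"marriages": 12000, "growth": -0.01, "competitors": 40},
    "Market C": {"marriages": 5000, "growth": 0.02, "competitors": 4},
}


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rank_markets(stores, markets, min_growth=0.0):
    """Drop markets with marriage growth below min_growth, then rank the
    rest by predicted sales, highest first."""
    slope, intercept = fit_line(stores)
    ranked = []
    for name, m in markets.items():
        if m["growth"] < min_growth:
            continue  # unfavorable market: shrinking demand
        # Treat zero competitors as one to avoid division by zero.
        demand_per_store = m["marriages"] / max(m["competitors"], 1)
        ranked.append((name, slope * demand_per_store + intercept))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


ranking = rank_markets(existing_stores, candidates)
for name, predicted_sales in ranking:
    print(f"{name}: predicted annual sales of about ${predicted_sales:,.0f}k")
```

A real analysis would use more predictors (income, travel distance, real estate cost), check how far the candidate markets fall outside the range of the existing stores, and validate the ranking with the mystery shopping or survey step mentioned above.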
We are in the midst of an open data movement, with massive amounts of data being released by the government under the Open Government Directive. Private data exchanges are being set up by Microsoft and InfoChimps, among others, and entirely new types of data are now available (e.g., Twitter stream data). Companies that build capabilities to identify, acquire, cleanse, and incorporate various external datasets into their analysis will be well positioned to gain the information advantage.