Theme 7: Prototype, Pilot, Scale

Edison did not invent the light bulb. He took a working concept and developed hundreds of prototypes rapidly, tested them and along the way figured out improvements that were required to scale his invention for commercial use. Julian Trubin writes about the prototyping process:

In 1879 Edison obtained an improved Sprengel vacuum pump, and it proved to be the catalyst for a breakthrough. Edison discovered that a carbon filament in an oxygen-free bulb glowed for 40 hours. Soon, by changing the shape of the filament to a horseshoe it burned for over 100 hours and later, by additional improvements, it lasted for 1500 hours.

Edison’s primary contribution to the development of light bulb was that he carried the idea from laboratory to commercialization, taking into consideration not only technical problems, but also issues like economics and the manufacturing of bulbs.

We took a leaf from Edison’s book when we developed our prototype, pilot and scale approach to deploying analytics solutions for clients.

In our experience, rapid prototyping is essential to show the value of the initiative to senior executives. One of our health care clients wanted help in institutionalizing a data-driven culture within its sales organization, especially in identifying and focusing sales effort on high-potential customers. We first developed a prototype predictive scoring model to identify the high-potential customers. Mapping the results of the model to existing effort demonstrated that more than 50% of sales force time was being used ineffectively and that the client was leaving a lot of dollars on the table.
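As a rough sketch of what mapping model scores to existing effort could look like, the snippet below computes the share of logged sales hours that went to customers the model scores as low potential. The customer names, scores, hours and the 0.5 cutoff are all invented for illustration; they are not the client's data or the actual model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    score: float        # model-predicted potential, assumed to be on a 0.0-1.0 scale
    hours_spent: float  # sales hours logged against this customer


def share_of_time_on_low_potential(customers, threshold=0.5):
    """Return the fraction of total sales hours spent on customers scoring below threshold."""
    total = sum(c.hours_spent for c in customers)
    if total == 0:
        return 0.0
    low = sum(c.hours_spent for c in customers if c.score < threshold)
    return low / total


# Hypothetical data for four customers.
customers = [
    Customer("Clinic A", 0.9, 10),
    Customer("Clinic B", 0.2, 30),
    Customer("Clinic C", 0.7, 20),
    Customer("Clinic D", 0.1, 40),
]

share = share_of_time_on_low_potential(customers)
print(f"{share:.0%} of sales time went to low-potential customers")
```

A single number like this is often easier for a senior executive to react to than the model's accuracy statistics.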

However, for organizations to see bottom-line benefit, adoption of the predictive analytics solution is key. Piloting helps refine the prototype and plan for potential adoption pitfalls amongst the end users. At our health care client, we knew that there were skeptics amongst the salespeople who did not trust the model, and that there were change management blind spots which we wanted to discover prior to the national rollout. We designed a pilot with the following objectives:

  1. Prove the validity of the predictive model
  2. Create evangelists from the sales team of the pilot regions
  3. Identify the big data gaps and establish a process of continually refining CRM data
  4. Establish and refine the key performance metrics to report to senior management
  5. Understand the key questions and concerns of the sales team in adopting the system

We collected a lot of rich quantitative and qualitative data during the pilot phase, which not only conclusively proved the value of the predictive model but also provided us with insights to incorporate into the rollout process. For instance, we learned that in a few instances customer address data was not getting updated in the data warehouse, and that sales managers wanted to understand the factors that went into calculating a customer’s predictive score before they felt comfortable using it.

Scaling the pilot requires cross-organization coordination and strong program management to ensure that the pilot learnings are incorporated in the rollout, that there is positive word-of-mouth buzz for the solution and that there is minimal impact on day-to-day business. The inputs from the pilot helped us better design the compensation rules and reporting metrics, which in turn helped us roll out a system that had the trust of the sales force.

Our client saw a significant uplift in revenue in the first 3 months of the rollout. The sales organization started realizing the value of a data-driven approach and hired a team to support other sales analytics initiatives.

Are there other tips and tricks which you have successfully used to deploy predictive analytics solutions?

Theme 6: Delivering the prediction at the point of decision is critical

In 2007, my wife worked as a hospitalist in Charlotte.  Around that time, I noticed a strange pre-work ritual she followed. She took printouts of a few pages from the Wal-Mart website before going to work.  The behavior was surprising and one day I asked her about the mysterious printouts. It turns out that she was printing out the list of generic drugs that were covered in Wal-Mart’s $4 prescription plan.

Like most physicians, she used to struggle with the fact that some of her patients were not complying with their medications as they were not able to afford the medicines she was prescribing. Wal-Mart had introduced a plan, where they sold some common generic drugs at $4 per prescription.  A lot of patients were able to afford the Wal-Mart medicines and it was popular amongst the doctors to prescribe from the Wal-Mart covered drugs. However there was one issue. The Wal-Mart covered drug list was not integrated with the Epocrates system, the mobile clinical decision support software that most doctors use to verify drug dosage and interactions prior to writing the prescriptions.  At the time of writing the prescription, the doctor did not know whether the specific drug was covered within the Wal-Mart plan, unless she chose to make the extra effort and carry a printout of covered drugs and refer to it prior to writing every prescription.  A great idea suddenly became less attractive to act upon, as the right information was not made available at the point of decision.  I refer to this as the last mile decision delivery problem of predictive analytics projects.

Most of the effort in analytics projects is spent on defining the problem, aggregating data, and building and testing models. Getting the output of the model to the decision maker at the point of decision is often an afterthought. However, the benefits of the project depend on solving this critical last mile problem.

In my experience decision delivery is challenging as it requires cross-organizational coordination. Successful analytics projects are a partnership between the analytics, business and IT groups.  The analytics group needs to work very closely with decision makers or the end users to put the analysis results in context of the decision maker’s workflow.  The actual delivery of the information is done through a mobile handheld device to a distributed sales force, CRM system integration for call centers or executive dashboard delivery through reporting system integration. All of them require close collaboration with the IT group which has to take the results of a predictive model and integrate it with the relevant front end or reporting infrastructure.  Then there is end-user training to ensure the end-users know what to do with the new information. The program management effort required to execute such a cross-organization initiative is significant and very often not anticipated or planned by the project sponsors.

A good program manager is critical to most complex predictive analytics projects. He/she is able to coordinate the various stakeholders to align on problem definition, outcome format, technology integration and training to drive user adoption of the predictive analytics solution. This is something to keep in mind as you plan your predictive analytics initiatives.

Have you seen predictive analytics projects getting derailed due to lack of coordination between various groups within the organization or underinvestment in program management resources?

PS: Last year Wal-Mart solved its last mile problem and integrated the covered $4 prescription drug list into the Epocrates application.

Previous parts of the series are available here: part 1, part 2, part 3, part 4, and part 5

Cross posted on TheInformationAdvantage.com

Theme 5: Good data visualization leads to smarter decisions.

Dr. John Snow’s 1854 map of cholera deaths in London has arguably saved more lives than all predictive analytics initiatives combined. As Miles Dowsett has written,

In 1854 London was the biggest city the world had ever seen, and was literally drowning in its own filth. There was no sewage system in place and those that lived in the capital literally threw their waste and cesspits out into the crowded streets and river Thames. As a result, London was a disgustingly smelly place to be and was periodically engulfed by disease – most notably Cholera.

However at that time it was believed that it was the smell of London that was the root cause of diseases such as cholera. It came to be known as the Miasma theory of disease. However, Dr. Snow was skeptical of the Miasma theory of smell causing disease and in his 1849 essay On the Mode of Communication of Cholera actually introduced the water borne theory of Cholera. He presented his findings to London’s health authorities but couldn’t convince anybody of its merit, and was largely ignored.

In August, 1854 there was another outbreak of cholera in the Broad Street neighborhood. Over a 10 day period cholera decimated the population; 10% of the neighborhood died.

Since Dr. Snow was convinced that cholera was a waterborne disease, it didn’t take him very long to identify that it was the infected water pump at 40 Broad Street that caused the epidemic. Instead of solely relying on numbers, he produced a visual of the Broad Street neighborhood where he marked the water pump at 40 Broad Street and designated each cholera death with a bar. The concentration of deaths around the pump at 40 Broad Street, which trailed off the further out from the pump one went, was so convincing that the authorities finally accepted the theory of cholera being waterborne. They removed the handle of the pump and it ultimately stopped the spread of the disease in 1854. It also paved the way for sewage and sanitation systems to be put in place; one of the greatest engineering feats to be undertaken in London’s history, changing the way that urban systems exist and continue to grow to this day.

As the story demonstrates, good data visualization leads to smarter decisions. Far too often, analysts focus all their efforts on data collection and modeling and pay very little attention to presenting the results in a way that decision makers can relate to. Every successful predictive analytics project leads to a change from the status quo way of doing things, and it is never easy to convince the decision makers that the new way is better. Most decision makers need more than an R-squared or a mean absolute percentage error metric to be convinced about the efficacy of a solution based on predictive analysis and feel comfortable approving the change. This is where data visualization skills become important.

James Taylor wrote that visualization is more relevant in context of strategic decisions and not so much for the operational decisions.

Decision making at the operational level is too high-speed, too automated for much in the way of visualization to be useful at the moment of decision.

While I do agree with him, I know of instances where visualization has been creatively used in the very operational environment of call centers. Speech analytics and emotion detection are growing areas in call center technology: depending upon the caller’s choice of words, the speech analytics system detects the emotion level of the caller and displays an appropriate emoticon on the agent’s desktop right when the call is transferred to them. Even without understanding the complexity of the caller’s issue, the agent immediately gets guidance about the emotional state of the caller.

As consultants we are always trying to create that ‘money-visual’ in our presentations, a slide which brings all the analyses together and unambiguously calls out for a need to change or drive action. I feel every predictive analytics project needs one such ‘money-visual’.

Do you have any examples of visuals which you have used to convince people to drive change?

Here are the links to previous postings: part 1, part 2, part 3 and part 4

Cross-posted on TheInformationAdvantage.com

Theme 4: Statistical techniques and tools are not likely to provide competitive advantage

I read this interesting post from Sijin describing his journey to master a video game (emphasis added by me)

All this kind of reminded me of my experiences with finding the perfect weapon while playing Call of Duty 4 over the past year. I spent 3 hours a day almost every day for the past one year playing this game, reaching the max prestige level (the “elite” club) in the multi-player version. I became really good at it… no matter what weapon I was using. But I remember when I started out and I really sucked, I became obsessed with finding the perfect weapon with the perfect set of perks and add-ons. I used to wander the forums asking people about which weapons and perks to use on which map and what the best tips were etc. Thinking that having the perfect weapon would make me a good player. In the end, the only thing that mattered was all the hours I put in to learn all the maps, routes, tricks and my ability (I like to think). The surprising thing was that once I mastered the game, it didn’t really matter what weapon I chose, I was able to adapt any weapon and do a decent job.

This story captures the essence of the theme of this post.

The popular statistical techniques frequently used in business analytics, like linear regression and logistic regression, are more than half a century old. System dynamics was developed in the 1950s. Even neural networks have been around for more than 40 years. SAS was founded in 1976 and the open source statistical tool R was developed in 1993. The point is that popular analytical techniques and tools have been around for some time and their benefits and limitations are fairly well understood.

An unambiguous definition of the business problem that will impact a decision, a clear analysis path leading to output, and a thorough understanding of various internal and third-party datasets are all more important aspects of a predictive analytics solution than the choice of the tool. Not to mention having a clear linkage between the problem, the resulting decision, and measurable business value. The challenge is in finding an expert user who understands the pros and cons and adapts the tools and techniques to solve the problem at hand. Companies will be better served by investing in the right analytical expertise rather than worrying about the tools and techniques, as the right analytical team can certainly be a source of competitive advantage.

While this theme is fairly well understood within the analytics practitioner community, the same cannot be said about business users and executives. It is still easy to find senior executives who believe that ‘cutting edge’ techniques like neural networks should be used to solve their business problem or predictive analytics tools are a key differentiator while selecting analytics vendors.  The analytics community needs to do a better job in educating the business user and senior executives about this theme.

You can read the previous installments of the series here (part 1, part 2, and part 3).

Cross-posted on TheInformationAdvantage.com

Theme 3: Integrating third-party data into predictive analysis

This is the third installment of the eight part series on predictive analytics (see part 1, part 2).

Perhaps one of the most significant opportunities for organizations using predictive analytics is incorporating new relevant third-party data into their analysis and decision-making process.  Investment in a targeted and relevant dataset generates far greater returns than spending time in developing sophisticated models without the right dataset.

Suppose a wedding gown retailer wants to pursue a geographical expansion strategy. How would they determine where to open new stores? For that matter, how should they evaluate the performance of existing stores? Should a store in the Chicago suburbs produce the same volume of business as a store in Austin?

To answer the above questions, you need a lot of data that is not within the organization’s firewalls. You will need to know where people are getting married (demand in a market), how many competitor stores sell wedding gowns in the same area (competitive intensity in a market), how far potential brides are willing to travel to buy a wedding gown (real estate costs in the city vs. the suburbs will be vastly different), and the income and spend profile of people in the market (how much customers are willing to spend).

Marriage registration data from NCHS, socio-demographic data from a company like Claritas or the US Census, business data from Dun & Bradstreet or InfoUSA, cost data for real estate and maybe custom survey data on potential brides should all be input variables into the store location analysis. Data about existing store sales and customer base are important, but they tell only part of the story and do not provide the entire context to make the right decisions.

Using the above data the retailer will be able to identify favorable markets with higher volumes and growth in marriages and appropriate competitive profiles. It can also use existing store performance data to rank the favorable markets using a regression or cluster analysis and then corroborate the insights using mystery shopping or a survey. Such a data driven methodology represents a quantum improvement over how new store locations are identified and evaluated.  While the datasets are unique to the problem, I find that such opportunities exist in every organization. A clear framing of the problem, thinking creatively about the various internal or external data, and targeted analysis leading to significantly better solutions is what information advantage is all about.
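To make the regression step concrete, here is a minimal sketch that fits a one-variable least-squares line on existing stores and uses it to rank candidate markets by predicted sales. Everything in it is an assumption for illustration: the store figures, the market names and the choice of marriages per year as the only predictor. A real analysis would use several of the variables listed above, and with a single predictor the ranking simply follows marriage volume.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical existing stores: (marriages per year in the market, in thousands;
# annual store sales, in $ thousands).
existing = [(5, 400), (8, 610), (12, 900), (15, 1080)]
slope, intercept = fit_line([m for m, _ in existing], [s for _, s in existing])

# Hypothetical candidate markets and their marriages per year (thousands).
candidates = {"Market X": 10, "Market Y": 6, "Market Z": 14}

predicted = {name: slope * m + intercept for name, m in candidates.items()}
ranked = sorted(predicted, key=predicted.get, reverse=True)

for name in ranked:
    print(f"{name}: predicted annual sales ${predicted[name]:,.0f}K")
```

The ranked list is only a starting point; the mystery shopping or survey step is what corroborates it.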

We are in the midst of an open data movement, with massive amounts of data being released by the government under the Open Government Directive. Private data exchanges are being set up by Microsoft and InfoChimps, among others. Not to mention all the new types of data now available (e.g., Twitter stream data). Companies that build capabilities to identify, acquire, cleanse and incorporate various external datasets into their analysis will be well positioned to gain the information advantage.

Earlier this week, I suggested a potential business application of IRS’s internal migration data for a moving and relocation company.

Folks at Nielsen Claritas just found a far more interesting correlation, which should have driven a lot of business decisions. They note:

Today’s presence of underwater mortgages, or homes with negative equity, seem to be correlated to two common regional U.S. population trends: 1) domestic immigration from the Northeastern region to the South and Southwestern regions of the U.S., and 2) migration from coastal California inland

While such retrospective analysis is interesting for reports and blogs, it is not particularly useful for businesses, except perhaps as a means of generating interesting hypotheses for the future. It would have been useful had the chart been available to the strategic planning or risk groups of the businesses signing people up for these housing loans in 2006 and 2007.

Data is valuable only when it is used to drive decisions. Most companies have a huge opportunity to do a better job in bringing together data, analytics and visualization and delivering them to the points of decision.

One subject that has not received a lot of coverage in the analytics blogging circle is the current administration’s data.gov project. While still in its infancy, data.gov is an outcome of the government’s transparency initiative, the Open Government Directive. In December, all government agencies were asked to produce and publish three new ‘high value’ public data feeds on the data.gov website.

The data.gov site still has to work through some kinks, but eventually it will become a wonderful resource for the data analytics industry. It will probably be as critical as the US Census data and its American FactFinder tool, which have spawned multiple companies and support all kinds of interesting analysis across a wide range of industries.

The Sunlight Foundation tracks the new datasets that are being released. For example, one of the Labor Department’s datasets is the “weekly reports of fatalities, catastrophes and other events.” The data, compiled by the Occupational Safety and Health Administration, briefly describes workplace accidents and identifies the company and the date of each accident. I think a lot of insurance companies with workers’ compensation products will be interested in analyzing the data to better price their products. Or take, for instance, the IRS internal migration data by state and county, based on tax returns. Can it be used by moving companies to better understand the shift in demand for their services? There are thousands of such datasets available, and a lot of them will potentially be valuable to businesses. The value of a dataset to a business, like beauty, is in the eye of the beholder. This makes categorization challenging, but at the same time it makes the data interesting for businesses, as it can be a potential source of competitive advantage. If you can figure out how to interpret the IRS migration data to better align the marketing campaigns for your moving and relocation assistance business, you can get a better return on your marketing spend than your competition.

It is time for organizations to look outside their firewalls and build a strategy for collecting external data and incorporating it into their analytics and strategic planning efforts. Companies like Infochimps, a private clearinghouse and marketplace for third-party data, are betting on this trend. They already collect, cleanse and format the data.gov datasets so that they are analysis ready.

Take the time to check the datasets that are available. You never know what you may find.

Theme 2:  Modeling Strategic vs. Operational Decisions

In the first post of this eight part series, I wrote about the importance of understanding the cost of a wrong decision prior to making an investment in a predictive modeling project.

Once we determine the need for the investment, we need to focus on the type of modeling approach. The modeling approach depends on the type of decision that we want the predictive model to drive. Decisions can be broadly categorized as either operational or strategic.

I define operational decisions as those that have a specific and unambiguous ‘correct answer’, whereas in strategic decisions an unambiguous ‘correct answer’ is not available. Moreover such decisions have a cascading effect on adjacent and related decisions of the system.

Think about a health plan predicting a fraudulent insurance claim versus predicting the impact of lowering reimbursement rates for patient re-admissions to hospitals.

An insurance claim is either fraudulent or not. The problem is specific and there is an unambiguous correct answer for each claim. Most transaction level decisions fall in this category.

Now consider the second problem. Lowering the reimbursement rate of patient readmission will certainly incent the physician and hospitals to focus on good patient education, follow-up outpatient care and to ensure complete episodes of care for the patient during their time in the hospital. This should result in lower cost for the health plan. However, it can also lead to hospitals delaying the discharge of patients during their first admission or physicians treating patients in an out-patient setting when they should be at the hospital and ending up in emergency room visits. This is the cascading effect of our first decision that will increase the cost of care. Strategic decisions have multiple causal and feedback loops which are not apparent and an unambiguous right answer is hard to figure.  Most policy decisions fall in this category.

The former is an operational decision and requires established statistical techniques (regression, decision tree analysis etc.) and artificial intelligence techniques (e.g. neural networks, genetic algorithms). The key focus of the problem is to predict whether a claim is fraudulent or not based on historical data. Understanding the intricacies of causal linkages is desirable but not necessary (neural networks, for example, can predict well without explaining why). The latter needs predictive modeling approaches that are more explanatory in nature. It is critical to understand the causal relationships and feedback loops of the system as a whole. The idea is to develop a model which accurately captures the nature and extent of relationships between various entities in the system based on historical data, to facilitate testing of multiple scenarios. In the re-admission policy example, such a model will help determine the cost impact based on the various scenarios of provider adoption and behavior change (percentage of providers and hospitals that will improve care vs. those that will not adapt to the new policy). Simulation techniques like system dynamics, agent-based models, Monte Carlo and scenario modeling approaches are more appropriate for such problems.
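As an illustration of the scenario-testing idea, here is a very small Monte Carlo sketch for the re-admission policy example. It is not a system dynamics model and has no feedback loops; it only shows how uncertainty about provider adoption can be turned into a range of cost outcomes. The baseline cost and all three probability ranges are assumptions I made up for the example.

```python
import random


def simulate_policy_cost(n_runs=10_000, seed=42):
    """Simulate total cost under the new policy across many adoption scenarios."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    baseline_cost = 100.0      # hypothetical annual readmission-related cost, in $M
    results = []
    for _ in range(n_runs):
        # Share of providers that improve care (assumed uniform between 30% and 70%).
        adopt = rng.uniform(0.3, 0.7)
        # Cost reduction among adopters (assumed 10% to 25%).
        saving_rate = rng.uniform(0.10, 0.25)
        # Cost increase among non-adopters from delayed discharges and
        # emergency room visits (assumed 0% to 15%).
        penalty_rate = rng.uniform(0.0, 0.15)
        cost = baseline_cost * (
            adopt * (1 - saving_rate) + (1 - adopt) * (1 + penalty_rate)
        )
        results.append(cost)
    results.sort()
    return {
        "mean": sum(results) / n_runs,
        "p05": results[int(0.05 * n_runs)],  # optimistic end of the range
        "p95": results[int(0.95 * n_runs)],  # pessimistic end of the range
    }


summary = simulate_policy_cost()
print(summary)
```

If the pessimistic end of the range sits above the baseline, the cascading effects could outweigh the savings, which is exactly the kind of question a fraud-style classification model cannot answer.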

Bottom line, it is important to remember that strategic and operational decisions need different predictive modeling approaches, and there are two questions you have to ask yourself:

  1. Is the decision you want to drive operational or strategic in nature?
  2. Are you using the appropriate modeling approach and tools?

Cross-posted on TheInformationAdvantage blog

The theme of this blog is to understand how actionable information in the form of decision support tools will lead to the next wave of efficiencies and competitive advantage. However, the reverse is probably more stark. Not investing in the table stakes of data aggregation and reporting capabilities can also hurt, and it can hurt big time.

The financial crisis in Greece is a case study in how easy money and uncontrolled government spending during boom times can come back to hurt in a weak economy. However, one of the confounding factors has been the Greek government’s repeated revisions of its budget deficit data. In April 2008, it reported the deficit to be 5.0% of GDP. Later that year it revised the figure up to 7.7%. Similarly, in April 2009, the official forecast for the deficit was 3.7% of GDP, which was later revised to 12.5% of GDP. It was this last revision that started the full-blown crisis.

Digging a little deeper, it is easy to discover one of the key reasons for the revisions: the lack of a modern budgetary process and financial reporting system.

Past budgets have rested on some 14,000 separate expenditure lines. This year’s has brought the figure down to about 1,000. In this system, the evaluation of public spending in any particular area is almost impossible. The amount spent on education, for example, is defined as the total sum of money allocated to the Ministry of Education and it is very difficult to monitor where it goes. Currently, most of Greece’s 15 ministries and dozens of other government bodies handle their own payroll accounts, making it difficult to gain a complete overview of government spending.

No wonder they could not reliably trace how much money was being spent!

Last year, the Greek government had also approached the OECD to conduct a study and recommend improvements in its budgetary processes, and one of the recommendations was around managing the deployment of the new accounting and financial information system.

Ill-defined processes and weak information management systems tend to exist in certain quarters of most organizations. The key question to ask yourself is whether this under-investment in information systems:
1) exposes you to a big risk
2) makes you inefficient or
3) prevents you from gaining some potential competitive advantage?

I, along with two of my colleagues (Anand Rao & Dick Findlay), recently conducted a workshop at the World Research Group’s Predictive Modeling conference in Orlando. As part of the workshop, I spoke about a list of 8 things that organizations should keep in mind as they consider investing in predictive analytics.

In this post, I will list the 8 points and discuss the first one. Subsequent posts will explore the rest of the themes.

  1. Understand the cost of a wrong decision
  2. Strategic and operational decisions need different predictive modeling tools and analysis approaches
  3. Integration of multiple data sources, especially third-party data, provides better predictions
  4. Statistical techniques and tools are mature and by themselves not likely to provide significant competitive advantage
  5. Good data visualization leads to smarter decisions
  6. Delivering the prediction at the point of decision is critical
  7. Prototype, Pilot, Scale
  8. Create a predictive modeling process & architecture

Theme 1: Understand the Cost of a Wrong Decision

Is it even worth investing the resources in developing a predictive analytics solution for a problem? That is the first question which should be answered. The best way to answer it is to understand the cost of a wrong decision. I define a decision as ‘wrong’ if the outcome is not the desired event. For example, if the direct mail sent to a customer does not lead to the desired call to the 800 number listed, then it was a ‘wrong’ decision to send the mail to that customer.

A few months ago my colleague Bill told a story which illustrates the point.

Each year Bill takes his family to Cleveland to visit his mom. They stay in an old hotel in downtown Cleveland. The hotel is pretty nice, with all the trappings that you would expect of an old and reputable establishment. Last time they decided to have breakfast at the hotel across the street, the Ritz. After breakfast, when Bill and his family were in the lobby, the property manager spotted him and the kids and walked over to talk. He chatted for a few minutes, probably surmised that Bill was a reasonably seasoned traveler, and told the kids to wait for him. He walked away and came back with a wagon full of toys. He let each kid pick a toy out of the wagon. Think about it. They were not even guests at the Ritz; all they did was have breakfast there! The kids loved the manager and Bill remembered the gesture. Fast forward to this holiday season, and sure enough Bill and his family booked a suite at the Ritz for six days. For the price of a few nice toys, the manager converted a breakfast visit into a stay that generated a few thousand dollars in room charges, meals, and parking.

Now suppose Bill had not gone back to the hotel, contrary to the outcome the manager desired. What would have been the cost of the manager’s ‘wrong’ decision? The cost of a few toys. That cost is negligible compared to the potential upside. Does it make sense for the hotel to build a predictive model to decide which restaurant diners to offer toys to so that they come back and stay? I don’t think so.

Understanding the cost of a wrong decision upfront saves one from making low-value investments in predictive analytics.
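The back-of-the-envelope arithmetic behind this can be sketched in a few lines. The numbers are hypothetical (I do not know what the toys cost or how often diners return), and the two helper functions are names I made up for the illustration.

```python
def expected_net_value(p_success, payoff, cost):
    """Expected gain from acting: probability-weighted payoff minus the cost paid every time."""
    return p_success * payoff - cost


def max_model_budget(n_decisions, cost_per_wrong, wrong_rate_now, wrong_rate_with_model):
    """Upper bound on a model's value: wrong decisions avoided times the cost of each."""
    avoided = n_decisions * (wrong_rate_now - wrong_rate_with_model)
    return avoided * cost_per_wrong


# Assumed figures: toys cost $30 per family, a return stay is worth $3,000,
# and only 1 family in 20 comes back.
ev = expected_net_value(p_success=0.05, payoff=3000, cost=30)

# Assumed figures: 1,000 families a year, and a model that cuts the share of
# 'wrong' toy giveaways from 95% to 80%.
budget = max_model_budget(
    n_decisions=1000,
    cost_per_wrong=30,
    wrong_rate_now=0.95,
    wrong_rate_with_model=0.80,
)

print(f"Expected net value per family: ${ev:,.0f}")
print(f"Most the model could save per year: ${budget:,.0f}")
```

Under these assumptions, giving toys to everyone already pays for itself, and the most a model could save is small next to the cost of building and maintaining it.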

PS: My colleague Paul D’Alessandro has also used this story to illustrate experience design (XD) principles.

Photo credit: GJones
