Morningstar has a story about the rise of passive investing, which is an interesting case for the human as a hero vs. human as a hazard discussion. They conclude:

... the growing use of passive investment vehicles reflects the times in which we live. With algorithms helping determine which online ads we’re exposed to each day and new metrics being invented all the time to aid in arenas as diverse as business, politics, and sports, perhaps it should come as no surprise that more people are willing to rely on an inexpensive, systematic, formula-based approach to investing rather than on the judgment and decision-making ability of a living, breathing fund manager.

The article also discusses the obvious next step: the simple index-tracking form of passive investing will be supplemented with smarter algorithms for investment decisions.

More recently, increasing attention has been paid to alternative indexing approaches–so-called smart beta–that are built around specific factors (stock price/earnings ratios, company performance, share-price volatility, to name a few). Some consider this a hybrid of indexing and active management styles.  

This seems to be a case where the human as a hazard point of view is winning, and the design choice has been to capture human insights up front in algorithms (as in smart beta approaches). Going back to the factors that influence the design choice, the factor that matters most here is the type of decision being influenced. Investment returns are not about managing extreme decisions but about improving the average across many small decisions, with well-established rules for telling good decisions from bad ones.
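The smart beta idea in the quote above can be made concrete with a minimal sketch of one simple factor tilt: weighting stocks by earnings yield (the inverse of the P/E ratio) instead of by market capitalization. The tickers, numbers and the earnings-yield rule are hypothetical illustrations; this is not Morningstar's description or any index provider's actual methodology.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    ticker: str
    market_cap: float  # hypothetical, in billions
    pe_ratio: float    # hypothetical price/earnings ratio


def cap_weights(stocks):
    """Classic passive index: weight each stock by its share of total market cap."""
    total = sum(s.market_cap for s in stocks)
    return {s.ticker: s.market_cap / total for s in stocks}


def value_tilt_weights(stocks):
    """A simple factor rule: weight by earnings yield (1 / P/E), so cheaper
    stocks get more weight. Stocks with non-positive P/E are skipped."""
    yields = {s.ticker: 1.0 / s.pe_ratio for s in stocks if s.pe_ratio > 0}
    total = sum(yields.values())
    return {t: y / total for t, y in yields.items()}


# Hypothetical three-stock universe.
universe = [
    Stock("AAA", market_cap=500.0, pe_ratio=30.0),
    Stock("BBB", market_cap=300.0, pe_ratio=15.0),
    Stock("CCC", market_cap=200.0, pe_ratio=10.0),
]

if __name__ == "__main__":
    for name, weights in [("cap", cap_weights(universe)),
                          ("value tilt", value_tilt_weights(universe))]:
        print(name, {t: round(w, 3) for t, w in weights.items()})
```

The rule is still passive in the sense that no fund manager judges individual stocks day to day; the human insight (here, a hypothetical preference for cheaper stocks) is captured once, up front, in the formula. That is the human as a hazard design choice in miniature.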


Humans and/or Algorithms

Walter Frick has written a great blog post about human and algorithm collaboration. While it is an interesting philosophical question, I am interested in it from a systems thinking perspective.

The central question seems to be this: in a decision-making context, should we design for human as a hero (humans augment algorithm-based judgment) or human as a hazard (humans provide inputs that make better algorithms for automated decisions)?

With all the promise of big data in business, I believe this is a key design choice for the ‘smart’ systems being deployed to improve business decisions: which product a customer should be offered next, which patients should be targeted for a medication adherence intervention, or where to locate manufacturing capacity for an efficient supply chain. The human as a hero vs. human as a hazard choice will determine whether these smart systems are accepted, adopted and ultimately drive value.

Based on my experience, I do not believe either one is inherently superior.

I have seen consumer goods companies deploy suggested ordering systems in which the algorithm predicted the expected order accurately 90% of the time, and the salesperson could override the system when the suggestion did not match his/her experience or when he/she had unique insight (for example, knowing about a church event over the coming weekend that will increase consumption, so the suggested order needs to be bumped up). I have also seen retailers that do not want any store replenishment decision to be made by humans once the input parameters are set. They want the algorithms to take over, and it has worked for them.
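The suggested ordering example can be sketched as a small human-in-the-loop rule. This is a minimal illustration assuming a naive forecast (average of recent weekly sales plus safety stock minus stock on hand); the function names, parameters and numbers are hypothetical and do not describe any specific company's system. A single flag switches between the two designs: overrides allowed with a recorded reason (human as a hero) or the algorithm's number always stands (human as a hazard).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderDecision:
    store: str
    suggested_qty: int
    final_qty: int
    overridden: bool
    reason: Optional[str]


def suggest_order(recent_weekly_sales, on_hand, safety_stock):
    """Hypothetical baseline forecast: expected demand is the average of recent
    weeks; order enough to cover it plus safety stock, minus what is on hand."""
    expected = sum(recent_weekly_sales) / len(recent_weekly_sales)
    return max(0, round(expected + safety_stock - on_hand))


def finalize_order(store, suggested_qty, override_qty=None, reason=None,
                   allow_override=True):
    """Human as a hero: allow_override=True lets the salesperson change the
    number, but only with a stated reason. Human as a hazard:
    allow_override=False and the algorithm's suggestion always stands."""
    if allow_override and override_qty is not None:
        if not reason:
            raise ValueError("An override must include a reason")
        return OrderDecision(store, suggested_qty, override_qty, True, reason)
    return OrderDecision(store, suggested_qty, suggested_qty, False, None)


if __name__ == "__main__":
    suggested = suggest_order([40, 44, 42], on_hand=20, safety_stock=10)
    hero = finalize_order("Store 12", suggested, override_qty=45,
                          reason="Church event this weekend")
    hazard = finalize_order("Store 7", suggested, override_qty=45,
                            reason="Gut feel", allow_override=False)
    print(hero)
    print(hazard)
```

Requiring a reason for every override is a deliberate design choice: the logged reasons (such as the church event) are exactly the human inputs that can later be folded back into the algorithm.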

There are three factors that influence the design choice.

  1. Talent in the organization
    The quality of the decision maker defines how much leeway you provide. I have seen clients in emerging markets, growing at double-digit rates and facing a perennial shortage of good salespeople, use these automated ‘smart’ systems to standardize decision-making. At the other end of the spectrum are startups in Silicon Valley designing systems to aid physicians in making evidence-based decisions.
  2. Organization’s decision making culture
    Organizations have an inherent decision-making culture, which will influence whether the human as a hero model will work. At a high level, there are organizations with a command-and-control decision-making structure (e.g., a technology company with a central team of pricing experts who set pricing for all business customers across all lines of business), and there are organizations with a very decentralized decision-making culture (e.g., another technology company where each division or business unit sets its own pricing and discounts).
  3. Type of decisions to be improved
    Automated decision-making systems are efficient when the goal is to improve the average across all decisions and the decisions follow a normal distribution, that is, there are few extreme or one-of-a-kind decisions. Many high-frequency operational decisions in organizations (think call centers or factory floors) follow this pattern. However, when most of the decisions are one-of-a-kind, ‘extreme’ events, the human as a hero model becomes more appropriate. Many infrequent strategic decisions fall into this category.

Human as a hero vs. human as a hazard is an explicit design choice, and organizations that make the right choice will have fewer false starts and drive more value for themselves and their customers.

Lately, I have been thinking about the entire big data trend. Fundamentally, it makes sense to me and I believe it is useful for some enterprise-class problems, but something about it had been troubling me, so I decided to take some time and jot down my thoughts. As I thought more about it, I realized my core issue is with some of the oversimplified rhetoric I hear about what big data can do for businesses. Much of it is propagated by speakers and companies at big-name conferences and subsequently echoed by many blogs and articles. Here are the three main myths I regularly hear:
1. More data = More insights
An argument I have heard a lot is that with enough data, you are more likely to discover patterns, facts and insights. Moreover, with enough data, you can discover patterns and facts using simple counting that you can’t discover in small data using sophisticated statistical methods.

My take:
It is true, but as a research concept. For businesses, the key barrier is not the ability to draw insights from large volumes of data; it is asking the right questions that need an insight. It is never wise to generalize about the usefulness of large datasets, since their ability to provide answers depends on the question being asked and the relevance of the data to that question.

2. Insights = Actionability = Decisions
It is almost an implicit assumption that insights will be actionable and that, because they are actionable, business decisions will be made based on them.

My take:
There is a huge gap between insights and actionability. Analysts always find very interesting insights, but only a tiny fraction of them will be actionable, especially if one has not started with a strong business hypothesis to test.

Even more dangerous is the assumption that, because an insight is actionable, an executive will decide to implement it. Ask any analyst who has worked in a large company and he/she will tell you that the realities of business context and the failure of rational choice theory stand in the way of many good, actionable insights turning into decisions.

3. Storing all data forever is a good thing
This is the Gmail pitch. Enterprises do not have to decide which data to store and which to purge. They can and should store everything because of Myth 1: more data means more insights and competitive advantage. Moreover, storage is cheap, so why would you not store all data forever?

My take:
Remember the backlash against Gmail, which did not have a delete button when it started. The fact is that there is a lot of useless data, which increases the noise-to-signal ratio. Enterprises struggle with data quality issues, and storing everything without any thought to which data is useful for which kinds of questions does more harm than good. Business-centric approaches to data quality and data architecture have a significant payoff for downstream analytics, and we should give them their due credit when we talk about big data.

In summary,

1. There is a lot of headroom left for small data insights that enterprises fail to profit from.
2. There are indeed some very interesting use cases for big data that are useful for enterprises (even non-web-related ones).
3. But the hype, and the oversimplification of the benefits without thoughtful consideration of issues and barriers, will lead to disappointment and disillusionment in the short run.

Some interesting perspectives on the topic: James Kobielus, Rama Ramkrishnan

Ken Rona tweeted earlier today about a subject that struck a chord. He was writing about the difference between a business analyst and a data analyst (or data scientist, as they are increasingly called). I wanted to expand on the idea, as it is important to distinguish between the two roles. I have seen a lot of confusion around the definitions, with some executives thinking the roles are one and the same and others believing they are totally different. The truth is probably somewhere in between. Here is my attempt at comparing them along the key skill set dimensions:

| Dimension | Business Analyst | Data Scientist |
|---|---|---|
| Business domain knowledge | Expertise in industry domain | Very good working knowledge of industry domain |
| Data handling | Ability to handle multiple CSV files and import them into Access or Excel for analysis | Ability to write SQL queries to extract data from databases and join multiple datasets together |
| Analytics skills | Knowledge of simple business statistics (statistical significance, sampling); able to use statistics functions in Excel | Proficiency in advanced mathematics/statistics (regressions, optimization, clustering analysis etc.) |
| Insight presentation skills | Storytelling skills using PowerPoint | Storytelling skills using information visualization and PowerPoint |
| Problem solving | Proficiency in hypothesis-driven approach is good to have | Proficiency in hypothesis-driven approach is a must-have |
| Tools | Access, Excel, PowerPoint etc. | MS SQL, Oracle, Hadoop, SQL, SAS, SPSS, Excel, R, Tableau etc. |

I think this is a good starting point but can be refined further. Feedback/comments are welcome.

I have written earlier about why insight at the point of decision making/action is critical. I came across a great example of it from the good folks at the Sunlight Foundation, who are trying to bring transparency to political influence.

Inbox Influence is a browser extension that adds political influence data to your Gmail messages. With Inbox Influence installed, you’ll see information on the sender of each email, the company from which it’s sent, and any politician, company, union or political action committee mentioned in the body of the email. The information is added unobtrusively and nearly instantaneously, and includes campaign contributions, fundraisers and lobbying activity. You can use it to add context to news alerts, political mailers and corporate emails, or just to see who your friends donated to in the last election.

By focusing on email, they have built a tool that delivers insights where the action (solicitation, support, contribution commitment) is most likely to happen and makes them part of the normal workflow.

I played around with the tool a bit, and it was interesting to see the campaign contributions and lobbying activity of financial institutions, cable companies and cell phone companies through the statement notifications they sent to my Gmail account.

This blog entry explains the technical challenges the developer had to overcome to build this nifty tool and describes the back-end databases it searches. The key takeaway for organizations thinking about their BI architecture is not to underestimate the effort it takes to overcome last mile infrastructure issues. That effort is normally the difference between the success and failure of a project from a business perspective.

Think about the large, successful organizations known for harnessing information for competitive advantage: P&G, Goldman Sachs, Capital One, Harrah’s, Progressive Insurance. You will find one thing in common: their C-level executives drive data-driven decision making from the top down. The more organizations I see, the more convinced I become that this is one of the most important factors for a company that wants to ‘compete on analytics’.

Here is my hypothesis of why it is so:

There is a fundamental Catch-22 in most large companies. Organizations do not have consistently good-quality data (mainly due to process issues during intake), and unless the data is used to make real business decisions, it is hard to improve its quality.

This Catch-22 can only be resolved by a very senior executive (read: C-level) who commits to making decisions and measuring performance based on analysis done with imperfect data (but data good enough for many types of decisions and relative measurements). Once middle management understands how the data is being used, that spurs process changes to fix the quality issues, which in turn increases the accuracy and reliability of the analysis. This virtuous cycle is key for large companies that ‘compete on analytics’.

In contrast, middle managers never want to be in a position where they must justify decisions knowingly made using imperfect data. It is easier to justify a subjective gut feel than an objective decision made with data that has known quality issues.

In summary: the culture of analytics is a top-down phenomenon.

What do you think? Do you agree with this observation?


For the last few years, the central thesis of my work has been the rise of integrated decision support systems that combine data, analytics and visualization to solve a very specific problem extremely well. These systems are usually prescriptive (rather than predictive) in nature and prove to be game changers in their space.

I recently came across one company that neatly falls into this category. They are called ValueAppeal and they answer a very specific question:

Are you paying too much in property taxes?

ValueAppeal saves property owners thousands by evaluating their property taxes and then guiding them through a simple 3-step process to create a custom appeal. The key to our process is our proprietary Assessment Analyzer.

Enter your address in our free Assessment Analyzer. Using the same official data the assessor uses, ValueAppeal’s proprietary algorithms dig deep to determine if your home is over-assessed and how much you can save on property taxes!

For a one-time fee of $99 (with a money-back guarantee), the company has seen average savings of close to $900.

There are three things that are very interesting and clever about the solution:
1) it uses available public data and not any proprietary dataset
2) the analytical algorithm focuses on the data points that help make a favorable case
3) it generates a report that the user can simply print out and drop into the mail
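ValueAppeal’s algorithms are proprietary, so the following is only a hypothetical sketch of the general idea: compare a home’s assessed value per square foot with that of comparable homes from public assessor records, and build the case on the lowest-assessed comparables (the favorable data points). The function name, the choice of the three lowest comparables and the tax rate are all assumptions for illustration.

```python
from statistics import median


def assessment_check(subject_value, subject_sqft, comparables, tax_rate,
                     favorable_k=3):
    """Hypothetical over-assessment check (not ValueAppeal's method).

    comparables: list of (assessed_value, sqft) for similar nearby homes,
    taken from public assessor records; at least one is required.
    Returns (is_over_assessed, target_value, estimated_annual_savings).
    """
    per_sqft = sorted(value / sqft for value, sqft in comparables)
    # Build the case on the lowest-assessed comparables (the favorable data
    # points) and use their median as the benchmark rate per square foot.
    benchmark = median(per_sqft[:favorable_k])
    target_value = benchmark * subject_sqft
    over = subject_value > target_value
    savings = (subject_value - target_value) * tax_rate if over else 0.0
    return over, round(target_value), round(savings, 2)


if __name__ == "__main__":
    comps = [(270000, 2000), (280000, 2000), (290000, 2000), (310000, 2000)]
    print(assessment_check(300000, 2000, comps, tax_rate=0.012))
```

The printable report in point 3 would then be little more than these numbers plus the list of comparables used, formatted for the assessor.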

Please drop a note if you have seen other interesting companies which develop such niche analytics based decision support solutions.

Interesting stories on information & decisions that influenced my thinking and that I tweeted:

  1. Extending the value of operational #data to serve your customers. Netflix ISP comparison http://tinyurl.com/4whjd3o
  2. Coal economics and computer chips: a demand driver for more #analytics http://bit.ly/f6Hme2
  3. Excellent paper for #analytics practitioners on customer lifetime value and RFM models http://bit.ly/dkdkaa
  4. #visualization of lobbying efforts http://reporting.sunlightfoundation.com/lobbying/
  5. #Analytics of dating http://blog.okcupid.com/
  6. LinkedIn’s @PeteSkomoroch on the key skills that data scientists need. http://oreil.ly/hXZTVJ

Theme 7: Prototype, Pilot, Scale

Edison did not invent the light bulb. He took a working concept, rapidly developed and tested hundreds of prototypes, and along the way figured out the improvements required to scale his invention for commercial use. Julian Trubin writes about the prototyping process:

In 1879 Edison obtained an improved Sprengel vacuum pump, and it proved to be the catalyst for a breakthrough. Edison discovered that a carbon filament in an oxygen-free bulb glowed for 40 hours. Soon, by changing the shape of the filament to a horseshoe it burned for over 100 hours and later, by additional improvements, it lasted for 1500 hours.

Edison’s primary contribution to the development of light bulb was that he carried the idea from laboratory to commercialization, taking into consideration not only technical problems, but also issues like economics and the manufacturing of bulbs.

We took a leaf from Edison’s book when we developed our prototype, pilot and scale approach to deploying analytics solutions for clients.

In our experience, rapid prototyping is essential to show the value of the initiative to senior executives. One of our health care clients wanted help institutionalizing a data-driven culture within its sales organization, especially in identifying high-potential customers and focusing sales effort on them. First, we developed a prototype predictive scoring model to identify the high-potential customers. Mapping the results of the model to existing sales effort demonstrated that more than 50% of sales force time was being used ineffectively and that the client was leaving a lot of dollars on the table.
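The step of mapping the model’s results to existing sales effort can be sketched as a simple calculation. This assumes each customer has a predicted potential score from the model and a count of sales calls from the CRM; the function name, threshold and data are hypothetical and are not the client’s actual figures.

```python
def effort_misallocation(customers, score_threshold):
    """Share of total sales calls spent on customers whose predicted potential
    score is below the threshold.

    customers: list of (customer_id, model_score, sales_calls) tuples.
    """
    total_calls = sum(calls for _, _, calls in customers)
    if total_calls == 0:
        return 0.0
    low_potential_calls = sum(
        calls for _, score, calls in customers if score < score_threshold
    )
    return low_potential_calls / total_calls


if __name__ == "__main__":
    # Hypothetical prototype output joined with call counts from the CRM.
    customers = [("A", 0.9, 10), ("B", 0.8, 5), ("C", 0.2, 12), ("D", 0.1, 13)]
    share = effort_misallocation(customers, score_threshold=0.5)
    print(f"Share of calls on low-potential customers: {share:.1%}")
```

A single number like this is often what makes a prototype persuasive to senior executives, because it ties the model directly to how the sales force spends its time today.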

However, for organizations to see a bottom-line benefit, adoption of the predictive analytics based solution is key. Piloting helps refine the prototype and plan for potential adoption pitfalls among end users. At our health care client, we knew that there were skeptics among the salespeople who did not trust the model, and there were change management blind spots that we wanted to discover before the national rollout. We designed a pilot with the following objectives:

  1. Prove the validity of the predictive model
  2. Create evangelists from the sales team of the pilot regions
  3. Identify the major data gaps and establish a process for continually refining CRM data
  4. Establish and refine the key performance metrics to report to senior management
  5. Understand the key questions and concerns of the sales team in adopting the system

We collected a lot of rich quantitative and qualitative data during the pilot phase, which not only conclusively proved the value of the predictive model but also gave us insights to incorporate into the rollout process. For instance, we learned that in a few cases customer address data was not getting updated in the data warehouse, and that sales managers wanted to understand the factors behind a customer’s predictive score before they felt comfortable using it.

Scaling the pilot requires cross-organizational coordination and strong program management to ensure that the pilot learnings are incorporated into the rollout, that there is positive word-of-mouth buzz for the solution, and that there is minimal impact on day-to-day business. The inputs from the pilot helped us better design the compensation rules and reporting metrics, which helped us roll out a system that had the trust of the sales force.

Our client saw a significant uplift in revenue in the first 3 months of the rollout. The sales organization started realizing the value of a data-driven approach and hired a team to support other sales analytics initiatives.

Are there other tips and tricks which you have successfully used to deploy predictive analytics solutions?

Theme 6: Delivering the prediction at the point of decision is critical

In 2007, my wife worked as a hospitalist in Charlotte. Around that time, I noticed a strange pre-work ritual: she printed out a few pages from the Wal-Mart website before going to work. The behavior was surprising, and one day I asked her about the mysterious printouts. It turns out that she was printing the list of generic drugs covered by Wal-Mart’s $4 prescription plan.

Like most physicians, she struggled with the fact that some of her patients were not complying with their medication regimens because they could not afford the medicines she was prescribing. Wal-Mart had introduced a plan under which it sold some common generic drugs at $4 per prescription. Many patients could afford the Wal-Mart medicines, and prescribing from the list of Wal-Mart covered drugs was popular among doctors.

However, there was one issue. The Wal-Mart covered drug list was not integrated with Epocrates, the mobile clinical decision support software that most doctors use to verify drug dosage and interactions before writing prescriptions. At the time of writing a prescription, the doctor did not know whether the specific drug was covered by the Wal-Mart plan unless she made the extra effort to carry a printout of covered drugs and refer to it before writing every prescription. A great idea suddenly became less attractive to act upon, because the right information was not available at the point of decision. I refer to this as the last mile decision delivery problem of predictive analytics projects.
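Solving the last mile problem here means doing the coverage lookup inside the tool the doctor already uses at the moment of prescribing, instead of on a printout. The sketch below assumes a hypothetical covered-drug set and a hypothetical check run when the prescription is written; the drug names are illustrative only, and this is not how Epocrates or Wal-Mart actually implemented the integration.

```python
# Hypothetical $4 formulary; names are illustrative, not the actual Wal-Mart list.
COVERED_4_DOLLAR = {"metformin", "lisinopril", "simvastatin"}


def check_prescription(drug, covered=COVERED_4_DOLLAR):
    """Hypothetical hook run at the moment a prescription is written,
    inside the prescriber's existing workflow (no printout needed)."""
    name = drug.strip().lower()
    if name in covered:
        return f"{drug}: covered by the $4 generic plan"
    return (f"{drug}: NOT on the $4 list; "
            "consider a covered alternative if cost is a barrier")


if __name__ == "__main__":
    for drug in ["Metformin", "atorvastatin"]:
        print(check_prescription(drug))
```

The lookup itself is trivial; the hard part, as the rest of this post argues, is getting it embedded in the decision maker’s workflow.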

Most of the effort in analytics projects is spent on defining the problem, aggregating data, and building and testing models. Getting the model’s output to the decision maker at the point of decision is often an afterthought. However, the benefits of the project depend on solving this critical last mile problem.

In my experience, decision delivery is challenging because it requires cross-organizational coordination. Successful analytics projects are a partnership between the analytics, business and IT groups. The analytics group needs to work very closely with decision makers or end users to put the analysis results in the context of the decision maker’s workflow. The actual delivery of the information may be through a mobile handheld device for a distributed sales force, CRM system integration for call centers, or executive dashboards delivered through reporting system integration. All of these require close collaboration with the IT group, which has to take the results of a predictive model and integrate them with the relevant front-end or reporting infrastructure. Then there is end-user training to ensure the end users know what to do with the new information. The program management effort required to execute such a cross-organizational initiative is significant and very often not anticipated or planned for by the project sponsors.

A good program manager is critical to most complex predictive analytics projects. He/she coordinates the various stakeholders to align on problem definition, outcome format, technology integration and training, which drives user adoption of the predictive analytics solution. Keep this in mind as you plan your predictive analytics initiatives.

Have you seen predictive analytics projects derailed by a lack of coordination between groups within the organization or by underinvestment in program management resources?

PS: Last year Wal-Mart solved its last mile problem and integrated the covered $4 prescription drug list into the Epocrates application.

Previous parts of the series are available here: part 1, part 2, part 3, part 4, and part 5

Cross posted on TheInformationAdvantage.com