Curious Productivity

Data Science & Unintended Consequences: The Cobra Effect

It’s the 19th century, and the British have a cobra problem. 

India is under British rule, and the city of Delhi is filled with venomous snakes. The British decide to address the problem by introducing a bounty for each dead cobra. At first, this seems like a brilliant solution. People begin hunting cobras and presenting them to authorities in exchange for a bounty. The cobra population decreases as intended, and the clever British pat themselves on the back. 

However, the people of Delhi now see an opportunity: they begin to breed cobras and kill them for bounties. Eventually, the British catch on, and they cancel the bounty program. The breeders, left with a surplus of suddenly worthless cobras, release them. As a result, Delhi’s cobra population increases to pre-bounty program levels. 

This is known as the Cobra Effect—a situation where a solution worsens the problem it aims to solve. It’s a cautionary tale that’s especially relevant in data-driven fields like data science. As we increasingly rely on data to make decisions, it’s crucial to be aware that this effect can happen to anyone.

Here are three common situations in which the Cobra Effect appears in data-driven environments.

#1: Siloed Data Models

A large company has different products, each managed by a different department. Each department wants to make the best decisions possible, so they create data models to understand key metrics, such as customer lifetime value and price sensitivity trends. 

Sales improve for each product, and things seem to be running smoothly. But these data models don’t talk to each other, and over time inefficiency creeps in and growth stalls. Each department makes decisions that are effective for itself but not for the company as a whole, leading to sub-optimal results.

This is a common problem for organizations experiencing rapid growth. The intention of each department is to make data-backed decisions, but with siloed data models, the company is unlikely to be successful. 

Solution

The best solution is to develop a unified system for collecting, storing, and managing data across the entire organization, ensuring that models are coordinating and working together seamlessly. This means that everyone in the organization will have access to the same data, which supports informed decision-making with a holistic view of the business. 

#2: No Single Source of Truth

A company wants to make better decisions using data, so leaders have dashboards built to show data across a range of KPIs. However, the data sources and assumptions used to create these reports and dashboards are different, which leads to inconsistencies and conflicts. 

As a result, leaders ask for even more data, but this only worsens the problem because the underlying issue is not resolved. Conflict spikes and decision-making stalls, leaving the organization worse off than it was before.

Solution

Creating a coordinated approach to data reporting across the entire organization is often the best way to solve this issue. This involves creating a centralized data source with standardized assumptions and definitions. 

By establishing a single source of truth, all dashboards and reports are based on the same data, enabling leaders to resolve conflicts with data and make more informed and accurate decisions. 
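One common way to put this into practice is to define each metric exactly once in shared code that every report imports, rather than letting each dashboard re-implement its own version. A minimal sketch, assuming a hypothetical "active user" metric (the names, the 30-day window, and the dates are illustrative, not from the article):

```python
# Sketch of a "single source of truth" pattern: metric definitions live
# in one shared module, so every dashboard computes "active user" the
# same way. All names and thresholds here are illustrative assumptions.
from datetime import date, timedelta

# Shared, documented assumption -- defined once, imported everywhere
ACTIVE_WINDOW_DAYS = 30  # a user is "active" if seen in the last 30 days

def is_active(last_seen: date, as_of: date) -> bool:
    """The single agreed definition of an active user."""
    return (as_of - last_seen) <= timedelta(days=ACTIVE_WINDOW_DAYS)

def monthly_active_users(last_seen_dates, as_of):
    """Every report calls this instead of re-implementing the rule."""
    return sum(is_active(d, as_of) for d in last_seen_dates)

as_of = date(2024, 1, 31)
users = [date(2024, 1, 30), date(2023, 12, 1), date(2024, 1, 5)]
print(monthly_active_users(users, as_of))  # -> 2
```

Because every dashboard calls the same function, two reports can no longer disagree about how many users are "active"; changing the definition means changing one line, not hunting through dozens of queries.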

#3: Faulty Predictive Models 

A streaming service develops a predictive model to identify users likely to cancel their subscriptions so it can proactively offer them deals and entice them to stay. 

The data science team builds a sophisticated model, spending weeks collecting, cleaning, processing, and modeling a large volume of training data. However, it fails badly when deployed. It mislabels customers and offers deals to people who have no intention of unsubscribing. Meanwhile, the segment of the customer base that is likely to leave is ignored, and unsubscribes as a result. Asked for an explanation, the data science team is unable to pinpoint the problem. The model is so complex that it is a veritable black box.

In this situation, ask these three questions:

  1. Were domain experts consulted when the model was built? In the above scenario, it’s likely that the data science team did not align with the business, so important factors were excluded from the training data. Without proper input from domain experts, models can overlook critical factors or rest on incorrect assumptions that lead to poor performance in real-world scenarios. 
  2. Did the data science team incorporate feedback from its customers into the model? By understanding why customers are canceling their subscriptions, the model can be significantly improved to better predict which customers are at risk of being lost.
  3. Was external validation performed? It’s essential to validate the model’s performance on a holdout dataset or with real-world data. This is important to prevent overfitting, where a model performs well on the training data but poorly on new, unseen data.
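The overfitting failure in question 3 is easy to demonstrate. A minimal, stdlib-only sketch (the toy "watch hours" feature, the 1-nearest-neighbour model, and all numbers are illustrative assumptions, not the streaming service's actual model): a model that memorizes its training data looks perfect on data it has seen and noticeably worse on a holdout set it hasn't.

```python
# Illustrative sketch of why holdout validation matters. A 1-nearest-
# neighbour "model" memorizes its training data, so it scores perfectly
# on seen data while doing worse on unseen data -- the gap is the
# overfitting signal you only detect by holding data out.
import random

def make_user(rng):
    """Toy churn record: weekly watch hours + a noisy churn label."""
    hours = rng.uniform(0, 10)
    churned = int(hours < 3) ^ (rng.random() < 0.2)  # 20% label noise
    return hours, churned

rng = random.Random(0)
data = [make_user(rng) for _ in range(500)]
train, holdout = data[:400], data[400:]  # hold out 20% of users

def predict(x, train):
    # 1-NN: copy the label of the closest training user (pure memorization)
    return min(train, key=lambda t: abs(t[0] - x))[1]

def accuracy(dataset, train):
    return sum(predict(x, train) == y for x, y in dataset) / len(dataset)

# Perfect on training data (each point is its own nearest neighbour),
# markedly worse on the holdout set the model never saw.
print(f"train acc   = {accuracy(train, train):.2f}")
print(f"holdout acc = {accuracy(holdout, train):.2f}")
```

The training accuracy is 1.00 by construction, while the holdout accuracy drops well below it because the model memorized label noise. A model validated only on its training data would have looked deployment-ready when it wasn't.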

Conclusion

The Cobra Effect demonstrates that well-intentioned, data-driven solutions can sometimes worsen the problem they aim to solve. By recognizing specific scenarios where the Cobra Effect might occur, organizations can avoid falling into this counterproductive trap and make better decisions.
