Machine learning algorithms trained on large data sets have proven useful for spotting past patterns, particularly in stable environments such as image databases or board games.
When it comes to messy, real-world data, however, critics say those same ML algorithms often fall short: they are rigid and unable to adapt. “Machine learning algorithms perform remarkably poorly on time-series predictions,” assert researchers at causaLens, a platform developer that offers real-time economic predictions.
Current machine learning platforms largely fail to provide reliable time-series predictions because “correlations that have held in the past may simply not continue to hold in the future,” the London-based company notes. That’s a particular problem in fields like finance and business, where time-series data is ubiquitous.
Such correlations typically link one variable to another and are ill-suited to capturing context or complex relationships. For example, an algorithm given a data set of dairy commodity prices and asked to predict the price of cheese may settle on butter prices as a guide to the cost of limburger.
Eluding the algorithm is a fundamental relationship behind dairy prices: the hidden common cause of price spikes for both cheese and butter is the cost of milk. A sudden change in the price of butter that stems from something else, such as consumers switching to olive oil, is unrelated to milk prices and therefore says nothing about the price of cheese. Hence, the spurious correlation between butter and cheese can’t be used to predict the latter’s price.
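The butter-and-cheese trap can be reproduced with a toy simulation. This is a minimal sketch, not the company’s method: every price, coefficient, and noise level is invented, and milk is assumed to be the only driver of cheese prices. A straight line fitted to cheese versus butter works while butter tracks milk, then fails once a hypothetical olive-oil shock moves butter independently; a line fitted on milk keeps working.

```python
import random
import statistics


def simulate(n, olive_oil_shock=False, seed=0):
    """Generate hypothetical prices in which milk drives both butter and cheese.

    All coefficients are illustrative assumptions. With olive_oil_shock=True,
    butter prices also move by a random amount unrelated to milk, standing in
    for consumers switching to olive oil.
    """
    rng = random.Random(seed)
    milk, butter, cheese = [], [], []
    for _ in range(n):
        m = rng.gauss(1.0, 0.2)  # milk price: the hidden common cause
        shock = rng.gauss(0.0, 1.0) if olive_oil_shock else 0.0
        milk.append(m)
        butter.append(3.0 * m - shock + rng.gauss(0.0, 0.05))
        cheese.append(5.0 * m + rng.gauss(0.0, 0.05))  # cheese ignores butter
    return milk, butter, cheese


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, x, y):
    """Average absolute gap between the line's predictions and actual y."""
    slope, intercept = model
    return statistics.fmean(abs(slope * a + intercept - b) for a, b in zip(x, y))


def compare(n=1000):
    """Fit on a calm period, then test after an olive-oil shock hits butter."""
    train_milk, train_butter, train_cheese = simulate(n, seed=1)
    test_milk, test_butter, test_cheese = simulate(n, olive_oil_shock=True, seed=2)

    butter_model = fit_line(train_butter, train_cheese)  # correlation-based
    milk_model = fit_line(train_milk, train_cheese)      # uses the true cause

    return (
        mean_abs_error(butter_model, train_butter, train_cheese),
        mean_abs_error(butter_model, test_butter, test_cheese),
        mean_abs_error(milk_model, test_milk, test_cheese),
    )


if __name__ == "__main__":
    train_err, butter_err, milk_err = compare()
    print(f"butter model, calm period: {train_err:.3f}")
    print(f"butter model, after shock: {butter_err:.3f}")
    print(f"milk model, after shock:   {milk_err:.3f}")
```

The butter model looks accurate on the data it was fitted to; only after the shock does its error balloon, while the milk model is unaffected. The exact figures depend on the invented parameters and random seeds.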
In a recent research bulletin, the company touts its “causal AI” framework as looking beyond correlations to learn obvious relationships and then “propose plausible hypotheses about more obscure chains of causality.” The approach lets data scientists add domain knowledge and real-world context to improve predictive analytics.
Indeed, new open source libraries have emerged that seek to help data scientists and domain experts develop adaptable models based on causal relationships rather than data correlations alone. For example, the CausalNex library released earlier this year allows data dependencies to be expressed in network graphs that can be scanned by domain experts to eliminate spurious correlations in machine learning models.
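The graph-review idea can be illustrated in plain Python. This sketch does not use the CausalNex API: the edge list, the weights, and the helpers `prune` and `parents` are hypothetical. A correlation-driven structure-learning step is assumed to have proposed weighted edges; a domain expert rejects the edge known to be spurious, and the cleaned graph names milk as the only direct cause of cheese.

```python
# Hypothetical structure proposed from data: (cause, effect) -> edge weight.
# Names and weights are illustrative, not output from any real library.
LEARNED_EDGES = {
    ("milk", "butter"): 0.9,
    ("milk", "cheese"): 0.9,
    ("butter", "cheese"): 0.8,  # spurious: butter and cheese share a cause
    ("olive_oil", "butter"): 0.4,
}


def prune(edges, forbidden, threshold=0.0):
    """Drop edges an expert rejects and any whose |weight| is below threshold."""
    return {
        edge: weight
        for edge, weight in edges.items()
        if edge not in forbidden and abs(weight) >= threshold
    }


def parents(edges, node):
    """Return the sorted direct causes of node in the edge dictionary."""
    return sorted(src for (src, dst) in edges if dst == node)


if __name__ == "__main__":
    expert_rejects = {("butter", "cheese")}
    print(parents(LEARNED_EDGES, "cheese"))                         # ['butter', 'milk']
    print(parents(prune(LEARNED_EDGES, expert_rejects), "cheese"))  # ['milk']
```

Keeping expert knowledge as an explicit set of forbidden edges, rather than hand-editing weights, makes the reviewer’s judgment easy to audit and to reapply if the structure is relearned from new data.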
CausalNex is its developer’s second open source release, after Kedro, a library aimed at production ML code. The new library applies “what-if” analysis to Bayesian networks on the assumption that a probabilistic model describes causality more intuitively than traditional ML frameworks based on correlation analysis and pattern recognition.
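The difference between observing and intervening, which is what a “what-if” query adds over plain correlation, can be shown on a tiny Bayesian network. This is a minimal sketch assuming a hypothetical four-node dairy model with made-up probabilities, solved by brute-force enumeration; it does not use CausalNex’s own inference tooling. Conditioning on high butter prices raises the probability of high cheese prices, because it hints that milk is expensive, whereas forcing butter prices up (written do(butter = high)) leaves cheese at its baseline. Intervening on milk does move cheese.

```python
import itertools

# Hypothetical conditional probability tables; every number is an assumption.
# Each entry gives P(variable is "high"/True | values of its parents).
BUTTER_HIGH = {  # keyed by (milk_high, olive_oil_shift)
    (True, False): 0.9,
    (True, True): 0.5,
    (False, False): 0.3,
    (False, True): 0.1,
}
CPDS = {
    "milk": lambda a: 0.5,
    "olive_oil": lambda a: 0.3,  # consumers shift to olive oil
    "butter": lambda a: BUTTER_HIGH[(a["milk"], a["olive_oil"])],
    "cheese": lambda a: 0.85 if a["milk"] else 0.15,  # depends on milk only
}


def joint(do=None):
    """Yield (assignment, probability) for every combination of variable values.

    Variables named in `do` are intervened on: their usual table is replaced by a
    point mass on the forced value, cutting them off from their causes.
    """
    do = do or {}
    names = list(CPDS)
    for values in itertools.product((True, False), repeat=len(names)):
        a = dict(zip(names, values))
        p = 1.0
        for name in names:
            if name in do:
                p *= 1.0 if a[name] == do[name] else 0.0
            else:
                p_true = CPDS[name](a)
                p *= p_true if a[name] else 1.0 - p_true
        yield a, p


def prob(query, evidence=None, do=None):
    """P(query | evidence) in the network, optionally after a do-intervention."""
    evidence = evidence or {}
    num = den = 0.0
    for a, p in joint(do):
        if all(a[k] == v for k, v in evidence.items()):
            den += p
            if all(a[k] == v for k, v in query.items()):
                num += p
    return num / den


if __name__ == "__main__":
    cheese_high = {"cheese": True}
    observed = prob(cheese_high, evidence={"butter": True})
    forced_butter = prob(cheese_high, do={"butter": True})
    forced_milk = prob(cheese_high, do={"milk": True})
    print(f"P(cheese high | butter high)     = {observed:.3f}")       # 0.685
    print(f"P(cheese high | do(butter high)) = {forced_butter:.3f}")  # 0.500
    print(f"P(cheese high | do(milk high))   = {forced_milk:.3f}")    # 0.850
```

A correlation-only model would read the first number as butter’s influence on cheese; the intervention shows that, in this toy network, there is none.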
Causal AI proponents also argue that, because the framework can simulate different scenarios, their approach makes better use of data and yields more accurate predictions.
“Conventional machine learning approaches are, quite literally, stuck in the past,” the company concludes. “They are fooled by illusory patterns and are unable to quickly adapt to new conditions.”