
Early Warning Systems: The Unbeatable Team of Human and AI



Early Warning Systems (EWS) have proven useful in navigating economic uncertainty in the financial sector by assimilating increased volumes of data and enabling faster, better-informed decision-making.

In this blog, we consider how the application of Natural Language Processing (NLP) to assimilate broader and larger datasets can lead to further enhancements in credit risk identification. The latest generation of EWS has the following characteristics:

  1. Higher precision and greater consistency than humans. For instance, our machine learning model (Risk Alert) achieved an accuracy score of over 80% in the most recent study, compared with a human annotator's accuracy rate of just 57%. 
  2. Humans remain integral to the process: they are ultimately needed to determine the most appropriate credit action in response to an EWS trigger, and they are vital to the continued training of an automated EWS, leading to more accurate results in the future. 

Background: Identifying Risk through Text Annotation of Global News

Globally, the use of text scraping tools has become increasingly common, especially for risk identification purposes. Generative AI technology has further enabled collected information to be made more easily available to end users. 

Risk Alert (our EWS with NLP processing) undertakes text annotation of global news sources to predict emerging risks. Its NLP algorithm scans, analyses, and annotates news sources, capturing large volumes of high-frequency, unstructured data and transforming it into structured insights that the risk models within Risk Alert then use to predict credit deterioration. 

Text annotation is the categorisation of text data to create labelled datasets, which serve as the foundation for supervised learning and, in turn, enable the development of intelligent systems in NLP algorithms. For example, if an article stated, ‘the company faces heightened risk of increased production and distribution expenses due to inflation,’ the NLP model would search the taxonomy for the risk type with the closest definition and categorise the phrase as a ‘financial risk.’ The diagram below outlines the process:

Figure 1: Application of NLP to News Data
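The taxonomy lookup described above can be sketched in a few lines. The mini-taxonomy, its definitions, and the word-overlap scoring below are purely illustrative assumptions, not Risk Alert's actual taxonomy or matching algorithm (which is a trained NLP model, not a keyword match):

```python
def annotate(phrase: str, taxonomy: dict[str, str]) -> str:
    """Return the taxonomy label whose definition best overlaps the phrase.

    Illustrative only: scores by Jaccard word overlap, a stand-in for the
    semantic matching a trained NLP model would perform.
    """
    phrase_words = set(phrase.lower().split())

    def overlap(definition: str) -> float:
        def_words = set(definition.lower().split())
        return len(phrase_words & def_words) / len(phrase_words | def_words)

    return max(taxonomy, key=lambda label: overlap(taxonomy[label]))


# Hypothetical mini-taxonomy: risk label -> short definition.
TAXONOMY = {
    "financial risk": "increased costs expenses debt or inflation pressure",
    "operational risk": "supply chain disruption production failure outages",
    "legal risk": "lawsuits regulatory fines compliance breaches",
}

phrase = ("the company faces heightened risk of increased production "
          "and distribution expenses due to inflation")
print(annotate(phrase, TAXONOMY))  # → financial risk
```

The example phrase shares the words ‘increased’, ‘expenses’, and ‘inflation’ with the financial-risk definition, so that label scores highest.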

As with any NLP-based sentiment analysis, some false positives will occur; in the case of an EWS, this means the identification of increased risk where none exists. We employ human-assisted learning to increase model precision and reduce the number of false positives, drastically improving the model's ability to annotate articles correctly. 

Testing Methodology: Benchmarking Accuracy of an NLP Model Against Manual Annotation

To assess the performance of our NLP model, we conducted a benchmarking exercise comparing the model's annotations with those of human annotators. Three metrics were used to assess the results: 

  • Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. High precision indicates a low false positive rate. 
  • Recall is the ratio of correctly predicted positive observations to all observations in the actual class, measuring the ability of a model to correctly identify all relevant instances.
  • F1-score is the harmonic mean of precision and recall. It takes both false positives and false negatives into account and represents overall accuracy.
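The three metrics above can be computed directly from paired predicted and actual labels. A minimal sketch follows; the eight toy labels are invented for illustration and are not the study's data:

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 for binary labels (1 = risk flagged)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for eight articles (illustrative only).
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 1, 0, 1, 0]
p, r, f = precision_recall_f1(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → precision=0.75 recall=0.75 f1=0.75
```

Here the model flags one article with no actual risk (a false positive, lowering precision) and misses one actual risk (a false negative, lowering recall).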

In a benchmark study, two human annotators independently labelled 2,136 news articles. The information from the annotation was used to provide feedback to the model through four iterations. With each iteration, the model's performance increased across each of the metrics. The results for each metric are presented in the chart below, in comparison to the agreement rate between the human annotators.
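The human-agreement baseline used here is simply the fraction of articles on which the two annotators assigned the same label. A minimal sketch, using hypothetical annotator labels rather than the study's data:

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of articles on which two annotators assign the same label."""
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


# Hypothetical labels from two annotators for seven articles.
annotator_1 = ["financial", "operational", "financial", "legal", "none", "financial", "none"]
annotator_2 = ["financial", "financial", "financial", "legal", "none", "operational", "none"]
print(f"{agreement_rate(annotator_1, annotator_2):.0%}")  # → 71%
```

More robust agreement measures (e.g. Cohen's kappa) correct for chance agreement, but the raw rate is the simplest baseline against which to compare model performance.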

Analysis: How NLP Models Outperform Human Annotation

Over four years we have performed four model iterations (rounds). As shown in Figure 2, the F1-score reached 88% at the end of round 4, indicating a very high overall accuracy for the model relative to the human-annotator agreement baseline (57%).

Figure 2: Model Performance Growth Against Human Agreement Baseline

Annotation is a powerful tool that can improve the accuracy of NLP models beyond risk classification and has already been used successfully in Sentiment Analysis and Named Entity Recognition (NER). 

Risk Alert: Implementing Text Annotation for Early Warning Systems

Risk Alert, our proprietary solution, leverages the high accuracy of the NLP model. Using millions of news data sources spanning all global jurisdictions and multiple languages, it automatically analyses and identifies risks. This forward-looking monitoring approach enables portfolio managers to identify borrower-related threats sufficiently far in advance to allow proactive risk mitigation and management.

Risk Alert webpage: Risk Alert: The Early Warning System of the Future | Deloitte UK