The Trust Dilemma: Overcoming LLM Hallucinations in Financial Services

This is a guest post from Laurence Moroney, Chainlink Advisor and former AI Lead at Google.

In recent years, large language models (LLMs) have become synonymous with artificial intelligence (AI), spurring massive investment and interest. However, the impressive capabilities afforded by LLMs are offset by a severe caveat: the tendency to ‘hallucinate’, or generate false and misleading information. This phenomenon poses significant trust challenges in high-stakes domains like financial services, where accuracy and reliability are paramount, and techniques to overcome it present a massive opportunity. The intersection of AI and technologies like blockchain, where trust and integrity are baked into the platform, could be the solution.

First, let’s examine the problem of hallucinations and then explore why this new industry initiative with Chainlink, Euroclear, Swift, and six financial institutions is so transformative.

Understanding LLM Hallucinations

A large language model is a predictor of tokens—fundamental units of text or data. Trained on massive amounts of text using a transformer architecture that learns sequence-to-sequence patterns, LLMs like OpenAI’s GPT, Google’s Gemini, and Anthropic’s Claude have proven to be excellent models for artificially understanding and generating text. But, given their artificial nature, they don’t truly understand their outputs; they simply predict the most statistically likely next token.

Consider the phrase from a popular children’s song: If you are happy and you know it.

In your brain, you have learned what comes next. It’s likely the words “clap,” “your,” and “hands.” The transformer architecture mimics this. 

In some cultures, however, the next word is not “clap” but “you,” and they sing, “If you are happy and you know it, you clap your hands.” So, if one is predicting the next token based on a training corpus where most instances don’t use “you” but some do, the model would assign a high likelihood to “clap,” a lower likelihood to “you,” and very low likelihoods to all other words.
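
To make this concrete, here is a minimal sketch in Python that treats next-token prediction as simple frequency counting over a toy corpus. A real transformer learns far richer statistics over billions of tokens, but the principle of ranking likely continuations is the same; the corpus and probabilities below are illustrative only.

```python
# A toy illustration of next-token prediction as frequency counting.
# Real LLMs learn these statistics with a transformer over vast corpora;
# this four-sentence corpus exists only to make the numbers visible.
from collections import Counter

corpus = [
    "if you are happy and you know it clap your hands",
    "if you are happy and you know it clap your hands",
    "if you are happy and you know it clap your hands",
    "if you are happy and you know it you clap your hands",
]

prefix = "if you are happy and you know it".split()

# Count which token follows the prefix in each training sentence.
counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    if tokens[: len(prefix)] == prefix:
        counts[tokens[len(prefix)]] += 1

# Turn the counts into a probability distribution over the next token.
total = sum(counts.values())
for token, count in counts.most_common():
    print(f"P({token!r} | prefix) = {count / total:.2f}")
# -> P('clap' | prefix) = 0.75
# -> P('you' | prefix) = 0.25
```

A model that always emits the highest-probability token would sing “clap” every time; that same behavior becomes a hallucination when the most common continuation in the training data is not the true one.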

And this is for a well-known phrase. Now consider what happens when a model trained on text like this is asked to predict the next token for something it has never seen before, such as a just-published news story or corporate action: “Company X today announced a stock split of…” How would the LLM predict the next token? From its corpus, it has likely seen very similar phrases many times, but with many different subsequent tokens: “twenty to one,” “ten to one,” “one to ten,” and so on.

The LLM would calculate the most likely next token from the most common continuation in its training set and output that (just like “clap” instead of “you” for the children’s song). For example, it might output a phrase like “Company X today announced a stock split of ten to one.”

If the reality is that Company X is factually doing a six-to-one split, we now have a hallucination!

In our scenario, however, the core use of an LLM is not to generate content like this but to parse existing content, such as a PDF of the corporate action where the stock split is mentioned. We can have it artificially understand the contents on our behalf so that we can question it. It is important to note that the underlying hallucination issue still applies: the text of the PDF might say that the split is six-to-one, but the LLM could hallucinate ten-to-one based on its statistical next-token analysis. The answer it gives when you ask about the PDF is still generated token by token from the LLM’s best guesses.
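
As a rough sketch of what that parsing step could look like, the snippet below asks a single model to return structured JSON for a document. The call_llm argument is a hypothetical stand-in for any chat-completion API, and the prompt and field names are illustrative assumptions, not the initiative’s actual schema.

```python
# A minimal sketch of single-LLM document parsing. `call_llm` is a
# hypothetical stand-in for any chat-completion API; the prompt and the
# JSON fields are illustrative assumptions only.
import json

PROMPT = (
    "Read the corporate action below and respond ONLY with JSON of the "
    'form {"action": "...", "split_ratio": "..."}.\n\nDocument:\n'
)

def extract_action(document: str, call_llm) -> dict:
    raw = call_llm(PROMPT + document)
    # Even with a JSON-only instruction, the model is still predicting
    # tokens: it can answer "ten to one" when the document says
    # "six to one", so a single output cannot be trusted on its own.
    return json.loads(raw)
```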

The Peril of Hallucinations in Financial Services

Trusting an LLM blindly is a big mistake for the reasons demonstrated above. For financial services, the consequences could include:

  • Misinformed Decision Making: Inaccurate data could lead to flawed risk assessments, suboptimal investment strategies, and inefficient capital allocation
  • Regulatory and Reporting Issues: False information could lead to unintentional violations of regulatory and reporting requirements
  • Erosion of Trust: Clients or stakeholders discovering that an institution relies on unreliable, AI-generated information could severely damage trust and reputation
  • Financial Losses: Hallucinated data leading to bad advice or forecasting could result in significant monetary losses

Thus, fully embracing LLMs for financial operations is fraught with risk. The need for accuracy and reliability in financial data and advice makes the current state of LLM technology challenging to integrate safely into many core processes.

Blockchain: A Path to Trust and Verifiability

While LLMs present challenges in accuracy and reliability, blockchain technology, with its core attributes of trust and verifiability, may be the key to a solution. Blockchain’s decentralized and immutable ledger system provides a framework for recording and verifying information that could be leveraged to help mitigate the risks associated with LLM hallucinations. Let’s explore how that might work, beginning with the idea of consensus.

Consensus: A Method for Trust

The scientific process begins with a theory, which is then supported with experimental evidence. That evidence is reviewed by trusted peers who arrive at a consensus. Opinions may vary, but when most peers agree that the experimental evidence underpinning the theory is valid, the scientific discovery is validated and becomes the current ground truth.

Inspired by this process, Chainlink implemented a novel technique to overcome the risks of hallucination. 

Chainlink used several LLMs, each prompted to artificially understand the contents of a corporate action and output them in machine-readable JSON format. Instead of trusting a single prompt to a single LLM, the idea was to use a swarm of LLM-prompt combinations producing a variety of results.

The consensus could then be measured. If they all produced the same result, we could begin to trust it, and it could be placed on the blockchain as a unified golden record. This is a verifiable, persistent, updateable, and interoperable data container that is synchronized across blockchains.

Of course, if consensus is not attained, a manual process could be used to establish the ground truth and then publish it as a unified golden record.
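
A minimal sketch of that consensus loop, building on the extraction snippet above (the extractors are assumed to be functions like extract_action, each wrapping a different LLM or prompt variant, and the unanimity threshold is my reading of the approach, not a published parameter):

```python
# A minimal sketch of LLM consensus: run several LLM-prompt combinations,
# canonicalize their JSON outputs, and trust a record only when enough of
# them agree; otherwise fall back to manual review.
import json
from collections import Counter

def consensus_extract(document: str, extractors, threshold: float = 1.0):
    """`extractors` are functions like `extract_action` above, each
    wrapping a different LLM and/or prompt variant."""
    # Canonicalize each output so that JSON key order cannot break the vote.
    results = [json.dumps(e(document), sort_keys=True) for e in extractors]
    record, votes = Counter(results).most_common(1)[0]
    if votes / len(results) >= threshold:
        return json.loads(record)  # candidate unified golden record
    return None  # no consensus: route to manual verification
```

With threshold=1.0 the sketch demands unanimity, matching the “if they all produced the same result” criterion above; a production system would also need to log disagreements and handle malformed model output.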

This process greatly lowered the risk of hallucination, increasing trust in the automated pipeline while reducing its costs. Publishing the findings on-chain means that all parties can trust the data going forward.

Thus, an end-to-end system for converting unstructured data to highly trusted unified golden records is attainable. Much of this system could be automated, increasing trust and reducing the costs and risks associated with using LLMs in financial services.

Chainlink used this process in an industry initiative conducted alongside Euroclear, Swift, and six major financial institutions. This project demonstrated the automation of taking unstructured financial data, artificially understanding it with LLMs to produce on-chain golden records, and avoiding the risks of LLM hallucination. 

Given the lack of standardization in reporting processes for corporate actions, significant human capital is needed to read diverse document types and extract the data for these events. 75% of firms have to revalidate this data manually, and these inefficient processes cost businesses many millions of dollars.

Transforming Asset Servicing With AI, Oracles, and Blockchains

Chainlink’s approach to solving this problem is detailed in Transforming Asset Servicing With AI, Oracles, and Blockchains, which shows very encouraging results at the prototype stage:

  • Data Extraction and Structuring: It establishes a novel data extraction and structuring process that leverages unstructured data from public company sources and turns it into structured data that adheres to frameworks such as SMPG
  • Consensus Framework: It successfully demonstrated an LLM consensus framework for financial data comparing the outputs of multiple LLMs, greatly enhancing the reliability of their outputs and mitigating the hallucination risks
  • Near Real-Time Data Distribution: Once the consensus data was established, Chainlink’s industry initiative propagated it across multiple blockchain ecosystems and stored it as unified golden records in smart contracts. This makes the data accessible to market participants and provides a framework for them to build new applications on top of (an illustrative record is sketched below).
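
For illustration only, a golden record emerging from this pipeline might look something like the sketch below. Every field name and value here is hypothetical; a real record would follow the structured formats referenced in the report, such as SMPG, rather than this shape.

```python
# A hypothetical shape for a unified golden record after consensus.
# Field names and values are illustrative assumptions, not the
# initiative's actual schema.
golden_record = {
    "issuer": "Company X",
    "action_type": "stock_split",
    "split_ratio": "6:1",
    "sources": ["<issuer press release>", "<regulatory filing>"],
    "consensus": {"models_queried": 5, "models_agreed": 5},
}
```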

Conclusion

We are only at the beginning of the AI revolution. It can be compared to the Internet at the dial-up stage. As novel solutions to existing problems arise, the opportunity to build more and better solutions becomes clearer. 

In this study, the power of AI and LLMs, held back by the risk of hallucination, was unleashed by a novel combination of multi-LLM data extraction for consensus and publication of that consensus on a trusted, verifiable blockchain. As LLMs evolve and hopefully improve, the underlying technique of driving consensus and publishing the established consensus on-chain as a golden record will continue to show value.

Chainlink’s industry initiative is a very early prototype of what could be a powerful solution that opens many new opportunities for AI, blockchain, and financial services to build better, together.
