The blockchain oracle problem is one of the most important barriers that must be overcome for smart contracts on networks like Ethereum to achieve mass adoption across a wide variety of markets and use cases.

As previously discussed in our Blockchain Education Series, smart contracts running on blockchains offer immense potential to redefine the way multiple parties engage in shared contractual agreements and exchange value. Operating separately from the smart contract economy is the much larger non-blockchain digital economy, made up of all the Internet-connected devices that now compute online. A byproduct of this digital infrastructure is an ever-expanding reservoir of data and APIs that provide insights into how the world works, e.g., Internet search results showing popular topics of discussion in society or IoT sensors revealing common traffic patterns.

Blockchain-based smart contracts and traditional data and API economies have immense potential to be the future building blocks of data-driven automation, but how do these two worlds connect? That question is the crux of the “oracle problem” and the focus of this article.

The article will be broken down into five key sections that:

  • Define the oracle problem
  • Outline the job of an oracle
  • Discuss why blockchains like Ethereum don’t offer native oracle solutions
  • Identify the security risks of centralized oracles
  • Introduce Chainlink, the standard for secure and reliable decentralized oracles

The Oracle Problem

The oracle problem revolves around a very simple limitation: blockchains cannot pull in data from or push data out to any external system as built-in functionality. As such, blockchains are isolated networks, akin to a computer with no Internet connection. The isolation of a blockchain is the precise property that makes it extremely secure and reliable, as the network only needs to form consensus on a very basic set of true/false questions using data already stored inside its ledger, e.g., did the public key holder sign the transaction with their corresponding private key, does the public address have enough funds to cover the transaction, and is the type of transaction valid within the particular smart contract? This narrow focus of blockchain consensus is why smart contracts are referred to as deterministic: they execute exactly as written, with a much higher degree of certainty than traditional systems.
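To make the idea of deterministic consensus concrete, the following Python sketch models transaction validation as a pure function of data already in the ledger. The Transaction type, the is_valid function, and the ledger contents are hypothetical, and signature verification is reduced to a boolean flag rather than real cryptography. Because the function reads nothing outside the ledger, every node that runs it on the same state reaches the same verdict.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int
    # Simplification: a real node verifies a digital signature (ECDSA on
    # Ethereum) here instead of reading a precomputed flag.
    signature_valid: bool


def is_valid(tx: Transaction, balances: Dict[str, int]) -> bool:
    """Answer the ledger's true/false questions using only on-chain data."""
    if not tx.signature_valid:                      # did the key holder sign it?
        return False
    if tx.amount <= 0:                              # is the transaction well formed?
        return False
    return balances.get(tx.sender, 0) >= tx.amount  # enough funds to cover it?


ledger = {"alice": 50, "bob": 5}
tx = Transaction("alice", "bob", 20, signature_valid=True)

# Five independent "nodes" evaluate the same function on the same ledger state.
verdicts = {is_valid(tx, dict(ledger)) for _node in range(5)}
print(verdicts)  # {True}
```

Note that nothing in is_valid depends on when or where it runs, which is exactly the property that external data lacks.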

However, for smart contracts to realize over 90% of their potential use cases, they must be connected to the outside world. For example, financial smart contracts need market information to determine settlements, insurance smart contracts need IoT and web data to make decisions on policy payouts, trade finance contracts need trade documents and digital signatures to know when to release payments, and many smart contracts want to settle in fiat currency on a traditional payment network. None of this information is generated within the blockchain, nor are these traditional services inherently accessible.

Bridging the connection between the blockchain (on-chain) and the outside world (off-chain) requires an additional and separate piece of infrastructure known as an ‘oracle’.

What do Blockchain Oracles Do?

A blockchain oracle is secure middleware that facilitates communication between blockchains and any off-chain system, including data providers, web APIs, enterprise backends, cloud providers, IoT devices, e-signatures, payment systems, other blockchains, and more. Oracles encompass several key functions:

  • Listen - monitor the blockchain network to check for any incoming user or smart contract requests for off-chain data
  • Extract - fetch data from one or multiple external systems such as off-chain APIs hosted on third-party web servers
  • Format - enable two systems to intercommunicate by formatting data retrieved from APIs into a blockchain readable format (input) and/or making blockchain data compatible with an external API (output)
  • Validate - create a cryptographic proof to attest to the performance of the oracle’s services using any combination of data signing, blockchain transaction signing, TLS signatures, Trusted Execution Environment (TEE) attestations, zero-knowledge proofs, and more
  • Compute - perform some type of computation on the data, such as calculating a median from multiple oracle submissions or running more complex tasks like generating an insurance quote from several types of data (personal risk profile, market rates, cost of capital, etc.)
  • Broadcast - sign and broadcast a transaction on the blockchain as a means to send data and its corresponding proof on-chain for the smart contract's use
  • Output (optional) - send data to an external system upon the execution of a smart contract, such as relaying payment instructions to a traditional payment network or affecting a cyber-physical system
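The sketch below strings several of these functions together for a single price request. It is a simplified illustration, not Chainlink's implementation: the function names, the JSON response shape, and the hard-coded sources are hypothetical, HTTP calls are replaced with lambdas, and HMAC-SHA256 stands in for the signing and attestation methods listed under Validate. Listen and Broadcast are only represented by the request passed in and the payload returned.

```python
import hashlib
import hmac
import json
import statistics
from typing import Callable, Dict, List

# Hypothetical node secret. HMAC is only a stand-in for the transaction
# signing, TLS proofs, or TEE attestations an actual oracle would use.
NODE_KEY = b"example-node-secret"


def extract(sources: List[Callable[[], str]]) -> List[str]:
    """Extract: fetch raw response bodies from one or more external APIs."""
    return [fetch() for fetch in sources]


def format_response(raw: str) -> int:
    """Format: parse a JSON body into an integer number of cents, since
    Ethereum smart contracts typically use integer arithmetic."""
    return round(float(json.loads(raw)["price"]) * 100)


def compute(values: List[int]) -> int:
    """Compute: aggregate the formatted values (median_low keeps an integer)."""
    return statistics.median_low(values)


def validate(value: int) -> str:
    """Validate: produce a proof tied to this node's key."""
    return hmac.new(NODE_KEY, str(value).encode(), hashlib.sha256).hexdigest()


def handle_request(request: Dict, sources: List[Callable[[], str]]) -> Dict:
    """Process one request picked up by Listen; the returned payload is what
    Broadcast would sign and submit on-chain."""
    values = [format_response(raw) for raw in extract(sources)]
    answer = compute(values)
    return {"request_id": request["id"], "answer": answer, "proof": validate(answer)}


# Hypothetical API responses standing in for real HTTP calls.
sources = [
    lambda: '{"price": "64000.10"}',
    lambda: '{"price": "64010.00"}',
    lambda: '{"price": "63990.55"}',
]
report = handle_request({"id": 1}, sources)
print(report["answer"])  # 6400010
```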

To offer the functions above, the oracle system must operate both on and off the blockchain simultaneously. The on-chain component is for establishing a blockchain connection (to listen for requests), broadcasting data, sending proofs, extracting blockchain data, and sometimes performing computation on the blockchain. The off-chain component is for processing requests, retrieving and formatting external data, sending blockchain data to external systems, and potentially performing computation in more advanced oracle networks.
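One way to picture the on-chain/off-chain split is a request-response loop: the on-chain side records a request, and the off-chain side notices it, does the external work, and calls back with the answer. The Python below simulates both halves in memory under that assumption; the class names, the oracle address string, and the example.com URL are hypothetical, and no real blockchain or API is contacted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OnChainConsumer:
    """Toy model of the on-chain half: emits request events and accepts
    answers only from its designated oracle address."""
    oracle_address: str
    events: List[Dict] = field(default_factory=list)
    results: Dict[int, int] = field(default_factory=dict)
    next_id: int = 0

    def request(self, url: str) -> int:
        request_id = self.next_id
        self.next_id += 1
        self.events.append({"id": request_id, "url": url})  # log the node listens to
        return request_id

    def fulfill(self, caller: str, request_id: int, value: int) -> None:
        if caller != self.oracle_address:
            raise PermissionError("only the designated oracle may respond")
        if request_id in self.results:
            raise ValueError("request already fulfilled")
        self.results[request_id] = value


class OffChainNode:
    """Toy model of the off-chain half: reads new events, calls the external
    system, and sends each answer back on-chain."""

    def __init__(self, address: str, fetch: Callable[[str], int]) -> None:
        self.address = address
        self.fetch = fetch
        self.processed = 0  # index of the next unprocessed event

    def poll(self, contract: OnChainConsumer) -> None:
        for event in contract.events[self.processed:]:
            contract.fulfill(self.address, event["id"], self.fetch(event["url"]))
        self.processed = len(contract.events)


consumer = OnChainConsumer(oracle_address="0xOracle")
rid = consumer.request("https://api.example.com/weather?city=NYC")
node = OffChainNode("0xOracle", fetch=lambda url: 21)  # stubbed API: 21 degrees
node.poll(consumer)
print(consumer.results[rid])  # 21
```

Restricting fulfill to one designated address mirrors the common pattern of only accepting answers from an authorized oracle; on a real chain that check would rely on the signed sender of the fulfilling transaction.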

Chainlink crop insurance contract
A Chainlink oracle-powered smart contract for crop insurance that takes in weather data from multiple sources, uses the aggregated data point to trigger the execution of an insurance contract on the blockchain, and pays out via any traditional payment method.
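The decision logic in the crop insurance example can be sketched in a few lines. This assumes a hypothetical parametric policy that pays out when seasonal rainfall falls below a threshold; the threshold, amounts, readings, and the three-source minimum are illustrative values, and the fiat payout itself would happen off-chain through the oracle's Output function rather than in this code.

```python
import statistics
from typing import List


def rainfall_payout(readings_mm: List[float], drought_threshold_mm: float,
                    insured_amount: int) -> int:
    """Return the payout owed for a season, given rainfall totals reported by
    several independent weather sources. The median limits how far any single
    faulty or manipulated source can move the aggregated value."""
    if len(readings_mm) < 3:
        raise ValueError("need at least three independent sources")
    aggregated = statistics.median(readings_mm)
    return insured_amount if aggregated < drought_threshold_mm else 0


# Drought season: one source is broken and reports 300 mm, but the median is 82 mm.
print(rainfall_payout([82.0, 79.5, 300.0], drought_threshold_mm=100.0, insured_amount=10_000))  # 10000
# Normal season: one source under-reports, yet the median (118 mm) is above the threshold.
print(rainfall_payout([120.0, 118.0, 5.0], drought_threshold_mm=100.0, insured_amount=10_000))  # 0
```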

Why Blockchains Can’t Solve the Oracle Problem

Blockchains are highly secure and reliable because of a few specific design principles. As described above, they only need to make decisions on very basic questions that are provably true or false, using data generated solely within their own environment. In addition, they use decentralization both to have every node in the network redundantly validate the same piece of data and to ensure that no single node or small group of nodes can change the rules of the consensus algorithm (PoW, PoS, etc.) or take over the network, e.g., through a Sybil attack or by gaining 51% of the hash power. These properties provide strong guarantees of determinism, especially in a highly decentralized and Sybil-resistant network.

However, blockchains are not well suited to answering questions that delve into the realm of subjectivity or that require external data not accessible to every node in the network. For example, simple questions like ‘What is the market price of Bitcoin?’ or ‘What is the weather in New York?’ can elicit a wide range of answers depending on which data source is queried and when. The question then becomes: what is the correct answer?
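A small Python sketch shows why this is a problem for base-layer consensus. Assume nodes only accept a value that a strict majority reported exactly, which works for yes/no questions about ledger data; the quotes below are hypothetical numbers, not real market data.

```python
from collections import Counter
from typing import Hashable, List, Optional


def exact_match_consensus(answers: List[Hashable]) -> Optional[Hashable]:
    """Accept an answer only if a strict majority of nodes reported exactly
    that value, as nodes do for questions like 'is this signature valid?'."""
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes * 2 > len(answers) else None


# On-chain question: every node computes the same boolean, so consensus is trivial.
print(exact_match_consensus([True, True, True, True, True]))  # True

# Hypothetical BTC/USD quotes fetched by five nodes from different sources at
# slightly different times: every answer is plausible, but none wins a majority.
print(exact_match_consensus([64012.5, 64010.0, 64015.2, 64009.8, 64012.1]))  # None
```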

Introducing subjectivity at the base layer of the blockchain opens a Pandora’s box of security, reliability, and governance concerns, which puts at risk the very value proposition blockchains aim to provide: unbiased determinism for computing transactions.

One major concern is how to ensure data is high quality when not every node has the same access to data. Even a basic request for the price of Bitcoin is challenging: simply looking at a website or a single exchange will not be as accurate or reliable as a paid API subscription to a professional data aggregator, which has financial incentives to maintain high-quality services and decades of experience filtering data and building market coverage that accounts for all trading environments. It is extremely difficult to manage and enforce quality control for off-chain data submitted by blockchain nodes, since anyone can run a pseudonymous node and every node has an equal opportunity to submit answers, yet not every operator will be willing to pay for a high-quality off-chain API, and the network has no easy way to require it.

Another major concern is scalability. Every time a new data source needs to be added to the network or an existing data aggregation method must be adjusted, extensive social coordination is required to get every node in the network to agree on and upgrade its software. This governance overhead increases friction, slows development of other core features of the blockchain (such as PoS and sharding), and limits the speed of oracle innovation. Ultimately, the more complexity there is at the base layer of the blockchain, the larger the attack surface and the greater the risk to every application that runs on it. Even applications that do not use oracles or are not involved in contentious data requests would get caught in the crossfire and could be disrupted if the entire chain came to a halt because of an oracle issue.

It is for these reasons, among others, that oracles are not integrated into the base layer of any major blockchain but instead operate as separate networks. This gives blockchains a smaller attack surface and lets them retain their determinism by focusing solely on blockchain consensus, while oracles gain the flexibility needed to produce determinism from a complex and subjective off-chain world without creating dependencies and limitations that put the entire chain at risk.

Centralized Oracles Introduce Major Risks

The entire point of a smart contract is to achieve determinism through technological enforcement of the contract’s terms, as opposed to probabilistic execution carried out through human enforcement. To achieve this, the blockchain cannot have any single point of failure, and that requirement must extend to the oracle so these properties hold throughout the entire end-to-end lifecycle of the contract. Why run a multi-million-dollar contract as a smart contract on a fully decentralized blockchain if a single centralized oracle can control the inputs that determine the contract’s outcome?

A centralized oracle is a central point of failure in the smart contract

Whether the development team behind the smart contract application runs the oracle itself or relies on a third-party oracle service, both scenarios give a single entity excessive power to influence the contract through its control of the oracle. Even if the centralized oracle operator acts with the best intentions, it is still subject to all the familiar problems of centralized systems, such as downtime, DDoS attacks, hacks, and accidental incompetence, all of which put users’ funds at major risk.

Even the most well-intentioned centralized entities can come under pressure once the value of the contract scales, exposing them to bribes, intimidation, and regulatory pressure, any of which only requires one person involved in operating the centralized oracle to go rogue. This model does not scale, and it doesn’t fit with the idea of decentralized infrastructure as a key driver of secure and reliable automation.
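The following sketch contrasts the two designs with hypothetical price reports. With a single oracle, one compromised operator sets the contract's input outright; with a median over independent nodes, a single bad report cannot move the result outside the range of honest values as long as fewer than half of the nodes are faulty. Median aggregation is used here as one simple illustration of decentralization, not as a complete oracle design.

```python
import statistics
from typing import List


def single_oracle(reports: List[float]) -> float:
    """Centralized design: the contract trusts whatever one operator reports."""
    return reports[0]


def decentralized_median(reports: List[float]) -> float:
    """Decentralized design: take the median of independent node reports."""
    return statistics.median(reports)


honest = [64010.0, 64012.0, 64008.0, 64011.0, 64009.0]
# One compromised operator (bribed, hacked, or coerced) reports a fake price.
attacked = [1.0] + honest[1:]

print(single_oracle(attacked))         # 1.0
print(decentralized_median(attacked))  # 64009.0
```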

In order to overcome these shortcomings, oracles need to provide the same security and reliability guarantees as a blockchain, albeit in a different manner, given how much solving the oracle problem differs from achieving blockchain consensus.

To bring determinism to the oracle layer, Chainlink has developed a decentralized oracle network offering a range of guarantees that can be combined to build customized oracle solutions for any use case.

  • Open-source - being an open-source technology allows the wider blockchain community to independently verify the security and reliability of Chainlink’s source code and functions, as well as contribute to its improvement
  • External Adapters - allowing nodes to securely store API keys and manage account logins enables smart contracts to retrieve data from any external system and API, including those that are password/credential protected
  • Decentralization - employing decentralization at the node and data source level ensures no one node or data source is a single point of failure, providing users strong guarantees that data will be delivered on time and remain resistant to manipulation
  • Data Signing - having nodes cryptographically sign the data they provide to smart contracts allows users to identify which nodes sent data and look at their past history to determine their performance quality
  • Service Agreements - using binding on-chain agreements between the requesting smart contract and the oracle provider that outline the terms of the oracle service and penalties/rewards for performance provides users with enforceable guarantees on their off-chain data request
  • Reputation Systems - feeding signed on-chain data into reputation systems allows users to make informed decisions about which nodes are good and which nodes are not based on a variety of metrics like successful jobs, clients served, average response time, etc.
  • Certification Services - enabling nodes to increase their security and reliability to users by obtaining any number of certifications can provide certain key guarantees like KYC, geographic location of the node, security reviews of their infrastructure, and more
  • Advanced Cryptography and Hardware - providing flexibility for more advanced cryptography (like zero-knowledge proofs) and hardware (such as trusted execution environments) enables oracles to perform additional functions like proving the origin of data to a smart contract (e.g., that specific data came from a specific server), keeping data confidential from the oracle itself, performing off-chain computation, and more

Chainlink features
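To illustrate how Data Signing and Reputation Systems fit together, the sketch below only credits a node with a job when the report's signature verifies, then scores nodes by the share of jobs delivered on time. The node names, keys, and scoring rule are hypothetical, and HMAC is a stand-in for the public-key signatures real nodes use; actual reputation systems track many more metrics, such as clients served and average response time.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-node keys. HMAC stands in for a public-key signature: a real
# node signs with its private key, and anyone can verify with its public key.
NODE_KEYS: Dict[str, bytes] = {"node-a": b"key-a", "node-b": b"key-b", "node-c": b"key-c"}


def sign(node: str, value: str) -> str:
    return hmac.new(NODE_KEYS[node], value.encode(), hashlib.sha256).hexdigest()


def verify(node: str, value: str, tag: str) -> bool:
    return node in NODE_KEYS and hmac.compare_digest(sign(node, value), tag)


class Reputation:
    """Toy reputation tally built only from reports whose signatures verify."""

    def __init__(self) -> None:
        # node -> [jobs delivered on time, total attributed jobs]
        self.stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])

    def record(self, node: str, value: str, tag: str, on_time: bool) -> bool:
        if not verify(node, value, tag):
            return False  # a forged or unsigned report cannot be attributed to the node
        self.stats[node][0] += int(on_time)
        self.stats[node][1] += 1
        return True

    def score(self, node: str) -> float:
        done, total = self.stats.get(node, [0, 0])
        return done / total if total else 0.0


rep = Reputation()
rep.record("node-a", "64010", sign("node-a", "64010"), on_time=True)
rep.record("node-a", "64011", sign("node-a", "64011"), on_time=False)
# Someone tries to pass off a value signed with node-b's key as node-c's report.
print(rep.record("node-c", "1", sign("node-b", "1"), on_time=True))  # False
print(rep.score("node-a"), rep.score("node-c"))  # 0.5 0.0
```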

These are just some of the many features offered by Chainlink that provide users with a whole set of guarantees to ensure a highly secure and reliable oracle mechanism. In future Education articles, we will dive deeper into each of them to get a more complete understanding of the Chainlink Network.

By building out these key features, Chainlink enables smart contracts on any blockchain to access off-chain data without sacrificing their core value of determinism, providing a solid foundation on which to build the future of data-driven automation.

Follow us on Twitter to get notified of upcoming article releases, join our Telegram or Reddit for general news on Chainlink, or take part in the technical discussion on our Discord.