Data Quality for DeFi Smart Contracts

The ecosystem of Decentralized Finance (DeFi) applications and the value secured by blockchain oracles are scaling in tandem, mutually supporting the success and growth of one another. As the value secured by DeFi continues to grow at a rapid pace, so does the importance of ensuring that this revolutionary new decentralized financial ecosystem provides a higher level of security and reliability guarantees to its users.

The blockchain “oracle problem” is fairly well known, as many articles have examined this subject in detail. However, the topic of “data quality” provided by oracles remains largely unknown and misunderstood. The misunderstanding is rooted in the often-held assumption that oracles are used both to transfer external data on-chain and to generate high-quality data. In our experience researching and building secure oracle solutions, blockchain oracles are designed to transfer data on-chain and harden data against manipulation, not to create the data itself.

The separation of these problems (data delivery vs. data quality) has been applied to the architecture of the decentralized oracle networks powering Chainlink Data Feeds, which are the most used oracles in DeFi. Based on the success of these oracle networks in providing high-quality data to live applications, we’ve identified five essential components to solving the data quality problem.

  1. Enable oracle nodes to connect to premium data providers to ensure user contracts have access to the highest quality data. This requires the oracle protocol to have password and credential management capabilities so all nodes can securely store API keys and manage account logins for paid subscriptions.
  2. Have nodes source price data from high-quality off-chain data providers that specialize in generating accurate price data; specifically data aggregators that maintain volume-adjusted market coverage across all trading environments. Using the oracle mechanism to generate a global market price from a collection of raw data feeds is extremely difficult and it opens the oracle up to numerous attack surfaces, such as rapid volume shifts and data outliers, both of which are not uncommon in cryptocurrency markets.
  3. Incorporate decentralization as an active part of the security and reliability guarantees provided by the oracle. Aggregate from multiple independent nodes to ensure the oracle mechanism is manipulation-resistant and highly available when delivering data to contracts. Source from multiple high-quality data providers without sacrificing quality to bring additional decentralization to the data source level.
  4. Prefer systems that give both users and developers the ability to make informed decisions when designing an oracle mechanism by providing on-chain insights into the current and historical performance of each node and of the decentralized oracle network as a whole. Avoid any security-through-obscurity approach, to minimize hidden risks and ensure that as many eyes as possible can spot potential issues early, before they become substantial problems.
  5. Steer clear of large risks such as using single-exchange data sources and/or diluting high-quality data from secure oracle solutions with low-quality data from less secure oracle solutions. Decentralization without quality control standards exposes contracts to a larger attack surface with greater complexity, often with the unintended consequence of undermining the high-quality oracle solution.

To further expand on each of these features that are important to ensuring data quality, we examine the ideal composition of a secure decentralized oracle network, how to properly leverage Chainlink’s extensive flexibility to generate high-quality data, and the large data sourcing risks to avoid when designing price oracle networks.

Composition of a Secure Decentralized Oracle Network

An oracle is middleware that serves as a bridge between the blockchain and the outside world. It allows smart contracts to consume information not stored on the blockchain so that they can become aware of real-world events. This external connectivity greatly expands the range of events a smart contract can be written about, allowing developers to capture more value across a wider array of markets. With increased connectivity comes new attack surface that must be secured in order to maintain the smart contract’s core tenets of tamper resistance, immutability, and availability.

Decentralized oracle networks are secure middleware for connecting on-chain and off-chain environments, providing a framework for building the security and reliability guarantees necessary for users to trust externally connected smart contracts with billions of dollars or more in user funds. Without holding oracles to the same security and reliability standards as the blockchain, the entire smart contract is at risk even if the contract code itself is flawless.

Chainlink Data Feeds are a collection of decentralized oracle networks that provide the largest set of on-chain pricing data in the Ethereum ecosystem, relied on by an increasing number of leading DeFi applications. The design pattern of these price oracle networks follows a security-through-verifiable-decentralization approach and adheres to data quality best practices to bring maximum security to their users. Here are four key features implemented throughout Chainlink Data Feeds that should be applied to all decentralized oracle networks looking to ensure data quality.

A diagram showing various node operators and data sources.

High-Quality Data from Premium Data Providers

While mining blockchain hashes is a fairly uniform operation, generating industry-specific data that is of high enough quality to actually be relied upon to secure hundreds of millions of dollars in value is not a task just anyone can be trusted to do. Instead of trying to use the oracle mechanism to generate high-quality data from a collection of raw data, developers are often better off having nodes source data directly from respected data aggregator companies with large teams, full-stack infrastructure, and a sole focus on generating high-quality data for specific industries.

Generating high-quality proprietary data is capital-intensive; the data is not free, and access requires a legally binding contract and credentials. Nodes must either have a paid subscription to the data provider’s API or be specifically authorized by the data provider (e.g. for internal enterprise data). Both of these permissioned models require password and credential management capabilities to bridge the interaction between the node and the API. Thus, node operators need the ability to store API keys and manage account logins to interact with these premium data providers.

Oracle solutions that cannot connect to premium APIs because they lack credential management capabilities are limited to open, free, or pirated APIs. These APIs generally have low-quality data, rate-limit throttling, unreliable response times, and no legally binding availability or service quality guarantees, all of which make such data sources unsuitable for high- and medium-value use cases, and even for many low-value ones. Smart contracts fed low-quality data have no guarantee about the reliability or accuracy of the data they consume, which creates a larger attack surface. As with any other data-driven technology: “garbage in, garbage out.”

Chainlink nodes participating in the Price Reference Data Contracts leverage external adapters, which enable them to connect to any premium API. These APIs offer higher-quality data and quicker response times, and come with guarantees regarding availability and service quality. External adapters are modular, can be written in any programming language, and can be hosted on a different server than the Chainlink node. They can be used to retrieve data from data providers, web APIs, enterprise systems, IoT devices, payment systems, other blockchains, and more.
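To make the credential-management and external-adapter ideas concrete, here is a minimal Python sketch of an adapter-style request handler. It assumes a JSON request/response shape loosely modeled on the one commonly used by Chainlink external adapters (`id`, `data`, `jobRunID`, `result`, `statusCode`); treat those field names, the environment variable `DATA_PROVIDER_API_KEY`, the placeholder URL, and the provider's `price` field as hypothetical. The HTTP call is injected as `fetch` so the sketch runs without network access.

```python
import os
from typing import Any, Callable, Dict
from urllib.parse import urlencode

# Hypothetical: a fetcher takes a URL plus headers and returns the provider's parsed JSON.
Fetcher = Callable[[str, Dict[str, str]], Dict[str, Any]]

API_KEY_ENV = "DATA_PROVIDER_API_KEY"                        # hypothetical env var name
BASE_URL = "https://api.example-data-provider.com/v1/price"  # placeholder endpoint


def handle_request(request: Dict[str, Any], fetch: Fetcher) -> Dict[str, Any]:
    """Turn a node's job request into an authenticated provider call and a normalized reply."""
    job_run_id = request.get("id", "1")
    data = request.get("data", {})
    base, quote = data.get("base"), data.get("quote")

    # The credential lives in the adapter's environment, never in the on-chain request.
    api_key = os.environ.get(API_KEY_ENV)
    if not api_key:
        # Refuse rather than silently fall back to a free, rate-limited endpoint.
        return {"jobRunID": job_run_id, "status": "errored",
                "error": "missing API key", "statusCode": 500}
    if not base or not quote:
        return {"jobRunID": job_run_id, "status": "errored",
                "error": "base and quote are required", "statusCode": 400}

    url = f"{BASE_URL}?{urlencode({'base': base, 'quote': quote})}"
    # The key is sent only in a request header and is never echoed back to the node.
    payload = fetch(url, {"Authorization": f"Bearer {api_key}"})
    price = float(payload["price"])  # assumed provider field name
    return {"jobRunID": job_run_id, "data": {"result": price},
            "result": price, "statusCode": 200}
```

In a real deployment `fetch` would wrap an HTTP client, and the handler would sit behind a small web server that the node calls; both pieces are omitted here.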

Decentralization of High-Quality Node Operators

Data quality is a moot point without a secure and reliable oracle mechanism to deliver the data to smart contracts. Decentralization of high-quality node operators is a key design pattern to protect against unpredictable periods of downtime and eliminate the need to trust a single entity not to tamper with the data delivery process. Decentralized consensus greatly increases the cost of attack because even if a few nodes experience downtime or become malicious, it will have little effect on the final aggregated response.
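The robustness claim can be illustrated with a small Python sketch: take the median of whatever node responses arrive, and refuse to report at all if fewer than a minimum number of nodes answered. The median is an illustrative choice, the node values are hypothetical, and this is not presented as the exact aggregation logic Chainlink contracts use.

```python
from statistics import median
from typing import List, Optional


def aggregate(responses: List[Optional[float]], min_responses: int) -> float:
    """Median of the responses that arrived; refuse to answer below the threshold."""
    received = [r for r in responses if r is not None]  # None models a node that is down
    if len(received) < min_responses:
        raise ValueError(f"only {len(received)} responses, need {min_responses}")
    return median(received)


# Seven hypothetical nodes: two offline, one malicious reporting an absurd price.
responses = [2001.0, 1999.5, None, 2000.2, 10.0, None, 2000.8]
print(aggregate(responses, min_responses=4))  # 2000.2
```

With five of seven nodes answering and one of them lying, the median still falls within the range of honest answers; an attacker would need to control a majority of the responding nodes to move it arbitrarily.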

Chainlink’s Price Reference Contracts are powered by decentralized oracle networks that aggregate responses from numerous independent, security-reviewed, and Sybil-resistant oracle nodes. Participating Chainlink nodes are operated by leading blockchain DevOps and security teams distributed around the world, including both off-site in the cloud and on-site bare-metal servers, to avoid any single point of failure in the oracle mechanism. There are also numerous community-operated nodes on standby that can be added to a network at any time for additional decentralization.

Decentralization of High-Quality Data Sources

Oracle solutions can become more robust by incorporating multiple data sources, as long as they don’t sacrifice the quality of any individual source. Decentralizing across high-quality data sources prevents any single data provider from being the sole source of truth and protects against the situation where that sole provider goes offline. However, there may be situations where only a single high-quality data source is available. This is where more advanced crypto-economic and cryptographic techniques for securing data quality, such as staking-backed service agreements (discussed below), TLS verification (Town Crier, DECO), and zero-knowledge proofs, become more important.

Chainlink’s Price Reference Contracts are decentralized at the data source level. Each Price Reference Network collectively sources market data from multiple independent and highly reliable data providers. These data providers consist entirely of premium data aggregators that have full market coverage across all trading environments and include Brave New Coin, Kaiko, and many other reputable data APIs. The individual data points are then aggregated into a single reference price that is stored on-chain for contracts to consume using a simple read function.

Open-Source Visualizations and Monitoring

If the smart contracts underlying DeFi applications are open-source for the public to monitor in real-time, then the price oracle mechanisms providing data should be transparent too. Without transparency of the oracle solution, dApp users have no ability to verify where the data is sourced from, which nodes are providing data, the latency of responses, the historical performance of the oracle network and the accuracy of its data, and more.

Each of Chainlink’s Price Reference Data contracts is accompanied by transparent visualizations derived from on-chain data that showcase a very detailed set of information, such as:

  • The latest price of each reference data feed
  • Which DeFi projects are sponsoring and supporting each price feed
  • Which security reviewed node operators are securing the price feeds
  • When updates should occur
  • The minimum number of node responses needed for aggregation to begin
  • A plethora of other relevant key information about the oracle network as a whole and each individual node operator

Chainlink Data Feeds
Chainlink Data Feeds provide smart contracts with high-quality price data using a decentralized network of oracle nodes.

Additionally, individual node performance can be analyzed on a per-request basis to see whether a node successfully completed a data request. The Chainlink Explorer allows node operators, data providers, and users to study the performance of each node within a network and see exactly which steps were taken, making it possible to identify any errors along the way.

Flexibility is an essential component of a generalized oracle network capable of becoming a standard used throughout DeFi. It empowers developers to create whatever oracle design pattern they feel gives them the security and reliability guarantees they need. While Chainlink Price Reference Data feeds utilize a combination of numerous data aggregators to generate global market prices, the Chainlink protocol does not enforce any one design pattern regarding how an oracle network is constructed or from where the data is sourced. Instead, it provides the most open modular framework on the market to meet any specific needs.

Incorporate Any Data, Collection of Nodes, and Aggregation Model

Developers use external adapters to quickly connect their smart contracts to any data source required for execution. They can also customize the precise degree of decentralization they need, the exact data sources to pull from, the algorithms used to aggregate the data, and the frequency of updates. This gives maximum flexibility in how the contract consumes external data.

Chainlink Market
Market.link is a third-party website where developers can explore pre-built external adapters that provide a wide array of data sources.

This customizable framework allows developers to scale their oracle network up or down with ease, depending on the security they want to pay for. Chainlink has the largest pool of secure node operators along with many community nodes competing for jobs that can be quickly added to any oracle network for additional security guarantees. Furthermore, developers have access to a growing variety of preformatted data sources that can be included in a data aggregation without any upfront development work.

Chainlink also provides the ability to customize the aggregation, by using an average, median, or even more complex models of weighted sourcing and removal of outliers. This includes flexibility in the update frequency, whether using a time-based update, price deviation updates (e.g. every 0.5% change in price), or some type of hybrid approach with multiple parameters.
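The aggregation options can be compared side by side in a short Python sketch. The outlier rule (drop values more than `k` median absolute deviations from the median) and the weighted median are common statistical techniques chosen here for illustration; the function names, the `k = 3.0` default, and the sample prices are assumptions, not Chainlink's configuration.

```python
from statistics import mean, median
from typing import List


def remove_outliers(values: List[float], k: float = 3.0) -> List[float]:
    """Drop values more than k median absolute deviations (MAD) from the median."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        # More than half the values are identical; keep only those.
        return [v for v in values if v == m]
    return [v for v in values if abs(v - m) <= k * mad]


def weighted_median(values: List[float], weights: List[float]) -> float:
    """Smallest value at which cumulative weight reaches half of the total weight."""
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("no values to aggregate")


prices = [100.1, 99.9, 100.0, 100.2, 150.0]  # one hypothetical bad response
print(mean(prices))                   # the outlier drags the plain mean above 110
print(median(prices))                 # 100.1
print(mean(remove_outliers(prices)))  # mean of the four remaining values, about 100.05
```

Weights in `weighted_median` could stand for anything a developer trusts more, such as a source's market share or a node's track record; that mapping is a design decision left open here.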

Chainlink’s flexible framework allows data providers to customize how they provide data to the new and emerging smart contract economy, either by operating as a traditional API business or by running a Chainlink Node directly.

Traditional APIs
Data providers can choose to operate exactly as they do today, providing their data to paying users via subscription models priced in fiat currency. Chainlink nodes can subscribe to these APIs and relay their data on-chain by using external adapters in their node setups. This applies to any existing data provider and is already used in production by Chainlink Data Feed oracle networks, with nodes subscribing to premium data providers such as CoinGecko. Chainlink also has ready-made external adapters available to nodes for exchange APIs such as Binance and Coinbase.

This model is powerful because data providers don’t need to change anything about their current business model or infrastructure. Even if data providers themselves are hesitant to service smart contract markets directly, Chainlink nodes can still provide developers with access to whatever data resources they need via modular external adapters. This also allows nodes to subscribe to data providers that run their own Chainlink nodes (described below) to provide additional decentralization to their data.

Data Providers as Chainlink Nodes
The other model is for data providers to operate Chainlink nodes and sell their data directly to smart contracts. This provides them with a new method of monetizing their data and is already being utilized by several leading data providers, such as market data aggregators Kaiko and Alpha Vantage, along with cryptocurrency exchange Huobi.

One of the unique advantages data providers gain by running their own Chainlink nodes is the ability to cryptographically sign their own data. Users and smart contracts can be assured that price data coming directly from a data provider’s or exchange’s Chainlink node hasn’t been tampered with en route to the smart contract, since the data is signed with the node’s unique private key before being broadcast on-chain. The signature can then be verified on-chain against the node’s address, which is derived from its public key, giving high assurance that the data came directly from the source.

Through this framework, data providers can broadcast data directly to the blockchain, removing the need to rely on external actors to route their data on-chain for them. This empowers data providers to control the frequency with which they broadcast data on-chain and allows them to maintain the security of their data — from data generation all the way to final delivery to the consuming smart contract. Thus, data providers have the flexibility to simultaneously provide data to multiple different applications in unique ways, e.g. providing updates every minute to one set of applications while servicing other applications using a price deviation threshold mechanism of every 0.5% change in price.
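The update policies mentioned here and earlier (time-based, deviation-based, or a hybrid of both) reduce to a small decision rule. The Python sketch below is a hypothetical illustration: `heartbeat_s` and `deviation` are illustrative parameter names, and setting either to infinity disables that trigger, which is how one provider could serve a per-minute feed and a 0.5% deviation feed from the same logic.

```python
import math
from dataclasses import dataclass


@dataclass
class UpdatePolicy:
    heartbeat_s: float  # max seconds between updates (math.inf disables the time trigger)
    deviation: float    # relative change that forces an update, e.g. 0.005 = 0.5%

    def should_update(self, last_price: float, last_time: float,
                      price: float, now: float) -> bool:
        if now - last_time >= self.heartbeat_s:
            return True  # time-based trigger
        if last_price == 0:
            return price != 0  # avoid dividing by zero
        # Deviation trigger: relative move since the last on-chain value.
        return abs(price - last_price) / abs(last_price) >= self.deviation


every_minute = UpdatePolicy(heartbeat_s=60, deviation=math.inf)
half_percent = UpdatePolicy(heartbeat_s=math.inf, deviation=0.005)
hybrid = UpdatePolicy(heartbeat_s=3600, deviation=0.005)
```

For example, `half_percent.should_update(100.0, 0.0, 100.6, 10.0)` is true (a 0.6% move), while the same price change under `every_minute` waits until 60 seconds have passed.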

Data Providers can sell their data to Chainlink node operators and/or run a Chainlink node directly

Sell to All Blockchain Environments From One Gateway

Data providers can’t be expected to understand every blockchain environment and independently set up secure operations on each one, especially considering the novelty of the blockchain market and the lack of documentation and developers who understand each environment.

Chainlink’s oracle network is accessible from any blockchain environment, either by leveraging existing external adapters/initiators or by creating new ones to quickly support a new blockchain. Chainlink is open-source, so core developers can integrate Chainlink without any outside permission, enabling horizontal scalability without development bottlenecks. Most of the leading blockchains, including Ethereum, Polkadot, Tezos, Cosmos, and many more, are already integrating Chainlink natively into their networks.

This setup provides data providers and smart contract developers a single gateway from which they can sell and access data across any chain, ultimately bringing more data on-chain to dApps and allowing data providers to capture more revenue. Importantly, the flexibility of this approach avoids the need for data providers to choose where to deploy resources.

Establish Crypto-Economic Guarantees on Data and Service Quality

Chainlink oracle networks will incorporate binding Service Agreements, signed by both the node operator and the requesting smart contract, that predefine the parameters a node must comply with for the entire length of the agreement. These parameters set the terms of data delivery (latency of response), data quality (accuracy of response), the amount of staked LINK (crypto-economic guarantees), slashing conditions (penalties), and any other terms and conditions defined by the requester. The node operator’s payment depends on fulfilling these parameters and ultimately delivering high-quality data on-chain in a timely manner. This gives data providers that run their own Chainlink node maximum flexibility over the data guarantees they offer, bringing an additional level of trust and integrity to the quality, reliability, and accuracy of their data.
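As a thought experiment, the terms listed above (latency, accuracy, stake, slashing) can be written down as a toy Python model that settles a single response. Everything in it is hypothetical: the field names, the use of a separate `reference` value to judge accuracy (how that reference is obtained is itself an open design question), and the all-or-nothing settlement rule. It is not the Service Agreement or staking design Chainlink ships.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ServiceAgreement:
    max_latency_s: float   # delivery term: latest acceptable response time
    max_deviation: float   # quality term: allowed relative error vs. the reference value
    stake: float           # LINK the node has staked behind the agreement
    payment: float         # paid out for a compliant response
    slash_fraction: float  # share of stake forfeited on a violation


def settle(agreement: ServiceAgreement, latency_s: float,
           reported: float, reference: float) -> Tuple[float, float]:
    """Return (payment to node, amount slashed) for one response."""
    on_time = latency_s <= agreement.max_latency_s
    accurate = abs(reported - reference) / abs(reference) <= agreement.max_deviation
    if on_time and accurate:
        return agreement.payment, 0.0
    return 0.0, agreement.stake * agreement.slash_fraction


terms = ServiceAgreement(max_latency_s=30, max_deviation=0.01,
                         stake=1000.0, payment=2.0, slash_fraction=0.25)
print(settle(terms, latency_s=12, reported=2001.0, reference=2000.0))  # (2.0, 0.0)
print(settle(terms, latency_s=45, reported=2000.0, reference=2000.0))  # (0.0, 250.0)
```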

Binding Service Agreements create crypto-economic guarantees of data quality and delivery.

Make Informed Decisions Using Reputation Frameworks and Node Marketplaces

A critical component of managing flexibility is giving users the tools to make informed decisions about the parts they incorporate into their oracle network. It encompasses two parts: Reputation Systems and Node Marketplaces.

Reputation Framework
Reputation systems provide users with an immutable record of on-chain data that showcases the historical performance of a node operator and/or data provider. Future requesters then have historical cryptographic proofs for determining which nodes are reliable and which are not.

Through third-party services, Chainlink nodes can be compared directly against one another to see which oracle nodes are more reliable than others. These sites show both raw data and refined stats about the Chainlink network in aggregate, as well as specific data about each individual oracle node, including transaction count, response time, payments earned, success ratio, and more.
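The per-node statistics such sites display can be derived from a log of job results. The Python sketch below assumes a hypothetical `JobRecord` shape; in practice these fields would be reconstructed from on-chain events and explorer data, whose exact formats are not specified here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class JobRecord:
    node: str
    succeeded: bool
    response_time_s: float
    payment: float  # LINK earned for this job (0 if it failed)


def summarize(records: List[JobRecord]) -> Dict[str, dict]:
    """Per-node transaction count, success ratio, mean response time, and earnings."""
    by_node: Dict[str, List[JobRecord]] = {}
    for record in records:
        by_node.setdefault(record.node, []).append(record)

    summary = {}
    for node, jobs in by_node.items():
        ok = [j for j in jobs if j.succeeded]
        summary[node] = {
            "transactions": len(jobs),
            "success_ratio": len(ok) / len(jobs),
            # Only successful jobs count toward the response-time average.
            "avg_response_s": mean(j.response_time_s for j in ok) if ok else None,
            "payments_earned": sum(j.payment for j in jobs),
        }
    return summary
```

For instance, a node with two successful jobs (2 s and 4 s) and one failure would show three transactions, a success ratio of 2/3, and a 3 s average response time.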

Reputation.link allows developers and users alike to get deep insight into the performance of Chainlink oracle networks, including each individual node and data source.
Pier Two allowed developers and users to analyze the performance of Chainlink oracle networks, including each individual node and data source.

Node Marketplaces
The other important component is having a location to discover oracle nodes, filter them, compare them, and ultimately select them to be used within an oracle network. There are multiple third-party node listing sites for Chainlink nodes, such as LinkPool’s Chainlink Market. This allows developers full control over how their oracle network will be structured, because they can choose exactly which nodes they trust and how many nodes they are willing to pay for.

Additionally, each node operator is able to list their certifications, security reviews, proof of identity, external adapters, data sources, and job run specs that detail exactly which off-chain services they offer to smart contracts. Operators can set their own prices and parameters for each job they offer, creating a free market economy where nodes compete on a multitude of varying factors.

LinkPool Chainlink node details
Nodes on market.link are able to build their reputation on-chain and have it displayed to developers looking for reliable nodes.

How to Leverage Flexibility to Build Price Oracles That Avoid Large Data Sourcing Risks

In order to provide high-quality data, there are several attack vectors that are critical to take into account and preemptively mitigate when designing an oracle mechanism. By not taking these situations into account, developers are taking massive risks with user funds and ultimately putting the success of their entire dApp at risk.

Volume Shifts/Exchange Lock-In
Cryptocurrency markets differ from traditional financial markets because no exchange has exclusive issuance of an asset, so no single exchange can lock in users or capture the entire trading market for that asset. Blockchain technology is permissionless, so anyone can list cryptocurrency coins/tokens on their exchange for traders to access at any time. Because of this dynamic, the volume of a cryptocurrency is spread across many exchanges and can shift quickly between them. An oracle mechanism must account for this to avoid market manipulation attacks in which the majority of the volume shifts to an exchange not included in the data aggregation process.

Flash Crashes
Cryptocurrency exchanges, which commonly lack sufficient circuit breakers, are susceptible to flash crashes, where the price on one exchange deviates far from the price across all other exchanges. Even the largest exchanges are subject to this risk and have experienced such events multiple times over the years. Kraken experienced a flash crash that saw the BTC/CAD pair plunge from $11,200 CAD to $100 CAD, a drop of more than 99%. Coinbase experienced an extreme flash crash in which ETH temporarily plummeted from $322 to a low of $0.10. Additionally, at the beginning of 2020, BitMEX, a cryptocurrency derivatives exchange, experienced a flash crash in which the price of XRP plummeted about 60% within a single minute, from $0.33 to $0.13.
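The percentages quoted above can be checked with a few lines of Python, and the same snippet shows why a single-exchange oracle is exposed: a median across several venues ignores one crashed venue. The per-venue quotes in the second half are hypothetical.

```python
from statistics import median


def drop_pct(before: float, after: float) -> float:
    """Percentage fall from `before` to `after`."""
    return (before - after) / before * 100


# Figures from the flash-crash examples above.
print(f"Kraken BTC/CAD: {drop_pct(11_200, 100):.1f}%")  # 99.1%
print(f"Coinbase ETH:   {drop_pct(322, 0.1):.2f}%")     # 99.97%
print(f"BitMEX XRP:     {drop_pct(0.33, 0.13):.1f}%")   # 60.6%

# Hypothetical BTC/CAD quotes from five venues at the moment one of them crashes.
quotes = {"venue_1": 11_180, "venue_2": 11_200, "venue_3": 11_210,
          "crashed_venue": 100, "venue_5": 11_195}
print("single-exchange oracle:", quotes["crashed_venue"])  # 100
print("median across venues:  ", median(quotes.values()))  # 11195
```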

Quality Dilution
It’s important not to enforce decentralization without quality control standards in place; otherwise, lower-quality data sources can dilute the aggregation process. Data providers and node operators with poor performance histories, unknown reputations, and unproven security infrastructure should be barred from any oracle mechanism. Ensure that node operators and data providers have the resources and knowledge to resolve any issues that may occur, and that alerts and failsafes are in place.

Please note, a malicious attacker does not have to be an experienced developer to exploit these attack vectors. Any retail trader or small group of traders who notices such an opportunity can use an exchange’s UI to manipulate particular markets and skew the reference price point generated by an oracle with limited market coverage. This greatly increases the attack surface as it opens up the ability to manipulate an improperly designed oracle to anyone with an internet connection and an exchange account.

Chainlink’s Price Reference Data Contracts have been specifically designed to mitigate these risks by using data aggregators instead of a single exchange API or a collection of exchange APIs.

High-Quality Data Aggregators Provide Market Coverage Across All Price Discovery Environments

Chainlink Data Feeds exclusively use data aggregators because they provide the most robust market coverage, a feature that is especially important for providing accurate data to markets that are still relatively low in volume compared to traditional financial markets. When sourcing from data aggregators, the responsibility for maintaining market coverage shifts from the creator of the oracle network, who may not have the experience or resources to continually track exchange volume, to professional data aggregators.

These data aggregators have full-time globally distributed teams that are highly experienced in maintaining accurate price data with complete market coverage across all trading environments. They take into account important metrics such as liquidity, volume, time, and how these metrics vary across exchanges; they also smooth out any outliers. These features make Chainlink’s Price Reference Contracts highly resistant to volume shifts, flash crashes, and quality dilution.
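Commercial aggregators’ methodologies are proprietary and far more involved, but the core idea of weighting venues by volume after discarding prices that stray from the cross-exchange consensus can be sketched in Python. The 5% `max_dev` cutoff, the function name, and the quotes are illustrative assumptions.

```python
from statistics import median
from typing import List, Tuple


def volume_weighted_price(quotes: List[Tuple[str, float, float]],
                          max_dev: float = 0.05) -> float:
    """quotes: (exchange, price, volume). Filter outliers, then weight by volume."""
    mid = median(price for _, price, _ in quotes)
    # Drop venues whose price strays more than max_dev from the cross-exchange median.
    kept = [(p, v) for _, p, v in quotes if abs(p - mid) / mid <= max_dev]
    total_volume = sum(v for _, v in kept)
    if total_volume == 0:
        raise ValueError("no volume left after filtering")
    return sum(p * v for p, v in kept) / total_volume


quotes = [("A", 100.0, 500.0), ("B", 101.0, 300.0),
          ("C", 99.0, 200.0), ("D", 60.0, 5.0)]  # D: thin venue mid-flash-crash
print(volume_weighted_price(quotes))  # about 100.1; D is filtered out
```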

Chainlink’s Price Reference Contracts also utilize multiple data aggregators to harden price data against any single source manipulating the resulting data point. This provides even greater security and reliability guarantees to dApp developers and end users, in addition to the usage of security-reviewed node operators with best-in-class monitoring teams.

The end-to-end process of Chainlink’s Price Reference Data contracts

The Major Risks of Improperly Leveraging Oracle Flexibility When Sourcing Data

To fully understand the risks involved with deploying oracle networks that either do not take into account volume shifts or source exclusively from a single API, it’s best to look at some real-world examples of what can go wrong and how this plays out.

CAUTION: Avoid Oracle Designs Based on a Single Exchange

Oracle networks that pull price data from a single exchange not only have no protection against exchange downtime, flash crashes, and price manipulation; they also have extremely limited market coverage. While such a setup may seem like it is initially working during times of low volatility, when market volatility increases, price discovery occurs and volume can shift frequently between different exchanges. Even if the oracle is updated to track a different exchange, the new price point can be highly inaccurate, because market shifts don’t always take the same form. This creates a scenario where, although the data source has changed, it cannot maintain market coverage reliably.

In this video, Chainlink Co-founder Sergey Nazarov explains the dangers of using a single exchange as a data source:

 

Below is a step-by-step outline of the dangers created by pulling from a single exchange:

  • A developer, Joe, has written a smart contract application that requires external price data of a crypto asset. He decides to build an oracle network that pulls price data from his preferred exchange, Exchange C. On the exact day he built the oracle, Exchange C had 80% of the asset’s volume; he figures it’s a “good enough” solution.
  • A week goes by and user deposits are growing. While Exchange C now only covers 50% of the asset’s volume, the market volatility of the asset is low and the oracle mechanism appears to be working. Joe figures he can continue to focus on developing more features for his dApp instead of thinking twice about the oracle’s declining market coverage since “it’s still working.”
  • Another month goes by and, in the middle of the night, Joe wakes up to a call informing him that his dApp just had millions in user deposited funds drained. He soon finds out that the majority of market volume shifted away from Exchange C, the only source from which his oracle pulled, and which now only accounts for 5% of the asset’s volume. This exchange was manipulated by a whale trader causing the oracle to report an outlier price point, which opened up the opportunity to unfairly siphon a large amount of value from innocent users.
A diagram showing how centralized oracles introduce a single point of failure
  • Joe’s dApp is now dead in the water due to the loss of user trust and his reputation as a competent developer has been tarnished. Such a situation could have been avoided if his oracle solution had proper market coverage.

The above example shows the extreme dangers that are created when an oracle network is built to pull data only from a single exchange. Market coverage can make or break an application and the situation could have been even worse in the case of a flash crash.
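Joe’s declining market coverage was measurable well before the exploit. A minimal Python monitoring sketch follows, using the coverage figures from the story (80%, 50%, 5%) with the remaining volume lumped into one hypothetical “others” bucket; the 60% alert threshold is an arbitrary assumption.

```python
from typing import Dict, Set


def market_coverage(volumes: Dict[str, float], tracked: Set[str]) -> float:
    """Share of total traded volume that the oracle's sources actually see."""
    total = sum(volumes.values())
    return sum(v for exchange, v in volumes.items() if exchange in tracked) / total


ALERT_BELOW = 0.6  # hypothetical minimum acceptable coverage

timeline = [
    ("day 0", {"C": 80, "others": 20}),
    ("week 1", {"C": 50, "others": 50}),
    ("month 1", {"C": 5, "others": 95}),
]
for label, volumes in timeline:
    cov = market_coverage(volumes, {"C"})
    status = "ALERT" if cov < ALERT_BELOW else "ok"
    print(f"{label}: coverage {cov:.0%} {status}")
```

Under these assumptions the alert fires at week 1, exactly when Joe concluded the oracle was “still working.” Monitoring only surfaces the problem, though; it does not restore the missing coverage.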

Fixed Exchange Aggregation Oracles Do Not Account for Volume Shifts

Oracles that pull data directly from preselected exchanges are vulnerable to situations in which volume shifts to new exchanges that were not included in the original aggregation process. While the exchanges originally chosen as data sources for the oracle network may have been liquid when it was created, there is no guarantee that volume will stay on these exchanges into the future. This lowers the cost of attack for malicious actors because only a small proportion of an asset’s overall volume needs to be manipulated.

In this video, Chainlink Co-founder Sergey Nazarov discusses why adaptability to shifts in volume across exchanges via a secure oracle mechanism is critical to the security and reliability of DeFi products:

 

While this may seem like a minimal attack vector, consider the following situation:

  • Another smart contract developer, Bob, who witnessed Joe’s mistake of only using a single exchange, decides he will build an oracle that instead pulls data for a crypto asset from an array of preselected exchanges: A, B, and C. His thinking is that, by taking the median of multiple exchanges, market manipulation is mitigated.
A diagram showing how aggregated data equals better oracle reporting.
  • A couple of weeks go by and Bob seems convinced that he made the right decision because his oracle, which is pulling data from multiple exchanges, continues to deliver an accurate response even when a single exchange is manipulated. As such, he diverts his focus to improving his application’s core business logic. While he is working on new features, Bob does not notice that two new exchanges have shown up and captured 85% of the crypto asset’s volume.
A diagram showing how aggregated data leads to better oracle reporting.
  • A few more days go by and Bob awakes to discover that, just like Joe, his contract has been exploited to steal millions in user deposits. It turns out that while the exchanges Bob picked for his oracle network were liquid when he initially built it, over time, volume shifted to new exchanges that were not included in the initial aggregation process. As such, his oracle network eventually had only 15% market coverage and was then manipulated by traders who used the opportunity to exploit Bob’s smart contract application.
  • While Bob had the right idea in decentralizing the oracle’s data sources, he did not account for volume shifts between exchanges or for the possibility that new, untracked exchanges could capture a significant share of an asset’s volume. Even if no new exchanges had appeared, volume could still have concentrated on one or two exchanges, allowing traders to manipulate the remaining low-volume exchanges and skew the median calculation in their favor.
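
Bob’s design can be sketched in a few lines. The Python below is a simplified model that assumes hypothetical prices in which the true market price is about $100. The oracle reports the median of the three preselected exchanges, so manipulating one thin venue barely moves the answer, but manipulating two of them drags it to the attacker’s price, even though exchanges D and E, which the oracle never reads, still trade near $100. The function name `bob_oracle` is illustrative.

```python
from statistics import median

TRACKED = ("A", "B", "C")  # Bob's preselected exchanges

# Hypothetical spot prices (USD); the market-wide price is roughly $100.
honest = {"A": 100.0, "B": 100.5, "C": 99.5, "D": 100.0, "E": 100.2}


def bob_oracle(prices: dict[str, float], tracked: tuple[str, ...] = TRACKED) -> float:
    """Median of the preselected exchanges only; D and E are never consulted."""
    return median(prices[name] for name in tracked)


one_manipulated = dict(honest, A=70.0)          # attacker moves one thin venue
two_manipulated = dict(honest, A=70.0, B=68.0)  # attacker moves two thin venues

print(bob_oracle(honest))           # 100.0
print(bob_oracle(one_manipulated))  # 99.5
print(bob_oracle(two_manipulated))  # 70.0
```

The median protects Bob against a single bad exchange, which is why his oracle looked healthy at first, but it offers no protection once most of the exchanges it reads are thin enough to move cheaply.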

Both Joe and Bob lacked adequate market coverage because they tried to build a data generation product using the oracle mechanism. They could have avoided this situation entirely by having the oracle pull data from data aggregators with decades of experience addressing market manipulation and market coverage issues.
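
For contrast, a market-wide price that weights every venue by its volume is much harder to move by manipulating thin exchanges. The sketch below uses a plain volume-weighted average price (VWAP) over all five hypothetical exchanges as a simplified stand-in for what a data aggregator produces. Real aggregators use their own, more elaborate methodologies, and the prices and volumes here are illustrative assumptions.

```python
from statistics import median

# Hypothetical per-exchange volumes and prices; A, B, and C hold only 15% of volume.
volumes = {"A": 50.0, "B": 50.0, "C": 50.0, "D": 450.0, "E": 400.0}
honest = {"A": 100.0, "B": 100.5, "C": 99.5, "D": 100.0, "E": 100.2}
attacked = dict(honest, A=70.0, B=68.0)  # only the thin, tracked venues are moved


def vwap(prices: dict[str, float], volumes: dict[str, float]) -> float:
    """Volume-weighted average price across every exchange with reported volume."""
    total_volume = sum(volumes.values())
    return sum(prices[name] * vol for name, vol in volumes.items()) / total_volume


def median_of_tracked(prices: dict[str, float], tracked=("A", "B", "C")) -> float:
    """Median of a fixed subset of exchanges, ignoring volume entirely."""
    return median(prices[name] for name in tracked)


print(f"honest VWAP:       {vwap(honest, volumes):.1f}")        # honest VWAP:       100.1
print(f"median of A, B, C: {median_of_tracked(attacked):.1f}")  # median of A, B, C: 70.0
print(f"attacked VWAP:     {vwap(attacked, volumes):.1f}")      # attacked VWAP:     97.0
```

In this toy setup, the same manipulation that drags the median to $70 shifts the VWAP by only about $3, because exchanges A, B, and C together carry just 15% of the volume.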

Mixing Low-Quality Oracle Solutions with High-Quality Oracle Solutions

This desire to prevent outlier scenarios has led some to consider combining multiple different oracles to create a price update. While decentralizing across many oracle solutions may sound like a good idea in theory, in practice it introduces greater risk, because of the weaker reliability of the other oracle mechanisms and the poorer data quality they provide.

We see great risk in mixing Chainlink’s highly secure and reliable price data from its Price Reference Data networks with lower-quality data from unproven, less transparent oracle solutions that do not support premium APIs and/or that source market data directly from exchange APIs. This concern becomes especially critical as the value secured by DeFi increases, because the incentive to attack DeFi dApps through weak points in the oracle mechanism will increase as well.

Imagine a situation where three oracle solutions are aggregated together: one is Chainlink’s Price Reference Data, which uses high-quality nodes to fetch data from premium data aggregators; another is an oracle that fetches price data from a preselected array of exchange APIs; and the last is an oracle solution that does not support credentials management and thus can only connect to a single low-quality data source or exchange API.

Combining Chainlink's high-quality data with low-quality data from insecure oracle solutions lowers the quality of the resulting aggregated data point.

In this example:

  • On the right, Chainlink Data Feeds pull data from multiple high-quality data aggregators, producing a volume- and liquidity-weighted, market-wide price of $100 that covers all trading environments.
  • The top left oracle solution fetches data from a preselected array of exchange APIs (A, B, and C) that, at the time, covered only 15% of total market volume, leaving out exchanges D and E, which held the remaining 85%. This skewed the resulting data point, and the oracle reported an incorrect price of $70.
  • The bottom left oracle solution is connected to only a single low-quality data source, which suffered an outage during a period of high market volatility. This single point of failure caused the oracle to report a price of $0.

When the smart contract took the median of the three values received ($0, $70, and $100), the final aggregated price was an inaccurate $70 instead of the proper market-wide price of $100. Taking the mean instead would be even worse, yielding a price of about $57. In both cases, the low-quality data delivered by the two other oracle solutions diluted the high-quality data delivered by Chainlink’s Price Reference Data.
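
The arithmetic in this example is easy to check. The short Python snippet below reproduces it from the three answers described above; the dictionary keys are hypothetical labels, not identifiers from any oracle contract.

```python
from statistics import mean, median

# The three answers from the example above, in USD (keys are illustrative labels).
answers = {
    "aggregator_based_feed": 100.0,  # market-wide price from premium data aggregators
    "preselected_exchanges": 70.0,   # only 15% market coverage
    "single_source": 0.0,            # data source outage
}

values = list(answers.values())
print(median(values))          # 70.0
print(round(mean(values), 2))  # 56.67
```

Neither statistic recovers the $100 market-wide price. With two of the three inputs wrong and both below the true price, the median can only return one of the wrong values, and the mean is dragged down further by the $0 outage.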

Such a situation produces a data point that is lower in quality and more vulnerable to manipulation than Chainlink’s volume-adjusted data on its own. Decentralization in an oracle solution is key, but it should not come at the expense of data or node quality. Data mixing of any kind can threaten the quality of the value delivered by Chainlink’s Price Reference Data networks. We strongly recommend against using price data from unknown sources, employing oracles with poorly thought-out calculation methods, relying on oracles without access to premium data, or depending on oracles with subpar crypto-economic guarantees.

Providing High-Quality Data to Next-Generation Applications

Oracles and data sources are two distinct components that must be equally resilient when combined to provide full-spectrum, end-to-end security. For an oracle network to reliably support a DeFi ecosystem responsible for billions, and eventually trillions, of dollars, the data it supplies must be high quality, secure, and reliable. Chainlink has consistently put the security of its oracle mechanism and the reliability of its data at the forefront in order to create a secure end-to-end apparatus that allows DeFi to grow and thrive into the mainstream for years to come.

If you are a DeFi project and would like to build your own price reference oracle network or use an existing one, visit our developer documentation, join the technical discussion in Discord, or schedule a call to discuss an integration. You can easily integrate one or many Chainlink oracle networks live on mainnet and testnet today, adding more security and capabilities to your smart contracts.
