Digital Identity on the Blockchain: Securing User Data With Chainlink
In the off-chain world, “Digital Identity” (D-ID) refers to the aggregated information that is collected by various parties and platforms when a user spends time and conducts activities online. Data such as a user’s search history, social media activity, transaction history, usernames and passwords, call records, SSN, date of birth, credit history, medical history, and other important information routinely finds its way online and is stored there, ultimately building a unique profile spread across multiple databases: each user’s Digital Identity.
Users have at their disposal a range of authentication and security tools to protect their data, but even the most secure online platforms can be hacked, leading to exposure of sensitive aspects of a user’s D-ID and putting them and the platforms at risk of identity theft and fraud. In fact, multiple studies have shown that hacked or leaked personal information is among the most frequently traded products on the dark web.
The way D-ID functions on a blockchain, by contrast, is at once more public and more private. Blockchains are decentralized, immutable ledgers (or databases), allowing for individuals to transact peer-to-peer while maintaining consensus concerning the ledger/data, ultimately creating a source of shared truth. Blockchains are public in the sense that any participant or even outsider can audit every transaction and address, and they’re private in the sense that, unless they’re explicitly permissioned, blockchains require no KYC (Know Your Customer) and users can participate anonymously with their blockchain addresses possessing little or no link to their off-chain identities.
One especially promising use case for blockchain technology is to improve the D-ID experience by applying the best features of blockchain technology to legacy D-ID systems. Though the architectural details vary, a blockchain-based D-ID solution would ideally allow users to selectively choose with whom and when they share their information, keep user information off of databases vulnerable to attack, allow users to better monetize their data, and better preserve user privacy. While this use case might seem trifling at first blush, it offers more than convenience and data security—by some estimates, digital identity and related industries could reach 3% of GDP by 2030.
This article will examine the risks and challenges associated with legacy D-ID implementations, break down how blockchain D-ID might solve them, and analyze four specific implementations relying on Chainlink oracles to connect personal information with the blockchain.
Current Flaws in D-ID
Though they often go unnoticed by users who have come to accept them, the flaws of legacy D-ID systems are both systemic and pernicious. D-ID is crucial to making many of the online systems people rely on in everyday life work, but at nearly every step, from the collection and storage of data to its sale, D-ID is rife with security, privacy, and even ethical concerns. Ultimately, these problems can roughly be grouped into three categories: data monetization, data access, and data storage.
An important part of D-ID is the data surreptitiously gathered by major internet platforms on a user’s behavior, habits, and biographical information. A search engine, for instance, might gather data about a user’s interests to tailor ads for them, or a social media site might sell information natively created by users to interested parties such as political campaigns. Because the details of these activities are often buried in terms-of-use agreements, users of these platforms ubiquitously and unwittingly enrich platforms with time spent ostensibly in leisure.
This process, in which users engaging in normal habits unknowingly provide platforms with information that is subsequently monetized, is frequently referred to as “free labor.” Proponents of free labor argue that this data monetization is a natural trade-off for access to what are often free services/platforms, and that it eventually benefits users by allowing the platforms to grow faster and provide better user experiences.
However, free labor presents a host of ethical and privacy issues, often revolving around users being unclear about what data is being gathered, to whom it’s being sold, or where it’s being stored. Though some countries have attempted to place regulations on the data that can be collected by major platforms, free labor remains a rampant issue, with users all across the Internet unsure of what data is being gathered and what’s being done with it.
Certain Internet platforms and services require a more complete D-ID profile to access than others. Social media sites may require just an email address (though they’ll subsequently build a profile on a user), while a lending service or a government agency might want a full financial or personal history before providing access and services through their portal. As a result, users are often forced to provide the same information about themselves over and over across different platforms. While the separation of databases may prevent a more catastrophic breach by isolating an attack, each database storing important user information ultimately increases the attack surface of a user’s data.
This puts users in a difficult position, having to choose between time-consuming processes and bureaucracy or storing their information for repeated use on databases that might be vulnerable to attack. This system also creates headaches for the platforms: government agencies might store redundant information across multiple servers, which leads to cost inefficiency, and other platforms might become more vulnerable to scams or theft as a result of user data leaks. In many instances, the platforms are ultimately the responsible parties for any financial losses associated with identity theft.
As mentioned above, users frequently propagate information about themselves online, including financial information in order to make purchases or gain access to services. Access and security rarely go hand-in-hand, and the same holds true for D-ID—each website that stores information about a user presents a new attack vector through which their information might be stolen.
Given the level of the threat, one would expect platforms to invest in superior security and privacy infrastructure. However, in spite of security efforts, statistics indicate that data protection problems are getting worse, not better: upwards of 10% of the population is affected by identity theft every year, and that number has been on the rise during the pandemic.
Blockchain-Based D-ID Solutions
Because of these flaws, D-ID is a space ripe for disruption from blockchain technology. By using blockchains to architect superior D-ID systems, many of the most glaring problems with D-ID can be solved and whole new use cases can be enabled.
The key features of a blockchain-based D-ID system would include: the ability for users to monetize the information they natively create and track how their information is being used; the ability to readily and easily share D-ID information; and the ability to keep that data secure. There is a range of unique approaches to achieving these goals, including the potential of doing away with off-chain identities entirely, and each leverages blockchain in different ways.
One blockchain-based D-ID system is Chainlink’s privacy-preserving oracle technology DECO, developed by Chainlink Labs Chief Scientist Ari Juels, researcher Fan Zhang, and others. While new D-ID storage solutions may alter how data is stored, the reality is that a lot of data is still stored in trusted databases. Many users and institutions, especially governments and large enterprises, may prefer to entrust a high-security custodian with protecting that data.
DECO allows oracles to attest to the validity of information in trusted databases/systems without exposing it to the public or even to the oracle itself, using a cryptographic technique known as zero-knowledge proofs. Essentially, the oracle can join a user-initiated web session to attest to some requested information, for example to verify someone’s identity, approve their financial information, or check key government records. Importantly, that data never leaves the secure, user-selected database, allowing a user to store their D-ID information in certain locations they trust and set up selective access, as opposed to propagating it to a variety of systems with weak guarantees on access control. This allows for a privacy-preserving plug-and-play option that combines the usability of legacy systems with the security of blockchain.
DECO’s privacy-preserving technology also allows for use cases that would otherwise have been impossible, such as big data medical studies. For years researchers have been excited about the potential of applying machine learning and computational analysis to large medical datasets, hoping to use these tools to make discoveries and breakthroughs that human analysis wouldn’t be able to find. However, the privacy and security concerns of patient data have long been a roadblock. DECO would allow researchers selective access to the data they need while complying with HIPAA regulations and without putting that data at risk, potentially enabling a new era of medical research.
Another example of a project using blockchain technology to enhance D-ID is Bloom, a decentralized identity protocol that allows users to claim, control, and selectively share their financial data while retaining full ownership via a decentralized architecture.
Bloom works by taking user-provided data, verifying each user’s identity, and then writing that data to the blockchain as an encrypted hash. This allows user information to be stored on a public ledger/source of truth while keeping it private. It’s especially useful for financial information, which is one of Bloom’s core areas of focus. A recent blog post from Bloom laid out how Chainlink oracles help connect credit scores to DeFi protocols.
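To make the idea of private data on a public ledger concrete, here is a minimal sketch that assumes a salted SHA-256 commitment. It is not Bloom’s actual implementation: the `commit` and `verify` helpers, the `ledger` dictionary standing in for the blockchain, the address key, and the sample attributes are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def commit(attributes: dict, salt: bytes) -> str:
    """Return a salted SHA-256 digest of the attributes (assumed scheme)."""
    # Canonical JSON so the same attributes always produce the same digest.
    payload = json.dumps(attributes, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(salt + payload).hexdigest()


def verify(attributes: dict, salt: bytes, on_chain_digest: str) -> bool:
    """Check disclosed attributes against the digest recorded on the ledger."""
    return hmac.compare_digest(commit(attributes, salt), on_chain_digest)


# A plain dict stands in for the public ledger in this sketch.
ledger = {}

salt = secrets.token_bytes(16)  # kept private by the user
credit_data = {"name": "Alice", "credit_score": 720}
ledger["0xUserAddress"] = commit(credit_data, salt)

# Later, the user discloses the data and salt to a lender of their choice,
# who recomputes the digest and compares it with the public record.
print(verify(credit_data, salt, ledger["0xUserAddress"]))

# Any altered attribute produces a different digest and fails the check.
tampered = {"name": "Alice", "credit_score": 800}
print(verify(tampered, salt, ledger["0xUserAddress"]))
```

Only the digest is public in this sketch. The random salt keeps an observer from guessing low-entropy values such as a credit score by brute force, and the user decides who receives the underlying data and salt.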
“Bloom started as a protocol using smart contracts and Ethereum addresses to uniquely identify individuals and enable them to claim, store, and share verified identity attributes, with the goal of decentralizing the credit bureau model,” says Isaac Patka, CTO of Bloom. “As the technology evolved we joined forces with the larger decentralized/self-sovereign identity community to develop open and interoperable standards for identifying users, issuing credentials, and exchanging information. The identity standards have now matured to the point that we can take this technology to market and drive global impact. We are excited to realize our original vision of extending financial inclusion, and using platforms like Chainlink to bridge the gap between the traditional and decentralized worlds.”
Unstoppable Domains is a decentralized blockchain-based protocol for registering and hosting Internet domain names as non-fungible ERC-721 tokens on the Ethereum blockchain. Unstoppable Domains recently announced a new feature that uses Chainlink oracles to link Twitter users to specific domains, making it easy to identify and confirm a user’s public address based on their social media account. Additionally, users can send payments directly to the domains, bypassing often confusing Ethereum addresses for a superior user experience.
What makes this solution unique is that it can potentially bypass real-world information entirely. Twitter users can remain anonymous, but still have a named Internet domain linked to them that can send and receive blockchain-based payments. This allows for secure, highly intuitive transfer of value between parties whose identities are potentially entirely digital and don’t have to be stored in any centralized database.
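The lookup flow described above can be sketched roughly as follows. This is an illustrative in-memory model, not the Unstoppable Domains contracts or API: `DomainRegistry`, `DomainRecord`, the `oracle_confirmed` flag, and the sample domain, handle, and address are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DomainRecord:
    owner_address: str                      # where payments to the domain go
    twitter_handle: Optional[str] = None    # set only after verification


class DomainRegistry:
    """In-memory stand-in for an on-chain domain registry (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, DomainRecord] = {}

    def register(self, domain: str, owner_address: str) -> None:
        if domain in self._records:
            raise ValueError(f"{domain} is already registered")
        self._records[domain] = DomainRecord(owner_address)

    def link_twitter(self, domain: str, handle: str, oracle_confirmed: bool) -> None:
        # oracle_confirmed stands in for an oracle report that the handle's
        # owner proved control of the domain; the real check is not modeled.
        if not oracle_confirmed:
            raise PermissionError("oracle did not confirm the link")
        self._records[domain].twitter_handle = handle

    def resolve(self, domain: str) -> str:
        """Return the payment address behind a human-readable domain."""
        return self._records[domain].owner_address

    def domain_for_handle(self, handle: str) -> Optional[str]:
        """Find the domain linked to a verified Twitter handle, if any."""
        for domain, record in self._records.items():
            if record.twitter_handle == handle:
                return domain
        return None


registry = DomainRegistry()
registry.register("alice.crypto", "0xA11CE0000000000000000000000000000000000")
registry.link_twitter("alice.crypto", "@alice_anon", oracle_confirmed=True)

# A sender only needs the Twitter handle or the domain, not the raw address.
domain = registry.domain_for_handle("@alice_anon")
print(domain, "->", registry.resolve(domain))
```

Nothing in the record ties the handle or the domain to a legal name, which mirrors the point that the identities involved can remain entirely digital.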
Decentr is a project that aims to provide a Web3 version of credit scores, which they call a “Personal Data Value” (PDV). Each user’s PDV would be sourced from a potential combination of social media activity, on-chain activity such as their total owned assets and history of repaying loans, and real-world data such as KYC/AML information. As discussed in Decentr’s blog post, Chainlink oracles can supply this data to DeFi protocols across any blockchain, and users with a high PDV could potentially receive less collateralized or even collateral-free loans.
Like Unstoppable Domains, this approach not only finds a way to securely connect off-chain data to D-ID using blockchain, but also bolsters the blockchain identity experience by taking valuable on-chain data and using it to create a D-ID profile of users. Privacy-focused users could potentially bypass using real-world information altogether, and instead build their PDV solely from their on-chain metrics.
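As a rough illustration of how a score like this could feed into loan terms, the sketch below combines three normalized signals into one value and scales the required collateral down as the score rises. The weights, the linear formula, the 150% maximum collateral ratio, and both function names are assumptions made for this example, not Decentr’s actual PDV calculation.

```python
import math


def personal_data_value(social: float, on_chain: float, kyc: float,
                        weights=(0.2, 0.5, 0.3)) -> float:
    """Combine three normalized signals (each in [0, 1]) into one score.

    The signals and weights are illustrative assumptions only.
    """
    signals = (social, on_chain, kyc)
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("each signal must be between 0 and 1")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, signals))


def required_collateral_ratio(pdv: float, max_ratio: float = 1.5) -> float:
    """Scale collateral linearly from max_ratio (PDV 0) down to 0 (PDV 1)."""
    return max_ratio * (1.0 - pdv)


full_profile = personal_data_value(social=0.5, on_chain=0.8, kyc=1.0)
# A privacy-focused user who supplies on-chain metrics only.
on_chain_only = personal_data_value(social=0.0, on_chain=0.8, kyc=0.0)

for label, pdv in (("full profile", full_profile), ("on-chain only", on_chain_only)):
    print(f"{label}: PDV={pdv:.2f}, collateral={required_collateral_ratio(pdv):.0%}")
```

Under these assumed weights, the user who shares all three signals ends up with a higher score and a lower collateral requirement than the user who shares on-chain metrics only, though the latter still earns a score without disclosing any real-world information.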
Current flaws with D-ID systems consistently and pervasively put users’ privacy and security at risk. However, these risks are currently perceived as a necessary tradeoff for accessing key day-to-day services and platforms. With the use of blockchain technology and Chainlink oracles, these risks can be mitigated by securely allowing platforms as-needed access to off-chain data, or by doing away with off-chain identity entirely in favor of on-chain data verified by Chainlink oracle networks.
These developments will not just help secure users’ privacy and data, but also have the potential to enable new use cases for private data that would otherwise have been impossible due to security and privacy concerns. By using Chainlink oracles to connect on- and off-chain data to protocols and Web3 platforms, a new era for Digital Identity might just be beginning.