Threshold Signatures in Chainlink
Abstract: Threshold signatures offer a concise, cheap way to verify consensus over very large groups, and we plan to use them to scale Chainlink’s security independently of Ethereum’s throughput. In this post I (1) review how threshold signatures work, following Stinson & Strobl’s 2001 paper; (2) describe a new Schnorr-like signature which is cheap to verify in solidity (just 15k gas, including input parameters); and (3) introduce the implementation I’ve developed of the threshold-signature construction, using the Schnorr-like signatures.
Introduction
What would the world be like if even strangers could be trusted to keep their commitments? How much collaboration and mutual aid go unrealized because it’s simply too hard to trust our commitments to each other? At Chainlink Labs, we aim to find out, because our goal is to help people craft secure smart contracts which faithfully respond to real-world outcomes.
A smart contract on a blockchain like Ethereum’s can reliably reward or punish[1] a participant as soon as their behavior has been proven. When operating correctly, that’s a much cheaper and safer form of enforcement than contract law[2]. But to enforce commitments to real-world behavior which people actually care about, a smart contract needs trustworthy reports of relevant real-world events. For example, today you can make a smart contract reward solutions to certain math problems[3], but an insurance contract concerning real-world risks like drought needs to trust someone to tell it when the drought has happened.
It’s relatively easy to set up a smart contract that responds when someone posts a transaction reporting a real-world event, but by itself that’s an unreliable arrangement. As more and more value comes to depend on the outcome of that report, the incentive to corrupt it grows enormous. So we need a way to make it more expensive, ideally impossible, for any adversary to influence the report.
In some sense, this can be seen as a generalization of the goals of Bitcoin’s original consensus mechanism, and Bitcoin contains the kernel of the solution: To address the “double spend” problem, Bitcoin needs trustworthy observations of which Bitcoin transactions are valid, and the solution was a distributed consensus which weights participation by the “skin in the game” demonstrated by winning the proof-of-work lottery.
Chainlink’s current approach to real-world events is for a group of “oracles” to each report their observations to a smart contract which imputes an aggregate value. The more diverse this group is, the more observers an adversary has to suborn to corrupt the outcome, so the more secure the system becomes. The trouble is, having everyone report to the blockchain individually is quite expensive. Historically, on Ethereum, there have been 25 days when naive reports of this sort probably cost more than $1.50[4]. so if you wanted confirmations from 2,000 participants, that might have cost over $3,000 on those days! A group committing to provide 2,000 verifications every day for a year could potentially face hundreds of thousands of dollars in gas costs, which would be feasible only for extremely high-margin contracts. Even at today’s prices, the total gas for a year of such reports would cost thousands of dollars.
At the same time, if you actually needed a specific group of 2,000 observers to confirm a report, any one of them could hold up the confirmation of a real-world observation by failing to send in their confirmation. So to ensure reports go through, we really want an arbitrary subset to confirm, say 2,000 out of 2,500. That way, 500 reports could fail, and the consensus value would still make it to the blockchain.
If this sounds a lot like multisig contracts, it is, but the gas costs of confirming a transaction in existing multisig wallets scale linearly in the number of participants. In this post, I’m going to describe threshold signatures, a cryptographic scheme where an arbitrary group of signers can construct one signature, as long as there is a sufficient number of them. This enables a huge efficiency gain, because almost all the calculation can be done by the signers before submitting to Ethereum, and what they actually end up submitting is very concise.[6].
Our current best threshold signature scheme (described in this post) requires about 15k gas to confirm[7]. This means, for instance, that the \$3,000 data point I mentioned earlier would only cost about \$2, a 1,500-fold savings (and at current gas/ETH prices, the cost would be a bit over one cent, vs about $17 for validation of a 2,000-strong quorum, using the current framework.)
The fact that consensus between signature participants is established independently of Ethereum (with very concise on-chain verification) also means that Chainlink’s security can scale with no improvement in Ethereum’s throughput. This will allow contracts to depend on much larger observation sets, because any sufficiently large subgroup of participants can all agree on an observed value and collectively sign it, off-chain. The limiting factor will become the cost of off-chain communication between participants during constructions of the shared private key and the signatures (both of which grow as the square of the number of participants.) And it’s likely possible to scale to an effective consensus even beyond that limit, using other methods we’ve been developing, and might discuss in a future blog post.
Cryptographic Preliminaries: Discrete Log, Pedersen Commitments, and All That
Mathematicians love powerful abstractions[8], and they also love puns. We’re going to be dealing with a number of mathematical objects which all have different notions of addition and multiplication on them. These arithmetic notions all relate to common abstractions known as groups, rings, and fields, but it’s important to keep in mind that the arithmetic operations over each is different. The beautiful thing about these abstractions is that your intuitions from regular arithmetic will mostly carry over. I will try to let you know when there are critical differences. The important thing is, if I say something like “$G+H$”, don’t necessarily assume that $G$ and $H$ must be numbers, just because I’m adding them. The “+” there is a kind of pun. Hopefully, the type of the objects will be clear from context.
The mathematical object we’ll be doing cryptography over is an elliptic curve. If you’re not familiar with them or their use in cryptography, I recommend Andrea Corbellini’s highly informative introduction. But my goal is that you shouldn’t need to know all of that to follow along, here. All we really need to accept for now is that there’s an operation on curve points which looks a lot like regular addition of whole numbers, in that
- We can add points to each other to get another curve point. This is the first of those puns: If $G$ and $H$ are curve points, we denote their sum by $G+H$. This does not look anything like numeric addition[9], but the following facts mean it has similar algebraic properties.
- There’s a zero point which has no effect when you add it to other points.
- Every point has a negation, and when you add a point and its negation, you get the zero point.
- When you add a sequence of points together, it doesn’t matter what order you do the additions in.
- There’s a point G on the curve which generates the entire curve, in the sense that if you could compute $\{G, G+G=2G, G+2G=3G, G+3G=4G,…\}$ and so on, you would eventually get every point on the curve.[10]
Here $G$ is playing a similar role to that of the number 1 when you’re doing addition of whole numbers: Just as you can represent any positive whole number $m$ as a sum of $m$ ones, $1+\ldots+1$, you can in principle represent any curve point $H$ as a multiple of $G$, $mG$ for some number $m$.
After running through every point on the curve, the sequence $(G, 2G, 3G,\ldots,)$ “wraps back around” to $G$. Let’s call the number of points on the curve $q$[11] (For security reasons, $q$ needs to be an incomprehensibly large prime.) So we have $(q+1)G=G$[12]. For this reason, it’s useful to think about multiples of $G$ in terms of “clock arithmetic” on the elements $𝔽_q=\{0, \ldots, q-1\}$.[13] Arithmetic in $𝔽_q$ is another mathematical pun: to compute $a+b$ or $a\times b$ on elements $a, b\in 𝔽_q$, first we do the operation as if they were regular numbers, then we take the Euclidean remainder[14] from that, divided by $q$. E.g., suppose for a moment that $q$ was the (extremely cryptographically insecure) number 7. In $𝔽_7$, $6+6=12=1\times 7+5$, so we say $6+6\equiv 5\mod 7$, as if we started at 6 on a clock with seven ticks, moved 6 ticks forward, and looked at which tick we stopped on. Similarly, $6\times 6=36=5\times 7+1$, so we say $6\times 6\equiv 1\mod 7$. As long as $q$ is prime, every non-zero element of $a\in 𝔽_q$ has a multiplicative inverse $a^{-1}\in 𝔽_q$ such that $a\times a^{-1}\equiv 1\mod q.$[15] In $𝔽_7$, $6^{-1}$ is 6 itself. Because every non-zero element has an inverse, we can calculate quotients $a/b$ for $b\neq 0$ as $a\times b^{-1}$.
One cryptographically important aspect of arithmetic on elliptic curves is that there is a concise representation[16] of the value from adding $G$ to itself $m$ times, $mG=G+\ldots+G$, and $m$ is believed to be irretrievable from that representation[17]. It’s a widely accepted, foundational assumption of elliptic curve cryptography that if I just give you the value $mG$, you’re very unlikely to be able to figure out what $m$ is, except more or less by random guessing. And since the curve has at least 100,000 times more points on it than the estimated number of atoms in the universe, random guessing is seen as hopelessly unlikely to succeed.[18] The difficulty of guessing $m$ from $mG$ is known as the discrete-log hardness assumption.
One interesting consequence of the difficulty of the discrete log problem is that you can use it to encode secret arithmetic. What you do is pick another random point on the curve $H$ such that no one can figure out its discrete log over $G$,[22] then you can encode a value $\alpha$ as $C=\alpha G+\beta H$, for some random $\beta$. The point $C$ is what’s known as a Pedersen commitment to the value $\alpha$, because if you’ve published $C$, the hardness of the discrete log problem makes it implausibly difficult to go back and figure out other values $\tilde{\alpha},\tilde{\beta}$ for which $C=\tilde{\alpha}G+\tilde{\beta}H$. In fact, knowing such $\tilde{\alpha},\tilde{\beta}$ would directly imply that you can compute $m$ for which $mG=H$, solving the discrete-log problem: just set $m=\frac{\alpha-\tilde{\alpha}}{\beta-\tilde{\beta}}$.
$C$ is called a commitment because I can give it to you as part of some agreement we’re entering into where I later reveal $\alpha$ and $\beta$, and we take some action based on $\alpha$ or $\alpha G$. You verify whatever $\alpha$ I reveal at that point by checking $C=\alpha G+\beta H$, and you’re confident that it’s the $\alpha$ I committed to with $C$ because it would be very hard for me to come up with alternative $\tilde{\alpha}$ and $\tilde{\beta}$ for which $C=\tilde{\alpha}G+\tilde{\beta}H$. At the same time, I can be confident that you won’t be able to determine $\alpha$ or $\alpha G$ until I reveal it.
You can do secret arithmetic with Pedersen commitments just by adding the points: If $C^\prime=\alpha^\prime G+\beta^\prime H$, then $C+C^\prime=(\alpha+\alpha^\prime)G+(\beta+\beta^\prime)H$ is a Pedersen commitment to $\alpha+\alpha^\prime$[23].
Shamir Secret Sharing
Now I’m going to lead you through the constructions we need for threshold signatures. This is mostly following Stinson and Strobl’s paper, Provably Secure Distributed Schnorr Signatures and a $(t,n)$ Threshold Scheme for Implicit Certificates.
For the foundational construct, Shamir Secret Sharing, we don’t need an elliptic curve, we just operate over $𝔽_q$.
The initial phase of Shamir Secret Sharing is not collaborative, though we’ll see later that threshold signatures use it in a collaborative protocol. There is a single trusted dealer who creates (and knows) a secret, and gives enough information about it to $n$ other entities that any $t$ of them can reconstruct it.
The key idea in Shamir Secret Sharing is that the space of $(t-1)$-degree polynomials with coefficients in $𝔽_q$ form a $t$-dimensional vector space. So the dealer represents its secret as the constant term of a polynomial $f(x)=a_0+a_1x+\ldots+a_{t-1}x^{t-1}$, with the coefficients $\{a_i\}\subset 𝔽_q$ each uniformly and independently chosen from $𝔽_q$. We’re actually going to share all of $f$’s coefficients, but only $a_0$ is used as the shared secret.[24]
Then the dealer assigns indices $1,\ldots,n$ to the recipients, in some arbitrary but fixed order. Finally, it sends the $i$th participant the values $(i, f(i))$. Because $f$ is a point in a $t$-dimensional vector space, $t$ of these evaluations are enough to reconstruct $f$. On the other hand (just ignore this sentence if you don’t know about vector spaces), $t-1$ or fewer points can only constrain $f$ to a nontrivial vector subspace of polynomials over $𝔽_q$, so they can’t reveal information about any specific coefficient (unless that coefficient is 0, which is overwhelmingly improbable.) It’s sort of like how if you take the intersection of two planes in three-dimensional space, you get an entire line, but if you take the intersection of three planes, you get a single point.
Let’s say participants $i_1,\ldots i_t$ share their values $(i_j,f(i_j))$ with each other. Then they can construct the polynomial[25]
\[\begin{equation}g(x)=\sum_{j=1}^tf(i_j)\prod_{k\in\{1,\ldots,t\}\setminus\{j\}}\frac{x-i_k}{i_j-i_k}\label{lagrange}\end{equation}\]
Here, the $\sum$ sign means “add these values, from $j=1$ up to $j=t$,” and the $\prod$ sign means “multiply these values, for $k=1$ up to $k=t$, but not including $k=j$.” If I wanted to express this in terms of python generator comprehensions, I might do it like
sum(f(i[j])*np.prod([(x-i[k])/(i[j]-i[k])
for k in range(1, t+1) if k != j])
for j in range(1, t+1))
where np.prod
is the numpy
prod
function, which computes the product of the elements in its first argument[26].
Evaluating $g$ at any $i_j$ gives, separating out the $j$ term,
$$\begin{align*}
g(i_j)&=f(i_j)\prod_{k\neq i_j}\frac{i_j-k}{i_j-k}+\sum_{k\neq i_j}f(k)\prod_{l\neq k}\frac{i_j-l}{k-l} \\
&=f(i_j)\times 1 + \sum_{k\neq i_j}f(k)\times\frac{i_j-i_j}{k-i_j}\times\prod_{l\neq k,i_j}\frac{i_j-l}{k-l} \\
&=f(i_j)+0\\ &=f(i_j),
\end{align*}$$
where for the first equality, I have taken the $j$th term out of the sum, and the for the second equality I have used the fact that $\frac{i_j-k}{i_j-k}=1$ for $k\neq j$, and taken the $j$th factor out of each remaining product.
Since $f$ and $g$ agree on $t$ points, they are the same polynomial. This trick, for constructing a degree-$d$ polynomial from its value at $d+1$ points, is known as Lagrange interpolation.
At least $t$ participants are necessary for this recovery of $f$. With fewer, you can only infer that $f$ lies on some non-trivial affine subspace (which has cardinality at least $q$, because it’s at least one-dimensional over $𝔽_q$) of the vector space of $(t-1)$-degree polynomials.
Verifiable Secret Sharing
Cryptography isn’t just about Mathematics. It’s also important to think adversarially about how participants in a protocol like Shamir Secret Sharing could misbehave for personal gain. This leads to rounds of challenges and responses, to determine who’s responsible for the collaboration going awry. Also, every message needs to be signed by its sender, so that it can be provided as evidence of their misbehavior if need be.[27]
In the above scheme, the dealer is trusted to provide consistent values $f(i)$ to each participant. If the dealer isn’t using a consistent $(t-1)$-degree polynomial to construct the values $f(i)$, different $t$-size subsets of the participants could construct different values for the secret.
Verifiable secret sharing uses Pedersen commitments to constrain what the dealer can send the participants. The dealer constructs $f(x)=a_0+a_1x+\ldots+a_{t-1}x^{t-1}$ as before, but also constructs a second random polynomial, $h(x)=b_0+b_1x+\ldots+b_{t-1}x^{t-1}$, and uses this to form public Pedersen commitments to the coefficients of $f$, the curve points $C_c=a_cG+b_cH$.
The messages from the dealer to the participants are now $(i,f(i),h(i))$. This allows each recipient to verify that the values come from the polynomials committed to by $\{C_c\}$, because it must be the case that $\sum_{c=0}^{t-1}C_ci^c=\left(\sum_{c=0}^{t-1}a_ci^c\right)G+\left(\sum_{c=0}^{t-1}b_ci^c\right)H=f(i)G+h(i)H$, which is a curve point they can calculate, knowing $f(i)$ and $h(i)$.
If a recipient, say the $i$th, can’t verify the information from the dealer, it broadcasts a complaint publishing the dealer-provided values, which is signed by the dealer. Then all the other participants can verify for themselves, using the complaint, that the dealer sent values for which $\sum_{c=0}^{t-1}C_ci^c\neq f(i)G+h(i)H$. If that happens, the whole deal is off! Similarly, if any participant broadcasts an invalid complaint (one with an incorrect dealer signature or with valid dealer values), they are marked as untrusted.[28]
Once everyone’s content that they have the correct values from the dealer, any $t$ of them can construct the dealer’s secret polynomial themselves using the same Lagrange-interpolation trick as in equation $(\ref{lagrange})$.
Pedersen/Rabin Distributed Key Generation
The goal in distributed key generation is to create a private value $m\in 𝔽_q$ which nobody knows, but any $t$ participants can work with to do public-key cryptography, with $m$ as the private key and $mG$ as the public key.
The following scheme requires that $t\leq n/2$, i.e., that more than half of the participants are honest. It’s kind of magical, in that it simulates Verifiable Secret Sharing, with a polynomial which no one knows.
The first step is $n$ simultaneous Verifiable Secret Sharings of secret polynomials $f_i(x)=a_{i,0}+a_{i,1}x+\ldots+a_{i,t-1}x^{t-1}$, $h_i(x)=b_{i,0}+b_{i,1}x+\ldots+b_{i,t-1}x^{t-1}$, with the $i$th participant acting as the dealer for $(f_i,h_i)$ to all the other participants. They all have to agree on an ordering of the participants, and share deals using that ordering, so that the $i$th particiant gets $f_j(i)$ from any other $j$th participant, not some other $f_j(k)$. Roughly speaking, the unknown polynomial they’ll be sharing is the sum of the $f_i$’s.
If there are any valid complaints about a participant’s dealing, or if a participant makes an invalid complaint against someone, they are removed from the collaboration. At the end of this, participant $i$ has $(f_j(i),h_j(i))$ from participant $j$, for all $j\neq i$. Denote by $Q$ (for “qualified”) the set of participants who get through this sharing procedure without a legitimate complaint against them.[29]
Once all these shares are complete, any $t$ participants could reconstruct any other participant’s secrets, but an implicit assumption of a $(t,n)$-threshold scheme is that at most $t-1$ participants are dishonest, so any dishonest coalition is too small to do that.
At this point, the participants have enough information to simulate sharing of the sum of all the qualified participants’ secret polynomials, $f(x)=\sum_{i\in Q}f_i(x)=$ $\sum_c(\sum_{i\in Q}a_{i,c})x^c$ and $h(x)=\sum_{i\in Q}h_i(x)=\sum_c(\sum_{i\in Q}b_{i,c})x^c$, even though none of them actually know $f$. Since participant $i\in Q$ has $(f_j(i),h_j(i))$ from every participant $j\in Q$, they can compute $f(i)=\sum_{j\in Q}f_j(i)$ and $h(i)=\sum_{j\in Q}h(i)$ [30], and verify the associated Pedersen commitments, so it’s as if some notional dealer is sharing that secret with them. We’re going to use these values to work with the distributed secret $m=\sum_ia_{i,0}$.
Now each participant $i$ reveals the “public key” part of their share of the secret, by publishing $A_{i,k}=a_{i,k}G$ (recall that $(a_{i,k})_k$ are the coefficients of $i$’s secret polynomial, $f_i,$ and in some sense $A_{i,k}$ was already shared as the first term of $C_{i,k}=a_{i,k}G+b_{i,k}H$.) Every other participant can then verify that $f_i(j)G=\sum_{k=0}^{t-1}j^kA_{i,k}$.
If $j$ reports an invalid, signed set of $A_{i,k}$s from $i$, the other participants (of which there must be at least $t-1$, for the protocol to work) resort to recovering $i$’s secret polynomial using the Lagrange interpolation trick. Because they’re able to do that, a malicious coalition can’t choose to abort the protocol or control the output in any way by choosing to publish or withhold a valid complaint against one of its coalition. This is the reason we had to initially verify the shares using the Pedersen commitments $C_{i,k}$, rather than the straight “public key” values $A_{i,k}$: It’s assumed to be impossible to infer $A_{i,k}$ from $C_{i,k}$, and we wanted to make sure everyone was committed to participating in the final construction of the secret, either by sharing properly, or by having their secrets forcibly reconstructed by the other honest parties, from their verified revelations during the Verifiable Secret Sharing phase.
The above recovery step requires that $t\leq n/2$. As long as that’s the case, there are always enough shares from honest participants to reconstruct shares from dishonest participants who withhold the correct $A_{i,k}$s.
Finally, now everyone knows the “public key” for the constant coefficient of any $i\in Q$, namely $A_{i,0}=a_{i,0}G$. The associated “private key” is just that coefficient, $a_{i,0}$. The distributed secret is then just the sum of everyone’s constant coefficients, $z=\sum_{i\in Q}a_{i,0}$, and the associated public key (which we will be using in the threshold signature scheme) is the sum of everyone’s public keys, $\sum_{i\in Q}A_{i,0}=\sum_{i\in Q}a_{i,0}G=(\sum_{i\in Q}a_{i,0})G=zG$.
It’s worth noting that if the protocol is forced to reconstruct $i$’s secrets because $i$’s published $A_{i,k}$ values were invalid, then from that point only $t-1$ other participants are necessary to work with the distributed secret $z$. But the important thing to keep in mind is the implicit assumption of this threshold scheme is that at most $t-1$ participants are dishonest. If $i$’s public $A_{i,k}$ values are wrong, that proves $i$ is dishonest, so based on that assumption there can be at most $t-2$ other dishonest parties.
Threshold Signatures
We can do threshold signatures, now that we have a way to construct a $(t,n)$-distributed secret $z$ and the associated public key $zG$ from participant-level portions of the keys, $\{(z_i,z_iG)\}_{i\in Q}$. First, the participants construct such a public/private keypair, which they will use to sign messages, $(z, zG)$. Then for each message $M$ they want to sign, they’ll run the distributed key generation process again to construct a nonce $k=\sum_{i\in Q^\prime}k_i$ (where $k_i$ is $i$’s part of the new secret), and associated public key $K=\sum_{i\in Q^\prime}K_i$, where $K_i=k_iG$. Each participant $i$ signs the message with their portion of $z$, using the Schnorr signature scheme:
$$e=H(K‖M), s_i=k_i-z_ie,$$
where $H$ is an appropriate hash function, and the arithmetic is in $𝔽_q$. Finally, they add up their signatures to obtain $s=\sum_{i\in Q^\prime}s_i=\sum_{i\in Q^\prime}k_i-z_ie=k-ze$. The pair $(s,e)$ is a signature on M using the public/private key pair $(z, zG)$ and the nonce $k$. Anyone can verify that the signature matches the public key $zG$ by checking that $H(sG+e(zG)‖M)=e$, which holds for a correct signature because $sG+e(zG)=(s+ez)G=kG=K$.
The linearity of the Schnorr-signature formula, $s=k-ze$, plays a critical role, here. Ethereum took its signature scheme from Bitcoin, which uses ECDSA signatures, not Schnorr, and while signing in ECDSA involves a somewhat similar formula, $s=(H(M)+K_xz)/k$[31] in $𝔽_q$, the non-linearity in $k$ means one can’t just add up the participants’ individual signatures like we can in Schnorr.
Cheap On-chain Verification of Schnorr Signatures
Schnorr signatures aren’t that expensive to validate on-chain. There’s a public implementation which takes about 85k gas, by using the Ethereum precompiled contracts for the pairing curve AltBN-128. Still, in the optimistic case where Chainlink and Ethereum become very popular, we’re going to want to reduce that cost as much as possible in order to allow more data to be reported on-chain.
With a small adjustment to the Schnorr signature formula, we are able to reduce the cost of on-chain verification to a mere 15k gas[32], using Vitalik Buterin’s ecrecover trick for abusing Ethereum’s authentication infrastructure to verify elliptic curve arithmetic. Ethereum has a precompiled contract, accessible through solidity via the ecrecover
function, which for 3,000 gas will return the Ethereum address[33] for the public key which signed a message. When you compute ecrecover(z, v, r, s)
, you’re effectively solving for $P$ in[34]
$$(s^{-1}zG+s^{-1}rP)=R,$$
where $R$ is the point specified by the $x$ ordinate $r$ and the parity of the $y$ ordinate from $v$, and all scalar arithmetic is in $𝔽_q$. It follows that
$$P=(r^{-1}s)R+(-r^{-1}z)G.$$
On the other hand, to verify a Schnorr signature, we compute $K=sG+eP$, where $P$ is the public key, and then check that $e=H(K‖M)$. If we instead use $e=H(\mathrm{address}(K)‖M)$ in the signing and verification procedure, we can use ecrecover
directly, to check that $\mathrm{address}(K)$ is ecrecover(
$-P_xs$,$P_v$,$P_x$,$P_xe$)
$=\mathrm{address}(sG+eP)$, where $P_x$ is the $x$ ordinate of the public key, $P_v$ is 27 if $P$’s $y$ ordinate is even and 28 if it’s odd, and $e=H(\mathrm{address}(K)‖M)$. As argued here, we could even truncate $e$ to 128 bits, to maintain the 128-bit security afforded by secp256k1[35]. Due to verifying a hash of the curve point rather than the curve point itself, some adjustments to the proof in that paper are necessary, and I might discuss them in a future blog post.
Implementation
The excellent cryptography library kyber (which is a project of the academic EPFL DEDIS group, and to the best of my knowledge has no connection to the Kyber Network[36]) already has an implementation of the threshold signature scheme I’ve described above. We’ve written a wrapper of btcd’s secp256k1 arithmetic which is compatible with the kyber Group
abstraction, a modification to kyber’s threshold-signature construction, and also a smart contract which implements verification in the above signature scheme, along with an end-to-end test of the two systems. Once we’ve audited this implementation, we plan to set up a peer-to-peer network between oracles, through which they can negotiate and construct threshold signatures attesting to real-world data.
We expect this will be a major advance for the integrity of smart contracts’ connections to the real world, because it enables cheap on-chain verification of essentially arbitrary off-chain interactions between participating oracles. If you’re interested in this kind of thing, we’re hiring.
Acknowledgments
Thanks to Chris Blake, Steve Ellis, Ari Juels, Dimitri Roche, and Holly White for feedback and suggestions on this post, and to Sergey Nazarov for suggesting I write it in the first place.
- Yes, all we can do with this framework is offer financial rewards, which from an organizational perspective is a very restrictive incentive structure. Lots of simple collaborations could benefit from easily enforced financial rewards, though. On the other hand, if you know of a blockchain which enables rewards in terms of deep personal connections, meaningful achievements, and warm, fuzzy feelings, please let us know, so we can integrate Chainlink with it. 😊 ↩︎
- While regulations and other legal protections are essential to reducing the level of trust we need to work together, for many potential collaborations they’re too expensive, slow, and uncertain to mitigate the risk of exploitation. ↩︎
- That particular contract contained a mistake and concerned a problem which is considered to be solved, but overall it’s a nice idea! ↩︎
- Historical daily average ETH/USD prices and gas prices from Etherscan. Estimate made by taking the product of the two values for each day[5]. Assumption of 43k gas for the transaction is based on a separate transaction for each participant (21k base gas cost), a
uint256
input (2k gas) and storage of their input (20k gas.) There are possible optimizations to this which I haven’t considered carefully, because it’s not the main thrust of this post: You could probably reduce that cost to about 9k gas per participant by aggregating multiple confirmations into a single transaction (uint256
for each report, twouint256
s for each signature, 3k each to verify the signatures) but it would still be expensive! ↩︎ - This is statistically unsound, but will tend to underestimate the maximum implied USD gas price. ↩︎
- It’s also possible to do threshold-ECDSA signatures, and Keep Network is implementing this for Ethereum. Threshold ECDSA signatures are intriguing, because you can use them to directly sign an Ethereum transaction, and Ethereum’s machinery will validate the signature for you as it would in any other transaction. So that’s a big gas savings when everyone in the protocol is behaving as expected. Working around the non-linearity in the ECDSA signature formula involves a lot of complex homomorphic encryption with very large numbers (the square of a secure RSA modulus), which would make on-chain accountability for incorrect behavior during the secret-sharing procedure much more expensive and harder to express in an Ethereum contract. At Chainlink we have decided to work with Schnorr signatures because of their simplicity, and because their linear structure allows greater flexibility in our subsequent development plans, which I might talk about in a later blog post. ↩︎
- It’s also based on regular Ethereum secp256k1 elliptic-curve arithmetic, which puts us at far less risk that we’ll be unpleasantly surprised by a cryptographic breakthrough which compromises the scheme. Pairing-based BLS signatures are attractive, but the cryptographic security of Ethereum’s pairing operations could be less than 110 bits (“Thus, a conservative estimate of the security level of BN curves with a 256-bit prime p is 110 bits.”) ↩︎
- An extreme example:One striking characteristic of Grothendieck’s mode of thinking is that it seemed to rely so little on examples. This can be seen in the legend of the so-called “Grothendieck prime”. In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take 57.”But Grothendieck must have known that 57 is not prime, right? Absolutely not, said David Mumford of Brown University. “He doesn’t think concretely.” ↩︎
- It does bear some resemblance to multiplication in the field of rational functions on the curve, though, if you take the right perspective. ↩︎
- This point $G$ is called a generator. Not every curve has a generator, but the one Ethereum uses does. On curves where there’s no generator, it’s usually still possible to do cryptography by restricting operations to the set $\{G, G+G=2G, G+2G=3G, G+3G=4G,…\}$ for some $G$, even though that’s a strict subset of the curve. ↩︎
- The actual value of $q$, for secp256k1, say, is $2^{256}-432420386565659656852420866394968145599$. The sizes of various cryptographic elliptic-curve groups are shown in the second table here. It’s useful for $q$ to be just a little bit less than $2^{256}$ in the case of secp256k1 because that wastes hardly any space in a 32-byte representation during network transmission. It also makes it easy to uniformly sample a random number from $\{0,\ldots q-1\}$, just by sampling 32 random bytes, and throwing the result away to try again, if it represents a number greater than $q$ (which will happen about once every $2^{128}$ samples, if you’re correctly drawing uniform random samples.) ↩︎
- It is actually feasible to compute $mG$, even if $m$ is incomprehensibly large, using the double-and-add algorithm. Roughly speaking, you compute $2G=G+G,4G=2G+2G,8G=4G+4G,\ldots,2^{255}G=(2^{254}G)+(2^{254}G)$. Then you use the binary representation of $m=\sum_{p=0}^{255}b_p2^p$, with $b_p\in\{0,1\}$, to compute $mG=\left(\sum_{p=0}^{255}b_p2^p\right)G=\sum_{p=0}^{255}b_p(2^pG)$. It’s a bit tricky to code this correctly, because if you do it in the most efficient way, the time it takes to run the algorithm can leak information about $m$, and $m$ is supposed to be secret. It’s important to make sure exactly the same computational operations (though with appropriate inputs, of course) take place at each step, whether $b_p$ is 0 or 1. ↩︎
- If you happen to read Andrea Corbellini’s excellent introduction to elliptic curves, it’s worth noting that $𝔽_q$ is a different structure than the $𝔽_p$ he uses there. $𝔽_p$ is the base field for the coordinate system in which the elliptic curve is defined,[21] while $𝔽_q$ represents the additive structure on the curve, with addition in $𝔽_q$ corresponding to point addition, multiplication of a point $P$ by $m\in 𝔽_q$ corresponding to adding a point to itself $m$ times, and $1\in 𝔽_q$ representing the generator, $G$. Both $𝔽_q$ and $𝔽_p$ use “clock arithmetic”, but the number of ticks on their clocks are different. ($𝔽_p$ has $2^{256} – 2^{32} – 2^9 – 2^8 – 2^7 – 2^6 – 2^4 – 1.$) ↩︎
- I.e., we use Euclidean division to find $(d,r)$ such that $a+b=q\times d + r$, where $0\le r<q$. This mostly matters for computing the negation $-a$ in $𝔽_q$, because it gives us back an element of $\{0,\ldots,q-1\}$, namely $q-a$. ↩︎
- This can be computed in approximately $\log_2 q$ steps by running the Extended Euclidean Algorithm on $q$ and $a$. The inverse will be the final Bézout coefficient of $a$, since $\gcd(q, a)=1$, because $q$ is prime and $a<q$. ↩︎
- Specifically, every point on an elliptic curve is of the form $(x,y)\in(𝔽_p)^2$, with $y^2\equiv x^3+1\mod p$. You can easily represent such a point in 257 bits: The $x$ ordinate, and the parity of the $y$ ordinate, which can be computed up to a sign change from $\sqrt{x^3+7}\mod p$. ↩︎
- For example, the $(x,y)$ coordinates for the secp256k1 generator $G$ are
(0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798, 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8)
, while for $2^{128}\times G$ they are(0x8f68b9d2f63b5f339239c1ad981f162ee88c5678723ea3351b7b444c9ec4c0da, 0x662a9f2dba063986de1d90c2b6be215dbbea2cfe95510bfdf23cbf79501fff82)
. Both sets of coordinates fit in 64 bytes, even though the second is equivalent to adding the first to itself about ten times for every atom in every human on the planet, and there’s nothing obvious (or, we hope, obscure) in these coordinates which would lead you to infer $2^{128}$ as the multiplier. ↩︎ - You can do substantially better than naive random guessing, though, using the baby-step, giant step algorithm[19], which reduces the number of guesses you have to make down to about the square root of the number of curve points, at the cost of about the same amount of storage. That’s still completely infeasible as far as anyone knows, though $2^{128}$ is about three quintillion times less than the number of atoms in the sun, so maybe you could crack a contemporary elliptic-curve cryptosystem if you turned the sun into a big special-purpose computer.[20] I think the hard part then would be comparing the guesses to the stored values.
There’s also the possibility that quantum computing will allow us to solve discrete-log problems with very moderate computational resources. I think that’s actually even less likely than a sun-computer, for the forseeable future, but I’m not a physicist. ↩︎ - Note that they’re using multiplicative notation on that page, for the operation we’re representing here as addition. ↩︎
- Which would be an imprudent use of critical resources, in my opinion, though it seems there are mildly serious people who are kind of excited by the idea. Making a computer out of Jupiter might trigger a smaller environmental outcry, but then you’d have about a thousand fewer atoms per guess/unit of storage. ↩︎
- i.e., the affine representation of secp256k1, for instance, is $\{(x,y)\in (𝔽_p)^2 | y^2=x^3+7\}$ ↩︎
- i.e., we all know from the general theory that there’s some $m$ such that $H=mG$, but no one can figure out specifically what $m$ is ↩︎
- For a detailed introduction to secret Pedersen arithmetic, see the primer for the Grin cryptocurrency, which uses them to obscure all balances in its transaction ledger. We don’t need all of that machinery here, though. ↩︎
- If you tried to use more coefficients to get more shared entropy, you would run into issues with smaller coalitions being able to infer linear relationships between them. Also, it’s important that all the coefficients are chosen cryptographically-securely randomly, including $a_0$. If you want to share actual information, you use $a_0$ to encrypt it, and send the encryption separately. ↩︎
- The number off to the far right of the following equation is just a reference label so I can easily call back to it later in the post. ↩︎
- I’ve encountered quite a few people who are perfectly capable of understanding this protocol, but are unaware of the mathematical notation I’m using here. This explanation is for them. ↩︎
- Regular signatures like Ethereum uses are adequate for that purpose; we’re not building a circular construction of threshold signatures here! ↩︎
- This is slightly different from the VSS protocol described in Stinson & Strobl’s paper, which does not rely on the personal signatures of the participants. There, when participant $i$ broadcasts a complaint, the dealer has the opportunity to broadcast the correct $(f(i),h(i))$ values in the clear. At the same time, if there are more than $t$ complaints it mandates that the deal is off. This is due to the implicit assumption in a $(t,n)$ threshold scheme that at most $t-1$ participants are dishonest, which implies, if there are $t$ complainants, that there was at least one honest complaint, and the dealer sent the wrong values in the first place, even if they subsequently published the correct values in response to the complaint. Requiring complaints to contain the dealer signature avoids all this hairy logic, because if it’s signed by the dealer, you know with overwhelmingly high probability that it’s come from them. ↩︎
- Pedersen initially proposed a secret-sharing scheme without Pedersen commitments, but a group including Rabin pointed out that this scheme allows a small attacking coalition to influence the distribution of the resulting public key, by choosing which of its members will be removed due to valid complaints from other members. To work around this issue, Rabin et al. proposed the initial Pedersen commitments to the polynomial coefficients. ↩︎
- Note that these are sums in $𝔽_q$, so you compute them by regular arithmetic, followed by taking the Euclidean remainder $\mod q$. ↩︎
- Here $K_x$ stands for the x ordinate of $K=zG$, in affine coordinates, and $H$ is some standard hash function with the usual assumed properties like preimage- and collision-resistance. ↩︎
- The value in that test is 37,000, gas because the base transaction cost is 21,000 gas. In my hands, the actual value comes out to be around 15,300 gas. It’s also worth noting that if the public key is already on-chain, you can save approximately 2,000 gas, which would bring the cost down to about 13,000, or less if you have data you can pack in with the
nonceAddress
. ↩︎ - Given a public key $P=(x,y)\in\mathrm{secp256k1}$, its ethereum address is the lower 160 bits of $\mathrm{keccack256}(x‖y)$, where $x$ and $y$ are represented as big-endian binary strings. ↩︎
- This is the equation checked in standard ECDSA verification (see step 5 at that link.) ↩︎
- Thanks to Ari Juels for pointing out this paper. ↩︎
- Emphasis because everyone seems to make this association at first. Even I did. ↩︎