Blockchain Analysis Versus CashShuffle: Tracking the 300 BCH donation to Bitcoin Unlimited

2019-06-22T23:30:29.000Z Honest Cash

The Bitcoin Cash community is raising funds for node software development. On 17 June, the donation counter for Bitcoin Unlimited suddenly jumped from 5 BCH to 305 BCH. The 300 BCH came from the single transaction a0f28991dfe01bf7b56e38ee6c61b49f593c2f3aa4b548379276a56bc1c95740, whose 30 inputs were all shuffled with the privacy-enhancing CashShuffle technology.

Combining multiple shuffled inputs is a known limitation of CashShuffle. So is this donation truly anonymous, or can we figure out the source of the funds? Read on to find out.

The mathematical rules

To start our blockchain analysis, we first need to specify a mathematical model. To break CashShuffle, we need to link outputs of CashShuffle transactions to inputs of the same transaction, which in turn are outputs of other transactions. We start at the donation to Bitcoin Unlimited and work our way back. Doing so, we encounter various types of transaction outputs:

• The donation output: For the final transaction that makes the donation to Bitcoin Unlimited, all 30 transaction inputs are certainly linked to this destination.

• CashShuffle shuffled output: Without further knowledge, a shuffled output of a CashShuffle transaction (output index 0 through 4) is linked with 20% probability to each of the five inputs of the transaction.

• CashShuffle change output: A change output of a CashShuffle transaction (output index 5 or higher) is linked to the inputs that have amounts greater than but closest to the amount of the output. (The difference is the contribution to the transaction fee.) Almost always this links to one input with 100% probability, except when some inputs have equal amounts.

• Output of single-input transaction: Any output of a regular transaction with a single input is certainly linked to that input.

• Other transaction outputs: For any transaction output that does not fall within any of the above categories, the transaction itself is considered to be the ultimate origin of the funds.

With these rules we can work out the possible origins of the donation. Using the probabilities, we can compute for each origin what the expected number of paths to the destination is, and the expected amount contributed to the destination. Summed over all possible origins, these will add up to 30 paths and 300.73989064 BCH, the sum of the input amounts in the donation transaction.

After processing 3704 transactions, this leads to 296 possible origins, with 4798 possible links between transaction outputs. Ordered by the expected number of paths, the top 15 is as follows:

Top 15 possible origins

The elimination of origins

We can do better than just listing 296 possible origins. Repeatedly we can eliminate the least likely origin from the list (with the least expected number of paths). This will increase the likelihood of the remaining origins. For example, if one input of a CashShuffle transaction is indirectly eliminated, the conditional probabilities of the remaining four inputs linking to a particular shuffled output increase from 20% to 25%. If removing the least likely possible origin leaves one of the 30 inputs without source, we try removing the next possible origin on the list instead. The process is repeated until only one possible origin is left, or all of the remaining possible origins are necessary to provide the funds.

The former result is what we get: one origin is left, with 124 possible links:

Our main scenario has one origin

It is possible to develop alternative scenarios by manually eliminating this origin at the start of the process. This yields a more complex scenario with three origins, with 616 possible links:

An alternative scenario with three origins

This could be of interest if one can somehow establish a connection between these three origins. However in this case our original scenario with one origin turns out to be very plausible.

The directed acyclic graph

We now proceed to visualise the relations we discovered between transaction outputs. This figure shows the relations between blocks (identified by height), transactions, and transaction outputs (identified by output index):

Our initial directed acyclic graph

CashShuffle shuffled outputs are drawn in green, CashShuffle change outputs in orange, and all non-CashShuffle outputs in blue. While the origin of all funds is certain, some intermediate links are still uncertain; the outputs and links with an expected number of paths smaller than one are dashed. Upon close manual inspection, it is revealed that CashShuffle transactions 116457c28315ce71a9be07c5d6947d9423e81cc9bf8cdc144ee5287faedbacc1 (block 585273) and 608dc7df8df14b800ffc6ddb149f875f00a5fc36fa99bf8aa3814ae26063e72d (block 585283) have one certain shuffled output and one uncertain shuffled output linking to the same input. Since the input can only link to one shuffled output, the link to the uncertain output cannot be true. Elimination of these impossible links results in the following final graph:

Our final directed acyclic graph

This is the final result of our blockchain analysis. It has no more uncertain links. We can clearly see all funds were shuffled exactly once. There were also some manual transactions before and during the shuffling, using funds that had not yet started shuffling.

Conclusion

The last point before the donation where all funds were together, was transaction e305977f93e5688ca5bb550407e03cc1e2a876400fe1957c4323593cf58421b0, output 0. This output contained 1,000 BCH in total. It belongs to single-use address qpfxjqkvsj9svzysdgg0pp3yjchydxh6zyvm4n4lw7. It is preceded by a series of single-input transactions using single-use addresses. This continues until our final origin transaction 73fa426d0677915c63415f3726ca7af183d258ecb94578f4feb4902ae469d11e, all of whose inputs originate from address pq9j5y0v8ssxdnzu24kxnaddxffy8kzeuutwp2cprn. This address is currently number 22 on the Bitcoin Cash rich list, holding about 51,786 BCH, or 0.3% of the circulating supply.

The (direct or indirect) origin of the Bitcoin Unlimited donation is hereby identified as Hodler XXII. Mission accomplished.

To get better CashShuffle privacy yourself, you can avoid combining inputs into a single transaction. In this case, you could make 30 separate donations with one input, instead of one large donation with 30 inputs. When depositing into exchanges, you can similarly make separate deposits instead of one large deposit, as long as you make sure to generate a new receiving address for each deposit. For a long-term user-friendly solution, we will likely need some extension of the CashShuffle concept to allow private consolidation of multiple inputs, for example CashFusion.

We thank the Blockchair API for providing the transaction data for this research.

Links

https://www.bitcoin.com/fundraise/

https://old.reddit.com/r/btc/comments/c1ncgt/somemassivedonationsofbchjustcameinto/erec3h4/

https://blockchair.com/bitcoin-cash/transaction/a0f28991dfe01bf7b56e38ee6c61b49f593c2f3aa4b548379276a56bc1c95740

https://cashshuffle.com/

https://explorer.viabtc.com/bch/address/pq9j5y0v8ssxdnzu24kxnaddxffy8kzeuutwp2cprn?page=1

https://github.com/cashshuffle/spec/blob/master/CASHFUSION.md

Responses


RE: Blockchain Analysis Versus CashShuffle: Tracking the 300 BCH donation to Bitcoin Unlimited

by @Ouijamonopoly

Great


RE: Blockchain Analysis Versus CashShuffle: Tracking the 300 BCH donation to Bitcoin Unlimited

by @im_uname

Great analysis! It would also be useful to analyze the probability of the one-origin hypothesis versus the three-origin hypothesis (the existence of the latter makes the former less certain, so it'll be interesting to know what kind of probability can be assigned to each).


RE: RE: Blockchain Analysis Versus CashShuffle: Tracking the 300 BCH donation to Bitcoin Unlimited

by @BigBlockIfTrue

The method currently relies on a manual plausibility assessment of any generated scenario (i.e. by inspecting the DAG). The model does not compute p-values for hypothesis testing, it only computes probabilities of origins conditional on the exclusion of other origins. Also you want to keep an eye on things not (yet) considered by the model, such as addresses and other transaction types (e.g. I decided to add the processing of single-input transactions to discover more links between origins). Surely this can be further developed and extended with more traditional blockchain analysis techniques, I just put this together in a week.