Blocks in a blockchain are generally made up of a header and a set of validated transactions that spend coin or otherwise change the state of the system. Relaying blocks across the peer-to-peer (P2P) network with minimal bandwidth and latency has a number of advantages for the operation of any cryptocurrency. Blocks that can be relayed using less bandwidth propagate more quickly, which improves synchronization among peers and reduces forks in the chain. Using less bandwidth to relay a block also allows greater participation by peers behind limited-bandwidth links and routes. Finally, an efficient mechanism for relaying blocks can allow the maximum block size to increase, sustaining a larger number of transactions per second overall.
As a base comparison, consider that one option for relaying a block is to send the first 8 bytes of each transaction ID. If our block contains n=2000 transactions, then the total cost is 8*2000 = 16,000 bytes. Let's see if we can do better.
Bloom filters are an incredibly useful probabilistic data structure for determining whether items are members of a given set. In this case, our set is all transactions in the sender's block (actually, the set of transaction IDs), and the items are the transaction IDs in the receiver's mempool. A Bloom filter has two special characteristics. First, it has no false negatives. That is, if a Bloom filter tells us that a transaction ID is not in the set, then it definitely is not in the set. Second, a Bloom filter does have false positives. That is, if a Bloom filter tells us that a transaction ID is in the set, then it might not be in the set. Bloom filters are useful because we can set the false positive rate (FPR) to whatever we like. There is an important trade-off, though: if the FPR is low, and the Bloom filter is therefore accurate, then it will also be larger in terms of bytes. If we don't mind some wrong answers about which transaction IDs are in the block, then the Bloom filter will be smaller.
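To make the idea concrete, here is a minimal Bloom filter sketch in Python. The class name, hashing scheme, and parameter formulas are illustrative assumptions for exposition, not Graphene's implementation:

```python
import hashlib
import math

class BloomFilter:
    """Toy Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, n_items, fpr):
        # Standard sizing: bits = -n*ln(f)/ln^2(2), hashes = (bits/n)*ln(2)
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions deterministically from the item
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means definitely absent; True means present with
        # probability roughly 1 - fpr
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A sender would insert the block's transaction IDs and ship the `bits` array; the receiver then queries each mempool transaction ID against it.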
What if we were to relay blocks using only Bloom filters? For example, we could set the FPR of the Bloom filter to f=1/m. In that case, when the receiver checks each of the m transaction IDs in her mempool, we can expect that the Bloom filter will wrongly state that f*m=(1/m)*m=1 transaction is in the block on average. To make matters worse, we won't know which transaction is the wrong one; as a result of the extra transaction, the Merkle root won't validate. We can try to fix this problem by lowering the FPR of the filter. For example, if we set the FPR to f=1/(144m), then we can expect that the filter will permit a wrong answer only about once every 144 blocks relayed (i.e., only about once a day). But keep in mind, this accuracy will cost us in bytes. The size of a Bloom filter with n items inserted and a false positive rate of f is well known to be -n*ln(f)/(8*ln^2(2)) bytes; with f=1/(144m), that is n*ln(144m)/(8*ln^2(2)) bytes. For example, for a block with n=2000 transactions and a mempool of m=6000 transactions total, the Bloom filter will be about 7,113 bytes.
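The sizing formula can be checked directly (a quick sketch; the function name is ours):

```python
import math

def bloom_filter_bytes(n, fpr):
    """Bytes needed for a Bloom filter holding n items at a given FPR:
    -n * ln(f) / (8 * ln^2(2))."""
    return -n * math.log(fpr) / (8 * math.log(2) ** 2)

n, m = 2000, 6000
size = bloom_filter_bytes(n, fpr=1 / (144 * m))  # about 7,113 bytes
```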
Invertible Bloom Lookup Tables (IBLTs)
IBLTs are another useful probabilistic data structure. They are designed to allow us to discover the symmetric difference between two sets of items. For example, we can create an IBLT of all transaction IDs that are in the sender's block, and then create another IBLT of the transaction IDs in the receiver's mempool. Subtracting the first IBLT from the second tells us exactly which transactions in the mempool are not in the block. Given this functionality, it's tempting to use IBLTs alone to relay the block from sender to receiver, but unfortunately that is not an efficient approach. The size in bytes of an IBLT increases linearly with the size of the symmetric difference recovered from it: it uses about 12 bytes per transaction ID in the symmetric difference, plus an overhead of roughly 40% in extra cells. And so if the mempool is 2000 transactions larger than the block (i.e., the symmetric difference is 2000), then the sender's IBLT will be about (1.4*2000)*12 = 33,600 bytes. Not a great choice.
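To make the subtract-and-recover mechanics concrete, here is a toy IBLT in Python. The cell layout, hash choices, and the K=3 constant are illustrative assumptions, not the production data structure:

```python
import hashlib

K = 3  # cells each item maps to (a common choice)

def _cells(item, m):
    # Deterministic cell indices for a transaction id
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8],
                           "big") % m for i in range(K)]

def _check(item):
    # Per-item checksum, used to recognize "pure" cells during decoding
    return int.from_bytes(hashlib.sha256(f"c:{item}".encode()).digest()[:8], "big")

class IBLT:
    def __init__(self, m):
        self.m = m
        self.count = [0] * m
        self.key_sum = [0] * m    # XOR of ids hashed into each cell
        self.check_sum = [0] * m  # XOR of their checksums

    def insert(self, item):
        for c in _cells(item, self.m):
            self.count[c] += 1
            self.key_sum[c] ^= item
            self.check_sum[c] ^= _check(item)

    def subtract(self, other):
        # Cell-wise difference; shared items cancel out entirely
        out = IBLT(self.m)
        for c in range(self.m):
            out.count[c] = self.count[c] - other.count[c]
            out.key_sum[c] = self.key_sum[c] ^ other.key_sum[c]
            out.check_sum[c] = self.check_sum[c] ^ other.check_sum[c]
        return out

    def decode(self):
        """Peel pure cells (count +/-1 with matching checksum) until done.
        Returns (items only in self, items only in other); consumes the table."""
        ours, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(self.m):
                if self.count[c] in (1, -1) and \
                        self.check_sum[c] == _check(self.key_sum[c]):
                    item, sign = self.key_sum[c], self.count[c]
                    (ours if sign == 1 else theirs).add(item)
                    for d in _cells(item, self.m):
                        self.count[d] -= sign
                        self.key_sum[d] ^= item
                        self.check_sum[d] ^= _check(item)
                    progress = True
        return ours, theirs
```

Decoding succeeds with high probability when the table has comfortably more cells than the symmetric difference; a real deployment sizes the table for a target failure rate.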
Graphene's solution is to combine both data structures. First, we pass all transactions in the receiver's mempool through a Bloom filter of the sender's block; however, we allow a good number of false positives, which keeps the Bloom filter small. We then clean up any mistakes made by the Bloom filter with an IBLT, also sent by the sender. The symmetric difference is now quite small: it's equal to the number of false positives produced by the Bloom filter. There is a trade-off: we can make the Bloom filter larger (more accurate), which results in a smaller IBLT; or we can make the IBLT larger (able to correct more mistakes), which results in a smaller Bloom filter. Graphene picks the parameters of both data structures together so that their summed size is minimal. For example, for n=2000 and m=6000, the sender computes that an IBLT that can recover a=27 items and a Bloom filter of n items with an FPR of f=0.00675 is minimal. In our test implementation, this results in 3,316 bytes in total, based on a 715-byte IBLT and a 2,601-byte Bloom filter, which is about 1/5 the size of sending 8 bytes per transaction. If a canonical transaction order is not defined, an expression of the transaction ordering must also be sent, which increases the total by 2,750 bytes to 6,066 bytes (i.e., 38% of the cost of sending 8 bytes per transaction). The IBLT will fail to decode about once every 240 blocks.
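The joint parameter choice can be sketched as a one-dimensional search: for each candidate IBLT capacity a, the Bloom filter needs an FPR of f = a/(m-n) so that it produces about a false positives, and we pick the a that minimizes the summed size. The per-item IBLT cost below is an assumed constant, so the resulting numbers differ somewhat from our implementation's:

```python
import math

def graphene_sizes(n, m, iblt_bytes_per_item=24):
    """Search for the IBLT capacity a minimizing total bytes sent:
    a Bloom filter with FPR f = a/(m-n) plus an IBLT recovering a items.
    iblt_bytes_per_item is an assumed, implementation-dependent constant."""
    best = None
    for a in range(1, m - n + 1):
        f = a / (m - n)                       # expected false positives = a
        bloom = -n * math.log(f) / (8 * math.log(2) ** 2)
        iblt = iblt_bytes_per_item * a
        if best is None or bloom + iblt < best[0]:
            best = (bloom + iblt, a, f)
    return best

total, a, f = graphene_sizes(2000, 6000)      # total well under 16,000 bytes
```

Enlarging the IBLT term raises f and shrinks the filter, and vice versa; the brute-force loop simply walks that trade-off curve.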
Graphene maintains this size advantage as block size grows. For example, for a block of n=10,000 transactions, listing 8 bytes of each transaction ID would cost 80,000 bytes. With a mempool of m=30,000 transactions, Graphene's cost is 14,482 bytes (or 31,091 bytes without a canonical transaction ordering).
Comparison to related approaches
A simple comparison of Graphene to related work is as follows. A block holds n=2000 transactions, which the receiver holds in its mempool along with 4000 other transactions; in other words, m=2000+4000=6000.
Using Graphene, the sender sends an INV, and the receiver responds with a GETDATA and the value of m. The sender computes that an IBLT that can recover a=27 items and a Bloom filter of n items with an FPR of f=a/(m-n)=27/(6000-2000)=0.00675 is minimal. In our test implementation, this results in 3,316 bytes in total, based on a 715-byte IBLT and a 2,601-byte Bloom filter. Without a canonical transaction order, an expression of the transaction ordering must also be sent, increasing the total by 2,750 bytes to 6,066 bytes. (The receiver's IBLT is not sent over the network.)
Tschipper's XTreme Thin Blocks has the receiver start by sending a 3,956-byte Bloom filter of its mempool with an FPR of f=1/2000=0.0005; the sender then responds with 8 bytes for each of the n=2000 transactions in the block. The total is therefore 3,956 + 8*2000 = 19,956 bytes.
Corallo's Compact Blocks sends just 8 bytes for each of the n=2000 transactions, for a total of 8*2000 = 16,000 bytes.
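The arithmetic in this comparison can be reproduced with the same Bloom filter sizing formula used earlier (a rough check; the 3,956 figure quoted above differs from the formula's output only by rounding):

```python
import math

def bloom_filter_bytes(n, fpr):
    # Standard sizing: -n * ln(f) / (8 * ln^2(2)) bytes
    return -n * math.log(fpr) / (8 * math.log(2) ** 2)

n = 2000
compact_blocks = 8 * n                           # 16,000 bytes
xthin = bloom_filter_bytes(n, 1 / n) + 8 * n     # about 19,955 bytes
graphene = 3316                                  # measured figure from the text
```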
Graphene grows linearly with block size, but very slowly compared to the other two methods. As the example shows, Graphene would benefit significantly from a standardized, canonical ordering of transactions in blocks.
Are 8 bytes sufficient to avoid collisions?
An 8-byte (64-bit) transaction ID results in a very low probability of collision. Let b be the size in bits of the shortened transaction IDs. The probability of collision in this "Birthday Attack" scenario is well-known to be approximated by 1-exp(-m(m-1)/2**(b+1)). For example, for a mempool of m=10,000,000 transactions, the probability of collision using b=64 bits is approximately 0.00000271050.
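The approximation is easy to evaluate (a quick sketch; the function name is ours):

```python
import math

def collision_probability(m, b):
    """Birthday approximation: chance that any two of m ids collide
    when ids are truncated to b bits, 1 - exp(-m*(m-1)/2**(b+1))."""
    return 1 - math.exp(-m * (m - 1) / 2 ** (b + 1))

p = collision_probability(10_000_000, 64)   # about 0.0000027
```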