11/03/2011

Tokenization IS Encryption – NOT! – Part 2

This is the second of a three-part series written by Steve Sommers, Shift4’s SVP of Applications Development. The first installment can be found here; the final installment will be published later in the week.

PCI SSC Tokenization May or May Not Be Encryption – Consult Your QSA
In late 2005, Shift4 released tokenization to the public domain. Over the years, tokenization steadily gained in popularity, and by 2010 it was all the rage: every vendor needed to fill that checkbox on their datasheet. The problem was that while many vendors proudly checked the tokenization checkbox, they did not understand what tokenization was. As a result, many Tokenization In Name Only (TINO) solutions were born. In late 2010, the PCI SSC finally decided to recognize tokenization as a possible way to secure CHD, so it embarked on creating guidelines to clear the waters. In mid-2011, the PCI SSC’s Tokenization Guidelines document was born, lending undeserved credibility to TINO solutions everywhere.

PCI SSC Tokenization and Krikken’s Post
Now, the hard part. I am a strong advocate for TrueTokenization (which we formerly referred to simply as tokenization, until the PCI SSC bastardized the term). But here I will go through Krikken’s post point by point and evaluate each point against the PCI SSC Tokenization Guidelines. With this, you can compare TrueTokenization with PCI SSC tokenization. Part of me feels like I am slaughtering my own baby by doing this analysis, but people need to know…

“[encryption-based tokenization is] where the token is mathematically derived from the original PAN through the use of an encryption algorithm and cryptographic key”

“Where token generation is based on the use of cryptographic keys, compromise of the keys could result in the compromise of all current and future tokens generated with those keys. “

Unfortunately, this is very true. PCI SSC allows for encrypted data to be used as tokens.

If the tokenization server is a black box, an external observer cannot tell the difference between random and encrypted tokens; strong encryption exactly creates cipher text that is indistinguishable from random.

Now, I understand the meaning of this statement. Since a token made up of random data and a token made up of encrypted or hashed data can look the same, the merchant (or the QSA, for that matter) does not know how dangerous the token really is. The two types represent very different risks: the former carries virtually none, while the latter carries a risk of unauthorized decryption and therefore must be protected. With my PCI hat on, all fruit does appear to be oranges.
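To illustrate why the two look alike from the outside, here is a minimal sketch (purely hypothetical, built only on Python’s standard library, and not based on Shift4’s implementation or anything in the Guidelines): a token derived from the PAN with a keyed hash and a token drawn from a random source are both just opaque strings of the same shape.

```python
# Hypothetical sketch: a derived token and a random token are visually identical,
# so an observer cannot judge the risk by looking at the token itself.
import hmac
import hashlib
import secrets

SECRET_KEY = b"hypothetical-vault-key"   # assumption: a key held inside the "black box"
PAN = "4111111111111111"                 # standard test card number, not a real account

# Mathematically derived token: anyone holding SECRET_KEY can re-derive or
# correlate it, so it carries residual risk and must be protected.
derived_token = hmac.new(SECRET_KEY, PAN.encode(), hashlib.sha256).hexdigest()[:24]

# Random token: generated independently of the PAN, so the token itself
# reveals nothing and there is nothing to "decrypt."
random_token = secrets.token_hex(12)

print(derived_token)  # 24 hex characters that look random
print(random_token)   # 24 hex characters that look random
```

The difference is invisible in the strings themselves; it lives entirely in how the token was produced, which is exactly why the risk has to be judged by analyzing the tokenization method, not the token.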

If the tokenization server is a black box, an attacker must compromise it to gain access – randomization or not – and will gain similar access to keys or token tables.

I still agree with this statement, provided that the “black box” has merchant-accessible de-tokenization capabilities.

If the critical factor in limiting the use of encryption is to prevent compromise of future tokens, the black box model allows key rotation to limit future token compromise.

Still confused by this one, sorry.

Even when randomization is used, pre-generation of token pairs [for performance] may result in the compromise of future tokens.

Since this point specifically mentions randomization, my prior questions stand: how and why? Random tokens, by definition, are not mathematically related to the PAN, so there is no risk of decryption. Again, the same example: if I told you that the next three tokens I assign are going to be 4, 5, and 6, how does that expose the next three card numbers you give me to tokenize?

The only possible weakness here is the same one I previously mentioned: a merchant-accessible de-tokenization layer.
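To make the 4-5-6 example concrete, here is a toy sketch (hypothetical and deliberately simplified; a real vault keeps this mapping locked inside the “black box”): tokens are pre-generated before any card is ever seen, so knowing them in advance reveals nothing about the PANs they will eventually map to. The only sensitive path is the de-tokenization lookup itself.

```python
# Toy, in-memory illustration of pre-generated random tokens.
# Hypothetical only: a real vault never exposes its token-to-PAN mapping.
import secrets


class ToyTokenVault:
    def __init__(self):
        self._token_to_pan = {}          # persistent mapping: token -> PAN
        # Pre-generate a pool of tokens before any card is ever seen; the
        # values have no mathematical relationship to any future PAN.
        self._pool = [secrets.token_hex(8) for _ in range(3)]

    def tokenize(self, pan: str) -> str:
        token = self._pool.pop(0) if self._pool else secrets.token_hex(8)
        self._token_to_pan[token] = pan  # the association exists only inside the vault
        return token

    def detokenize(self, token: str) -> str:
        # This lookup is the only path back to the PAN, which is why
        # merchant-accessible de-tokenization is the real risk to control.
        return self._token_to_pan[token]


vault = ToyTokenVault()
print(vault._pool)                   # an attacker could know these three "future" tokens...
t = vault.tokenize("4111111111111111")
print(t, "->", vault.detokenize(t))  # ...yet learn nothing about the PANs they later map to
```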

The life time of a credit card number is short enough to account for dealing with breakthroughs in cryptanalysis.

Again, still confused. The price of tea in China has not changed.

Some so-called non-encryption-based token algorithms [because of the strict way in which the guidance defines what is encryption] have plenty of vulnerabilities to put them on par with encryption-based tokens.

If the token is mathematically derived from the PAN or CHD, very true. And back to the “external observer” point that confused me earlier: since you don’t know for certain whether the token is encrypted data (possibly poorly encrypted, based on this point), a TrueToken, or anything in between, you have to closely analyze the token algorithm (or your QSA will need to), or else protect the data as if it were encrypted CHD.

Even random tokens have problems if they’re not designed right.

Back to random tokens. Assuming the token is not mathematically derived from the PAN, random tokens are just as strong as TrueTokens, with the only caveat being whether or not there are merchant-accessible de-tokenization capabilities.
And, most importantly:

Any tokenization creates a persistent mapping between tokens and originals. Even in the form of a token database, this is at the core a form of encryption.

The false argument still applies: all oranges are fruit, therefore all fruit must be oranges. A token database is indeed a persistent mapping, but a lookup table is not a cryptographic transformation of the PAN.

In the final post of this series, I’ll explain how the PCI SSC’s version of tokenization fell so far from the tree, and I’ll point out some of the consequences we may face if we let it become any sort of industry standard. Again, your comments are welcome on what we’ve covered thus far.