11/01/2011

Tokenization IS Encryption – NOT!

This is the first of a three-part series written by Steve Sommers, Shift4’s SVP of Applications Development. Additional sections will be published later in the week.
There is still much confusion about tokenization. Recently, I found a blog post by Ramon Krikken entitled “I’ll go ahead and say it: Tokenization IS Encryption.” The author also presented a related slide show at RSA Conference 2011 – DAS-303 – Towards Secure Tokenization Algorithms and Architectures.pdf. When I read the post and viewed the slide show, the first thing that came to mind was that this guy does not have a clue what tokenization is. But, over the years, I have learned that it is often better to sit back and think about a topic before ripping into someone. So, I did.

In this case, my initial judgment of cluelessness was based on my intimate knowledge of the original definition of tokenization – not the abomination created by PCI SSC’s redefinition, but the original, true definition of tokenization laid out by Shift4 when we released the technology back in 2005. Today, having lost control of the term “tokenization” to the public domain, we refer to this original definition as TrueTokenization®. TrueTokenization is a registered trademark of Shift4 Corporation; we did this not to control the technology, but to maintain control of the term and make sure anyone using it upholds the original “true” definition. Hindsight being 20/20, we should have done this back in 2005 to avoid the entire redefinition issue. Live and learn!

So, now having taken a few days to think, I’ve decided to present two arguments here. The first, presented in this post, is based on TrueTokenization, the original definition of tokenization. The second, to be posted later in the week, is based on the PCI SSC definition of tokenization. I think that you will see the stark differences between the two.

It is important to note that both TrueTokenization and PCI SSC tokenization allow for merchant-accessible de-tokenization capabilities. Solutions containing de-tokenization capabilities have a much greater risk profile; in other words, there are more places for a potential compromise. Shift4’s DOLLARS ON THE NET® TrueTokenization does NOT require, nor does it have, merchant-accessible de-tokenization capabilities. De-tokenization is all but a requirement for in-house solutions, which is one of the reasons Shift4 recommends outsourcing to reduce scope and risk.

TrueTokenization IS NOT Encryption
Referencing the 2008 Tokenization in Depth document released by Shift4, the definition of tokenization is: “to•ken•i•za•tion [toh-kuh-nahy-zey-shuhn] – the concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In PCI context, tokens are used to reference cardholder data that is stored in a separate database, application, or offsite secure facility.”

While the definition simply states “non-decryptable,” further articles and various posts from Shift4 specifically state that true tokens are not mathematically derived from the Primary Account Number (PAN) and/or Cardholder Data (CHD) that they are protecting.
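To make “not mathematically derived” concrete, here is a minimal sketch of the idea in Python. It is my simplified illustration, not Shift4’s implementation: the token is just a randomly generated reference, and the only link between token and PAN is a row in a vault that never leaves the protected environment.

```python
import secrets

# Simplified, hypothetical token vault -- an illustration, not Shift4's code.
# The token is generated independently of the PAN; the only connection
# between the two is this lookup table inside the protected environment.
_vault = {}

def tokenize(pan: str) -> str:
    """Issue a token for a PAN. The token carries no information about the PAN."""
    token = secrets.token_hex(8)  # a random reference; it could as easily be sequential
    _vault[token] = pan           # this mapping is the *only* link back to the PAN
    return token

token = tokenize("4111111111111111")
print(token)  # e.g. 'f3a9c2b17d04e8aa' -- no algorithm or key can reverse this
```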

Now let’s look at the definition of encryption as found on Reference.com: “Process of disguising information as ‘ciphertext,’ or data that will be unintelligible to an unauthorized person…” While this portion of the definition seems to support Krikken’s argument that tokenization is encryption, there is more to the definition: “In cryptography, encryption is the process of transforming information (referred to as plaintext) using an algorithm (called cipher) to make it unreadable to anyone except those possessing special knowledge usually referred to as a key. The result of the process is encrypted information (in cryptography, referred to as ciphertext). In many contexts, the word encryption also implicitly refers to the reverse process, decryption (e.g. “software for encryption” can typically also perform decryption), to make the encrypted information readable again (i.e., to make it unencrypted) (see also cryptography).”
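That last sentence is the crux: encryption implies a reverse process unlocked by “special knowledge,” i.e., a key. Here is a short sketch of the contrast, using the third-party cryptography package for the encryption side (my example, not one from Krikken’s post):

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Encryption: the ciphertext is mathematically derived from the plaintext,
# and anyone holding the key can run the reverse process (decryption).
key = Fernet.generate_key()
cipher = Fernet(key)
ciphertext = cipher.encrypt(b"4111111111111111")
assert cipher.decrypt(ciphertext) == b"4111111111111111"  # the key makes it reversible

# A true token has no reverse process. There is no key to steal or break;
# recovering the PAN requires access to the vault's lookup table itself
# (see the tokenize() sketch above).
```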

TrueTokenization and Krikken’s Post
Here, I will go through Krikken’s post point-by-point and counter with arguments based on TrueTokenization. I will provide a similar comparison based on PCI SSC Tokenization in the coming days.

[encryption-based tokenization is] where the token is mathematically derived from the original PAN through the use of an encryption algorithm and cryptographic key.

Where token generation is based on the use of cryptographic keys, compromise of the keys could result in the compromise of all current and future tokens generated with those keys.

Countering these first two points is easy. TrueTokens are NOT mathematically derived from the original PAN. Period.

If the tokenization server is a black box, an external observer cannot tell the difference between random and encrypted tokens; strong encryption exactly creates cipher text that is indistinguishable from random.

I’m not sure of the point of this statement. Just because encrypted data can look like random data, does that mean all random data is encrypted? By this logic, all oranges are fruit; therefore, all fruit are oranges. By the same logic, this post is an encrypted version of Krikken’s post. Just because something looks like a duck does not mean it must walk and talk like one as well. Unless, of course, it’s a duck.

If the tokenization server is a black box, an attacker must compromise it to gain access – randomization or not – and will gain similar access to keys or token tables.

I guess I can agree with this, assuming that when he says “black box” he is talking about an in-house merchant solution that requires both tokenization and de-tokenization capabilities. With Shift4’s tokenization, DOLLARS ON THE NET is both the outsourced tokenization “black box” and the payment gateway. There is no reason for merchant-side de-tokenization, there is no capability to perform it, and therefore there are no credentials to steal that would allow it. Once the payment information is tokenized, it is never de-tokenized from the merchant’s location; the sketch below illustrates the point.
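To illustrate (with an entirely hypothetical client interface I made up for this post, not Shift4’s actual API): the merchant side sends cardholder data in exactly once and gets a token back, every later operation references the token, and no de-tokenization call exists to be abused.

```python
class GatewayClient:
    """Hypothetical merchant-side client for an outsourced tokenizing gateway.

    This is an illustration only, not DOLLARS ON THE NET's real interface.
    Note what is absent: there is no detokenize() method. Stolen merchant
    credentials could be misused to run transactions against existing tokens,
    but they could never pull a PAN back out of the gateway.
    """

    def tokenize(self, pan: str, expiry: str) -> str:
        """Send the PAN to the gateway once; the merchant stores only the token."""
        ...

    def authorize(self, token: str, amount_cents: int) -> bool:
        """Every subsequent transaction references the token, never the PAN."""
        ...
```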

If the critical factor in limiting the use of encryption is to prevent compromise of future tokens, the black box model allows key rotation to limit future token compromise.

Umm, what? I guess the speaking portion of the RSA Conference session would have shed some light on this random piece of trivia.

Even when randomization is used, pre-generation of token pairs [for performance] may result in the compromise of future tokens.

How and why? The beauty of TrueTokenization is that the token is used for reference to the original PAN; it is not mathematically related to the PAN. Our original white paper, “Tokenization in Depth,” allowed for random tokens, sequential tokens, or any combination thereof. If I told you that the next three tokens I assign are going to be 4, 5, and 6, how does that expose the next three card numbers you give me to tokenize?
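Here is the 4, 5, 6 example as a sketch, reusing the hypothetical vault from above: the pre-generated tokens reveal nothing, because the pairing with a PAN is created only at tokenization time.

```python
from itertools import count

_vault = {}
_next_token = count(4)  # pre-generated and perfectly predictable: 4, 5, 6, ...

def tokenize(pan: str) -> str:
    token = str(next(_next_token))
    _vault[token] = pan  # the token/PAN pairing is created *now*, not derived
    return token

# An attacker who knows the next tokens will be '4', '5', and '6' still knows
# nothing about the card numbers they will be paired with. There is no math
# linking token to PAN, only an assignment inside the vault.
print(tokenize("4111111111111111"))  # '4'
print(tokenize("5454545454545454"))  # '5'
```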

I guess the vulnerability would arise if you had de-tokenization capabilities; in that case, being able to calculate or determine a valid list of tokens to look up PANs could be a potential weakness. From my experience, this is not a concern, since DOLLARS ON THE NET does not have a merchant-facing de-tokenization layer.

One point I would like to make here is that token predictability, even in solutions with de-tokenization capabilities, is not much of a concern. You have to remember: tokens replace PANs and CHD, and the reason many merchants and vendors select tokenization is that it’s much easier to remove the data than to protect the data. With this in mind, you must assume that the active tokens are freely available to a potential bad actor, so predictability is moot.

The lifetime of a credit card number is short enough to account for dealing with breakthroughs in cryptanalysis.

Umm, so? I guess my argument would be that most school buses are yellow. What does either have to do with the price of tea in China?

Some so-called non-encryption-based token algorithms [because of the strict way in which the guidance defines what is encryption] have plenty of vulnerabilities to put them on par with encryption-based tokens.

Again, TrueTokenization tokens are not mathematically derived from the PAN, so the strength of randomness, or lack thereof, is moot.

Even random tokens have problems if they’re not designed right.

Provided there are no merchant-accessible de-tokenization capabilities, randomness is not a requirement, nor is its absence a vulnerability.

And, most importantly:

Any tokenization creates a persistent mapping between tokens and originals. Even in the form of a token database, this is at the core a form of encryption.

This goes back to the false argument that all oranges are fruit; therefore, all fruit are oranges. A persistent mapping is a lookup table, not a cipher: there is no algorithm or key that transforms the token back into the PAN, only a database lookup inside the vault.

In my next post, I’ll take a look at Krikken’s points in comparison with the PCI SSC’s bastardized version of “tokenization.” Until then, I welcome your comments here.