April 28, 2011

Executive Insight: Resilience + Redundancy = Reliability

Companies that process payment transactions talk about “five nines,” or 99.999% reliability – but none truly deliver it. In reality, a really good processing company delivers around 99.8%. That seems pretty good, too. But is it? It means that in a given year, they average roughly an hour and a half of service disruption each month.

Unfortunately, to “achieve” these uptimes, most processors fudge the numbers. They use the phrase “net of scheduled maintenance” when describing their uptime, which means only unplanned downtime counts against their total, even though their scheduled maintenance still disrupts service.

Being up 100% of the time on a 24-hour basis means being up 8,760 hours a year. So 99.8% means being up around 8,742 hours, or down about 18 hours a year. “Net of scheduled maintenance” doesn’t change things that much; the claimed figure still means being down approximately 18 hours a year, or 1.5 hours each month.

Now, let’s get rid of the “net of scheduled maintenance” and look at the numbers again. 8,760 hours (up 24/7/365), minus 18 hours (99.8% up), minus 12 hours (scheduled maintenance) equals a real-world uptime of 8,730 hours, which is actually about 99.66%.

Even if they had no unplanned downtime at all, once “net of scheduled maintenance” is removed, those 12 hours of maintenance alone mean the best they can do is 99.86%.
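For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above (8,760 hours in a year, 18 hours of unplanned downtime, 12 hours of scheduled maintenance); the function and variable names are ours, chosen for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def uptime_percent(downtime_hours: float) -> float:
    """Return uptime as a percentage of a full year."""
    return 100.0 * (HOURS_PER_YEAR - downtime_hours) / HOURS_PER_YEAR


# Figures taken from the text above.
unplanned = 18    # hours/year of unplanned downtime (99.8%, rounded)
maintenance = 12  # hours/year of scheduled maintenance

real_world = uptime_percent(unplanned + maintenance)  # both kinds of downtime
best_case = uptime_percent(maintenance)               # maintenance only

print(f"real-world uptime: {real_world:.2f}%")
print(f"best case:         {best_case:.2f}%")
```

With these inputs the real-world figure comes out at about 99.66% when rounded (99.65% if truncated), and the best case at 99.86%.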

The point is, when you are comparing actual uptimes, scheduled maintenance is a significant consideration. Why is this important? It’s important because Shift4 maintains all of our systems at the highest industry and security levels, and we do it without any scheduled maintenance downtime. So, when we say we are up 99.98% of the time, we really are. And when we say that we have the best uptime in the payments industry, we mean just that – and the numbers prove it.

In the interest of full disclosure, just because we are up 99.98% of the time doesn’t mean that you will be able to authorize payments 99.98% of the time. If the processor that you or your bank has chosen is one of those 99.6% processors, that’s the best you can expect. Your own Internet Service Provider can also affect your uptime.
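One rough way to see why the weakest link sets the ceiling: when every component in the chain must be up at the same moment, their availabilities multiply. The sketch below assumes failures are independent, which is a simplification, and the ISP figure is a made-up number for illustration only.

```python
from math import prod


def chain_availability(*links: float) -> float:
    """Availability of components in series: every link must be up at once.

    Assumes the links fail independently of each other (a simplification).
    """
    return prod(links)


gateway = 0.9998   # the 99.98% gateway figure quoted above
processor = 0.996  # one of "those 99.6% processors"
isp = 0.999        # hypothetical merchant ISP figure, for illustration only

end_to_end = chain_availability(gateway, processor, isp)
print(f"end-to-end availability: {end_to_end:.2%}")
```

Whatever numbers you plug in, the result can never exceed the lowest figure in the chain.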

How is Shift4 able to maintain this level of uptime? Architecture, architecture, architecture! Our data centers are designed so that each function of the system is redundant. Some functions have as much as 12-fold redundancy and are load balanced. Our SQL servers are connected to a RAID 10 Storage Area Network (SAN), and all data on the SAN is replicated to a matching SAN in a data center at a different location.

Unlike most gateways, we have redundant connections to each processor. If a processor has multiple geographically dispersed data centers, Shift4 connects to each one. Because of this, you will have better uptime for a particular processor with Shift4 than you could have with a direct connection. Each Shift4 data center utilizes multiple networks so that no single hub, router, or firewall could bring the system down.

Because Internet connectivity is of paramount importance, each of Shift4’s data centers has four different connections to the Internet. Unlike other gateways and processors that employ a single, very large pipe to the Internet, Shift4 believes that several large pipes can guarantee better uptime and performance.
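The reasoning behind several pipes can be sketched with a simple formula: if any one of n connections is enough to stay online, service is lost only when all n fail together. The example assumes the connections fail independently, which real networks only approximate (shared conduits or a common upstream carrier would weaken it), and the 99% per-link figure is purely hypothetical.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent links where any one suffices."""
    return 1.0 - (1.0 - single) ** copies


# Hypothetical per-link availability of 99%, for illustration only.
for n in (1, 2, 4):
    print(f"{n} connection(s): {redundant_availability(0.99, n):.6%}")
```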

Performance is further improved by Shift4’s adaptive routing technology. We “score” routes from your Internet provider to our four providers based on speed and reliability and send back the optimal route for you to use.
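To give a feel for what scoring a route might look like, here is a small sketch. It is not Shift4’s actual algorithm: the `Route` fields, the scoring formula, and the provider labels are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    provider: str        # hypothetical label for one Internet connection
    latency_ms: float    # measured round-trip time, must be > 0
    success_rate: float  # fraction of recent probes that succeeded (0..1)


def score(route: Route) -> float:
    """Higher is better: reward reliability, penalise latency (assumed formula)."""
    return route.success_rate / route.latency_ms


def best_route(routes: list[Route]) -> Route:
    """Return the highest-scoring route, skipping any that are unreachable."""
    usable = [r for r in routes if r.success_rate > 0]
    if not usable:
        raise RuntimeError("no usable route")
    return max(usable, key=score)


candidates = [
    Route("provider-a", latency_ms=40.0, success_rate=0.99),
    Route("provider-b", latency_ms=20.0, success_rate=0.0),  # currently down
    Route("provider-c", latency_ms=25.0, success_rate=0.98),
]
print(best_route(candidates).provider)
```

In this made-up data set, the fastest route is down, so the choice falls to the next route that balances speed against reliability.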

Of course, Shift4 also utilizes multiple levels of hardware and software firewalls to ensure the security of all transactions. Uninterruptible Power Supplies (UPS) and generators back up building power, while redundant HVAC systems maintain an operational environment, thus ensuring system availability of “five nines.”

What does it mean for a system to be resilient?
System resiliency means that the system can automatically adjust to the external environment without any interruption in service. For us, that means if a processor is having trouble or goes down in one geographical location we can connect to the processor’s alternative connection in another state. Further, if the processor’s local telecommunication company has trouble with one of our processing centers, we can automatically route the traffic through our alternative data center with a different telecommunications provider.

Resilience also means that if hardware supporting a particular function malfunctions, it can be fixed without an interruption in service. If operating systems or Shift4 software need to be upgraded, that can happen without interruption. It also means that if one of our Internet providers cannot provide connectivity to us, our system will switch to an alternative connection. What’s even better is that our system tells the system at your location not to bother with the problem provider, but to simply move to another.

A recent alert to our customers told you that you need to make sure that your firewalls have outbound rules that allow connectivity to spans of IP addresses. If you lock yourself down to only one IP address, when DOLLARS ON THE NET® sends you an alternative path, you will not be able to take it. Effectively, this negates the resiliency of your own operation. Some folks believe they should lock things to a single IP because security and PCI require it. As member number one of the PCI council, we can assure you that this is not the case. PCI only requires that you know to whom you are connecting. (Note that we are talking about “outbound rules.”)
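The difference between the two kinds of outbound rule can be sketched with Python’s standard `ipaddress` module. The address ranges below are reserved documentation ranges used as placeholders; they are not Shift4’s real addresses, and the rule lists and function name are hypothetical.

```python
import ipaddress

# Placeholder documentation ranges (RFC 5737), NOT real gateway addresses.
ALLOWED_SPANS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
SINGLE_IP_ONLY = [ipaddress.ip_network("203.0.113.10/32")]


def outbound_allowed(destination: str, rules) -> bool:
    """Return True if any outbound rule covers the destination address."""
    addr = ipaddress.ip_address(destination)
    return any(addr in network for network in rules)


primary = "203.0.113.10"     # hypothetical usual path
alternate = "198.51.100.25"  # hypothetical alternative path sent on failover

print(outbound_allowed(alternate, SINGLE_IP_ONLY))  # False: failover is blocked
print(outbound_allowed(alternate, ALLOWED_SPANS))   # True: failover can proceed
```

A firewall locked to the single address still reaches the usual path, but it rejects the alternative one, which is exactly the situation the alert warns about.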

Follow Shift4’s prescribed procedures for installation and setup and you will enjoy payment processing with the best uptime in the industry.