We are coming up on the third anniversary of our first service provider, multi-tenant cloud product launch at Windstream. Other than Amazon and a few other major players, there weren't many people running, or even thinking about running, multi-tenant cloud environments for enterprise consumption. Even Amazon had little presence in the enterprise market beyond development shops that didn't care much about SLAs tied to their environments. Today, we have a lot of enterprise-class customers running on our cloud infrastructure and loving the benefits, along with the very high uptimes we have been able to provide.
I step back to that time just to paint a picture of how far we've come since that first launch. Back then, there were no Vblock or Flexpod reference architectures. There was no Cisco VMDC design and implementation guide. We built everything on our own, pieced each technology together, and came out with what has turned out to be a pretty solid platform serving the majority of our cloud customers. We have since standardized on a hybrid, Vblock-style architecture for our cloud pods, and we are now the proud owners of four such pods using Vblock as the foundation for compute and storage.
Since we didn’t have reference architectures from all the major vendors to go on, we had to approach each piece one at a time and test all the various details to make sure it met each requirement for the product we wanted to build and sell. One of those requirements was remote access into the virtualized environments for each of the tenants in the cloud pod.
I've written several previous posts about the selection process we went through for firewalls, but the final decision came down to the ability to provide a comprehensive remote access solution for our cloud customers. We ultimately chose the Juniper ISG platform because of its ability to terminate IPSEC site-to-site tunnels and remote access VPN sessions directly into the customer's virtual firewall environment. Cisco could not do this at the time, and Check Point became extremely cost prohibitive once you started adding in the costs of the platform to run them on (Crossbeam), the Provider-1 license to manage them all, and the Check Point licensing itself, which has notoriously been very expensive. So, we went with the Juniper and actually front-ended our second and third Vblock-based cloud pods with the Juniper ISG platform as well. It has been solid for us with the exception of the platform's limited IPSEC throughput (you basically have to disable three of the four VPN engines when in multi-tenant mode due to a bug we found that Juniper has yet to fix). This takes your theoretical maximum IPSEC throughput down to roughly 600Mbps, which has been enough for our needs so far.
Let me take a minute to explain why this one feature is so critical for us. First, we traditionally were not a telco. We were a data center and managed hosting service provider. We did not have a massive MPLS network whose customers we could sell our cloud products to. We owned data centers, and a large amount of our business came from managed services within those data centers (colo, Internet, firewalls, load balancing, servers, switches, etc.). For our cloud platform, we would be asking our customers to step into a completely virtualized infrastructure where they would never have direct access to the equipment running their environment. They would have their own virtualized firewall, their own VLANs and trust boundaries, their own VMs, and their own storage pools. However, they would never be able to physically touch any of it. To manage it, they would need remote access, and if it was an enterprise environment running their enterprise applications, they might also need secure connectivity to the cloud infrastructure so their corporate employees could use those applications securely.
So, given our background with dedicated offerings, the answer was simple: we would provide a way to terminate IPSEC VPN tunnels directly into the customer's cloud environment and provide them with remote access VPN clients.
We are now a much different organization. Windstream purchased our company and then made us the headquarters for their new data center division. They wanted to buy into some growth markets, and they liked what we had done with all of our managed hosting offerings, including cloud. A year later, they purchased Paetec, another large telco in the northeast with its own data center assets. Those data centers have also been migrated underneath the data center division of the company. However, both Windstream and Paetec serve a completely different customer than we traditionally served. Our customers were mainly local to wherever a data center happened to be located. They would come in and build their infrastructures up in the managed colo space on our data center floors, and many would have us manage that infrastructure for them. Windstream and Paetec, on the other hand, both brought tremendous MPLS network footprints with thousands of existing customers that could take advantage of our data center and managed hosting products. We could basically bring several 10Gbps circuits into our data centers, provide the connectivity to our existing cloud pods, and all of a sudden we could be facing hundreds or even thousands of those MPLS customers buying cloud and other data center services from us. Well, that is exactly what we are working on now, and it can't come soon enough for our sales teams.
Stepping forward, when you combine a data center service provider with enterprise MPLS and DSL providers, you basically have to accommodate both types of customers. You have the traditional data center customer who has no corporate network connectivity into your cloud and needs some form of remote access solution. You also have your MPLS customers who are getting their VRFs extended over to the cloud environments. Those customers may also have remote access needs, but they may already have managed remote access solutions somewhere else in their network that can provide the access they need once connectivity has been established to their MPLS VPN.
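For readers who haven't lived in this world, extending a customer's VRF into the cloud pod is conceptually simple. Here's a minimal sketch of what it might look like in IOS-XE; the VRF name, RD/RT values, AS numbers, and neighbor addresses are all invented for illustration, not our actual design.

```
! Hypothetical customer VRF carried from the MPLS core into the cloud pod
vrf definition CUST-A
 rd 65000:100
 route-target import 65000:100
 route-target export 65000:100
 address-family ipv4
 exit-address-family
!
! A VRF-aware BGP session hands the customer's routes to their cloud edge
router bgp 65000
 address-family ipv4 vrf CUST-A
  neighbor 10.100.0.2 remote-as 64601
  neighbor 10.100.0.2 activate
 exit-address-family
```

Once that session is up, the customer's corporate sites reach their cloud environment over their existing MPLS VPN, and any managed remote access solution they already run elsewhere on that VPN works without us doing anything.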
The bottom line? Remote access needs are still alive and well when building multi-tenant cloud solutions. We now have two more Vblocks under construction, and this time we are trying to adapt the Cisco Virtualized Multi-tenant Data Center (VMDC) architecture for the network stack above the Vblock. One of the largest challenges, once again, is providing for remote access. As of this writing, Cisco ASA firewalls still do not offer the capability to terminate IPSEC VPNs into a customer virtual context.
The new VMDC 2.2 design and implementation guide recently came out, and there are some updates that provide remote access connectivity solutions. We have also been working with Cisco on this problem; we tried working with them three years ago as well, but at the time no one really understood why our clients would need the kind of remote access they did. Today, I think it's still a challenge to help vendors understand these needs and why they exist.
The first solution Cisco came back with was to set a pair of ASR1Ks off to the side to handle the remote access needs (both site-to-site and remote access). We wouldn't be able to provide SSL VPN, but that was fine; we had traditionally provided IPSEC VPN clients to our customers, and we were happy to continue that practice.
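For context, terminating those IPSEC clients on an IOS-XE box follows the classic Easy VPN server model. This is just a sketch under assumptions; the group names, pool, and keys are invented, and it is not our actual configuration.

```
! Legacy Easy VPN server skeleton for IPSEC remote access clients
aaa new-model
aaa authentication login RA-USERS local
aaa authorization network RA-GROUPS local
!
crypto isakmp policy 10
 encryption aes 256
 hash sha
 authentication pre-share
 group 2
!
! Per-customer client group with the attributes pushed to the VPN clients
ip local pool CUST-A-POOL 10.255.200.1 10.255.200.50
crypto isakmp client configuration group CUST-A-RA
 key example-group-key
 pool CUST-A-POOL
!
crypto ipsec transform-set RA-TS esp-aes 256 esp-sha-hmac
crypto dynamic-map RA-DYN 10
 set transform-set RA-TS
 reverse-route
!
crypto map RA-MAP client authentication list RA-USERS
crypto map RA-MAP isakmp authorization list RA-GROUPS
crypto map RA-MAP client configuration address respond
crypto map RA-MAP 100 ipsec-isakmp dynamic RA-DYN
!
interface GigabitEthernet0/0/0
 crypto map RA-MAP
```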
Once we started looking at that solution, however, we learned that the ASR1Ks could not maintain statefulness during a failover event between the redundant pair. One of our requirements was that state must be maintained during failover so the clients could fail over seamlessly and continue working as if nothing had happened. Well, once this requirement came up, Cisco said they would go away and come back with another alternative.
We met with several of their VMDC guys last week here in Raleigh, and their other solution was to implement a second pair of ASA 5585 firewalls dedicated to handling VPN needs. This pair of firewalls would have to run in single context mode, since IPSEC is still not supported once multi-context mode is enabled. However, there is one small problem. Actually, it's a pretty big one. Since it won't be running in multi-context mode, you can't have overlapping IP space. In a multi-tenant environment, this is HUGE. We would basically have to NAT on both our end and the customer's end of each tunnel. Anyone who has supported large amounts of IPSEC VPN knows that NAT through the tunnels is one of the most problematic parts. When you add NAT on both ends, ends controlled and managed by two separate organizations, you're asking for big trouble. Plus, we just didn't want to burden our customers with this complexity. Our customers are used to not having to worry about overlapping RFC1918 IP space in their cloud environments.
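To make the pain concrete, here is roughly what just the provider-side half of that double NAT might look like in ASA 8.3+ syntax, assuming both we and the customer happen to use 10.1.1.0/24. Every object name and subnet here is invented, and the customer would have to build the mirror image of this on their own gear.

```
! Both ends use 10.1.1.0/24, so each side must present a fake subnet
! to the other. Our tenant's real 10.1.1.0/24 is shown to the customer
! as 172.16.1.0/24; the customer's real space arrives as 172.16.2.0/24.
object network TENANT-REAL
 subnet 10.1.1.0 255.255.255.0
object network TENANT-MAPPED
 subnet 172.16.1.0 255.255.255.0
object network CUSTOMER-MAPPED
 subnet 172.16.2.0 255.255.255.0
!
nat (inside,outside) source static TENANT-REAL TENANT-MAPPED destination static CUSTOMER-MAPPED CUSTOMER-MAPPED
!
! The crypto ACL must then match the *translated* subnets, which is
! exactly the kind of coordination headache you don't want across two
! organizations.
access-list VPN-INTERESTING extended permit ip 172.16.1.0 255.255.255.0 172.16.2.0 255.255.255.0
```

Every troubleshooting call now involves two NAT tables owned by two different companies, and the addresses in the packet captures never match the addresses either side actually configured on their hosts.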
So we are now back to using a pair of ASA 5585s running in multi-context mode to handle the edge firewalls for each customer, in addition to a pair of ASR1Ks for IPSEC needs. We will just have to write some specific failover language into the product guides so that customers understand that when VPN service fails over to the other chassis, state won't be maintained and their tunnels will have to re-key. If a supervisor fails in one of the ASRs, my understanding is that state is maintained within a single chassis, and frankly, that's where most of your failures are going to occur anyway.
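For reference, carving out the per-customer edge firewalls on the ASA pair happens in the system execution space and looks something like the sketch below; the context names, subinterfaces, and file names are hypothetical.

```
! System execution space on the ASA 5585 pair (mode change needs a reload)
mode multiple
!
! Each customer context gets its own interfaces, config file, and routing
! table, which is what lets tenants overlap RFC1918 space behind their edge.
context CUST-A
 allocate-interface GigabitEthernet0/0.110 cust-a-outside
 allocate-interface GigabitEthernet0/1.110 cust-a-inside
 config-url disk0:/cust-a.cfg
!
context CUST-B
 allocate-interface GigabitEthernet0/0.120 cust-b-outside
 allocate-interface GigabitEthernet0/1.120 cust-b-inside
 config-url disk0:/cust-b.cfg
```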
This also helps with our MPLS integration efforts. You need some kind of device that is VRF-aware so that you can bring your MPLS customers into your cloud environment and provide them with their own internal routing tables. In our Vblock implementations where we are deploying the new Cisco VMDC infrastructure, those functions are handled by a pair of Catalyst 6509 VSS switches. We didn't want to purchase two more pairs of that expensive platform and tack them onto our legacy cloud pods just to bring MPLS in. Since the ASR1Ks are becoming part of the infrastructure anyway, we'll most likely buy a couple more pairs of those to add to our legacy pods. Then we could have that single platform handle MPLS termination, IPSEC site-to-site termination, and IPSEC remote access client connectivity. That would help offload the Junipers and avoid the IPSEC throughput limitations discussed earlier.
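As a sketch of why the ASR1K fits so well here: a VTI-based site-to-site tunnel can drop its inside face directly into the customer's VRF, so each tenant keeps their own routing table and overlapping RFC1918 space never collides. This assumes a VRF like the CUST-A example earlier; the peer addresses, keys, and names below are all made up.

```
! Site-to-site IPSEC terminated directly into a customer VRF on the ASR1K
crypto keyring CUST-A-KEYS
 pre-shared-key address 198.51.100.20 key example-psk
!
crypto isakmp policy 20
 encryption aes 256
 authentication pre-share
 group 14
!
crypto isakmp profile CUST-A-ISAKMP
 keyring CUST-A-KEYS
 match identity address 198.51.100.20 255.255.255.255
!
crypto ipsec transform-set S2S-TS esp-aes 256 esp-sha-hmac
!
crypto ipsec profile CUST-A-IPSEC
 set transform-set S2S-TS
 set isakmp-profile CUST-A-ISAKMP
!
! The tunnel's inside face lives in the customer VRF, so this tenant's
! 10.x space is invisible to every other tenant, with no NAT required.
interface Tunnel100
 vrf forwarding CUST-A
 ip address 10.255.100.1 255.255.255.252
 tunnel source GigabitEthernet0/0/0
 tunnel destination 198.51.100.20
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile CUST-A-IPSEC
```

One tunnel interface per customer peer, each bound to that customer's VRF, and the same box also holds the MPLS-facing BGP sessions. That consolidation is the whole appeal.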
OK, that’s enough on that subject. I just wanted to document most of this while it was fresh on my mind.