Vblock Series 300 FX and 300 EX Builds Complete!

After three solid weeks of eating and sleeping Vblock, we have our production and lab Vblocks up and operational.  We’ve also completed full failover testing of the entire environment.

Here is a brief overview of each component for our design:

  1. Aggregation Routers:  These are what we consider our provider edge devices even though we manage services all the way down into the customer environments.  These will either be Cisco Catalyst 6500 class or Nexus 7K class depending on the data center.  For this particular cloud pod, it’s a Catalyst 6509 cluster.
  2. Juniper ISG 2000 Firewalls:  These are our customer firewalls.  We use the Juniper VSYS technology to virtualize customer firewalls on a single hardware platform.  We looked at a lot of different platforms to serve as our customer virtual firewalls but the Juniper ISG was the only one that gave us everything we wanted.  It gives us the resource partitioning we need between customers and it also allows us to terminate both site-to-site and remote access IPSEC VPN directly into the customer environment.  We like Cisco ASAs as well but they do not offer this functionality.  We have been doing enterprise cloud services for almost two and a half years now and 90% of our cloud customers either have site-to-site or remote access VPN needs for their cloud environment.  The Juniper platform gives us what we need to accomplish this and it’s worked well for us.
  3. Cisco Nexus 5548P:  This is our core switching component within the cloud environment.  It’s a great platform and offers us the 10G density we require.  It’s also included as part of the Vblock now (it wasn’t originally).  This is basically the same as our other cloud environments although the 5548P is a little upgrade over the Nexus 5020s we deployed previously.
  4. Cisco UCS 6140 Fabric Interconnects:  These are a critical component of a UCS build.  They tie all of your chassis together and pass that traffic up to the upstream Nexus switches.  Note that no switching goes on within the fabric interconnects; all switching is passed up to the parent 5548Ps.  Also note that each fabric interconnect handles either the A or B side of the fabric.  These are treated as completely separate paths even on the network side (it looks like a traditional FC network with SAN-A/SAN-B).  I’ll cover more of this later.
  5. Cisco UCS 5108 Chassis:  There are four of these in our Vblock, each containing four Cisco UCS B230 M1 blades, giving us a total of 16 B230 blades starting out.  That leaves capacity to add four more blades in each chassis, bringing the total to 32 blades.  From there, we can expand and add four more chassis, bringing the total to 64 blades.
  6. Cisco 2104 FEX:  There are two of these fabric extenders in each of the 5108 chassis.  You have an A side and a B side, each with its own FEX.  All four ports from FEX-A run to fabric interconnect A and all four ports on FEX-B run to fabric interconnect B.  As you can trace out in the diagram above, each vNIC on your VM is tied all the way up to either the A or B side.  The way this works is pretty sweet when you see it in action.  For example, during our failover testing, we brought down the complete virtual port channel (vPC) between fabric interconnect A and the two Nexus 5548Ps (reference the diagram above).  When you take that port channel down upstream of everything, the actual vNIC on the VM reports that it is down.  This is extremely important to understand in a large production environment: you could get an alert that one of your vNICs is down, and in reality the cause could be one of the FEXs, one of the vPCs, or one of the fabric interconnects.  If that A or B side can’t reach all the way up to the parent Nexus cluster, the vNIC reports as a downed interface (see the sketch after this list).  Again, this was neat to see in our failover testing this week.
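
To make that A/B failure-domain behavior a little more concrete, here is a minimal Python sketch of the idea.  The class and element names are purely illustrative (this isn’t UCSM or any Cisco API); it just models a vNIC’s path up through its FEX, fabric interconnect, and vPC, and shows that the vNIC only reports up if every hop on its side of the fabric is up.

    # Illustrative model only: these names are hypothetical, not UCS objects.
    class PathElement:
        """One hop on a fabric side (FEX, fabric interconnect, or vPC uplink)."""
        def __init__(self, name, up=True):
            self.name = name
            self.up = up

    def vnic_state(fabric_path):
        """A vNIC is only 'up' if every hop on its A- or B-side path is up."""
        for hop in fabric_path:
            if not hop.up:
                return "down (failed hop: %s)" % hop.name
        return "up"

    # Fabric A path for one blade's vNIC: FEX-A -> FI-A -> vPC to the 5548 pair.
    fex_a = PathElement("FEX-A")
    fi_a = PathElement("Fabric Interconnect A")
    vpc_a = PathElement("vPC from FI-A to the Nexus 5548 pair")
    path_a = [fex_a, fi_a, vpc_a]

    print("vNIC-A:", vnic_state(path_a))  # vNIC-A: up

    # Simulate the failover test: take down the entire vPC above FI-A.
    vpc_a.up = False
    print("vNIC-A:", vnic_state(path_a))  # vNIC-A: down (failed hop: vPC ...)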

So that’s it!  Overall this has been a fun project.  I’m glad it’s coming to an end.  We’ve learned a lot and I’ll be sharing some of those lessons over the next few months as we move into getting ready for our pilot customers on this pod.

VLAN Port Instance Limitation On Cisco UCS (STP Logical Interfaces)

In our planning for our second cloud pod using Cisco UCS infrastructure, we’ve run across a limitation that everyone planning a large multi-tenant deployment should be aware of.  During our discussions with Cisco, VCE, and EMC, we always focused on the VLAN port limitation.  In a typical Vblock deployment using Cisco UCS 6140 Fabric Interconnects and upstream Cisco Nexus 5548P switches, the lowest common denominator for VLAN limitation exists on the Fabric Interconnects.  The stated number of supported VLANs is 1024 but 32 are set aside and reserved for VSANs so it’s really only 992 max.

992 VLANs is a big enough challenge in itself.  Across our first cloud pod, we average roughly two VLANs per customer environment.  If we had no management VLANs or anything else to account for, you can see that we would be limited to about 496 customers if that average held until the pod reached capacity.

Because of what virtualization brings to the table, we can scale the compute infrastructure to support a lot more customers than we could in the past.  I’m just thinking about the Vblock buildouts that we have modeled so far.  We can certainly support many more than 496 customer environments so again, the VLAN limitation must be accounted for when discussing scale.  So obviously this was a major discussion point as we worked through our configurations and modeling for the next cloud pod.

What wasn’t discussed was the VLAN port instance limitation.  This is also called STP Logical Interfaces or VLAN port count in the 6100 data sheet.  This is another limit everyone should be aware of on the Fabric Interconnects (and on the Nexus switches, though their limits are higher).  VLAN port instances are calculated by:

#TRUNKS x VLANS + ACCESS_PORTS

So if we have 4 UCS chassis with 4 trunks per chassis (2 to each 2104 FEX), we have 16 trunks.  Add one or two trunks from the FIs to the Nexus 5548s and that gives us 17 trunk ports minimum, depending on your design (a vPC or port channel counts as one trunk for the purposes of this calculation).  There is a 6K VLAN port instance limit right now, which means we would be limited to 352 VLANs.  The next UCSM 1.4 patch (due in July) increases this to 14K, so we will then have an 823-VLAN ceiling.  That assumes we are still only using 4 trunk ports per chassis and have 4 chassis.  If we have 8 chassis with 4 trunk ports each, we’ll be limited to 424 VLANs across the pod.  This is definitely something to be aware of, as it significantly limits the scaling capability of the UCS infrastructure using this design model.
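
To show how those ceilings fall out of the formula, here is a quick Python sketch of the arithmetic.  The limits and trunk counts are the ones discussed above; the helper function itself is just illustrative and isn’t anything out of UCSM.

    def max_vlans(port_instance_limit, trunk_ports, access_ports=0):
        """Max VLANs per trunk under: trunks * vlans + access_ports <= limit."""
        return (port_instance_limit - access_ports) // trunk_ports

    # 4 chassis x 4 trunks each + 1 vPC trunk up to the Nexus 5548s = 17 trunks
    print(max_vlans(6000, 17))    # 352 -> current 6K port instance limit
    print(max_vlans(14000, 17))   # 823 -> after the UCSM 1.4 patch raises it to 14K
    print(max_vlans(14000, 33))   # 424 -> 8 chassis x 4 trunks + 1 vPC trunk

    # Separately, the 992 usable VLANs at roughly 2 VLANs per customer caps us near:
    print(992 // 2)               # 496 customer environments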

Obviously there are other ways people are trying to get around this (think VCD-NI… but then run away).  This problem will have to be addressed, because that’s the only way this design is going to scale for large multi-tenant environments.  The answer is not VCD-NI, but it may be something similar coming out of the Nexus 1000V product soon.

Cisco Or Juniper Firewalls For The Next Cloud Pod?

As mentioned in the previous post, we’re nearing the decision point on our next cloud pod architecture.  One of my tasks related to this project is to run a parallel effort to figure out what everything will look like above the compute and storage infrastructure.  In our case, that means the upper-layer switching/routing, the firewalls, the load balancers, and the connectivity to our customers’ physical space/equipment in the data center.

As I’ve mentioned in previous posts, with our first cloud pod we selected the Juniper ISG platform which encompasses the VSYS technology that allows the creation of virtual firewalls for each customer.  At the time, we looked at Check Point, Cisco, and Juniper as well as a few of the pure cloud firewall players surfacing at the time.  So I wanted to go out there to see what had changed in the market since that first cloud pod was built two years ago.

I started out with the Cisco ASA platform.  Our company, and my team specifically, is now very familiar with this platform, as it’s one of our managed firewall offerings.  It functions well and we’ve seen strong performance from it.  From a virtual firewall perspective, it also allows virtual contexts to be created.  That provides the same type of functionality as the Juniper VSYS technology, enabling us to create virtual firewall clusters for customers on shared hardware infrastructure.

One thing to keep in mind regarding our cloud offerings is that they are targeted at the enterprise market.  We aren’t trying to sell companies a few VMs or a development environment in the cloud.  Our focus is on production virtualization environments that enterprises will be able to depend on for mission-critical applications.  As such, we see customers having the same types of requirements in the cloud that we see them have in our dedicated physical data center spaces.

That leads me to why we selected the Juniper platform during cloud round one.  One of the primary reasons we made this decision was that the Juniper platform allowed us to terminate IPSEC VPN tunnels into each customer’s virtual environment.  Just as 95% of our legacy data center customers need VPN access into their physical data center environments, a similar share needs VPN access into their virtual data centers.  Juniper offered that ability and it’s worked very well for us.  Our first cloud pod is now at about 40% capacity, and almost all of those customers have at least one site-to-site VPN tunnel to a remote location plus several remote access VPN accounts for administrators of the environment.  We handle all of it on the Juniper ISG 2000s, and the good news is that it’s supported in the same manner as we support our customers using dedicated Juniper firewalls.

Cisco ASA at the time (spring 2009) did not have the capability to terminate VPN on a virtual context once the virtual context features were enabled.  My hope was that they would have addressed this by now (two years later) and that we could move to the ASA platform after finding success with it in our dedicated spaces.  Unfortunately, I was disappointed.  Cisco is still not there and I’m not sure if they are going to get there unless someone puts some significant pressure on them to get these features on the platform.  I sat through the latest product overview with my account team and was once again told the features were not available in the virtual contexts.  The toughest part is that all the Cisco guys even seem disappointed and it appeared that I certainly wasn’t the first customer to point out these shortcomings.

Basically, Cisco tried to address the problem by proposing a new solution using a set of Cisco ASAs running virtual contexts, a set of ASAs off to the side to handle SSL remote access VPN users, and a set of ASR 1000s to handle the site-to-site IPSEC tunnels.  Obviously it would be a stretch for us to deploy three clusters of very expensive devices to handle what a single cluster of Juniper ISGs handles today.  We just can’t justify it at this point, so we’re sticking with the Juniper platform, which has proved reliable for these needs.  We really did want Cisco to have a viable offering, but it just wasn’t there yet.

Additionally, we went through a brief overview of the new Cisco Virtual Security Gateway (VSG), which is integrated with the Nexus 1000V.  Since the 1000V is going to enable some much-needed functionality on the virtual distributed switching side, we were excited to see this announcement and thought perhaps it could hold some value as a virtual firewall product for our cloud environments.  However, we learned that at this time it’s primarily meant as a VM-to-VM firewall.  What we need is a border firewall that protects all the VMs in a customer environment under a single set of administrative policies or zones, similar to what a dedicated firewall cluster would do for a data center environment.  We’re looking for something similar to VMware’s vShield Edge, and Cisco says that’s coming in the next phased release of the VSG solution.  We can’t wait to see how that plays out.

For now, we’re sticking with the Juniper ISG platform using their VSYS technology.  It works.  It’s proven.  We know how to deploy and support it.  We also know its limitations (we’ll save those for another day).

It’s Been A While, I Know

It’s been quite some time since I’ve written anything, and it’s not that I haven’t had a lot to write about.  It’s an issue of time and, of course, of commitment to keeping this updated.

So I’ll get right to the point.  We’re continuing to go through the research phase for adding our second and third cloud pods.  Some things we’re looking at that would significantly impact the way our cloud operates are listed below.  I’ll try to use this list as a guideline for future posts so I can hit on what we’ve learned about each one and the potential we see.

  • Cisco’s UCS solution
  • The Cisco Nexus 1000v
  • Updates to the Nexus 5K line (and future updates that are coming)
  • The VCE initiative (the joint venture between Cisco and EMC, with investment from VMware and Intel)
  • A look back at the Juniper ISG as the virtual firewall solution for our cloud environment
  • Virtual load balancing (Cisco ACE specifically)
  • iSCSI or FC or both?
  • Cross data center needs (Once we have multiple pods, new requirements arise)

I think that’s a good list to get started.  Overall, we’ve learned a lot over the past two years since we jumped neck deep into virtualization and our enterprise cloud computing initiative.  It’s come a long way and we’ve learned a lot of lessons along the way that hopefully will make future pods operate more effectively and ease the management overhead.  Stay tuned!

We’re Launching The Cloud!

After many months of work, we’re finally ready to launch our cloud computing offering. The cloud is built and we completed failover testing yesterday. We’re also holding a special open house event/launch party for this new offering and we’ve already got customers wanting to sign up for the service. It’s really picking up steam around here and we’re all excited.

We ultimately ended up going with the following networking components for the cloud:

  • Virtualized Firewall Platform – Juniper ISG 2000s, creating a VSYS for each customer
  • Switching Platform – Cisco Nexus 5000 and Cisco Nexus 2000 Series Fabric Extenders
  • Virtual/Shared Load Balancer – F5 BIG-IP LTM & GTM for global traffic management

I think the most exciting piece of this offering is our ability to rapidly deploy new customer environments. We’re promising customers that we can deliver their entire environment from the firewalls to the load balancers to the virtual machines to the storage….all within 5 days from order submission. In the past, we were shooting for 30 days in a typical dedicated environment where the customers had dedicated hardware that had to be purchased and provisioned before we turned it over to them.

From a business standpoint, this is big. If an e-commerce site suddenly experiences increased demand, they can request more virtual machines for their application server pools and have them within 4 hours. Anyone who has worked in big dedicated environments can tell you how much it means to be able to add capacity on the fly like that. It eases the burden in many areas.

Look for the official press releases next week. I just wanted to report that we have completed the build and are in the final stages for launch. After the launch, I plan to provide a few helpful details around the integration of the server/VM environment with the Cisco Nexus switching platform. We learned a few lessons there for sure.

Virtualized Firewall Decision Made

I know it’s been some time since my last update but that’s not because nothing has been going on. It’s because so much has been going on which I guess is a very good thing these days! I’m happy to report that all of our research has been completed into the virtualized firewall offerings and we have made our decision.

We started off with Check Point and looked at the Power-1 VSX platform. I wrote about that in the previous blog entry. Since that time, we completed overview presentations with both Cisco and Juniper. We then put together a great comparison matrix which analyzed features and price.
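
For anyone curious how that kind of matrix works mechanically, here is a rough, hypothetical Python sketch of a weighted scoring approach.  The criteria mirror the things we cared about in this evaluation, but the weights and example scores are placeholders, not our actual numbers.

    # Hypothetical weighted-scoring sketch; weights and scores are placeholders.
    criteria_weights = {
        "resource reservation per virtual system": 0.20,
        "VPN termination per virtual system":      0.25,
        "unified central management":              0.20,
        "performance":                             0.15,
        "total cost of ownership":                 0.20,
    }

    def weighted_score(scores, weights):
        """Combine per-criterion scores (0-5) into a single weighted total."""
        return sum(weights[c] * scores.get(c, 0) for c in weights)

    # Example usage with made-up scores for one vendor; fill in real scores per vendor.
    example_scores = {c: 3 for c in criteria_weights}
    print(round(weighted_score(example_scores, criteria_weights), 2))  # 3.0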

First, let me discuss the Cisco offering. I’m a big fan of Cisco for the things Cisco is very good at. However, they’ve struggled with security over the years, and our recent research on virtualized firewall offerings indicated that they’re still struggling. They are much better than they were, and I do believe the ASA is a great improvement over the PIX, but I still think there is a long way to go. The management solution for the virtualized offering seemed to be tied up in a mix of their local GUI management interface, a central manager, and their MARS platform. It wasn’t unified, which is something we’ve been after from the beginning. If you’re planning for large growth and need scalability, you need unification and ease of management. Cisco isn’t there. They also say that you shouldn’t terminate your IPSEC tunnels on each virtual system, and when you’re a managed hosting provider, that just doesn’t work. Again, scalability and unification were key, and we felt Cisco fell short.

Next, we looked at Juniper. Juniper has been running its virtual system technology for a number of years, and you could tell the product was well baked. The main findings for Juniper were:

  • Multiple options for reserving resources per virtual system
  • Very good performance numbers (which they all had at this level)
  • The ability to terminate VPN tunnels per virtual system
  • The virtual firewalls are managed through the same central manager as the dedicated firewalls
  • The total cost of ownership was very attractive (comparable to Cisco, half the cost of Check Point)
  • Backing from industry analysts such as Gartner, which means a lot when you’re a managed service provider trying to prove to your customers that you’ve made solid choices in the platforms you offer

As you can probably tell, we recommended Juniper. We’re actually revamping our entire firewall offering and will be introducing a dedicated firewall line as well as the virtual offering, all provided by Juniper. It provided the best overall model, and the price points will allow us to be more competitive than we could be offering Check Point to our customers.

In summary, Juniper won this one. We did feel that Check Point had some added features that none of the others did, but when you consider the high cost, you can’t beat Juniper. I am also happy to report that the Juniper solution is going to be used to protect the cloud computing environment we are building. That’s a very exciting project, and I’ll cover more on it as we finalize the solution over the next two weeks.