Cisco Cloud Services Router – Brief Introduction For Service Providers

For those that didn’t hear, Cisco announced their Cloud Services Router at Cisco Live this year in San Diego.  They didn’t put much emphasis on it at all and if you weren’t paying attention, you would have missed it completely.  The last I heard, it was scheduled for GA towards the end of 2012 but there are some pre-release versions available if you want to get your feet wet.

So what is it?  Basically, it's an IOS-XE router running on a VM in your cloud environment.  That's the bottom line.  That alone should generate lots of grand thoughts about possible use cases for those who have been involved in deploying large SP cloud environments.  There is a lot of potential here for the SP and cloud provider markets in general, and I was surprised by how little attention this got at Cisco Live.

Having the ability to set up a VM in your cloud environment as the gateway for that environment has a lot of potential for an SP cloud offering.  Think about it for a minute.  How are you bringing customer connectivity into your cloud environment today?  You might be terminating MPLS VRFs at the provider edge and then extending VLANs into the customer cloud environments or you might be terminating VPN services at a VPN gateway somewhere upstream of their cloud and then extending the environment back from that device.

CSR opens up the opportunity to have end-to-end customer connectivity all the way to the customer's cloud environment.  It will be able to serve as the MPLS or VPN gateway for your cloud environments without the need for additional specialized upstream gear to handle these functions.  You'll be able to do full L2-over-WAN connectivity from your customer's sites/data centers to your IaaS infrastructure.  This could be huge for the SP market, which is still struggling to figure out all the various connectivity models to reach customers outside of the facilities hosting their cloud pods.  The simple fact of being able to move the VRF termination to the cloud edge eliminates having to use VLANs from the provider edge to the cloud edge, which can be a very big challenge in a lot of SP designs.

CSR will support the protocols you are probably running in your network now (OSPF, BGP, etc.), and it will also function as a LISP tunnel router, allowing Layer 3 address mobility between your cloud environments in different data centers.

On top of all that, there appear to be some firewall capabilities that will be more than just the standard router ACLs.  Depending on the extent of what gets released, this could function as your cloud customer's perimeter gateway, or at least offer another layer of firewall services for added security (perhaps used in combination with the ASA 1000V?).

It should be very interesting to see this when the final version makes it to market.  I’m looking forward to seeing how it will impact some of our SP architectures moving forward.

Understanding Remote Access Requirements For Multi-Tenant Cloud

We are coming up on the third anniversary of our first service provider, multi-tenant cloud product launch at Windstream.  Other than Amazon and a few other major players, there weren't a lot of people running or even thinking about running multi-tenant cloud environments for enterprise consumption.  Even Amazon was not a player in the enterprise market outside of development shops that didn't care much about SLAs tied to their environments.  Today, we have a lot of enterprise-class customers running on our cloud infrastructure and loving the benefits, along with the very high uptimes we have been able to provide.

I step back to that time just to paint a picture of how far we've come since that first launch.  Back then, there were no Vblock or FlexPod reference architectures.  There was no Cisco VMDC design and implementation guide.  We built everything on our own, pieced each technology together, and came out with what has turned out to be a pretty solid platform serving the majority of our cloud customers.  We have since standardized on a hybrid Vblock sort of architecture for our cloud pods, and we are now the proud owners of four such pods using Vblock as the foundation for compute and storage.

Since we didn’t have reference architectures from all the major vendors to go on, we had to approach each piece one at a time and test all the various details to make sure it met each requirement for the product we wanted to build and sell.  One of those requirements was remote access into the virtualized environments for each of the tenants in the cloud pod.

I've written several previous posts about the selection process we went through for firewalls, but the final decision came down to the ability to provide a comprehensive remote access solution for our cloud customers.  We ultimately chose the Juniper ISG platform because of its ability to terminate IPSEC site-to-site tunnels and remote access VPN sessions directly into the customer's virtual firewall environment.  Cisco could not do this at the time, and Check Point became extremely cost prohibitive once you started adding in all the costs of the platform to run them on (Crossbeam), the Provider-1 license to manage them all, and the Check Point licensing itself, which has notoriously been very expensive.  So we went with the Juniper, and actually front-ended our second and third Vblock-based cloud pods with the Juniper ISG platform as well.  It has been solid for us with the exception of the limited IPSEC throughput of the platform (you basically have to disable three of the four VPN engines when in multi-tenant mode due to a bug we found that Juniper has yet to fix).  This takes your theoretical maximum IPSEC throughput down to roughly 600Mbps, which is still plenty for our needs.

Let me take a minute to explain why this one feature is so critical for us.  First, we traditionally were not a telco company.  We were a data center and managed hosting service provider.  We did not have a massive MPLS network full of existing customers we could sell our cloud products to.  We owned data centers, and a large amount of our business came from managed services within those data centers (colo, Internet, firewalls, load balancing, servers, switches, etc.).  For our cloud platform, we would be asking our customers to step into a completely virtualized infrastructure where they would never have direct access to the equipment running their environment.  They would have their own virtualized firewall, their own VLANs and trust boundaries, their own VMs, and their own storage pools.  However, they would never be able to physically touch any of it.  To manage it, they would need remote access, and if it was an enterprise environment running their enterprise applications, they might need secure connectivity to the cloud infrastructure so their corporate employees could use those applications in a secure fashion.

So having the background we had with our dedicated offerings, the answer was simple.  We would just provide a way to terminate IPSEC VPN tunnels directly into the customer’s cloud environment and provide them with remote access VPN clients.

We are now a much different organization.  Windstream purchased our company and then made us the headquarters for their new data center division.  They wanted to buy into some growth markets, and they liked what we had done with all of our managed hosting offerings, including cloud.  A year later, they purchased Paetec, another large telco company in the northeast with its own data center assets.  Those data centers have also been migrated underneath the data center division of the company.  However, both Windstream and Paetec serve a completely different customer than we traditionally served.  Our customers were mainly local to wherever a data center happened to be located.  They would come in and build their infrastructures up in the managed colo space on our data center floors, and many would have us manage that infrastructure for them.  Windstream and Paetec both brought tremendous MPLS network footprints with thousands of existing customers that could take advantage of our data center and managed hosting products.  We could basically bring several 10Gbps circuits into our data centers, provide the connectivity to our existing cloud pods, and all of a sudden we could be facing hundreds and even thousands of those MPLS customers buying cloud and other data center services from us.  Well, that is exactly what we are working on now, and it can't come soon enough for our sales teams.

Stepping forward, when you then combine a data center service provider with enterprise MPLS and DSL providers, you basically have to accommodate both types of customers.  You have the traditional data center customer who has no corporate network connectivity into your cloud and needs some form of remote access solution.  You also have your MPLS customers who are getting their VRFs extended over to the cloud environments.  Those customers may also have remote access needs, but they may already have managed remote access solutions somewhere else in their network that can provide the access they need once connectivity has been established to their MPLS VPN.

The bottom line?  Remote access needs are still alive and well when building multi-tenant cloud solutions.  We now have two more Vblocks we are building and this time, we are trying to adapt the Cisco Virtualized Multi-tenant Data Center (VMDC) architecture for the network stack above the Vblock.  One of the largest challenges, once again, is providing for remote access.  As of this writing, Cisco ASA firewalls still do not offer the capability to terminate IPSEC VPNs into a customer virtual context.

The new VMDC 2.2 design and implementation guide recently came out, and there are some updates that provide remote access connectivity solutions.  We have also been working with Cisco on this problem.  We tried to work with them on it three years ago as well, but back then no one really understood why our clients would need the kind of remote access we were describing.  Today, I think it's still a challenge to help vendors understand these needs and why they exist.

The first solution Cisco came back with was to set a pair of ASR1Ks off to the side to handle the remote access needs (both site-to-site and remote access).  We wouldn't be able to provide SSL VPN, but that was OK; we had traditionally provided IPSEC VPN clients for our customers and were fine continuing that practice.

Once we started looking at that solution, however, we learned that the ASR1Ks could not maintain statefulness during a failover event between the redundant pair.  One of our requirements was that during failover, state must be maintained so the clients could seamlessly fail over during an event and continue working as if nothing happened.  Well, once this requirement came up, Cisco said they would go away and come back with another alternative.

We met with several of their VMDC guys last week here in Raleigh, and their other solution was to implement a second pair of ASA 5585 firewalls that would be dedicated to handling VPN needs.  This pair of firewalls obviously would be running in single-context mode since IPSEC is still not supported once multi-context mode is enabled.  However, there is one small problem.  Actually, it's a pretty big one.  Since it won't be running multi-context mode, you can't have overlapping IP space.  In a multi-tenant environment, this is HUGE.  We would basically have to NAT on both our end and the customer ends of the tunnels.  Anyone who has supported large amounts of IPSEC VPN knows that NAT interaction with the tunnels is one of the most problematic parts.  When you add NAT on both ends, ends controlled and managed by two separate organizations, you're asking for big trouble.  Plus, we just didn't want to burden our customers with this complexity.  Our customers are used to not having to worry about overlapping RFC1918 IP space in their cloud environments.

So we are now back to using a pair of ASA 5585s running in multi-context mode to handle the edge firewalls for each customer, in addition to a pair of ASR1Ks for IPSEC needs.  We will just have to write some specific failover language into the product guides so that customers understand that when VPN service fails over to the other chassis, state won't be maintained and their tunnels will have to re-key.  If a supervisor fails in one of the ASRs, my understanding is that state is maintained within a single chassis, and frankly, that's where most of your failures are going to occur anyway.

This also helps us with our MPLS integration efforts.  You need some kind of device that is VRF-aware so that you can bring your MPLS customers into your cloud environment and provide them with their own internal routing tables.  In our Vblock implementations where we are deploying the new Cisco VMDC infrastructure, those functions are handled by a pair of Catalyst 6509 VSS switches.  We didn't want to have to go purchase two more pairs of that expensive platform and tack them on to our legacy cloud pods just to bring MPLS in.  So, going with the ASR1Ks as part of the infrastructure, we'll most likely buy a couple more pairs of those to add to our legacy pods.  Then we could have that single platform handle MPLS termination, IPSEC site-to-site termination, and IPSEC remote access client connectivity.  That would help offload the Junipers and avoid the IPSEC throughput limitations discussed earlier.

OK, that’s enough on that subject.  I just wanted to document most of this while it was fresh on my mind.

Understanding VLAN Port Count In A Cisco UCS Environment – Part 3 (Final)

I've posted two previous blog entries trying to cover VLAN port count in a cloud environment.  In a Cisco UCS environment (specifically a Vblock in our case), you have to be concerned about VLAN port count on both the UCS Fabric Interconnects and the upstream Nexus 5548Ps (or whatever upstream Nexus you're using).  Just as a side note, the upstream Nexus switches are now included in a Vblock.  When we first started looking at the Vblock model, the upstream Nexus switches were not included in a Vblock configuration.  I covered in a previous post all the things you need to consider to calculate the VLAN port count on the upstream Nexus 5Ks.  For this post, I'll just focus on the downstream Cisco 6100 Fabric Interconnects.

First, understand that in a cloud environment utilizing Cisco UCS and upstream Nexus switches, the VLAN port count limits on the Fabric Interconnects are the lowest common denominator.  Our Cisco 6140 Fabric Interconnects currently support a VLAN port count limit of 6000.  This means that most of the focus at this time should be put on the Fabric Interconnects unless you’ve got a design that for some reason calls for many more port count instances on the upstream Nexus switches than on the 6140 fabric interconnects.  I understand that this 6000 limit will change to 14K+ with a future release, matching the current limitations of the Nexus 5548s.

So now we know our limit in UCS is 6000 VLAN port instances.  That number is made up from a combination of border ports (or uplink ports) and access ports.  See the screenshot below from UCSM.  To find this, go to the "Equipment" tab in UCSM, select the fabric interconnect, and you'll see a drop-down on the right for VLAN Port Count (located right under the High Availability Details tab).  You can also do this in the CLI of the fabric interconnect by typing "show vlan-port-count".  Just make sure you use "scope fabric-interconnect" to get to the right level to run this command.  If you try from the top level, the show vlan-port-count command will not be an option.

The border VLAN ports are your uplink ports and the VLANs carried over those ports, so they are pretty easy to understand.  Port channels or virtual port channels (vPCs) count as one uplink port.  In our design, we have three 10Gbps links going from a 6140 fabric interconnect to each upstream Nexus 5548, giving us a vPC consisting of six 10Gbps links.  We have pre-staged some customer VLANs, and of course we have all of our management, vMotion, replication, and other VLANs.  So our current VLAN count in the environment is 110.  We are showing 111 for our "Border VLAN Port Count".  I'm guessing that it adds the VLAN count to the actual trunk port, giving us our 111 count, but I may be wrong on that.  Perhaps there is a VLAN configured that is not showing up in UCSM that is represented here for some reason.  It could also be the Fabric-A VSAN that is carried on the FC uplink ports.  I bet that's it, now that I think more about it.  Maybe someone reading this can confirm where that one extra count comes from.  We confirmed this count by removing a configured VLAN from the LAN=>VLAN tab in UCSM.  When we did this, the count went to 110 total, which means that VLAN was no longer being carried across the uplink vPC.

The next step is to figure out your access VLAN port count.  As you can see above, our count is currently 184.  I can't actually use my numbers for this example because we are still testing with this Vblock cloud pod before it goes into production, so some VLANs are mapped down to some vNICs and some are not (test networks, etc.).  The numbers don't match up across the board, so I don't want to confuse anyone.

This number is actually incremented by each VLAN carried to each defined vNIC, plus any vHBAs.  So let’s assume you have 8 ESX hosts in your cluster.  Each ESX host has two vNICs and two vHBAs.  You have 100 VLANs configured in UCSM and all 100 of those VLANs are enabled on each defined vNIC.  Your two vNICs are configured as vNIC0 associated with fabric interconnect A and vNIC1 associated with fabric interconnect B.

We will use Fabric Interconnect A for the example to reach the number.  FI-A has 8 ESX hosts, with each host presenting one vNIC and one vHBA to it.  Since we are carrying 100 VLANs across the environment, our count would be 808.  That's 100 VLANs on each of the 8 vNICs (800 instances), plus the 8 vHBAs (one from each ESX host).  Hopefully that makes sense.
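If it helps, here's the same math as a quick Python sketch using the hypothetical numbers above (the counting rule is just my current understanding of how UCSM tallies these instances, so treat it as an estimate rather than anything official):

  # Per-fabric-interconnect access VLAN port count: each VLAN carried on each
  # vNIC counts as one instance, and each vHBA adds one more.
  def access_vlan_port_count(hosts, vnics_per_host, vlans_per_vnic, vhbas_per_host):
      return hosts * vnics_per_host * vlans_per_vnic + hosts * vhbas_per_host

  # The example above: 8 ESX hosts, each presenting one vNIC (carrying all
  # 100 VLANs) and one vHBA to Fabric Interconnect A.
  print(access_vlan_port_count(hosts=8, vnics_per_host=1,
                               vlans_per_vnic=100, vhbas_per_host=1))  # 808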

Assuming your environment is configured properly and you don’t have strange reasons to carry certain VLANs on one side but not the other, your number on the fabric interconnect B should look the same.

So now we have our total of 808 for the Access VLAN Port Count and 112 for the Border VLAN Port Count.  Our total VLAN port count is 920, which leaves us 5080 VLAN port instances to go before we hit the limit.

I'll tell you that I saw more than one Cisco TAC response to a forum question where Cisco says they have lots of cloud service providers that never even get close to this number.  They're basically telling someone they shouldn't have to worry about this.  I can't speak for other cloud service providers.  I know we sell a lot of cloud services to enterprise customers (in other words, not just dev shops).  In our public cloud environment, each customer consumes at least one VLAN, and we average 1.5 VLANs per customer across our entire cloud customer base.  Let's use that average for our example below.

Let's say I have a Vblock with two clusters spread across 32 ESX hosts (16 nodes per cluster).  For one reason or another (or maybe for future-proofing), we may need to extend some customer environments across both clusters.  Because of this, we extend all customer VLANs across both clusters.  I'll assume that I have 100 customers and that my average VLAN count per customer matches our actual average (1.5 per customer).  That gives me 150 VLANs extended across 32 ESX hosts, plus 32 vHBAs, plus my 10 management/control/packet VLANs trunked to each host.  Finally, I'm running one vPC to the upstream Nexus cluster, and I have all 160 VLANs (customer plus management) extended across that vPC trunk.

(150 Customer VLANS x 32 Hosts) + (32 Hosts x 10 Management VLANs) + 32 vHBAs per side + 160 Border VLAN ports = 5312
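Here's that same scenario as a quick Python sketch, including a rough estimate of how many more 1.5-VLAN customers would fit under the 6000 instance limit (the per-customer growth math is my own assumption, not an official Cisco formula):

  # The service provider scenario above, counted per fabric interconnect.
  HOSTS = 32
  CUSTOMER_VLANS = 150                  # 100 customers x 1.5 VLANs average
  MGMT_VLANS = 10
  VHBAS = 32                            # one vHBA per host on this side
  BORDER = CUSTOMER_VLANS + MGMT_VLANS  # one vPC uplink carrying all 160 VLANs
  LIMIT = 6000

  count = CUSTOMER_VLANS * HOSTS + MGMT_VLANS * HOSTS + VHBAS + BORDER
  print(count)  # 5312

  # Each additional 1.5-VLAN customer adds instances on every host plus the uplink.
  per_customer = 1.5 * HOSTS + 1.5
  print((LIMIT - count) / per_customer)  # about 14 more customers before hitting the wall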

As you can see, I'm right under my 6000 number and I've only provisioned 100 of my typical customers in this multi-tenant, service-provider-class cloud environment.  If I provision 15 more customers, I've hit my limit and I can no longer provision.  Also, you'll note that I'm well below the actual VLAN limitation in a UCS environment.  VLAN port limit is the number that cloud service providers need to focus on, not total VLANs supported.  At least this would be the case if their model is similar to ours.

Obviously it's important to try to prune as many VLANs from each vNIC or trunk as possible.  But, as I explained above, there are a lot of business reasons why that may not be possible depending on what you're trying to do.  The work we are doing with vCD puts even more requirements/restrictions on how we can set things up, so we have to be very cognizant of this limit.

I understand some things are going to be announced at VMworld next week that could help with this scaling issue, and we're certainly looking forward to it.  I also know that VMware has some things it can do with MAC-in-MAC encapsulation, but as a guy who has been doing network engineering for over 15 years now, I tend to shy away from large L2 bridging environments.  We learned our lessons there back in the 90s.

Disabling Failback On VMware dvSwitch: Why You Should Consider It

During some recent testing with our 1.0 cloud configurations in our lab, we noticed some behavior in failure scenarios that was pretty surprising.

The scenario is this.  You have a VMware-based cloud pod running in an Active/Passive configuration (some of our original cloud pods were implemented before vPC was supported on the Cisco Nexus platform).  One Nexus switch reboots.  Everything fails over to the other Nexus switch and it works perfectly.  You lost at most one packet.  You're happy and everyone breathes a sigh of relief.  Then the failed Nexus switch comes back up.  Your monitoring platform starts going crazy with alerts for unreachable VMs.  After a few minutes, everything seems to have settled down and it's all working again.  Then everyone asks, "What the heck just happened?"

What just happened is that the Nexus switch that failed was the primary path for some of your ESX hosts.  When that primary path comes back, if Failback is set to YES, traffic will fail back over to the primary path.  Here is the kicker.  VMware considers that path active again as soon as it sees the link up, regardless of whether the port is actually in a forwarding state.  So when that Nexus switch comes back up, there will be a brief period of time where the ESX host is sending traffic and it's not going anywhere.  This could lead the host to consider itself isolated, which would of course trigger whatever isolation action you've chosen (preferably you have chosen "Leave Powered On").

This, by the way, is a reason portfast is recommended in ESX environments.  It's also why portfast is enabled by default when you set the port type to edge on a Nexus for your ESX host ports.

We noticed this behavior before we actually figured all this out in the lab, so one thought was that we would manually shut down each ESX host port on the Nexus, one at a time.  We would then reboot the Nexus, and since the ports were shut down anyway, the traffic would not fail back over.  Again, this was before we learned that VMware considers the path active just because it sees voltage going across the wire (which I think is a very bad implementation; I'd like to read more on why they did it this way).  So we did this.  We administratively brought down every ESX host port.  Everything again worked perfectly.  We saw one host at a time fail over to the secondary path.  Then we rebooted.  The Nexus appeared to be coming back up and everything was still responding well.  Then BOOM!  The exact same behavior happened again.

I had one of my engineers capture the interface logs for some of the ports the ESX hosts were attached to.  The Nexus actually brings all of the ports back up for two minutes before applying the running configuration to those ports.  So even though they were administratively shut down, it brought them back up for two minutes, the ESX hosts saw active links, failed the paths back over and traffic started moving, the Nexus applied the running configuration which brought the ports back down, and failover happened again.  Then of course when we brought each interface up, yet another failback event would have to occur.  Yikes!

I personally think this is a very poor implementation of networking fundamentals by VMware but I’m sure there are good reasons.  Perhaps that link status is the only thing they can use to trigger failover for some reason.  Again, I’d like to understand more about this and research it some more.

So here is what we decided to do as a result of this.  Our main goal was to prevent a host declaring itself isolated, triggering the isolation actions:

  1. Disable Failback (Failback: NO).  This one step pretty much eliminates the problem from our point of view, and our lab testing confirms this.  Since failover works great, everything follows the secondary path during a failure and remains on that path, even when the primary path for a particular host comes back up.  Since our entire environment is redundant and one side of the infrastructure is fully capable of handling the full load, we really don't care if the traffic gets switched back to the primary path.  Yes, it's nice to know where your traffic is supposed to be going, but in reality, VMware tries to balance out the primary and secondary paths anyway.  You do give up this balancing functionality if you disable failback, but again, for our environment we would rather have rock-solid stability.
  2. Set additional isolation addresses.  For those that don't know, if a host fails to receive HA heartbeats from other hosts within a certain time period (13 seconds by default), it will initiate tests to what VMware calls isolation addresses.  By default when using ESXi, the isolation address is the gateway of the ESX management VLAN.  You can change this and set additional isolation addresses to check.  A few options are the upstream Nexus switch (perhaps a loopback address) or, if you're using iSCSI, NFS, or FCoE, an address on your storage device.  If the host fails to receive any heartbeats and can't reach any of the isolation addresses, it will declare itself isolated.  That's when whatever action you've set gets triggered.
  3. Make sure you are using portfast.  If you have Active/Passive environments using STP, make sure you set your ESX host ports to use portfast.  This will allow the port to get put into a forwarding state as quickly as possible instead of waiting for STP to run through the Blocking=>Listening=>Learning=>Forwarding cycle which could take 30-50 seconds depending on your configurations.  This will definitely lead to host isolation if you are not running it and your failover scenarios will show a lot of downtime when you test them (and of course in production if you actually implement it that way).  Also remember, the Nexus EDGE ports have portfast enabled by default so you don’t have to explicitly call that out in your config.

Hope this helps someone avoid an unplanned/bad/unexpected outage.  We’ve learned a lot of lessons about this over the past week.  For anyone with a networking background (and virtualization guys that want to know how this works in detail), I recommend the book by Duncan Epping and Frank Denneman titled VMware vSphere 4.1 HA and DRS Technical Deepdive.  This book was loaned to me by one of our virtualization engineers and it has some extremely valuable insights into how HA works within a vSphere environment.

Vblock Series 300 FX and 300 EX Builds Complete!

After three solid weeks of eating and sleeping Vblock, we have our production and lab Vblocks up and operational.  We’ve also completed full failover testing of the entire environment.

Here is a brief overview of each component for our design:

  1. Aggregation Routers:  These are what we consider our provider edge devices even though we manage services all the way down into the customer environments.  These will either be Cisco Catalyst 6500 class or Nexus 7K class depending on the data center.  For this particular cloud pod, it’s a Catalyst 6509 cluster.
  2. Juniper ISG 2000 Firewalls:  These are our customer firewalls.  We use the Juniper VSYS technology to virtualize customer firewalls on a single hardware platform.  We looked at a lot of different platforms to serve as our customer virtual firewalls but the Juniper ISG was the only one that gave us everything we wanted.  It gives us the resource partitioning we need between customers and it also allows us to terminate both site-to-site and remote access IPSEC VPN directly into the customer environment.  We like Cisco ASAs as well but they do not offer this functionality.  We have been doing enterprise cloud services for almost two and a half years now and 90% of our cloud customers either have site-to-site or remote access VPN needs for their cloud environment.  The Juniper platform gives us what we need to accomplish this and it’s worked well for us.
  3. Cisco Nexus 5548P:  This is our core switching component within the cloud environment.  It’s a great platform and offers us the 10G density we require.  It’s also included as part of the Vblock now (it wasn’t originally).  This is basically the same as our other cloud environments although the 5548P is a little upgrade over the Nexus 5020s we deployed previously.
  4. Cisco UCS 6140 Fabric Interconnects:  These are a critical component of a UCS build.  They tie all of your chassis together and pass that traffic up to the upstream Nexus switches.  Note that there is no switching that goes on within the fabric interconnects.  All switching is passed up to the parent 5548Ps.  Also note that each fabric interconnect handles either the A or B side of the fabric.  These are treated as completely separate paths even on the network side (it looks like a traditional FC network with SAN-A/SAN-B).  I'll cover more of this later.
  5. Cisco UCS 5108 Chassis:  There are four of these in our Vblock, each containing four Cisco UCS B230 M1 blades, giving us a total of 16 B230 blades starting out.  This leaves us capacity to add 4 more blades in each chassis, bringing the total to 32 blades.  From there, we can expand and add 4 more chassis, bringing the total to 64 blades.
  6. Cisco 2104 FEX:  There are two of these fabric extenders in each of the 5108 chassis.  You have an A side and a B side, each having its own FEX.  All four ports from FEX-A run to fabric interconnect A, and all four ports on FEX-B run to fabric interconnect B.  As you can trace out in the diagram above, each vNIC on your VM is tied all the way up to either an A or B side.  The way this works is pretty sweet when you see it in action.  For example, during our failover testing, we brought the complete virtual port channel down between fabric interconnect A and the two Nexus 5548Ps (reference the diagram above).  When you take that port channel down upstream of everything, the actual vNIC on the VM will report that it is down.  This is extremely important to understand in a large production environment.  Basically, you could get an alert that one of your vNICs is down, and in reality it could be one of the FEXs, one of the vPCs, or one of the fabric interconnects.  If that A or B side can't reach all the way up to the parent Nexus cluster, the vNIC will report as a downed interface.  Again, this was neat to see in our failover testing this week.

So that’s it!  Overall this has been a fun project.  I’m glad it’s coming to an end.  We’ve learned a lot and I’ll be sharing some of those lessons over the next few months as we move into getting ready for our pilot customers on this pod.

VLAN Port Limit (Continuing to Gain an Understanding)

Wouldn't you know it!  Just as we start digging into the VLAN port count (or VLAN port instance, or STP logical interface) limit in preparation for our Vblock cloud deployment, we see a mysterious error pop up on one of the Nexus 5020 clusters we deployed in our lab environment.  We use this environment to test the limits of our cloud pods as we scale the environments.  Here is the error we got:

STP-2-VLAN_PORT_LIMIT_EXCEEDED

After bringing our concerns about VLAN port limits within UCSM and the Fabric Interconnects up with VCE a few weeks ago, we had some discussion around the topic, and as you can see from earlier posts, we received lots of different answers as to how to arrive at this limit and what to do to get around it.  At the time, it was our understanding that this was a UCS-specific challenge.

It turns out that it also applies to the Nexus switch products as we found out by seeing the error above and then dealing with the fix.  We have a somewhat official answer on how to calculate this within a UCS environment and I’ll address that in a following post.  However, I wanted to make sure everyone was aware of this limitation on the Nexus 5000 and Nexus 5500 series switches.  It’s a very real problem, especially in a service provider/multi-tenant type cloud environment.

On the Nexus switches, this number is calculated basically like this:

(VLANS_ON_TRUNK_1 + VLANS_ON_TRUNK_2 + … + VLANS_ON_TRUNK_N)

So in the small cloud pod where we experienced the error, we had 30 trunks, each carrying all the VLANs configured within the environment, which was a total of 112 VLANs.  This brought our total VLAN port count to 3360 on each Nexus 5020 in the pod.  We were running an older firmware version (this cloud pod has been running without failure for over two years), and the VLAN port limit on that version was 3140.
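A quick sketch of that calculation in Python (assuming, as in our case, that every trunk carries every VLAN; if you prune per trunk, you would sum the allowed VLANs on each trunk instead):

  # Nexus VLAN port count: sum the VLANs allowed on each trunk.
  def nexus_vlan_port_count(vlans_per_trunk):
      return sum(vlans_per_trunk)

  # Our small cloud pod: 30 trunks, each carrying all 112 VLANs.
  count = nexus_vlan_port_count([112] * 30)
  print(count)          # 3360
  print(count > 3140)   # True -- over the limit of the older firmware we were running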

Well, we were obviously over this limit, yet the cloud pod was operating just fine with no issues.  So what is the effect of exceeding this limit?  We asked Cisco, and what they told us was fairly alarming.  They said you could continue to provision new VLANs and new trunks.  However, some VLANs will end up without a spanning-tree instance, meaning that if you have redundancy in the network with multiple paths for that VLAN, a loop would be created because spanning-tree would not be blocking for that VLAN.

To address this, our plan of action was:

  • Immediately put together a plan to prune the VLANs.  Since all the trunk ports were carrying all of the management VLANs and customer VLANs, we could prune lots of numbers off that count simply by carrying only the needed VLANs across the trunk.  This allowed us to get back to a safe number until the next step could be implemented.
  • After we had pruned as much as possible, we began planning for a firmware upgrade.  For the Nexus 5000 series, 5.0(3) in L2 configurations will bring our VLAN port limit up to 12000.  This will allow us to add a lot more blades and more VLANs without reaching that limit anytime soon (at least we hope we don’t reach it before the next update).  Obviously this was a major change for our cloud pod.  This environment had been up and running with no issues for over 2 years so this put us to the test.  Everything worked as planned thanks to full redundancy and lots of failure testing two years ago, but it’s still nerve racking to mess with a stable environment like this.
  • Our next step would be to convert the trunks from each ESX host into a single virtual port channel.  Now that we have a version of code that supports vPC, we can use it on this cloud pod (which tells you how long we've been doing cloud services).  You'll see why this is critical below.

If you are experiencing this same problem or think you might in your environment, here are a few things you need to know.  Again, keep in mind that this is for the Nexus 5000 and 5500 specifically.  I’ll address the other parts of UCS in a future post:

  1. Portchannels and members of Virtual Portchannels (VPCs) count as one trunk!  This is major.  In our environment, this effectively reduces our trunk count by half.  Since we are dealing with small limits here, every number counts.  If I can shave 15 trunks carrying 112 VLANs each out of my environment, I just saved 1680 VLAN port instances.
  2. Trunks or portchannels created between a parent Nexus switch and a Fabric Extender (such as a Nexus 2148 in our case) count as a trunk in the overall count.
  3. Trunks or portchannels created on a FEX will count towards the trunk count on the parent Nexus.  In hindsight this seems obvious since we’re dealing with a single control plane and the FEX is viewed like a line card, but this surprised us a bit during our testing.
  4. SPAN ports don't count towards the overall number.  We thought they would.  Part of our VLAN pruning plan was to take the trunks going to our cloud IDS appliances and only trunk the customer VLANs for the customers who had purchased that service, instead of allowing all VLANs for ease of provisioning in the future.  We did this, but the port count did not decrease, indicating that SPAN ports aren't counted as trunks.  I guess this makes sense since they wouldn't be carrying spanning-tree for these VLANs.
  5. Most importantly, and perhaps I should have put this first, you need to take this into consideration as you design your environment.  This is the kind of thing that no one talks about up front but it will sneak up on you and present itself at a time further down the road when you aren’t expecting any issues like this.  If you are in a service provider or multi-tenant cloud environment, this is critical because of the number of VLANs you need to carry.
  6. We are deploying vCD (vCloud Director).  I know they want you to use VCD-NI, but anyone who has been in networking as long as I have understands the challenges of creating extremely large layer-2 bridging environments.  I've also read some pretty good posts where some guys have run very good packet captures in a VCD-NI environment and proven that there are some real security concerns.  If you're in an enterprise environment where segmentation is nice but not business-impacting if it breaks, that's ok.  If you're in a multi-tenant environment where a customer's data is their business and if it gets compromised they're out of business, that is another story.  I understand that Cisco is working on using IP encapsulation to accomplish the same types of goals as VCD-NI without the L2-bridging challenges.  I think we'll see something about that around VMworld in a few weeks.
  7. For the Nexus 5500 series switches, the VLAN port limit is 14,500 so you get 2500 more than on the Nexus 5000s with the same code.

Continued Investigation On VLAN Port Count (STP Logical Interfaces)

I wanted to post a quick note to say that we are still actively investigating the VLAN port count limitations within a Cisco UCS pod.  I’ve seen a couple of different explanations from both Cisco and VCE and some of them contradict each other.  So, we are working hard to get the accurate information from the right sources and once we do, I’ll do a more complete write up on this configuration limit and all the pieces that tie into the formula for calculating it.

VLAN Port Instance Limitation On Cisco UCS (STP Logical Interfaces)

In our planning for our second cloud pod using Cisco UCS infrastructure, we’ve run across a limitation that everyone planning a large multi-tenant deployment should be aware of.  During our discussions with Cisco, VCE, and EMC, we always focused on the VLAN port limitation.  In a typical Vblock deployment using Cisco UCS 6140 Fabric Interconnects and upstream Cisco Nexus 5548P switches, the lowest common denominator for VLAN limitation exists on the Fabric Interconnects.  The stated number of supported VLANs is 1024 but 32 are set aside and reserved for VSANs so it’s really only 992 max.

992 VLANs is a big enough challenge within itself.  Across our first cloud pod, we average roughly two VLANs per customer environment.  If we had no management VLANs or anything else to account for, you can see that we would basically be limited to 496 customers if that average continued until the capacity of that pod was reached.

Because of what virtualization brings to the table, we can scale the compute infrastructure to support a lot more customers than we could in the past.  I’m just thinking about the Vblock buildouts that we have modeled so far.  We can certainly support many more than 496 customer environments so again, the VLAN limitation must be accounted for when discussing scale.  So obviously this was a major discussion point as we worked through our configurations and modeling for the next cloud pod.

What wasn't discussed was the VLAN port instance limitation.  This is also called STP logical interfaces, or VLAN port count in the 6100 data sheet.  This is another limit everyone should be aware of on the Fabric Interconnects (and on the Nexus switches, though those limits are higher).  VLAN port instances are calculated by:

#TRUNKS x VLANS + ACCESS_PORTS

So if we have 4 UCS chassis with 4 trunks per chassis (2 to each 2104 FEX), we have 16 trunks.  Added to one or two trunks from the FIs to the Nexus 5548s, that gives us 17 trunk ports minimum depending on your design (vPCs or port channels count as one trunk for the purposes of this calculation).  There is a 6K VLAN port instance limitation right now, which means we would be limited to 352 VLANs.  In the next UCSM 1.4 patch (due in July), they increase this to 14K, so we will then have an 823-VLAN ceiling.  This assumes we are still only using 4 trunk ports per chassis and we have 4 chassis.  If we have 8 chassis with 4 trunk ports each, we'll be limited to 424 VLANs across the pod.  This is definitely something to be aware of as it significantly limits the scaling capability of the UCS infrastructure using this design model.
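To make that ceiling math concrete, here's a small Python sketch of the numbers above (it ignores access ports for simplicity, the same way the quick math above does, so consider it a rough planning estimate rather than an official formula):

  # Rough ceiling on the number of VLANs you can trunk everywhere before
  # hitting the VLAN port instance (STP logical interface) limit on the 6100s.
  def vlan_ceiling(chassis, trunks_per_chassis, uplink_trunks, instance_limit):
      trunks = chassis * trunks_per_chassis + uplink_trunks
      return instance_limit // trunks

  print(vlan_ceiling(4, 4, 1, 6000))    # 352 VLANs under today's 6K limit
  print(vlan_ceiling(4, 4, 1, 14000))   # 823 VLANs once the limit goes to 14K
  print(vlan_ceiling(8, 4, 1, 14000))   # 424 VLANs with 8 chassis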

Obviously there are other ways people are trying to get around this (think VCD-NI…but then run away).  This problem will be addressed as it’s the only way this is going to scale for large multi-tenant environments.  The answer is not VCD-NI but it may be something similar coming out of the Nexus 1000V product soon.

Multi-Hop FCoE On Cisco Nexus 7K and MDS 9500

I sat through the major Cisco announcement today titled “Evolutionary Fabric, Revolutionary Scale.”  There were several significant announcements today and many were very relevant to our cloud computing and next gen data center efforts at Windstream.  One of the more significant pieces of information is that multi-hop FCoE will now be supported on the Nexus 7K and MDS 9500 director class platforms.

This is rather big news as for some time now I’ve heard the experts at Cisco say that won’t be possible until we can change/eliminate the SAN-A/SAN-B model.  This session was more focused towards C-level management so there wasn’t a deep dive into the details.  I plan on spending some time digging up those details in the next few weeks and will do a brief write up once complete.

The reason I say it's big news for me is because I'm in both the data center business and the cloud computing business.  Right now, our design models include cloud pods that are built with all the various technology pieces in relatively close proximity.  The network, compute, and storage layers that comprise these cloud environments are all contained within a single rack or group of racks on the data center floor.  Since we are driving towards FCoE on our cloud pods now that we are using the VNX platform from EMC, having the ability to do multi-hop FCoE in the data center removes some limitations that previously existed.  We won't have to worry about proximity for the cloud components or about lots of long, expensive fiber runs all over the data center.  We'll now be able to provision new compute clusters that use shared storage volumes anywhere on the data center floor (and possibly in another data center) in the near future.

I know there isn’t a lot of detail here but I did want to share this bit of news early as I’ll be looking more into this over the next week or two.  Cheers.

Cisco Or Juniper Firewalls For The Next Cloud Pod?

As mentioned in the previous post, we're nearing the decision point on our next cloud pod architecture.  One of my tasks related to this project is to run a parallel effort to figure out what everything will look like above the compute and storage infrastructure.  In our case, that means the upper-layer switching/routing, the firewalls, the load balancers, and the connectivity to our customers' physical space/equipment in the data center.

As I’ve mentioned in previous posts, with our first cloud pod we selected the Juniper ISG platform which encompasses the VSYS technology that allows the creation of virtual firewalls for each customer.  At the time, we looked at Check Point, Cisco, and Juniper as well as a few of the pure cloud firewall players surfacing at the time.  So I wanted to go out there to see what had changed in the market since that first cloud pod was built two years ago.

I started out with the Cisco ASA platform.  Our company, and my team specifically, are now very familiar with this platform as it's one of our managed firewall offerings.  It functions well and we've seen strong performance from the platform.  From a virtual firewall perspective, it also allows virtual contexts to be created.  This provides the same type of functionality as the Juniper VSYS technology, as it enables us to create virtual firewall clusters for customers on shared hardware infrastructure.

One thing to keep in mind regarding our cloud offerings is that they are targeted at the enterprise market.  We aren’t trying to sell companies a few VMs or a development environment in the cloud.  Our focus is on production virtualization environments that enterprises will be able to depend on for mission-critical applications.  As such, we see customers having the same types of requirements in the cloud that we see them have in our dedicated physical data center spaces.

That leads me to why we selected the Juniper platform during cloud round one.  One of the primary reasons we made this decision was because the Juniper platform allowed us to terminate IPSEC VPN tunnels into the virtual environment of each customer.  Just like 95% of our legacy data center customers need VPN access into their physical data center environments, a similar ratio need VPN access into their virtual data centers.  Juniper offered that ability and it’s worked very well for us.  Our first cloud pod is now at about 40% capacity and almost all of those customers have at least one site-to-site VPN tunnel to a remote location and they each have several remote access VPN accounts for administrators of the environment.  We handle all of it on the Juniper ISG-2000s and the good news is that it’s supported in the same manner as we support our customers using Juniper dedicated firewalls.

Cisco ASA at the time (spring 2009) did not have the capability to terminate VPN on a virtual context once the virtual context features were enabled.  My hope was that they would have addressed this by now (two years later) and that we could move to the ASA platform after finding success with it in our dedicated spaces.  Unfortunately, I was disappointed.  Cisco is still not there, and I'm not sure if they are going to get there unless someone puts some significant pressure on them to get these features on the platform.  I sat through the latest product overview with my account team and was once again told the features were not available in the virtual contexts.  The toughest part is that even the Cisco guys seem disappointed, and it was apparent that I certainly wasn't the first customer to point out these shortcomings.

Basically, Cisco tried to address the problem by proposing a new solution using a set of Cisco ASAs enabled with the virtual contexts, a set of ASAs off to the side to handle SSL remote access VPN users, and then a set of ASR 1000s to handle the site-to-site IPSEC tunnels.  Obviously that would be a stretch for me to go deploy a solution with three clusters of very expensive devices to handle the functionality of a single cluster of Juniper ISGs today.  We just can’t justify it at this point so we’re sticking with the Juniper platform which has proved reliable for these needs.  We really did want Cisco to have a viable offering but it just wasn’t there yet.

Additionally, we went through a brief overview of the new Cisco Virtual Security Gateway (VSG), which is integrated with the Nexus 1000V.  Since the 1000V is going to enable some much-needed functionality on the virtual distributed switching side, we were excited to see this announcement and thought perhaps it could hold some value as a virtual firewall product for our cloud environments.  However, we learned that at this time it's primarily meant as a VM-to-VM firewall.  What we need is a border firewall that would protect all the VMs in a customer environment under a single set of administrative policies or zones, similar to what a dedicated firewall cluster would do for a data center environment.  We're looking for something similar to VMware's vShield Edge, and Cisco says that's coming in the next phased release of the VSG solution.  We can't wait to see how that plays out.

For now, we're sticking with the Juniper ISG platform using their VSYS technology.  It works.  It's proven.  We know how to deploy and support it.  We also know its limitations (we'll save that for another day).