HA iSCSI and the Storage Controller

HA with Open-iSCSI:

Further to my general musings about HA, good chum Harold Spencer and I have been working on providing a way for our users to achieve high availability in the Storage Controller components of Eucalyptus for those who don’t have the SAN adapter.  The Storage Controller is the Eucalyptus component which handles EBS (Elastic Block Store) volumes for instances.

Let me give some background here …..

With 3.0 we’re still following our old business model: this release is effectively an Enterprise Edition for us.  That’s all changing with the upcoming 3.1 release, as our good friend and colleague Greg outlines here.  This is a really big deal for Eucalyptus and also very exciting: we’re going back to our unified code-base.  The only things which won’t be open are a couple of proprietary subscription-only modules: one for VMware and one for interfacing with enterprise-class storage arrays (aka SANs). In regards to HA, those using Eucalyptus without a subscription can use HA for the Cloud Controller components (read my last blog entry on why that’s important) but won’t currently be able to use HA with the Storage Controller, since that can only be achieved with an enterprise-class storage array and the subscription-only SAN adapter.  By default, without the SAN adapter, Eucalyptus uses the tgt iSCSI target daemon (with open-iscsi on the initiator side) to handle EBS volumes.
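For the curious, this is roughly what the SC is driving under the hood.  A hypothetical sketch (the target ID, IQN and backing device are all made up for illustration) of exporting a backing volume over iSCSI with tgtadm:

```
# Create a new iSCSI target (tid and IQN are example values)
tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.2012-05.com.example:storage.vol-12345678

# Attach a backing store (e.g. an LVM logical volume) as LUN 1
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
    -b /dev/vg-eucalyptus/vol-12345678

# Allow initiators (the node controllers) to connect
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
```

On the node controller side, open-iscsi then discovers and logs into that target so the volume can be attached to an instance.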

We still think our users will want SC HA without the SAN adapter, and we want to ensure that those who don’t have a SAN (or don’t need the performance one brings), or who don’t pay our salaries, can still have some kind of high-availability experience at the storage level using the service and infrastructure HA platforms out there for Linux.  After all, running an iSCSI target daemon as a resource on top of a Linux clustering solution is nothing new and works very well.  In our testing we’ve been using pacemaker + corosync, running tgtd as a resource on top of a DRBD-backed logical volume across a two-node cluster.  It just works: you can fail over LUNs whilst clients are writing data to them, with only a brief pause at the client before disk operations continue on the migrated LUN.  Data integrity is handled by the proper failover of resources and the excellent DRBD solution, courtesy of the great guys at Linbit.
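To give a flavour of that setup, here’s a hypothetical pacemaker configuration sketch in crm shell syntax — the resource names, DRBD resource, volume group and IP are all made up, not our actual test config:

```
# DRBD device in master/slave mode across the two nodes
primitive p_drbd ocf:linbit:drbd \
    params drbd_resource="r0" op monitor interval="15s"
ms ms_drbd p_drbd \
    meta master-max="1" clone-max="2" notify="true"

# Volume group, floating IP and the tgtd daemon, started together as a group
primitive p_lvm ocf:heartbeat:LVM params volgrpname="vg-eucalyptus"
primitive p_ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.10.50" cidr_netmask="24"
primitive p_tgtd lsb:tgtd op monitor interval="10s"
group g_iscsi p_lvm p_ip p_tgtd

# The iSCSI stack must run where DRBD is primary, and only after promotion
colocation col_iscsi_on_drbd inf: g_iscsi ms_drbd:Master
order ord_drbd_first inf: ms_drbd:promote g_iscsi:start
```

The colocation and order constraints are what make the failover safe: tgtd never starts on a node where the DRBD device hasn’t been promoted to primary.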

You’ll want to check out Harold’s blog here for some more details on the work we’ve been doing to try to integrate this.

High Availability & Cloud ….

In my previous job, I spent a fair amount of time with enterprise customers who were – more often than not – moving from proprietary Unix to Linux.  Coming from Unix, these customers were often heavy users of resource-based (or service-based) High Availability (HA) products, such as HP Serviceguard, Sun Cluster, Veritas Cluster Server and the like.  When moving to Linux they wanted feature parity with their traditional Unix HA technologies, and they ran very enterprise-level workloads: databases, CRM, and so on. More often than not, they would end up running the Red Hat Cluster Suite (HA Add-On) or the SUSE Linux Enterprise High Availability Extension; both are paid add-ons for those distributions, where you pay an additional support premium on top of your normal subscription.  These two stacks are of course open source, so you can go and download pacemaker, corosync and the other tools and use them to keep your own services highly available at no cost.  There are also other proprietary products available for Linux, like SIOS LifeKeeper and HP Serviceguard for Linux.  The latter is back from the dead due to popular demand.

I use the term resource-based in the previous paragraph to describe the nature of the HA solution: install the clustering software on each host, then install your application and configure an active/passive failover environment across multiple servers.  Upon failure of the service on one node, the resource management and infrastructure layers detect the failure(s) and move the service’s resources (i.e. your application and its dependencies, like a floating IP) across to the second node, with a small window of downtime.  The applications you run on top of such clustering layers are often not built with high availability in mind; rather, the application itself is blissfully unaware that it’s in a highly available configuration, and the data and configuration are mirrored with underlying technologies like DRBD.  This is where you want a really good cluster resource manager, like pacemaker, and a solid infrastructure layer, such as corosync and OpenAIS, to handle things in bullet-proof fashion.  These technologies have good mission-critical references too: Deutsche Flugsicherung (German Air Traffic Control) run their systems on top of the SLES High Availability Extension, for example.   I’ll refer to this as service & infrastructure HA.
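As a minimal illustration of the pattern (the service name and IP here are hypothetical), an active/passive pair in pacemaker’s crm shell might look like:

```
# Floating IP that follows the service around the cluster
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.100" cidr_netmask="24" op monitor interval="10s"

# The cluster-unaware application, managed via its init script
primitive p_app lsb:mydaemon op monitor interval="30s"

# A group is started together, in order, on the same node
group g_service p_vip p_app
```

If the active node fails, corosync notices, pacemaker restarts the group on the survivor, and clients reconnect to the same floating IP, none the wiser.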

Then we have another way of achieving HA.  This is what some term in-built HA, whereby the application itself has high availability features built into it.  A widely known example would be Oracle’s RAC product.  It’s UBER expensive but offers differentiation over a simple active/passive Oracle DB running on top of one of the Linux HA stacks: if your wallet permits, it allows you to scale out in large active/active database configurations.  This can be extremely useful for performance reasons whilst also achieving resilience.  These features are really useful for data warehousing and other demanding use cases where money is no object, like, I dunno, utility billing or inland revenue and tax collection (har!).  Oh, did I mention cost?   You’re paying for a fair bit of development time and also features; you can do cool stuff that you just can’t do with the “normal” clustering products on Linux (and Oracle will ensure this stays the case).

With the release of Eucalyptus 3.0, we see Eucalyptus deliver the industry’s first cloud platform with in-built HA.  This is a big thing and something that enterprise customers need.  Whilst the notions and requirements of high availability in the cloud may differ depending on who you talk to, customers will often require that their cloud be highly available to safeguard service level agreements (SLAs) with their users.

Furthermore, with 3.1 we’ll be enhancing flexibility further, giving customers and users the option to run supported hybrid-HA topologies.  What do I mean by this?  Well, the Cloud Controller (frontend) and Walrus (S3) components can have redundant spares whilst the Cluster Controller and Storage Controller don’t; this saves on hardware costs and complexity for some.  It also unlocks an interesting possibility in terms of your SLAs.  So, why would you want to do this?  Well, if an availability zone goes down, this might not bother you; users could just use another one for the time being until service is restored.  In the meantime you’d like to ensure users can still interact with the cloud and use a different availability zone, recovering their work from snapshots or data from bukkits (S3).  Without HA for the Cloud Controller and Walrus, this wouldn’t be possible, and that would be a much bigger issue for you; users wouldn’t be able to do *anything* and there goes your service.

Eucalyptus 3.0 HA lays the groundwork for flexible high availability in a truly distributed cloud platform.  It’s one to watch 😉