HA iSCSI and the Storage Controller

HA with open-iSCSI:

Further to my general musings about HA, good chum Harold Spencer and I have been working on providing a way for our users to achieve high availability in the storage controller components of Eucalyptus for those who don’t have the SAN adapter.  The storage controller is the Eucalyptus component which handles the EBS (Eucalyptus Block Storage) volumes for instances.

Let me give some background here…

With 3.0, we’re still following our old business model: 3.0 is effectively an Enterprise Edition for us.  This is all changing with the upcoming 3.1 release, as our good friend and colleague Greg outlines here.  This is a really big deal for Eucalyptus and also very exciting: we’re going back to our unified code-base.  The only things which won’t be open are a couple of proprietary subscription-only modules; one for VMware and one for interfacing with enterprise-class storage arrays (aka SANs).  In regards to HA, those using Eucalyptus without a subscription can use HA for the Cloud Controller components (read my last blog entry on why that’s important) but won’t currently be able to use HA with the Storage Controller, since this can only be achieved with an enterprise-class storage array and the subscription-only SAN adapter.  By default, without the SAN adapter, Eucalyptus uses the open-source iSCSI target daemon (tgt) to handle EBS volumes.
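For context, when tgt is the backend, exposing an EBS volume boils down to creating an iSCSI target and attaching the volume’s backing device to it as a LUN.  A rough sketch with tgtadm follows; the target ID, IQN, and backing device path are illustrative, not the exact names Eucalyptus generates:

```shell
# Create a new iSCSI target (tid and IQN here are made-up examples)
tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.2012-03.com.example:storage.vol-0A1B2C3D

# Attach the volume's backing device as LUN 1 (path is illustrative)
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
    -b /dev/vg-storage/vol-0A1B2C3D

# Allow initiators (the node controllers) to connect to the target
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

# Inspect the result
tgtadm --lld iscsi --op show --mode target
```

The node controllers then log in to that target with the open-iSCSI initiator and hand the resulting block device to the instance.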

We still think our users will want SC HA using open-iSCSI, and we want to ensure that those who don’t have a SAN (or don’t need the performance one brings), or who don’t pay our salaries, can still have some kind of high-availability experience at the storage level using the service and infrastructure HA platforms out there for Linux.  After all, running an iSCSI target daemon as a resource on top of a Linux clustering solution is nothing new and, of course, works very well.  In our testing we’ve been using pacemaker + corosync, running tgtd as a resource on top of a DRBD-backed logical volume across a two-node cluster.  It just works: you can fail over LUNs whilst clients are writing data to them, with only a brief wait at the client before disk operations continue on the migrated LUN.  Data integrity is handled by the proper failover of resources and the excellent DRBD solution, courtesy of the great guys at Linbit.
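To make that concrete, here’s a rough sketch of the kind of pacemaker configuration we mean, using the crm shell: a DRBD master/slave resource, a floating service IP, and tgtd grouped so they fail over as a unit to whichever node holds the DRBD primary.  The resource names, the DRBD resource name "r0", and the IP address are all illustrative, not taken from an actual Eucalyptus deployment:

```shell
# DRBD resource in master/slave mode ("r0" is the resource name in drbd.conf)
crm configure primitive p_drbd ocf:linbit:drbd \
    params drbd_resource="r0" op monitor interval="15s"
crm configure ms ms_drbd p_drbd \
    meta master-max="1" clone-max="2" notify="true"

# Floating IP that iSCSI initiators connect to
crm configure primitive p_ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" cidr_netmask="24"

# The iSCSI target daemon itself, managed via its init script
crm configure primitive p_tgtd lsb:tgtd op monitor interval="30s"

# Keep the IP and tgtd together, on the DRBD primary, started after promotion
crm configure group g_iscsi p_ip p_tgtd
crm configure colocation col_iscsi inf: g_iscsi ms_drbd:Master
crm configure order ord_iscsi inf: ms_drbd:promote g_iscsi:start
```

With something like this in place, losing the active node causes pacemaker to promote DRBD on the survivor, move the IP across, and start tgtd there; initiators see a short I/O stall and then carry on against the same LUN.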

You’ll want to check out Harold’s blog here for some more details on the work we’ve been doing to try to integrate this.

