Embedding Cassandra into your JavaEE AppServer

Recently I started some tests with Apache Cassandra. Sincerely I got impressed with its aptitude to horizontally scale and therefore to handle load increase. Among its main virtues are:

Ability of replacing failed nodes without downtime
Complete absence of a single point for failure. Every node is identical in regard to functionality

not to mention a few other features.
But still there is one point that bothers me: I still see it and its API as a foundation thing. Something like JGroups turned into. JGroups is largely used nowadays but rarely directly by the application developer. You use it indirectly when you use a clustered JBoss or JBoss TreeCache or Infinispan.
One of these responsibilities that still lies on application developer is the fail over capacity. When you connect to a Cassandra Database node through your application you still need to fetch through its JMX API the rest of the nodes that are part of this cluster, otherwise if this node fails even though your Cassandra cluster is still up and healthy your application won’t know how to connect to it. Another possibility (and in fact this is a recommendation even if you retrieve this node list) is to have a list of reliable servers to serve as bootstrap connection servers but remember Murphy Law even all the servers on this list may go down so you still need to retrieve the whole list for failing over.
So, in summary we have something like what is depicted on the following picture:

Cassandra Regular usage thru JCA and Thrift

Note that there is an iherent complexity in this solution, the JCA connector will be responsible for keeping a list of fail over nodes, something a Cassandra instance already does, so we end up violating the DRY principle.
But what alternatives do we have?

The StorageProxy API

It turns out that CassandraDaemon class (the main one responsible for starting up a Cassandra node) does only a few things that we can embed into our application, or should I say into our JCA Connector since in a JavaEE this is the only place you should be spawning threads and opening files directly from filesystem.
In fact those few steps are properly described on Cassandra Wiki.
Then if you take what we could call a regular approach you’d spawn the embedded Cassandra and perform a connection to localhost in order to communicate make the application talk to the Cassandra Server you’ll end up having an unecessary network communication that could reduce performance by increasing latency. It turns out that this can be avoided too by using the (not so stable) StorageProxy API.
By taking all the steps describe above you’d end up with a much simpler architecture as the one below:

Cassandra Embedded into JavaEE Server

With this architecture you are shielded from the complexity of failing over, Cassandra handles this automatically for you. Then you could argue: what if I need to independently scale the Cassandra layer? No problem! You can resort to an hybrid architecture like the one below:

Cassandra Embedded with Extra Nodes

In order to achieve this you only need to provide to these extra nodes the address of some of the JavaEE servers, this way, the extra standalone nodes can communicate with Cassandra daemons inside the AppServer and become part of the Cassandra cluster.

Memory mapped TTransport

My first attempt on doing this with Cassandra involved implementing a TTransport class that would be responsible for sending over a byte buffer commands to the server worker thread and receiving response in a similar fashion. I tried this first due to the complete ignorance of the existence of the StorageProxy API. But later I thought this could solve the issue related to the lack of stability this API has (as per the apache Wiki page). But this turned to be a not so easy task.
I thought of having a CassandraMMap that would act as the CassandraDaemon class but it would differ on TThreadPoolServer initialization as below:

		this.serverTransp = new MMapServerTransport();
		TThreadPoolServer.Options options = new TThreadPoolServer.Options();
		options.minWorkerThreads = 64;
		this.serverEngine = new TThreadPoolServer(new TProcessorFactory(
				processor), serverTransp, inTransportFactory,
				outTransportFactory, tProtocolFactory, tProtocolFactory,
				options);

The same instance of MMapServerTransport would be handled to the client through a getter method in order to open client connections as follows:

		TTransport tr = mmap.getServerTransp().getClientTransportFactory()
				.getTransport(null);
		TProtocol proto = new TBinaryProtocol(tr);
		Cassandra.Client client = new Cassandra.Client(proto);
		tr.open();

Requests through getTransport would be queued on server using the class below and a TTransport for the server would be returned upon acceptImpl:

public class MMapTransportTuple {
	private TByteArrayOutputStream server2Client;
	private TByteArrayOutputStream client2Server;
	private TTransport clientTransport;
	private TTransport serverTransport;

	public MMapTransportTuple(int size) {
		server2Client = new TByteArrayOutputStream(size);
		client2Server = new TByteArrayOutputStream(size);
		clientTransport = new MMapTransport(server2Client, client2Server);
		serverTransport = new MMapTransport(client2Server, server2Client);
	}
        //certain codes ommited for brevity

This class would be responsible for binding the memory buffers from client to server and vice-versa.
The last class involved in this implementation would be the MMapTransport:

public class MMapTransport extends TTransport {

	private TByteArrayOutputStream readFrom;

	private TByteArrayOutputStream writeTo;

	/**
	   *
	   */
	public MMapTransport(TByteArrayOutputStream readFrom,
			TByteArrayOutputStream writeTo) {
		this.readFrom = readFrom;
		this.writeTo = writeTo;
	}
        //read and write would operate on the respective buffer
        //and they would point to different buffers on client and server TTransport instances...
}

But this turned to be harder than I thought at first and as time became a short resource I’ll stick with the StorageProxy API approach for now.

5 Responses to “Embedding Cassandra into your JavaEE AppServer”

Feed for this Entry Trackback Address

1 Eduardo Costa
December 29, 2011 at 11:03 pm

Just the way I wanted to do! Any good results using StorageProxy API?

- 2 rafaelri
  January 12, 2012 at 4:10 pm
  
  Nop, unfortunately I am not working on such systems anymore (or at least not now :D)
  
3 joelmob
May 13, 2012 at 9:11 am

How did you embedd the cassandra proccess? I’m learning Java EE and i would like to learn how to start proccesses in application servers but i can’t seem to find any examples on this

4 joelmob
May 13, 2012 at 9:13 am

How did you embedd the cassandra proccess? I’m learning Java EE and i have been looking for examples on how to start proccesses inside Java EE servers either outside/inside a container? If you know how to, please let me know.

- 5 rafaelri
  May 13, 2012 at 12:49 pm
  
  This is one particular point where you are advised to follow the specs… and by following the specs I mean avoiding spawning threads everywhere and also unmanaged threads (those not provided by a container factory) since this would make it harder to spot performance problems latter in case they happen.
  And now answering your question: the “embedding” thing isn’t really about embedding a process inside another since it isn’t possible but you end up reproducing what the program main method would do inside some class that you code as a ResourceAdapter (that’s the advisable way of doing inside an AppServer). And if possible provide Thread factories that delegate up to the container the Thread creation process.

IT Developer World