Archive for the 'Software Architecture' Category


Designing an ASTERIX Protocol Parser

In the next weeks I’ll try to describe the steps taken in order to decompose ASTERIX, Eurocontrol variable length binary protocol for Radar message Exchange. According to Eurocontrol, ASTERIX Stands for: All Purpose STructured Eurocontrol SuRveillance Information EXchange. Try not to confuse it with Asterisk (the opensource telephony project).

One of the major challenges was to achieve a great reuse of the domain classes designed to mirror the protocol fields since ASTERIX employs a design where some packet types are simply extensions of previous released versions, therefore sharing a huge amount of fields. One example is the Cat034 and Cat002 packets. If you are unfamiliar with ASTERIX, Eurocontrol site has a great deal of documentation on ASTERIX format.


Fluent Interfaces for Searching Hibernate Entities

Recently I changed my job, I am now at my previous employee… the reasons are out of scope for this post so let’s get back to the real point… in the process of developing the software that is scope of this new project a colleague suggested that we should adopt fluent interfaces for the creation of our entities. Sincerely, the term fluent interfaces was new to me but I’ve been a Hibernate user since two dot something (a long time ago) and since Hibernate makes extensive usage of fluent interfaces it turned out not to be that much news. In fact I have to admit that fluent interfaces increase code readability a lot!

Fluent Interfaces for POJOs

The code for the POJOs fluent interfaces was based on a post on Code Monkeyism blog. After adopting this code we started to think about further improvements to that class that made this handle collections, direct access for public fields whenever possible and we crafted an extension to JavaBeans style that let us hide the setter for collections, make the getter return an immutable list and added an addSomething method (all following further good practices that were also on Code Monkeyism).
Project development went on then we hit the point in where we needed to develop our entity search infrastructure. Our first conclusion was that we would probably need a counting facility for our queries and this would probably rule out the usage of NamedQueries since we would need two for each of them (one for counting and one for returning the data – anyways we are still considering if we’ll ever use counting queries for pagination), next obvious point: pagination. But the key point for our search infrastructure was that we had DAOs and we weren’t willing to expose HibernateSession or EntityManagers and we are going to have predefined searches that could have different parameters (some mandatory, some optional). This predefined search scenario is one of the ideal applications for something similar to the fluent interface idea for creating POJOs except that if we resorted to dynamically creating queries this could turn to a huge effort (not to mention a huge wheel reinvention since Hibernate has the incredible Criteria API).

The Solution for Searching Hibernate Entities

The trick for having fluent interfaces on top of hibernate criterias is based around two things:

  1. Basing the solution on top of Criteria ability of expressing constraints and the most important part: creating subcriterias upon entity association navigation
  2. Providing an way of composing the fluent interfaces for expressing collections and integrating this with the concept of the subcriteria

The first point is to implement the InvocationHandler class and making it able to store a Criteria instance which will be responsible for storing the Root Criteria instance and also the SubCriterias (see Subcriteria class) when InvocationHandlers are created for the entity associations.

public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
	try {
		String name = m.getName();
      		if ("list".equals(name)) {
        		return criteria.list();
		else if ("count".equals(name)) {
			return criteria.setProjection(Projections.rowCount());
		else {
			if (args == null || args.length == 0) {
				Criteria subCriteria = criteria.createCriteria(name);
				//create another InvocationHandler instance enclosing the newly created subCriteria and pass it to Proxy.newProxyInstance	and return the just created proxy
			else {
				criteria.add(Restrictions.eq(name, args[0]));
    return proxy;

The next step is to develop the interfaces, in order to make it easier to understand the idea I’ll provide a sample domain model to make things clearer.

Sample Domain Model

We are going to use the well known Order, OrderItem, Product model in order to make things clearer. Below is a class diagram for our model:

Sample Domain Model

Sample Domain Model

Our fluent interfaces for those classes would (hugely simplified) be:

public interface OrderFluentSearch {
	OrderFluentSearch code(String code);
	OrderItemFluentSearch items();
	List<Order> list();
	long count();

The OrderItemFluentSearch would be as follows:

public interface OrderItemFluentSearch {
	OrderItemFluentSearch quantity(int quantity);
	ProductFluentSearch product();
	List<OrderItem> list();
	long count();

And finally our ProductFluentSearch interface would be similar to the one below:

public interface ProductFluentSearch {
	ProductFluentSearch code(String code);
	List<Product> list();
	long count();

Suppose now that you had a with() method in your Order DAO similar to the one on the Code Monkeyism blog post a search for Orders with an item that references a product with code “10” would be as easy as:

	OrderFluentSearch search = orderDAO.with();

Pretty readable isn’t it? But it still has some drawbacks…
One obvious is that it isn’t possible to express it into a single line. The other one is that we still have an extra interface that wont get automatically renamed when we rename the entity class but I feel like there is one way of fixing this.

Future improvements

Although this idea presents a great improvement regarding code readability of Criteria queries expressed in an application it is still vulnerable to renames on classes attributes. One possibility that I’ll give a try is to use a DSL similar to the one employed on JMock that uses a combination of custom crafted CGLIB classes based on entity classes and a ThreadLocal composition process that allows methods that have no linking between them to share context.


JBoss Seam application blueprint

I am a JBoss Seam user since its v2.0 alpha something (back in 2007). I still remember the hard decision in picking up the alpha and later beta version instead of sticking with the stable but feature missing 1.2.
Seam is an incredible framework for web applications, it covers the majority of the requirements you have in such applications. But this tremendous power comes with a price, it is often hard to find the best combination in the first application you develop. Its variety of contexts combined with the possibilities of handling the page data through injection and outjection results in a challenge for the Seam newbie, not to mention the possibilities of handling the flow between pages…

That’s the reason I thought about developing a blueprint for Seam applications… I know that for now SeamGen generated applications are considered blueprints for Seam applications but I really feel like there are plenty of missing parts. SeamGen applications don’t use Conversation scopes (only to give an example cause the list of Seam features that are not explored by a SeamGen application are enormous). I am also sure that I won’t develop THE blueprint for a Seam application but at least I’ll try to document all the knowledge I’ve gathered from a few projects and a few POCs.

Managing page flow and conversation demarcation

Seam has a neat feature for specifying page flow: pages.xml and files. The first one is able to specify navigation rules for every view in the application and should be used for specifying global rules as when Exceptions are thrown or for actions that have the same result independent of the current view. This post presents good practices for defining page flows using Seam. There is even one thing covered in this post that I recommend: specifying conversation demarcation on files but there is one thing suggested on this post that I need to investigate carefully: the impact of joining conversations instead of spawning new ones. I agree with the post that not joining may spawn unwanted conversations and thus increase memory usage but I cant say beforehand which are the drawbacks of joining a conversation.
A good way of redirecting the user to a new view with total control over conversation propagation is by using the s:button tag. This tag has one property named view that specifies the target view-id and another one called propagation in which you can specify the conversation propagation. The following example redirects to a view named “newUser.xhtml” and suspending the current conversation scope:

<s:button value="New User" view="/newUSer.xhtml" propagation="none"/>

And if this view required a conversation scope this could be specified on with the <begin-conversation> tag.

Authentication and authorization

Everytime that you are developing an application targeting a deploy on a full fledged J2EE application server and if possible, prefer to delegate authentication and authorization to the Application Server JAAS. The following code when configured on Seam components.xml delegates to the specified JAAS domain:

<security:identity jaas-config-name="myJAASDomainName" remember-me="true"/>

This avoids the rather limited approach of specifying an authenticator method on an application Seam component since a JAAS authenticated user will be propagated all the way down the EJB container (in case you are using it).

That’s all for now I’ll try to update this post as soon as I format more knowledge around JBoss Seam.


Instrumentation and Monitoring as means of problem detection and resolution

While designing the last big project I worked on we chose to place some monitoring code in specific points of the application. Those metrics were meant for emitting information targeted at business users but also to technical users, this information was then split according to audience through the respective JMX agents used (was it a Tivoli ITCAM agent or a JOPR agent).
At first we were expecting only that this could provide us valuable information for live environment such as when we had anything abnormal on the legacy software we were integrating with but we ended up noticing that this could also provide us valuable informal of the operation behaviour of our software and the best part: on live environment. And in fact it turns out that this is so common that we can find others pointing into this direction as well.
The picture below presents an overview of the application in question as well as the instrumentation points.

Application overview with instrumentation points

Application overview with instrumentation points

The instrumentation points gathered the following information:

  • The instrumentation point on the left of the OuterMDB collected the sum of the messages processed in the current and last hour as well as the messages per second.
  • The instrumentation point on depicted in the top of the OuterMDB collected the sum of the time spent in pre-processing as well as the number of times pre-processing was invoked.
  • The instrumentation point on top of the InnerMDB collected the sum of messages processed in the current and last hour as well as the messages per second.
  • And finally, the instrumentation point on the bottom of the InnerMDB collected the sum of time spent in the communication with the legacy system as well as the average of processing time per request in the current and last hour, the minimum and maximum times of processing for current and last hour and the amount of request processed as well as the timeouts.

The comparison between the number of messages processed in the InnerMDB and OuterMDB could provide us means of comparing how we should size the Thread pools for each of these components. This is such an information that would be harder to obtain by any other means. We also used those metrics for detecting misfunction on the pre-processing legacy software that was invoked by our PreProcessing component, this way we could switch off pre-processing and avoid a negative impact on overall system performance.
But this monitoring was key to the detection of a misbehavior of our JCA connector. A misimplementation of the equals/hashcode method pair for the ManagedConnection lead to a huge performance degradation after a few hours of software operation. By using our monitoring infrastructure we could isolate the problematic code area. Sure it did not point towards the equals/hashcode pair but it was clear that it was related to connection acquisition code.
Finally, the monitoring in our application provided us an effective way of monitoring the legacy application we were communicating with since it did not provide any native way of monitoring its operation. We were then able to instantly respond to outages on the legacy application through metrics collected on our software.


Designing for Extreme Transaction Processing – Memento Pattern

Applications with huge transaction processing requirements or tight response times always result in a careful architecture design. Special care has to be taken on how to partition data in order to be able to better parallelize load. This task can be even trickier if some of the data isn’t suitable for proper partitioning. Being ableto partition data is an essential requirement for some of the elastic frameworks around – some of them even demand that data is local to the node processing the request while others may still work but with a significant performance drop if it is not. This negative impact due to difficulties in the proper partitioning and grouping of data can be mitigated at the cost of increased memory usage as it is always possible to increase replication forcing data to be in multiple nodes to avoid serialization upon request as there isn’t the concept of gravitation on such tools as there was on previous distributed caches.
Billy Newport (the mind behind IBM eXtreme Scale) proposed a classification scheme for the different styles of Extreme Transaction Processing (XTP) systems he identified:

  • Type 1: Non partitionable data models – Applications in this category usually perform adhoc queries that can span an unpartitionable amount of data which leaves scaling up as the only scaling possibility.
  • Type 2a: Partitionable data models with limited needs for scaling out – Applications in this category already present means of partitioning data and thus load but still on a limited fashion. Hence those applications can still be built using regular application servers backed by Sharded Databases, elastic NoSQL databases or IMDGs but still won’t use a Map/Reduce pattern for processing requests.
  • Type 2b: Partitionable data models with huge needs for scaling out – Finally this category is composed of applications that presents means of partitioning data as on Type 2a but instead of being exposed to limited load, Type 2b are applications are pushed to the limit. Type 2b applications usually evolved from Type 2a and were faced with the scalability limits of traditional architectures moving from regular application servers to Map/Reduce frameworks. It is worth noting that it is possible to scale to similar load requirements with traditional architectures based on regular application servers but they’ll usually require more human resources for administration as well as more hardware infrastructure.

Among the list of common items on a classic Transaction Processing (TP) system that must be avoided on an XTP system are two phase commit resources. Note that it is not that you can’t have a two phase commit resource as part of an XTP but they must be carefully evaluated and excess will certainly compromise system performance.
Billy presented (at the eXtreme Scale meet the experts session@IBM Impact 2010) an excellent example scenario on where an excess of two phase commit resources could undermine the performance of an e-commerce solution. In his example, the checkout process of the referred e-commerce site would, as part of a single transaction, scan the whole shopping cart and for each product in the cart it would then perform an update on a fictional inventory database as well as updating a few other two phase commit databases from other departments. If any of the items were out of stock transaction would be rolled back and an error would be presented to the user. It is obvious that this hypothetical system wouldn’t be able to scale much since the cost related to the long time span of this transaction combined with the number of resources involved would be tremendous. Not to mention that there would be a huge load on the transaction manager.
Instead of using this rather obvious approach, Billy suggested that these updates could be chained and combined with the Memento design pattern – the updates would then be sequentially applied and if any of them failed the Memento pattern would then be used to revert the changes that have already been applied. Using this approach the contention on all the databases involved would be minimal and the application requirement would still be fulfilled.
This is one of the many examples we can point that need to be carefully observed when designing XTP systems.


Replacing Persistence Caching with In Memory DataGrids

Anyone that designed an enterprise system in Java with non-functional requirements that demanded tight response times faced the challenge of tuning the overall solution. If you are using Hibernate for persistence your first idea is to turn on Hibernate Caching but unfortunately it only Caches loads through Entity Id and sometimes you need to lookup through a secondary unique key. Another solution you might think is turning on Query Cache but as this post points out is more harmful to scalability and latency than you can ever imagine. Although Hibernate Docs and the post referenced above suggest that using @NaturalId annotation and Querying using Criteria API and a specially crafted Restriction through naturalId method would alleviate much of that problem it turns out that updates on related tables hugely undermines the improvements NaturalId queries may bring as this post suggest:

[…] Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. […]

Source: Hibernate: Cache Queries the Natural Id Way
Add to this the fact that second level hibernate caches are data caches, not object caches so even on the best case of a cache miss you’d still have the cost of hydrating the object (as it is called on Hibernate terminology).

Object Caching

So, after giving up on all those previous attempts you’ll start considering taking the next obvious path: using/implementing an Object Caching. Have you noticed that I mentioned implementing? Yes, implementing. Now you’ll probably have to take care of the loading process in the case of cache misses, in fact you’ll be using a pattern called Side Caching.
But this can become a tedious code for maintaining and very prone to errors not to mention that implementing asynchronous write-behind won’t be an easy task.

In Memory DataGrids

Fortunately newer solutions marketed as In Memory DataGrids (or IMDG for short) provide an almost automagic way of plugging specially crafted classes that would be responsible for querying the underlying persistent storage on Cache misses, not to mention that they also provide improved partitioning and a great amount of other facilities usually not available on regular caches.
WebSphere eXtreme Scale Loaders and JBoss InfiniSpan CacheStore are examples of such specially crafted extensions for IMDGs. Those APIs also allow for updating the underlying storage (even asynchronously) with the changes that were passed to the in memory cache. One thing still missing (on different extents) on both solutions is the ability of querying the underlying store using a natural key. On eXtreme Scale there is already a framework for loading JPA entities but it lacks the ability of querying the underlying storage by anything different than the Cache Key that in those cases defaults to Entity Id (eXtreme Scale Query API is capable of querying the cache through any eXtreme Scale entity attribute but it only sees the data in memory). InfiniSpan on the other hand as of the 4.0.0 release does not provide anything similar to eXtreme Scale JPALoader or JPAEntityLoader it still sees the database (as on its JDBC Based CacheStore) as means of providing offloading and surviving reloads but never as means of querying data based on application defined format (eg.: application entities). Instead it uses its own format for storing data which limits its capability of loading from application storage in case of cache misses.

Querying IMDGs through Natural Keys

In summary, if you need to query through a natural key, in both cases you’ll need to roll your own Loader/CacheStore. And by querying through a natural key I mean something along the lines of what is described on Bill Burke’s EJB3 book:

The first way to define a primary-key class (and, for that matter, composite keys) is to use the @IdClass annotation. Your bean class
does not use this primary-key class internally, but it does use it to interact with the entity manager when finding a persisted object through
its primary key.
In your bean class, you designate one or more properties that make up your primary key, using the @Id annotation. These properties
must map exactly to properties in the @IdClass.

This way you can query the Cache using something in the lines of:

MyObjectNaturalKey key = new MyObjectNaturalKey("keyField1", "keyField2");
MyObject o = (MyObject) cache.get(key);

And you’ll have the guarantee that if this object exists either already on memory or in JPA EntityManager it’ll be returned.

Implementing a InfiniSpan CacheStore

Infinispan’s CacheStore interface has the following methods:

package org.infinispan.loaders;
public interface CacheStore extends CacheLoader {
   void store(InternalCacheEntry entry) throws CacheLoaderException;
   void fromStream(ObjectInput inputStream) throws CacheLoaderException;
   void toStream(ObjectOutput outputStream) throws CacheLoaderException;
   void clear() throws CacheLoaderException;
   boolean remove(Object key) throws CacheLoaderException;
   void removeAll(Set<Object> keys) throws CacheLoaderException;
   void purgeExpired() throws CacheLoaderException;
   void prepare(List<? extends Modification> modifications, GlobalTransaction tx, boolean isOnePhase) throws CacheLoaderException;
   void commit(GlobalTransaction tx) throws CacheLoaderException;
   void rollback(GlobalTransaction tx);
   public CacheStoreConfig getCacheStoreConfig();

If you need further explanation of what each method does, see its JavaDoc. InfiniSpan team suggests having a look on DummyInMemoryCacheStore in order to have a general idea of how should a CacheStore be implemented but still I have the feeling it misses an explanation of how InternalCacheEntries should be instantiated and in fact there is a factory class for it.
At last, if you need further information JBoss InfiniSpan Wiki has an overview of the builtin CacheStore classes.

Implementing an eXtreme Scale Loader

eXtreme Scale has plenty information on how to write a Loader as well. InfoCenter has two important pages: the first one presents a general overview on how to write a loader, the second one presents one important concept of eXtreme Scale: its tuples. If you implement a Loader that stores its data based on tuples you are free from having to have the stored classes (in case of custom application classes) in the same classloader as the container process.
WebSphere eXtreme Scale Wiki has a page dedicated to explaining how to write a Loader which presents the signature of a class that implements the Loader interface:

public interface Loader {
    static final SpecialValue KEY_NOT_FOUND;
    List get(TxID txid, List keyList, boolean forUpdate) throws LoaderException;
    void batchUpdate(TxID txid, LogSequence sequence) throws LoaderException, OptimisticCollisionException;
    void preloadMap(Session session, BackingMap backingMap) throws LoaderException;

Apart from the Wiki there’s a post on Billy New Port’s DevWebSphere blog
that has the source code available for what he calls the PureQuery Loader for eXtreme Scale. It is a loader capable of loading POJOs from SQL queries.
Finally if you need a starting guide for eXtreme Scale, have a look at ObjectGrid programming model guide and samples that although slighlty old is still a good starting guide for WebSphere eXtreme Scale.
Lastly I’d like to say that Packt Publishing book on eXtreme Scale is also a good reference for it.


Mapping JavaEE and .NET stack components

Recently I started to research how to develop on .NET as I’ve been developing for JavaEE.
First thing that came to my mind as a JavaEE architect was: “Okay, IIS and ASP.NET are some of the presentation tier alternative I have for .NET but what for business tier?” or rephrasing as someone that has
been using JavaEE for a long time: “What would be the .NET EJB?”.
JavaEE-DotNETComparisonFirst thing I missed was the concept of the MDB that in .NET stack that seems to be replaced by an MSMQ message trigger.

Another major .NET difference is that even though it seems to have the concept of the VM process (as Java does) it has the concept of a shared library, or the Global Assembly Cache. If you think in terms of Java then you have to either manually replicate the jar or share them using a shared storage and having it on the classpath.

As we are on the classpath subject, .NET and its CLR seems to avoid (and limit it to J#) the concept of a ClassLoader. Although sometimes problematic, the Java concept of the ClassLoader allows very sofisticate scenarios of application composition and also for hot code generation (by the application) at runtime.

I hope this post has helped anyone in the situation as I were before and I also hope it does not sound as a comparison of which platform is better as you might guess that I sincerely know that each one has its applications, strengths and weaknesses.


Blog Stats

  • 353,165 hits since aug'08

%d bloggers like this: