Posts Tagged ‘monitoring’


Instrumentation and Monitoring as means of problem detection and resolution

While designing the last big project I worked on, we chose to place monitoring code at specific points of the application. The metrics were meant for both business users and technical users, and the information was split by audience through the respective JMX agents (either a Tivoli ITCAM agent or a JOPR agent).
At first we expected this to give us valuable information about the live environment only, such as when something abnormal happened in the legacy software we were integrating with. We ended up noticing that it also gave us valuable information about the operational behaviour of our own software and, best of all, in the live environment. In fact this turns out to be so common that others point in the same direction as well.
The picture below presents an overview of the application in question as well as the instrumentation points.

Application overview with instrumentation points


The instrumentation points gathered the following information:

  • The instrumentation point on the left of the OuterMDB collected the number of messages processed in the current and last hour as well as the messages per second.
  • The instrumentation point depicted on top of the OuterMDB collected the total time spent in pre-processing as well as the number of times pre-processing was invoked.
  • The instrumentation point on top of the InnerMDB collected the number of messages processed in the current and last hour as well as the messages per second.
  • And finally, the instrumentation point on the bottom of the InnerMDB collected the total time spent communicating with the legacy system, the average processing time per request in the current and last hour, the minimum and maximum processing times for the current and last hour, and the number of requests processed as well as the number of timeouts.
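
As an illustration of how one such instrumentation point might be exposed over JMX, here is a minimal sketch. It is not the code from the project: the names HourlyStatsSketch, MessageStats and MessageStatsMBean are hypothetical, and the injected clock is only an assumption made to keep the hour rollover easy to exercise. It keeps a message count for the current and the previous hour, plus the summed processing time and the number of invocations.

```java
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical container class; all names in this sketch are illustrative.
class HourlyStatsSketch {

    // Standard MBean convention: the interface is named <ImplClass> + "MBean".
    public interface MessageStatsMBean {
        long getMessagesCurrentHour();
        long getMessagesLastHour();
        long getTotalProcessingMillis();
        long getInvocationCount();
    }

    public static class MessageStats implements MessageStatsMBean {
        private static final long HOUR_MILLIS = 3_600_000L;

        private final LongSupplier clock; // supplies "now" in milliseconds
        private long currentHour;         // index of the hour the current bucket belongs to
        private long currentCount;
        private long lastCount;
        private long totalMillis;
        private long invocations;

        public MessageStats(LongSupplier clock) {
            this.clock = clock;
            this.currentHour = clock.getAsLong() / HOUR_MILLIS;
        }

        // Called by the instrumented component after each processed message.
        public synchronized void record(long elapsedMillis) {
            roll();
            currentCount++;
            invocations++;
            totalMillis += elapsedMillis;
        }

        // Moves the current bucket to "last hour" once the clock enters a new hour.
        private void roll() {
            long hour = clock.getAsLong() / HOUR_MILLIS;
            if (hour == currentHour) {
                return;
            }
            // If more than one hour passed without traffic, the previous hour was empty.
            lastCount = (hour == currentHour + 1) ? currentCount : 0;
            currentCount = 0;
            currentHour = hour;
        }

        @Override public synchronized long getMessagesCurrentHour() { roll(); return currentCount; }
        @Override public synchronized long getMessagesLastHour() { roll(); return lastCount; }
        @Override public synchronized long getTotalProcessingMillis() { return totalMillis; }
        @Override public synchronized long getInvocationCount() { return invocations; }
    }

    // Registers the stats object with the platform MBean server under the given name.
    public static ObjectName register(MessageStats stats, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(stats, objectName);
        return objectName;
    }
}
```

A component would call record(elapsedMillis) after each message, and a monitoring agent could then read the attributes through JMX. The average time per request follows from dividing the total processing time by the invocation count.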

Comparing the number of messages processed by the InnerMDB and the OuterMDB gave us a basis for sizing the thread pools of each component, information that would be hard to obtain by any other means. We also used the metrics to detect malfunctions in the legacy pre-processing software invoked by our PreProcessing component, so that we could switch pre-processing off and avoid a negative impact on overall system performance.
Above all, this monitoring was key to detecting a misbehaviour of our JCA connector. A faulty implementation of the equals/hashCode method pair for the ManagedConnection led to a huge performance degradation after a few hours of operation. Using our monitoring infrastructure we could isolate the problematic code area. Admittedly it did not point at the equals/hashCode pair itself, but it made clear that the problem was related to the connection acquisition code.
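
The faulty code is not reproduced here, but the contract it has to honour is easy to illustrate. The sketch below uses a hypothetical SketchManagedConnection class: it does not implement the real ManagedConnection interface, and the host, port and user fields are assumptions chosen for the example. The point is only that equals and hashCode are derived from the same fields, so that equal objects always land in the same hash bucket.

```java
import java.util.Objects;

// Hypothetical stand-in for a connection class; not the real JCA interface.
final class SketchManagedConnection {
    private final String host;
    private final int port;
    private final String user;

    SketchManagedConnection(String host, int port, String user) {
        this.host = Objects.requireNonNull(host, "host");
        this.port = port;
        this.user = Objects.requireNonNull(user, "user");
    }

    // Two connections are equal when they target the same endpoint with the same user.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchManagedConnection)) {
            return false;
        }
        SketchManagedConnection other = (SketchManagedConnection) o;
        return port == other.port
                && host.equals(other.host)
                && user.equals(other.user);
    }

    // Built from exactly the fields used in equals, so equal objects share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(host, port, user);
    }
}
```

If hashCode were left at its default while equals compared fields, or if it returned a constant, hash-based collections holding such objects would either miss equal entries or degrade towards linear scans as they grow. That would be consistent with a slowdown that only shows up after some hours, although whether this was the exact mechanism in our case is an assumption.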
Finally, the monitoring in our application gave us an effective way of monitoring the legacy application we were communicating with, since it offered no native way of monitoring its own operation. We were then able to respond to outages of the legacy application immediately, based on metrics collected in our software.


Websphere PMI: enabling and viewing data

Anyone who has ever needed a deeper look at application internals that may be impacting performance has probably had these impressions:

  • System.out.println with System.nanoTime (or currentTimeMillis) is tedious, error-prone and limited
  • A profiler is overkill, not to mention cumbersome (and unavailable for certain platforms [e.g. TPTP on AIX]*)

This is the scenario where Websphere PMI is a killer feature.

    Imagine that your application isn’t performing as expected. There can be many reasons for poor performance. I have myself faced a scenario where the application waited a long time to get a JMS connection from the Websphere internal provider, since its default configuration of a maximum of 10 connections isn’t acceptable for any application with performance requirements of even 100 transactions per second.
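
A rough way to see why 10 connections can become a bottleneck is Little's law: on average, the number of connections in use equals the arrival rate multiplied by the time each connection is held. The sketch below is only back-of-the-envelope arithmetic. PoolSizingSketch and the hold times used with it are hypothetical, not measurements from the application.

```java
// Hypothetical helper illustrating Little's law for connection pool sizing.
final class PoolSizingSketch {
    private PoolSizingSketch() {
    }

    // Average number of connections busy at once, rounded up:
    // requests per second * seconds each connection is held.
    static long requiredConnections(long requestsPerSecond, long holdMillis) {
        return (requestsPerSecond * holdMillis + 999) / 1000;
    }
}
```

At 100 transactions per second, a pool of 10 connections keeps up only while each connection is held for 100 ms or less on average. If a connection were held for 250 ms, requiredConnections(100, 250) gives 25, and callers would start queueing for a connection.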

    Enabling PMI

    By default, Websphere 6.1 ND comes with basic PMI metrics enabled. These include for example:

    • Enterprise Beans.Create Count
    • JDBC Connection Pools.Wait Time
    • JDBC Connection Pools.Use Time

    If you need anything beyond the defaults, you can change the settings under:

    Monitoring and Tuning > Performance Monitoring Infrastructure (PMI)

    then click on the desired server.

    After you have chosen the desired metrics (remember that more metrics mean more CPU overhead at runtime), go to the following menu:

    Monitoring and Tuning > Performance Viewer > Current Activity

    Now check whether your server is in fact already collecting data. If PMI is enabled but not collecting, Collection Status will show Available. To start collecting, check the desired server and click the Start Monitoring button. The status column will then show Monitored.

    Now you can click on the desired server and tick, for example, one of your connection pools in the tree on the left. You should see a structure similar to the one below:

    Performance Modules > JDBC Connection Pools > Oracle XA Provider > yourDataSource

    After clicking the metric you’ll get a graph of the current data and, below it, a table with a snapshot of the indicator.

    * note: Eclipse TPTP is said to be supported on AIX on version 4.3.1 but I have not been able to make it work

