Archive for November, 2008


Creating load data for stress test with Benerator

Anyone working on a project with demanding non-functional requirements, such as handling high data volumes, has probably faced the challenge of creating test data to simulate the system load. Depending on the system, creating that data might involve writing a simple program that outputs a CSV or similar format.
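
A first version of such a program might look like the Python sketch below. It is only an illustration: the lookup tables, the column layout and the generate_rows and write_csv helpers are my own assumptions, not part of Benerator or of the sample project.

```python
import csv
import io
import random

# Hypothetical lookup tables; a real project would load these from files.
COUNTRIES = ["Brazil", "China", "United States"]
GIVEN_NAMES = ["Daniel", "Samantha", "Hannah"]
FAMILY_NAMES = ["Johnson", "Morgan", "Moore"]


def generate_rows(count, seed=None):
    """Yield `count` customer rows as lists of strings."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    for customer_id in range(1, count + 1):
        yield [
            str(customer_id),
            rng.choice(COUNTRIES),
            f"{rng.choice(GIVEN_NAMES)} {rng.choice(FAMILY_NAMES)}",
            rng.choice("MF"),
            "".join(rng.choice("0123456789") for _ in range(9)),  # 9-digit zip
        ]


def write_csv(count, seed=None):
    """Return the generated rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(generate_rows(count, seed))
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_csv(5, seed=42), end="")
```

Every new column means another table or rule in code like this, and that is where the effort starts to add up.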

The problem is that what starts as a simple program can end up as a complex one, full of lookup tables (for cities, customer categories), complex data formats, etc. So, instead of rushing to code your own load data generator, give Benerator a try.

Benerator is driven by an XML file that describes the data to generate. It also comes with facilities for creating custom complex types, outputting data from predefined CSV files, etc.

Below is an example XML file for Benerator:

<?xml version="1.0" encoding="iso-8859-1"?>
<setup xmlns=""
       xsi:schemaLocation=" org/databene/benerator/benerator-0.5.5.xsd">
    <include uri="" />
    <create-entities name="customers" count="{${count}}">
        <attribute name="customer_id" type="long" min="1" max="9223372036854775807" distribution="step"/>
        <attribute name="zip" pattern="[0-9]{6}[0-9]{3}" />
        <variable name="countryCSV" type="entity" source="src/countries.csv" distribution="random"/>
        <attribute name="countryName" script="{${countryCSV.desc}}"/>
        <variable name="person" generator="org.databene.domain.person.PersonGenerator"/>
        <attribute name="personName" script="{${person.givenName} ${person.familyName}}"/>
        <attribute name="gender" pattern="[MF]{1}" />
        <variable name="birthDate" generator="org.databene.domain.person.BirthDateGenerator" dataset="{${country}}"/>
        <attribute name="birthDate" script="{${birthDate?string('yyyyMMdd')}}"/>
        <consumer id="csv" class="org.databene.platform.csv.CSVEntityExporter">
            <property name="uri" value="gen/customers.csv"/>
            <property name="properties" value="customer_id,countryName,personName,gender,zip,birthDate"/>
            <property name="datePattern" value="yyyyMMdd"/>
        </consumer>
    </create-entities>
</setup>

When executed through Benerator, this generates a CSV with 6 columns. The number of entries (rows) is configured in a properties file, which supplies the count value used by create-entities.
Here is a sneak peek at a possible output from running the sample project:

1,China,Daniel Johnson,M,682587482,19760526
2,United States,Samantha Morgan,F,161370995,19580313
3,China,Hannah Moore,M,240197663,19780109
4,China,Ava Williams,M,723234265,19370319
5,China,Olivia Lee,F,864928029,19480817
6,Brazil,Hannah Morris,F,741226969,19350909
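
As a quick sanity check, rows like these can be validated against the patterns declared in the XML file. The Python sketch below is not part of Benerator; the check_row helper and its regular expressions are my own assumptions, derived from the zip, gender and datePattern settings above.

```python
import csv
import io
import re
from datetime import datetime

# A few rows copied from the sample output.
SAMPLE = """\
1,China,Daniel Johnson,M,682587482,19760526
2,United States,Samantha Morgan,F,161370995,19580313
6,Brazil,Hannah Morris,F,741226969,19350909
"""

ZIP_RE = re.compile(r"[0-9]{6}[0-9]{3}")  # same pattern as the zip attribute
GENDER_RE = re.compile(r"[MF]")           # same pattern as the gender attribute
DATE_RE = re.compile(r"[0-9]{8}")         # yyyyMMdd is always 8 digits


def check_row(row):
    """Return a list of problems found in one CSV row (empty if valid)."""
    if len(row) != 6:
        return [f"expected 6 columns, got {len(row)}"]
    customer_id, _country, _name, gender, zip_code, birth_date = row
    problems = []
    if not customer_id.isdigit():
        problems.append("customer_id is not a number")
    if not GENDER_RE.fullmatch(gender):
        problems.append("gender is not M or F")
    if not ZIP_RE.fullmatch(zip_code):
        problems.append("zip is not 9 digits")
    try:
        if not DATE_RE.fullmatch(birth_date):
            raise ValueError(birth_date)
        datetime.strptime(birth_date, "%Y%m%d")  # rejects impossible dates
    except ValueError:
        problems.append("birthDate is not a valid yyyyMMdd date")
    return problems


if __name__ == "__main__":
    for row in csv.reader(io.StringIO(SAMPLE)):
        print(row[0], check_row(row) or "ok")
```

For a real run, the same check could read gen/customers.csv instead of the inline sample.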

To make it simple for anyone willing to give Benerator a try, I’ve provided a sample Eclipse project. Open the PDF and save the attached tar.gz, then import it into Eclipse using the “Import Existing Projects” option.

Run the sample by right-clicking Benerator.launch, choosing “Run as” and then “Benerator”.

One last note: you need JDK 6 to run Benerator, since it needs classes from the javax.script package.


This post is outdated, since the Benerator syntax changed a lot between the 0.5 and 0.6 series (the latter, according to the documentation, is more closely aligned with the final 1.0 syntax). I’ll try to write a new post dedicated to other Benerator features and covering the new syntax.


Impressions of going through a MSc

This is the first post not really related to IT, but I had to write about this.

I am not sure how many readers of this blog are currently undertaking an MSc, but after spending the whole day trying to focus my research, this was the impression I had:

A master's is like jumping with a parachute. I have never jumped with a parachute in my life, but I’ll still risk saying that it is surely not easy to pick a spot and manage to land on it after jumping. So why am I saying an MSc is like jumping out of an airplane with a parachute? The universe you have for researching is enormous (and that is only for the subject you are researching), and if you want to stick to your time constraint (not to mention other constraints) and be able to finish your master's, picking that spot is probably one of the hardest parts (at least it is for me).

Anyway, I am currently researching metrics for Knowledge Management. If anyone happens to be interested in it as well, feel free to leave a comment.

