

Roll your own end-to-end solution for hosting your next BIG Project

Recently, a great friend, who also happens to be a former business partner, contacted me to discuss how we could apply machine learning to reduce the costs of an activity handled by the financial department of every company here in Brazil (and worldwide, I guess). To be more precise, he wanted to understand whether a classification algorithm could be used to automate part of this process.

After a few days of preliminary investigation we found the problem was worth pursuing and decided to move forward with a PoC. Even though he already had a few servers that could host our PoC, I decided not to risk interfering with his production environment and instead to set up a minimal server on a cheap hosting provider. Ideally I would have had the same infrastructure I currently work with: a Kubernetes cluster running on top of Rancher Server. That, however, would require at the bare minimum one server with more than 2GB of memory (with overlapping planes, which I personally don’t recommend). Since such a large environment was out of the question for a PoC, I opted for a single machine running Rancher OS, so that I would still have a minimal Linux capable of running Docker containers. One thing still surprised me: Rancher OS doesn’t run on machines with less than 2GB of memory. My first attempt was on a machine with only 1GB, and it got stuck in an endless boot loop. It wasn’t a big problem, though: it only increased my monthly fee from U$5 to U$10, and I got twice the memory and more storage.


After setting up the Rancher OS instance, I started to set up the stack for hosting the code and, after the first week of coding, for running the solution.

Git Hosting

I could have chosen Bitbucket for hosting my code, but I thought it would be better to run something on my own premises. GitLab could have been an option, but its HUGE memory footprint is a no-go. Then I found a thread on Reddit mentioning Gitea as an interesting alternative to GitLab. I decided to give it a try and it was a huge surprise! It has an impressively low memory footprint: 45MB when idle, occasionally spiking up to 60MB before going down again. On top of that, it had everything I needed.

Reverse Proxy

Then it was time to set up the reverse proxy that would route every HTTP(S) request to one of my services (no Kubernetes, no ingress, remember?), be it part of the dev infrastructure or part of the solution itself. Nginx to the rescue! The simplest setup I could think of was a directory holding a shell script responsible for running the Nginx container with a bind mount for the default.conf file (from pwd/default.conf on the host to /etc/nginx/conf.d/default.conf in the container). The file simply had a bunch of server sections with server_name and proxy_pass directives.
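A minimal sketch of what such a default.conf could contain, rendered here with a small Python helper. The host names, upstream addresses and ports are placeholders, and render_server_block, render_default_conf and SERVICES are illustrative names only; none of them come from the actual server.

```python
# Hypothetical mapping of public host names to internal services.
# Names, addresses and ports are placeholders, not the real setup.
SERVICES = {
    "git.example.duckdns.org": "http://172.17.0.1:3000",
    "ci.example.duckdns.org": "http://172.17.0.1:8010",
}


def render_server_block(server_name: str, upstream: str) -> str:
    """Render one nginx server section that proxies a host name to an upstream."""
    return (
        "server {\n"
        "    listen 80;\n"
        f"    server_name {server_name};\n"
        "\n"
        "    location / {\n"
        f"        proxy_pass {upstream};\n"
        "        proxy_set_header Host $host;\n"
        "    }\n"
        "}\n"
    )


def render_default_conf(services: dict) -> str:
    """Concatenate one server section per service, sorted by host name."""
    return "\n".join(
        render_server_block(name, upstream)
        for name, upstream in sorted(services.items())
    )


if __name__ == "__main__":
    # The printed text is what would be saved as default.conf on the host.
    print(render_default_conf(SERVICES))
```

The rendered text would be saved as default.conf next to the shell script and bind mounted into the Nginx container at /etc/nginx/conf.d/default.conf. Adding a new service then amounts to adding one more entry to the mapping and reloading Nginx.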

By that time, this is what the server looked like:

I was already using DuckDNS to avoid having to memorize the server IP address every time I had to SSH into it. Then I realized I needed HTTPS and therefore host names. For a PoC in such an early stage, buying a domain would be overkill, so it was DuckDNS again, this time for the actual solution.

Free HTTPS certificates

If paying for a domain wasn’t on the table at that point, paying for an HTTPS certificate was a no-go too. But with the advent of Let’s Encrypt we can now have SSL on our solutions without spending a penny. Requesting the certificate is even easier if you can run Certbot’s publicly available Docker container. As I wasn’t willing to investigate how to use the Certbot Nginx plugin, I opted for its simplest mechanism for issuing a certificate: you give it a publicly accessible directory on your HTTP server, and it writes there the content it receives from the Let’s Encrypt service during the certificate granting process.
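As a rough illustration of that mechanism (Certbot calls it webroot mode), the sketch below assembles a docker run command for the certbot/certbot image. The domain, e-mail address and host paths are hypothetical, the function name certbot_webroot_command is made up for this example, and the exact command used for the PoC is not shown in this post.

```python
import shlex


def certbot_webroot_command(domain: str, email: str, webroot: str, cert_dir: str) -> list:
    """Build a `docker run` command that requests a certificate in webroot mode."""
    return [
        "docker", "run", "--rm",
        # Issued certificates and account data are kept on the host.
        "-v", f"{cert_dir}:/etc/letsencrypt",
        # Directory that the HTTP server also exposes; the challenge files
        # are written under <webroot>/.well-known/acme-challenge/.
        "-v", f"{webroot}:/var/www/certbot",
        "certbot/certbot",
        "certonly", "--webroot",
        "-w", "/var/www/certbot",
        "-d", domain,
        "--email", email,
        "--agree-tos",
        "--non-interactive",
    ]


if __name__ == "__main__":
    # All values below are placeholders.
    cmd = certbot_webroot_command(
        domain="myapp.duckdns.org",
        email="me@example.com",
        webroot="/srv/www",
        cert_dir="/srv/letsencrypt",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

For this to work, the HTTP server has to serve the webroot directory over plain HTTP on the requested host name, since that is where the Let’s Encrypt service looks for the challenge file.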

Lightweight CI/CD

After struggling to run git on RancherOS (I even tried to run git as a container, but I had so many file permission issues that I soon gave up on this approach: repositories were cloned as root), I thought it was a good reason to bring forward the deployment of a CI/CD solution. I ended up ditching Jenkins for the same reason I had already dropped GitLab: an impressive memory footprint. After some research I found Buildbot, a minimalist Python-based CI/CD solution. Apart from the stock Buildbot master container, I’ve created a few worker containers: one based on Debian Stretch with support for pyenv (in order to have good support for scikit-learn), a similar one based on Alpine, an Alpine-based one with Node (for building the UI) and a few other base containers (check them out at my Docker Hub account).


Finally, I had to deploy MySQL and PostgreSQL for both the dev stack and the solution being developed. PostgreSQL was deployed as is, but for MySQL I opted to slim down its memory usage a little by following this post.

Wrap up

The project I am working on is based on Python and uses scikit-learn, Flask and SQLAlchemy with Alembic for the backend (running on Waitress) and Angular for the frontend.

The following picture provides an overview of the current containers and components running on my U$10 server:

The idea of this post was to give an overview of a recipe for building a cheap but comprehensive solution for hosting your next BIG idea. In the next post I’ll try to drill down into the details of setting up each of the components that were used. Feel free to comment if you have any questions.




WebSphere PMI: enabling and viewing data

Anyone who has ever needed a deeper look at application internals that may be impacting performance has probably come to these conclusions:

  • System.out.println with System.nanoTime (or currentTimeMillis) is tedious, error-prone and limited
  • A profiler is overkill, not to mention cumbersome (and unavailable on certain platforms [e.g. TPTP on AIX]*)

This is the scenario where WebSphere PMI is a killer feature.

    Imagine that your application isn’t performing as expected. There can be many reasons for poor performance. I once faced a scenario where the application waited a long time to get a JMS connection from the WebSphere internal provider, since its default configuration of a maximum of 10 connections isn’t acceptable for any application with performance requirements of even 100 transactions per second.
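A back-of-the-envelope check shows why a pool of 10 connections struggles at 100 transactions per second. The sketch below applies Little's law, assuming each transaction holds exactly one connection for a fixed time. The 150 ms hold time is a made-up figure for illustration, not a measurement from the application mentioned above.

```python
import math


def max_throughput(pool_size: int, hold_time_ms: int) -> float:
    """Upper bound on transactions per second when every transaction
    holds one pooled connection for hold_time_ms milliseconds."""
    return pool_size * 1000 / hold_time_ms


def connections_needed(target_tps: int, hold_time_ms: int) -> int:
    """Smallest pool size that can sustain target_tps at the given hold time."""
    return math.ceil(target_tps * hold_time_ms / 1000)


if __name__ == "__main__":
    # Hypothetical hold time of 150 ms per transaction.
    print(max_throughput(pool_size=10, hold_time_ms=150))
    print(connections_needed(target_tps=100, hold_time_ms=150))
```

Under these assumptions, 10 connections only reach 100 transactions per second if each one is released within 100 ms. Anything slower makes requests queue for a connection, which is the kind of waiting that shows up in the pool's wait time metrics.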

    Enabling PMI

    By default, WebSphere 6.1 ND comes with basic PMI metrics enabled. These include, for example:

    • Enterprise Beans.Create Count
    • JDBC Connection Pools.Wait Time
    • JDBC Connection Pools.Use Time

    If you need anything more than the default, you can change it under:

    Monitoring and Tuning > Performance Monitoring Infrastructure (PMI)

    then click on the desired server.

    After you have chosen the desired metrics (remember that more metrics mean more CPU overhead at runtime), go to the following menu:

    Monitoring and Tuning > Performance Viewer > Current Activity

    Now you need to check whether your server is in fact already collecting data. If PMI is enabled but not collecting, Collection Status will show Available. To start collecting, check the desired server and click the Start Monitoring button. The status column will then show Monitored.

    Now you can click on the desired server and tick, for example, one of your connection pools in the tree on the left. You should see a structure similar to the one below:

    Performance Modules > JDBC Connection Pools > Oracle XA Provider > yourDataSource

    After clicking the metric you’ll have a graph of the current data and, below it, a table with a snapshot of the indicator.

    * Note: Eclipse TPTP is said to be supported on AIX in version 4.3.1, but I have not been able to make it work.

