Archive for March, 2019

24
Mar
19

Roll your own end-to-end solution for hosting your next BIG Project

Recently, a great friend – that happens to also be a former ex-business partner – contacted me to discuss how we could apply machine learning to reduce the costs associated with an activity related to the financial department of every company here in Brazil – and worldwide I guess. To be more precise, he wished to understand if it was possible to use any classification algorithm to automatize part of this process.

After a few days performing a preliminary investigation we found the problem was worth it and we decided to move forward with a PoC. Even though he already had a few servers for hosting our PoC, I decided that I wouldn’t risk interfering with his production environment and that instead I would setup a minimal server under any cheap hosting solution. Ideally I wished I had the same infrastructure I am currently working with: a Kubernetes cluster running on top of Rancher Server but this would require at the bare minimum one server with more than 2GB of memory (with overlapping planes, that I personally don’t recommend). Since such a huge environment was out of consideration for a PoC I opted to stick with a machine running Rancher OS, this way I could still have at least a minimal Linux that could be able of running Docker containers. But still one thing surprised me: Rancher OS doesn’t run on machines with less than 2GB of memory, my first attempt was to run it on a machine with only 1GB and it got stuck on an endless loop boot. But it wasn’t a huge problem since it only increased my monthly fee from U$5 to U$10 and I got twice the memory and more storage.

Infrastructure

After setting up the Rancher OS instance I started to setup the stack for hosting the code and also running the solution after the first week of coding.

Git Hosting

I could have chosen Bitbucket for hosting my code but I thought it would be better to run something on my own premises, Gitlab could have been an option but its HUGE memory footprint is a no-go. Then I found a thread on reddit mentioning gitea as an interesting alternative to Gitlab. I decided to give it a try and it was a huge surprise! It has an impressive low memory footprint: 45Mb when idle and occasionally it spikes up to 60Mb and goes down again, not to mention that it had everything I needed.

Reverse Proxy

Then it was time to setup the reverse proxy that would take care of routing every HTTP(S) request to one of my services – No kubernetes, no ingress. Remember? – be it part of the dev infrastructure, be it part of the solution itself. Nginx to the rescue! The most simple setup I could think of was to have a directory for hosting a shell script that was responsible for running the Nginx container and doing a bind mount for the default.conf file (from the host pwd/default.conf to the container /etc/nginx/conf.d/default.conf), the file simply had a bunch of server sections with server_names and proxy_pass directives.

By that time, this was how the server looked like:

I was already using DuckDNS to avoid having to memorize the server ip address any time I had to SSH to it. Then I realized I needed HTTPS and therefore host names, for a PoC in such initial stages, buying a domain would be an overkill, so, DuckDNS again, this time for the actual solution.

Free HTTPS certificates

If spending with the domain wasn’t being considered by that time, with the HTTPS certificate was also a no-go. But with the advent of Let’s Encrypt we can now have SSL on our solutions without spending a penny. And requesting the certificate is even easier if you have the possibility of running CertBot’s public available docker container. As I wasn’t willing to investigate how to use certbot nginx plugin, I opted to run its simplest mechanism of issuing a certificate: the one you provide it a public available directory on your HTTP server in which it writes the content it receives from CertBot service during the certificate granting process.

Lightweight CI/CD

After struggling to run git on RancherOS (I even tried to run git as a container but I had so much file permissions issues that I gave up soon on this approact – repositories were cloned as root) I thought it’d be a good reason to anticipate the deployment of a CI/CD solution. I ended up ditching Jenkins for the same reason I had already dropped Gitlab: impressive memory footprint. After some research I found Buildbot: a python based minimalist CI/CD solution. Apart from the stock Buildbot Master Container, I’ve created a few worker containers: one based on stretch with support for PyEnv (in order to have a good support for scikit), a similar one based on Alpine, one Alpine based with Node (for building the UI) and a few other base containers (check them out at my Dockerhub account).

Databases

Finally, I had to deploy MySQL and PostgreSQL for both the dev stack and also for my own solution that was being developed. PostgreSQL was deployed as is but for MySQL I opted to slim its memory usage a little bit by following this post.

Wrap up

The project I am working on is based on Python and uses scikit-learn, Flask and SQLAlchemy with Alembic for the backend (running on Waitress) and Angular for the frontend.

The following picture provides an overview of the current containers and components running on my U$10 server:

The idea of this post was to give the overview of a recipe on how to build a cheap but comprehensive solution for hosting you next BIG idea. In the next post I’ll try to drill down on the details of setting up each of the components that were used. Feel free to comment if you have any questions.

 

 

Advertisements



ClustrMaps

Blog Stats

  • 372,749 hits since aug'08
Advertisements

%d bloggers like this: