Posts Tagged ‘python’

24
Mar
19

Roll your own end-to-end solution for hosting your next BIG Project

Recently, a great friend, who also happens to be a former business partner, contacted me to discuss how we could apply machine learning to reduce the costs of an activity carried out by the financial department of every company here in Brazil (and worldwide, I guess). To be more precise, he wanted to understand whether a classification algorithm could be used to automate part of this process.

After a few days of preliminary investigation we concluded the problem was worth pursuing and decided to move forward with a PoC. Even though he already had a few servers that could host it, I decided not to risk interfering with his production environment and to set up a minimal server on a cheap hosting provider instead. Ideally I would have had the same infrastructure I currently work with: a Kubernetes cluster running on top of Rancher Server. That would require, at the bare minimum, one server with more than 2GB of memory (and with overlapping planes, which I personally don’t recommend). Since such a large environment was out of the question for a PoC, I opted for a machine running RancherOS, so that I would still have a minimal Linux capable of running Docker containers. One thing surprised me, though: RancherOS wouldn’t run on a machine with less than 2GB of memory. My first attempt used a machine with only 1GB and it got stuck in an endless boot loop. It wasn’t a big problem, since it only raised my monthly fee from U$5 to U$10 and I got twice the memory and more storage.

Infrastructure

After setting up the RancherOS instance, I started to set up the stack for hosting the code and, after the first week of coding, for running the solution as well.

Git Hosting

I could have chosen Bitbucket for hosting my code, but I thought it would be better to run something on my own premises. GitLab could have been an option, but its HUGE memory footprint was a no-go. Then I found a thread on Reddit mentioning Gitea as an interesting alternative to GitLab. I decided to give it a try and it was a pleasant surprise! It has an impressively low memory footprint: 45MB when idle, with occasional spikes up to 60MB before going down again. On top of that, it had everything I needed.

Reverse Proxy

Then it was time to set up the reverse proxy that would route every HTTP(S) request to one of my services (no Kubernetes, no ingress, remember?), whether the service was part of the dev infrastructure or part of the solution itself. Nginx to the rescue! The simplest setup I could think of was a directory holding a shell script that runs the Nginx container and bind mounts the default.conf file (from default.conf in the host’s current directory to /etc/nginx/conf.d/default.conf in the container). The file simply had a bunch of server blocks with server_name and proxy_pass directives.

At that point, this is what the server looked like:
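To make the shape of that default.conf concrete, here is a minimal sketch that renders one server block per service. The hostnames and upstream addresses are hypothetical placeholders (3000 and 8010 are the usual default ports of Gitea and the Buildbot web UI), and the extra proxy_set_header lines are a common addition, not something taken from the original file.

```python
from typing import Mapping


def render_server_block(server_name: str, upstream: str) -> str:
    """Render a single nginx server block that proxies one hostname."""
    return (
        "server {\n"
        "    listen 80;\n"
        f"    server_name {server_name};\n"
        "\n"
        "    location / {\n"
        f"        proxy_pass {upstream};\n"
        "        proxy_set_header Host $host;\n"
        "        proxy_set_header X-Real-IP $remote_addr;\n"
        "    }\n"
        "}\n"
    )


def render_default_conf(routes: Mapping[str, str]) -> str:
    """Join one server block per (hostname -> upstream) pair."""
    return "\n".join(
        render_server_block(name, upstream) for name, upstream in routes.items()
    )


# Hypothetical routing table: hostnames and container names are assumptions.
ROUTES = {
    "myproject-git.duckdns.org": "http://gitea:3000",
    "myproject-ci.duckdns.org": "http://buildbot:8010",
}

if __name__ == "__main__":
    # The output is what would be saved as default.conf next to the script.
    print(render_default_conf(ROUTES))
```

The generated file can then be bind mounted read-only into the stock nginx image, for example with `docker run -d -p 80:80 -v "$(pwd)/default.conf:/etc/nginx/conf.d/default.conf:ro" nginx`.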

I was already using DuckDNS to avoid having to memorize the server IP address every time I had to SSH into it. Then I realized I needed HTTPS, and therefore host names. For a PoC at such an early stage, buying a domain would have been overkill, so it was DuckDNS again, this time for the actual solution.
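For reference, DuckDNS records are refreshed with a plain HTTPS GET against its update endpoint. The sketch below only builds that URL; the subdomain names and the token are hypothetical, and leaving `ip` empty asks DuckDNS to use the address the request comes from.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

DUCKDNS_UPDATE_ENDPOINT = "https://www.duckdns.org/update"


def duckdns_update_url(subdomains, token, ip=""):
    """Build the DuckDNS update URL for one or more subdomains.

    `subdomains` are the names without the ".duckdns.org" suffix.
    """
    query = urlencode(
        {"domains": ",".join(subdomains), "token": token, "ip": ip},
        safe=",",  # keep the comma-separated list readable
    )
    return f"{DUCKDNS_UPDATE_ENDPOINT}?{query}"


def update_duckdns(subdomains, token, ip=""):
    """Perform the update; DuckDNS answers with a body of "OK" or "KO"."""
    with urlopen(duckdns_update_url(subdomains, token, ip), timeout=10) as resp:
        return resp.read().decode().strip() == "OK"


if __name__ == "__main__":
    # Hypothetical names and token, for illustration only.
    print(duckdns_update_url(["myproject", "myproject-git"], "my-secret-token"))
```

Running `update_duckdns` from a cron job or a small container is one way to keep the records current if the server address ever changes.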

Free HTTPS certificates

If spending money on a domain wasn’t an option at that point, paying for an HTTPS certificate was also a no-go. But with the advent of Let’s Encrypt we can now have TLS on our solutions without spending a penny. Requesting the certificate is even easier if you can run Certbot’s publicly available Docker container. As I wasn’t willing to investigate how to use the Certbot Nginx plugin, I opted for its simplest way of issuing a certificate: you point it to a publicly reachable directory on your HTTP server, and Certbot writes there the challenge files that the Let’s Encrypt service fetches to validate the domain during the certificate granting process.
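As a rough illustration of that webroot flow, the sketch below assembles a `docker run` command for the official certbot/certbot image. The domain, e-mail and host directories are hypothetical, and it assumes Nginx already serves the webroot directory under /.well-known/acme-challenge/ for that host name.

```python
import shlex


def certbot_webroot_command(domain, email, webroot_host_dir, letsencrypt_host_dir):
    """Return the argv for a one-off Certbot run using the webroot method."""
    return [
        "docker", "run", "--rm",
        # Issued certificates and account data persist on the host.
        "-v", f"{letsencrypt_host_dir}:/etc/letsencrypt",
        # Directory that the HTTP server exposes publicly.
        "-v", f"{webroot_host_dir}:/var/www/certbot",
        "certbot/certbot",
        "certonly", "--webroot",
        "-w", "/var/www/certbot",
        "-d", domain,
        "--email", email,
        "--agree-tos",
        "--non-interactive",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = certbot_webroot_command(
        domain="myproject.duckdns.org",
        email="me@example.com",
        webroot_host_dir="/opt/nginx/webroot",
        letsencrypt_host_dir="/opt/letsencrypt",
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

The resulting certificate files would then be referenced from the Nginx server block of the same host name.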

Lightweight CI/CD

After struggling to run Git on RancherOS (I even tried to run Git as a container, but I had so many file permission issues that I soon gave up on this approach: repositories were cloned as root), I thought it was a good reason to bring forward the deployment of a CI/CD solution. I ended up ditching Jenkins for the same reason I had already dropped GitLab: a hefty memory footprint. After some research I found Buildbot, a minimalist Python-based CI/CD solution. Apart from the stock Buildbot master container, I’ve created a few worker containers: one based on Debian stretch with support for pyenv (in order to have good support for scikit-learn), a similar one based on Alpine, one Alpine-based with Node (for building the UI) and a few other base containers (check them out on my Docker Hub account).

Databases

Finally, I had to deploy MySQL and PostgreSQL, both for the dev stack and for the solution being developed. PostgreSQL was deployed as is, but for MySQL I opted to slim down its memory usage a little by following this post.

Wrap up

The project I am working on is based on Python and uses scikit-learn, Flask and SQLAlchemy with Alembic for the backend (running on Waitress) and Angular for the frontend.

The following picture provides an overview of the current containers and components running on my U$10 server:

The idea of this post was to give an overview of a recipe for building a cheap but comprehensive solution for hosting your next BIG idea. In the next post I’ll try to drill down into the details of setting up each of the components that were used. Feel free to comment if you have any questions.

 

 

02
Nov
10

Setting up Mercurial on Apache

Recently I started investigating the two major Distributed Version Control Systems (DVCS), mainly because of SVN’s historical deficiency in handling renames. You may say that you don’t need a DVCS for tracking renames… Yes, I know. It was only an excuse to start learning a DVCS; after all, there are plenty of differences between a regular VCS and a DVCS.

My first option

After analysing whether I should go with Git or Hg, I decided on Hg, since I have had traumatic experiences running, on Windows, native applications originally written for Linux. Not that I am a Windows-only user; in fact, for a long time I used Linux as my desktop instead of Windows, but you can’t deny that there is still a huge crowd that won’t switch from Windows for anything. The problem with native Linux applications that depend heavily on a collection of shell scripts and other Linux-specific solutions is that they usually behave suboptimally on Windows: either they miss some functionality or they depend on a myriad of rare libraries. Having said that, I went with Mercurial on my first attempt.

First attempt with Hg

I wasn’t really lucky on my first attempt to install Hg. My first mistake was to pay too much attention to the warning on python.org’s main downloads page:

If you don’t know which version to use, start with Python 2.7;

This warning is probably updated after each stable version is released, but if I had seen the other piece of advice on the releases page I’d have thought twice:

Consider your needs carefully before using a version other than the current production version.

I chose to download the latest Python and build Hg myself, and obviously that proved not to be very smart, as it was my first experience with Hg.

Comes Git

As I gave up on Hg, I decided to give Git a try. The first thing was to download msysGit and, surprisingly enough (following this tutorial), it was rather easy to set up. Its drawbacks were related to its tooling. As soon as I had set up Git and tried to clone a repository over HTTPS with authentication, I realized that JGit does not support authentication over HTTP, which was exactly what I had planned to use (SSH on Windows is not very advisable, since I have never seen a good free port of an SSH server for Windows).
So I had to go back to Hg, but first I decided to check whether I had been taking an overly complex approach: Git’s setup was similar and had been much easier, so I applied what I had learned from the tutorial I used for the Git setup.

Second attempt on Hg

As already mentioned, I decided to do something similar to what I had done with Git, so I chose CGI. I’ll highlight the important points of the installation here:

  • The file to be downloaded is now named hgweb.cgi and not hgwebdir.cgi
  • Download Python 2.5 as noted here
  • Unzip library.zip as noted here, then edit the sys.path.insert line and the first line (the one with the #!, the shebang) to point to the Python executable
  • Configure style and templates entries under [web] on hgweb.config
  • Configure an entry under [paths] for each repository (eg.: repository = c:/users/hg/repository)
  • Enable pushing for the configured repositories
  • Configure authentication on Apache, using either htpasswd or LDAP; this is strongly recommended.
  • Configure SSL on Apache (there is a short explanation on how to do this in Portuguese over here). The only catch is that SSLPassPhraseDialog builtin is not supported on Windows, so instead provide a .bat file with a simple @echo yourpassword and use exec instead of builtin (eg.: SSLPassPhraseDialog exec:C:/Progra~1/Apache~1/Apache2.2/bin/passphrase.bat)

Perform an hg init for each configured repository, start Apache and try cloning the repository over HTTPS (remember to provide your credentials if you configured any authentication method).
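The [web] and [paths] entries from the list above can be sketched as a small generator for hgweb.config. This is an illustration written for a current Python 3, not the Python 2.5 setup described in the post; the style name, the user name in allow_push and the function name are assumptions, and only the repository path comes from the example above.

```python
import configparser
import io


def render_hgweb_config(repositories, style="gitweb", templates=None, allow_push=""):
    """Render an hgweb.config with a [web] and a [paths] section.

    `repositories` maps the published name to the repository directory.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep repository names case-sensitive

    config["web"] = {"style": style}
    if templates:
        config["web"]["templates"] = templates
    if allow_push:
        # Comma-separated list of users allowed to push.
        config["web"]["allow_push"] = allow_push

    config["paths"] = dict(repositories)

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # "alice" is a hypothetical user name, for illustration only.
    print(
        render_hgweb_config(
            {"repository": "c:/users/hg/repository"},
            allow_push="alice",
        )
    )
```

Each directory listed under [paths] still needs its own hg init before hgweb can serve it.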



