Faster CSV parsing in Python with pandas

Recently I had to write a Python script that parsed large gzipped CSV files. I first reached for the standard csv module, which is quite straightforward to use. Unfortunately, it proved too slow. In fact, I gave up waiting for it to parse even a single file containing about 30 million rows! And my whole data set had more than 300 million rows.
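
As a rough sketch of the pandas approach named in the title (not necessarily the exact code used for this data set), `pandas.read_csv` can read a gzipped file directly and, with `chunksize`, process it piece by piece so the whole file never has to sit in memory. The function name `count_rows` and the default chunk size are assumptions made for illustration.

```python
import pandas as pd


def count_rows(path, chunksize=1_000_000):
    """Stream a gzipped CSV in chunks and return the number of data rows."""
    total = 0
    # compression="gzip" is spelled out here; pandas can also infer it
    # from a ".gz" file extension.
    for chunk in pd.read_csv(path, compression="gzip", chunksize=chunksize):
        # Each chunk is an ordinary DataFrame with at most `chunksize` rows.
        total += len(chunk)
    return total
```

Any per-chunk work (filtering, aggregating, writing to a database) would replace the `len(chunk)` line; only one chunk is held in memory at a time.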

Creating a Drupal theme based on Bootstrap and Material Design from Google

For quite some time I have wanted a Drupal theme that combines a Bootstrap base with the look and feel dictated by Google's Material Design principles, mainly because I like the way these two look and because I myself have very limited abilities in creating pretty user interfaces.

High system load with Ubuntu 15.10 and 16.04 in AWS

I recently upgraded my Amazon cloud t2.micro instance from Ubuntu 15.04 to 15.10. Soon afterwards I noticed a strangely high load on the machine: constantly above 2.0, without it doing much. The apparent culprit seemed to be the kswapd0 process. I tried various simple steps to reduce the load, but nothing seemed to work, mainly because, as I already mentioned, no heavy activity was supposed to be happening.

clog - add colours to your log life (and files)

Developers often have to look at log files. This is sometimes boring, sometimes tedious, but it's a fact of life. It would really help if some special lines, such as errors, warnings etc. could stand out. One way to do it is to use the clog utility.

To make clog usage more streamlined from a terminal, you can define this bash function:

CMake FindMySQL script

CMake is a great build tool. It takes care of generating build system files, targeting, among others, Unix Makefiles, MS Visual Studio, Ninja and Xcode. There are various extra modules available for CMake. One in particular allows the discovery of MySQL development files, such as headers and libraries. It can be found here. Unfortunately, it has a few issues: it doesn't work on Windows, and it has a bug towards the end. Other alternatives I found on the web were even worse.

TeamCity and publishing NuGet packages to private repositories

Recently I had to figure out how to publish a NuGet package to a private password-protected repository as part of a TeamCity CI build process. NuGet, for those who are not aware, is a tool for managing .NET software packages in a convenient manner. It is similar in purpose to Maven, the Java packaging framework. The implementation details are, of course, quite different.

Scrapy and persistent cookie manager middleware

Scrapy is a nice Python framework for web scraping, i.e. extracting information from web sites automatically by crawling them. It works best with anonymous data discovery, but nothing stops you from having active sessions as well. In fact, Scrapy transparently manages cookies, which are usually used to track user sessions. Unfortunately, the sessions don't survive between runs. This, however, can be fixed quite easily by adding custom cookie middleware. Here is an example:
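
A minimal sketch of the underlying idea only, not the Scrapy middleware itself: cookies are loaded from a file at start-up and written back at the end, so a session outlives a single run. It uses the standard library's `http.cookiejar.LWPCookieJar` rather than Scrapy's own cookie classes, and the helper names `make_cookie`, `load_cookies` and `save_cookies` are hypothetical.

```python
import http.cookiejar
import os
import time


def make_cookie(name, value, domain, ttl_seconds=3600):
    """Build a simple cookie for `domain` that expires `ttl_seconds` from now."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + ttl_seconds,
        discard=False, comment=None, comment_url=None, rest={},
    )


def load_cookies(path):
    """Return a jar filled from `path`, or an empty jar on the first run."""
    jar = http.cookiejar.LWPCookieJar()
    if os.path.exists(path):
        # ignore_discard=True also restores session-only cookies.
        jar.load(path, ignore_discard=True)
    return jar


def save_cookies(jar, path):
    """Write every cookie in `jar` to `path` so the next run can reuse it."""
    jar.save(path, ignore_discard=True)
```

In a Scrapy project the same load and save steps would live in a custom cookie middleware, hooked to the start and end of a crawl.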


No more Mr JavaScript guy?

After doing some web development work recently, I have clearly remembered why I hate JavaScript so much. Not only is it ugly as a language, lacking type checking or even decent syntax for inheritance (actually faking it with various workarounds), but the very fact of having effectively two separate code bases, front end and back end, forces you to repeat quite a bit of code. And hunting for missing variables and fields or mismatching types is the favourite pastime of JavaScript developers.