Scrapy and persistent cookie manager middleware
Scrapy is a nice Python framework for web scraping, i.e. extracting information from web sites automatically by crawling them. It works best for anonymous data discovery, but nothing stops you from keeping active sessions as well. In fact, Scrapy transparently manages cookies, which are usually used to track user sessions. Unfortunately, the sessions don’t survive between runs. This, however, can be fixed quite easily by adding custom cookie middleware. Here is an example:
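A minimal sketch of the underlying idea: save a cookie jar to disk and reload it on the next run. It assumes only Python's standard http.cookiejar module rather than Scrapy's own middleware classes, and the helper names, cookie values and file path are hypothetical. A real middleware would presumably call something like this when the spider opens and closes.

```python
import os
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie by hand (normally the server sets these)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(jar, path):
    """Write the jar to disk at the end of a run."""
    # ignore_discard=True keeps session cookies, which save() would otherwise skip.
    jar.save(path, ignore_discard=True, ignore_expires=True)


def load_cookies(path):
    """Return the jar saved by a previous run, or an empty jar on the first run."""
    jar = LWPCookieJar()
    if os.path.exists(path):
        jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar
```

Calling load_cookies at start-up and save_cookies at shutdown is enough to carry a session from one run to the next, as long as the server still considers the session valid.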
No more Mr JavaScript guy?
After doing some web development work recently, I clearly remembered why I hate JavaScript so much. Not only is it ugly as a language, lacking type checking or even a decent syntax for inheritance (which is actually faked with various workarounds), but having effectively two separate code bases, front end and back end, forces you to repeat quite a bit of code. And hunting for missing variables and fields or mismatched types is the favourite pastime of JavaScript developers. CoffeeScript makes life more bearable, but it is essentially a fancier and more concise JavaScript, with all that implies.
.NET's MemoryStream and buffer slice
.NET’s MemoryStream is a very convenient class. It allows you to use a byte[] array as storage, while accessing it via standard Stream API. Among other things, it allows you to work with a section of a larger byte array, which is very handy when different actions need to be taken for different slices.
Unfortunately, when it comes to accessing the raw buffer stored in a MemoryStream instance, there is no way to know which position the stream considers its beginning. Internally this information is of course stored, but it is a private member and as such is inaccessible to the end user, which is extremely inconvenient. A number of workarounds can be applied, but they all boil down to having to track this information yourself. Arguably the easiest way is to derive a new class from MemoryStream and add a public property to store the offset in the underlying byte array. For specific applications, more access methods and properties can be provided to increase code reuse and security. Hopefully, Microsoft will address this shortcoming of MemoryStream in the future.
Ubuntu 15.04 freezing
I use Ubuntu (a modern Linux distribution) on my main workstation. Everything worked fine until I upgraded the OS from version 14.10 to 15.04. At that point strange things started to happen. After working for an hour or two, my environment would freeze and stop responding altogether. Neither Ctrl-Alt-Del nor Ctrl-Alt-Backspace would produce any reaction. I couldn’t connect to the machine remotely either. Only a hard reset would take me out of this state.
.NET threads Abort vs Interrupt
A fairly common task in multithreaded programming is to terminate a background thread that is currently blocked, and then to make sure it has actually exited. In .NET, you can do the following:
thread.Interrupt();
thread.Join();
Unfortunately, the call to Join can happen before the thread interruption instruction is issued to the other thread. That means that the thread will be in Join state at the time of processing that instruction. And, according to Microsoft’s documentation, that will prevent it from being interrupted!
JVM-based web framework
Which JVM-based web framework to choose? There seem to be so many out in the big world, and all claim to be better than the rest. Some examples of the frameworks of which I am personally aware:
- Lift
- Play Framework
- Grails
- Struts
- JSP and related technologies
But there are many more, with varying feature sets, levels of support, community sizes and popularity. Honestly, it is a daunting task to choose one for your own project, even if you know your requirements precisely. I read a rather nice review of the available options, but after finishing it I was not much wiser, unfortunately. It feels like if I do need to make a choice, I’ll have to actually evaluate a few of the most popular ones. Popularity is, realistically speaking, an important factor in itself, since it signals that there will be better support from the community and, hopefully, more features and third-party extras.
Google's Inbox
Earlier I wrote about issues with the new offering from Google, the “Inbox” mail service. Recently Google released updates for its Inbox web and Android clients. As a result, it works almost perfectly. In particular:
- Web interface works in Firefox as well as Chrome now. It looks much slicker than the old GMail, but I’m yet to be convinced it’s more productive. This is quite different from Android, where Inbox is a clear winner in terms of productivity
- The 2-step authentication issues I reported earlier proved to have nothing to do with Inbox per se and were resolved independently
- Inbox on Android now allows per-account notification configuration, including sound
I did notice the following unfortunate issue with Android Inbox - for the life of me I couldn’t figure out how to attach a file to an email. This is quite annoying and something Google should address as soon as possible.
Scala "const" values
val values in Scala are, by specification, const, or final in Javaspeak. That means once assigned, they cannot be re-assigned. This doesn’t mean that the assignment has to be something dull or simple. In fact, you can have a large piece of code doing all sorts of calculations with the result assigned to your value. Do it like this:
val result = {
  val temp = callFunc() match {
    case Some(x) =>
    ....
  }
  ...
}
Whatever the code block in curly brackets evaluates to will be assigned as the value of result.
Is it worth using prepared statements on Android?
When it comes to SQL queries, there are two ways you can execute them from your program, whether it targets Android or not. One is ad hoc - create an SQL statement that finds your data (or manipulates it), with all the parameters of the actual query clauses already embedded in it. For example:
SELECT * FROM someTable WHERE a=5 AND b='CCC' AND c IN (1, 2, 3)
This is simple and straightforward. However, this approach has a few disadvantages, as is well known in the development community. The most obvious and most dangerous is security, or the lack of it. If query parameters come from user input, carelessly building the query directly out of them might be very unwise. There is even a special name for the attack vector exploiting this approach: SQL injection.
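To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The table name and columns follow the query above; the rows, the function names and the malicious input are made up for illustration, and the Android API differs in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (a INTEGER, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    [(5, "CCC", 1), (5, "DDD", 2), (6, "CCC", 3)],
)


def find_ad_hoc(conn, b):
    # UNSAFE: the user-supplied value is pasted straight into the SQL text.
    query = f"SELECT * FROM someTable WHERE a=5 AND b='{b}'"
    return conn.execute(query).fetchall()


def find_parameterized(conn, b):
    # The statement text is fixed; values are bound separately as data.
    query = "SELECT * FROM someTable WHERE a=? AND b=?"
    return conn.execute(query, (5, b)).fetchall()


# Hypothetical malicious input that closes the quote and adds its own clause.
malicious = "x' OR '1'='1"
```

With the ad hoc version the malicious string changes the WHERE clause and every row comes back, whereas the parameterized version treats the whole string as a value for b and finds nothing.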
@SerialVersionUID and Android
I was writing a piece of code in Scala, which was supposed to de-serialize some Java object, previously serialized elsewhere. However, I kept seeing exceptions like these:
java.io.InvalidClassException: com.example.MyStuff; Incompatible class (SUID): com.example.MyStuff: static final long serialVersionUID =10L; but expected com.example.MyStuff: static final long serialVersionUID =-7513795898815927590L;
In my code I did have classes declared with the @SerialVersionUID annotation. The same code worked fine on my development machine. So it looked like the annotation got lost from the class bytecode by the time I ran it on Android. I initially suspected that there could be a discrepancy in serialization representation between Dalvik (Android’s virtual machine) and the classic JVM released by Sun, now Oracle. However, after some research, I reached the conclusion that it was the overzealous ProGuard tool, which simply stripped @SerialVersionUID from my classes. I fixed the issue using the following config: