C++ has long been considered a language for mission-critical server-side functionality. Web development, although partly server-side, is usually done in other languages. In this article I try to analyse what caused this situation and explore the possibility of building a portable, pure C++ web development framework.
This article is for people who believe that C++ can be successfully used to build sophisticated web applications and are frustrated by the lack of basic tools to do so. It doesn't provide a definite plan for creating a C++ web development framework, but rather tries to share some thoughts, offer some inspiration and give a few guidelines.
This article is also an invitation to dialogue – please share your thoughts and ideas, using the comments form below or my contact details page.
What this article is not
To make things clear, this article definitely does not claim that C++ is better than other programming languages, or that it should be used for every project, or that it doesn't have problems of its own. Rather, it tries to describe a niche set of requirements and applications where using a C++ web development framework would make more sense, despite all the issues involved.
Today there are three or four dominant platforms for server-side web development. These include:
Java and Java-based technologies, such as Servlets/JSP, JavaServer Faces, Struts and so on. The multitude of Java-based frameworks is a little bit dizzying.
ASP (legacy) and ASP.NET, together with all Microsoft-related technologies (ADO.NET and other .NET libraries, legacy COM+ etc.)
PHP (for smaller sites)
Perl (CGI – legacy, mod_perl)
I don't want to describe in detail all these and the other frameworks available on the market. It is worth mentioning, though, that the last two entries are scripting languages, and their performance is generally even worse than that of bytecode-compiled languages such as Java. In terms of functionality, all of them provide enough tools to create flexible and powerful web applications. They can be, and have been, used to solve real-life problems. However, in my opinion, for some extremely demanding sites they simply aren't good enough.
There are a few frameworks that use C++, most notably Microsoft's ATL Server. They are Windows-only, quite limited in functionality, or both. Their specifications have not been updated for years, and most of them do not enjoy any vendor or community support.
Why is C++ not used?
One of the main reasons C++ is not actively used for web development is the lack of standard tools and libraries. There is simply no feature-rich library that includes all the boilerplate code needed to write web applications.
The number of new C++ programmers trained by universities and by the industry is falling, and some existing developers are moving to other, more popular technologies.
Why a new framework?
C++ is an extremely advanced and flexible language. Its performance is still unmatched by the other languages popular today, the likes of Java and C# (for an interesting point of view on why many benchmarks claiming that Java is faster than C++ do not apply to the real world, look here). Even more importantly, it can be tightly integrated into the most widespread web servers (Apache, IIS) to run in-process rather than as an external service, thus eliminating inter-process communication.
One of the main advantages of C++ is its tight management of resources, such as memory. Although explicit memory handling can be a nightmare for an inexperienced developer, modern programming techniques turn it into powerful and safe control over resource allocation and destruction. The predictable nature of C++ memory management, as opposed to a concurrently running garbage collector, is an important feature when squeezing every last cycle out of your CPU.
Also, C++'s template system and the rather sophisticated techniques based on it, such as meta-programming, can help in creating sets of highly decoupled components that can easily be assembled into complicated pieces of software.
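To make the point about predictable resource handling concrete, here is a minimal sketch of scope-bound resource management (commonly called RAII). The `ScopedResource` class and the `handle_request` function are hypothetical names invented for this illustration; they stand in for things like database connections or buffers that a request handler would hold.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource: records its acquisition and release in a log so the
// order of construction and destruction can be observed.
class ScopedResource {
public:
    ScopedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~ScopedResource() { log_.push_back("release " + name_); }

    // A resource has exactly one owner, so copying is disabled.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Simulates one request: resources live exactly as long as the inner scope.
std::vector<std::string> handle_request() {
    std::vector<std::string> log;
    {
        ScopedResource db("db-connection", log);            // stack object
        std::unique_ptr<ScopedResource> buffer(
            new ScopedResource("buffer", log));             // owned heap object
        log.push_back("render page");
        assert(log.size() == 3);  // both resources are still alive here
    }  // destructors run here, in reverse order of construction
    return log;
}
```

Calling `handle_request()` yields a log in which both resources are released at the closing brace, the buffer first and the connection second, without any explicit cleanup code and without waiting for a collector to run.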
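One simple template technique for decoupling components is policy-based design, where a class takes part of its behaviour as a template parameter. The sketch below is only an illustration of that idea, not full meta-programming; `HtmlEscape`, `NoEscape` and `Greeter` are hypothetical names, and the escaping rules are deliberately minimal.

```cpp
#include <cassert>
#include <string>

// Policy 1: escape the characters that are unsafe inside HTML text.
struct HtmlEscape {
    static std::string apply(const std::string& in) {
        std::string out;
        for (char c : in) {
            switch (c) {
                case '<': out += "&lt;";  break;
                case '>': out += "&gt;";  break;
                case '&': out += "&amp;"; break;
                default:  out += c;
            }
        }
        return out;
    }
};

// Policy 2: pass text through unchanged (e.g. for plain-text output).
struct NoEscape {
    static std::string apply(const std::string& in) { return in; }
};

// The component knows nothing about escaping rules; it only requires that
// the policy provides a static apply() function. The policy is selected at
// compile time, so no virtual call is involved.
template <typename EscapePolicy>
class Greeter {
public:
    std::string greet(const std::string& user) const {
        assert(!user.empty());  // precondition of this sketch
        return "Hello, " + EscapePolicy::apply(user) + "!";
    }
};
```

With these definitions, `Greeter<HtmlEscape>` and `Greeter<NoEscape>` are two independent components assembled from the same building blocks, and a new output format only needs a new policy struct.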
Economics: isn't hardware cheaper than development time?
This is the question usually asked by project managers and by those who pay the bills. The idea is: take whatever solution is already available on the market, and if it is a little too slow, throw in a bigger machine or two, and your bottom line will still be better than if you had written the stuff from the ground up yourself.
This is a perfectly legitimate argument, and it applies to many real projects. Hardware, especially Intel-compatible hardware, is quite inexpensive, so adding an extra gigabyte of RAM or a faster hard disk is usually a matter of a few hundred dollars.
However, when we start talking about doubling performance of a very busy web site, with a lot of heavy activity, things could look a little bit different.
For example, imagine you are running an extremely successful advertising campaign. As a result, your current farm of 30 2U IBM xSeries servers no longer gives you the performance you want, and you think that it needs at least doubling. In the end you will therefore have 60 servers. Let's ignore their acquisition cost for a minute (although it will be a hefty $60,000-$100,000 or more). What if you don't have enough floor space in your data center to accommodate them? Will you have to move to a new one, increasing your rent bill, incurring removal costs and suffering service disruption? What about the extra system administrators you might need to hire to support them? And there are other components in your system that might require an upgrade, databases for instance.
The conclusion is that doubling a large computer farm is not as simple as buying two servers instead of one. Therefore, for large and business-critical applications it makes sense to consider investing in performance, even if it means using less popular technologies.
I have already stated that not every web application should be written in C++. My intention is to give a powerful and flexible framework to those organizations that need to run extremely busy sites and are prepared to invest a little more in developing them, knowing that the result will be more powerful and scalable than the alternatives.
I have in mind high-profile web sites such as popular portals (Yahoo!, MySpace, AOL), blogging communities (LiveJournal, Blogger), relationship-building networks (LinkedIn, OpenBC), popular online shops and others that receive millions of visits regularly and deliver dynamic content to their users.
Naturally, I don't believe that all of them will ditch their current platforms and switch to the new toy of the month. But if a successful platform existed, over time some of them would evaluate it and use it for some projects.
The following key factors could make a C++ web development framework attractive to potential users:
Highly dynamic site – large portions of the content cannot be cached effectively and need recreating on every view
Huge number of visits
Availability of necessary resources – developers, administrators and so on
Good project management procedures in place
How to succeed?
I don't believe in writing code without a well-defined and detailed plan. In such a project, software design and implementation are not the only things that matter: management of the whole development process and proper PR are also essential to success.
Learning from other projects
It is important to learn from others and analyse their mistakes and achievements. A very interesting example of a successful major project is Subversion.
For years the only realistic option for version control in the open source world was CVS. Although robust and well tested, it had a long list of shortcomings that were never addressed by newer releases. A few attempts to build an alternative failed. Nonetheless, a few developers from CollabNet made the effort and created the hugely popular Subversion system. It fixes many problems in CVS, adds new features and provides a flexible and easy-to-use platform for extension and embedding.
Among the distinctive features of the Subversion project are its superior documentation, good development plan and reliable release schedule. The project also employs an interesting business model, discussed later in the text.
Learning from other frameworks
From a technology point of view, we shouldn't ignore the experience gained by other frameworks. Many of them were shaped by real-world feedback from web developers, and it is important to learn which issues were raised and how they were resolved.
Of particular interest are the J2EE and ASP.NET platforms, since a lot of serious and complex applications are built on them and they were designed by dedicated and highly experienced teams.
External applications and libraries
Wherever possible, existing libraries and tools should be used. Examples include:
Apache/IIS – HTTP server platform
ICU – Unicode and locale library
Apache APR library – for platform-independent functionality not provided by Boost or other libraries (database access and so on)
Boost (threads, regular expressions, general C++)
log4cplus (or log4cxx from Apache) – logging
Adobe open source – extra algorithms
The following components are essential pieces of any modern web development platform:
Integration into containing HTTP servers/application servers, such as Apache/IIS
Abstracted HTTP request/response framework – handling of different HTTP request types
Configurable session management – preferably using cookies. Session storage should be easily changeable (shared memory, file, DB etc.)
Configurable authentication (reuse server’s services)
Centralized resource configuration (database connectivity, start-up parameters, logging)
Simple template system – for clear separation between business logic and visual representation
Common DB layer (using existing libraries like APR) and an ORM library, similar to Hibernate
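To give a feel for how two of the components listed above might fit together, here is a rough sketch of an abstracted request/response layer with pluggable session storage. Every name in it (`Request`, `Response`, `SessionStore`, `MemorySessionStore`, `Dispatcher`, `make_demo_dispatcher`) is hypothetical and invented for this illustration. A real framework would obtain requests from Apache or IIS and the session identifier from a cookie; here they are filled in by hand.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Request  { std::string method; std::string path; std::string session_id; };
struct Response { int status = 200; std::string body; };

// Session storage is hidden behind an interface, so shared memory, files or
// a database could be substituted without touching the handlers.
class SessionStore {
public:
    virtual ~SessionStore() = default;
    virtual std::string get(const std::string& sid, const std::string& key) const = 0;
    virtual void set(const std::string& sid, const std::string& key,
                     const std::string& value) = 0;
};

// Simplest possible backend: everything lives in process memory.
class MemorySessionStore : public SessionStore {
public:
    std::string get(const std::string& sid, const std::string& key) const override {
        auto s = data_.find(sid);
        if (s == data_.end()) return std::string();
        auto v = s->second.find(key);
        return v == s->second.end() ? std::string() : v->second;
    }
    void set(const std::string& sid, const std::string& key,
             const std::string& value) override {
        data_[sid][key] = value;
    }
private:
    std::map<std::string, std::map<std::string, std::string>> data_;
};

// Maps "METHOD path" to a handler; unknown routes get a 404 response.
class Dispatcher {
public:
    using Handler = std::function<Response(const Request&, SessionStore&)>;

    explicit Dispatcher(std::unique_ptr<SessionStore> store)
        : store_(std::move(store)) {
        assert(store_ != nullptr);
    }
    void route(const std::string& method, const std::string& path, Handler h) {
        handlers_[method + " " + path] = std::move(h);
    }
    Response dispatch(const Request& req) {
        auto it = handlers_.find(req.method + " " + req.path);
        if (it == handlers_.end()) {
            Response r;
            r.status = 404;
            r.body = "Not Found";
            return r;
        }
        return it->second(req, *store_);
    }
private:
    std::unique_ptr<SessionStore> store_;
    std::map<std::string, Handler> handlers_;
};

// Example wiring: a per-session visit counter.
Dispatcher make_demo_dispatcher() {
    Dispatcher d(std::unique_ptr<SessionStore>(new MemorySessionStore));
    d.route("GET", "/visits",
            [](const Request& req, SessionStore& s) -> Response {
                std::string prev = s.get(req.session_id, "visits");
                int n = prev.empty() ? 0 : std::stoi(prev);
                ++n;
                s.set(req.session_id, "visits", std::to_string(n));
                Response r;
                r.body = "visits: " + std::to_string(n);
                return r;
            });
    return d;
}
```

The handler only ever sees the `SessionStore` interface, which is the design choice that keeps session storage "easily changeable": swapping `MemorySessionStore` for another backend is a one-line change in the wiring function.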
When developing such a project, it is important to look at the whole picture. This includes the business approach taken by the development team. There are basically two approaches that can be considered.
The first is development on a voluntary basis – the code is open source, and people who find the project interesting join and contribute. This is a powerful approach, and a lot of popular projects have been developed this way or are being developed as you read this article. The problem, of course, is keeping enough high-quality developers and other professionals interested in the framework. Many good open source projects have died because of lack of interest.
The second covers any form of development done by salaried or otherwise paid developers. The result can be closed source or open source, and in the latter case can even enjoy contributions from volunteer developers. However, there is always a dedicated core, willing to develop and support the software for money if not out of love of writing it.
The Subversion project is developed using this approach and is quite successful. Other major software packages developed this way include Sendmail, the Fedora Linux distribution and many others. This model is quite flexible: it allows some cost saving through voluntary contributions, brings popularity and trust because the source is open, and also generates income in the form of donations, consulting services and paid-for support.
Whatever business model is eventually selected, I believe that there must be centralized leadership for the project, be it in the form of a core open source development team or the management of the sponsoring company. Otherwise the project will never succeed because of never-ending arguments.
The design needs to be clear and flexible, and there must be an organized development plan with a release schedule. As far as is possible and realistic, the code should be covered by automated tests. And no release should be made without documentation, otherwise no one will know how to use the software.
It is important to provide applications together with the framework. It would also be very important to try out the ideas built into the platform in real-world code, so it would be nice to see some of the applications listed below created in parallel with, or as part of, the main development process.
Message board – a real-world example that should be extremely fast and robust. I would think about using Berkeley DB as the data storage, as opposed to a conventional relational database. This way the application would be really fast, especially if proper caching is built into it.
The data storage layer should be abstracted, so that it can be easily replaced.
Another nice feature would be single installation – multiple independent message boards, so one could provide hosting to different unrelated boards, all configurable and manageable independently.
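The abstracted storage layer and the built-in caching mentioned above could be combined roughly as follows. This is only a sketch under my own assumptions: `Storage`, `MapStorage`, `CachingStorage` and `read_post` are hypothetical names, and the in-memory `MapStorage` merely stands in for a real backend such as Berkeley DB, whose API is deliberately not shown here.

```cpp
#include <cassert>
#include <map>
#include <string>

// The application talks only to this interface.
class Storage {
public:
    virtual ~Storage() = default;
    // Returns true and fills `value` if the key exists.
    virtual bool load(const std::string& key, std::string& value) = 0;
    virtual void store(const std::string& key, const std::string& value) = 0;
};

// Stand-in backend; counts reads so the effect of caching is visible.
class MapStorage : public Storage {
public:
    bool load(const std::string& key, std::string& value) override {
        ++reads_;
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        value = it->second;
        return true;
    }
    void store(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    int reads() const { return reads_; }
private:
    std::map<std::string, std::string> data_;
    int reads_ = 0;
};

// Decorator that adds a write-through cache in front of any Storage.
// Misses are not cached, and nothing is ever evicted in this sketch.
class CachingStorage : public Storage {
public:
    explicit CachingStorage(Storage& backend) : backend_(backend) {}
    bool load(const std::string& key, std::string& value) override {
        auto it = cache_.find(key);
        if (it != cache_.end()) { value = it->second; return true; }
        if (!backend_.load(key, value)) return false;
        cache_[key] = value;
        return true;
    }
    void store(const std::string& key, const std::string& value) override {
        backend_.store(key, value);
        cache_[key] = value;
    }
private:
    Storage& backend_;
    std::map<std::string, std::string> cache_;
};

// Message board code depends only on the interface, not on the backend.
std::string read_post(Storage& storage, const std::string& id) {
    assert(!id.empty());  // precondition of this sketch
    std::string text;
    return storage.load("post:" + id, text) ? text
                                            : std::string("(no such post)");
}
```

Because `CachingStorage` is itself a `Storage`, the board code cannot tell whether it is reading from the cache or from the backend, and replacing the backend does not affect either the cache or the application.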
Feature-rich and fast – the application should share a lot of code with Message board.
Web mail – feature-rich, similar to Gmail or the Yahoo! beta e-mail system. Such an application can be used to provide free or commercial web mail services for large communities.