Metaprogramming is Your Friend
Introduction
A Program to Write a Program
Whenever I create a new C++ file using Emacs, a simple elisp script executes. This script:
* places a standard header at the top of the file,
* works out what year it is and adjusts the Copyright notice accordingly,
* generates suitable #include guards (for header files),
* inserts placeholders for Doxygen comments.
In short, the script automates some routine housekeeping for me.
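The housekeeping steps listed above can be sketched in a few lines. The script described here is written in elisp and is not reproduced; what follows is a hypothetical Python analogue, and the names new_file_skeleton and include_guard, the header wording, and the guard naming scheme are all assumptions made for illustration.

```python
import datetime
import pathlib
import re


def include_guard(filename):
    """Derive a guard macro from the file name (assumed scheme: NAME_EXT_INCLUDED)."""
    base = pathlib.Path(filename).name
    return re.sub(r"[^A-Za-z0-9]", "_", base).upper() + "_INCLUDED"


def new_file_skeleton(filename, owner, year=None):
    """Return the starting text for a new C++ source or header file."""
    path = pathlib.Path(filename)
    # Work out what year it is unless the caller pins one (handy for tests).
    year = year or datetime.date.today().year
    lines = [
        f"// {path.name}",
        f"// Copyright (c) {year} {owner}",
        "",
    ]
    is_header = path.suffix in {".h", ".hpp"}
    if is_header:
        guard = include_guard(filename)
        lines += [f"#ifndef {guard}", f"#define {guard}", ""]
    # Placeholder Doxygen comment for the author to fill in.
    lines += ["/**", " * @file", " * @brief TODO: describe this file.", " */", ""]
    if is_header:
        lines.append(f"#endif // {guard}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(new_file_skeleton("widget.hpp", "Example Ltd"))
```

Keeping the generator a pure function of the file name, owner and year means it can be checked without touching the editor or the file system.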
On the face of it, nothing extraordinary is going on here. One program (the elisp script) helps me write another program (the C++ program which needs the new file). This unremarkable elisp script (a program to write a program) is, then, a metaprogram.
This article investigates some other metaprograms which, perhaps, we don't really notice, and some alternative metaprogramming techniques which, perhaps, we should be aware of.
What is Metaprogramming?
"Metaprogramming is the writing of programs that write or manipulate other programs (or themselves) as their data or that do part of the work that is otherwise done at runtime during compile time."
Actually, it's the first half of this definition I like (everything up to and including "data"). The second half seems rather to weaken the concept by being too specific, and in my opinion its presence reflects the current interest in C++ template metaprogramming; but then Wikipedia is bound to reflect what's in fashion!
Why Metaprogram?
Having established what metaprogramming is, the obvious follow-up is "Why?" Writing programs to manipulate ordinary data is challenging enough for most of us, so writing programs to manipulate programs must surely be either crazy or too clever by half.
Rather than attempt to provide a theoretical answer to "Why?" at this point, let's push the question on the stack and discuss some practical applications of metaprogramming.
Editor Metaprogramming
I've already spoken about programming Emacs to create C++ files in a standard format. We can compare this technique to a couple of common alternatives:
1. create an empty file then type in the standard header etc.
2. copy an existing file which does something similar to what we want, then adapt as required.
The first option is tough on the fingers and few of us would fail to introduce a typo or two. The second is better but all too often is executed without due care (maybe because a programmer prefers to concentrate on what he wants to add rather than on what he ought to remove) and all too often leads to a new file which is already slightly broken: perhaps a comment remains which only applies to the original file, perhaps there's an incorrect date stamp.
The elisp solution is an improvement. It addresses the concerns described above and can be tailored to fit our needs most exactly. All decent editors have a macro language, so the technique is portable.
Of course, there is a downside. You have to be comfortable customising your editor. (Or you have to know someone who can do it for you.)
Batch Editing
By batch editing I mean the process of creating a program to edit a collection of source files without user intervention. This is closely related to editor metaprogramming; indeed, I often execute simple batch edits without leaving my editor (though the editor itself may shell out instructions to tools such as find and sed).
Very early on in my career (we're talking early 1980s) I worked with a programmer who preferred to edit source files in batch mode. His desk did not have a computer terminal on it. Instead, he would study printouts, perhaps marking them up in pencil, perhaps using a rubber to undo these edits, before finally writing, by hand, an editor batch file to apply his changes. He then visited a computer terminal to enter and execute this batch file.
Even then, this was an old-fashioned way of working, yet he was clear about its advantages:
* Recordable: the batch file provides a perfect record of what it has done.
* Reversible: its effects can therefore be undone, if necessary.
* Reflective: by working in this reflective, careful way, he was less likely to introduce errors. When system rebuilds can only be run overnight, this becomes paramount.
These days, builds are quicker and batch editing is more immediate. With a few regular expressions and a script you can alter every file in the system in less time than it takes to check your email. As an example, in another article I describe the development of a simple Python script to relocate source files into a new directory structure, taking care to adjust internal references to #included files.
The benefits of using a script to perform this sort of operation are a superset of those listed above. In addition, a scripted solution beats hand hacking since it is:
* Reliable: the script can be shown to work by unit tests and by system tests on small data sets. Then it can be left to do its job.
* Efficient: editing dozens, perhaps hundreds, of files by hand is error prone and tedious. A script can process megabytes of source code in minutes.
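To make these properties concrete, here is a minimal sketch of the core of such a batch edit. It is not the Python script mentioned above: the function name rewrite_includes, the moved mapping and the regular expression are assumptions for illustration, and only quoted #include directives are handled.

```python
import re

# Matches lines such as:   #include "util/log.h"
# Angle-bracket includes are deliberately left alone.
INCLUDE = re.compile(r'^([ \t]*#[ \t]*include[ \t]*")([^"]+)(")', re.MULTILINE)


def rewrite_includes(source, moved):
    """Return (new_source, changes) for one file's text.

    `moved` maps old include paths to new ones. `changes` lists the
    (old, new) pairs actually applied, which serves as the record.
    """
    changes = []

    def replace(match):
        old = match.group(2)
        new = moved.get(old, old)
        if new != old:
            changes.append((old, new))
        return match.group(1) + new + match.group(3)

    return INCLUDE.sub(replace, source), changes


if __name__ == "__main__":
    text = '#include "util/log.h"\n#include <vector>\n'
    updated, log = rewrite_includes(text, {"util/log.h": "core/log.h"})
    print(updated, log)
```

Because the function works on text rather than on files, it can be unit tested on small inputs before being let loose on a source tree. Applying it again with the inverted mapping undoes the edit, provided no two old paths map to the same new one.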
Again, there is a downside. You have to invest time in writing the script, which may well require a larger investment in learning a new language. Many of us would regard proficiency in other languages as an upside but it may be difficult to make that initial investment under the usual project pressures.
So, once again, it may end up being a team-mate who writes the script for you. Indeed, many software organisations have a dedicated Tools Group which specialises in writing and customising tools for internal use during the development of core products. Perhaps this team could equally well be named a Metaprogramming Group?
The Canonical Metaprogram
The compiler is the canonical example of a metaprogram: it translates a program written in one language (such as C) into an equivalent program written in another language (object code).
Of course, when we invoke a compiler we are not metaprogramming, we are simply using a metaprogram, but it is important to be aware of what's going on. We may prefer to program in higher-level languages but we should remember the compiler's role as our translator.
We lean on compilers: we rely on them to faithfully convert our source code into an executable; we expect different compilers to produce "the same" results on different platforms; and we want them to do all this while tracking language changes.
In some environments these considerations are taken very seriously. For safety critical software, a compiler will be tested systematically to confirm the object code output by various test cases is correct. In such places, you cannot simply apply the latest patch or tweak optimisation flags. You may even prefer to work in C rather than C++ since C is a smaller language which translates more directly to object code.
In other environments we train ourselves to get along with our compilers. We accept limitations, report defects, find workarounds, upgrade and apply patches. Optimisation settings are fine-tuned. We prefer tried-and-tested and, above all, supported brands. We monitor newsgroups and share our experiences.
One last point before leaving compilers: C and C++ provide a hook which allows you to embed assembler code in a source file; that's what the asm keyword is for. I guess this too is metaprogramming in a rather back-to-front form. The asm keyword instructs the compiler to suspend its normal operation and include your handwritten assembler code directly. Its exact operation is implementation dependent, and, fortunately, rarely needed.
Metaproblems
Most of this article puts a positive spin on metaprogramming. I'm happy enough to leave you with this impression, but I should also mention some problems.
Trouble Shooting
The first problem is to do with trouble-shooting. You have problems with your program but the problem is actually in the metaprogram which generated your program. You are one step removed from fixing it.
I deliberately used the term trouble-shooting rather than debugging. When you think about it, debug builds and debuggers are there to help you solve these problems by hooking you back from machine code to source code. They give the illusion of reversing the effect of the compiler. If you can provide similar hooks in your metaprograms, then similarly the fix will be easier to find.
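One possible hook of this kind, offered as a sketch and not as the approach used in any generator discussed here, is to have the metaprogram emit #line directives so that compiler diagnostics point back at the generator's input instead of at the generated C++. The function name emit_with_origin and the one-name-per-line input format are invented for the example.

```python
def emit_with_origin(spec_lines, spec_name):
    """Generate C++ declarations, tagging each with its origin in the spec.

    Assumed input format: one variable name per line, blank lines ignored.
    """
    out = []
    for number, text in enumerate(spec_lines, start=1):
        name = text.strip()
        if not name:
            continue
        # The #line directive tells the compiler to report the next line
        # as line `number` of `spec_name`.
        out.append(f'#line {number} "{spec_name}"')
        out.append(f"int {name} = 0;")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_with_origin(["alpha", "", "beta"], "fields.spec"))
```

With the directives in place, an error in a generated declaration is reported against the corresponding line of the input file, which is one step closer to where the fix belongs.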
Quote Escape Problems
The second problem I refer to as the quote-escape problem. It bit me recently when we converted a regular C++ program into one which was partially generated by another C++ program. For the purposes of this article, look at what happened when I needed to generate C++ code which produces formatted output.
Here's the code I wanted to generate:
context.decodeOut()
    << context.indent()
    << field_name << " "
    << bitwidth
    << " = 0x" << value << "\n";
Here's the code I developed to do the generating:
cpp_file
    << indent()
    << "context.decodeOut() << context.indent() << "
    << quote(field_name
             + " "
             + bitwidth
             + " = 0x")
    << " << context.readFieldValue("
    << quote(field_name) + ", "
    << value
    << ") << \"\\n\";\n";
It looks even worse without the helper function, quote, which returns a double-quoted version of the input string.
I was able to defuse this problem with some refactoring but the self-referential nature of metaprogramming will always make it susceptible to these issues.
This is also part of the reason why Python is so popular as a code-generator: as has been shown by some of the preceding examples, its sophisticated string support allows us to bypass most quote-escape problems.
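As an illustration of that string support, here is a hedged Python sketch that generates a line similar to the C++ output statement from the quote-escape example. The names DECODE_LINE, cpp_escape and generate_decode_line are hypothetical, and the generated text is assumed to follow the shape of the C++ generator shown earlier, with value_expr standing for a C++ expression supplied by the caller.

```python
from string import Template

# A raw, triple-quoted string: the C++ double quotes and the \n escape
# appear here exactly as they will appear in the generated file.
DECODE_LINE = Template(
    r'''context.decodeOut() << context.indent() << "$label = 0x" << context.readFieldValue("$name", $value) << "\n";
'''
)


def cpp_escape(text):
    """Escape backslashes and double quotes for use inside a C++ string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def generate_decode_line(field_name, bitwidth, value_expr):
    """Fill in the template for one field."""
    return DECODE_LINE.substitute(
        label=cpp_escape(f"{field_name} {bitwidth}"),
        name=cpp_escape(field_name),
        value=value_expr,
    )


if __name__ == "__main__":
    print(generate_decode_line("flags", 8, "raw"), end="")
```

The template reads almost like the code it produces, so the layers of backslashes needed in the C++ generator largely disappear. The escaping that remains is confined to one small helper.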
Build Time Complexity
I have already mentioned the problem of integrating code-generators into your build system. Some IDEs don't integrate them very well, and even if they do, we have introduced complexity into this part of the system. In general we prefer to trade complexity at build time for safety at run-time but we should always check that the gains outweigh the costs.
Too Much Code
We're nearing the end of our investigation, and I hope the "Why Metaprogram?" question I posed at the beginning has been addressed. Wikipedia answers this question rather more directly:
"[Metaprogramming] ... allows programmers to produce a larger amount of code and get more done in the same amount of time as they would take to write all the code manually."
It's possible to interpret this wrongly. As we all know, we want less code, not more (more software can be good, though). The important point is that the metaprogram is what we develop and maintain and the metaprogram is small: we shouldn't have to worry about the size of the generated code.
Unfortunately we do have to worry about the generated code, not least because it has to fit in our system. If we turn a critical eye on the ISO 8859 conversion functions we discussed earlier we can see that the generated code size could be halved: values in the range (0, 0x7f) translate unchanged into UTF-8, and therefore do not require 128 separate case statements. Of course, the metaprogram could easily be modified to take advantage of this intelligence, but the point still holds: generated code can be bloated.
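The saving can be sketched as follows. This is not the generator referred to above: the function name generate_cases, the switch-style output and the choice of ISO 8859-2 are assumptions, and the character tables are taken from Python's built-in codecs.

```python
def generate_cases(encoding="iso8859_2"):
    """Emit C++ case lines for the upper half of an 8-bit character set only.

    Bytes 0x00 to 0x7f are ASCII and are unchanged in UTF-8, so the
    generated function can pass them through in its default branch.
    """
    lines = []
    for byte in range(0x80, 0x100):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # code point not defined in this character set
        # Spell out the UTF-8 bytes as C++ hex escapes.
        utf8 = "".join(f"\\x{b:02x}" for b in char.encode("utf-8"))
        lines.append(f'    case 0x{byte:02x}: return "{utf8}";')
    return lines


if __name__ == "__main__":
    print("\n".join(generate_cases()))
```

At most 128 case lines are produced per character set in place of 256, at the cost of one extra range check in the generated function.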
Too Clever
Good programmers use metaprograms because they are lazy. I don't mean lazy in the sense of "Can't be bothered to put the right header in a source file"; I mean lazy in the sense of "Why should I do something a machine could do for me?"
Being lazy in this way requires a certain amount of cleverness and clever can be a pejorative every bit as much as lazy can. A metaprogram lives at a higher conceptual level than a regular program. It has to be clever.
Experienced C++ programmers are used to selecting the right language features for a particular job. Where possible, simple solutions are preferred: not every class needs to derive from an interface, not every function needs template-type parameters. Similarly, experienced metaprogrammers do not write metaprograms when they can, they do it when they choose to.
Concluding Thoughts
This article has touched on metaprogramming in a few of its more common guises. I hope I have persuaded you that metaprogramming is both ubiquitous and useful, and certainly that it shouldn't be left to a select few.
At one time, the aim of computer science seemed to be to come up with a language whose concepts were pitched at such a high level that software development would be simple. Simple enough that people could program machines as easily as they could, say, send a text message. Compilers would be intelligent and forgiving enough to translate wishes to machine code.
This aim is far from being realised. We do have higher-level languages but their grammars remain decidedly mechanical. Programs written in low-level languages still perform the bulk of processing. Perhaps a more realistic aim is for a framework where languages and programs are compatible, able to communicate with humans and amongst themselves, on a single device or across a network.
In such a framework, metaprogramming is your friend.