Programming in Biology

What is Programming, anyway?

One could debate this question endlessly, but here is one answer, which may surprise you.

A program is a set of instructions allowing for two things. First, the instructions can have branches: some steps are executed if some condition holds, and other steps otherwise. Which branch will be followed need not be known in advance. Second, some sequences of steps can be repeated, in a loop. A loop of instructions can be repeated while some condition holds. How many times that happens need not be known in advance either. Programs are usually instructions for handling numbers or other information, but our definition does not demand that. Take a recipe for making a cake. The last part of the recipe usually asks you to probe the inside of the cake with a knife or needle. Baking is done if the inside is dry. Otherwise, baking should continue and the test should be repeated every few minutes, while the inside is moist.
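The cake test can be written down as a short Python sketch with one while loop and one if branch. Everything named here is made up for illustration: the function `bake_cake`, the list `probe_results` standing in for what the knife shows at each test, and the five-minute wait between tests.

```python
def bake_cake(probe_results, minutes_between_tests=5):
    """Return the extra baking minutes needed until the probe comes out dry.

    probe_results is a hypothetical list such as ["moist", "moist", "dry"],
    giving what the knife shows at each successive test.
    """
    extra_minutes = 0
    test_number = 0
    inside = probe_results[test_number]       # first probe of the cake

    # Loop: keep baking while the inside is moist.
    # How many times this repeats is not known in advance.
    while inside == "moist":
        extra_minutes += minutes_between_tests
        test_number += 1
        inside = probe_results[test_number]   # probe again

    # Branch: which message is printed depends on what happened above.
    if extra_minutes == 0:
        print("Dry on the first test: baking is done.")
    else:
        print(f"Baking is done after {extra_minutes} extra minutes.")
    return extra_minutes


extra = bake_cake(["moist", "moist", "dry"])
```

With the three probe results shown, the loop body runs twice and the function returns 10. The sketch assumes the list eventually contains "dry"; a real kitchen offers no such guarantee.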

So there you have it. Once you can write down instructions with if branches and while loops, you can program. Not only that, you can write programs as powerful as anyone else's. A programming genius may do it more elegantly and efficiently, but they cannot in principle do any more. This remarkable fact was proved by Alan Turing in 1936. There were no computers in 1936, and Turing was really proving a theorem in mathematics. But he used an imaginary machine in his explanations, and nowadays in his honour any abstract computer is known as a Turing machine. Indeed, you will find plenty of discussion and speculation online on the subject of DNA Turing machines.

Turing was not, however, the first to use the idea of if branches and while loops. Ada Lovelace's writings on arithmetic from 1843 contain recognizable examples of both concepts. Hence Ada Lovelace is considered a pioneer of programming, even though no working computer that could actually run her programs would exist for another hundred years.

So as we see, not all programming requires a computer. Likewise, not all use of computers is programming. The use of large software packages with many complicated options does not qualify as programming. Nor, for example, does a database lookup with subtle boolean search items. With these, one can do only a limited (if large) number of things. With if and while, on the other hand --- or something recognizably equivalent --- the possibilities become unlimited.

The world of programming today stretches between two opposite poles, exemplified by the programming languages Assembly and Lisp. Assembly is for people who cannot bear to be far from the electronics and want to play with every little detail of a processor. Lisp is for people --- sometimes the same ones, just in a different mood --- who regard programming as a branch of abstract mathematics. As you can imagine, not very many people actually write in Assembly or Lisp. But if you move just a little bit away from the opposite poles, the picture changes, and you find some very popular programming languages.

If you move from Assembly a little in the direction of Lisp, you arrive at C, a language close to the hardware without being too close, which is used at a foundational level by virtually every modern computer system. Relatives of C are C++, Objective-C and Java, and (more distantly) Fortran. These languages are very useful once you become expert, but gaining such expertise takes more time than we have in this course.

If you start from Lisp and decide to be not quite so pure, and wander slightly towards Assembly (but not too much), you arrive at Python. Python's syntax is so simple and intuitive that children can learn to program in it before they even meet algebra. What Python sacrifices for these qualities is speed, but fortunately that disadvantage will be irrelevant to us, because of Python's other quality: the ease with which the language can be extended by libraries while keeping its intuitive Pythonic character. Libraries can be written in Python, or they can be written in another language, even C. In recent years, many developers have been attracted by the potential usefulness of Python libraries, and have kindly provided many of them, such as numpy and scipy for scientific computing, matplotlib and pylab for graphics, and BioPython for bioinformatics. These enable a great many things to be done with Pythonic simplicity yet C-like efficiency.
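As a small illustration of what a library buys you, the sketch below counts the bases in a DNA string twice: once with a hand-written while loop and if branch, and once with `Counter` from Python's standard `collections` module. `Counter` stands in here for the larger libraries named above, which are not used so that the example runs on a bare Python installation. The sequence and the function name `count_bases_by_hand` are made up for illustration.

```python
from collections import Counter

dna = "ATGCGATACGCTTGA"  # made-up sequence, for illustration only


def count_bases_by_hand(sequence):
    """Count each base using nothing but a while loop and an if branch."""
    counts = {}
    position = 0
    while position < len(sequence):
        base = sequence[position]
        if base in counts:
            counts[base] += 1      # seen before: add one to its count
        else:
            counts[base] = 1       # first time we meet this base
        position += 1
    return counts


by_hand = count_bases_by_hand(dna)

# The library version is a single line and gives the same counts.
with_library = Counter(dna)

print(by_hand == dict(with_library))
```

Both approaches give identical counts. The difference is that the library call is one line, and on long sequences it is likely to be faster as well.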

Somewhat intermediate in character between C and Python, there are several other popular languages. These tend to specialize in particular kinds of applications, even though any programming language could be used for any application. Perl is especially strong when working with text, R is designed for statistics, while IDL and Matlab are aimed at physical scientists and engineers. Python libraries are increasingly offering functionality similar to these four languages. For doing algebra, Python libraries are still not competitive with Maple or Mathematica. Finally, for code that runs entirely inside a web browser, Python is not natively supported, and JavaScript is the standard way.

Another distinction worth bearing in mind is that between proprietary systems (Mathematica, Matlab, Maple or IDL) and free software. The paid-for systems have nicer user interfaces and integrated help systems, but the software performance of free systems is comparable. Free software also has a culture of cooperation, and you will quickly find that there are countless people online who will respond to requests for help.