This section mainly covers Chapters 1 to 5 of Hello World.
Chapter 1 of Hello World explains Python shells, which are interactive environments for writing and running Python programs. Hello World uses IDLE, which is the simplest Python shell, but we recommend Spyder, which is better suited to scientific work. The choice of Python shell makes no difference to the program code.
The number-guessing game nicely illustrates the character of the language. Already in the first two lines
import random
secret = random.randint(1, 100)
we have a function that will be useful later. This function cannot be called directly, since it is not a built-in function. Instead, it is part of the random library, which needs to be imported before the function can be called. Almost all functions in the random library depend on the basic function random.random(), which generates a random number drawn from a uniform distribution on the half-open interval [0, 1), so 0 can occur but 1 cannot.
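The two functions mentioned above can be tried side by side. The following is only a small sketch: the seed value 42 and the variable name u are arbitrary choices made here, not something taken from Hello World.

```python
import random

# A fixed seed (an arbitrary choice for this sketch) makes the run repeatable.
random.seed(42)

# Uniform random float with 0.0 <= u < 1.0.
u = random.random()

# Random integer with 1 <= secret <= 100; both end points can occur.
secret = random.randint(1, 100)

print(u)
print(secret)
```

Without the call to random.seed, each run gives different numbers, which is what the guessing game needs.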
Try to think up a good strategy for guessing the secret number. Will it always lead to the answer within six tries?
Chapter 2 of Hello World introduces the concept of variables. The analogy with pictures of labels and rings is a good one, and worth thinking over a little before going on.
If x and y are numbers, the fragment
x = x + y
y = x - y
x = x - y
would be nonsense in ordinary mathematics, but it is correct Python. Can you work out what it does?
Chapter 3 of Hello World gets us started on interesting operations.
Depending on your version of Python, 3/2 may give you 1.5 or it may discard the remainder and give the integer 1. The latter is the old standard (Python 2) and is deprecated; Python 3 always gives 1.5. If your installation has the old standard, you can change to the new standard by putting
from __future__ import division
at the top of each program. If you want integer division, 3//2 will provide it.
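As a rough illustration, the following lines give the same results under either standard, because they avoid dividing one integer by another with a single slash. The variable names are made up for this sketch.

```python
# Integer division: the remainder is discarded.
quotient = 3 // 2

# The % operator between two integers gives the discarded remainder.
remainder = 3 % 2

# If either operand is a float, / keeps the fractional part.
exact = 3 / 2.0

# Note that // rounds down (towards minus infinity), not towards zero.
negative = -3 // 2

print(quotient)   # 1
print(remainder)  # 1
print(exact)      # 1.5
print(negative)   # -2
```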
Chapter 4 of Hello World introduces the notion of a data type, and explains int, float and str.
In addition to using str(x) to convert a number into a string, there is another method, known as formatting numbers. Hello World covers formatting numbers later, in Chapter 21, but we can see it now through some examples. Try the following.
x = 22.0/3
x
str(x)
'a number: %i' % x
'another number: %f' % x
'and yet another: %e' % x
As you can see, % acts as an operator that inserts a number into a string. You can specify the (minimum) number of characters for writing an integer:
'%3i' % x
If the number has fewer digits than that, it will be padded with spaces on the left. Alternatively, you can choose padding with zeros:
'%03i' % x
In similar fashion, you can specify the number of digits to the right of the decimal point of a float:
'%25.20f' % x
'%29.20f' % x
The two numbers in the format are the minimum total number of characters (including the decimal point and a minus sign, if any), and the number of decimal digits. Printing more digits does not imply more accuracy! (Floats are good to about 16 significant decimal digits at most.)
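To make the width and precision rules concrete, here is a short sketch with values chosen for illustration (7, 3.14159 and 0.1 do not come from Hello World). The last line shows that digits beyond about the 16th are an artefact of how floats are stored, not extra accuracy.

```python
# Integer padded with zeros to at least 3 characters.
padded_int = '%03i' % 7

# Float in a field of at least 8 characters, with 3 decimal digits.
fixed = '%8.3f' % 3.14159

# Asking for 20 decimal digits of 0.1: the trailing digits are not zeros,
# because 0.1 cannot be stored exactly as a binary float.
tenth = '%.20f' % 0.1

print(padded_int)  # 007
print(fixed)       #    3.142
print(tenth)       # 0.10000000000000000555
```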
Here's another example of a strange-looking but perfectly valid Python statement.
x = 2 > 3
Using the interpreter, find the value and type of x.
What other values could this type of variable have?
It is possible to change the case of letters as follows:
string1 = 'Hello'
string2 = string1.upper()
print string2
string3 = string2.lower()
print string3
Chapter 5 introduces interactive input, as well as the urllib library for reading web pages directly.
By the way, reading a disk file into Python works like reading a web page, but simpler: we just use open instead of urllib.urlopen and no import is needed. We will come back to file input later.
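As a small preview, the sketch below writes a short text file and reads it back with open. The file name example.txt and its contents are made up for this illustration, and the tempfile and os imports are needed only to create a scratch folder, not for open itself.

```python
import os
import tempfile

# Create a scratch folder and a file name inside it (both are assumptions
# made for this sketch).
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')

# Write two lines so that there is something to read.
with open(path, 'w') as outfile:
    outfile.write('first line\nsecond line\n')

# open is built in; read() returns the whole file as one string.
with open(path) as infile:
    text = infile.read()

# Split the text into a list of lines without the newline characters.
lines = text.splitlines()
print(lines)
```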
Input from the web is not limited to simple pages. Here is an example to fetch a protein sequence from UniProt and save it in a text file.
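from Bio import ExPASy, SeqIO

sid = raw_input('Sequence id? ')
try:
    # Download the raw Swiss-Prot record for this id from ExPASy
    handle = ExPASy.get_sprot_raw(sid)
    # Parse the record, which is in Swiss-Prot ('swiss') format
    seq = SeqIO.read(handle, 'swiss')
    # Save the record to a file named after the id
    SeqIO.write(seq, sid + '.genbank', 'genbank')
    print 'Sequence length', len(seq)
except Exception:
    # Any failure (bad id, no network, parse error) ends up here
    print 'Sequence not found'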
By replacing genbank with fasta you can change the output to FASTA format. The sequence data are exactly the same, of course, but the FASTA format lacks some of the metadata that the GenBank format has, such as the name of the person who sequenced the gene.
This example also illustrates Python's try...except construction, which is useful for handling errors.
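The same construction can be tried without a network connection. The sketch below uses a made-up helper, to_int, which is not part of Hello World or of any library; it catches only ValueError, the error that int raises for text that is not a number, rather than every possible Exception.

```python
def to_int(text):
    """Return text converted to an integer, or None if that is not possible."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the text does not look like an integer.
        return None


print(to_int('42'))         # 42
print(to_int('forty-two'))  # None
```

Catching a specific error type, where you know which one to expect, avoids hiding unrelated mistakes such as a misspelt variable name.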
The sequence id F8RBX8 stands for the protein sequence of an important gene in a well-known organism. Fetch the data and save it as a FASTA file. Open the file using a text editor and read off the gene name and organism.