[Verba] Nemesis Coding (revisited) and DNA coding
Stuart D. Gathman
stuart at gathman.org
Sun Jan 29 00:10:01 EST 2006
Here is a simple exercise: write a program that accepts text as input
and outputs binary ASCII code like the message you just decoded.
You can either include the text in the program as a constant:
txt = "This is a test."
or use sys.stdin.read() like the example decoder programs.
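
One possible shape for this exercise, as a minimal sketch. It assumes the
target format is one 8-bit group per character, separated by spaces; the
layout of the earlier message is not shown here, so adjust to match. The
function name text_to_binary is made up for illustration.

```python
def text_to_binary(txt):
    """Return txt as space-separated 8-bit binary ASCII codes (assumed layout)."""
    # ord() gives the character code; format(..., "08b") pads it to 8 binary digits.
    return " ".join(format(ord(ch), "08b") for ch in txt)


txt = "This is a test."   # or: txt = sys.stdin.read() after "import sys"
print(text_to_binary(txt))
```
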
We've spent a lot of time on coding, but it is a deep principle: it is
the basis of biology. Current computers use 8-bit bytes, but the cells
in your body use 6-bit bytes. Instead of two-state bits, cells use
four-state nucleotides, so each nucleotide is equivalent to 2 bits, and
three nucleotides make up one 6-bit byte. The 4 nucleotides are
represented by 4 letters: UCAG for RNA and TCAG for DNA. The bytes
(called 'codons' by biologists) map to amino acids:
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/C/Codons.html
You could use binary to represent the 4 DNA nucleotides:
00 = T
01 = C
10 = G
11 = A
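
As a rough illustration of the table above, the mapping fits in a Python
dictionary, and a three-letter codon then packs into 6 bits. The names
NUCLEOTIDE_BITS and codon_to_int are hypothetical, and putting the first
letter in the most significant bits is an assumption of this sketch.

```python
# Two-bit code for each DNA nucleotide, copied from the table above.
NUCLEOTIDE_BITS = {"T": 0b00, "C": 0b01, "G": 0b10, "A": 0b11}


def codon_to_int(codon):
    """Pack a 3-letter codon into a 6-bit integer, first letter most significant."""
    value = 0
    for letter in codon:
        # Shift the bits collected so far left by 2, then add the new nucleotide.
        value = (value << 2) | NUCLEOTIDE_BITS[letter]
    return value


print(format(codon_to_int("TAG"), "06b"))  # T=00, A=11, G=10 -> 001110
```
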
Every DNA sequence has an "antisense" sequence, which is the same sequence
with every bit inverted (0 becomes 1 and 1 becomes 0). In computers, this
is called the NOT operation. In Python, the NOT operation is performed by
the "~" operator. In biology, the T and A nucleotides attract each other,
and the C and G attract each other (with the code above, T = 00 is the NOT
of A = 11, and C = 01 is the NOT of G = 10). So a strand of DNA quickly
has the opposite nucleotides attach to it, forming the antisense strand.
For instance, if the DNA strand is TAGCCAGCTTAG, then free-floating
nucleotides will start to attach:
TAGCCAGCTTAG --> TAGCCAGCTTAG --> TAGCCAGCTTAG --> TAGCCAGCTTAG
    G C A          CGG CGA        ATCGG CGAA C     ATCGGTCGAATC
Exercise: write a program to compute the antisense sequence for
any DNA sequence. Translate the nucleotide letters to binary, then
use the "~" operator. Examples of NOT:
>>> ~5
-6
>>> ~1
-2
>>> ~2
-3
Whoa! Negative numbers? Integers in Python are treated as infinite bit
sequences. When all the bits to the "left" (more significant) of the
interesting bits are 0, out to infinity, the number is positive (or zero).
When all the bits to the left of the interesting bits are 1, the
number is negative. So for instance ~0 == -1, because the NOT inverts
all the bits out to infinity; in general, ~x == -x - 1. (Of course, the
computer doesn't *really* store infinite bits. It just treats the most
significant bit stored as if it extended to infinity.)
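
To keep only the interesting bits after a NOT, one common approach (not
spelled out above, so treat it as a suggestion) is to AND the result with
a mask that has 1s only in the positions you care about. The helper name
not_bits is made up for this sketch.

```python
def not_bits(value, width):
    """Invert the low `width` bits of value and discard everything to the left."""
    mask = (1 << width) - 1   # e.g. width=4 gives 0b1111
    return ~value & mask


print(~5)              # -6: the infinite run of 1s to the left makes it negative
print(not_bits(5, 4))  # 10: 0101 inverted within 4 bits is 1010
```
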
Here is an outline of your program to get you started. It will use all
the things you have (hopefully) learned so far.
1. Make a function (remember 'def') to convert the letters of a DNA
sequence to a number.
2. Make a function to convert a number to the letters of a DNA sequence.
(You need some sophisticated equipment to actually convert to a DNA
sequence :-)
3. Read a sequence, convert to a number, NOT, convert to letters again,
print.
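
The three steps above could be sketched as follows. This is only one way
to do it, not necessarily the intended solution: the function names are
invented here, the first letter is assumed to land in the most significant
bits, and the mask in antisense() is an addition that throws away the
infinite run of 1s that "~" produces.

```python
# Two-bit codes from the table in the message: 00 = T, 01 = C, 10 = G, 11 = A.
LETTERS = "TCGA"  # the index of each letter is its 2-bit code


def dna_to_int(seq):
    """Step 1: convert the letters of a DNA sequence to a number."""
    value = 0
    for letter in seq:
        value = (value << 2) | LETTERS.index(letter)
    return value


def int_to_dna(value, length):
    """Step 2: convert a number back to `length` nucleotide letters."""
    letters = []
    for _ in range(length):
        letters.append(LETTERS[value & 0b11])  # lowest 2 bits = last letter
        value >>= 2
    return "".join(reversed(letters))


def antisense(seq):
    """Step 3: NOT the number, keeping only 2 bits per nucleotide."""
    mask = (1 << (2 * len(seq))) - 1
    return int_to_dna(~dna_to_int(seq) & mask, len(seq))


seq = "TAGCCAGCTTAG"   # or read it with sys.stdin.read().strip()
print(antisense(seq))  # ATCGGTCGAATC
```

int_to_dna() takes the length explicitly because a leading T is 00, and
leading zero bits vanish once the sequence is stored as a number.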
--
Stuart D. Gathman <stuart at bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.