[Verba] Computer Club - Fall 2006 Review of first two lectures
Stuart D. Gathman
stuart at gathman.org
Wed Sep 20 20:17:12 EDT 2006
This year, we are diving into the low level.
Week 1 Numbering
Number system bases in history:
base  2  Binary       binary computers (most modern computers)
base  3  Trinary      http://www.icfcst.kiev.ua/MUSEUM/PHOTOS/Setun-1.html
base  5               Riven computer game, sequel to Myst
base  8  Octal        Used by software as shorthand for binary
base  9  Nonary       Used by software for trinary computers
base 10  Decimal      Roman empire, some early computers
base 12               Saxons (dozen, gross)
base 16  Hexadecimal  Used by software as shorthand for binary
base 20               Mayan empire
base 60  Sexagesimal  Babylonian http://en.wikipedia.org/wiki/Sexagesimal
base 64               Used in MIME (email attachments) and other software
base 95               A less common encoding also used in MIME and elsewhere
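
To see how one number looks in several of these bases, here is a small
Python sketch. The function name to_base and the digit alphabet are my own
choices for illustration, and it only goes up to base 36 (digits 0-9, then
A-Z), so base 60 and above are out of its reach.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed digit alphabet

def to_base(n, base):
    """Write the non-negative integer n in the given base (2..36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # peel off the lowest digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))        # digits came out lowest first

for base in (2, 3, 8, 10, 12, 16, 20):
    print(base, to_base(2006, base))
```

Each octal digit stands for exactly 3 bits and each hexadecimal digit for
exactly 4 bits, which is why those two bases work as shorthand for binary.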
A bit is the smallest unit of information. It has two states. A byte
is composed of 8 bits. Four bits are called a nybble. A 3 state digit
in a base 3 computer is called a trit.
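
As a quick illustration of bits, bytes, and nybbles, the following Python
sketch splits one byte into its two nybbles. The helper name nybbles is
made up for this example.

```python
def nybbles(byte):
    """Split one byte (0..255) into its high and low 4-bit nybbles."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte holds values 0..255")
    return byte >> 4, byte & 0x0F

value = 0xE5
high, low = nybbles(value)
print(format(value, "08b"))                   # the 8 bits of the byte
print(format(high, "X"), format(low, "X"))    # one hex digit per nybble
```

One nybble is exactly one hexadecimal digit, so a byte is always two hex
digits.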
One kilobyte is 2^10, or 1024 bytes. One megabyte is 2^20 bytes.
Binary  Prefix  Decimal
2^10    kilo    10^3
2^20    mega    10^6
2^30    giga    10^9
2^40    tera    10^12
2^50    peta    10^15
2^60    exa     10^18
2^70    zetta   10^21
2^80    yotta   10^24
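
The binary and decimal meanings of each prefix are close but not equal, and
the gap grows with each step. A short Python sketch that works this out
(the names PREFIXES and prefix_values are mine, for illustration only):

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def prefix_values(index):
    """Return (binary, decimal) values of the index-th prefix; kilo is 1."""
    return 2 ** (10 * index), 10 ** (3 * index)

for i, name in enumerate(PREFIXES, start=1):
    binary, decimal = prefix_values(i)
    ratio = binary / decimal             # how much larger the binary value is
    print(f"{name:6} 2^{10 * i:<3} / 10^{3 * i:<3} = {ratio:.3f}")
```

A binary kilobyte is about 2.4% larger than 1000 bytes; by yotta the
difference is about 21%.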
Week 2 Coding
Coding in history:
Gödel's theorem uses coding to create self-reference in Typographic Number
Theory (TNT) by mapping symbols of TNT to codons.
Morse code used a system of two symbols plus spacing to represent letters,
digits, and punctuation on the telegraph.
Early teletypewriters used the 5-bit Baudot code. Several codes are reserved
as control codes to allow letters and figures to share the same
code space: http://en.wikipedia.org/wiki/Baudot_code
EBCDIC is an 8-bit code used by IBM mainframes. It grew out of IBM punched card codes.
ASCII is a 7-bit code which is the basis of all modern character codes.
Extended ASCII extends ASCII to 8 bits. Each region invented its own
coding for the 128 additional characters. These were called "code pages".
Unicode extends ASCII to 16 bits. All extended ASCII code pages are assigned
code space. In addition, simplified Chinese is represented.
Unicode32 extends Unicode to 32 bits to handle traditional Chinese,
fictional (Klingon, Elvish, etc.), and ancient alphabets.
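
To make the difference between these codes concrete, here is a Python
sketch that encodes the same character several ways. The codec names are
Python's: latin-1 and cp437 are two code pages I picked as examples, and
utf-16-be and utf-32-be stand in for the 16-bit and 32-bit forms of Unicode
described above. The helper name encode_all is hypothetical.

```python
def encode_all(ch, codecs=("latin-1", "cp437", "utf-16-be", "utf-32-be")):
    """Return the bytes (as hex) used for ch under each named codec."""
    return {name: ch.encode(name).hex() for name in codecs}

print("A".encode("ascii").hex())       # plain ASCII fits in 7 bits

e_acute = "\u00e9"                     # e with an acute accent, not in ASCII
for name, hex_bytes in encode_all(e_acute).items():
    print(f"{name:10} {hex_bytes}")
```

The two code pages disagree about which byte means the accented letter,
which is exactly the problem Unicode was meant to solve.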
Our initial goal is to understand machine language, like in the following:
Address  Machine Code    Mnemonic  Operands
   0:    55              push      %ebp
   1:    89 e5           mov       %esp,%ebp
   3:    83 ec 08        sub       $0x8,%esp
   6:    8b 45 0c        mov       0xc(%ebp),%eax
   9:    03 45 08        add       0x8(%ebp),%eax
   c:    79 13           jns       21 <test+0x21>
   e:    83 ec 08        sub       $0x8,%esp
  11:    50              push      %eax
  12:    68 00 00 00 00  push      $0x0
  17:    e8 fc ff ff ff  call      18 <test+0x18>
  1c:    83 c4 10        add       $0x10,%esp
  1f:    eb 11           jmp       32 <test+0x32>
  21:    83 ec 08        sub       $0x8,%esp
  24:    50              push      %eax
  25:    68 05 00 00 00  push      $0x5
  2a:    e8 fc ff ff ff  call      2b <test+0x2b>
  2f:    83 c4 10        add       $0x10,%esp
  32:    b8 00 00 00 00  mov       $0x0,%eax
  37:    c9              leave
  38:    c3              ret
The Intel "architecture" of our computer describes the format of instructions,
data, and addresses. It is called i386.
Address
Our class computer has 256 megabytes of memory. Each byte is numbered
from 0 to 2^28-1. The architecture provides for up to 2^32 bytes of memory.
A full i386 address is usually written as 8 hexadecimal digits. This listing
shows the offset from some arbitrary starting address.
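
The address arithmetic above can be checked with a few lines of Python.
This is only a sketch; the names MEGABYTE and hex_address are mine.

```python
MEGABYTE = 2 ** 20

def hex_address(addr):
    """Format an address the usual i386 way: 8 hexadecimal digits."""
    return format(addr, "08x")

memory_bytes = 256 * MEGABYTE            # memory in the class computer
last_address = memory_bytes - 1          # bytes are numbered from 0

print(memory_bytes == 2 ** 28)           # 256 megabytes is 2^28 bytes
print(hex_address(last_address))         # highest address actually present
print(hex_address(2 ** 32 - 1))          # highest address the i386 allows
```

Eight hex digits are enough because each digit covers 4 bits, and
8 x 4 = 32 bits.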
Machine Code
The program instructions are stored in memory. This displays the bits
of the instruction opcodes and operands in hexadecimal.
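
As a small worked example, take the bytes 68 05 00 00 00 from the listing,
which the Operands column shows as push $0x5. The Python sketch below pulls
the operand back out of the bytes. It assumes a one-byte opcode followed by
a 4-byte value stored least significant byte first (little-endian), which
is how the i386 stores multi-byte values. The function name decode_imm32 is
made up for this example.

```python
def decode_imm32(code):
    """Split 'opcode + 32-bit immediate' machine code given as hex text."""
    raw = bytes.fromhex(code)            # spaces between bytes are allowed
    if len(raw) != 5:
        raise ValueError("expected 1 opcode byte plus 4 operand bytes")
    opcode, operand = raw[0], raw[1:]
    # i386 stores the least significant byte first (little-endian).
    return opcode, int.from_bytes(operand, "little")

opcode, value = decode_imm32("68 05 00 00 00")
print(hex(opcode), hex(value))
```

Reading the four operand bytes in the order written would give 0x05000000,
so the byte order matters when reading a hex dump.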
Mnemonic
This shows the assembler mnemonic for the machine instructions.
http://www.online.ee/~andre/i80386/Opcodes/index.html
Operands
This shows the operands specified for the machine instructions.
In this listing, '$' introduces an immediate value - a value that
is part of the instruction stream. '%' introduces a register name.
Hexadecimal values start with '0x' (except for jmp and call - which
I'll explain later).
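
As a preview of the jmp and call case, here is a Python sketch that
recomputes the targets shown in the listing. It assumes the usual i386
relative forms: a one-byte opcode followed by a signed 1-byte or 4-byte
displacement, counted from the address of the next instruction. The
function name branch_target is hypothetical.

```python
def branch_target(address, code):
    """Compute where a relative jmp/jns/call at 'address' transfers to."""
    raw = bytes.fromhex(code)
    if len(raw) not in (2, 5):
        raise ValueError("expected opcode plus 1 or 4 displacement bytes")
    # Everything after the opcode byte is a signed little-endian displacement.
    displacement = int.from_bytes(raw[1:], "little", signed=True)
    next_instruction = address + len(raw)
    return next_instruction + displacement

print(hex(branch_target(0x0C, "79 13")))            # jns at address c
print(hex(branch_target(0x1F, "eb 11")))            # jmp at address 1f
print(hex(branch_target(0x17, "e8 fc ff ff ff")))   # call at address 17
```

The results match the numbers printed in the Operands column (21, 32, and
18), which is why those operands are shown as plain addresses rather than
as the raw bytes in the instruction.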
The i386 instructions directly operate on data in several formats:
Format                Sizes (bits)
signed integer        8,16,32,64    two's complement
unsigned integer      8,16,32
binary coded decimal  8
IEEE floating point   32,64,80
Additional formats can be handled with multiple instructions, limited
only by the imagination of the programmer.
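
Two of the formats in the table are easy to try out in Python. This is a
sketch only: the function names are mine, and the BCD helper assumes the
packed form, with two decimal digits per byte.

```python
def twos_complement(value, bits):
    """Return the unsigned bit pattern that stores a signed integer."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in that many bits")
    return value & ((1 << bits) - 1)

def from_twos_complement(pattern, bits):
    """Recover the signed integer from its stored bit pattern."""
    if pattern & (1 << (bits - 1)):      # top bit set means negative
        return pattern - (1 << bits)
    return pattern

def packed_bcd(n):
    """Store a number 0..99 as two decimal digits in one byte."""
    if not 0 <= n <= 99:
        raise ValueError("packed BCD byte holds 0..99")
    tens, ones = divmod(n, 10)
    return (tens << 4) | ones            # one decimal digit per nybble

print(format(twos_complement(-4, 32), "08x"))
print(from_twos_complement(0xFC, 8))
print(format(packed_bcd(42), "02x"))
```

The 32-bit pattern for -4 is fffffffc. Stored least significant byte first,
that is the fc ff ff ff seen after the e8 opcode in the listing.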