[Verba] Computer Club - Fall 2006 Review of first two lectures

Stuart D. Gathman stuart at gathman.org
Wed Sep 20 20:17:12 EDT 2006


This year, we are diving into the low level.

Week 1  Numbering

Number system bases in history:

  base 2  Binary        binary computers (most modern computers)
  base 3  Trinary       http://www.icfcst.kiev.ua/MUSEUM/PHOTOS/Setun-1.html
  base 5                Riven computer game, sequel to Myst
  base 8  Octal         Used by software as shorthand for binary
  base 9  Nonary        Used by software for trinary computers
  base 10 Decimal       Roman empire, some early computers
  base 12               Saxons (dozen, gross)
  base 16 Hexadecimal   Used by software as shorthand for binary
  base 20               Mayan empire
  base 60 Sexagesimal   Babylonian http://en.wikipedia.org/wiki/Sexagesimal
  base 64               Used in MIME (email attachments) and other software
  base 95               A less common encoding also used in MIME and elsewhere
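
To see why octal and hexadecimal work as shorthand for binary (and nonary
for trinary), it may help to convert one value into several bases.  The
following is a minimal sketch; the helper name to_base and the sample value
are made up for illustration and are not part of the lecture.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base (2..36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # peel off the lowest digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))        # digits were produced low to high

value = 250                  # arbitrary sample value
print(to_base(value, 2))     # 11111010
print(to_base(value, 8))     # 372
print(to_base(value, 16))    # fa
print(to_base(value, 3))     # 100021
print(to_base(value, 9))     # 307
```

Each hexadecimal digit stands for exactly four binary digits (f = 1111,
a = 1010), and each nonary digit stands for exactly two trinary digits
(3 = 10, 0 = 00, 7 = 21).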

   A bit is the smallest unit of information.  It has two states.  A byte
is composed of 8 bits.  Four bits are called a nybble.  A 3-state digit
in a base 3 computer is called a trit.
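
As a small illustration of these units, a byte can be split into its two
nybbles with a shift and a mask.  This is only a sketch; the function name
nybbles is hypothetical.

```python
def nybbles(byte):
    """Split an 8-bit value into its (high, low) 4-bit nybbles."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be in the range 0..255")
    return byte >> 4, byte & 0x0F

high, low = nybbles(0xE5)
print(high, low)        # 14 5, that is, hex digits e and 5

# Number of distinct states: n bits give 2**n, n trits give 3**n.
print(2 ** 8)           # 256 states in a byte
print(3 ** 2)           # 9 states in two trits (one nonary digit)
```

Each nybble corresponds to one hexadecimal digit, which is why a byte is
conveniently written as two hex digits.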

  One kilobyte is 2^10, or 1024 bytes.  One megabyte is 2^20 bytes.

  Binary  Prefix          Decimal
  2^10    kilo            10^3
  2^20    mega            10^6
  2^30    giga            10^9
  2^40    tera            10^12
  2^50    peta            10^15
  2^60    exa             10^18
  2^70    zetta           10^21
  2^80    yotta           10^24
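
The binary and decimal readings of a prefix drift apart as the prefix
grows.  The sketch below recomputes the table and the ratio between the
two columns; the names PREFIXES and prefix_values are illustrative only.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def prefix_values(name):
    """Return (binary value, decimal value) for a prefix in the table."""
    step = PREFIXES.index(name) + 1      # kilo is step 1, mega is step 2, ...
    return 2 ** (10 * step), 10 ** (3 * step)

for name in PREFIXES:
    binary, decimal = prefix_values(name)
    print(f"{name:6} {binary:>26} {decimal:>26} ratio {binary / decimal:.3f}")
```

For kilo the two values differ by 2.4 percent; for giga the difference is
already about 7.4 percent.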

Week 2  Coding

Coding in history:

  Gödel's incompleteness theorem uses coding to create self-reference.  In
Hofstadter's "Gödel, Escher, Bach", the symbols of Typographical Number
Theory (TNT) are mapped to three-digit codons.

  Morse code used a system of two symbols plus spacing to represent letters,
digits, and punctuation on the telegraph.

  Early teletypewriters used the 5-bit Baudot code.  Several codes are
reserved as control codes to allow letters and figures to share the same
code space:  http://en.wikipedia.org/wiki/Baudot_code

  EBCDIC is an 8-bit code used by IBM mainframes, descended from the codes
used on IBM punched cards.

  ASCII is a 7-bit code which is the basis of nearly all modern character
codes.

  Extended ASCII extends ASCII to 8 bits.  Each region invented its own
coding for the 128 additional characters.  These were called "code pages".

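  Unicode originally extended ASCII to 16 bits.  The characters of the
extended ASCII code pages are assigned code space.  In addition, the most
common Chinese characters, both simplified and traditional, are represented.

  Unicode was later extended beyond 16 bits, to code points up to 0x10FFFF,
which fit in a 32-bit unit (UTF-32).  The added space handles rarer Chinese
characters and ancient alphabets.  Fictional scripts (Klingon, Elvish, etc.)
are not part of the standard, but have unofficial assignments in the
private use areas.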
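
The character codes above can be compared by encoding the same character
under several of them.  This sketch relies on the codecs shipped with
Python (ascii, cp037 for one EBCDIC variant, cp437 and cp1252 as two
extended ASCII code pages, utf-16-be and utf-32-be for Unicode); the
choice of sample characters is arbitrary.

```python
def code_of(ch, codec):
    """Return the bytes of one character under a codec, as hex digits."""
    return ch.encode(codec).hex()

E_ACUTE = "\u00e9"   # e with acute accent, not present in 7-bit ASCII

print(code_of("A", "ascii"))          # 41
print(code_of("A", "cp037"))          # c1  (EBCDIC uses different codes)
print(code_of(E_ACUTE, "cp437"))      # 82  (original IBM PC code page)
print(code_of(E_ACUTE, "cp1252"))     # e9  (Windows Western European)
print(code_of(E_ACUTE, "utf-16-be"))  # 00e9
print(code_of(E_ACUTE, "utf-32-be"))  # 000000e9
```

The same accented letter has different byte values in the two code pages,
which is the ambiguity that Unicode was designed to remove.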

Our initial goal is to understand machine language, like in the following:

Address Machine Code         Mnemonic  Operands
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   03 45 08                add    0x8(%ebp),%eax
   c:   79 13                   jns    21 <test+0x21>
   e:   83 ec 08                sub    $0x8,%esp
  11:   50                      push   %eax
  12:   68 00 00 00 00          push   $0x0
  17:   e8 fc ff ff ff          call   18 <test+0x18>
  1c:   83 c4 10                add    $0x10,%esp
  1f:   eb 11                   jmp    32 <test+0x32>
  21:   83 ec 08                sub    $0x8,%esp
  24:   50                      push   %eax
  25:   68 05 00 00 00          push   $0x5
  2a:   e8 fc ff ff ff          call   2b <test+0x2b>
  2f:   83 c4 10                add    $0x10,%esp
  32:   b8 00 00 00 00          mov    $0x0,%eax
  37:   c9                      leave
  38:   c3                      ret

The Intel "architecture" of our computer describes the format of instructions,
data, and addresses.  It is called i386.

Address

  Our class computer has 256 megabytes of memory.  The bytes are numbered
from 0 to 2^28-1.  The architecture provides for up to 2^32 bytes of memory.
A full i386 address is usually written as 8 hexadecimal digits.  The Address
column of the listing shows the offset from some arbitrary starting address.
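
The address arithmetic can be checked directly.  This is a minimal sketch;
format_address is a hypothetical helper, not something the assembler or
disassembler provides.

```python
MEGABYTE = 2 ** 20
memory_size = 256 * MEGABYTE           # size of the class computer's memory
print(memory_size == 2 ** 28)          # True

def format_address(address):
    """Write a 32-bit address as 8 hexadecimal digits."""
    if not 0 <= address < 2 ** 32:
        raise ValueError("address does not fit in 32 bits")
    return "%08x" % address

print(format_address(0))                 # 00000000
print(format_address(memory_size - 1))   # 0fffffff, the last byte of 256 megabytes
print(format_address(2 ** 32 - 1))       # ffffffff, the last byte the architecture allows
```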

Machine Code

  The program instructions are stored in memory.  This column displays the
bits of the instruction opcodes and operands in hexadecimal.

Mnemonic

  This shows the assembler mnemonic for the machine instructions.
  http://www.online.ee/~andre/i80386/Opcodes/index.html

Operands

  This shows the operands specified for the machine instructions.
In this listing, '$' introduces an immediate value - a value that
is part of the instruction stream.  '%' introduces a register name.
Hexadecimal values start with '0x' (except for jmp and call - which
I'll explain later).
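
As a preview, the targets printed for jns, jmp, and call in the listing can
be recomputed from the machine code bytes.  The sketch below assumes the
three forms that appear there (opcodes 79, eb, and e8): a one-byte opcode
followed by a signed little-endian displacement, measured from the address
of the next instruction.  The function name branch_target is made up for
this example.

```python
def branch_target(address, code):
    """Compute the target of a relative jump or call.

    address: offset of the instruction in the listing
    code:    the instruction bytes (one opcode byte, then the displacement)
    """
    displacement = int.from_bytes(code[1:], "little", signed=True)
    next_instruction = address + len(code)
    return next_instruction + displacement

print(hex(branch_target(0x0c, bytes.fromhex("7913"))))        # 0x21
print(hex(branch_target(0x1f, bytes.fromhex("eb11"))))        # 0x32
print(hex(branch_target(0x17, bytes.fromhex("e8fcffffff"))))  # 0x18
print(hex(branch_target(0x2a, bytes.fromhex("e8fcffffff"))))  # 0x2b
```

The results match the targets shown in the listing.  Both call
instructions carry the displacement fc ff ff ff, which is -4 in two's
complement; that is probably a placeholder to be filled in when the
program is linked, but treat that as an assumption.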

The i386 instructions directly operate on data in several formats:

  Format                Sizes (bits)            Notes
  signed integer        8,16,32,64              two's complement
  unsigned integer      8,16,32
  binary coded decimal  8
  IEEE floating point   32,64,80

Additional formats can be handled with multiple instructions, limited
only by the imagination of the programmer.
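
The bit patterns behind some of these formats can be sketched as follows.
The helper names are hypothetical, the BCD helper assumes the packed form
(two decimal digits per byte), and the floating point bytes come from
Python's standard struct module in little-endian order, as the i386 stores
them.

```python
import struct

def twos_complement(value, bits):
    """Return the unsigned bit pattern of a signed integer of the given size."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in %d bits" % bits)
    return value & ((1 << bits) - 1)

def packed_bcd(n):
    """Pack a two-digit decimal number into one byte, one digit per nybble."""
    if not 0 <= n <= 99:
        raise ValueError("packed BCD byte holds 0..99")
    return ((n // 10) << 4) | (n % 10)

print(hex(twos_complement(-8, 8)))      # 0xf8
print(hex(twos_complement(-1, 32)))     # 0xffffffff
print(hex(packed_bcd(42)))              # 0x42
print(struct.pack("<f", 1.0).hex())     # 0000803f (32-bit IEEE)
print(struct.pack("<d", 1.0).hex())     # 000000000000f03f (64-bit IEEE)
```

Note that the BCD byte for 42 reads as 42 when written in hexadecimal,
which is the point of the format.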



