This year, we are diving into the low level.

Week 1 Numbering

Number system bases in history:

  base 2   Binary       binary computers (most modern computers)
  base 3   Ternary      Setun computer http://www.icfcst.kiev.ua/MUSEUM/PHOTOS/Setun-1.html
  base 5                Riven computer game, sequel to Myst
  base 8   Octal        Used by software as shorthand for binary
  base 9   Nonary       Used by software as shorthand for ternary computers
  base 10  Decimal      Roman empire, some early computers
  base 12               Saxons (dozen, gross)
  base 16  Hexadecimal  Used by software as shorthand for binary
  base 20               Mayan empire
  base 60  Sexagesimal  Babylonian http://en.wikipedia.org/wiki/Sexagesimal
  base 64               Used in MIME (email attachments) and other software
  base 95               A less common encoding also used in MIME and elsewhere

A bit is the smallest unit of information. It has two states. A byte is composed of 8 bits. Four bits are called a nybble. A 3-state digit in a base 3 computer is called a trit.

One kilobyte is 2^10, or 1024 bytes. One megabyte is 2^20 bytes. Each prefix has a binary meaning and a nearby decimal meaning:

  Binary  Prefix  Decimal
  2^10    kilo    10^3
  2^20    mega    10^6
  2^30    giga    10^9
  2^40    tera    10^12
  2^50    peta    10^15
  2^60    exa     10^18
  2^70    zetta   10^21
  2^80    yotta   10^24

Week 2 Coding

Coding in history:

Gödel's theorem uses coding to create self-reference in Typographic Number Theory (TNT) by mapping symbols of TNT to codons.

Morse code used a system of two symbols plus spacing to represent letters, digits, and punctuation on the telegraph.

Early teletypewriters used 5-bit Baudot code. Several codes are reserved as control codes to allow letters and figures to share the same code space: http://en.wikipedia.org/wiki/Baudot_code

EBCDIC is an 8-bit code used on IBM mainframes; it grew out of IBM's punched card codes.

ASCII is a 7-bit code which is the basis of nearly all modern character codes.

Extended ASCII extends ASCII to 8 bits. Each region invented its own coding for the 128 additional characters. These were called "code pages".

Unicode originally extended ASCII to 16 bits. The characters of the extended ASCII code pages are assigned code space. In addition, Chinese characters are represented.
Unicode was later extended beyond 16 bits; every character fits in 32 bits (the UTF-32 form). The added code space handles less common Chinese characters, ancient alphabets, and proposed fictional alphabets (Klingon, Elvish, etc.).

Our initial goal is to understand machine language, like in the following:

  Address  Machine Code    Mnemonic  Operands
   0:      55              push      %ebp
   1:      89 e5           mov       %esp,%ebp
   3:      83 ec 08        sub       $0x8,%esp
   6:      8b 45 0c        mov       0xc(%ebp),%eax
   9:      03 45 08        add       0x8(%ebp),%eax
   c:      79 13           jns       21
   e:      83 ec 08        sub       $0x8,%esp
  11:      50              push      %eax
  12:      68 00 00 00 00  push      $0x0
  17:      e8 fc ff ff ff  call      18
  1c:      83 c4 10        add       $0x10,%esp
  1f:      eb 11           jmp       32
  21:      83 ec 08        sub       $0x8,%esp
  24:      50              push      %eax
  25:      68 05 00 00 00  push      $0x5
  2a:      e8 fc ff ff ff  call      2b
  2f:      83 c4 10        add       $0x10,%esp
  32:      b8 00 00 00 00  mov       $0x0,%eax
  37:      c9              leave
  38:      c3              ret

The Intel "architecture" of our computer describes the format of instructions, data, and addresses. It is called i386.

The columns of the listing are:

Address
  Our class computer has 256 megabytes of memory. Each byte is numbered from 0 to 2^28-1. The architecture provides for up to 2^32 bytes of memory. A full i386 address is usually written as 8 hexadecimal digits. This listing shows the offset from some arbitrary starting address.

Machine Code
  The program instructions are stored in memory. This column displays the bytes of the instruction opcodes and operands in hexadecimal.

Mnemonic
  This shows the assembler mnemonic for the machine instructions. http://www.online.ee/~andre/i80386/Opcodes/index.html

Operands
  This shows the operands specified for the machine instructions. In this listing, '$' introduces an immediate value - a value that is part of the instruction stream. '%' introduces a register name. Hexadecimal values start with '0x' (except for the targets of jumps and calls, which I'll explain later).

The i386 instructions directly operate on data in several formats:

  Format                Sizes (bits)
  signed integer        8,16,32,64    two's complement
  unsigned integer      8,16,32
  binary coded decimal  8
  IEEE floating point   32,64,80

Additional formats can be handled with multiple instructions, limited only by the imagination of the programmer.
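
To make the signed and unsigned integer formats concrete, here is a minimal Python sketch of how the same bits can be read either way. The function names to_signed and branch_target are made up for this illustration; they are not part of any i386 tool. The second function assumes that a jump or call operand is a signed displacement added to the address of the next instruction, which is consistent with the addresses and bytes in the listing above.

```python
def to_signed(value, bits):
    """Read an unsigned bit pattern as a two's complement signed integer."""
    mask = (1 << bits) - 1
    value &= mask                      # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the pattern stands for value - 2^bits.
    return value - (1 << bits) if value & sign_bit else value


def branch_target(address, length, displacement, bits):
    """Hypothetical helper: target of a relative jump or call.

    Assumes the displacement is relative to the address of the
    instruction that follows the jump or call.
    """
    next_instruction = address + length
    return (next_instruction + to_signed(displacement, bits)) & 0xFFFFFFFF


if __name__ == "__main__":
    # The byte 0xfc is 252 as an unsigned integer and -4 as a signed one.
    print(0xFC, to_signed(0xFC, 8))

    # "c: 79 13  jns 21": a 2-byte instruction with displacement 0x13.
    print(hex(branch_target(0x0C, 2, 0x13, 8)))

    # "17: e8 fc ff ff ff  call 18": the four operand bytes, read least
    # significant byte first, form 0xfffffffc, which is -4 when signed.
    operand = int.from_bytes(bytes.fromhex("fcffffff"), "little")
    print(hex(branch_target(0x17, 5, operand, 32)))
```

Under these assumptions the computed targets agree with the 21 and 18 shown in the Operands column of the listing.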