II-INFORMATIK3 WS04
VL7-11: Programming with the General-Purpose Instructions of the IA-32 Architecture

    Note: these notes reproduce the oral lecture only in part.
    Note: these notes are not yet complete.

AUTHOR: Gerd Döben-Henisch
DATE OF FIRST GENERATION: Oct-8, 2004
DATE OF LAST CHANGE: Jan-23, 2005
EMAIL: doeben_at_fb2.fh-frankfurt.de

1. Introduction

After the preceding overviews of the IA-32 architecture and of the operating-system context, important properties of the IA-32 architecture are now discussed further using concrete program examples. We again essentially follow the Intel Software Developer's Manual, Vol. 1-3, as well as the excellent book by [BREY 2003].

2. General Execution Environment

[Figure: general execution environment]

Intel instructions follow a general format, shown in the figure below. For now we consider only those cases in which an opcode, a ModR/M byte, and an immediate operand occur.

[Figure: Intel instruction format]

The general-purpose instructions form a subset of the IA-32 instructions. They go back to the Intel 8086 and 8088, the 16-bit processors from which the IA-32 architecture descends. The general-purpose instructions cover data movement, memory addressing, arithmetic and logic, program flow control, input/output, and string operations on a set of integer, pointer, and BCD data types.

One key to understanding a CPU is the fetch-decode cycle: the CPU reads a certain number of bytes from main memory (RAM := Random Access Memory), decodes these bytes, and activates the part of the microcode store identified by the decoded bytes. In the CPU this decoding is done by dedicated hardware. For teaching purposes the process can be simulated in software by decoding the bytes with respect to an abstract CPU model (see figure).

[Figure: fetch-decode cycle]

Using a simple code example from the preceding lecture "Von C zum binären Kode" ("From C to Binary Code"), we now try to reconstruct such an abstract model from the official Intel documents.
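The fetch step of such a software model can be sketched in C. This is only a minimal sketch, not Intel's decoder: the function names modrm_bytes and insn_length are hypothetical, only the handful of opcodes that occur in the example of section 3 are covered, and 32-bit operand and address size without prefixes is assumed. The sketch merely determines how many bytes belong to the instruction at a given position, which is what a fetch step needs in order to find the next instruction.

```c
#include <stddef.h>

/* Bytes taken by ModR/M, an optional SIB byte and an optional
 * displacement (32-bit addressing assumed). p points at the ModR/M byte. */
size_t modrm_bytes(const unsigned char *p)
{
    unsigned mod = p[0] >> 6, rm = p[0] & 7u;
    size_t n = 1;                              /* the ModR/M byte itself */

    if (mod == 3) return n;                    /* register operand       */
    if (rm == 4) {                             /* SIB byte follows       */
        n += 1;
        if (mod == 0 && (p[1] & 7u) == 5) n += 4;   /* SIB base = disp32 */
    }
    if (mod == 1) n += 1;                      /* disp8                  */
    else if (mod == 2) n += 4;                 /* disp32                 */
    else if (mod == 0 && rm == 5) n += 4;      /* disp32, no base        */
    return n;
}

/* Length of the instruction starting at p, or 0 if the opcode is not
 * one of the few modeled here. The caller must provide enough bytes. */
size_t insn_length(const unsigned char *p)
{
    unsigned op = p[0];

    if (op >= 0x50 && op <= 0x57) return 1;                 /* push r32        */
    if (op >= 0xb8 && op <= 0xbf) return 1 + 4;             /* mov imm32,r32   */
    if (op == 0x89 || op == 0x8b || op == 0x29 || op == 0x03)
        return 1 + modrm_bytes(p + 1);                      /* mov/sub/add     */
    if (op == 0x83) return 1 + modrm_bytes(p + 1) + 1;      /* group 1 + imm8  */
    if (op == 0xc7) return 1 + modrm_bytes(p + 1) + 4;      /* mov imm32,r/m32 */
    if (op == 0x6a) return 2;                               /* push imm8       */
    if (op == 0xe8) return 5;                               /* call rel32      */
    if (op == 0xc9 || op == 0xc3) return 1;                 /* leave, ret      */
    return 0;
}
```

Calling insn_length repeatedly and advancing by the returned length walks through the byte sequence of main one instruction at a time; a return value of 0 marks an opcode the sketch does not know.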

3. Analysis of an Example

The following code was displayed with the command objdump (alternatively, the byte code can be viewed with the tool mc (:= Midnight Commander), which offers further useful functions):

gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL4> objdump -d bsp1

...

0804835c <main>:
 804835c:       55                      push   %ebp
 804835d:       89 e5                   mov    %esp,%ebp
 804835f:       83 ec 08                sub    $0x8,%esp
 8048362:       83 e4 f0                and    $0xfffffff0,%esp
 8048365:       b8 00 00 00 00          mov    $0x0,%eax
 804836a:       29 c4                   sub    %eax,%esp
 804836c:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
 8048373:       83 ec 08                sub    $0x8,%esp
 8048376:       6a 03                   push   $0x3
 8048378:       6a 01                   push   $0x1
 804837a:       e8 0d 00 00 00          call   804838c <sum>
 804837f:       83 c4 10                add    $0x10,%esp
 8048382:       89 45 fc                mov    %eax,0xfffffffc(%ebp)
 8048385:       b8 00 00 00 00          mov    $0x0,%eax
 804838a:       c9                      leave  
 804838b:       c3                      ret    

0804838c <sum>:
 804838c:       55                      push   %ebp
 804838d:       89 e5                   mov    %esp,%ebp
 804838f:       83 ec 04                sub    $0x4,%esp
 8048392:       8b 45 0c                mov    0xc(%ebp),%eax
 8048395:       03 45 08                add    0x8(%ebp),%eax
 8048398:       89 45 fc                mov    %eax,0xfffffffc(%ebp)
 804839b:       8b 45 fc                mov    0xfffffffc(%ebp),%eax
 804839e:       c9                      leave  
 804839f:       c3                      ret   
...

The program objdump (part of the GNU binutils package) interprets the bytes of the memory region in which the program is stored as a sequence of instructions, each consisting of an instruction mnemonic and operands.

The current layout of the memory regions on the machine can be read from the file /proc/iomem:

00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c9000-000cb7ff : Extension ROM
000e0000-000effff : Extension ROM
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
  00100000-002ec700 : Kernel code
  002ec701-00395dff : Kernel data
3fff0000-3fff7fff : ACPI Tables
3fff8000-3fffffff : ACPI Non-volatile Storage
40000000-400003ff : 0000:00:1f.1
f3f00000-f7efffff : PCI Bus #01
  f4000000-f5ffffff : 0000:01:00.0
    f4000000-f4ffffff : vesafb
f8000000-fbffffff : 0000:00:00.0
fd900000-fe9fffff : PCI Bus #01
  fe000000-fe7fffff : 0000:01:00.0
  fe9fc000-fe9fffff : 0000:01:00.0
feafff00-feafffff : 0000:02:0b.0
  feafff00-feafffff : 8139too
febff900-febff9ff : 0000:00:1f.5
  febff900-febff9ff : Intel ICH5 - Controller
febffa00-febffbff : 0000:00:1f.5
  febffa00-febffbff : Intel ICH5 - AC'97
febffc00-febfffff : 0000:00:1d.7
  febffc00-febfffff : ehci_hcd
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
fff00000-ffffffff : reserved

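Numerically, the program's address lies above the kernel's range. Note, however, that /proc/iomem lists physical addresses, whereas objdump shows virtual addresses of the process, so the comparison gives only a rough orientation.

Here is an analysis. It first proceeds line by line. The starting point of the routine main is the memory address 0804835c:

0804835c <main>: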

If the example program bsp1 is loaded into gdb, no registers are available at first, because the program is not yet running:

gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL6> gdb bsp1
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) info register
The program has no registers now.
(gdb) $pc
Undefined command: "".  Try "help".
(gdb) p/x $pc
No registers.
(gdb) info f
No stack.

To obtain information about the registers at program start, the program has to be started. So that the initial state is not lost, it must be stopped again immediately after the start by a breakpoint. This is done as follows:


gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL6> gdb bsp1
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) l
5        * idea: illustrate conversion to objectprogram
6        *
7        * cf. [BRYANT 2003:chapt. 3]
8        *
9        **********************************/
10
11
12      int main(){
13
14        return sum(1,3);
(gdb) b 5
Breakpoint 1 at 0x804835c: file bsp1_main.c, line 5.

One can see that this breakpoint lies exactly at the first address at which, according to objdump, main begins.

If the program is now run and the registers are printed, one sees that they hold values:

(gdb) run
Starting program: /home/gerd/public_html/fh/II-INF3/II-INF3-TH/VL6/bsp1

Breakpoint 1, main () at bsp1_main.c:12
12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef7c       0xbfffef7c
ebp            0xbfffefd8       0xbfffefd8
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804835c        0x804835c
eflags         0x200246 2097734
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) p/x $pc
$1 = 0x804835c
(gdb) p/x $sp
$2 = 0xbfffef7c
(gdb) p/x $ps
$3 = 0x200246

With the command disas FROM TO, specific memory ranges can be disassembled, for example:

(gdb) disas 0x804835c 0x804835e
Dump of assembler code from 0x804835c to 0x804835e:
0x0804835c :    push   %ebp
0x0804835d :    mov    %esp,%ebp
End of assembler dump.

One can also display individual instructions at particular memory locations. The command x/i disassembles the location 0x804835c in assembler format, and the command x/x shows its contents in hex format as 0x83e58955. Because IA-32 is little-endian, the byte at the lowest address appears as the least significant byte of this word, so only 0x______55 matters here. That is the code for the instruction push.

(gdb) x/i 0x804835c
0x804835c :       push   %ebp
(gdb) x/x 0x804835c
0x804835c :       0x83e58955
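The relation between the byte sequence in memory and the word printed by x/x can be checked with a few lines of C. This is a small sketch; the function name le32 is a hypothetical helper, not part of gdb.

```c
#include <stdint.h>

/* Assemble four consecutive bytes into the 32-bit word that a
 * little-endian machine (and gdb's x/x) shows for them:
 * the byte at the lowest address becomes the least significant byte. */
uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Applied to the first four bytes of main from the objdump listing (55 89 e5 83), the function yields 0x83e58955, the value shown by x/x.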

3.1 push

The code byte 0x55 stands for the instruction push:

804835c:    55    push    %ebp

Pushes the contents of the EBP register onto the stack in the SS segment.
%ebp = 0xbfffefd8
%esp = 0xbfffef7c ---(-4)---> 0xbfffef78

Intel writes:

"The PUSH, POP, PUSHA (push all registers), and POPA (pop all registers) instructions move data to and from the stack. The PUSH instruction decrements the stack pointer (contained in the ESP register), then copies the source operand to the top of stack (see Figure). It operates on memory operands, immediate operands, and register operands (including segment registers). The PUSH instruction is commonly used to place parameters on the stack before calling a procedure. It can also be used to reserve space on the stack for temporary variables."

The PUSHA instruction saves the contents of the eight general-purpose registers on the stack (see Figure). This instruction simplifies procedure calls by reducing the number of instructions required to save the contents of the general-purpose registers. The registers are pushed on the stack in the following order: EAX, ECX, EDX, EBX, the initial value of ESP before EAX was pushed, EBP, ESI, and EDI.

The POP instruction copies the word or doubleword at the current top of stack (indicated by the ESP register) to the location specified with the destination operand. It then increments the ESP register to point to the new top of stack (see Figure ). The destination operand may specify a general-purpose register, a segment register, or a memory location.

The POPA instruction reverses the effect of the PUSHA instruction. It pops the top eight words or doublewords from the top of the stack into the general-purpose registers, except for the ESP register (see Figure ). If the operand-size attribute is 32, the doublewords on the stack are transferred to the registers in the following order: EDI, ESI, EBP, ignore doubleword, EBX, EDX, ECX, and EAX. The ESP register is restored by the action of popping the stack. If the operand-size attribute is 16, the words on the stack are transferred to the registers in the following order: DI, SI, BP, ignore word, BX, DX, CX, and AX.
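The behaviour of PUSH and POP described in these passages can be illustrated with a toy model in C. This is a sketch under simplifying assumptions: the type ToyStack and the functions push32 and pop32 are hypothetical, only 32-bit operands are modeled, segments are ignored, and there is no bounds checking.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* Toy stack: mem[] models the addresses [top - STACK_BYTES, top). */
typedef struct {
    uint32_t esp;                     /* simulated stack pointer        */
    uint32_t top;                     /* first address above the stack  */
    unsigned char mem[STACK_BYTES];
} ToyStack;

/* Map a simulated address to its cell in mem[] (no bounds check). */
static unsigned char *cell(ToyStack *s, uint32_t addr)
{
    return &s->mem[addr - (s->top - STACK_BYTES)];
}

/* PUSH: first decrement ESP by 4, then store the operand (little-endian). */
void push32(ToyStack *s, uint32_t value)
{
    int i;
    s->esp -= 4;
    for (i = 0; i < 4; i++)
        *cell(s, s->esp + (uint32_t)i) = (unsigned char)(value >> (8 * i));
}

/* POP: first read the doubleword at ESP, then increment ESP by 4. */
uint32_t pop32(ToyStack *s)
{
    uint32_t value = 0;
    int i;
    for (i = 0; i < 4; i++)
        value |= (uint32_t)*cell(s, s->esp + (uint32_t)i) << (8 * i);
    s->esp += 4;
    return value;
}
```

Starting with esp = 0xbfffef7c and pushing the value 0xbfffefd8 of %ebp leaves esp at 0xbfffef78, the step shown above for push %ebp; a subsequent pop returns the value and restores esp.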

In the general opcode table the specific value "0x55" is specified only indirectly. The table states "0x50+r", where "r" is the number of a register. The registers are numbered 0 ... 7 in the order EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, so the number "5" denotes the %ebp register.

The following general-purpose registers exist:

EAX Accumulator for operands and results data
EBX Pointer to data in the DS segment
ECX Counter for string and loop operations
EDX I/O pointer
ESI Pointer to data in the segment pointed to by the DS register; source pointer for string operations
EDI Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
ESP Stack pointer (in the SS segment)
EBP Pointer to data on the stack (in the SS segment)

So the task is to put the EBP pointer onto the stack. After the push, ESP has dropped by 4 and EIP points to the next instruction:

(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef78       0xbfffef78
ebp            0xbfffefd8       0xbfffefd8
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804835d        0x804835d
eflags         0x200346 2097990
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) x/i 0x804835d
0x804835d :     mov    %esp,%ebp
(gdb) x/x 0x804835d
0x804835d :     0x......89
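The rule "0x50+r" can be written down directly in C. This is a small sketch; the table REG32 and the function push_reg are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* 32-bit general-purpose registers in the order of their numbers 0..7. */
static const char *const REG32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Returns the register pushed by a one-byte "push r32" opcode
 * (0x50 + r), or NULL if the byte is not in the range 0x50..0x57. */
const char *push_reg(unsigned char opcode)
{
    if (opcode < 0x50 || opcode > 0x57)
        return NULL;
    return REG32[opcode - 0x50];
}
```

For the byte 0x55 the function returns "ebp", in agreement with the disassembler.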

3.2 mov

Next we find the instruction mov.

804835d:    89 e5    mov    %esp,%ebp

Copies the contents of the ESP register into the EBP register (in the AT&T syntax used by objdump, the source operand comes first).

On the instruction mov, Intel writes:

"Move instructions. The MOV (move) and CMOVcc (conditional move) instructions transfer data between memory and registers or between registers. The MOV instruction performs basic load data and store data operations between memory and the processor's registers and data movement operations between registers. It handles data transfers along the paths listed in the Table below".

NOTES:

* The moffs8, moffs16, and moffs32 operands specify a simple offset relative to the segment base, where 8, 16, and 32 refer to the size of the data. The address-size attribute of the instruction determines the size of the offset, either 16 or 32 bits.

** In 32-bit mode, the assembler may insert the 16-bit operand-size prefix with this instruction (see the following Description section for further information).

Copies the second operand (source operand) to the first operand (destination operand).

The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination register can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.

From the opcode table (not reproduced here) one can see that for the opcode 0x89 the second byte of the mov instruction encodes the operands: from Gv to Ev.

Ev: The ModR/M byte follows the opcode to specify a word or doubleword operand. Gv : The reg field of the ModR/M byte selects a general-purpose register

This raises the question of what the ModR/M byte is. Section 2.4 of the Intel Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M, contains the following text:

MODR/M AND SIB BYTES
Many instructions that refer to an operand in memory have an addressing-form specifier byte (called the ModR/M byte) following the primary opcode. The ModR/M byte contains three fields of information:

The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes.
The reg/opcode field specifies either a register number or three more bits of opcode information. The purpose of the reg/opcode field is specified in the primary opcode.
The r/m field can specify a register as an operand or it can be combined with the mod field to encode an addressing mode. Sometimes, certain combinations of the mod field and the r/m field is used to express opcode information for some instructions.

Certain encodings of the ModR/M byte require a second addressing byte (the SIB byte). The base-plus-index and scale-plus-index forms of 32-bit addressing require the SIB byte. The SIB byte includes the following fields:

The scale field specifies the scale factor.
The index field specifies the register number of the index register.
The base field specifies the register number of the base register.

This results in the following interplay of opcode and ModR/M byte:

Opcode						D	W

D = 0 := data flows from REG to R/M

D = 1 := data flows from R/M to REG

W = 0 := operand size is one byte

W = 1 := operand size is a word or a doubleword

MOD		REG			R/M

MOD = 11 := register addressing mode; the R/M field then specifies a register and not a memory location.

MOD = 00 := data memory addressing with no displacement

MOD = 01 := data memory addressing with an 8-bit sign-extended displacement

MOD = 10 := data memory addressing with a 16-bit (32-bit) displacement

Sign-extension := the sign bit (0 or 1) is replicated to fill the 16-bit frame: '00h' becomes '0000h' and '80h' becomes 'ff80h'
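Sign extension can be made concrete with two short C functions. This is a sketch; the names sext8_to_16 and sext8_to_32 are hypothetical. The 32-bit variant is included because the example program uses 32-bit operands.

```c
#include <stdint.h>

/* Extend an 8-bit value to 16 bits by replicating bit 7. */
uint16_t sext8_to_16(uint8_t b)
{
    return (b & 0x80u) ? (uint16_t)(0xff00u | b) : (uint16_t)b;
}

/* Extend an 8-bit value to 32 bits by replicating bit 7. */
uint32_t sext8_to_32(uint8_t b)
{
    return (b & 0x80u) ? (0xffffff00u | b) : (uint32_t)b;
}
```

With these functions 00h becomes 0000h and 80h becomes ff80h, as stated above. In the 32-bit case the displacement byte fc from the listing becomes 0xfffffffc, which is how objdump prints it.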

In the case of MOD = 11 a further relationship applies, which is explained by the following table:

Code	W = 0 (Byte)	W = 1 (Word)	W = 1 (Doubleword)
000	AL	AX	EAX
001	CL	CX	ECX
010	DL	DX	EDX
011	BL	BX	EBX
100	AH	SP	ESP
101	CH	BP	EBP
110	DH	SI	ESI
111	BH	DI	EDI

R/M Code	16-Bit Addressing Mode
000	DS:[BX+SI]
001	DS:[BX+DI]
010	SS:[BP+SI]
011	SS:[BP+DI]
100	DS:[SI]
101	DS:[DI]
110	SS:[BP]
111	DS:[BX]

R/M Code	32-Bit Addressing Mode
000	DS:[EAX]
001	DS:[ECX]
010	DS:[EDX]
011	DS:[EBX]
100	uses scaled index byte
101	SS:[EBP]*
110	DS:[ESI]
111	DS:[EDI]
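Splitting a ModR/M byte into its three fields takes only shifts and masks. The following C fragment is a minimal sketch; the type ModRM and the function decode_modrm are hypothetical names.

```c
/* The three fields of a ModR/M byte. */
typedef struct {
    unsigned mod;   /* bits 7..6: addressing mode                        */
    unsigned reg;   /* bits 5..3: register number or opcode extension    */
    unsigned rm;    /* bits 2..0: register or memory addressing form     */
} ModRM;

ModRM decode_modrm(unsigned char byte)
{
    ModRM m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}
```

For the byte e5 from the mov instruction this gives mod = 3 (binary 11), reg = 4 (100) and rm = 5 (101).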

At this point it becomes clear that interpreting the instruction words requires going down to the bit level. This calls for a small helper program that evaluates the bits with respect to these schemata. A first step is the following simple program. It takes a hex number and converts it into a 32-bit number grouped into blocks of 4 bits:

3.2.1 hex2bits


/**********************
 *
 *  hex2bits.c
 *
 ***************************
 *
 * FUNCTIONALITY:
 *
 * A hex number is read into nx.
 * Its string representation as 1s and 0s is stored in nstring.
 *
 * HELPER FUNCTIONS:
 *
 * void bin2str(unsigned int n, char *str): converts an integer into a string of 1s and 0s
 * void printbin(const char *str): prints a binary string in groups of 4 bits
 *
 * COMPILE: gcc -o hex2bits hex2bits.c
 * RUN:     ./hex2bits   (end the input with Ctrl-D)
 *
 **********************/

#include <stdio.h>

#define NUMBERSTRING_LENGTH 32  /* type 'int' is assumed to have 4 bytes = 4*8 bits = 32 bits */

void bin2str(unsigned int n, char *str);
void printbin(const char *str);

int main(void) {

  unsigned int nx;
  char nstring[NUMBERSTRING_LENGTH+1];

  while (1) {

    /********** INPUT OF A HEX NUMBER nx ************/

    printf("\n\n############################\n");
    printf("BITTE HEX-ZAHL: ");   /* prompt: "please enter a hex number" */

    /* stop at end of input or on anything that is not a hex number */
    if (fscanf(stdin, "%x", &nx) != 1) { break; }

    printf("HEX: %x ==> DEC: %d = BIN: ", nx, (int)nx);

    bin2str(nx, nstring);
    printbin(nstring);

  } /* end of while */

  return 0;
} /* end of main */


/********** CONVERT THE NUMBER INTO 1-0 REPRESENTATION **************/
/*
 * The index NUMBERSTRING_LENGTH-1-i is needed to arrange the 1s and 0s
 * from right to left
 *
 **********************************************/

void bin2str(unsigned int n, char *str) {

  int i;

  for (i = 0; i < NUMBERSTRING_LENGTH; i++) {
    /* 1u avoids shifting into the sign bit of a signed int */
    if (n & (1u << i)) { str[NUMBERSTRING_LENGTH-1-i] = '1'; }
    else { str[NUMBERSTRING_LENGTH-1-i] = '0'; }
  }
  str[NUMBERSTRING_LENGTH] = '\0';

}

/********** PRINT IN GROUPS OF 4 BITS **************/

void printbin(const char *str) {

  int i, j;

  for (i = 0; i < NUMBERSTRING_LENGTH; i += 4) {

    for (j = 0; j < 4; j++) {
      printf("%c", str[i+j]);
    } /* end of j */

    printf("-");

  } /* end of i */

  printf("\n");

}

If the instruction word is now entered in hex format, one obtains:

gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL6> gcc -o hex2bits hex2bits.c
gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL6> ./hex2bits

############################
BITTE HEX-ZAHL: 89e5
HEX: 89e5 ==> DEC: 35301 = BIN: 0000-0000-0000-0000-1000-1001-1110-0101-

Entering these bit patterns into the schemata for opcode and ModR/M byte gives:

Opcode						D	W
1	0	0	0	1	0	0	1

D = 0 := data flows from REG to R/M

W = 1 := operand size is a word or a doubleword

MOD		REG			R/M
1	1	1	0	0	1	0	1

MOD = 11 := register addressing mode; the R/M field then specifies a register and not a memory location.

Sign-extension := the sign bit (0 or 1) is replicated to fill the 16-bit frame: '00h' becomes '0000h' and '80h' becomes 'ff80h'

In the case of MOD = 11 the register assignment for the REG and R/M fields, depending on W, is as follows:

Code	W = 0 (Byte)	W = 1 (Word)	W = 1 (Doubleword)
000	AL	AX	EAX
001	CL	CX	ECX
010	DL	DX	EDX
011	BL	BX	EBX
100	AH	SP	ESP
101	CH	BP	EBP
110	DH	SI	ESI
111	BH	DI	EDI

This is exactly the result of the disassembler: REG = 100 selects ESP and R/M = 101 selects EBP, and with D = 0 the mov instruction copies the contents of the %esp register into the %ebp register.

This example also shows that both an assembler and, in the reverse direction, a disassembler have to perform quite complex translation work.
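The manual decoding of 89 e5 can be repeated in C. This is a minimal sketch covering only the opcode 0x89 with MOD = 11 and 32-bit operands; the function name decode_mov_89 and the table REG32 are hypothetical. It assumes that for this opcode the REG field names the source and the R/M field the destination, as the debugger log in section 3.6 confirms (after the instruction, ebp holds the former value of esp).

```c
#include <stdio.h>

/* 32-bit general-purpose registers in the order of their numbers 0..7. */
static const char *const REG32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Decodes the two bytes "89 /r" with MOD = 11 into AT&T syntax
 * ("mov %src,%dst"). Returns 0 on success, -1 for any other bytes. */
int decode_mov_89(const unsigned char *insn, char *out, size_t outlen)
{
    unsigned mod, reg, rm;

    if (insn[0] != 0x89)
        return -1;
    mod = insn[1] >> 6;
    reg = (insn[1] >> 3) & 7u;
    rm  = insn[1] & 7u;
    if (mod != 3)
        return -1;                  /* memory operands are not modeled */

    /* REG is the source, R/M the destination */
    snprintf(out, outlen, "mov %%%s,%%%s", REG32[reg], REG32[rm]);
    return 0;
}
```

For the bytes 89 e5 the function writes "mov %esp,%ebp" into the buffer.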

3.3 sub

Next comes the arithmetic instruction sub:

804835f:	83 ec 08	sub	$0x8,%esp

Intel writes:

"The ADD (add integers), ADC (add integers with carry), SUB (subtract integers), and SBB (subtract integers with borrow) instructions perform addition and subtraction operations on signed or unsigned integer operands. The ADD instruction computes the sum of two integer operands. The ADC instruction computes the sum of two integer operands, plus 1 if the CF flag is set. This instruction is used to propagate a carry when adding numbers in stages. The SUB instruction computes the difference of two integer operands. The SBB instruction computes the difference of two integer operands, minus 1 if the CF flag is set. This instruction is used to propagate a borrow when subtracting numbers in stages."

The list of all SUB forms is given in the Intel manual; the accompanying description reads:

"Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format. The SUB instruction performs integer subtraction. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate an overflow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result. This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically."

In our case we have:

83 /5 ib := subtract the sign-extended imm8 from r/m32 and store the result in r/m32

The bit analysis gives the following result:

BITTE HEX-ZAHL: 83ec08
HEX: 83ec08 ==> DEC: 8645640 = BIN: 0000-0000-1000-0011-1110-1100-0000-1000-

The last byte, 0000-1000, is the 8-bit immediate 0x08; the processor sign-extends it to 32 bits (0x00000008) before subtracting.

An analysis of opcode and ModR/M byte gives:

Opcode						s	W
1	0	0	0	0	0	1	1

For the opcodes 0x80 ... 0x83, bit 1 is not the direction bit D but the sign-extension bit s; s = 1 := the 8-bit immediate is sign-extended to the operand size

W = 1 := operand size is a word or a doubleword

MOD		REG			R/M
1	1	1	0	1	1	0	0

MOD = 11 := register addressing mode; the R/M field then specifies a register and not a memory location.

REG = 101 := here the field does not select a register but extends the opcode (the "/5" in "83 /5 ib"), which selects SUB

R/M = 100 := ESP (see table)

Sign-extension := the sign bit (0 or 1) is replicated to fill the 16-bit frame: '00h' becomes '0000h' and '80h' becomes 'ff80h'

In the case of MOD = 11 the register assignment for the R/M field, depending on W, is as follows:

Code	W = 0 (Byte)	W = 1 (Word)	W = 1 (Doubleword)
000	AL	AX	EAX
001	CL	CX	ECX
010	DL	DX	EDX
011	BL	BX	EBX
100	AH	SP	ESP
101	CH	BP	EBP
110	DH	SI	ESI
111	BH	DI	EDI

Accordingly, the constant 0x8 is subtracted from the contents of the register and the result is stored there again. The table gives ESP for R/M = 100, which agrees with the disassembler output; %esp is the stack pointer, so the instruction reserves 8 bytes on the stack.
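The effect of SUB on the result and on the status flags can be sketched in C. This is an illustrative model, not Intel's definition: the type Flags and the function sub32 are hypothetical, only 32-bit operands are handled, and the AF flag is left out.

```c
#include <stdint.h>

/* The status flags modeled here (AF is omitted). */
typedef struct { int cf, pf, zf, sf, of; } Flags;

/* Computes dst - src for 32-bit operands and derives the flags. */
uint32_t sub32(uint32_t dst, uint32_t src, Flags *f)
{
    uint32_t res = dst - src;
    unsigned low = res & 0xffu, ones = 0;

    while (low) { ones += low & 1u; low >>= 1; }      /* count 1 bits of low byte */

    f->cf = dst < src;                                /* unsigned borrow          */
    f->pf = (ones % 2u) == 0;                         /* even parity of low byte  */
    f->zf = res == 0;
    f->sf = (int)((res >> 31) & 1u);                  /* sign bit of the result   */
    f->of = (int)((((dst ^ src) & (dst ^ res)) >> 31) & 1u);  /* signed overflow  */
    return res;
}
```

With dst = 0xbfffef78 and src = 8 the result is 0xbfffef70 with SF = 1 and CF = PF = ZF = OF = 0. This fits the debugger log in section 3.6, where eflags changes from 0x200346 to 0x200382 across the sub instruction: the PF and ZF bits are cleared and the SF bit is set.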

3.4 and

The code byte 0x83, together with the opcode extension /4 in the REG field of the ModR/M byte, stands for the instruction and.

8048362:    83 e4 f0    and    $0xfffffff0,%esp

[Figure: Intel AND instruction]

Intel writes: Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0. This instruction can be used with a LOCK prefix to allow it to be executed atomically.

Entering the byte pattern as a hex number gives:

BITTE HEX-ZAHL: 83e4f0
HEX: 83e4f0 ==> DEC: 8643824 = BIN: 0000-0000-1000-0011-1110-0100-1111-0000-

Opcode						D	W
1	0	0	0	0	0	1	1

Bit 1 = 1 := for the opcodes 0x80 ... 0x83 this bit is not a direction bit but the sign-extension bit s: the 8-bit immediate 0xf0 is sign-extended to 0xfffffff0

W = 1 := operand size is a word or a doubleword

MOD		REG			R/M
1	1	1	0	0	1	0	0

MOD = 11 := register addressing mode; the R/M field then specifies a register and not a memory location.

REG = 100 := opcode extension /4, which selects AND; R/M = 100 := ESP

Sign-extension := the sign bit (0 or 1) is replicated to fill the 16-bit frame: '00h' becomes '0000h' and '80h' becomes 'ff80h'

In the case of MOD = 11 the register assignment for the R/M field, depending on W, is as follows:

Code	W = 0 (Byte)	W = 1 (Word)	W = 1 (Doubleword)
000	AL	AX	EAX
001	CL	CX	ECX
010	DL	DX	EDX
011	BL	BX	EBX
100	AH	SP	ESP
101	CH	BP	EBP
110	DH	SI	ESI
111	BH	DI	EDI
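Since sub, and and add in the example all share the opcode byte 0x83 and differ only in the REG field of the ModR/M byte, the dispatch can be sketched in C. This is a minimal sketch: the function name exec_83 is hypothetical, only MOD = 11 and the three extensions /0 (ADD), /4 (AND) and /5 (SUB) that occur in the listing are modeled, and flags are ignored.

```c
#include <stdint.h>

/* Executes "83 /r ib" with MOD = 11 on a register file indexed by
 * register number (4 = ESP). Returns 0 on success, -1 otherwise. */
int exec_83(uint32_t regs[8], const unsigned char *insn)
{
    unsigned mod, ext, rm;
    uint32_t imm;

    if (insn[0] != 0x83) return -1;
    mod = insn[1] >> 6;
    ext = (insn[1] >> 3) & 7u;      /* REG field used as opcode extension */
    rm  = insn[1] & 7u;
    if (mod != 3) return -1;        /* memory operands are not modeled    */

    imm = insn[2];
    if (imm & 0x80u) imm |= 0xffffff00u;    /* sign-extend the imm8 */

    switch (ext) {
    case 0: regs[rm] += imm; break;         /* /0 = ADD */
    case 4: regs[rm] &= imm; break;         /* /4 = AND */
    case 5: regs[rm] -= imm; break;         /* /5 = SUB */
    default: return -1;                     /* other extensions not modeled */
    }
    return 0;
}
```

Starting with ESP = 0xbfffef78, the bytes 83 ec 08 give 0xbfffef70, and 83 e4 f0 leaves this value unchanged because its low four bits are already zero. Both values agree with the debugger log in section 3.6.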

3.5 mov (2)

The code byte 0xb8 stands for the instruction mov; according to the table (see above) it is the form mov immediate to register.

8048365:    b8 00 00 00 00    mov    $0x0,%eax

 
BITTE HEX-ZAHL: b8000000
HEX: b8000000 ==> DEC: -1207959552 = BIN: 1011-1000-0000-0000-0000-0000-0000-0000-

Opcode				W	REG
1	0	1	1	1	0	0	0

This short form of mov does not follow the D/W schema above. Its opcode byte has the layout 1011 W REG:

W = 1 := operand size is a word or a doubleword

REG = 000 := EAX (see register table above)

No ModR/M byte follows. The four bytes after the opcode, 00 00 00 00, are the 32-bit immediate value in little-endian order, which is copied directly into the register. The addressing-mode tables for the R/M field therefore do not apply to this instruction.
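One way to resolve this instruction is to treat 0xb8 as the short form "B8+r" of mov, in which the low three bits of the opcode byte name the destination register and the next four bytes hold the immediate value in little-endian order, without any ModR/M byte. The following C sketch works under this assumption; the function name decode_mov_imm32 is hypothetical and 32-bit operand size is assumed.

```c
#include <stdint.h>

/* Decodes "B8+r imm32": stores the register number in *reg and the
 * immediate in *imm. Returns 0 on success, -1 for other opcodes.
 * The caller must provide five bytes. */
int decode_mov_imm32(const unsigned char *insn, unsigned *reg, uint32_t *imm)
{
    if ((insn[0] & 0xf8u) != 0xb8u)     /* opcode pattern 1011 1 rrr */
        return -1;
    *reg = insn[0] & 7u;                /* 0 = EAX ... 7 = EDI */
    *imm = (uint32_t)insn[1]
         | ((uint32_t)insn[2] << 8)
         | ((uint32_t)insn[3] << 16)
         | ((uint32_t)insn[4] << 24);
    return 0;
}
```

For b8 00 00 00 00 this yields register 0 (EAX) and the immediate 0, which matches the disassembler output mov $0x0,%eax.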

3.6 Debugging

The program run as seen by the debugger:


gerd@kant:~/public_html/fh/II-INF3/II-INF3-TH/VL6> gdb bsp1
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) b 5
Breakpoint 1 at 0x804835c: file bsp1_main.c, line 5.
(gdb) r
Starting program: /home/gerd/public_html/fh/II-INF3/II-INF3-TH/VL6/bsp1

Breakpoint 1, main () at bsp1_main.c:12
12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef7c       0xbfffef7c
ebp            0xbfffefd8       0xbfffefd8
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804835c        0x804835c
eflags         0x200246 2097734
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) si
0x0804835d      12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef78       0xbfffef78
ebp            0xbfffefd8       0xbfffefd8
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804835d        0x804835d
eflags         0x200346 2097990
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) si
0x0804835f      12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef78       0xbfffef78
ebp            0xbfffef78       0xbfffef78
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804835f        0x804835f
eflags         0x200346 2097990
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) si
0x08048362      12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef70       0xbfffef70
ebp            0xbfffef78       0xbfffef78
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x8048362        0x8048362
eflags         0x200382 2098050
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) si
0x08048365      12      int main(){
(gdb) info register
eax            0xbffff004       -1073745916
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef70       0xbfffef70
ebp            0xbfffef78       0xbfffef78
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x8048365        0x8048365
eflags         0x200382 2098050
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) si
0x0804836a      12      int main(){
(gdb) info register
eax            0x0      0
ecx            0x400474c5       1074033861
edx            0x1      1
ebx            0x40143bd0       1075067856
esp            0xbfffef70       0xbfffef70
ebp            0xbfffef78       0xbfffef78
esi            0x400168c0       1073834176
edi            0x80483a0        134513568
eip            0x804836a        0x804836a
eflags         0x200382 2098050
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb)
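The register changes in this log can be reproduced with a few lines of C. This is a sketch only: the type Regs and the function run_prologue are hypothetical, just the registers that change are modeled, and the memory write performed by push is left out.

```c
#include <stdint.h>

/* Only the registers that change during the first five instructions. */
typedef struct { uint32_t eax, esp, ebp; } Regs;

/* Applies the register effects of the first five instructions of main. */
void run_prologue(Regs *r)
{
    r->esp -= 4;              /* 55              push %ebp             */
    r->ebp  = r->esp;         /* 89 e5           mov  %esp,%ebp        */
    r->esp -= 8;              /* 83 ec 08        sub  $0x8,%esp        */
    r->esp &= 0xfffffff0u;    /* 83 e4 f0        and  $0xfffffff0,%esp */
    r->eax  = 0;              /* b8 00 00 00 00  mov  $0x0,%eax        */
}
```

Starting from the values at the breakpoint (eax = 0xbffff004, esp = 0xbfffef7c, ebp = 0xbfffefd8), the function ends with eax = 0x0, esp = 0xbfffef70 and ebp = 0xbfffef78, the values gdb shows after the fifth si.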

4. Commands Used in gcc/gdb

gcc command	Meaning
Editor file.{c,s}	Edit a {C, assembler} file
gcc -S file.c ...	Generate an assembler file from a C file
gcc -g -o target file.s ...	Generate a binary named target (-o) from assembler files that can be debugged with gdb (-g)

gdb command	Meaning

(gdb) disas 0x32c4 0x32e4	Disassemble the memory range from 0x32c4 to 0x32e4
x/i address	Show the contents at address as an assembler instruction
x/x address	Show the contents at address in hex format

p/x ......	Print the contents of ... in hexadecimal on the screen
$pc [:= %eip]	Program counter
$sp [:= %esp]	Stack pointer
$ps	Processor status

stepi [si]	Execute exactly one instruction

5. Test Questions and Exercises

What is the name of the operation cycle by which instructions get from memory to the CPU for processing?
Which elements of the general instruction format were covered in the lecture? Which are still missing?
What is the function of the ModR/M byte?
Which software tool under Linux displays a binary program residing in memory as assembler instructions and byte code?
What does the gdb command disas 0x32c4 0x32e4 do?
What does the gdb command x/i address do?
What does the gdb command x/x address do?
What does the gdb command p/x ...... do?
What does $pc refer to in gdb?
What does $sp refer to in gdb?
What does $ps refer to in gdb?
What does the gdb command stepi do?
