:class: tip
This lecture will cover contents from Chapter 6 and Chapter 7 of the book.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
## Intel x86 processors
- Dominate laptop/desktop/server market
- Evolutionary design
- Backwards compatible up until 8086, introduced in 1978
- Added more features as time goes on
- x86 is a Complex Instruction Set Computer (CISC)
- Many different instructions with many different formats
- But, only small subset encountered with Linux programs
- Compare: Reduced Instruction Set Computer (RISC)
- RISC: *very few* instructions, with *very few* modes for each
- RISC can be quite fast (but Intel still wins on speed!)
- Current RISC renaissance (e.g., ARM, RISC V), especially for low-power
- Building blocks modern electronics
- Two transistors used to design an AND gate:
<figure
>
<picture>
<!-- Auto scaling with imagemagick -->
<!--
See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
-->
<source
class="responsive-img-srcset"
srcset="/assets/img/courses/csc231/04-machine/TransistorANDgate-480.webp 480w,/assets/img/courses/csc231/04-machine/TransistorANDgate-800.webp 800w,/assets/img/courses/csc231/04-machine/TransistorANDgate-1400.webp 1400w,"
type="image/webp"
sizes="95vw"
>
<img
src="/assets/img/courses/csc231/04-machine/TransistorANDgate.png"
width="50%"
height="auto"
alt="AND gate created using transistors"
data-zoomable
loading="lazy"
onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
>
</picture>
</figure>
:::::{tab-set}
::::{tab-item} Intel
| Name | Date | Transistor Counts |
| --------------- | ---- | ----------------- |
| 386 | 1985 | 0.3M |
| Pentium | 1993 | 3.1M |
| Pentium/MMX | 1997 | 4.5M |
| Pentium Pro | 1995 | 6.5M |
| Pentium III | 1999 | 8.2M |
| Pentium 4 | 2000 | 42M |
| Core 2 Duo | 2006 | 291M |
| Core i7 | 2008 | 731M |
| Core i7 Skylake | 2015 | 1.75B |
- Added features
- Instructions to support multimedia operations
- Instructions to enable more efficient conditional operations (**!**)
- Transition from 32 bits to 64 bits
- More cores
::::
::::{tab-item} AMD
| Name | Date | Transistor Counts |
| --------------- | ---- | ----------------- |
| AMD K5 | 1996 | 4.3M |
| AMD K6 | 1997 | 8.8M |
| AMD K6/III | 1998 | 21.3M |
| AMD K7 | 1999 | 22.0M |
| AMD K8 | 2003 | 105.9M |
| AMD Opteron | 2009 | 904M |
| AMD Bulldozer | 2012 | 1.2B |
| AMD Ryzen 5 | 2017 | 4.8B |
| AMD Epyc | 2017 | 19.2B |
- x86 clones: Advanced Micro Devices (AMD)
- Historically
- AMD has followed just behind Intel
- A little bit slower, a lot cheaper
- Then
- Recruited top circuit designers from Digital Equipment Corp.
and other downward trending companies
- Built Opteron: tough competitor to Pentium 4
- Developed x86-64, their own extension to 64 bits
- Recent Years
- Intel got its act together
- 1995-2011: Lead semiconductor “fab” in world
- 2018: #2 largest by $$ (#1 is Samsung)
- 2019: reclaimed #1
- AMD fell behind
- Relies on external semiconductor manufacturer GlobalFoundaries
- ca. 2019 CPUs (e.g., Ryzen) are competitive again
- 2020 Epyc
::::
:::::
Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code Machine Code: The byte-level programs that a processor executesAssembly Code: A text representation of machine codeMicroarchitecture: Implementation of the architecture %rip (instruction pointer register)if/while)
csc231, create another directory called 04-machine and change into this directory.mstore.c with the following contents:
1
2
3
4
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
. are directirves to the assembler and linker.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
## Assembly language
- Symbolic coding
- Very strong correspondence between the language syntax and the
microarchitecture's machine code instructions
- ../figure from **Programming the IBM 1401 Manual** (1962)
<figure
>
<picture>
<!-- Auto scaling with imagemagick -->
<!--
See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
-->
<source
class="responsive-img-srcset"
srcset="/assets/img/courses/csc231/04-machine/assembly-01-480.webp 480w,/assets/img/courses/csc231/04-machine/assembly-01-800.webp 800w,/assets/img/courses/csc231/04-machine/assembly-01-1400.webp 1400w,"
type="image/webp"
sizes="95vw"
>
<img
src="/assets/img/courses/csc231/04-machine/assembly-01.png"
width="50%"
height="auto"
alt="Programming the IBM 1401"
data-zoomable
loading="lazy"
onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
>
</picture>
</figure>
:::::{tab-set} ::::{tab-item} Intel data type
| C data type | Intel data type | Assembly-code suffix | Size |
|---|---|---|---|
| char | Byte | b | 1 |
| short | Word | w | 2 |
| int | Double word | l | 4 |
| long | Quad word | q | 8 |
| char * | Quad word | q | 8 |
| float | Single precision | s | 4 |
| double | Double precision | l | 8 |
:::: :::::
general purpose registers storing 64-bit values.%ax through %sp. %ax: accumulate%cx: counter%dx: data%bx: base%si: source index%di: destination index%sp: stack pointer%bp: base pointer%eax through %esp.%rax through %rsp. Eight new registered were added: %r8 through %r15.
Bryant and O’ Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
## Data movement
- Example: `movq Source, Dest`
- Note: This is ATT notation. Intel uses `mov Dest, Source`
- Operand Types for `Source` and `Dest`:
- Immediate (Imm): Constant integer data.
- `$0x400`, `$-533`.
- Like C constant, but prefixed with `$`.
- Encoded with 1, 2, or 4 bytes.
- Register (Reg): One of 16 integer registers
- Example: `%rax`, `%r13`
- `%rsp` reserved for special use.
- Others have special uses in particular instructions.
- Memory (Mem): 8 (`q` in `movq`) consecutive bytes of memory at
address given by register.
- Example: `(%rax)`
- Various other **addressing mode** (See textbook page 181, ../figure 3.3).
- Other `mov`:
- `movb`: move byte
- `movw`: move word
- `movl`: move double word
- `movq`: move quad word
- `moveabsq`: move absolute quad word
:::::{tab-set}
::::{tab-item}
| `movq` | Source | Dest | Src, Dest | C Analog |
| ------ | ------ | ----- | ------------------- | ------------ |
| | Imm | Reg | `movq $0x4, %rax` | tmp = 0x4; |
| | Imm | Mem | `movq $-147,(%rax)` | *p = -147; |
| | Reg | Reg | `movq %rax,%rdx` | tmp2 = tmp1; |
| | Reg | Mem | `movq %rax,(%rdx)` | *p = tmp; |
| | Mem | Reg | `movq (%rax),%rdx` | tmp = *p; |
::::
:::::
- Normal: (R) Mem[Reg[R]]
- Register R specifies memory address
- Aha! Pointer dereferencing in C
- `movq (%rcx),%rax`
- Displacement D(R) Mem[Reg[R]+D]
- Register R specifies start of memory region
- Constant displacement D specifies offset
- `movq 8(%rbp),%rdx`
[Brown University - Dr. Doeppner](https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf)
- Create a file named `swap.c` in `04-machine` with the following contents:
<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=swap.c"></script>
- Run the following commands
~~~bash
$ gcc -Og -c swap.c
$ objdump -d swap.o
~~~
<figure
>
<picture>
<!-- Auto scaling with imagemagick -->
<!--
See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
-->
<source
class="responsive-img-srcset"
srcset="/assets/img/courses/csc231/04-machine/07-480.webp 480w,/assets/img/courses/csc231/04-machine/07-800.webp 800w,/assets/img/courses/csc231/04-machine/07-1400.webp 1400w,"
type="image/webp"
sizes="95vw"
>
<img
src="/assets/img/courses/csc231/04-machine/07.png"
width="50%"
height="auto"
alt="swapping via single-valued pointers"
data-zoomable
loading="lazy"
onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
>
</picture>
</figure>
- [Why `%rsi` and `%rdi`?](http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html)
- Procedure Data Flow:
- First six parameters of a function will be placed into
`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`.
- The remaining parameters will be pushed on to the stack of the calling function.
- Create a file named `swap_dsp.c` in `04-machine` with the following contents:
<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=swap_dsp.c"></script>
- Run the following commands
~~~
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o
~~~
<figure
>
<picture>
<!-- Auto scaling with imagemagick -->
<!--
See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
-->
<source
class="responsive-img-srcset"
srcset="/assets/img/courses/csc231/04-machine/08-480.webp 480w,/assets/img/courses/csc231/04-machine/08-800.webp 800w,/assets/img/courses/csc231/04-machine/08-1400.webp 1400w,"
type="image/webp"
sizes="95vw"
>
<img
src="/assets/img/courses/csc231/04-machine/08.png"
width="50%"
height="auto"
alt="swapping position in an array via pointers"
data-zoomable
loading="lazy"
onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
>
</picture>
</figure>
- What is the meaning of `0x190`?
- Most General Form
- `D(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]+ D]`
- D: Constant **displacement** 1, 2, or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for `%rsp`
- S: Scale: 1, 2, 4, or 8
- Special Cases
- `(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]]`
- `D(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]+D]`
- `(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]]`
- `(,Ri,S)`: `Mem[S*Reg[Ri]]`
- `D(,Ri,S)`: `Mem[S*Reg[Ri] + D]`
lea: load effective addressmovq intsruction lea S, D: Write &S to D.m12.c in 04-machine with the following contents:
1
2
$ gcc -Og -c m12.c
$ objdump -d m12.o
m12.c: %rdi: x(%rdi, %rdi,2) = x + 2 * x%rdx with lea.0x0(,%rdx,4) = 4 * (x + 2 * x) = 12*x%rax with lea.Src: S Dest: D :::::{tab-set} ::::{tab-item} Arithmetic operations | Format | Computation | Description | | ————— | ———– | —————— |
| add Src,Dest | D <- D + S | add |
| sub Src,Dest | D <- D - S | subtract |
| imul Src,Dest | D <- D * S | multiply |
| ————— | ———– | —————— |
| shl Src,Dest | D <- D « S | shift left |
| sar Src,Dest | D <- D >S | arith. shift right |
| shr Src,Dest | D <- D >S | shift right |
| sal Src,Dest | D <- D « S | arith. shift left |
| ————— | ———– | —————— |
| xor Src,Dest | D <- D ^ S | exclusive or |
| and Src,Dest | D <- D & S | and |
| or Src,Dest | D <- D | S | or |
| ————— | ———– | —————— |
| inc Src | D <- D + 1 | increment |
| dec Src | D <- D - 1 | decrement |
| neg Src | D <- -D | negate |
| not Src | D <- -D | complement |
:::: :::::
sar) where the significant bit is retained.scale.c in 04-machine with the following contents:
1
2
$ gcc -Og -c scale.c
$ objdump -d scale.o
:::{admonition} Solution
%rdi: x%rsi: y%rdx: z%rax contains the final return value.:::
arith.c in 04-machine with the following contents:
1
2
$ gcc -Og -c arith.c
$ objdump -d arith.o
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
## Logical operations
- Information about currently executing program
- temporary data (`%rax`,...)
- location of runtime stack (`%rsp`)
- location of current code control point (`%rip`,...)
- status of recent tests (`CF`, `ZF`, `SF`, `OF` in `%EFLAGS`)
<figure
>
<picture>
<!-- Auto scaling with imagemagick -->
<!--
See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
-->
<source
class="responsive-img-srcset"
srcset="/assets/img/courses/csc231/04-machine/12-480.webp 480w,/assets/img/courses/csc231/04-machine/12-800.webp 800w,/assets/img/courses/csc231/04-machine/12-1400.webp 1400w,"
type="image/webp"
sizes="95vw"
>
<img
src="/assets/img/courses/csc231/04-machine/12.png"
width="50%"
height="auto"
alt="processor state"
data-zoomable
loading="lazy"
onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
>
</picture>
</figure>
- Single-bit registers
- `CF`: the most recent operation generated a carry out of the most significant bit.
- `ZF`: the most recent operation yielded zero.
- `SF`: the most recent operation yielded negative.
- `OF`: the most recent operation caused a two's-complement overflow.
- Implicitly set (as side effect) of arithmetic operations.
- Exlicit setting by Compare instruction
- `cmpq Src2, Src1`
- `cmpq b, a` like computing `a - b` without setting destination
- `CF` set if carry/borrow out from most significant bit (unsigned comparisons)
- `ZF` set if `a == b`
- `SF` set if `(a - b) < 0`
- `OF` set if two's complement (signed) overflow
- `(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)`
- Jump to different part of code depending on condition codes
- Implicit reading of condition codes
:::::{tab-set}
::::{tab-item} Jump operations
| jX | Condition | Description |
| ----- | -------------- | -------------------- |
| `jmp` | 1 | direct jump |
| `je` | ZF | equal/zero |
| `jne` | ~ZF | not equal/not zero |
| `js` | SF | negative |
| `jns` | ~SF | non-negative |
| `jg` | ~(SF^OF) & ~ZF | greater |
| `jge` | ~(SF^OF) | greater or equal to |
| `jl` | SF^OF | lesser |
| `jle` | SF^OF \| ZF | lesser or equal to |
| `ja` | ~CF & ~ZF | above |
| `jb` | CF | below |
::::
:::::
- Create a file named `jump.c` in `04-machine` with the following contents:
<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=jump.c"></script>
- Run the following commands
~~~
$ gcc -Og -c jump.c
$ objdump -d jump.o
~~~
- Understand how the Assembly code enables jump across instructions to support conditional workflow.
><iframe width="560" height="315" src="https://www.youtube.com/embed/OSZggdT9hgY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
- In the next video, we will look at how `cmp` and `jle` of `absdiff` really behave in an actual execution.
<iframe width="560" height="315" src="https://www.youtube.com/embed/mOwufRcZS8M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
- Create a file named `factorial.c` in `04-machine` with the following contents:
<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=factorial.c"></script>
- Run the following commands
~~~
$ gcc -Og -c factorial.c
$ objdump -d factorial.o
~~~
- Understand how the Assembly code enables jump across instructions to support loop.
<iframe width="560" height="315" src="https://www.youtube.com/embed/0N2srmuKcVs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
- Create `factorial_2.c` and `factorial_3.c` from `factorial.c`.
- Modify `factorial_2.c` so that the factorial is implemented with a `while` loop. Study the
resulting Assembly code.
- Modify `factorial_3.c` so that the factorial is implemented with a `for` loop. Study the
resulting Assembly code.
- Behavior of `factorial` Assembly instructions inside GDB
<iframe width="560" height="315" src="https://www.youtube.com/embed/2VdCt8hhiRY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
P calls procedure Q.Q P after Q P passes one or more parameters to Q.Q returns a value back to P.Q needs to allocate space for local variables and free that storage once finishes.Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).
%rsp contains lowest stack address.
pushq Src Src %rsp by 8%rsp popq Dest %rsp %rsp by 8
-Og flag often combines/reduces these steps.$rbp) and a stack pointer ($rsp). $rbp to that value) to the current address pointed to by the stack pointer.push $rbpmov $rsp, $rbpsub N, $rspmov $rbp, $rsppop $rbpretmult.c in 04-machine with the following contents:-g flag and run gdb on the resulting executable.
1
2
$ gcc -g -o mult mult.c
$ gdb mult
main and start running.si: executing the next instruction (machine or code instruction). code section.ni if we don’t want to step into that instruction.endbr64 is a new instruction to help enforce Control Flow Technology to prevent potential stitching of malicious Assembly codes.
K bytes must have an address that is multiple of K: 1 for char, 2 for short, 4 for int and float, and 8 for long, double, and char *.```