Machine language

:class: tip

This lecture will cover contents from Chapter 6 and Chapter 7 of the book.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
## Intel x86 processors


- Dominate laptop/desktop/server market
- Evolutionary design
  - Backwards compatible up until 8086, introduced in 1978
  - Added more features as time goes on
- x86 is a Complex Instruction Set Computer (CISC)
  - Many different instructions with many different formats
  - But, only small subset encountered with Linux programs
- Compare: Reduced Instruction Set Computer (RISC)
  - RISC: *very few* instructions, with *very few* modes for each
  - RISC can be quite fast (but Intel still wins on speed!)
  - Current RISC renaissance (e.g., ARM, RISC V), especially for low-power


- Building blocks modern electronics 
- Two transistors used to design an AND gate:





<figure
  
>
  <picture>
    <!-- Auto scaling with imagemagick -->
    <!--
      See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
      https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
    -->
    
      
        <source
          class="responsive-img-srcset"
          
            srcset="/assets/img/courses/csc231/04-machine/TransistorANDgate-480.webp 480w,/assets/img/courses/csc231/04-machine/TransistorANDgate-800.webp 800w,/assets/img/courses/csc231/04-machine/TransistorANDgate-1400.webp 1400w,"
            type="image/webp"
          
          
            sizes="95vw"
          
        >
      
    
    <img
      src="/assets/img/courses/csc231/04-machine/TransistorANDgate.png"
      
      
        width="50%"
      
      
        height="auto"
      
      
      
        alt="AND gate created using transistors"
      
      
      
        data-zoomable
      
      
        loading="lazy"
      
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>

  
</figure>




:::::{tab-set}
::::{tab-item} Intel

| Name            | Date | Transistor Counts |  
| --------------- | ---- | ----------------- |  
| 386             | 1985 | 0.3M              |   
| Pentium         | 1993 | 3.1M              |  
| Pentium/MMX     | 1997 | 4.5M              |  
| Pentium Pro     | 1995 | 6.5M              |  
| Pentium III     | 1999 | 8.2M              |  
| Pentium 4       | 2000 | 42M               |  
| Core 2 Duo      | 2006 | 291M              |  
| Core i7         | 2008 | 731M              |  
| Core i7 Skylake | 2015 | 1.75B             |  


- Added features
  - Instructions to support multimedia operations
  - Instructions to enable more efficient conditional operations (**!**)
  - Transition from 32 bits to 64 bits
  - More cores
::::
::::{tab-item} AMD

| Name            | Date | Transistor Counts |  
| --------------- | ---- | ----------------- |  
| AMD K5          | 1996 | 4.3M              |   
| AMD K6          | 1997 | 8.8M              |  
| AMD K6/III      | 1998 | 21.3M             |  
| AMD K7          | 1999 | 22.0M             |  
| AMD K8          | 2003 | 105.9M            |  
| AMD Opteron     | 2009 | 904M              |  
| AMD Bulldozer   | 2012 | 1.2B              |  
| AMD Ryzen 5     | 2017 | 4.8B              |  
| AMD Epyc        | 2017 | 19.2B             | 

- x86 clones: Advanced Micro Devices (AMD)
- Historically
  - AMD has followed just behind Intel
  - A little bit slower, a lot cheaper
- Then
  - Recruited top circuit designers from Digital Equipment Corp. 
  and other downward trending companies
  - Built Opteron: tough competitor to Pentium 4
  - Developed x86-64, their own extension to 64 bits
- Recent Years
  - Intel got its act together
    - 1995-2011: Lead semiconductor “fab” in world
    - 2018: #2 largest by $$ (#1 is Samsung)
    - 2019: reclaimed #1
- AMD fell behind
  - Relies on external semiconductor manufacturer GlobalFoundaries
  - ca. 2019 CPUs (e.g., Ryzen) are competitive again
  - 2020 Epyc
::::
:::::

Machine programming: levels of abstraction

Level of abstraction
Assembly programmer
1
2
3
4
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
Assembly code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66


## Assembly language


- Symbolic coding
- Very strong correspondence between the language syntax and the 
microarchitecture's machine code instructions
- ../figure from **Programming the IBM 1401 Manual** (1962)





<figure
  
>
  <picture>
    <!-- Auto scaling with imagemagick -->
    <!--
      See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
      https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
    -->
    
      
        <source
          class="responsive-img-srcset"
          
            srcset="/assets/img/courses/csc231/04-machine/assembly-01-480.webp 480w,/assets/img/courses/csc231/04-machine/assembly-01-800.webp 800w,/assets/img/courses/csc231/04-machine/assembly-01-1400.webp 1400w,"
            type="image/webp"
          
          
            sizes="95vw"
          
        >
      
    
    <img
      src="/assets/img/courses/csc231/04-machine/assembly-01.png"
      
      
        width="50%"
      
      
        height="auto"
      
      
      
        alt="Programming the IBM 1401"
      
      
      
        data-zoomable
      
      
        loading="lazy"
      
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>

  
</figure>


:::::{tab-set} ::::{tab-item} Intel data type

C data type Intel data type Assembly-code suffix Size
char Byte b 1
short Word w 2
int Double word l 4
long Quad word q 8
char * Quad word q 8
float Single precision s 4
double Double precision l 8

:::: :::::

General purpose registers

Bryant and O’ Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
## Data movement


- Example: `movq Source, Dest`
- Note: This is ATT notation. Intel uses `mov Dest, Source`
- Operand Types for `Source` and `Dest`:
  - Immediate (Imm): Constant integer data. 
     - `$0x400`, `$-533`. 
     - Like C constant, but prefixed with `$`.
     - Encoded with 1, 2, or 4 bytes. 
  - Register (Reg): One of 16 integer registers
     - Example: `%rax`, `%r13`
     - `%rsp` reserved for special use. 
     - Others have special uses in particular instructions. 
  - Memory (Mem): 8 (`q` in `movq`) consecutive bytes of memory at 
  address given by register. 
     - Example: `(%rax)`
     - Various other **addressing mode** (See textbook page 181, ../figure 3.3). 
- Other `mov`:
  - `movb`: move byte
  - `movw`: move word
  - `movl`: move double word
  - `movq`: move quad word
  - `moveabsq`: move absolute quad word



:::::{tab-set}
::::{tab-item} 

| `movq` | Source | Dest  | Src, Dest           |  C Analog    |
| ------ | ------ | ----- | ------------------- | ------------ |
|        | Imm    | Reg   | `movq $0x4, %rax`   | tmp = 0x4;   |
|        | Imm    | Mem   | `movq $-147,(%rax)` | *p = -147;   |
|        | Reg    | Reg   | `movq %rax,%rdx`    | tmp2 = tmp1; |
|        | Reg    | Mem   | `movq %rax,(%rdx)`  | *p = tmp;    |
|        | Mem    | Reg   | `movq (%rax),%rdx`  | tmp = *p;    |
::::
:::::



- Normal:	(R)	Mem[Reg[R]]
  - Register R specifies memory address
  - Aha! Pointer dereferencing in C
  - `movq (%rcx),%rax`
- Displacement	D(R)	Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset
  - `movq 8(%rbp),%rdx`



[Brown University - Dr. Doeppner](https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf)



- Create a file named `swap.c` in `04-machine` with the following contents:

<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=swap.c"></script>

- Run the following commands 

~~~bash
$ gcc -Og -c swap.c
$ objdump -d swap.o
~~~





<figure
  
>
  <picture>
    <!-- Auto scaling with imagemagick -->
    <!--
      See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
      https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
    -->
    
      
        <source
          class="responsive-img-srcset"
          
            srcset="/assets/img/courses/csc231/04-machine/07-480.webp 480w,/assets/img/courses/csc231/04-machine/07-800.webp 800w,/assets/img/courses/csc231/04-machine/07-1400.webp 1400w,"
            type="image/webp"
          
          
            sizes="95vw"
          
        >
      
    
    <img
      src="/assets/img/courses/csc231/04-machine/07.png"
      
      
        width="50%"
      
      
        height="auto"
      
      
      
        alt="swapping via single-valued pointers"
      
      
      
        data-zoomable
      
      
        loading="lazy"
      
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>

  
</figure>


- [Why `%rsi` and `%rdi`?](http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html)
- Procedure Data Flow:
  - First six parameters of a function will be placed into 
  `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`. 
  - The remaining parameters will be pushed on to the stack of the calling function.




- Create a file named `swap_dsp.c` in `04-machine` with the following contents:

<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=swap_dsp.c"></script>

- Run the following commands 

~~~
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o
~~~





<figure
  
>
  <picture>
    <!-- Auto scaling with imagemagick -->
    <!--
      See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
      https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
    -->
    
      
        <source
          class="responsive-img-srcset"
          
            srcset="/assets/img/courses/csc231/04-machine/08-480.webp 480w,/assets/img/courses/csc231/04-machine/08-800.webp 800w,/assets/img/courses/csc231/04-machine/08-1400.webp 1400w,"
            type="image/webp"
          
          
            sizes="95vw"
          
        >
      
    
    <img
      src="/assets/img/courses/csc231/04-machine/08.png"
      
      
        width="50%"
      
      
        height="auto"
      
      
      
        alt="swapping position in an array via pointers"
      
      
      
        data-zoomable
      
      
        loading="lazy"
      
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>

  
</figure>


- What is the meaning of `0x190`?



- Most General Form
  - `D(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]+ D]`
  - D: 	Constant **displacement** 1, 2, or 4 bytes
  - Rb: Base register: Any of 16 integer registers
  - Ri:	Index register: Any, except for `%rsp`
  - S: Scale: 1, 2, 4, or 8 
- Special Cases
  - `(Rb,Ri)`:	`Mem[Reg[Rb]+Reg[Ri]]`
  - `D(Rb,Ri)`: `Mem[Reg[Rb]+Reg[Ri]+D]`
  - `(Rb,Ri,S)`: `Mem[Reg[Rb]+S*Reg[Ri]]`
  - `(,Ri,S)`: `Mem[S*Reg[Ri]]`
  - `D(,Ri,S)`: `Mem[S*Reg[Ri] + D]`

Arithmetic operations

1
2
$ gcc -Og -c m12.c
$ objdump -d m12.o
demonstrating load effective address

:::::{tab-set} ::::{tab-item} Arithmetic operations | Format | Computation | Description | | ————— | ———– | —————— |
| add Src,Dest | D <- D + S | add |
| sub Src,Dest | D <- D - S | subtract |
| imul Src,Dest | D <- D * S | multiply |
| ————— | ———– | —————— |
| shl Src,Dest | D <- D « S | shift left |
| sar Src,Dest | D <- D >S | arith. shift right |
| shr Src,Dest | D <- D >S | shift right |
| sal Src,Dest | D <- D « S | arith. shift left |
| ————— | ———– | —————— |
| xor Src,Dest | D <- D ^ S | exclusive or |
| and Src,Dest | D <- D & S | and |
| or Src,Dest | D <- D | S | or |
| ————— | ———– | —————— |
| inc Src | D <- D + 1 | increment |
| dec Src | D <- D - 1 | decrement |
| neg Src | D <- -D | negate |
| not Src | D <- -D | complement |
:::: :::::

1
2
$ gcc -Og -c scale.c
$ objdump -d scale.o
Scale.c

:::{admonition} Solution

:::

1
2
$ gcc -Og -c arith.c
$ objdump -d arith.o
arith.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163


## Logical operations


- Information about currently executing program
  - temporary data (`%rax`,...)
  - location of runtime stack (`%rsp`)
  - location of current code control point (`%rip`,...)
  - status of recent tests (`CF`, `ZF`, `SF`, `OF` in `%EFLAGS`)





<figure
  
>
  <picture>
    <!-- Auto scaling with imagemagick -->
    <!--
      See https://www.debugbear.com/blog/responsive-images#w-descriptors-and-the-sizes-attribute and
      https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images for info on defining 'sizes' for responsive images
    -->
    
      
        <source
          class="responsive-img-srcset"
          
            srcset="/assets/img/courses/csc231/04-machine/12-480.webp 480w,/assets/img/courses/csc231/04-machine/12-800.webp 800w,/assets/img/courses/csc231/04-machine/12-1400.webp 1400w,"
            type="image/webp"
          
          
            sizes="95vw"
          
        >
      
    
    <img
      src="/assets/img/courses/csc231/04-machine/12.png"
      
      
        width="50%"
      
      
        height="auto"
      
      
      
        alt="processor state"
      
      
      
        data-zoomable
      
      
        loading="lazy"
      
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>

  
</figure>





- Single-bit registers
  - `CF`: the most recent operation generated a carry out of the most significant bit. 
  - `ZF`: the most recent operation yielded zero.
  - `SF`: the most recent operation yielded negative. 
  - `OF`: the most recent operation caused a two's-complement overflow. 
- Implicitly set (as side effect) of arithmetic operations. 




- Exlicit setting by Compare instruction
  - `cmpq Src2, Src1`
  - `cmpq b, a` like computing `a - b` without setting destination
- `CF` set if carry/borrow out from most significant bit (unsigned comparisons) 
- `ZF` set if `a == b`
- `SF` set if `(a - b) < 0` 
- `OF` set if two's complement (signed) overflow
  - `(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)`




- Jump to different part of code depending on condition codes
- Implicit reading of condition codes

:::::{tab-set}
::::{tab-item} Jump operations

| jX    | Condition      |  Description         |
| ----- | -------------- | -------------------- |  
| `jmp` | 1              | direct jump          |   
| `je`  | ZF             | equal/zero           |  
| `jne` | ~ZF            | not equal/not zero   |  
| `js`  | SF             | negative             |  
| `jns` | ~SF            | non-negative         |  
| `jg`  | ~(SF^OF) & ~ZF | greater              |  
| `jge` | ~(SF^OF)       | greater or equal to  |  
| `jl`  | SF^OF          | lesser               |  
| `jle` | SF^OF \| ZF    | lesser or equal to   |  
| `ja`  | ~CF & ~ZF      | above                |    
| `jb`  | CF             | below                |  

::::
:::::




- Create a file named `jump.c` in `04-machine` with the following contents:

<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=jump.c"></script>

- Run the following commands 

~~~
$ gcc -Og -c jump.c
$ objdump -d jump.o
~~~

- Understand how the Assembly code enables jump across instructions to support conditional workflow. 

><iframe width="560" height="315" src="https://www.youtube.com/embed/OSZggdT9hgY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

- In the next video, we will look at how `cmp` and `jle` of `absdiff` really behave in an actual execution. 

<iframe width="560" height="315" src="https://www.youtube.com/embed/mOwufRcZS8M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>



- Create a file named `factorial.c` in `04-machine` with the following contents:

<script src="https://gist.github.com/linhbngo/d1e9336a82632c528ea797210ed0f553.js?file=factorial.c"></script>

- Run the following commands 

~~~
$ gcc -Og -c factorial.c
$ objdump -d factorial.o
~~~

- Understand how the Assembly code enables jump across instructions to support loop. 

<iframe width="560" height="315" src="https://www.youtube.com/embed/0N2srmuKcVs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

- Create `factorial_2.c` and `factorial_3.c` from `factorial.c`. 
- Modify `factorial_2.c` so that the factorial is implemented with a `while` loop. Study the 
resulting Assembly code. 
- Modify `factorial_3.c` so that the factorial is implemented with a `for` loop. Study the 
resulting Assembly code. 
- Behavior of `factorial` Assembly instructions inside GDB

<iframe width="560" height="315" src="https://www.youtube.com/embed/2VdCt8hhiRY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Mechanisms in procedures (functions)

Stack frames
Stack push and pop
1
2
$ gcc -g -o mult mult.c
$ gdb mult

```