OS: Note of Chapter 4
This note is missing something, I shall fix the note days later.
How Does Operating System Run?
- CPU can only run the instructions from RAM, but the OS is in the disk. So the hardware manufactures have made a piece of program in ROM, to allow the system be loaded to RAM. This is called BIOS (Basic Input/Output System).
- BIOS is the first program to run when the power key is pressed. It will be loaded to RAM, and provide basic configurations for the hardware, and then load the OS to RAM.
- BIOS can firstly check the hardware, and then load the first 512B of OS to RAM address 0x7C00, an agreed address, and then put PC to 0x7C00 to run the OS.
A valid final 2 bytes of the first 512B of the OS is 0x55AA, in binary 0101010110101010, which is a signature for the BIOS to check if the OS is valid.
When a Bochs virtual machine is launched, the first instruction is BIOS’, and then hardware check, finally the OS is loaded to RAM.
OS: From Assembly
There are 2 files in a minimal operating system: boot.s
and head.s
. boot.s
is the first 512B of the OS, is written in instruction-set-related assembly, and head.s
is the rest of the OS, is written in GNU assembly.
The first instructions from 0x7C00 on is
jmpi 0x07c0:0005
go: mov ax, 0x07C0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, #0x400
This instruction jumps the PC to 0x7C05, which is 0x7C00’s next instruction, it seems to be insane, but the CS:IP is 0x0:7C00, after jumping, it becomes 0x7C0:0005. This is factually implicitly change the value of CS and IP.
A physical address is unique, but the CS:IP can vary.
The boot.s factually set the CS, DS, ES, SS all to 0x07C0, and SP sets the stack pointer.
Then, the head.s haven’t be loaded to the RAM. So the boot.s should load it. Since 512B is too small for boot.s to load head.s, but boot.s is already loaded, so this part, it’s realistic that load head.s by boot.s instructing the BIOS.
The next step is moving parameters to some registers, and then call the 0x13 interrupt, which is the BIOS’ function to read disk to RAM.
The 0x13 interrupt’s function is written to the interrupt vector table when BIOS is running, before giving PC to 0x7C00.
Then the 0x13 interrupt will find the interrupt vector table, at the position 0x0000, and 0x13 is decimal 19, the twentieth 4 bytes in 0x0000. It gets “FE E3 00 F0” at 0x4C, which is giving to CS:IP. But, which is tricky, the CS:IP is 0x0000:0xFEE3, because the high 2 bytes are CS, and the low 2 bytes are IP, and little endian is necessary to take in account.
When the boot.s is copied to the RAM, the BIOS is no longer necessary. Then due to the RAM in the early age is too expensive, so the head.s is moved to 0x0000, where is formally the BIOS’ position.
Why do not move the boot.s to 0x0000 when using BIOS?
- If there’s an interrupt occurs, the BIOS should search the interrupt vector table, and the interrupt vector table is already be replaced by boot.s which may cause exceptions or even crash randomly, which is really terrible and hard to troubleshoot.
But why it’s OK to move the head.s from 0x10000 to 0x0000?
- Before moving, there’s a
cli
instruction, which disables all the interrupts.
After the boot.s is almost done, the operating system is now loaded to the RAM, also known as, booted. We can see the OS is not loaded once, but multiple times. This kind of loading a part, then a larger part, and on and on and finally the entire OS, is called bootstrapping.
Segment and Segment Descriptor
The only thing last is switch from real mode to protected mode.
mov ax, #BOOTSEG
mov ds, ax
lidt idt_48
lgdt gdt_48
Let’s focus on the lidt
instruction and lgdt
instruction. They are used to load the IDT (Interrupt Descriptor Table) and GDT (Global Descriptor Table), respectively. IDT is set with idt_48
48 bits of 0, a fake IDT due to limited space, and GDT is set with gdt_48
, which is 0x7FF, 0x7C00+gdt, 0
, the 0x7ff refers to the length, and the 0x7C00+gdt refers to the position of the gdt label.
In the protected mode, there’re more complexity of the memory locating. The CS:EIP is no longer the CS << 4 + EIP
, but should query the selector in the GDT, and then find the base address and limit of the segment, and then calculate the physical address.
How does the hardware know if the system is in real mode or protected mode? The answer is the control registers. There are some “protocols” in the CR0 register, and some certain bits will indicate which mode is the system in. So after the lidt
and lgdt
instructions runs, the CR0 will be set by mov ax, #0x0001
, and then jmpi 0, 8
, flushing the CR0 data. Then the system is now in protected mode.
In the protected mode, the CS:IP is no longer pointed to the “CS « 4 + IP”, but using the index in the segment selector.
Then the segment descriptor works for the address mapping. It is 00 C0 9A 00
and 00 00 07 FF
, They let the CS:IP 0x0008, 0x0000 be 0x0000, the head.s’ first address.
timer_interrupt:
push %ds
pushl %eax
movl $0x10, %eax
mov %ax, %ds
movb $0x20, %al
outb %al, $0x20
movl $1, %eax
cmpl %eax, current
je 1f
movl %eax, current
ljmp $TSS1_SEL, $0
jmp 2f
1: movl $0, current
ljmp $TSS0_SEL, $0
2: popl %eax
pop %ds
iret
task0:
movl $0x17, %eax
movw %ax, %ds
movl $65, %al # Move A (ASCII: 65) to a register
int $0x80 # System interrupt (print)
movl $0xfff, %ecx
1: loop 1b
jmp task0
task1:
movl $66, %al # Move B (ASCII: 66) to a register
int $0x80 # System interrupt (print)
movl $0xfff, %ecx
1: loop 1b
jmp task1