buf: memory address of the data to be written to the file represented by fd.
count: number of bytes to be written.
write() returns the number of bytes actually written on success (which may be less than count), or -1 if there is an error.
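write() may write fewer bytes than requested, so callers often loop until everything has been written. The sketch below shows one way to do that. `write_all` is a hypothetical helper name, not a standard function, and retrying on `EINTR` is a common convention rather than something these notes require.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `count` bytes from
 * `buf` have been written. A single write() may return fewer bytes than
 * requested (a "short write"). Returns count on success, or -1 on error
 * (errno is left as set by write()). */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal before writing: retry */
            return -1;         /* a real error */
        }
        p += n;                /* skip past the bytes just written */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```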
Non-sequential read/write: lseek
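read() and write() only do sequential reads/writes: neither takes an argument that specifies a particular read/write position (offset).
For each file opened by a process, the OS keeps track of a current offset, which determines where the next read or write happens. read() and write() update the current offset by the number of bytes read or written.
However, they cannot move the current offset to an arbitrary position.
To move the current offset to any position you want, use lseek(): off_t lseek(int fd, off_t offset, int whence)
whence selects how offset is interpreted: SEEK_SET (from the beginning of the file), SEEK_CUR (relative to the current offset), or SEEK_END (relative to the end of the file). lseek() returns the resulting offset, or -1 on error.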
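A minimal sketch of non-sequential access, assuming a POSIX system. `make_temp_file` and `read_at` are hypothetical helpers. `read_at` moves the current offset with `lseek(fd, offset, SEEK_SET)` and then calls read(), which starts at that offset and advances it by the number of bytes read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed temporary file holding `text`
 * and return its file descriptor, or -1 on error. */
int make_temp_file(const char *text)
{
    char path[] = "/tmp/lseek_demo_XXXXXX";
    int fd = mkstemp(path);          /* creates and opens a unique file */
    if (fd == -1)
        return -1;
    unlink(path);                    /* the file is removed once fd is closed */
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) {  /* current offset is now len */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read up to n bytes starting at byte `offset`. */
ssize_t read_at(int fd, void *buf, size_t n, off_t offset)
{
    assert(buf != NULL);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)  /* jump to offset */
        return -1;
    return read(fd, buf, n);         /* reads from offset, then advances it */
}
```

For example, on a file containing "hello world", read_at(fd, buf, 5, 6) reads "world" and leaves the current offset at 11. POSIX also provides pread(), which reads at a given offset without changing the current offset.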
File systems used by Linux include ext2, ext3, ext4, ReiserFS, XFS, and JFS.
We will study a very simple example file system named VSFS: Very Simple File System
Analogy
How to implement a simple file system?
What structures are needed on the disk?
What do they need to track?
How are they accessed?
Mental model of a file system
What on-disk structures store the file system’s data and metadata?
What happens when a process opens a file?
Which on-disk structures are accessed during a read or write?
By working on and improving your mental model, you develop an abstract understanding of what is going on, instead of just trying to understand the specifics of some file-system code (though that is also useful, of course!).
[Figure: unformatted raw disk]
Overall organization
The whole disk is divided into fixed-sized blocks.
Block size: 4KB
Number of blocks: 64
Total size: 64 * 4KB = 256KB
Data region
Most of the disk should be used to actually store user data, leaving only a little space for other things, such as metadata.
In VSFS, we reserve the last 56 blocks (blocks 8 to 63) as the data region.
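The geometry above reduces to a few constants. This sketch (the constant and function names are hypothetical) computes where a block starts on disk and shows that 64 - 56 = 8, so the data region is blocks 8 to 63.

```c
#include <assert.h>
#include <stdint.h>

/* VSFS geometry from the notes: 64 blocks of 4KB each. */
#define VSFS_BLOCK_SIZE   4096u
#define VSFS_NUM_BLOCKS   64u
#define VSFS_DATA_BLOCKS  56u   /* the last 56 blocks form the data region */

/* First data block: 64 - 56 = 8, so data occupies blocks 8..63. */
#define VSFS_DATA_START   (VSFS_NUM_BLOCKS - VSFS_DATA_BLOCKS)

/* Byte address on disk where block `blk` begins. */
uint64_t vsfs_block_to_byte(uint32_t blk)
{
    assert(blk < VSFS_NUM_BLOCKS);
    return (uint64_t)blk * VSFS_BLOCK_SIZE;
}

/* Returns 1 if block `blk` belongs to the data region, 0 otherwise. */
int vsfs_is_data_block(uint32_t blk)
{
    return blk >= VSFS_DATA_START && blk < VSFS_NUM_BLOCKS;
}
```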
Metadata: inode table
The FS needs to track information about each file.
In VSFS, we keep the information about each file in a struct called an inode, and we use 5 blocks to store all the inodes.
With 128-byte inodes, each 4KB block holds 4KB / 128B = 32 inodes, so the maximum number of inodes is 5 * 4KB / 128B = 160, i.e., this VSFS can store at most 160 files.
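The capacity calculation as a small function. It assumes, as the calculation above does, that each inode occupies 128 bytes and that inodes never straddle block boundaries; the function name is hypothetical.

```c
#include <assert.h>

/* Number of inodes that fit in an inode table of `table_blocks` blocks,
 * given the block size and the (fixed) on-disk inode size in bytes. */
unsigned vsfs_inode_capacity(unsigned table_blocks, unsigned block_size,
                             unsigned inode_size)
{
    assert(inode_size > 0 && block_size % inode_size == 0);
    unsigned per_block = block_size / inode_size;  /* 4096 / 128 = 32 */
    return table_blocks * per_block;               /* 5 * 32 = 160 */
}
```

Doubling the inode size to 256 bytes would halve the capacity of the same 5-block table to 80 files.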
Allocation structure
For both the data region and the inode region, we need to keep track of which data blocks and inodes are in use and which are free.
We use a data structure called a bitmap for this purpose. A bitmap is just a sequence of bits, where each bit indicates whether the corresponding object (inode or data block) is free (0) or in use (1).
We have one bitmap for the data region and one for the inode region, and reserve one block for each. A 4KB block holds 32K bits (4096 * 8 = 32,768), so one bitmap block can track up to 32K objects, far more than the 160 inodes and 56 data blocks needed here.
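A sketch of a bitmap allocator, assuming bit i is stored in byte i / 8 at bit position i % 8 (least-significant bit first; real file systems may order bits differently). Allocation scans for the first 0 bit and sets it to 1; freeing clears the bit. The same code serves the inode bitmap or the data bitmap; only the number of tracked objects differs (160 or 56). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VSFS_BLOCK_SIZE 4096u
#define BITMAP_BITS (VSFS_BLOCK_SIZE * 8u)   /* one 4KB block = 32768 bits */

/* Bit i lives in byte i / 8, at bit position i % 8 within that byte. */
static int bitmap_test(const uint8_t *bm, uint32_t i)
{
    return (bm[i / 8] >> (i % 8)) & 1u;
}

static void bitmap_set(uint8_t *bm, uint32_t i)
{
    bm[i / 8] |= (uint8_t)(1u << (i % 8));
}

static void bitmap_clear(uint8_t *bm, uint32_t i)
{
    bm[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Allocate the first free object among the first `n` tracked by `bm`:
 * find a 0 bit, flip it to 1, and return its index; -1 if all are in use. */
long bitmap_alloc(uint8_t *bm, uint32_t n)
{
    assert(n <= BITMAP_BITS);
    for (uint32_t i = 0; i < n; i++) {
        if (!bitmap_test(bm, i)) {
            bitmap_set(bm, i);
            return (long)i;
        }
    }
    return -1;
}

/* Free object i by clearing its bit back to 0. */
void bitmap_free(uint8_t *bm, uint32_t i)
{
    assert(i < BITMAP_BITS);
    bitmap_clear(bm, i);
}
```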
Superblock
Superblock contains information about this particular file system, e.g.,
What type of file system it is (“VSFS” indicated by a magic number)
How many inodes and data blocks there are (160 and 56)
Where the inode table begins (block 3)
etc.
When mounting a file system, the OS first reads the superblock, identifies the file system type and other parameters, and then attaches the volume to the file-system tree with the proper settings.
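A hypothetical superblock layout and the kind of sanity check a mount routine might perform after reading it. The notes do not specify field names, field order, or the magic value, so all three are assumptions here (0x56534653 is simply the ASCII bytes of "VSFS").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic number marking a VSFS volume: ASCII "VSFS". */
#define VSFS_MAGIC 0x56534653u

/* A minimal on-disk superblock sketch; field names are illustrative. */
struct vsfs_superblock {
    uint32_t magic;              /* identifies the file system type */
    uint32_t num_inodes;         /* 160 in our example */
    uint32_t num_data_blocks;    /* 56 */
    uint32_t inode_table_start;  /* block where the inode table begins: 3 */
    uint32_t data_region_start;  /* first data block: 8 */
};

/* What a mount routine might check after reading the superblock:
 * returns 1 if it looks like a valid VSFS volume, 0 otherwise. */
int vsfs_superblock_valid(const struct vsfs_superblock *sb)
{
    assert(sb != NULL);
    return sb->magic == VSFS_MAGIC                         /* right FS type */
        && sb->num_inodes > 0
        && sb->num_data_blocks > 0
        && sb->inode_table_start < sb->data_region_start;  /* sane layout */
}
```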
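Example: the inode table starts at block 3 (byte 12KB), so with 128-byte inodes, inode number 32 is located at 12KB + 32 * 128B = 16KB, i.e., at the beginning of block 4.
So we can find the inode, but which blocks hold the file's data?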
Data structure: the inode
Short for index node
In VSFS (and other simple file systems), given an inode number (i-number), you can directly calculate where on the disk the corresponding inode is located.
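The inode-location calculation as a sketch, assuming 128-byte inodes, 4KB blocks, and an inode table starting at block 3 (the VSFS parameters above; the names are hypothetical). Inode 32 lands at byte 16KB, the start of block 4.

```c
#include <assert.h>
#include <stdint.h>

#define VSFS_BLOCK_SIZE        4096u
#define VSFS_INODE_SIZE        128u   /* assumed on-disk inode size */
#define VSFS_INODE_TABLE_START 3u     /* inode table begins at block 3 */
#define VSFS_MAX_INODES        160u

/* Byte address of inode `inum`: start of the inode table plus inum inodes. */
uint64_t vsfs_inode_addr(uint32_t inum)
{
    assert(inum < VSFS_MAX_INODES);
    return (uint64_t)VSFS_INODE_TABLE_START * VSFS_BLOCK_SIZE
         + (uint64_t)inum * VSFS_INODE_SIZE;
}

/* Block that must be read from disk to obtain inode `inum`. */
uint32_t vsfs_inode_block(uint32_t inum)
{
    return (uint32_t)(vsfs_inode_addr(inum) / VSFS_BLOCK_SIZE);
}

/* Byte offset of inode `inum` inside that block. */
uint32_t vsfs_inode_offset_in_block(uint32_t inum)
{
    return (uint32_t)(vsfs_inode_addr(inum) % VSFS_BLOCK_SIZE);
}
```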
Another approach: extent-based
An extent is a disk pointer plus a length (in blocks), i.e., it describes a run of contiguous blocks.
Instead of requiring a pointer to every block of a file, we only need one pointer (plus a length) per extent.
It is less flexible than the pointer-based approach, but it uses less metadata per file, and file allocation is more compact.
Adopted by ext4, HFS+, NTFS, XFS.
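A sketch of how a file system could map a file's logical block number to a disk block using a list of extents. `struct extent` and `extent_lookup` are hypothetical; real extent formats such as ext4's also store each extent's logical starting block, which this simplified version derives by summing lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An extent: a starting disk block plus a length in blocks. */
struct extent {
    uint32_t start;   /* first disk block of the run */
    uint32_t len;     /* number of contiguous blocks in the run */
};

/* Map logical block `lblk` of a file (0 = first block) to a disk block by
 * walking the file's extents in order. Returns -1 if lblk is past EOF. */
int64_t extent_lookup(const struct extent *ext, size_t n, uint32_t lblk)
{
    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (lblk < ext[i].len)
            return (int64_t)ext[i].start + lblk;  /* falls inside this run */
        lblk -= ext[i].len;                       /* skip past this run */
    }
    return -1;
}
```

For a file stored as the extents {20, 4} and {40, 2}, logical blocks 0 to 3 map to disk blocks 20 to 23, and logical blocks 4 and 5 map to 40 and 41: two (pointer, length) pairs describe six blocks.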
Yet another approach: link-based
Instead of having pointers to all blocks, the inode just has one pointer pointing to the first data block of the file, then the first block points to the second block, etc.
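This works poorly for random access: reaching the last block of a big file requires following every link from the first block. To help, keep an in-memory file allocation table (FAT), indexed by data block address, whose entry for each block holds the address of the next block in the file; following the chain in memory is much faster than reading each block from disk.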
This is the FAT file system, used by Windows before NTFS.
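A sketch of following a FAT chain: `fat[b]` holds the block that comes after block b in the same file. Finding logical block n takes n lookups in the in-memory table rather than n disk reads. `FAT_EOF` and `fat_lookup` are hypothetical names; real FAT variants use their own reserved end-of-chain values.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_EOF UINT32_MAX   /* hypothetical end-of-chain marker */

/* `fat` has `nblocks` entries; fat[b] is the block following b in its file,
 * or FAT_EOF if b is the file's last block. Starting from the file's first
 * block, return the disk block holding logical block n, or FAT_EOF if the
 * file has fewer than n + 1 blocks. */
uint32_t fat_lookup(const uint32_t *fat, uint32_t nblocks,
                    uint32_t first, uint32_t n)
{
    uint32_t b = first;
    while (n > 0 && b != FAT_EOF) {
        assert(b < nblocks);
        b = fat[b];   /* follow one link using the in-memory table */
        n--;
    }
    return b;
}
```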