Seen The Matrix, where the ‘real world’ is just the manifestation of complex digital code? Your hard disk is just like that. What the data on your hard disk really looks like – disk structures explained.
Few may actually wonder how the operating system, Windows for example, keeps track of all your data, your files. How does Windows find things on a disk? What does the data on a disk really look like? For the few taking the red pill, I write this. Let’s discover what’s underneath the world of stylish and polished icons. Welcome to the Matrix.
For the operating system, a disk (HDD, SSD, SD memory card, USB stick, etc.) is nothing more than a big pool of numbered sectors. For years sectors were 512 bytes in size; modern hard disks use 4 KB (4096-byte) sectors. A sector is the smallest addressable unit on a disk.
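To make the ‘pool of numbered sectors’ idea concrete, here is a minimal sketch in Python. It assumes the disk is available as a file-like object (an image file, for example) and that sectors are numbered from 0; the function name `read_sector` and the tiny fake disk are made up for illustration.

```python
import io

def read_sector(disk, lba, sector_size=512):
    """Return the raw bytes of sector number `lba` from a file-like disk image."""
    disk.seek(lba * sector_size)          # byte offset = sector number * sector size
    data = disk.read(sector_size)
    if len(data) != sector_size:
        raise ValueError(f"sector {lba} lies beyond the end of the image")
    return data

# A tiny fake "disk" of four 512-byte sectors, each filled with its own number.
image = io.BytesIO(b"".join(bytes([n]) * 512 for n in range(4)))

sector = read_sector(image, 2)
print(len(sector), sector[:4])  # 512 b'\x02\x02\x02\x02'
```

For a disk with 4 KB sectors, the same sketch would be called with `sector_size=4096`.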
One byte equals 8 bits. A bit ‘contains’ either the value 1 or 0. It’s the smallest piece of information a computer can work with: on or off, yes or no, etc. By grouping 8 bits into one byte we can turn zeros and ones into meaningful information. We can for example agree on 01100001 representing the character ‘a’ and 01100010 representing ‘b’.
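The agreement that turns bits into characters is easy to try out. The sketch below uses the ASCII convention with each value padded to a full 8 bits; the helper names `bits_to_char` and `char_to_bits` are just illustrative.

```python
def bits_to_char(bits):
    """Interpret a string of 8 zeros and ones as one byte and return its ASCII character."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    return chr(int(bits, 2))

def char_to_bits(char):
    """Return the 8-bit pattern for a single ASCII character."""
    return format(ord(char), "08b")

print(bits_to_char("01100001"))  # a
print(char_to_bits("b"))         # 01100010
```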
And on hard disks, we agree on a group of bytes being a ‘sector’. Such agreements are usually laid down in standards. For a long time we agreed on 1 sector = 512 bytes, but now we have a new agreement which says sectors can be 4 KB (4096 bytes).
These standards are often drawn up by committees in which large companies work together. Sometimes a company is so large and influential that it can set the standards by itself.
Back to our hard disk:
To organize all of this, the operating system uses something called a ‘file system’. Before a file system can be created, a pool of sectors needs to be set aside for it. This pool is called a partition. A partition is nothing more than a contiguous range of sectors.
Partitions – Starting point for all disk structures
Your hard disk contains at least one partition but there can be more than one partition. The first sector of the hard disk is called the master boot record (MBR) and contains a partition table. It’s the starting point for a chain of disk structures eventually pointing to your files.
To locate the data on your disk, the operating system has to start reading the disk here. Here it finds where partitions start and what their sizes are. To define one partition, we need 16 bytes. In total 64 bytes are available in the MBR for the partition table, so 4 partitions can be defined here.
This is also where the accidents happen that cause the entire disk to appear empty, or ‘unallocated’ as Windows Disk Management calls it.
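The partition table described above can be sketched in a few lines of Python. The sizes (16 bytes per entry, 64 bytes in total) come from the text; the exact offsets (table at byte 446, signature bytes 55 AA at the end of the sector) and the field order within an entry follow the classic MBR layout and are not spelled out in this article. The function name `parse_mbr` and the hand-built sample sector are made up for illustration.

```python
import struct

PARTITION_TABLE_OFFSET = 446     # the 64-byte table sits just before the signature
ENTRY_SIZE = 16                  # bytes needed to define one partition
BOOT_SIGNATURE = b"\x55\xaa"     # last two bytes of the 512-byte MBR

def parse_mbr(sector):
    """Return (type, first_sector, sector_count) for every used table entry."""
    if len(sector) < 512 or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):           # 64 bytes / 16 bytes = 4 entries
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status, CHS start, partition type, CHS end, first sector (LBA), sector count
        _status, _chs_first, ptype, _chs_last, first_lba, count = struct.unpack(
            "<B3sB3sII", entry
        )
        if ptype != 0:           # type 0 marks an unused slot
            partitions.append((ptype, first_lba, count))
    return partitions

# Build a fake MBR with a single partition of type 0x07 starting at sector 2048.
mbr = bytearray(512)
mbr[446:462] = struct.pack("<B3sB3sII", 0x80, b"\0\0\0", 0x07, b"\0\0\0", 2048, 204800)
mbr[510:512] = BOOT_SIGNATURE

print(parse_mbr(bytes(mbr)))  # [(7, 2048, 204800)]
```

If the signature or the entries are damaged, a parser like this finds nothing, which is roughly what the operating system experiences when a disk suddenly shows up as unallocated.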
The File system
To store and find files on a partition, the partition needs to contain a file system. An example of a file system is NTFS. A file system is created when we format a partition. Formatting creates the file system structures. The location of those structures is recorded in the first sector of the volume (volume = partition containing a file system). We call this sector the boot record. Little accidents that happen here can result in a so-called ‘RAW file system’.
The smallest addressable unit on a volume is called a cluster. A cluster consists of one or more sectors. The cluster size is also stored in the boot record, as this value is needed to calculate the sector address. In the end the hard disk is nothing more than a pool of sectors, and to actually read data from the disk we need the sector address. So, while file systems store the locations of files as cluster numbers, we need to convert those to sector numbers to know where these files actually are on the hard disk.
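The cluster-to-sector conversion is plain arithmetic. The sketch below assumes an NTFS-style volume, where cluster 0 begins at the first sector of the volume; other file systems may add an extra offset. The function name and the sample numbers are made up for illustration.

```python
def cluster_to_sector(cluster, sectors_per_cluster, volume_start_sector):
    """Convert a cluster number on a volume to an absolute sector number on the disk."""
    return volume_start_sector + cluster * sectors_per_cluster

# A volume starting at sector 2048 with 8 sectors (4 KB) per cluster:
print(cluster_to_sector(100, 8, 2048))  # 2848
```

The volume start comes from the partition table and the sectors-per-cluster value from the boot record, which is why both have to be readable before any file can be located.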
The Master File Table
The Master File Table (MFT) is the most important disk structure in the NTFS file system. It contains information about all files on the volume and even about itself. The MFT is ‘self-referencing’ and we only know its first cluster from the boot record. To determine all clusters allocated to the MFT, the operating system needs to interpret the first file record in the MFT. It’s the file record where the MFT ‘describes’ itself.
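As a rough sketch of how the boot record leads to the MFT, the Python below reads three fields from an NTFS boot record. The offsets used (bytes per sector at 0x0B, sectors per cluster at 0x0D, first MFT cluster at 0x30, and the ‘NTFS’ identifier at offset 3) follow the commonly documented NTFS layout and are assumptions as far as this article is concerned. It also assumes the sectors-per-cluster byte is a plain count. The function name and the fake boot record are illustrative only.

```python
import struct

def parse_ntfs_boot_record(sector):
    """Extract the values needed to locate the MFT from an NTFS boot record."""
    if sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot record")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    (mft_cluster,) = struct.unpack_from("<Q", sector, 0x30)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "mft_cluster": mft_cluster,
        # first sector of the MFT, counted from the start of the volume
        "mft_sector": mft_cluster * sectors_per_cluster,
    }

# Build a fake boot record: 512-byte sectors, 8 sectors per cluster, MFT at cluster 786432.
boot = bytearray(512)
boot[3:11] = b"NTFS    "
struct.pack_into("<HB", boot, 0x0B, 512, 8)
struct.pack_into("<Q", boot, 0x30, 786432)

info = parse_ntfs_boot_record(bytes(boot))
print(info["mft_cluster"], info["mft_sector"])  # 786432 6291456
```

Note that this only yields the first cluster of the MFT. The rest of its clusters have to be read from the MFT's own first file record.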
Final step – Your files
The clusters allocated to the MFT, or to any file for that matter, are described in so-called ‘runs’. For each run a starting logical cluster number (LCN) and a length are recorded. This allows the operating system to process the entire MFT.
And that’s how we finally get to our files. By decoding individual file records we get file names, file attributes, creation dates, etc., as well as the data runs that tell us where the file’s data is actually stored.
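Here is a minimal sketch of decoding such runs in Python. It relies on the commonly documented NTFS run-list encoding, which the article does not describe in detail: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is signed and relative to the previous run's start, and a zero byte ends the list. The function name `decode_data_runs` and the sample bytes are made up for illustration.

```python
def decode_data_runs(raw):
    """Decode an NTFS run list into (start_cluster, length_in_clusters) pairs."""
    runs = []
    pos = 0
    previous_lcn = 0
    while pos < len(raw) and raw[pos] != 0x00:    # a zero header byte ends the list
        header = raw[pos]
        length_size = header & 0x0F               # low nibble: bytes in the length field
        offset_size = header >> 4                 # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(raw[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, length))           # sparse run: nothing stored on disk
        else:
            # the offset is signed and relative to the start of the previous run
            delta = int.from_bytes(raw[pos:pos + offset_size], "little", signed=True)
            previous_lcn += delta
            runs.append((previous_lcn, length))
        pos += offset_size
    return runs

# One run: 0x18 (24) clusters starting at cluster 0x5634 (22068).
print(decode_data_runs(bytes.fromhex("21 18 34 56 00")))  # [(22068, 24)]
```

A fragmented file simply has more than one run in its list, and because offsets are relative, a later run can point to a lower cluster number than the one before it.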
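Decoding a file record starts with its header. The sketch below checks only a few header fields and assumes the commonly documented NTFS record layout, none of which is spelled out in this article: the record begins with the text ‘FILE’, the offset of the first attribute is stored at 0x14, and a flags field at 0x16 tells whether the record is in use and whether it describes a directory. File names, dates and data runs live in the attributes that follow the header and are not parsed here. All names in the code are illustrative.

```python
import struct

IN_USE = 0x0001         # record describes an existing file or directory
IS_DIRECTORY = 0x0002   # record describes a directory

def parse_file_record_header(record):
    """Read the basic header fields of an NTFS file record."""
    if record[0:4] != b"FILE":
        raise ValueError("not a file record")
    first_attribute_offset, flags = struct.unpack_from("<HH", record, 0x14)
    return {
        "in_use": bool(flags & IN_USE),
        "is_directory": bool(flags & IS_DIRECTORY),
        "first_attribute_offset": first_attribute_offset,
    }

# Build a fake 1 KB record for a regular file that is in use.
record = bytearray(1024)
record[0:4] = b"FILE"
struct.pack_into("<HH", record, 0x14, 0x38, IN_USE)

print(parse_file_record_header(bytes(record)))
# {'in_use': True, 'is_directory': False, 'first_attribute_offset': 56}
```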
A modern operating system hides all the raw hex data from the user. Users find themselves in a world of polished icons and symbols.
When accidents happen, they can rely on file recovery software that, without the help of the operating system, descends into this world of raw hex to retrieve the lost data.
To locate your data, your files, the operating system basically has to interpret and process a chain of on-disk structures. Of course it doesn’t have to start at the first sector of the hard disk for each individual file. It only has to interpret the partition table and the boot record once, because neither the location of the volume nor the data in the boot record changes while the operating system is running.
Other disk structures, like the MFT, are highly dynamic. As files are created, modified or deleted, the MFT needs to be constantly updated.