Why partition alignment on disk matters (Linux)


Linux has been pretty good with and for storage. The sheer number of options for filesystems, volume managers, access methods (FC, iSCSI, NFS, DAS etc.) and multipathing, together with the very broad support of the hardware ecosystem, is something to be proud of. The issue with storage support is that you ALWAYS have to maintain a massive backward-compatibility chain with previous generations of technology. Not only the hardware but also the software side needs to keep supporting the older technology. I saw a video featuring Linus, Greg Kroah-Hartman, Sarah Sharp and Ted Ts’o over here where Ted mentioned that KVM helped him massively with regression testing for the storage projects he’s involved in. (As you may know, Ted maintains the ext2/3/4 filesystems, among other things.) That brings me to the bottleneck of history in a technology environment and why the topic in the title is important.

Since storage deals with pretty important stuff (your data) and, as opposed to networking, is not an ephemeral entity, you have to be extremely careful with the way you handle things. You can rip and replace a network switch without any problem, but trying to migrate 4 PB of data from ext3 to a btrfs filesystem makes many people in an organisation pretty nervous.

The same thing goes for storage hardware. Since the dawn of time the storage world has used a disk format of 512 bytes per sector. (As you know, a disk drive has platters divided into cylinders and tracks, and these are carved up into sectors.) The capacity of a disk drive became fairly limited with this addressing capability (2^32 * 512 bytes = ~2 TB) and ECC (error correction) became a burden, so the disk drive industry decided to move to Advanced Format (AF) disk layouts, which means that the sector size changed from 512 bytes to 4 KB.

This didn’t happen overnight. The AF (4 KB sector size) format has been in the making for over 10 years, and since disk sector sizes had always been 512 bytes, a transition period was needed in which the disk drive would internally address on 4 KB boundaries while the projection to the operating system would still be 512 bytes. The firmware maps 8 * 512-byte logical sectors onto one 4 KB physical sector. This means, however, that the OS needs to issue its IOs on these 4 KB boundaries; otherwise a 4 KB IO takes the disk two operations, e.g. 7 * 512 bytes from physical sector x and 1 * 512 bytes from physical sector y. A write that only partially covers a physical sector is worse still, because the drive has to read the whole sector, modify it and write it back. It is therefore important that a partition, when it is created, starts on a 4 KB physical sector boundary.
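The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and the model simply counts how many 4 KB physical sectors a byte range touches; it says nothing about how any particular drive firmware behaves.

```python
# Illustrative sketch only; the names below are hypothetical helpers, not part of any tool.

LOGICAL_SECTOR = 512     # bytes, what the drive presents to the OS
PHYSICAL_SECTOR = 4096   # bytes, what an AF drive uses internally


def addressable_bytes(address_bits=32, sector_size=LOGICAL_SECTOR):
    """Capacity reachable with a fixed-width sector address."""
    return 2 ** address_bits * sector_size


def physical_sectors_touched(offset_bytes, length_bytes, physical=PHYSICAL_SECTOR):
    """Count the physical sectors covered by an IO at a given byte offset."""
    first = offset_bytes // physical
    last = (offset_bytes + length_bytes - 1) // physical
    return last - first + 1


print(addressable_bytes())                      # 2199023255552 bytes, roughly 2 TB
print(physical_sectors_touched(512, 4096))      # 2: the IO straddles two physical sectors
print(physical_sectors_touched(1048576, 4096))  # 1: the IO sits inside one physical sector
```

A 4 KB IO that starts 512 bytes into the disk covers 7 logical sectors of one physical sector and 1 of the next, which is exactly the split described above.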

Below I have two examples of the same disk type in a Fedora 19 system with kernel 3.11.10-200. On disk /dev/sdd you see that the partition is not optimally aligned to the sector boundary, whilst on /dev/sde it is.

Model: ATA WDC WD20EARS-00M (scsi)
Disk /dev/sdd: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start  End     Size    Type     File system  Flags
1      512B   2000GB  2000GB  primary  ext4

Using /dev/sdd
(parted) align-check
alignment type(min/opt)  [optimal]/minimal? min
Partition number? 1
1 aligned
(parted) align-check
alignment type(min/opt)  [optimal]/minimal? op
Partition number? 1
1 not aligned                   <<<<<<<<<<<<<

For drive /dev/sde you’ll see the difference:

(parted) print
Model: ATA WDC WD20EARS-00M (scsi)
Disk /dev/sde: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
1      1049kB  2000GB  2000GB  primary  ext4




(parted) align-check
alignment type(min/opt)  [optimal]/minimal? min
Partition number? 1
1 aligned
(parted) align-check
alignment type(min/opt)  [optimal]/minimal? op
Partition number? 1
1 aligned
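To make the difference between the two listings concrete, here is a small Python sketch that applies three alignment granularities to the two start offsets. Two things are assumptions on my side: that the "1049kB" shown by parted is a rounded display of 1048576 bytes (1 MiB), and that the "opt" check on this system boils down to "start is a multiple of 1 MiB". The function name is hypothetical and this is a simplified model, not parted’s actual algorithm.

```python
# Simplified model of the align-check results; not parted's real implementation.

MIB = 1024 * 1024


def check_alignment(start_bytes):
    """Report whether a partition start is a multiple of each granularity."""
    return {
        "min_512": start_bytes % 512 == 0,    # reported sector size (512B/512B)
        "phys_4k": start_bytes % 4096 == 0,   # real AF physical sector size
        "opt_1mib": start_bytes % MIB == 0,   # assumed default optimal alignment
    }


# Start offsets taken from the two listings (1049kB assumed to be exactly 1 MiB).
for name, start in {"/dev/sdd1": 512, "/dev/sde1": 1048576}.items():
    print(name, check_alignment(start))
```

Under these assumptions /dev/sdd1 passes only the 512-byte check, while /dev/sde1 passes all three, which matches the "aligned" and "not aligned" answers in the listings.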

The consequences of an unaligned partition can be significant, especially on write IOs. As the write charts show, the /dev/sde drive has better overall throughput, with some spikes at certain file sizes.

The charts below were created with IOzone using the following command:

iozone -a -S 6144k -g 8M -b sde.xls /dev/sde1

It is not intended to represent a performance comparison between devices and you may or may not achieve the same results. The partitions were totally empty and only a default ext4 filesystem was generated on them.

[Charts: sdd-writes, sde-writes]

On reads you’ll see less difference. The reason is that most disk drives do a read-ahead anyway, pre-fetching data from the following sectors in order to pre-fill the cache. This negates the alignment issue somewhat.

[Charts: sde-reads, sdd-reads]

So in short, make sure that partitions are aligned to the disk’s physical sector boundaries for best performance. The Parted version I used (3.1) will notify you if a misalignment is observed before creating the partition. You can also check the alignment of existing partitions with the align-check command.
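If you want to check an existing partition without starting parted, the start offset can also be read from sysfs. The sketch below is a minimal example, assuming the usual Linux layout where /sys/block/DISK/PARTITION/start holds the start in 512-byte units and /sys/block/DISK/queue/physical_block_size holds the reported physical sector size. The function name and its parameters are hypothetical. Because drives like the one in the listings above report 512B/512B, the physical sector size can be overridden.

```python
from pathlib import Path


def partition_alignment(disk, partition, physical_block_size=None, sysfs_root="/sys/block"):
    """Return (start_in_bytes, is_aligned) for a partition, based on sysfs values.

    physical_block_size overrides what the drive reports; this helps with
    drives that use 4 KB sectors internally but report 512 bytes.
    """
    base = Path(sysfs_root) / disk
    # The sysfs "start" attribute is expressed in 512-byte units.
    start_bytes = int((base / partition / "start").read_text()) * 512
    if physical_block_size is None:
        physical_block_size = int((base / "queue" / "physical_block_size").read_text())
    return start_bytes, start_bytes % physical_block_size == 0
```

For the drives above you would call something like partition_alignment("sdd", "sdd1", physical_block_size=4096); a partition starting at 512 bytes would then come back as not aligned.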

There are numerous very detailed posts around 4KB sector sizes on disk format.

Hope this helps a bit.

Cheers,

Erwin

About Erwin van Londen

Master Technical Analyst at Hitachi Data Systems