
Help, my Thin Provisioning is not working

On many occasions I’ve seen posts from storage administrators who mapped some LUNs to hosts, and on first use the entire pool got whacked with all bells and whistles going off. (Yes, we can control bells and whistles. :-))


The administrator did nothing wrong; however, he should have communicated with the server admin what the LUNs were for and how they were going to be used. As I mentioned in my previous post on Thin Provisioning, the array doesn’t really know what’s going on from a host perspective. It knows, via HMO (port group settings), which type of host is connected and adjusts some internal knobs to accommodate the commands from that particular host or application.
What it does not know is how that application is using the array.

Remember that a storage array just knows about reads and writes (besides the special commands specific to management).

Under normal circumstances a LUN is mapped and on the host this LUN is then formatted with a specific filesystem. Some filesystems use only the first couple of sectors of a disk to outline the mapping of the blocks, so if the application wants to write a chunk of data the filesystem creates the inode, registers the mapping in the filesystem table at the beginning of the disk and away we go.

When we look at the disk from this perspective, after formatting it looks like this:

————————————————————————————
|************   |            |               |             |            |
————————————————————————————

Only the first sectors are written and the rest is still empty.

The same would happen if this LUN were mapped out of a thin provisioned pool. Only the first couple of sectors on the virtual disk would be written, and therefore only the page holding these sectors would be marked as used in the pool; the rest would still be empty and thus the array would not allocate those pages to this particular LUN.
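To make this concrete, here is a toy model in Python of how a thin pool hands out space. The ThinPool class, the 32 MB page size and the 100 GB LUN size are purely illustrative assumptions of mine; real arrays use their own (vendor-specific) page sizes. A filesystem that writes its tables only at the start of the disk claims exactly one page:

```python
# Toy model of a thin-provisioned pool: a page is only marked "used"
# once a write lands on it. Page and LUN sizes are assumed values.

PAGE_SIZE = 32 * 1024 * 1024   # 32 MB page, illustrative only
LUN_SIZE  = 100 * 1024**3      # a 100 GB thin LUN

class ThinPool:
    def __init__(self):
        self.used_pages = set()          # pages written at least once

    def write(self, offset, length):
        first = offset // PAGE_SIZE
        last  = (offset + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.used_pages.add(page)    # one touched byte claims the whole page

    def allocated_bytes(self):
        return len(self.used_pages) * PAGE_SIZE

# A "TP friendly" format: metadata only in the first couple of sectors.
pool = ThinPool()
pool.write(0, 64 * 1024)                 # filesystem tables at the start of the disk
print(pool.allocated_bytes() // 2**20, "MB allocated out of", LUN_SIZE // 2**30, "GB")
# -> 32 MB allocated out of 100 GB
```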

So far all is well.

The problem begins when the same LUN is formatted with a filesystem which does interleaved formatting. The concept here is that the filesystem mapping table is spread over the entire disk, which might improve performance if you do this on a single physical disk.

————————————————————————————
| **         | **          | **          | **          | **          | **         |
————————————————————————————

On writes the chance that you’re able to update the mapping table, create the inodes and write the data in one stroke is fairly good.

Now compare the interleaved method to the one I described before and you will be able to figure out why this really renders Thin Provisioning useless. Since the chance is near 100% that every page in that pool will be “touched” at least once, each entire page will be marked as used in the pool even though the net amount of data written is next to nothing.
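Continuing the toy pool from the sketch above: spreading a few kilobytes of metadata at an assumed 16 MB interval (the interval is just a number I picked to illustrate the effect) is enough to touch every page, so the pool considers the whole LUN used while only a couple of dozen megabytes have actually been written:

```python
# Same toy pool as above, but the filesystem now spreads its metadata
# across the whole LUN: one 4 KB write every 16 MB (assumed interval).
pool = ThinPool()
for offset in range(0, LUN_SIZE, 16 * 1024 * 1024):
    pool.write(offset, 4096)             # ~26 MB of metadata in total
print(pool.allocated_bytes() // 2**30, "GB allocated out of", LUN_SIZE // 2**30, "GB")
# -> every 32 MB page has been touched, so the full 100 GB is claimed
```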

Now you might think: “OK, I’ll choose a filesystem which is TP friendly and I’m sorted”.

Well, not quite. Server administrators very often like to have their own “storage management tool” in the form of volume managers. These allow them to virtualise “physical” LUNs mapped out of an array into a single entity on their systems.
The problem with this is that it behaves the same as the TP-unfriendly filesystem, with the difference that it’s no longer the filesystem doing the interleaving of metadata but the volume manager doing the same thing.

In both cases a TP pool will fill up pretty quickly without having an application write a single bit.

All storage vendors have whitepapers and instructions available on how to plan for these scenarios. If you don’t want to run into surprises I suggest you have a look at them.

Regards,
Erwin van Londen

Why disk drives have become slower over the years

What is the first question vendors get (or at least used to get) when a non-technical customer calls?
I’ll spare you the guesswork: “What does a TB of disk go for at your place?”. Usually I’ll google around for the cheapest disk at a local PC store and say “Well Sir, that would be about 80 dollars”. I then hear somebody falling off the chair, trying to get up again, reaching for the phone and asking with a resonating voice “Why are your competitors so expensive then?”. “They most likely did not give a direct answer to your question,” I reply.
The thing is, an HDD should be evaluated on multiple factors, and when you spend 80 bucks on a 1TB disk you get capacity and that’s about it. Don’t expect performance or extended MTBF figures, let alone all the stuff that comes with enterprise arrays like large caches, redundancy in every sense and a lot more. That is what makes up the price per GB.


“Ok, so why have disk drives become so much slower in the past couple of years?”. Well, they haven’t. The RPM, seek time and latency have remained roughly the same over the last couple of years. The problem is that the capacity has increased so much that the so-called “access density” (the IOPS available per GB) has dropped dramatically, so the disk has to service a massive amount of bytes with the same nominal IOPS capability.
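A quick back-of-the-envelope comparison makes the point. The ~125 IOPS per spindle below is a ballpark figure I’m assuming for a 10k RPM class drive, not a vendor spec:

```python
# Rough "access density" comparison: nominal random IOPS per GB of capacity.
# 125 IOPS per spindle is an assumed ballpark value for a 10k RPM drive.
for capacity_gb in (36, 146, 600):
    iops = 125                        # same mechanics, same nominal IOPS
    print(f"{capacity_gb:4d} GB drive: {iops / capacity_gb:.2f} IOPS per GB")
# The 600 GB drive has to serve almost 17x more capacity with the same IOPS
# as the 36 GB drive, even though both spin at the same speed.
```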


I did some simple calculations which show the decrease in performance on larger disks. I didn’t assume any RAID or cache accelerators.


I first calculated a baseline based on a 100GB disk drive (I know, they don’t exist, but it’s just for the calculation) with 500GB of data that I need to read or write.


The assumption was a 100% random read profile. Although the host can theoretically read or write in 512-byte increments, this doesn’t mean the disk will handle such an IO in one sequential stroke. An 8K host IO can be split up into the smallest sector size supported on disk, which is currently around 512 bytes. (Don’t worry, every disk and array will optimise this, but again this is just to show the nominal differences.)


So when I have a 100GB disk drive this translates to a little over 190 million sectors. In order to read 500GB of data this would take a theoretical 21.7 minutes. The number of disks is calculated based on the capacity required for that 500GB. (Also remember that disks use a base-10 capacity value whereas operating systems, memory chips and other electronics use a base-2 value, so that’s 10^3 vs 2^10.)

Baseline
GB                    100
Sectors               190,734,863
RPM                   10000
Avrg delay in ms      8
Max IOPS per disk     125
Disks required        6
Num IOPS (aggregate)  750
Time required (sec)   1302
Time required (min)   21.7
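If you want to play with the numbers yourself, a rough sketch of this kind of calculation looks like the following. It keeps the same simplifying assumptions (100% random reads, no cache, no RAID); the disk_time() helper and the 512 KB transferred per IO are my own assumptions used to turn IOPS into elapsed time, so the result lands in the same ballpark as the table rather than on the exact figures:

```python
# Baseline arithmetic sketch: nominal per-disk IOPS from the average delay,
# aggregate IOPS from the number of spindles needed for the capacity, and
# elapsed time from an assumed 512 KB transferred per random IO.
DATA_TO_READ = 500 * 2**30        # 500 GB as the OS counts it (base 2)
IO_SIZE      = 512 * 1024         # assumed bytes moved per random IO

def disk_time(capacity_gb, avg_delay_ms):
    iops_per_disk = 1000 / avg_delay_ms                        # e.g. 8 ms -> 125 IOPS
    disks_needed  = -(-DATA_TO_READ // (capacity_gb * 10**9))  # ceiling; disks are base 10
    total_iops    = disks_needed * iops_per_disk
    seconds       = (DATA_TO_READ / IO_SIZE) / total_iops
    return disks_needed, total_iops, seconds

disks, iops, secs = disk_time(100, 8)
print(f"{disks} disks, {iops:.0f} aggregate IOPS, {secs / 60:.1f} minutes")
# -> 6 disks, 750 aggregate IOPS, about 23 minutes (vs. 21.7 in the table above)
```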

 If you now take this baseline and map this to some previous and current disk types and capacities you can see the differences.

GB    Sectors        RPM    Disks required  Num IOPS  Time (sec)  Time (min)  % of baseline time  x baseline performance
9     17,166,138     7200   57              4731      206         3.44        15.83               6.32
18    34,332,275     7200   29              2407      406         6.77        31.19               3.21
36    68,664,551     10000  15              1875      521         8.69        40.02               2.50
72    137,329,102    10000  8               1000      977         16.29       75.04               1.33
146   278,472,900    10000  4               500       1953        32.55       150                 0.67
300   572,204,590    10000  2               250       3906        65.1        300                 0.33
450   858,306,885    10000  2               250       3906        65.1        300                 0.33
600   1,144,409,180  10000  1               125       7813        130.22      600.08              0.17

You can see here that, capacity-wise, to store the same 500GB on 146GB disks you need fewer disks, but you also get fewer total IOPS. This then translates into slower performance. As an example, a 300GB drive at 10000RPM triples the time needed to read this 500GB compared to the baseline disk.
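Running the hypothetical disk_time() helper from the sketch above for a few of the capacities (again with the assumed 8 ms average delay and 512 KB per IO) shows the same effect:

```python
# Reusing the assumed disk_time() helper for a few capacities in the table.
for capacity_gb in (100, 146, 300):
    disks, iops, secs = disk_time(capacity_gb, 8)   # ~8 ms average delay, 10k RPM class
    print(f"{capacity_gb:3d} GB: {disks} disks, {iops:.0f} IOPS, {secs / 60:5.1f} minutes")
# The 300 GB drives cover the 500 GB with only 2 spindles, so the aggregate
# IOPS drops to 250 and the read time roughly triples versus the baseline.
```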


Now these are relatively simple calculations, however they do apply to all disks, including the ones in your disk array.


I hope this also makes you start thinking about performance as well as capacity. I’m pretty sure your business finds it most annoying when your users need to get a cup of coffee after every database query. 🙂