Tag Archives: VMware

Why Docker is the new VMware (part 2)

Last week a tweet by Duncan Epping, prompted by my post in part 1, referred to a page from Massimo Re Ferre. I think Duncan's intent was to downplay the effect of my post and to emphasise the technological differences between the solutions VMware offers (with the emphasis on vSphere) and the new technologies emerging on the horizon, which Massimo mentions in his post and which I highlighted before: Docker (or containerisation of applications) and, even more abstracted, the Lambda model as AWS offers it. The latter plays even more into the virtualization model I described in an even earlier post I wrote around 5 years ago (over here). (Talk about long-term vision.)

Continue reading

Why Docker is the new VMware

Five years ago I wrote this article:

Server virtualisation is the result of software development incompetence

Yes, it has given me some grief, given the fact that 99% of respondents did not read beyond the title and made false assumptions, saying I accused these developers of being stupid. Tough luck. You should have read the entire article.

Anyway, in that article I outlined that virtualization as a methodology of isolating entire operating systems in a container is a massive waste of resources, and that the virtualisation engine should have focused on applications and/or business functionality instead. It took a while for someone to actually jump into this area, but finally a new tool has come to life which does exactly that.

Continue reading

SCSI UNMAP and performance implications

When listening to Greg Knieriemen's podcast on Nekkid Tech there was some debate on VMware's decision to disable the SCSI UNMAP command on vSphere 5.something. Chris Evans (www.thestoragearchitect.com) had some questions as to why this has happened, so I'll try to give a short description.

Be aware that, although I work for Hitachi, I have no insight into the internal algorithms of any vendor, but the T10 (INCITS) specifications are public and every vendor has to adhere to these specs, so here we go.

With the introduction of thin provisioning in the SBC-3 specs a whole new can of options, features and functions came out of the T10 (SCSI) committee, which enabled applications and operating systems to do all sorts of nifty stuff on storage arrays. Basically it meant you could give a host a 2 TB volume whilst in the background you only had 1 TB physically available. The assumption with thin provisioning (TP) is that a host or application won't use that 2 TB in one go anyway, so why pre-allocate it?

So what happens is that the storage array provides the host with a range of addressable LBAs (Logical Block Addresses) which the host is able to use to store data. In the back-end on the array these LBAs are only allocated upon actual use. The array has one or more so-called disk pools where it can physically store the data. The mapping between the "virtual addressable LBAs" which the host sees and the back-end physical storage is done via mapping tables. Depending on the implementation of the different vendors, certain "chunks" out of these pools are reserved as soon as one LBA is allocated. This prevents performance bottlenecks from a housekeeping perspective, since the array doesn't need to manage each single LBA mapping. Each vendor has different page/chunk/segment sizes and different algorithms to manage these, but the overall method of TP stays the same.

So let's say the segment size on an array is 42 MB (:-)) and an application is writing to an LBA which falls into this chunk. The array updates the mapping tables, allocates cache slots and does all the other housekeeping that comes with an incoming write IO. As of that moment the entire 42 MB is allocated to that particular LUN which is presented to that host. Any subsequent write to any LBA which falls into this 42 MB segment is just a regular IO from an array perspective; no additional overhead is needed or generated with regard to TP maintenance. As you can see this is a very effective way of maintaining an optimum capacity usage ratio, but as with everything there are some things you have to consider as well, like over-provisioning and its ramifications when things go wrong.
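The allocate-on-first-write behaviour described above can be sketched in a few lines of Python. This is purely illustrative bookkeeping, not any vendor's actual implementation; the class and variable names, the pool structure and the segment count are all invented for the example.

```python
# Hypothetical sketch of thin-provisioning bookkeeping; segment size
# matches the 42 MB example above, everything else is made up.
SEGMENT_SIZE = 42 * 1024 * 1024

class ThinLun:
    """A thin LUN: LBA ranges map to pool segments only on first write."""

    def __init__(self, virtual_size, pool):
        self.virtual_size = virtual_size  # what the host sees
        self.pool = pool                  # shared free-segment pool
        self.mapping = {}                 # segment index -> physical segment

    def write(self, lba_offset, data):
        seg = lba_offset // SEGMENT_SIZE
        if seg not in self.mapping:
            # First write into this 42 MB region: reserve a whole segment.
            self.mapping[seg] = self.pool.pop()
        # Subsequent writes to the same segment are plain IO: no TP overhead.
        return self.mapping[seg]

pool = list(range(1000))              # free physical segments
lun = ThinLun(2 * 1024**4, pool)      # host is promised 2 TB
lun.write(0, b"x")                    # allocates one segment
lun.write(1024, b"y")                 # same segment, no new allocation
print(len(lun.mapping))               # 1
```

Two writes landing in the same 42 MB region cost only one allocation; only a write into a different region pulls a second segment from the pool.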

Let's assume that is all under control and move on.

Now what happens if data is no longer needed or is deleted? Let's assume a user deletes a file which is 200 MB big (a video for example). In theory this file occupied at least 5 TP segments of 42 MB. But since many filesystems are very IO-savvy they do not scrub the entire file back to zero but just delete the FS entry pointer and remove the inodes from the inode table. This means that only a couple of bytes have effectively been changed on the physical disk and in the array cache.
The array has no way of knowing that these couple of bytes, which have been returned to 0, represent an entire 200 MB file, and as such the segments remain allocated in cache, on disk and in the TP mapping table. This also means that these TP segments can never be re-mapped to other LUNs for more effective use if needed. There have been some solutions to overcome this, like host-based scrubbing (putting all bits back to 0), de-fragmentation to re-align all used LBAs and scrub the rest, and some array-based solutions which check whether segments contain only zeroes and, if so, remove them from the mapping table and therefore make them available for re-use.
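That last, array-based zero-detection approach can be sketched as follows. This is an invented illustration only; real arrays work at their own page granularity with their own metadata, and the segment size is scaled down here so the demo runs instantly.

```python
# Hypothetical array-side zero-detection reclaim, for illustration only.
SEGMENT_SIZE = 42 * 1024   # scaled down from 42 MB for the demo

def reclaim_zeroed_segments(mapping, read_segment, free_pool):
    """Return segments whose contents are all zeroes to the free pool.

    mapping: dict of segment index -> physical segment id
    read_segment: callable returning the raw bytes of a physical segment
    """
    for seg, phys in list(mapping.items()):
        data = read_segment(phys)
        if not any(data):             # every byte is zero
            del mapping[seg]          # drop it from the TP mapping table
            free_pool.append(phys)    # segment becomes usable by other LUNs

# A deleted file only zeroes filesystem metadata, so a segment that still
# holds a single stale byte of payload stays mapped.
mapping = {0: 10, 1: 11}
segments = {10: bytes(SEGMENT_SIZE),                 # fully zeroed
            11: b"\x01" + bytes(SEGMENT_SIZE - 1)}   # one non-zero byte left
free_pool = []
reclaim_zeroed_segments(mapping, segments.__getitem__, free_pool)
print(mapping, free_pool)   # {1: 11} [10]
```

Note that the array has to read and inspect every mapped segment to find the zeroed ones, which is exactly why this housekeeping is expensive and why a purpose-built command was needed.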

As you can imagine this is not a very effective way of using TP. You can be busy clearing things up on a fairly regular basis, so there had to be another solution.

So the T10 friends came up with two new things, namely WRITE SAME and UNMAP. WRITE SAME does exactly what it says: it issues a single write command with a bit pattern and tells the array to write that pattern to a certain set of LBAs. The array then executes this, thereby offloading the host from keeping track of all the individual write commands, so it can do more useful stuff than pushing bits back and forth between itself and the array. This can be very useful if you need to deploy a lot of VMs which by definition have a very similar (if not exactly the same) pattern. The other way around it has a similar benefit: if you need to delete VMs (or just one), the hypervisor can instruct the array to clear all LBAs associated with that particular VM, and if the UNMAP command is used in conjunction with the WRITE SAME command you basically end up with the situation you want. The UNMAP command instructs the array that certain LBAs are no longer in use by this host and can therefore be returned to the free pool.
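The semantics of the two commands can be modelled like this. The command names follow the T10 SBC-3 spec, but the class, its internals and the 512-byte block size used here are assumptions made purely for illustration.

```python
# Illustrative model of what WRITE SAME and UNMAP ask an array to do.
class ArrayTarget:
    def __init__(self):
        self.blocks = {}   # LBA -> data block

    def write_same(self, lba, count, block):
        # One command from the host; the array replicates the pattern
        # itself instead of receiving `count` separate writes.
        for i in range(lba, lba + count):
            self.blocks[i] = block

    def unmap(self, lba, count):
        # Tell the array these LBAs are no longer in use, so the
        # backing segments can eventually return to the free pool.
        for i in range(lba, lba + count):
            self.blocks.pop(i, None)

array = ArrayTarget()
array.write_same(0, 1000, b"\x00" * 512)   # zero 1000 blocks in one command
array.unmap(0, 1000)                       # release them again
print(len(array.blocks))                   # 0
```

The point is the division of labour: the host sends one short command, and all the per-LBA work happens inside the array.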

As you can imagine, if you just use the UNMAP command this is very fast from a host perspective and the array can handle it very quickly, but here comes the catch. If the host instructs the array to UNMAP the association between the LBA and the LUN, it is basically only a pointer that is removed from the mapping table; the actual data still exists, either in cache or on disk. If that same segment is then re-allocated to another host, in theory this particular host can issue a read command to any given LBA in that segment and retrieve the data that was previously written by the other system. Not only can this confuse the operating system, but it also implies a huge security risk.

In order to prevent this the array has one or more background threads to clear out these segments before they are effectively returned to the pool for re-use. These tasks normally run at a pretty low priority so as not to interfere with normal host IO. (Remember that it is still the same CPU(s) that have to take care of this.) If the CPUs are fast and the background threads are smart enough, under normal circumstances you hardly see any difference in performance.
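The deallocate, scrub and re-use flow just described, including the stale-data risk, can be sketched as below. The queue structures, the tiny 8-byte "segments" and the names are all invented for the example; a real array obviously does this at segment granularity with hardware-assisted scrubbing.

```python
# Sketch of the unmap -> background scrub -> free-pool flow described above.
from collections import deque

class SegmentPool:
    def __init__(self, segments):
        self.free = deque(range(segments))   # scrubbed, safe to hand out
        self.dirty = deque()                 # unmapped but still holding data
        self.store = {i: bytes(8) for i in range(segments)}

    def unmap(self, seg):
        # Fast path: only the mapping pointer goes; the data stays put,
        # so the segment is NOT immediately reusable.
        self.dirty.append(seg)

    def scrub_one(self):
        # Low-priority background work: zero a dirty segment, then free it.
        if self.dirty:
            seg = self.dirty.popleft()
            self.store[seg] = bytes(8)       # overwrite the stale data
            self.free.append(seg)

pool = SegmentPool(4)
seg = pool.free.popleft()        # segment allocated to a LUN
pool.store[seg] = b"secretpw"    # host writes sensitive data
pool.unmap(seg)                  # host releases it: data is still there
print(pool.store[seg])           # b'secretpw'  <- the security risk
pool.scrub_one()                 # background thread cleans it up
print(seg in pool.free)          # True: now safe to re-use
```

Until `scrub_one()` runs, the segment sits in the dirty queue: unusable by anyone else, and a backlog of exactly this kind of work is what surfaces as the performance issue described next.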

As with all instruction-based processing the work has to be done either way, be it by the array or by the host. So if there is a huge amount of demand, where hypervisors move around a lot of VMs between LUNs and/or arrays, there will be a lot of deallocation (UNMAP), clearance (WRITE SAME) and re-allocation of these segments going on. It depends on the scheduling algorithm at what point the array decides to reschedule the background and front-end processes such that there will be a delay in the status response to the host. On the host it looks like a performance issue, but in essence what you have done is overload the array with too many commands which normally (without thin provisioning) would have to be handled by the host itself.

You can debate whether a larger or smaller segment size would be beneficial, but that doesn't fundamentally change the picture: with a smaller segment size the CPU has much more overhead in managing mapping tables, whereas with bigger segment sizes the array needs to scrub more space on deallocation.
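Some back-of-the-envelope arithmetic makes the trade-off concrete. The 100 TB pool size and the candidate segment sizes are picked arbitrarily for the example; real mapping-table entry sizes vary per vendor.

```python
# Rough numbers for the segment-size trade-off: smaller segments mean
# more mapping-table entries to manage, bigger segments mean more data
# to scrub per freed segment. Pool size and segment sizes are invented.
pool_tb = 100
for seg_mb in (4, 42, 256):
    entries = pool_tb * 1024 * 1024 // seg_mb   # mapping-table entries
    print(f"{seg_mb:>4} MB segments: {entries:>10,} entries, "
          f"{seg_mb} MB scrubbed per freed segment")
```

Going from 42 MB down to 4 MB segments roughly multiplies the mapping-table size by ten, while going up to 256 MB multiplies the scrub work per deallocation by six; either way the work moves, it doesn't disappear.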

So this is the reason why VMware disabled the UNMAP command in this patch: a lot of "performance problems" were seen across the world when this feature was enabled. Given the fact that it was VMware that disabled it, you can imagine that multiple arrays from multiple vendors might be impacted in some sense; otherwise they would have been more specific about the array vendors and types affected, which they haven't been.

Beyond the Hypervisor as we know it

And here we are again. I've been busy doing some internal stuff for my company, so the tweets and blogs were put on low maintenance.

Anyway, VMware launched its new version of vSphere and the amount of attention and noise it received is overwhelming, both from a positive as well as a negative side. Many customers feel they are being ripped off by the new licensing schema, whereas from a technical perspective all admins seem to agree the enhancements being made are fabulous. Being a techie myself, I must say the new and updated stuff is extremely appealing and I can see why many admins would like to upgrade right away. I assume that's only possible after the financial hurdles have been cleared.

So why this subject? "VMware is not going to disappear and neither do MS or Xen," I hear you say. Well, probably not; however, let's take a step back and look at why these hypervisors were initially developed. Basically, what they wanted to achieve is the option to run multiple applications on one server without any sort of library dependency which might conflict with, disturb or corrupt another application. VMware wasn't the initiator of this concept; the birthplace of it all was IBM's mainframe platform. Even back in the 60's and 70's they had the same problem: two or more applications had to run on the same physical box, however due to conflicts in libraries and functions. IBM found a way to isolate these and came up with the concept of virtual instances which ran on a common platform operating system: MVS, which later became OS/390 and is now z/OS.

When the open systems world, spearheaded by Microsoft, took off in the 80's and 90's, it more or less created the same mess as IBM had seen before. (IBM did actually learn something and pushed that into OS/2; however, that OS never really took off.)
When Microsoft came up with so-called Dynamic Link Libraries, this was heaven for application developers. They could now dynamically load a DLL and use its functions. However, they did not take into account that only one DLL with a certain function could be loaded at any one particular point. And thus, when DLLs got new functionality and therefore new revision levels, sometimes they were not backward compatible and very nasty conflicts would surface. So we were back to zero.

And along came VMware. They did for the Windows world what IBM had done many years before and created a hypervisor which lets you run multiple virtual machines, each isolated from the others, with no possibility of binary conflicts. And they still make good money off it.

However, the application developers have not been sitting still either. They have also seen that they can no longer utilise the development model they used for years. Every self-respecting developer now programs with massive scalability and distributed systems in mind, based on cloud principles. Basically this means that applications are almost solely built on web technologies with JavaScript (via node.js), HTML5 or other high-level languages. These applications are then loaded onto distributed systems like OpenStack, Hadoop and one or two others. These platforms create application containers where the application is isolated and has to abide by the functionality of the underlying platform. This is exactly what I wrote almost two years ago: the application itself should be virtualised instead of the operating system. (See here)

When you take this into account you can imagine that the hypervisors as we know them now will, at some point in time, render themselves useless. The operating system itself is not important anymore and it doesn't matter what these cloud systems run on. The only things that are important are scalability and reliability. Companies like VMware, Microsoft, HP and others are not stupid and see this coming. This is also the reason why they have started building these massive data centres: to accommodate the customers who adopt this technology and start hosting these applications.

Now here comes the problem with this concept: SLAs. Who is going to guarantee you availability when everything is out of your control? Examples like the outages of Amazon EC2, Microsoft's cloud email service BPOS, VMware's Cloud Foundry and Google's Gmail service show that even these extremely well designed systems at some point run into Murphy, and the question is whether you want to depend on these providers for business continuity. Be aware that you have no vote in how and where your application is hosted; that is totally at the discretion of the hosting provider. Again, it's all about risk assessment versus cost versus flexibility and whatever other arguments you can think of, so I leave that up to you.

So where does this take you? Well, you should start thinking about your requirements. Does my business need this cloud-based flexibility, or should I adopt a more hybrid model where some applications are built and managed by myself/my staff?

Either way, you will see more and more applications being developed for internal, external and hybrid cloud models. This brings us back to the subject line: the hypervisors as we know them today will cease to exist. It might take a while, but the software world is like a diesel train; it starts slowly, but once it's on a roll it's almost impossible to stop, so be prepared.

Kind regards,
Erwin van Londen