What are we up against? What problems and challenges will companies face in 5 to 10 years, how are we as a storage industry going to cope with these challenges, and how can we help customers overcome them?
In Vincent Franceschini’s last blog post (SOKU) on HDS’s website, a very high-level overview was presented. However, as we all know, it’s almost impossible to get all levels of the industry to change to a SOKU in the near future. It’s like asking the auto industry to change to square tires in 6 months. We have a long way to go before adopting a SOKU infrastructure, so I think we need to approach this in a modular fashion, with options to expand the functionality so that it is ready to integrate into a SOKU environment.
Some 5 years ago I was working for a small company which had its own little storage expert group. During that time we looked at the market and saw a lot of different solutions, products and services from many vendors and so-called storage consultancy service companies. However, we couldn’t find a visionary look behind all these different point solutions, and the thing that bothered us the most was the lack of intelligence in how all these products worked together. It was then that I developed my so-called 1-D (D stands for Decade) storage vision, which became a somewhat extraterrestrial view of what the storage landscape might look like in 10 years. I’ll try to summarize the mind-spins that go around when philosophizing about storage and about how to ease the burden of a company that has to cope with all these difficult tasks.
Tomorrow’s companies will face an enormous challenge handling their information. The burden they have to carry is becoming more and more a question of who has the most money to store the enormous amount of data needed to fulfill their needs. Besides the hunger for information, secondary and maybe third-party influences like legislation and regulations are becoming more and more important. When top executives are responsible for the numbers, they had better make sure, in the future and sometimes even today, that their information is correct before handing it over to the proper agencies, or they might be prosecuted for the nastiest things. Unfortunately, today they still have problems finding the correct information at the right time, and if they do find it they have probably spent a fortune on something that shouldn’t even have been necessary. One example is the case of a company that lost a trial in court because it couldn’t retrieve an email sent a couple of months earlier, and it had to come up with a big suitcase of money. The lawyers had some good times then, by the way. Another example is a man who nearly died in a hospital because he was given medication he was allergic to. The reason was that the doctors’ assistants couldn’t find his medical records. Talk about bad luck.
As you can see, information processing and retrieval is of the utmost importance to all levels of society. Not only is it important to have the correct information at the right time in the right place, but when information is no longer needed it should also be destroyed in the proper way.
So how can we, as a storage industry, help with that? Well, basically, we can’t, because we are dependent on the ones that create the information. Storage infrastructures currently have no knowledge of the type of information that resides in the environment. What we have to do is create the right infrastructure for them to handle the data that is coming our way. This infrastructure should have autonomous intelligence based on multi-tier internal and external policies. Information metadata is crucial in these environments, because it is the only way we can put this kind of information in the right place at the right time, with all the normal secondary tags like security, expiration time and so on. This way we should be able to accept, process, store and destroy information whenever and wherever it’s needed. A real-life example is a patient who has a medical checkup once a year. Should his medical records be on high-performance, high-cost media all year? Basically no, but when the annual appointment approaches they should be, because doctors do not have time to wait for his records to be retrieved from some tape in a vault at an offsite location. The framework should anticipate his appointment and move the records to the appropriate tier and place in the infrastructure before they are actually needed, so that when the doctor sees the patient he can help him right away with the correct information at the right time. As always, there are exceptions. What happens when the same patient has heart problems and comes in through the ER department, or is brought in at another hospital? Even then the infrastructure should be able to anticipate this. A doctor should be able to mark such patients as, for example, “high risk”. The metadata could contain levels of severity which match these tags, so that a policy can be executed which triggers the infrastructure to always keep this patient’s records on immediately retrievable media.
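To make the patient example a bit more concrete, here is a minimal sketch of how such a metadata-driven placement policy could look. Everything in it is an assumption made for illustration: the tier names, the `RecordMetadata` fields, the `choose_tier` function and the seven-day prefetch window are hypothetical and do not come from any existing product or standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical tier names: fast, expensive media versus cheap offsite archive.
TIER_FAST = "tier1-online"
TIER_ARCHIVE = "tier3-offsite-tape"


@dataclass
class RecordMetadata:
    """Illustrative metadata tags attached to one stored record."""
    owner: str
    risk_level: str = "normal"              # a doctor could set this to "high"
    next_appointment: Optional[date] = None


def choose_tier(meta: RecordMetadata, today: date, prefetch_days: int = 7) -> str:
    """Return the tier a record should live on today (assumed policy order)."""
    # Policy 1: high-risk patients always stay on immediately retrievable media.
    if meta.risk_level == "high":
        return TIER_FAST
    # Policy 2: stage the record shortly before a known appointment.
    if meta.next_appointment is not None:
        days_left = (meta.next_appointment - today).days
        if 0 <= days_left <= prefetch_days:
            return TIER_FAST
    # Default: everything else can sit on cheap archive media.
    return TIER_ARCHIVE


today = date(2024, 3, 1)
routine = RecordMetadata("patient-a", next_appointment=date(2024, 9, 1))
upcoming = RecordMetadata("patient-b", next_appointment=date(2024, 3, 5))
cardiac = RecordMetadata("patient-c", risk_level="high")

for record in (routine, upcoming, cardiac):
    print(record.owner, "->", choose_tier(record, today))
```

The point of the sketch is only that the decision is driven by the metadata and the policy, not by the storage system guessing what the data means. A real framework would of course evaluate such policies continuously and trigger the actual data movement.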
Another example is when a CEO signs a multi-billion dollar contract. Should that file be in his home folder next to last week’s grocery shopping list? I don’t think so. However, today we cannot distinguish between the value of these two. The CEO should have the ability to set certain characteristics on both of them, so that the multi-billion dollar contract is stored on protected drives, is copied between two or more locations, has read-only attributes set on the finalized version, has its draft copies and/or spin-offs removed or linked to the finalized version, and has an expiration date set so it can be removed or destroyed after the lifetime of the contract or of other dependencies. The same goes for the grocery shopping list. When he flags it as obsolete after date XYZ, it is of no importance anymore after that date, so it can be destroyed right away without blinking an eye.
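As a rough illustration of the characteristics described for the contract and the shopping list, the sketch below models them as a small set of attributes on each file. The `FilePolicy` class, its field names and the `due_for_destruction` helper are hypothetical names chosen for this example, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FilePolicy:
    """Illustrative per-file characteristics set by the owner of the data."""
    protected_storage: bool = False         # store on protected drives
    replica_count: int = 1                  # number of locations holding a copy
    read_only: bool = False                 # lock the finalized version
    obsolete_after: Optional[date] = None   # may be destroyed after this date


def due_for_destruction(policy: FilePolicy, today: date) -> bool:
    """A file may be destroyed once its obsolete-after date has passed."""
    return policy.obsolete_after is not None and today > policy.obsolete_after


# The contract: protected, replicated to two locations, locked, long-lived.
contract = FilePolicy(protected_storage=True, replica_count=2,
                      read_only=True, obsolete_after=date(2035, 1, 1))
# The grocery list: no protection, obsolete after one week.
grocery_list = FilePolicy(obsolete_after=date(2024, 1, 8))

today = date(2024, 2, 1)
print("contract due:", due_for_destruction(contract, today))
print("grocery list due:", due_for_destruction(grocery_list, today))
```

Linking drafts and spin-offs to the finalized version is left out here to keep the sketch short; it would need some kind of reference between file records.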
As seen in the examples, there are multiple ways to process the data. The most important criterion is of course its value. The question is how we define the value of data. Is it business related, like the amount of money, or, as in the patient example, a matter of health-related criticality? I bet the security agencies have other criteria of “value”. What this means is that the framework should not tie the value of data to one fixed definition, but should be able to interpret the metadata based on the criteria and policies the customers define. As I said earlier in this article, we cannot define the value of the data; the customers have to do that themselves, with or without the help of specialized consultancy firms.
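One way to picture a framework that leaves the definition of value to the customer is to treat each customer policy as a pluggable function over the metadata. The sketch below is an assumption-laden illustration: the metadata keys, the 0 to 3 rating scale, the one-billion threshold and the function names are all invented for the example.

```python
from typing import Any, Callable, Dict

Metadata = Dict[str, Any]
ValuePolicy = Callable[[Metadata], int]   # returns a rating from 0 (none) to 3 (top)


def hospital_policy(meta: Metadata) -> int:
    """A hospital might rate data by health-related criticality."""
    ratings = {"low": 1, "medium": 2, "high": 3}
    return ratings.get(meta.get("health_criticality"), 0)


def finance_policy(meta: Metadata) -> int:
    """A business might rate data by the amount of money involved."""
    amount = meta.get("contract_amount_usd", 0)
    if amount >= 1_000_000_000:
        return 3
    return 1 if amount > 0 else 0


def rate(meta: Metadata, policy: ValuePolicy) -> int:
    """The framework only applies the policy; it does not define value itself."""
    return policy(meta)


record = {"health_criticality": "high", "contract_amount_usd": 0}
print("hospital rating:", rate(record, hospital_policy))
print("finance rating:", rate(record, finance_policy))
```

The same record gets a different rating depending on whose criteria are applied, which is exactly why the criteria have to come from the customer and not from the storage vendor.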
From a manufacturer’s and a technical point of view, the creation of such a framework is not so difficult. Existing standards like SMI-S and the recently introduced XAM can help with that and enable these companies to build pluggable components for this framework, whether from an infrastructure or a business software point of view. The real problem is the legacy information that still resides everywhere in the existing environment. Of course this will create some challenges, but I’m sure customers are willing to make the effort so that eventually they have a truly cost-saving, self-healing and stable infrastructure which enables them to have the correct information in the right place at the right time.