It comes as no surprise that a couple of technologies really struck a chord in 2012. Flash drives, and specifically flash arrays, have gone mainstream. Converged networking is still clinging on, and then of course there is Big Data.
Big Data has become such a hype word that many people have different opinions and descriptions of it. What it basically boils down to is that too many people have too much stuff hanging around which they never clean up or remove. This undeniably places a huge burden on many IT departments, who only have one answer: add more disks…
So where do we go from here? There is no denying that exabyte-scale storage environments are becoming more apparent in many companies and government agencies. The question is what is being done with all these “dead” bytes. Will they ever be used again? What is being done to safeguard this information?
Some studies show that the cost of managing this old data outgrows the benefit one could obtain from it. The problem is that there are many really useful and beneficial pieces of data in this enormous pile of bits, but none of them are classified and tagged as such. This makes the “delete all” option a no-go, but the cost of actually determining what needs to be kept can run side by side with the cost of keeping it all. We can be fairly certain that neither of the two options can hack it in the long run. Something has to be done to actually harvest the useful information and finally get rid of the old stuff.
The process of classification needs to happen via heuristic mathematical deterministics. A mouthful, but what it actually means is that every piece of information needs to be tagged with a value. Let’s call this value X. This X is generated based upon business requirements related to the type of business we’re actually in. Whilst indexing the entire information base, certain words, values, and other pieces of information appear more often than others. These indicators can cause a certain information type to obtain a higher value than others and therefore rank higher (i.e. the X value increases). Of course you can have a multitude of information streams where one is by definition larger and causes data to appear more frequently, in which case it ranks higher even though the actual business value is not that great, whilst you might have a very small project going on that could generate a fair chunk of your annual revenue. To identify those, the data needs to be tagged with a second value called Y. And last but not least we have age. Since all data loses its accuracy, and therefore its value, over time, the data needs to be tagged with a third value called Z.
Based upon these three values we can create three-dimensional value maps which can be projected onto different parts of the organization. This outlines and quantifies where the most valuable data resides and where the most savings can be obtained. This allows for a far more effective process of data elimination and therefore huge cost savings. Various mathematical algorithms already exist, but they have not been applied in this way, so such technology does not exist yet. Maybe something for someone to pick up. Good luck.
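To make the X/Y/Z idea a bit more concrete, here is a minimal sketch of what such tagging could look like. Everything in it is my own illustration, not an existing product or algorithm: X is approximated by the frequency of assumed business indicator words, Y is a manually assigned strategic weight, and Z decays exponentially with age using a hypothetical one-year half-life.

```python
from dataclasses import dataclass

# Assumed indicator words; a real index would derive these from the business.
BUSINESS_TERMS = {"revenue", "contract", "customer"}

@dataclass
class TaggedObject:
    name: str
    x: float  # X: indexed business-term relevance
    y: float  # Y: manually assigned strategic weight
    z: float  # Z: freshness, decaying with age

def tag(name: str, text: str, strategic_weight: float,
        age_days: float, half_life_days: float = 365.0) -> TaggedObject:
    words = text.lower().split()
    hits = sum(1 for w in words if w in BUSINESS_TERMS)
    x = hits / len(words) if words else 0.0   # indicator-word frequency
    z = 0.5 ** (age_days / half_life_days)    # value halves every year (assumed)
    return TaggedObject(name, x, strategic_weight, z)

def value(obj: TaggedObject) -> float:
    # Combined score used to rank data for retention or elimination.
    return obj.x * obj.y * obj.z

doc = tag("q3-report", "customer revenue grew under the new contract",
          strategic_weight=0.9, age_days=180)
print(round(value(doc), 4))
```

Ranking all objects by this combined score is what would let you project the value map onto the organization and pick deletion candidates from the bottom.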
As for the logical parts of the Big Data question, in 2013 we will see a bigger shift towards object-based storage. If you go back to one of my first articles you will see that I predicted this shift six years ago. Data objects need to get smarter and more intelligent by nature in order to increase value and manageability. By doing this we can think of all sorts of smarts to utilize the information to the fullest extent.
As for the other, more tangible technologies, my take on them is as follows.
Flash technology will continue to evolve, and price erosion will, at some point, cause it to compete with normal disks, but that is still a year or two away. R&D costs will still place a major burden on the price point of these drives/arrays, so as the uptake of flash continues it will level out. Reliability has mostly been tackled by advances in redundancy and cell technology, so that argument can be mostly negated. My take on dedicated flash arrays is that these are too limited in their functions and therefore overpriced. The only benefit they provide is performance, but that is easily countered by the existing array vendors by adding dedicated flash controllers and optimized internal data paths in their equipment. The benefit is that these can utilize the same proven functions that have been available for years. One of the most useful and cost-effective is of course auto-tiering, which ensures optimum usage and gives the most bang for your buck.
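The auto-tiering idea can be illustrated with a toy decision function: the array tracks how often each block is hit over a window and promotes hot blocks to flash while demoting cold ones to spinning disk. The thresholds and names below are invented for illustration; real arrays use far more sophisticated heat maps and scheduled migrations.

```python
# Toy auto-tiering sketch: promote hot blocks to flash, demote cold ones
# to disk. Thresholds are assumptions, not any vendor's actual policy.
FLASH, DISK = "flash", "disk"
PROMOTE_THRESHOLD = 100   # accesses per window before moving up to flash
DEMOTE_THRESHOLD = 10     # below this, the block falls back to disk

def retier(blocks: dict) -> dict:
    """Decide a target tier per block from its access count in the last window."""
    placement = {}
    for block_id, stats in blocks.items():
        hits = stats["hits"]
        if hits >= PROMOTE_THRESHOLD:
            placement[block_id] = FLASH
        elif hits < DEMOTE_THRESHOLD:
            placement[block_id] = DISK
        else:
            placement[block_id] = stats["tier"]  # lukewarm blocks stay put
    return placement

blocks = {
    "b1": {"hits": 250, "tier": DISK},   # hot: gets promoted
    "b2": {"hits": 3,   "tier": FLASH},  # cold: gets demoted
    "b3": {"hits": 50,  "tier": FLASH},  # lukewarm: stays where it is
}
print(retier(blocks))  # → {'b1': 'flash', 'b2': 'disk', 'b3': 'flash'}
```

The point of the exercise: the expensive flash capacity only ever holds the blocks that actually earn it, which is exactly why tiering gives the most bang for your buck.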
As for converged networking: well, what can I say. If designed and implemented correctly it just works, but many companies are just not ready from a knowledge standpoint to adopt it. There are just too many differences in processes, knowledge, and many other points which conflict between the storage and networking folks. The arguments I aired in my previous post have still not been countered by anyone, and as such my standpoint has not changed. If reliability and uptime are among your priorities, then don’t start with converged networking. Of course there are some exceptions. If, for instance, you want to buy a Cisco UCS, then this system runs converged networking internally from front to back, but there is not really much that is configurable, so the “oops” factor is significantly minimized.
Processor and overall system requirements
More and more focus will be placed upon power requirements and companies will be forcing vendors to the extreme to reduce the amount of watts their systems suck from the wall socket. Software developers are strongly encouraged (and that’s an understatement) to sift through their code and check if optimizations can be achieved in this area.
Hope you all have a good 2013, and we’ll see whether some of these predictions gain uptake.