This article is something of a successor to my first blog post, “The future of storage”. I discussed that post with Vincent Franceschini in person a while ago, and although we have different opinions on some topics, in general we agree on the premise that we have to gain more insight into the business value of data. This is the only way we can shift the engineering world to a more business-focused mindset. Unfortunately, the engineering departments of all the major storage vendors still rely today on old protocols like SCSI, NFS and CIFS, all of which have some sort of built-in limitation, generally in address space.
To put this in perspective, it’s like building a road of a certain length and width, which gives it capacity for a certain number of cars per hour. Such a road cannot adapt dynamically to a higher load, i.e. more cars. You have to build new roads, or add lanes to existing ones if that is possible at all, to cater for more cars. With the growth of data and the changes companies are facing today, it’s time to come up with something new. Basically this means we have to step away from technologies which have limitations built into their architecture. Although this might look like boiling the ocean, I think we cannot afford the luxury of trying to improve current standards while the “data boom” is coming down on us like an avalanche.
Furthermore, it is becoming too hard for IT departments to keep up with the knowledge needed in every segment.
The question is: “How do we accomplish this?” In my opinion the academic world, together with the IT industry, has huge potential for developing the next generation of IT. In current IT environments we run into barriers of all sorts: performance, capacity, energy supply and so on.
So here’s an idea. Basically every word known to mankind has been written millions of times. So why do we need to write it over and over again? What can be done instead is to reference these words to compose an article. This leads both to a reduction of the storage capacity needed and to a referenceable index which can be searched. The information in the index can be held in a SNIA XAM format, which also enables storage systems to leverage this information and dynamically allocate the required capacity or attach business values to these indexes. This way the only thing that needs to be watched is the integrity of the indexes and the word catalog. Another benefit is that when a certain word changes its spelling, the only thing that needs to be changed is that same word in the catalog. Since all articles just hold references to this word, the spelling is adjusted accordingly. (I’ll bet I will get some comments about that. :-))
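To make the idea a bit more tangible, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the class name WordCatalog and its methods are hypothetical, words are split on whitespace only, and nothing here models the XAM format or business values.

```python
class WordCatalog:
    """Hypothetical catalog that stores each distinct word exactly once."""

    def __init__(self):
        self.words = []   # word id -> current spelling
        self.ids = {}     # current spelling -> word id
        self.index = {}   # word id -> names of articles referencing it

    def store(self, name, text):
        """Turn an article into a list of word ids and update the index."""
        refs = []
        for word in text.split():  # assumption: whitespace-separated words
            if word not in self.ids:
                self.ids[word] = len(self.words)
                self.words.append(word)
            word_id = self.ids[word]
            refs.append(word_id)
            self.index.setdefault(word_id, set()).add(name)
        return refs

    def compose(self, refs):
        """Rebuild the article text from its references."""
        return " ".join(self.words[word_id] for word_id in refs)

    def search(self, word):
        """Return the names of all articles that reference the word."""
        word_id = self.ids.get(word)
        return sorted(self.index.get(word_id, set()))

    def respell(self, old, new):
        """Change a spelling once, in the catalog only."""
        if new in self.ids:
            raise ValueError(f"{new!r} is already in the catalog")
        word_id = self.ids.pop(old)
        self.words[word_id] = new
        self.ids[new] = word_id


catalog = WordCatalog()
article_a = catalog.store("a", "the colour of the sky")
article_b = catalog.store("b", "the colour red")

catalog.respell("colour", "color")   # one change in the catalog
print(catalog.compose(article_a))    # the color of the sky
print(catalog.compose(article_b))    # the color red
print(catalog.search("color"))       # ['a', 'b']
```

The two articles together contain eight words, but the catalog holds only five entries, and the respelling shows up in both articles without either of them being rewritten. Punctuation, capitalisation and context are deliberately ignored in this sketch.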
As you can see, this kind of information storage and retrieval eliminates the need for de-duplication altogether, since everything is written only once anyway. That in turn has a major benefit for storage infrastructures, data integrity, authority and so on. Since the indexes themselves don’t have to keep growing, thanks to automatic elimination based on business value, the concept of Dynamic Allocation has been achieved. OK, there are some caveats around different formats, languages and overlapping context, but these can be taken care of by linguists.