Abstract
Storage efficiency, achieved mainly by de-duplicating and compressing data, is very popular and is one of the unique selling points for storage vendors. These features have an inherent problem: their CPU-intensive scanning operations generate a heavy workload, so they are usually run as scheduled jobs on the storage system when CPU load is low. Despite the many de-duplication and compression techniques available today, it remains difficult to handle data efficiently in production environments, including clustered and standalone storage appliances. In a clustered storage environment, however, there is scope to use leftover CPU bandwidth to improve the data efficiency ratio without significantly affecting other operations. We will present a proof of concept of a storage efficiency framework for clustered storage environments that makes better use of leftover CPU and maximizes overall cluster efficiency.
Learning Objectives
Understanding storage efficiency techniques
Study of the different storage efficiency techniques in use today
Study of MapReduce and similar algorithms as distributed-system algorithms
Use of distributed algorithms to enhance existing storage efficiency features
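
To make the MapReduce objective concrete, the following is a minimal single-process sketch of how duplicate detection could be phrased as a map step and a reduce step. It is an illustration only, not the framework presented in the talk. The fixed block size, the SHA-256 fingerprint, and the names map_blocks, shuffle, reduce_duplicates and find_duplicates are all assumptions made for the example. In a real cluster the map step would run on each node against its local data, and the shuffle would move fingerprints between nodes.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed fixed block size; real systems may use other sizes


def map_blocks(node_id, data, block_size=BLOCK_SIZE):
    """Map step (hypothetical): emit (fingerprint, (node_id, offset)) per block."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha256(block).hexdigest(), (node_id, offset)


def shuffle(pairs):
    """Group block locations by fingerprint, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for fingerprint, location in pairs:
        groups[fingerprint].append(location)
    return groups


def reduce_duplicates(groups):
    """Reduce step (hypothetical): for each fingerprint seen more than once,
    keep the first location as the canonical copy and list the rest as duplicates."""
    return {
        fingerprint: (locations[0], locations[1:])
        for fingerprint, locations in groups.items()
        if len(locations) > 1
    }


def find_duplicates(volumes, block_size=BLOCK_SIZE):
    """Run map, shuffle and reduce over a dict of node_id -> bytes."""
    pairs = []
    for node_id, data in volumes.items():
        pairs.extend(map_blocks(node_id, data, block_size))
    return reduce_duplicates(shuffle(pairs))


if __name__ == "__main__":
    # Two toy "nodes" that share one identical 8-byte block.
    volumes = {
        "node-a": b"A" * 8 + b"B" * 8,
        "node-b": b"B" * 8 + b"C" * 8,
    }
    for fingerprint, (canonical, dups) in find_duplicates(volumes, block_size=8).items():
        print(fingerprint[:12], "canonical:", canonical, "duplicates:", dups)
```

Because each map call touches only one node's data, the fingerprinting work can in principle be scheduled wherever spare CPU exists, which is the property the abstract aims to exploit.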