Abstract
Scientific data is mostly stored in files as linear sequences of bytes, but it almost always has hidden structure that resembles records with keys and values, often in multiple dimensions. Further, the bandwidths required to service HPC simulation workloads will soon approach tens of terabytes per second, with single data files surpassing a petabyte and single sets of data from a campaign approaching 100 petabytes. Many tasks, from distributed analytical and indexing functions to data management tasks such as compression, erasure encoding, and deduplication, can potentially be performed more efficiently and economically near the storage devices.