Abstract
Intel's design environment depends heavily on a large-scale NFS infrastructure holding tens of petabytes of data. A global namespace lets users navigate this environment in a uniform way from 60,000 compute servers.
But what if users don't know where the data they are looking for is located?
Our customers used to spend hours waiting for recursive "grep" commands to complete, or simply did not bother with less critical queries.
In this talk, we'll cover how Intel IT identified an opportunity to provide a faster way to look for information within this large-scale NFS environment. We'll review the open source solutions we considered, and explain how we decided to combine a home-grown scalable NFS crawler with the open source Elasticsearch engine to index parts of our NFS environment.
As part of this talk we'll discuss various challenges and our ways to mitigate them, including:
Crawler scalability required to index large amounts of dynamically changing data within a pre-defined indexing SLA
Index scalability and performance requirements
Relevance of the results returned for customers' search queries
User interface considerations
Security aspects of the index access control
This talk should interest both storage vendors, as it covers a useful feature that might be implemented as part of an NFS environment, and storage customers who may benefit from such a capability.
Learning Objectives
How to implement scalable indexing and search on top of large-scale NFS
Scalable crawling with controlled performance impact on shared file servers
Security aspects of indexing data and presenting search results
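
To make the crawling and security objectives above more concrete, here is a minimal Python sketch. It is not the crawler described in this talk; the function names (crawl, visible_to), the max_files_per_sec throttle, and the record layout are all hypothetical. It assumes the crawler limits its own stat rate to protect shared file servers, and that search results are filtered using the POSIX owner, group, and mode bits captured at crawl time. Directory traversal permissions and NFS ACLs are deliberately ignored here.

```python
import os
import stat
import time
from typing import Dict, Iterator, Set


def crawl(root: str, max_files_per_sec: float = 200.0) -> Iterator[Dict]:
    """Walk `root`, yielding one metadata record per regular file.

    Sleeps between files so the crawl issues at most roughly
    `max_files_per_sec` stat calls (a hypothetical throttle policy).
    """
    min_interval = 1.0 / max_files_per_sec
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            started = time.monotonic()
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or cannot be stat'ed; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, etc.
            yield {
                "path": path,
                "uid": st.st_uid,
                "gid": st.st_gid,
                "mode": stat.S_IMODE(st.st_mode),
                "mtime": st.st_mtime,
            }
            # Throttle: pad each iteration up to the minimum interval.
            elapsed = time.monotonic() - started
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)


def visible_to(record: Dict, uid: int, gids: Set[int]) -> bool:
    """Return True if the recorded mode bits let this user read the file.

    Follows POSIX class selection: owner bits apply to the owner,
    group bits to group members, and other bits to everyone else.
    """
    mode = record["mode"]
    if uid == record["uid"]:
        return bool(mode & stat.S_IRUSR)
    if record["gid"] in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

In a design like this, each record would be sent to the search index together with its ownership fields, and a query front end would apply a check such as visible_to (or an equivalent index-side filter) before showing a hit to the user.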