Abstract
Multiple deep learning approaches collect analytics for semantic image retrieval in different formats. Some approaches collect the analytics as tags, some as captions, and recently a few sophisticated scene graph generation algorithms collect the analytics as triplets (Subject-Verb-Object). This lack of uniformity in the analytics format forces the use of multiple schemas and databases to retrieve images from unstructured user queries: for example, tag-based search uses a key-value DB, while scene graph-based retrieval uses a graph DB.
Depending on a database increases the computational requirements for searching images on low-compute storage devices such as SSDs. Further, since every database uses its own query language, the performance of the image retrieval framework depends heavily on how efficiently these queries are written.
Learn how to design an embedded machine learning framework inside an SSD that uses an advanced transformer model to convert analytics from various deep learning approaches into a uniform embedding format suitable for fast and accurate image retrieval. Any form of analytics (tag, sentence, paragraph, triplet) is converted into an N-dimensional vector. For faster lookup on the SSD, the framework stores the extracted N-dimensional vectors in an index tree, and it processes user queries the same way, feeding the user input to an SBERT transformer model to obtain an N-dimensional query vector.
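The sketch below illustrates this pipeline under stated assumptions: it uses the open-source sentence-transformers package with an illustrative model name (all-MiniLM-L6-v2) and scikit-learn's KDTree as a stand-in for the on-SSD index tree; the image IDs and analytics strings are made up for demonstration.

```python
# Minimal sketch: unify tag/caption/triplet analytics into one embedding space.
# Assumes the sentence-transformers package; the model name, image IDs, and
# analytics strings are illustrative, not the framework's actual configuration.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KDTree
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # SBERT-style encoder

# Analytics arrive in different formats; all are flattened to plain text.
analytics = {
    "img_001": "dog",                       # tag
    "img_002": "a woman riding a bike",     # caption
    "img_003": "girl - driving - scooter",  # triplet (Subject-Verb-Object)
}

ids = list(analytics.keys())
vectors = model.encode(list(analytics.values()))  # N-dimensional vectors

# Index tree for fast lookup; a KD-tree stands in for the on-SSD index.
tree = KDTree(np.asarray(vectors))

# The user query goes through the same encoder, then a nearest-neighbor lookup.
query_vec = model.encode(["woman on a bicycle"])
dist, idx = tree.query(query_vec, k=2)
print([ids[i] for i in idx[0]])  # closest images by embedding distance
```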
Learn how to design an interface that captures the intent behind a user query using clustering techniques. For example, “woman riding a bike” and “girl driving a scooter” have the same intent; an intent-based interface increases image search accuracy with a low number of false positives.
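A minimal sketch of this intent grouping, assuming the same sentence-transformers encoder and scikit-learn's agglomerative clustering; the cosine distance threshold of 0.6 is an illustrative choice, not a value from the framework.

```python
# Minimal sketch of intent grouping: queries with the same meaning land in
# the same cluster, so they can be served by the same lookup path.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

model = SentenceTransformer("all-MiniLM-L6-v2")

queries = [
    "woman riding a bike",
    "Girl driving a scooter",  # same intent as the query above
    "man cooking dinner",      # different intent
]
emb = model.encode(queries, normalize_embeddings=True)

# Cosine-distance clustering; queries closer than the threshold share an intent.
clusters = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.6, metric="cosine", linkage="average"
).fit_predict(emb)
print(dict(zip(queries, clusters)))  # first two queries share a cluster id
```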