Last month, the SNIA Cloud Storage Technologies (CST) Community covered one of the fastest-moving, most disruptive AI technologies – Agentic AI. During our live webinar, “Agentic AI: Use Cases, Benefits, Risks," SNIA CST member Erin Farr explained how Agentic AI works, discussed its benefits and risks, and showed a live demonstration of Agentic AI in action. If you missed the live webinar, you can watch it and download the slides in the SNIA Educational Library. The audience asked several interesting questions. Erin has answered them here:
Q: The use case you chose seems fairly innocuous but what are the risks we should be thinking about?
A: This data validation use case is really a "do less harm" situation. If data is not valid, you don't want the AI agent to tell you it is, and vice versa. However, I felt like this scenario was a good starting point for Agentic AI because when talking to customers, even though some were using a data validation tool, it wasn’t enough to instill confidence. Also, when I looked at how to validate a MongoDB database, it had about six other ways I could have validated it. So, if I were to build all of them in, and if the agent makes a mistake interpreting one of the results, but the rest do not, that can increase your confidence in the overall results.
This use case was also a good candidate because it's an area where folks are struggling to accomplish robust data validation today as part of testing their cyber resiliency. Finally, we're not taking any automated action on the results. We're initially just trying to understand if the data is valid or not.
Q: What are some of the attack vectors specific to Agentic AI?
A: Memory Poisoning is an attack vector that involves exploiting an AI's memory systems to introduce malicious data and exploit the agent’s context (effectively, its working memory), which can lead to incorrect decision-making and unauthorized operations.
There's also Tool Misuse, which occurs when attackers manipulate AI agents with deceptive prompts to abuse the agent’s tools. This includes Agent Hijacking, a type of indirect prompt injection where an AI agent ingests manipulated data with additional instructions, causing it to execute unintended actions, such as malicious tool interactions
These attack vectors and others, along with their mitigations, are well-described by OWASP in their Agentic AI – Threats and Mitigations document.
Q: What are other open areas that the industry hasn’t solved yet?
A: One area is evaluating an AI agent’s success. Specifically, being able to test that the plan built by the large language model (LLM) and executed by the agent is providing results that meet the user’s intent. Generative AI can use LLM-as-a-judge for evaluation, which uses a second LLM to judge the first LLM's results. However, I've not yet found evaluation models that validate execution plans. Plus, the tool calling needs to be validated as well. It's possible I may have just not seen it yet, but I suspect it has yet to be developed.
Q: Production environments for recovery testing will likely not have external internet access, but your use case accesses the internet for both the web search tool and the LLM processing, making me wonder what is the likely acceptance rate of this use case?
A: For on-premises recovery environments without external internet access, you would probably want to use an LLM hosted locally. Regarding the web search tool, that was used as one particular way to determine common enterprise workloads, though, as you may have seen in the demo, the agent used information the model was trained with (and I found those results just as good and often better.) Ideally, this PoC can be changed to a more enterprise-robust implementation by swapping out the web search for APIs that connect to an enterprise’s IT asset inventory application, which would more definitively help determine the applications being used.
Q: You mentioned the ability to improve upon past actions. How much training is needed initially vs. longer term, with regards to the user being able to tweak the model and dial it in to get the right accuracy?
A: The ability to improve upon past actions isn’t so much about training. It’s more about the context window, both its size and the amount of information, you fill it with. Think of the context window as the working memory of the LLM. If you fill that context window with a bunch of tool information you will push out the information about past actions. So, it’s more of a context window tuning problem than a training problem.
Q: Can you provide some reference links for someone starting new in Agentic AI?
A: I used BeeAI (Open Source) which is useful for trying out agents with LLMs locally on your laptop.
- BeeAI Agentic AI framework
- While I’ve not had a chance to try these out myself, here are a number of beginner classes from Microsoft that were highly recommended by a subject matter expert:
https://github.com/microsoft/ai-agents-for-beginners
Here are additional links that may be useful:
- Model Context Protocol (MCP)
https://modelcontextprotocol.io/introduction
- OWASP Gen AI Security Project – Agentic Threats Navigator
https://genai.owasp.org/resource/owasp-gen-ai-security-project-agentic-threats-navigator/
- Agentic AI – Threats and Mitigations
https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
And finally, a link to the PoC code I demonstrated:
- Link to PoC demo Code:
https://github.com/IBM/agentic-ai-cyberres
There’s so much to say about Agentic AI, and as Erin mentioned during this webinar, the technology is moving incredibly fast. SNIA CST will continue to provide vendor-neutral information on what’s happening in this space. We are actively working on a follow-up webinar. Follow us on LinkedIn for announcements.
Leave a Reply