
Reflexive Memory: When AI Agents Remember How to Work

When we launched Opnova, I wrote about Adaptive Reliability as one of the core principles of the agentic AI system we are building: computer-use agents that continually adapt to changing enterprise environments without sacrificing consistent performance.

 

Easy to say, hard to build!

 

In this post, I want to share how we are actually delivering on that vision with reflexive memory: a new approach to multimodal procedural memory. It enables computer-use agents to remember not just what they did, but what they saw, solving a reliability-adaptability tradeoff that has long been a challenge for this technology.


A Real-World Challenge: Accounts Payable in Banking

 

Accounts payable (AP) workflows in banking are deceptively complex. They involve navigating and orchestrating data across multiple systems (e.g., SAP, Fiserv, Ariba) while relying on the usual suspects, Excel and e-mail, to bridge gaps. On top of that, documents are not standardized, and almost every case is an "edge" case that must be validated against dozens of business rules (think regulatory and internal controls) layered on top of each other.

 

When we set out to automate these processes at a large European bank, our goal wasn't just to make it work. It was to make it work reliably, thousands of times over, while remaining adaptive to the inevitable variations in real-world banking operations.

 

The result? Our computer-use agents are set to execute a complex AP workflow over 20,000 times annually with consistent accuracy, zero hallucinations, and the flexibility to adapt when environments change.

 

Why This is Hard

 

Everyone knows AI agents are the future, but everyone is also skeptical that this technology is enterprise-ready today. Our prospects and customers are well aware that traditional RPA can't keep up with complex, dynamic workflows and that computer-use agents can, in theory. In practice, most remain too unpredictable for production. The real question is who figures out how to make them reliable enough to meet the requirements for mission-critical workflows at scale.

 

Traditional RPA is deterministic but brittle. These systems were designed for predictable, structured processes, not workflows with dozens of business rules, branching logic, non-standardized documents, and constant exceptions. Each conditional path and exception handler is a potential breaking point. Change a form field, update a UI element, modify a business rule, or introduce an edge case, and the entire automation breaks. Now multiply that fragility across hundreds of workflows, and you understand why enterprises employ entire teams just to maintain their RPA investments.

 

Pure LLM-powered computer-use agents make fresh inference requests for every action. This is computationally expensive, introduces latency, and most critically, opens the door to non-deterministic behavior. The same UI and input can yield different outputs; the same workflow can succeed ten times and fail on the eleventh. This is the reliability-adaptability tradeoff that has long plagued computer-use agents: they're flexible, but that flexibility comes at the cost of predictability. When you're processing invoices worth millions of dollars, "usually correct" isn't good enough.

 

Despite the hype, this is why computer-use agents remain confined to demos, pilots, or very small and simple workflows: they're not production-ready for mission-critical, high-volume, complex enterprise workflows. We don't know of any others running complex workflows in production at this scale, with dozens of business rules, non-standardized unstructured data, and significant reasoning requirements across thousands of executions.

 

What if computer-use agents could learn to be reliable without sacrificing their adaptability?

 

The Innovation: Reflexive Memory

 

Our breakthrough came from a simple observation: a seasoned AP clerk doesn’t re-learn how to use SAP every morning. They rely on "muscle memory" for the routine, saving their cognitive energy for the exceptions.

 

We built Reflexive Memory (patent pending) to give our computer-use agents that same ability. Instead of treating every invoice as a brand-new reasoning problem, the agent recognizes recurring UI patterns and structural workflows. It knows when to "act" based on experience and when to "think" using the LLM. For the more tech-savvy, this is a type of multimodal procedural memory particularly suited to long, multi-turn computer-use workflows.

 

This is a core component of our broader agentic architecture: a full-stack system designed specifically for enterprise scale, spanning learning from human-in-the-loop demonstrations, model fine-tuning, and secure remote desktop control.

 

The Mechanics

 

The method works by recording the visual state of the screen alongside every action. Instead of issuing a blind click command, the agent binds that action to its specific visual context. It learns exactly what the SAP 'Invoice Overview' looks like before a button is pressed. To make this scalable, we strip away the "noise" of the specific transaction (dynamic content such as dates, totals, or vendor names) and focus on the underlying UI structure. This abstraction allows the agent to recognize each step of the same workflow across thousands of different executions for multiple vendors and invoices.
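To make the abstraction step concrete, here is a minimal sketch in Python. It is an illustration only, not our production implementation: the real system works on the visual state of the screen, whereas this sketch assumes the screen has already been parsed into a list of elements. The names UIElement and structural_signature are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UIElement:
    """Hypothetical, simplified stand-in for one element of a screen."""
    role: str        # e.g. "title", "field", "button"
    label: str       # static caption that belongs to the UI structure
    value: str = ""  # dynamic content: dates, totals, vendor names


def structural_signature(screen: List[UIElement]) -> str:
    """Hash only roles and labels, so dynamic values do not affect the result."""
    skeleton = sorted((element.role, element.label) for element in screen)
    payload = "|".join(f"{role}:{label}" for role, label in skeleton)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two invoices on the same screen differ only in their dynamic content.
invoice_a = [
    UIElement("title", "Invoice Overview"),
    UIElement("field", "Vendor", "Acme GmbH"),
    UIElement("field", "Total", "1,200.00"),
    UIElement("button", "Post"),
]
invoice_b = [
    UIElement("title", "Invoice Overview"),
    UIElement("field", "Vendor", "Globex SA"),
    UIElement("field", "Total", "87.50"),
    UIElement("button", "Post"),
]
# A different screen has a different structure.
payment_run = [
    UIElement("title", "Payment Run"),
    UIElement("button", "Post"),
]

same_screen = structural_signature(invoice_a) == structural_signature(invoice_b)
different_screen = structural_signature(invoice_a) != structural_signature(payment_run)
```

In this toy version, both invoices map to the same signature because only the structure is hashed, while a structurally different screen maps to a different one. An exact hash is the simplest possible matcher; a system working on real screenshots would presumably need a more tolerant comparison.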

 

Execution then becomes a hybrid process. At every step, the agent checks the current screen against its known patterns. If it finds a match, it executes the action instantly using its reflexive memory. If it hits an unfamiliar layout, an unexpected pop-up message or a specific error code, it seamlessly makes a fresh inference request to the LLM to reason through the "anomaly". When it returns to a "known" state, reflexive memory kicks in again. This approach acts as a physical guardrail: since the agent follows validated reflexes for known paths, the risk of LLM hallucinations is essentially designed out of the system. Even when a workflow update is required, human-in-the-loop feedback triggers a re-mapping of the path and the memory is updated.
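The hybrid loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our actual code: screens are represented by plain string signatures, memory is a dictionary from signature to action, and ask_llm is a hypothetical stand-in for a fresh inference request.

```python
from typing import Callable, Dict, List, Tuple


def run_step(
    signature: str,
    memory: Dict[str, str],
    ask_llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (action, source): reuse a known action or fall back to the LLM."""
    if signature in memory:
        return memory[signature], "reflex"
    return ask_llm(signature), "inference"


def run_workflow(
    signatures: List[str],
    memory: Dict[str, str],
    ask_llm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run every step of a workflow through the hybrid decision."""
    return [run_step(signature, memory, ask_llm) for signature in signatures]


# Hypothetical validated reflexes for two known screens.
memory = {
    "login": "click:Sign in",
    "invoice_overview": "click:Post",
}

llm_calls: List[str] = []


def stub_llm(signature: str) -> str:
    """Stub that records each inference request instead of calling a model."""
    llm_calls.append(signature)
    return "click:Dismiss"


steps = ["login", "invoice_overview", "unexpected_popup", "invoice_overview"]
trace = run_workflow(steps, memory, stub_llm)
sources = [source for _, source in trace]
```

Here only the unexpected pop-up triggers an inference request, and the step after it is handled by memory again. Note that the sketch deliberately does not write the LLM's answer back into memory: in the approach described above, memory is updated through human-in-the-loop feedback, not automatically.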

 

The Impact

 

Our AP workflow automation demonstrates the power of this approach through production-scale execution with unique inputs every time. This is not a simple point-wise evaluation or a curated pilot.

 

It is a measure of consistent reliability across thousands of samples.

 

We see an approximately 85% reduction in inference requests compared to pure LLM approaches, which leads to significant cost savings on computational resources and ensures zero hallucinations in critical financial operations.

 

Bypassing fresh inference for known steps saves an average of 10 minutes per invoice compared to standard always-inference agents. At a scale of 20,000 invoices annually, this returns roughly 200,000 minutes (more than 3,300 hours) of capacity to the bank. While this volume could technically be reached by simply adding more agents, reflexive memory allows us to achieve these results without the overhead of managing a massive fleet of virtual machines and SAP functional users.
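For readers who want to check the arithmetic, the capacity figure follows directly from the two numbers quoted above. The variable names below are illustrative only.

```python
# Figures taken from the text: average saving per invoice and annual volume.
MINUTES_SAVED_PER_INVOICE = 10
INVOICES_PER_YEAR = 20_000

minutes_saved = MINUTES_SAVED_PER_INVOICE * INVOICES_PER_YEAR
hours_saved = minutes_saved / 60

print(minutes_saved)       # 200000
print(round(hours_saved))  # 3333
```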

 

By optimizing the efficiency of each individual agent, we avoid the infrastructure complexity of a brute-force approach while maintaining the ability to scale horizontally when needed. The system maintains this level of accuracy and speed while providing full adaptability when systems or requirements change.

 

Why This Matters

 

Reflexive memory is just one component of what makes Opnova different. Our purpose-built, full-stack agentic architecture can be fully deployed on-premises, a unique requirement of regulated enterprises. This system is designed specifically to deliver on the promise of Adaptive Reliability. These are agentic AI systems that do not just learn but continually adapt to changing enterprise environments while ensuring consistent robustness over time.

 

The implications extend far beyond accounts payable. For enterprises, this means deploying computer-use agents that actually possess the "muscle memory" required for high-volume production. Unlike standard computer-use agents that must re-evaluate every screen from scratch, our architecture allows for a system that is both intelligent and reliable rather than one or the other.

 

This shows that we can build agentic AI capable of navigating complex interfaces without the instability or high costs usually associated with continuous LLM inference. Opnova is ready to take on complex workflows previously deemed impossible or impractical to automate. We are building for the 20,000th execution and the 2,000,000th after that. Reflexive memory is a stepping stone toward that vision.

 

Repeat. Repeat. Repeat. Solved.