Our offices

  • United States
    2332 Beach Avenue
    Venice, CA 90291
  • Singapore
    L39, Marina Bay Financial Centre Tower
    10 Marina Boulevard

Follow us

LAAF: Logic-Layer Attacks on Agentic LLM Systems

A case study on LAAF, a lifecycle-aware red-teaming framework for Logic-layer Prompt Control Injection (LPCI) in agentic LLM systems, and why it matters for memory, RAG, and tool-connected AI in production.

Sarah BurtonSarah BurtonProduct / Strategy
12 min
LAAF research — automated red-teaming for LPCI in agentic LLM systems
From taxonomy to lifecycle-aware attack simulation — aligning red-teaming with real persistence, triggers, and escalation

Red-Teaming Logic-Layer Attacks on Agentic LLM Systems

This case study is about the LAAF: Logic-layer Automated Attack Framework paper, published as an arXiv preprint (March 2026): a systematic red-teaming methodology for Logic-layer Prompt Control Injection (LPCI) in agentic large language model systems.

Paper: LAAF: Logic-layer Automated Attack Framework — arXiv:2603.17239

What you get from this case study

This case study is for security leaders, AI engineers, platform teams, and product decision-makers building agentic systems in production. It focuses on four practical questions:

  1. What Logic-layer Prompt Control Injection (LPCI) is, and why it differs from standard prompt injection.
  2. Why memory, RAG, and tool connectors create a larger and more persistent attack surface.
  3. What LAAF adds beyond generic prompt-injection scanning and basic multi-turn testing.
  4. What the paper’s evaluation implies for defense prioritization, validation strategy, and production readiness.

Primary source: LAAF: Logic-layer Automated Attack Framework — arXiv:2603.17239.

Why this research matters

Agentic LLM systems are no longer limited to a single prompt-response cycle. In production, many systems now combine:

  • Persistent memory
  • RAG pipelines
  • External tool connectors

That architectural shift introduces a broader attack class: Logic-layer Prompt Control Injection (LPCI).

LPCI is not just about manipulating one active prompt. It involves payloads that can be:

  • Stored in memory or retrieval layers
  • Encoded to evade simple filters
  • Triggered later by conditions such as a keyword, tool call, or future interaction
  • Reintroduced across sessions without appearing in the latest user message

This makes LPCI a system-level security problem, not just an inference-time prompt problem.

What LAAF introduces

LAAF is presented in the paper as an automated red-teaming framework designed specifically for LPCI. Its contribution is not merely automation by itself, but the combination of two capabilities tailored to LPCI behavior:

1. LPCI-specific taxonomy

LAAF defines a 49-technique taxonomy across six categories:

  • Encoding
  • Structural
  • Semantic
  • Layered
  • Trigger
  • Exfiltration

These techniques are designed to reflect how LPCI attacks actually appear in persistent and retrieval-connected systems.

2. Stage-sequential seed escalation

LAAF introduces a Persistent Stage Breaker (PSB) that models adversarial escalation across a six-stage lifecycle.

Instead of treating each attempt as isolated, LAAF mutates successful payloads and uses them to seed later stages. This makes the testing process closer to how a determined attacker behaves in practice.

LPCI lifecycle in the paper

The paper defines a six-stage lifecycle:

  • S1 Reconnaissance
  • S2 Logic-Layer Injection
  • S3 Trigger Execution
  • S4 Persistence/Reuse
  • S5 Evasion/Obfuscation
  • S6 Trace Tampering

This matters because LPCI attacks are not always immediate. They may remain dormant, survive filters, and activate only when the right condition is met later in the system lifecycle.

What the evaluation shows

The paper evaluates LAAF on five production LLM endpoints accessed via OpenRouter across three independent runs.

The reported result is a mean aggregate breakthrough rate of 84%, with a range of 83–86% across runs. The paper also reports that LAAF achieved higher stage-breakthrough efficiency than random single-technique testing.

The evaluation further indicates that:

  • Layered combinations were among the most effective categories
  • Semantic reframing remained highly effective
  • On more strongly defended platforms, layered payloads could outperform simpler encoding-only approaches

The practical takeaway is clear: defenses that focus only on obvious strings, plaintext patterns, or isolated prompts are not enough for agentic systems with persistence and retrieval.

Scope and interpretation

The paper is careful about scope.

LAAF targets single-agent deployments that match the LPCI definition:

  • Memory-persistent
  • RAG-integrated
  • Tool-connected

The paper explicitly notes that multi-agent propagation across orchestrator and subagent boundaries is out of scope and identified as future work.

That distinction matters for responsible interpretation. The paper supports a strong conclusion about lifecycle-aware red-teaming for this class of agentic systems, but it should not be overstated as a universal claim about every possible agent architecture.

Why this matters for production builders

For teams shipping agentic AI, the message is important:

Security cannot stop at prompt filtering.

Once a system includes memory, retrieval, and tool execution, the attack surface expands into the surrounding architecture. That means effective validation must include:

  • Memory-aware testing
  • Retrieval-aware validation
  • Trigger-based attack simulation
  • Tool-layer control and authorization checks
  • Persistence-focused red-teaming across sessions and stages

In other words, the relevant unit of security is no longer just the model response. It is the full execution path.

Skytells perspective

This case study is relevant to Skytells because it aligns with a core engineering reality: modern AI systems must be assessed as runtime systems, not just models.

Hazem Ali, CEO of Skytells, Inc., is listed as a co-author on the paper. That matters not as a branding claim, but as an indicator that leadership participation is grounded in production-facing AI security, red-teaming, and system-level risk. For organizations building orchestration, memory, retrieval, and tool-connected workflows, this is the kind of research that helps move security from vague concern to measurable engineering practice.

Paper Authors

  • Hammad Atta — AI Security Researcher, Qorvex Consulting; Corresponding author
  • Ken Huang — AI Security Researcher at DistributedApps.AI, Co-Author of OWASP Top 10 for LLMs, and Contributor to NIST GenAI
  • Hazem Ali — Microsoft MVP, Distinguished AI/ML Engineer and Architect, CEO, Skytells, Inc.
  • Kyriakos “Rock” Lambros — CEO at RockCyber, Core Team Member of the OWASP GenAI Agentic Security Initiative, and Project Author, OWASP AI Exchange
  • Yasir Mehmood — Independent Researcher, Germany
  • Zeeshan Baig — AI Security Advisor, Australia
  • Mohamed Abdur Rahman — Professor and Head of Cyber Security / Forensic Computing, College of Computer & Cyber Sciences, University of Prince Mugrin
  • Manish Bhatt — with OWASP / Project Kuiper
  • M. Aziz Ul Haq — Research Fellow at Skylink Antenna
  • Muhammad Aatif — Senior Consultant, Agentic AI Security, Italy
  • Nadeem Shahzad — Independent Researcher, Canada
  • Kamal Noor — Senior Manager at Deloitte, Enterprise Risk, Internal Audit & Technology GRC
  • Vineeth Sai Narajala — with OWASP
  • Jamel Abed — Microsoft MVP, Senior Developer and CEO, AI Community Days
Hazem Ali is a co-author of the paper and is listed as Microsoft MVP, AI/ML Engineer and Architect, CEO, Skytells, Inc. This designation reflects his individual contributions and expertise, and does not imply Microsoft endorsement of Skytells or this publication.
N
Note

Share this article

Sarah Burton

Sarah Burton

Product and strategy lead at Skytells, with a background in AI product management and development.

Last updated on

More Articles

How Multi-Vendor Infrastructure Saved Enterprise Operations During the Gulf Region Crisis

A technical case study on how a multi-vendor cloud strategy enabled Skytells to recover from a Gulf region data center disruption in under 60 minutes—while single-vendor platforms experienced prolonged outages.

Read more

The Hidden Memory Architecture of LLMs

From prefill and decode to paging and trust boundaries — how memory determines GenAI reliability in complex production conditions.

Read more

AI Bias and Skytells' Debiasing Solutions

A detailed case study on the risks of biased data in AI decision-making, using the COMPAS system as an example, and how Skytells' debiasing tools help ensure fairness.

Read more