In my previous post, I discussed two broad execution models for Agentic Systems:
Programmatically Orchestrated Systems – where an application/framework controls execution.
Prompt-Orchestrated Systems – where the LLM itself plans and executes based on prompts, agent definitions, skills and workflows.
Once you understand this distinction, the next obvious question is:
What exactly are Guardrails?
Many people equate guardrails with AI safety:
Prevent prompt injection
Block harmful content
Protect secrets
Prevent data leakage
These are certainly guardrails, but they represent only one category.
In an Agentic SDLC, the concept is much broader.
Guardrails are constraints, policies, validations, or controls that influence or verify an agent's behavior before, during, or after execution.
Think of them across the entire lifecycle.
1. Input Guardrails
Applied before execution begins.
Examples:
Validate that requirements are complete.
Detect ambiguous instructions.
Ask for acceptance criteria when they are missing.
Verify referenced files exist.
Reject unsupported technologies.
Purpose:
Improve the quality of the input before the agent starts reasoning.
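Here is a minimal sketch of an input guardrail that runs as code rather than as an instruction. The Python below checks a task request before it reaches the agent. The `TaskRequest` shape, the technology allowlist, and the list of vague phrases are hypothetical assumptions for illustration. The ambiguity check in particular is a crude keyword heuristic, not real ambiguity detection.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical policy values; a real team would load these from its own configuration.
SUPPORTED_TECHNOLOGIES = {"python", "typescript", "java"}
VAGUE_TERMS = ["somehow", "tbd", "and so on", "as appropriate"]


@dataclass
class TaskRequest:
    description: str
    acceptance_criteria: list[str] = field(default_factory=list)
    referenced_files: list[str] = field(default_factory=list)
    technologies: list[str] = field(default_factory=list)


def check_input(task: TaskRequest, repo_root: Path) -> list[str]:
    """Return the problems found; an empty list means the task may be handed to the agent."""
    problems = []
    if not task.description.strip():
        problems.append("description is empty")
    # Crude ambiguity heuristic: flag phrases that often hide unstated requirements.
    for term in VAGUE_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", task.description, re.IGNORECASE):
            problems.append(f"vague wording: '{term}'")
    if not task.acceptance_criteria:
        problems.append("acceptance criteria are missing")
    # Referenced files must exist relative to the repository root.
    for name in task.referenced_files:
        if not (repo_root / name).exists():
            problems.append(f"referenced file not found: {name}")
    # Technologies outside the allowlist are rejected up front.
    for tech in task.technologies:
        if tech.lower() not in SUPPORTED_TECHNOLOGIES:
            problems.append(f"unsupported technology: {tech}")
    return problems
```

The function returns every problem it finds instead of stopping at the first one. That lets the orchestrator send all the gaps back to the requester in a single round trip.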
2. Execution Guardrails
Applied while the agent is working.
Examples:
Restrict tool usage.
Limit file modifications.
Require approvals before destructive actions.
Control workflow sequencing.
Enforce retry and timeout policies.
Purpose:
Control how the agent operates.
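The following is a hedged sketch of an execution guardrail: a hypothetical `ExecutionPolicy` that an orchestrator could consult before every tool call. It covers three of the examples above: tool restriction, a file modification limit, and approval before destructive actions. The tool names, the limit, and the `approve` callback are assumptions. Retry and timeout policies are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical policy values; a real setup would load these from configuration.
ALLOWED_TOOLS = {"read_file", "edit_file", "delete_file", "run_tests"}
DESTRUCTIVE_TOOLS = {"delete_file"}
WRITE_TOOLS = {"edit_file", "delete_file"}
MAX_FILES_MODIFIED = 5


@dataclass
class ExecutionPolicy:
    # Callback that asks a human to approve a destructive action.
    approve: Callable[[str, dict], bool]
    # Distinct files this agent session has modified so far.
    modified_files: set = field(default_factory=set)

    def block_reason(self, tool: str, args: dict) -> Optional[str]:
        """Return why a tool call must be blocked, or None if it may run."""
        if tool not in ALLOWED_TOOLS:
            return f"tool not allowed: {tool}"
        if tool in WRITE_TOOLS:
            path = args["path"]
            # Re-editing an already modified file does not count against the limit.
            if path not in self.modified_files and len(self.modified_files) >= MAX_FILES_MODIFIED:
                return f"file modification limit ({MAX_FILES_MODIFIED}) reached"
        if tool in DESTRUCTIVE_TOOLS and not self.approve(tool, args):
            return f"approval denied for {tool}"
        if tool in WRITE_TOOLS:
            self.modified_files.add(args["path"])
        return None
```

The limit check runs before approval is requested. As a result, a human is never asked to approve an action that the file limit would reject anyway.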
3. Knowledge Guardrails
Control what information the agent can use.
Examples:
Use only approved documentation.
Prefer enterprise coding standards.
Restrict external internet access.
Use only approved APIs.
Require citations for generated documentation.
Purpose:
Improve consistency and reduce hallucinations.
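A knowledge guardrail could be enforced in a retrieval step as sketched below. Retrieved documents are filtered against a host allowlist, and generated documentation is checked for at least one citation of an approved source. `APPROVED_HOSTS`, `Document`, and both helper functions are hypothetical, and the citation check is a simple substring match.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of approved documentation hosts.
APPROVED_HOSTS = {"docs.internal.example.com", "docs.python.org"}


@dataclass
class Document:
    url: str
    text: str


def approved_only(docs: list[Document]) -> list[Document]:
    """Drop any retrieved document whose host is not on the allowlist."""
    return [d for d in docs if urlparse(d.url).hostname in APPROVED_HOSTS]


def cites_approved_source(generated: str, sources: list[Document]) -> bool:
    """Require generated documentation to reference at least one approved source URL."""
    return any(d.url in generated for d in sources)
```

Matching on the exact hostname means subdomains must be listed explicitly. This errs on the side of excluding unvetted sources.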
4. Engineering Quality Guardrails
This is the category I believe is currently underrepresented in most AI coding discussions.
Examples:
Enforce modular architecture.
Apply SOLID principles.
Minimize coupling.
Maximize cohesion.
Keep components focused.
Require unit tests.
Consider performance.
Consider accessibility.
Consider observability.
Apply secure coding practices.
Purpose:
Improve long-term software quality rather than simply producing working code.
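Many of these practices, such as SOLID, cohesion, and coupling, call for judgment and are hard to reduce to a script. Some, however, can be partly checked. The sketch below is one assumption-laden example: it flags changed Python source files that arrive without a matching test file, using a hypothetical `src/` and `tests/test_<name>.py` layout.

```python
from pathlib import PurePosixPath


def files_missing_tests(changed_files: list[str]) -> list[str]:
    """Return changed Python source files with no matching test file in the same change.

    Assumes a hypothetical layout where src/<pkg>/foo.py is tested by tests/test_foo.py.
    """
    changed = set(changed_files)
    missing = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Only .py files under src/ are subject to the rule; package markers are skipped.
        if not path.parts or path.parts[0] != "src" or path.suffix != ".py" or path.name == "__init__.py":
            continue
        if f"tests/test_{path.stem}.py" not in changed:
            missing.append(name)
    return missing
```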
5. Output Guardrails
Applied after generation.
Examples:
Linting.
Schema validation.
API contract validation.
Static analysis.
Security scanning.
Test execution.
Formatting.
Purpose:
Verify that generated artifacts meet expected standards.
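Output guardrails usually wrap existing tools such as linters, JSON Schema validators, and test runners. To keep the example self-contained, the sketch below substitutes two small stdlib-only checks behind a common interface. A minimal key-and-type check stands in for full schema validation, and a trailing-whitespace check stands in for formatting. All names are hypothetical.

```python
import json
from typing import Callable

# Each check takes the generated artifact and returns failures (empty list means pass).
Check = Callable[[str], list[str]]


def json_schema_check(required: dict[str, type]) -> Check:
    """Minimal stand-in for a real JSON Schema validator: checks key presence and type."""
    def check(artifact: str) -> list[str]:
        try:
            data = json.loads(artifact)
        except json.JSONDecodeError as exc:
            return [f"invalid JSON: {exc.msg}"]
        if not isinstance(data, dict):
            return ["top-level value must be an object"]
        failures = []
        for key, expected in required.items():
            if key not in data:
                failures.append(f"missing key: {key}")
            elif not isinstance(data[key], expected):
                failures.append(f"{key} should be {expected.__name__}")
        return failures
    return check


def no_trailing_whitespace(artifact: str) -> list[str]:
    """Stand-in for a formatter check."""
    return [f"line {i}: trailing whitespace"
            for i, line in enumerate(artifact.splitlines(), start=1)
            if line != line.rstrip()]


def run_output_checks(artifact: str, checks: list[Check]) -> list[str]:
    """Run every check and collect all failures."""
    failures = []
    for check in checks:
        failures.extend(check(artifact))
    return failures
```

Because every check shares the same signature, real tools can be added as further entries in the list without changing the runner.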
6. Governance Guardrails
Enforce organizational compliance.
Examples:
Human approval before merge.
Audit trails.
Traceability.
Compliance validation.
License checks.
Architecture review.
Purpose:
Ensure organizational governance is maintained even when AI performs much of the work.
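Audit trails and human approval before merge can reinforce each other. The sketch below assumes a hypothetical hash-chained `AuditLog`, in which each entry stores the hash of the previous one so that later edits become detectable. It pairs the log with a merge gate that only accepts an approval recorded by a named human reviewer. A production log would also carry timestamps and live in tamper-resistant storage; those details are omitted here.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always produces the same hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, actor: str, action: str, detail: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else ""
        entry = {"actor": actor, "action": action, "detail": detail, "prev": prev}
        entry["hash"] = _digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def may_merge(log: AuditLog, human_reviewers: set[str]) -> bool:
    """Merge is allowed only if a listed human, not the agent, recorded an approval."""
    return any(e["action"] == "approve_merge" and e["actor"] in human_reviewers
               for e in log.entries)
```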
The implementation depends on the execution model
This is where the discussion becomes interesting.
In Programmatically Orchestrated Systems, many guardrails are executable.
The framework can literally prevent an action from happening.
Examples:
Block tool execution.
Reject invalid input.
Stop the workflow.
Require approval.
Validate outputs before continuing.
The application enforces the rules.
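Because the application owns the control flow, "validate outputs before continuing" and "stop the workflow" can be ordinary code. Below is a minimal sketch with hypothetical `Step` and `run_workflow` names. Each step's output passes through a validator that the model cannot skip, and a failed check raises an exception before the next step runs.

```python
from dataclasses import dataclass
from typing import Callable


class WorkflowHalted(Exception):
    """Raised when a step's output fails validation; later steps never run."""


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]        # produces the next workflow state
    validate: Callable[[dict], bool]   # gate evaluated by the application, not the model


def run_workflow(steps: list[Step], state: dict) -> dict:
    for step in steps:
        state = step.run(state)
        if not step.validate(state):
            raise WorkflowHalted(f"validation failed after step '{step.name}'")
    return state
```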
In Prompt-Orchestrated Systems (for example, many GitHub Copilot agent workflows in VS Code), the situation is different.
Most guardrails are instructions given to the LLM.
The model is expected to:
follow the workflow,
apply engineering standards,
perform self-review,
ask clarification questions,
generate tests.
But unless an external mechanism validates the result, these guardrails are largely advisory rather than enforceable.
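This distinction explains why simply adding more instructions to an agent often produces inconsistent behavior: each new instruction is one more thing the model may or may not follow, and nothing checks whether it did.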
The challenge is no longer writing better prompts.
The challenge is designing guardrails that can be consistently applied and, wherever possible, independently validated.
In the final post, I'll discuss practical ways to implement guardrails for AI-native SDLC, especially for prompt-orchestrated environments such as GitHub Copilot, where deterministic enforcement is limited.