AI safety guards are not about fear. They are about not being stupid with probability machines.

How to safely use AI in production coding while maintaining high ROI. Both qualitative and quantitative methods
Mohamed Moshrif
Chaos to strategy, ship products that grow | Engineers & founder mentor | Google/Amazon/Microsoft | 23+ yrs AI

A lot of companies right now are still treating AI adoption in engineering like it is just a shinier autocomplete.

It is not.

It is a new production input with a completely different failure profile.

And that means the old SDLC by itself is no longer enough.

Not because the old controls were bad.

Because they were designed for humans writing code with human intent, human context, and human accountability.

LLMs do not work like that.

They generate plausible output from patterns.

They can be useful as hell.

They can also confidently manufacture garbage.

Both are true at the same time.

So if your plan is:

  • Give everyone Copilot / Cursor / whatever,
  • Celebrate velocity,
  • And hope your existing PR template, unit tests, and staging box somehow absorb the blast radius,

then congratulations, you do not have an AI strategy.

You have a future incident report.

The right question is not:

“How do we use AI?”

The right question is:

“What control system do we build around AI so the business gets the upside without eating stupid risk?”

That is the real job.

The old SDLC is not dead. But on its own, it is no longer sufficient.

Pre-AI, most serious engineering organizations already had some mix of:

  1. Code review before merge
  2. Automated test suites before merge or deploy
  3. Canarying / staged rollout
  4. Static analysis and policy checks
  5. SLOs, monitors, alerts, incident response
  6. Intention checks through design reviews / RFCs / architecture reviews
  7. Multiple environments with different gates and blast-radius policies

That stack is still valid.

In fact, for AI-heavy engineering, it becomes more important, not less.

This is not a controversial opinion.

It is exactly the kind of risk-management mindset NIST pushes in the AI Risk Management Framework: govern, map, measure, manage. Not vibes. Not marketing. Controls. [1][2]

And if you have spent any real time in serious systems, this should be obvious.

A fintech payment flow should not have the same verification stack as a low-risk UI tweak in a toy app.

A healthcare workflow should not be guarded the same way as a cosmetic content page.

A model-generated schema migration should not be treated the same way as a comment cleanup.

Context matters. Criticality matters. Reversibility matters.

That was true before AI.

It is even more true now.

Not all gates are equal

One of the most common mistakes people make is talking about “guardrails” as if all controls have the same cost and the same value.

They do not.

You should evaluate every gate on at least four axes:

1) Cost

Some controls are cheap to implement.

Some are operationally expensive.

A static secret scanner or linter is cheap compared to designing a useful reliability and observability system.

A basic policy gate is cheap compared to building a review-routing engine with risk scoring.

2) Damage prevention point

Catching a bad change before merge is cheaper than catching it in staging.

Catching it in staging is cheaper than catching it in prod.

Catching it in prod is cheaper than explaining to Legal why customer data just leaked.

This is not deep philosophy.

This is just how reality works.

3) Complexity

Some controls are trivial to add and cheap to maintain.

Others require coordination across engineering, product, security, SRE, compliance, and leadership.

SLO design is harder than adding a linter.

Meaningful canary analysis is harder than flipping on a static checker.

Risk-based review routing is harder than “all PRs need 1 approval”.

4) Relevance to domain

The right stack depends on product type, regulatory profile, user harm, data sensitivity, and deployment model.

A client-only calculator app is not a banking platform.

A marketing site is not a medical records system.

A throwaway internal script is not a payments ledger.

So yes, the answer is still “it depends.”

But it depends in a structured way, not in a hand-wavy consultant way.

My baseline view: with AI, you do not get to skip controls anymore

This is the part some people will hate.

If you are serious about AI in software engineering, the answer is not to throw away the old controls.

The answer is to keep them and re-tune them for AI-generated work.

Because AI introduces a nasty asymmetry:

It is very good at producing volume.

Humans are still the bottleneck on verification.

So if you only optimize code generation and do not equally optimize verification, you have built a bigger pipe into the same drain.

That is not acceleration.

That is backlog with extra steps.

Also, this part matters for anyone still drunk on benchmark screenshots:

Recent work from METR found that AI tools can actually slow down experienced open-source developers working in repositories they know well, by about 19% in that study. [3]

That does not mean AI is useless.

It means ROI is not magic.

It is conditional.

If your review burden, regression rate, rework, and coordination costs go up, your fancy AI rollout can absolutely make you slower overall. [3]

So here is the baseline:

Use the full stack of existing engineering controls. Then redesign some of them specifically for AI.

Not optional.

1) Code review must become risk-based, not ideological

I see two stupid extremes in the wild:

Extreme A: “Don’t review AI code, it’s faster”

This is clown behavior.

Extreme B: “Review every single AI-generated change in full detail forever”

This also breaks down.

At scale, that can erase the economic benefit and overload senior engineers with review sludge.

The answer is neither.

The answer is risk-based review routing.

My take:

Split review into mandatory human review lanes and sampled / AI-assisted lanes.

Lane 1: Mandatory human review

Anything touching high-risk surfaces gets full human review.

No debate.

Examples:

  • Payments
  • Auth / identity
  • Security boundaries
  • Permissioning
  • Regulated workflows
  • Health or financial sensitive data
  • Irreversible migrations
  • Infrastructure / reliability-critical code
  • Anything with high blast radius or low reversibility

Lane 2: Lower-risk changes

For lower-risk changes, let AI review first, then route only some portion to humans using dynamic sampling.

That sample rate should not be random nonsense.

It should be driven by signals such as:

  • Engineer tenure in the codebase
  • Historical rework rate
  • Number of review rounds per PR
  • Regression history
  • Rollback / hotfix linkage
  • Area criticality
  • Novelty of change type
  • Model confidence or policy flags if you track them

So yes, a new engineer touching an unfamiliar service gets a much higher human review sample rate than a domain expert making low-risk changes in a well-observed area.

That is not punishment.

That is basic control theory.
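As a sketch of how those signals could combine into a sampling rate (the signal names and weights here are illustrative assumptions, not a prescribed formula; tune against your own rework and regression data):

```python
def human_review_sample_rate(signals: dict) -> float:
    """Combine risk signals into the probability that a low-risk,
    AI-reviewed PR is also routed to a human reviewer.

    All weights are illustrative placeholders.
    """
    rate = 0.10  # floor: never sample below 10%

    # Less tenure in this codebase -> more human eyes.
    if signals.get("months_in_codebase", 0) < 6:
        rate += 0.30

    # Recent rework and regressions push the rate up.
    rate += min(signals.get("rework_rate", 0.0), 0.5)
    rate += 0.20 * signals.get("regressions_last_quarter", 0)

    # Critical areas never drop into light sampling.
    if signals.get("area_criticality") == "high":
        rate += 0.40

    return min(rate, 1.0)  # cap at "always review"
```

Under these assumed weights, a tenured domain expert in a low-risk area sits at the 10% floor, while a new engineer touching a critical area lands near full review.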

And yes, this can be implemented as an internal policy service sitting in front of the repo workflow.

PR opens.

Service evaluates user, area, policy, risk tags.

Workflow returns:

  • Continue to automated gates
  • Require human approval
  • Require senior approval
  • Block pending design artifact
  • Escalate to security / platform / data review
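A minimal sketch of that policy check (the area names, risk tags, and decision labels are hypothetical; a real service would pull these signals from your repo workflow and quality systems):

```python
from enum import Enum

class Decision(Enum):
    AUTOMATED_GATES = "continue to automated gates"
    HUMAN_APPROVAL = "require human approval"
    SENIOR_APPROVAL = "require senior approval"
    BLOCK_FOR_DESIGN = "block pending design artifact"
    ESCALATE = "escalate to security / platform / data review"

# Illustrative high-risk surfaces from Lane 1.
HIGH_RISK_AREAS = {"payments", "auth", "permissions", "migrations"}

def route_pr(area: str, risk_tags: set, has_design_doc: bool) -> Decision:
    """Map a PR's area and risk tags to a review decision."""
    if "security" in risk_tags or "sensitive_data" in risk_tags:
        return Decision.ESCALATE
    if area in HIGH_RISK_AREAS:
        if not has_design_doc:
            return Decision.BLOCK_FOR_DESIGN
        return Decision.SENIOR_APPROVAL
    if "ai_generated" in risk_tags:
        return Decision.HUMAN_APPROVAL  # or sampled, per Lane 2
    return Decision.AUTOMATED_GATES
```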

That is what a real AI guard looks like.

Not a Notion page saying “please use AI responsibly”.

2) Test automation with AI is useful, but only if you stop letting it grade its own homework

This one is huge.

Tests are one of the most obvious places to use AI.

And also one of the easiest places to lie to yourself.

Because a model can generate tests that look impressive, increase coverage, and still test absolutely nothing meaningful.

It can optimize for “green”.

Not for truth.

That is why benchmark wins do not automatically map to real engineering value.

METR and related work are already pointing at the broader problem: passing a benchmark or producing superficially valid patches is not the same as getting maintainers to accept work in real codebases. [4]

So the rule here is simple:

AI must not be allowed to invent the testing standard from scratch.

Humans define the standard.

AI helps instantiate it.

That means every serious product area needs structured, explicit testing guidance before you let AI loose.

And yes, that means tribal knowledge is no longer acceptable.

If your product logic lives mostly in people’s heads, your AI system will simply hallucinate around the holes.

Minimum artifact quality before AI-generated tests should be trusted

For each feature / area / service, your documentation needs to cover at least:

  1. Business problem - why this exists
  2. Product behavior - what should happen
  3. Technical design - how it works
  4. User personas and usage modes
  5. Decision history and tradeoffs - why this approach was chosen over alternatives
  6. Reliability expectations - SLOs, latency targets, durability constraints, recovery expectations
  7. Business / product metrics - what actually matters
  8. Critical user journeys - primary and secondary
  9. Test families and anti-patterns - what must be tested, what must not be faked, what shortcuts are unacceptable

That last one matters a lot.

You do not just tell AI what to test.

You also tell it what bad test behavior looks like.

Examples:

  • DO NOT mock the thing we are actually trying to verify
  • DO NOT rewrite assertions to fit broken production behavior
  • DO NOT generate “happy path only” junk
  • DO NOT collapse multi-step business workflows into fake unit-level theater
  • DO NOT silently weaken invariants just to make CI green

If you do not explicitly state that, the model will often drift toward convenience.

Because that is what these systems do.

They optimize for producing plausible completions, not for protecting your intent.
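One way to make that testing standard something a pipeline can actually enforce is to encode it as a structured artifact. A minimal sketch, with field names mirroring the checklist above (the schema is an assumption, not any specific tool's format):

```python
from dataclasses import dataclass, field

@dataclass
class TestingGuidance:
    """Machine-readable testing standard for one product area."""
    area: str
    critical_journeys: list            # primary and secondary user journeys
    required_test_families: list       # what must be tested
    forbidden_patterns: list = field(default_factory=list)  # what must not be faked

# Hypothetical example for a checkout area.
checkout = TestingGuidance(
    area="checkout",
    critical_journeys=["add to cart -> pay -> receipt"],
    required_test_families=["end-to-end payment flow", "refund path", "idempotency"],
    forbidden_patterns=[
        "mocking the payment gateway in end-to-end tests",
        "asserting only on HTTP 200",
        "happy-path-only coverage",
    ],
)
```

The point is that the same artifact feeds both the prompt context for AI-generated tests and an automated check that rejects tests matching the forbidden patterns.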

3) Canarying still matters. Probably more than before.

Canary releases still work.

Keep them.

Google’s SRE guidance is explicit about why canarying matters: test changes on a small portion of real traffic to gain confidence before wider rollout. [5]

That logic gets even stronger with AI-generated changes because the error mode is often “looks fine locally, explodes under real-world conditions.”

So no, canarying is not obsolete.

It is one of the few controls that directly observes production reality before full blast radius.

For now, I would keep the core mechanism as-is.

What I would change is the decision intelligence around it:

  • Stronger rollback thresholds for AI-heavy deployments
  • Tighter anomaly windows on newly AI-touched critical paths
  • Automatic correlation between generated-change labels and canary outcomes
  • More aggressive gating for areas with weak historical model performance
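As a sketch of the first item, a rollback decision that tightens the allowed regression for AI-authored deployments (the thresholds are illustrative assumptions, not recommended values):

```python
def should_roll_back(canary_error_rate: float,
                     baseline_error_rate: float,
                     ai_generated: bool) -> bool:
    """Decide whether to abort a canary based on error rates.

    The one AI-specific change: a tighter allowed relative
    increase when the deployment contains AI-authored changes.
    """
    max_relative_increase = 1.2 if ai_generated else 1.5
    if baseline_error_rate == 0:
        # Any real errors against a clean baseline are suspicious.
        return canary_error_rate > 0.001
    return canary_error_rate > baseline_error_rate * max_relative_increase
```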

4) Static analysis remains useful, but do not pretend it solves the problem

Static analysis, secret scanning, dependency policies, IaC policy checks, secure coding rules, all of that still has value.

Keep it.

But let’s not romanticize it.

Static analysis catches classes of issues.

It does not understand full business intent.

It does not know whether the model just implemented the wrong workflow perfectly.

So yes:

  • KEEP the linters
  • KEEP the security scanners
  • KEEP the policy engines
  • KEEP the typed interfaces and contract checks

But do not confuse syntax-level safety with product-level correctness.

Those are not the same thing.

5) Observability must expand to measure AI-specific failure patterns

Existing SLOs, monitors, and alerts stay.

But that is not enough.

You now also need instrumentation for the AI-assisted development process itself.

Examples:

  • Review rounds per PR
  • Reopen rate
  • Rollback rate
  • Hotfix rate
  • Regression rate
  • Escaped defect rate
  • Post-merge revert rate
  • Change fail rate
  • Deployment rework rate
  • Time to restore
  • AI-generated change ratio by service / team / engineer / risk class
  • Human review override rate
  • Model suggestion acceptance rate
  • Test flake rate on AI-generated changes
  • Rate of policy escalation triggered by AI-authored work

This is not random bureaucracy.

DORA’s delivery metrics already give you a strong baseline for throughput and stability, including deployment frequency, lead time, change fail rate, and reliability. [6][7]

If AI is truly helping, you should see net improvement without hidden instability tax.

If AI is increasing rework, regressions, or recovery burden, your ROI story is fake.

Measure that properly or stop pretending you know whether the rollout is working.
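Measuring it properly can start small. A sketch of computing change-fail rate split by AI-assisted versus human-only changes (the deployment record schema with `ai_assisted` and `failed` flags is an assumption; DORA defines the metric, not the record format):

```python
def change_fail_rate(deployments: list) -> dict:
    """Change-fail rate overall, for AI-assisted changes,
    and for human-only changes, from labeled deployment records."""
    def rate(rows):
        return sum(r["failed"] for r in rows) / len(rows) if rows else 0.0

    ai = [d for d in deployments if d["ai_assisted"]]
    human = [d for d in deployments if not d["ai_assisted"]]
    return {
        "overall": rate(deployments),
        "ai_assisted": rate(ai),
        "human_only": rate(human),
    }
```

If the AI-assisted rate drifts above the human-only rate and stays there, your velocity gains are being paid back with interest.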

6) Intent must be documented. Tribal knowledge is dead.

This one is probably the biggest organizational shift.

Before AI, teams got away with a disgusting amount of undocumented intent because strong engineers carried context in their heads.

That was already fragile.

With AI, it becomes a full-on liability.

If the system you are building depends on unwritten assumptions, undocumented tradeoffs, hallway conversations, and “Dave knows why that service behaves weirdly,” then AI will amplify the chaos.

Because your AI stack, whether direct prompting or RAG or internal context pipelines, can only reason over what it can access.

No intent artifact means no intent grounding.

So:

  • New features should have RFC-level documentation
  • Risky changes should have explicit design notes
  • Bugs should not be “fixed from Jira title alone”
  • Operational exceptions should be recorded
  • Tradeoffs should be discoverable
  • Policy decisions should be machine-readable where possible

This is not about paperwork fetish.

This is about giving both humans and models an authoritative source of truth.

Without that, the model guesses.

And when the model guesses inside a production system, the business eventually pays.
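A minimal sketch of what a machine-readable decision record could look like, ADR-style (field names are illustrative; the point is that humans and retrieval pipelines consume the same artifact):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionRecord:
    """One documented engineering decision with its tradeoffs."""
    record_id: str
    title: str
    context: str                   # why this came up
    decision: str                  # what was chosen
    tradeoffs: tuple               # what was given up, and why
    supersedes: str = ""           # id of any earlier record this replaces

# Hypothetical example entry.
adr_042 = DecisionRecord(
    record_id="ADR-042",
    title="Retry with idempotency keys on payment submission",
    context="Duplicate charges observed under network flaps.",
    decision="Client retries require an idempotency key; server dedupes.",
    tradeoffs=("Extra key storage", "Slightly higher request latency"),
)
```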

7) Multiple environments still matter

Dev, test, staging, pre-prod, prod.

Keep them.

The main change is not the existence of environments.

The main change is how you use them.

AI-heavy workflows should increase the discipline around:

  • Synthetic test realism
  • Pre-prod data fidelity
  • Release scoring
  • Environment-specific policy thresholds
  • Provenance of generated artifacts
  • Deployment labels that let you trace outcomes back to AI-assisted changes

Again, the old control is still good.

You just need to wire it to the new risk.

Then I would add three more rules

Rule 8: Some work should be human-only. Full stop.

Every company needs a clearly defined category of changes that are not delegated to AI beyond maybe research or draft assistance.

Not because AI is evil.

Because not all risk is acceptable.

This category should include work that is:

  • Highly critical
  • Hard to verify
  • Low tolerance for hallucination
  • Niche enough that model prior knowledge is weak
  • Legally or regulatorily sensitive
  • Architecturally foundational
  • Safety-critical
  • High-blast-radius and low-reversibility

A lot of teams are weirdly allergic to saying this because they think it sounds anti-AI.

It is not anti-AI.

It is pro-adulthood.

You do not hand a probabilistic system unrestricted authority over the exact places where correctness matters most and verification is hardest.

That is not innovation.

That is negligence wearing a hoodie.

Rule 9: AI usage is a controlled capability, not a default entitlement

This will also annoy people, but tough.

Not everyone should have the same level of AI autonomy in every area from day one.

AI usage should be governed by:

  • Role
  • Domain expertise
  • System familiarity
  • Historical quality signals
  • Policy training completion
  • Risk area certification where applicable

A new engineer should not be free-roaming with AI in critical systems they do not understand.

A strong Java engineer should not automatically be trusted to AI-drive a complex C++ system just because both have braces.

An engineer who does not understand the product domain should not be AI-accelerating changes in that domain.

You want to use AI well?

Earn the right to use it in that area.

That means:

  • Mandatory onboarding on internal AI policy
  • Explicit constraints on what areas someone can use AI in
  • Temporary restrictions when quality signals degrade
  • Requalification where needed
  • Different thresholds for different risk tiers

And yes, if someone keeps producing high-rollback, high-rework, high-regression AI-assisted PRs, you should absolutely reduce or remove their AI autonomy until they get it under control.

AI is a force multiplier.

If the underlying operator is weak in that domain, congratulations, you just multiplied the weakness.
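A sketch of how those governance signals could map to an autonomy tier (the tier names, thresholds, and signal fields are illustrative assumptions, not a standard):

```python
def ai_autonomy_level(engineer: dict, area_risk_tier: str) -> str:
    """Return an AI-usage tier for an engineer in a given area.

    Tiers: "none" (no AI use), "suggest_only" (draft assistance
    with mandatory human authorship), "full" (normal AI workflow).
    """
    if not engineer.get("policy_training_complete", False):
        return "none"  # mandatory onboarding first

    # Temporary restriction when quality signals degrade.
    if engineer.get("recent_regression_count", 0) >= 3:
        return "suggest_only"

    # High-risk areas require demonstrated domain familiarity.
    if area_risk_tier == "high" and engineer.get("months_in_domain", 0) < 12:
        return "suggest_only"

    return "full"
```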

Rule 10: ROI must be measured continuously, or you are just roleplaying transformation

This is the one executives love to talk around.

Everyone wants the upside slide.

Nobody wants the denominator.

AI value is not “people said they felt faster.”

AI value is the net business outcome after accounting for:

  • Tooling costs
  • Model costs
  • Review costs
  • Regression costs
  • Incident costs
  • Rework costs
  • Coordination tax
  • Compliance overhead
  • Latency introduced by new controls
  • Domain constraints
  • Opportunity cost of wrong automation
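The arithmetic itself is trivial once those buckets are tracked. A sketch (the cost bucket names are illustrative groupings of the list above, all in the same currency):

```python
def net_ai_roi(gross_hours_saved: float,
               hourly_cost: float,
               costs: dict) -> float:
    """Net value of an AI rollout: gross savings minus the denominator.

    `costs` holds bucketed totals, e.g. tooling, model usage,
    extra review time, regressions, incidents, rework.
    """
    gross_value = gross_hours_saved * hourly_cost
    return gross_value - sum(costs.values())
```

A rollout that "saves" 1,000 engineer-hours but adds 70k in review, regression, and incident costs against a 100/hour rate nets 30k, not 100k. That denominator is the part the upside slide always omits.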

Your measurement has to include both:

Quantitative outcomes

  • Lead time
  • Deployment frequency
  • Change fail rate
  • Rework rate
  • Rollback rate
  • Defect escape rate
  • Incident volume
  • Cycle time
  • Review latency
  • Throughput under stable quality thresholds

Qualitative outcomes

  • Maintainability
  • Clarity of design artifacts
  • Confidence in releases
  • Auditability
  • Onboarding ease
  • Resilience of systems and teams
  • Product correctness, not just code shape

If those numbers go the wrong way for long enough, then the answer is not “believe harder.”

The answer is:

stop, analyze, retune, rerun.

Because AI adoption is not religion.

It is an operating model.

If the model is producing negative return, you fix the model.

My current bottom line

AI in software engineering should not be treated as “coding but faster.”

That framing is already too small and too dumb.

The real problem is this:

How do you redesign your engineering control system for a world where a non-human producer can generate enormous amounts of plausible work, at high speed, with uneven correctness, weak intent fidelity, and highly variable domain reliability?

That is the actual problem statement.

And the answer is not:

  • Ban it
  • Trust it blindly
  • Add one more PR checkbox
  • Brag about prompt engineering

The answer is:

  • Layered controls.
  • Human-only zones.
  • Risk-based review routing.
  • Specification-first test generation.
  • Explicit intent artifacts.
  • AI-specific observability.
  • Continuous ROI measurement.
  • And policy enforcement that reflects real business criticality, not generic hype.

Because the companies that win here will not be the ones that generate the most code.

They will be the ones that build the best verification system around generated work.

Everything else is just speedrunning technical debt, regressions, and expensive self-delusion.

Would be interested to hear how others are designing this, especially around:

  • Dynamic human review sampling
  • AI-specific release metrics
  • Human-only change classes
  • Policy enforcement in repo workflows
  • How you are measuring real ROI rather than demo theater

Sources

[1] NIST AI Risk Management Framework 1.0, NIST, 2023: https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

[2] NIST Generative AI Profile to the AI RMF, NIST, 2024: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

[3] METR, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, 2025: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

[4] METR / related benchmark discussion on real-world software task evaluation limits, Measuring AI Ability to Complete Long Software Tasks, arXiv 2025: https://arxiv.org/html/2503.14499v3

[5] Google SRE Workbook, Canarying Releases: https://sre.google/workbook/canarying-releases/

[6] DORA metrics guide: https://dora.dev/guides/dora-metrics/

[7] DORA Accelerate / State of DevOps research on delivery performance and reliability metrics: https://dora.dev/research/2021/dora-report/
