AI safety guards are not about fear. They are about not being stupid with probability machines.

How to safely use AI in production coding while maintaining high ROI. Both qualitative and quantitative methods
Mohamed Moshrif
Chaos to strategy, ship products that grow | Engineers & founder mentor | Google/Amazon/Microsoft | 23+ yrs AI

A lot of companies right now are still treating AI adoption in engineering like it is just a shinier autocomplete.

It is not.

It is a new production input with a completely different failure profile.

And that means the old SDLC by itself is no longer enough.

Not because the old controls were bad.

Because they were designed for humans writing code with human intent, human context, and human accountability.

LLMs do not work like that.

They generate plausible output from patterns.

They can be useful as hell.

They can also confidently manufacture garbage.

Both are true at the same time.

So if your plan is:

  • Give everyone Copilot / Cursor / whatever,
  • Celebrate velocity,
  • And hope your existing PR template, unit tests, and staging box somehow absorb the blast radius,

then congratulations, you do not have an AI strategy.

You have a future incident report.

The right question is not:

“How do we use AI?”

The right question is:

“What control system do we build around AI so the business gets the upside without eating stupid risk?”

That is the real job.

The old SDLC is not dead. But on its own, it is no longer sufficient.

Pre-AI, most serious engineering organizations already had some mix of:

  1. Code review before merge
  2. Automated test suites before merge or deploy
  3. Canarying / staged rollout
  4. Static analysis and policy checks
  5. SLOs, monitors, alerts, incident response
  6. Intention checks through design reviews / RFCs / architecture reviews
  7. Multiple environments with different gates and blast-radius policies

That stack is still valid.

In fact, for AI-heavy engineering, it becomes more important, not less.

This is not a controversial opinion.

It is exactly the kind of risk-management mindset NIST pushes in the AI Risk Management Framework: govern, map, measure, manage. Not vibes. Not marketing. Controls. [1][2]

And if you have spent any real time in serious systems, this should be obvious.

A fintech payment flow should not have the same verification stack as a low-risk UI tweak in a toy app.

A healthcare workflow should not be guarded the same way as a cosmetic content page.

A model-generated schema migration should not be treated the same way as a comment cleanup.

Context matters. Criticality matters. Reversibility matters.

That was true before AI.

It is even more true now.

Not all gates are equal

One of the most common mistakes people make is talking about “guardrails” as if all controls have the same cost and the same value.

They do not.

You should evaluate every gate on at least four axes:

1) Cost

Some controls are cheap to implement.

Some are operationally expensive.

A static secret scanner or linter is cheap compared to designing a useful reliability and observability system.

A basic policy gate is cheap compared to building a review-routing engine with risk scoring.

2) Damage prevention point

Catching a bad change before merge is cheaper than catching it in staging.

Catching it in staging is cheaper than catching it in prod.

Catching it in prod is cheaper than explaining to Legal why customer data just leaked.

This is not deep philosophy.

This is just how reality works.

3) Complexity

Some controls are trivial to add and cheap to maintain.

Others require coordination across engineering, product, security, SRE, compliance, and leadership.

SLO design is harder than adding a linter.

Meaningful canary analysis is harder than flipping on a static checker.

Risk-based review routing is harder than “all PRs need 1 approval”.

4) Relevance to domain

The right stack depends on product type, regulatory profile, user harm, data sensitivity, and deployment model.

A client-only calculator app is not a banking platform.

A marketing site is not a medical records system.

A throwaway internal script is not a payments ledger.

So yes, the answer is still “it depends.”

But it depends in a structured way, not in a hand-wavy consultant way.

My baseline view: with AI, you do not get to skip controls anymore

This is the part some people will hate.

If you are serious about AI in software engineering, the answer is not to throw away the old controls.

The answer is to keep them and re-tune them for AI-generated work.

Because AI introduces a nasty asymmetry:

It is very good at producing volume.

Humans are still the bottleneck on verification.

So if you only optimize code generation and do not equally optimize verification, you have built a bigger pipe into the same drain.

That is not acceleration.

That is backlog with extra steps.

Also, this part matters for anyone still drunk on benchmark screenshots:

Recent work from METR found that AI tools can actually slow down experienced open-source developers working in repositories they know well, by about 19% in that study. [3]

That does not mean AI is useless.

It means ROI is not magic.

It is conditional.

If your review burden, regression rate, rework, and coordination costs go up, your fancy AI rollout can absolutely make you slower overall. [3]

So here is the baseline:

Use the full stack of existing engineering controls. Then redesign some of them specifically for AI.

Not optional.

1) Code review must become risk-based, not ideological

I see two stupid extremes in the wild:

Extreme A: “Don’t review AI code, it’s faster”

This is clown behavior.

Extreme B: “Review every single AI-generated change in full detail forever”

This also breaks down.

At scale, that can erase the economic benefit and overload senior engineers with review sludge.

The answer is neither.

The answer is risk-based review routing.

My take:

Split review into mandatory human review lanes and sampled / AI-assisted lanes.

Lane 1: Mandatory human review

Anything touching high-risk surfaces gets full human review.

No debate.

Examples:

  • Payments
  • Auth / identity
  • Security boundaries
  • Permissioning
  • Regulated workflows
  • Health or financial sensitive data
  • Irreversible migrations
  • Infrastructure / reliability-critical code
  • Anything with high blast radius or low reversibility

Lane 2: Lower-risk changes

For lower-risk changes, let AI review first, then route only some portion to humans using dynamic sampling.

That sample rate should not be random nonsense.

It should be driven by signals such as:

  • Engineer tenure in the codebase
  • Historical rework rate
  • Number of review rounds per PR
  • Regression history
  • Rollback / hotfix linkage
  • Area criticality
  • Novelty of change type
  • Model confidence or policy flags if you track them

So yes, a new engineer touching an unfamiliar service gets a much higher human review sample rate than a domain expert making low-risk changes in a well-observed area.

That is not punishment.

That is basic control theory.
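As a sketch of how those signals could combine into a sampling rate (the signal names and weights here are illustrative assumptions, not a prescribed formula; tune against your own rework and regression data):

```python
def human_review_sample_rate(signals: dict) -> float:
    """Combine risk signals into the probability that a low-risk,
    AI-reviewed PR is also routed to a human reviewer.

    All weights are illustrative placeholders.
    """
    rate = 0.10  # floor: never sample below 10%

    # Less tenure in this codebase -> more human eyes.
    if signals.get("months_in_codebase", 0) < 6:
        rate += 0.30

    # Recent rework and regressions push the rate up.
    rate += min(signals.get("rework_rate", 0.0), 0.5)
    rate += 0.20 * signals.get("regressions_last_quarter", 0)

    # Critical areas never drop into light sampling.
    if signals.get("area_criticality") == "high":
        rate += 0.40

    return min(rate, 1.0)  # cap at "always review"
```

Under these assumed weights, a tenured domain expert in a low-risk area sits at the 10% floor, while a new engineer touching a critical area lands near full review.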

And yes, this can be implemented as an internal policy service sitting in front of the repo workflow.

PR opens.

Service evaluates user, area, policy, risk tags.

Workflow returns:

  • Continue to automated gates
  • Require human approval
  • Require senior approval
  • Block pending design artifact
  • Escalate to security / platform / data review
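A minimal sketch of that policy check (the area names, risk tags, and decision labels are hypothetical; a real service would pull these signals from your repo workflow and quality systems):

```python
from enum import Enum

class Decision(Enum):
    AUTOMATED_GATES = "continue to automated gates"
    HUMAN_APPROVAL = "require human approval"
    SENIOR_APPROVAL = "require senior approval"
    BLOCK_FOR_DESIGN = "block pending design artifact"
    ESCALATE = "escalate to security / platform / data review"

# Illustrative high-risk surfaces from Lane 1.
HIGH_RISK_AREAS = {"payments", "auth", "permissions", "migrations"}

def route_pr(area: str, risk_tags: set, has_design_doc: bool) -> Decision:
    """Map a PR's area and risk tags to a review decision."""
    if "security" in risk_tags or "sensitive_data" in risk_tags:
        return Decision.ESCALATE
    if area in HIGH_RISK_AREAS:
        if not has_design_doc:
            return Decision.BLOCK_FOR_DESIGN
        return Decision.SENIOR_APPROVAL
    if "ai_generated" in risk_tags:
        return Decision.HUMAN_APPROVAL  # or sampled, per Lane 2
    return Decision.AUTOMATED_GATES
```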

That is what a real AI guard looks like.

Not a Notion page saying “please use AI responsibly”.

2) Test automation with AI is useful, but only if you stop letting it grade its own homework

This one is huge.

Tests are one of the most obvious places to use AI.

And also one of the easiest places to lie to yourself.

Because a model can generate tests that look impressive, increase coverage, and still test absolutely nothing meaningful.

It can optimize for “green”.

Not for truth.

That is why benchmark wins do not automatically map to real engineering value.

METR and related work are already pointing at the broader problem: passing a benchmark or producing superficially valid patches is not the same as getting maintainers to accept work in real codebases. [4]

So the rule here is simple:

AI must not be allowed to invent the testing standard from scratch.

Humans define the standard.

AI helps instantiate it.

That means every serious product area needs structured, explicit testing guidance before you let AI loose.

And yes, that means tribal knowledge is no longer acceptable.

If your product logic lives mostly in people’s heads, your AI system will simply hallucinate around the holes.

Minimum artifact quality before AI-generated tests should be trusted

For each feature / area / service, your documentation needs to cover at least:

  1. Business problem - why this exists
  2. Product behavior - what should happen
  3. Technical design - how it works
  4. User personas and usage modes
  5. Decision history and tradeoffs - why this approach was chosen over alternatives
  6. Reliability expectations - SLOs, latency targets, durability constraints, recovery expectations
  7. Business / product metrics - what actually matters
  8. Critical user journeys - primary and secondary
  9. Test families and anti-patterns - what must be tested, what must not be faked, what shortcuts are unacceptable

That last one matters a lot.

You do not just tell AI what to test.

You also tell it what bad test behavior looks like.

Examples:

  • DO NOT mock the thing we are actually trying to verify
  • DO NOT rewrite assertions to fit broken production behavior
  • DO NOT generate “happy path only” junk
  • DO NOT collapse multi-step business workflows into fake unit-level theater
  • DO NOT silently weaken invariants just to make CI green

If you do not explicitly state that, the model will often drift toward convenience.

Because that is what these systems do.

They optimize for producing plausible completions, not for protecting your intent.
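One way to make that testing standard something a pipeline can actually enforce is to encode it as a structured artifact. A minimal sketch, with field names mirroring the checklist above (the schema is an assumption, not any specific tool's format):

```python
from dataclasses import dataclass, field

@dataclass
class TestingGuidance:
    """Machine-readable testing standard for one product area."""
    area: str
    critical_journeys: list            # primary and secondary user journeys
    required_test_families: list       # what must be tested
    forbidden_patterns: list = field(default_factory=list)  # what must not be faked

# Hypothetical example for a checkout area.
checkout = TestingGuidance(
    area="checkout",
    critical_journeys=["add to cart -> pay -> receipt"],
    required_test_families=["end-to-end payment flow", "refund path", "idempotency"],
    forbidden_patterns=[
        "mocking the payment gateway in end-to-end tests",
        "asserting only on HTTP 200",
        "happy-path-only coverage",
    ],
)
```

The point is that the same artifact feeds both the prompt context for AI-generated tests and an automated check that rejects tests matching the forbidden patterns.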

3) Canarying still matters. Probably more than before.

Canary releases still work.

Keep them.

Google’s SRE guidance is explicit about why canarying matters: test changes on a small portion of real traffic to gain confidence before wider rollout. [5]

That logic gets even stronger with AI-generated changes because the error mode is often “looks fine locally, explodes under real-world conditions.”

So no, canarying is not obsolete.

It is one of the few controls that directly observes production reality before full blast radius.

For now, I would keep the core mechanism as-is.

What I would change is the decision intelligence around it:

  • Stronger rollback thresholds for AI-heavy deployments
  • Tighter anomaly windows on newly AI-touched critical paths
  • Automatic correlation between generated-change labels and canary outcomes
  • More aggressive gating for areas with weak historical model performance
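As a sketch of the first item, a rollback decision that tightens the allowed regression for AI-authored deployments (the thresholds are illustrative assumptions, not recommended values):

```python
def should_roll_back(canary_error_rate: float,
                     baseline_error_rate: float,
                     ai_generated: bool) -> bool:
    """Decide whether to abort a canary based on error rates.

    The one AI-specific change: a tighter allowed relative
    increase when the deployment contains AI-authored changes.
    """
    max_relative_increase = 1.2 if ai_generated else 1.5
    if baseline_error_rate == 0:
        # Any real errors against a clean baseline are suspicious.
        return canary_error_rate > 0.001
    return canary_error_rate > baseline_error_rate * max_relative_increase
```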

4) Static analysis remains useful, but do not pretend it solves the problem

Static analysis, secret scanning, dependency policies, IaC policy checks, secure coding rules, all of that still has value.

Keep it.

But let’s not romanticize it.

Static analysis catches classes of issues.

It does not understand full business intent.

It does not know whether the model just implemented the wrong workflow perfectly.

So yes:

  • KEEP the linters
  • KEEP the security scanners
  • KEEP the policy engines
  • KEEP the typed interfaces and contract checks

But do not confuse syntax-level safety with product-level correctness.

Those are not the same thing.

5) Observability must expand to measure AI-specific failure patterns

Existing SLOs, monitors, and alerts stay.

But that is not enough.

You now also need instrumentation for the AI-assisted development process itself.

Examples:

  • Review rounds per PR
  • Reopen rate
  • Rollback rate
  • Hotfix rate
  • Regression rate
  • Escaped defect rate
  • Post-merge revert rate
  • Change fail rate
  • Deployment rework rate
  • Time to restore
  • AI-generated change ratio by service / team / engineer / risk class
  • Human review override rate
  • Model suggestion acceptance rate
  • Test flake rate on AI-generated changes
  • Rate of policy escalation triggered by AI-authored work

This is not random bureaucracy.

DORA’s delivery metrics already give you a strong baseline for throughput and stability, including deployment frequency, lead time, change fail rate, and reliability. [6][7]

If AI is truly helping, you should see net improvement without hidden instability tax.

If AI is increasing rework, regressions, or recovery burden, your ROI story is fake.

Measure that properly or stop pretending you know whether the rollout is working.
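Measuring it properly can start small. A sketch of computing change-fail rate split by AI-assisted versus human-only changes (the deployment record schema with `ai_assisted` and `failed` flags is an assumption; DORA defines the metric, not the record format):

```python
def change_fail_rate(deployments: list) -> dict:
    """Change-fail rate overall, for AI-assisted changes,
    and for human-only changes, from labeled deployment records."""
    def rate(rows):
        return sum(r["failed"] for r in rows) / len(rows) if rows else 0.0

    ai = [d for d in deployments if d["ai_assisted"]]
    human = [d for d in deployments if not d["ai_assisted"]]
    return {
        "overall": rate(deployments),
        "ai_assisted": rate(ai),
        "human_only": rate(human),
    }
```

If the AI-assisted rate drifts above the human-only rate and stays there, your velocity gains are being paid back with interest.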

6) Intent must be documented. Tribal knowledge is dead.

This one is probably the biggest organizational shift.

Before AI, teams got away with a disgusting amount of undocumented intent because strong engineers carried context in their heads.

That was already fragile.

With AI, it becomes a full-on liability.

If the system you are building depends on unwritten assumptions, undocumented tradeoffs, hallway conversations, and “Dave knows why that service behaves weirdly,” then AI will amplify the chaos.

Because your AI stack, whether direct prompting or RAG or internal context pipelines, can only reason over what it can access.

No intent artifact means no intent grounding.

So:

  • New features should have RFC-level documentation
  • Risky changes should have explicit design notes
  • Bugs should not be “fixed from Jira title alone”
  • Operational exceptions should be recorded
  • Tradeoffs should be discoverable
  • Policy decisions should be machine-readable where possible

This is not about paperwork fetish.

This is about giving both humans and models an authoritative source of truth.

Without that, the model guesses.

And when the model guesses inside a production system, the business eventually pays.
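A minimal sketch of what a machine-readable decision record could look like, ADR-style (field names are illustrative; the point is that humans and retrieval pipelines consume the same artifact):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionRecord:
    """One documented engineering decision with its tradeoffs."""
    record_id: str
    title: str
    context: str                   # why this came up
    decision: str                  # what was chosen
    tradeoffs: tuple               # what was given up, and why
    supersedes: str = ""           # id of any earlier record this replaces

# Hypothetical example entry.
adr_042 = DecisionRecord(
    record_id="ADR-042",
    title="Retry with idempotency keys on payment submission",
    context="Duplicate charges observed under network flaps.",
    decision="Client retries require an idempotency key; server dedupes.",
    tradeoffs=("Extra key storage", "Slightly higher request latency"),
)
```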

7) Multiple environments still matter

Dev, test, staging, pre-prod, prod.

Keep them.

The main change is not the existence of environments.

The main change is how you use them.

AI-heavy workflows should increase the discipline around:

  • Synthetic test realism
  • Pre-prod data fidelity
  • Release scoring
  • Environment-specific policy thresholds
  • Provenance of generated artifacts
  • Deployment labels that let you trace outcomes back to AI-assisted changes

Again, the old control is still good.

You just need to wire it to the new risk.

Then I would add three more rules

Rule 8: Some work should be human-only. Full stop.

Every company needs a clearly defined category of changes that are not delegated to AI beyond maybe research or draft assistance.

Not because AI is evil.

Because not all risk is acceptable.

This category should include work that is:

  • Highly critical
  • Hard to verify
  • Low tolerance for hallucination
  • Niche enough that model prior knowledge is weak
  • Legally or regulatorily sensitive
  • Architecturally foundational
  • Safety-critical
  • High-blast-radius and low-reversibility

A lot of teams are weirdly allergic to saying this because they think it sounds anti-AI.

It is not anti-AI.

It is pro-adulthood.

You do not hand a probabilistic system unrestricted authority over the exact places where correctness matters most and verification is hardest.

That is not innovation.

That is negligence wearing a hoodie.

Rule 9: AI usage is a controlled capability, not a default entitlement

This will also annoy people, but tough.

Not everyone should have the same level of AI autonomy in every area from day one.

AI usage should be governed by:

  • Role
  • Domain expertise
  • System familiarity
  • Historical quality signals
  • Policy training completion
  • Risk area certification where applicable

A new engineer should not be free-roaming with AI in critical systems they do not understand.

A strong Java engineer should not automatically be trusted to AI-drive a complex C++ system just because both have braces.

An engineer who does not understand the product domain should not be AI-accelerating changes in that domain.

You want to use AI well?

Earn the right to use it in that area.

That means:

  • Mandatory onboarding on internal AI policy
  • Explicit constraints on what areas someone can use AI in
  • Temporary restrictions when quality signals degrade
  • Requalification where needed
  • Different thresholds for different risk tiers

And yes, if someone keeps producing high-rollback, high-rework, high-regression AI-assisted PRs, you should absolutely reduce or remove their AI autonomy until they get it under control.

AI is a force multiplier.

If the underlying operator is weak in that domain, congratulations, you just multiplied the weakness.
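A sketch of how those governance signals could map to an autonomy tier (the tier names, thresholds, and signal fields are illustrative assumptions, not a standard):

```python
def ai_autonomy_level(engineer: dict, area_risk_tier: str) -> str:
    """Return an AI-usage tier for an engineer in a given area.

    Tiers: "none" (no AI use), "suggest_only" (draft assistance
    with mandatory human authorship), "full" (normal AI workflow).
    """
    if not engineer.get("policy_training_complete", False):
        return "none"  # mandatory onboarding first

    # Temporary restriction when quality signals degrade.
    if engineer.get("recent_regression_count", 0) >= 3:
        return "suggest_only"

    # High-risk areas require demonstrated domain familiarity.
    if area_risk_tier == "high" and engineer.get("months_in_domain", 0) < 12:
        return "suggest_only"

    return "full"
```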

Rule 10: ROI must be measured continuously, or you are just roleplaying transformation

This is the one executives love to talk around.

Everyone wants the upside slide.

Nobody wants the denominator.

AI value is not “people said they felt faster.”

AI value is the net business outcome after accounting for:

  • Tooling costs
  • Model costs
  • Review costs
  • Regression costs
  • Incident costs
  • Rework costs
  • Coordination tax
  • Compliance overhead
  • Latency introduced by new controls
  • Domain constraints
  • Opportunity cost of wrong automation
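The arithmetic itself is trivial once those buckets are tracked. A sketch (the cost bucket names are illustrative groupings of the list above, all in the same currency):

```python
def net_ai_roi(gross_hours_saved: float,
               hourly_cost: float,
               costs: dict) -> float:
    """Net value of an AI rollout: gross savings minus the denominator.

    `costs` holds bucketed totals, e.g. tooling, model usage,
    extra review time, regressions, incidents, rework.
    """
    gross_value = gross_hours_saved * hourly_cost
    return gross_value - sum(costs.values())
```

A rollout that "saves" 1,000 engineer-hours but adds 70k in review, regression, and incident costs against a 100/hour rate nets 30k, not 100k. That denominator is the part the upside slide always omits.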

Your measurement has to include both:

Quantitative outcomes

  • Lead time
  • Deployment frequency
  • Change fail rate
  • Rework rate
  • Rollback rate
  • Defect escape rate
  • Incident volume
  • Cycle time
  • Review latency
  • Throughput under stable quality thresholds

Qualitative outcomes

  • Maintainability
  • Clarity of design artifacts
  • Confidence in releases
  • Auditability
  • Onboarding ease
  • Resilience of systems and teams
  • Product correctness, not just code shape

If those numbers go the wrong way for long enough, then the answer is not “believe harder.”

The answer is:

stop, analyze, retune, rerun.

Because AI adoption is not religion.

It is an operating model.

If the model is producing negative return, you fix the model.

My current bottom line

AI in software engineering should not be treated as “coding but faster.”

That framing is already too small and too dumb.

The real problem is this:

How do you redesign your engineering control system for a world where a non-human producer can generate enormous amounts of plausible work, at high speed, with uneven correctness, weak intent fidelity, and highly variable domain reliability?

That is the actual problem statement.

And the answer is not:

  • Ban it
  • Trust it blindly
  • Add one more PR checkbox
  • Brag about prompt engineering

The answer is:

  • Layered controls.
  • Human-only zones.
  • Risk-based review routing.
  • Specification-first test generation.
  • Explicit intent artifacts.
  • AI-specific observability.
  • Continuous ROI measurement.
  • And policy enforcement that reflects real business criticality, not generic hype.

Because the companies that win here will not be the ones that generate the most code.

They will be the ones that build the best verification system around generated work.

Everything else is just speedrunning technical debt, regressions, and expensive self-delusion.

Would be interested to hear how others are designing this, especially around:

  • Dynamic human review sampling
  • AI-specific release metrics
  • Human-only change classes
  • Policy enforcement in repo workflows
  • How you are measuring real ROI rather than demo theater

Sources

[1] NIST AI Risk Management Framework 1.0, NIST, 2023: https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

[2] NIST Generative AI Profile to the AI RMF, NIST, 2024: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

[3] METR, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, 2025: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

[4] METR / related benchmark discussion on real-world software task evaluation limits, Measuring AI Ability to Complete Long Software Tasks, arXiv 2025: https://arxiv.org/html/2503.14499v3

[5] Google SRE Workbook, Canarying Releases: https://sre.google/workbook/canarying-releases/

[6] DORA metrics guide: https://dora.dev/guides/dora-metrics/

[7] DORA Accelerate / State of DevOps research on delivery performance and reliability metrics: https://dora.dev/research/2021/dora-report/
