Master your next Terraform interview with our comprehensive collection of questions and expert-crafted answers. Get prepared with real scenarios that top companies ask.
They serve different points in the workflow.
- terraform refresh updates Terraform state to match real infrastructure, without changing resources. I have used it when drift happened, like someone changed a tag or security group rule in AWS outside Terraform.
- terraform plan compares configuration, state, and actual infrastructure, then shows what Terraform would add, change, or destroy. I use it on every change, especially in CI, to review impact before touching anything.
- terraform apply executes the planned changes and updates both infrastructure and state. In practice, I use apply only after plan review, often with saved plans in higher environments.

One nuance: newer Terraform workflows rely less on standalone refresh, because refresh is built into plan and apply by default.
State locking prevents multiple Terraform operations from modifying the same state file at the same time. In a team setup, that matters because Terraform state is the source of truth for what exists, and concurrent writes can corrupt it or cause conflicting infrastructure changes.
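As a minimal sketch, locking is usually configured on the backend; here is an S3 backend with DynamoDB locking (bucket and table names are hypothetical):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-team-tf-state"           # hypothetical state bucket
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-state-locks"             # lock table with a "LockID" string hash key
    encrypt        = true
  }
}
```

With this in place, a second concurrent plan or apply fails fast with a lock error instead of writing over in-flight state.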
When apply or plan runs, Terraform can place a lock on the state so only one operation modifies it at a time.

I’d answer this with a quick STAR format (situation, detection, action, result) and keep it specific.
In one project, an AWS security group was changed manually in the console during an incident, but Terraform still thought the old rules were the source of truth. We caught it when a terraform plan in CI showed unexpected in-place updates, and I confirmed it by comparing the state file with the live resource in AWS. I treated it as state drift caused by out-of-band changes.
To resolve it, I first checked whether the manual change was valid. Since it was temporary, I reverted it by applying Terraform so infra matched code again. If the change had been intentional, I would have updated the .tf code and refreshed state before applying. Afterward, I tightened IAM permissions, enabled drift detection in our pipeline, and reinforced the rule that changes go through Terraform only.
Terraform is an infrastructure as code tool that lets you define cloud resources in files instead of clicking around in the console. You describe the desired end state, like VPCs, subnets, VMs, IAM roles, and Terraform figures out what to create, update, or delete.
Its core value is consistency and control:
- Manual setup is fast at first, but it drifts, gets hard to repeat, and depends on tribal knowledge.
- Terraform gives you version-controlled infrastructure, so changes are reviewable, auditable, and reversible.
- It makes environments repeatable, so dev, staging, and prod can be built the same way.
- terraform plan shows what will change before you apply it, which reduces surprises.
- It scales better for teams, because the infrastructure becomes documented in code, not memory.
I would frame it as moving from ad hoc setup to an engineered, repeatable system.
Terraform state is Terraform’s record of what it manages. It maps real infrastructure to resources in your code, stores metadata, and tracks things like resource IDs, dependencies, and outputs. Terraform uses it during plan and apply to know what exists and what needs to change.
Why it’s critical:
- It lets Terraform do diffing, so it knows create vs update vs destroy.
- It tracks resources Terraform cannot infer from config alone, like cloud-assigned IDs.
- It supports team workflows when stored remotely with locking and versioning.
If state is lost or corrupted:
- Terraform may try to recreate existing resources, causing duplicates or failures.
- It can lose track of dependencies, leading to broken or out-of-order changes.
- Drift becomes much harder to detect and fix.
- Sensitive values in state can be exposed if storage is insecure.
- Recovery often means restoring from backup or using terraform import and state repair commands.
The core Terraform workflow is pretty straightforward:
- Write .tf files, defining providers, resources, variables, and outputs.
- Run terraform init to download providers, set up modules, and initialize the backend.
- Run terraform fmt and terraform validate to clean up syntax and catch config issues early.
- Run terraform plan to compare config against the current state and preview proposed changes.
- Run terraform apply to execute the plan and create, update, or destroy infrastructure.
- Run terraform destroy to remove managed infrastructure safely when it is no longer needed.

In team environments, this is usually wrapped in Git, code review, and CI/CD, with remote state and state locking to avoid conflicts.
Think of Terraform as: providers connect, resources create, data sources read, variables input, outputs return, and modules organize.
Resource types come from providers, like aws_instance or azurerm_resource_group.

Remote state is about keeping Terraform state in a shared, durable place with locking and access control, so teams do not overwrite each other. In production, I care about encryption, locking, least privilege, and separating state files by environment or workspace.
- AWS: the s3 backend with DynamoDB locking, versioning enabled, KMS encryption, and restricted IAM policies.
- Azure: the azurerm backend with a storage account and blob container, usually with private endpoints and RBAC.
- Terraform Cloud: the remote backend for state storage, locking, runs, and policy integration.
- GCP: gcs works well, usually with bucket versioning and tight service account permissions.
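As one hedged example of the Azure option, an azurerm backend block looks roughly like this (all names are placeholders):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"              # hypothetical resource group
    storage_account_name = "sttfstate001"            # hypothetical storage account
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"  # one state blob per environment
  }
}
```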
I split Terraform when it improves reuse, ownership, or safety, not just to make folders look neat. If a configuration is small, used by one team, and changes together, I keep it simple in a single root module with clear files like network.tf, compute.tf, and outputs.tf.
My rule is cohesion over abstraction. If a module has a clear interface and stable purpose, it is worth it.
A reusable, maintainable Terraform module is opinionated enough to be safe, but flexible enough to fit multiple teams and environments.
Before publishing, I run terraform validate, plan, and ideally automated integration checks.

I usually optimize for reuse, clear blast radius, and simple CI/CD. The cleanest pattern I have used is shared modules plus separate environment roots.
- modules/ holds reusable building blocks like VPC, ECS, RDS, IAM.
- live/ or envs/ has one root per environment, for example dev/, staging/, prod/.
- CI runs plan and apply per environment, with tighter approvals for staging and prod.

If complexity grows, I split by service or region too, so networking, data, and app stacks have separate state files. That keeps changes targeted and reduces the risk of one apply touching everything.
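A minimal sketch of that layout (directory names are examples, not a standard):

```
modules/
  vpc/
  ecs/
  rds/
live/
  dev/
    main.tf       # calls modules with dev-sized inputs
    backend.tf    # points at the dev state file
  staging/
  prod/           # same modules, stricter approvals
```

Each environment root gets its own state, so a bad apply in dev cannot touch prod.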
The big idea is: Terraform is not a secrets manager. Treat it as a consumer of secrets, not the place that stores or generates long-lived credentials unless you really have to.
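A minimal sketch of that pattern, with a secret injected at runtime rather than hardcoded (variable and output names are hypothetical):

```hcl
variable "db_password" {
  type      = string
  sensitive = true   # redacted in CLI output and plans
}

# Note: even a sensitive output is still written to state in plain text,
# so the state backend must be encrypted and access-controlled.
output "connection_password" {
  value     = var.db_password
  sensitive = true
}
```

The value would come from a TF_VAR_db_password environment variable or a CI secret, never from a committed tfvars file.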
- Mark secret variables and outputs with sensitive = true, so Terraform redacts them in CLI output and plans.
- Inject secrets at runtime through tfvars, env vars, or CI variables.
- Remember that sensitive only hides display output. It does not prevent storage in state, logs from external tools, or provider-side exposure.

It depends on how hard you need the isolation boundary to be. Workspaces are lightweight and convenient, but they are not the strongest separation model.
Workspaces let you run dev, stage, and prod with the same module structure. In practice, I usually prefer separate state files per environment, plus shared modules. I use workspaces for simpler setups, not for high-risk production isolation.
count creates multiple instances based on a number, so resources are indexed like resource[0], resource[1]. for_each creates instances from a map or set, so they are addressed by key like resource["web"]. That key-based identity is the biggest difference.
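A minimal for_each sketch showing key-based identity (resource names and keys are hypothetical):

```hcl
variable "teams" {
  type    = set(string)
  default = ["web", "api", "batch"]
}

# One bucket per key: aws_s3_bucket.team["web"], ["api"], ["batch"].
# Removing "api" from the set destroys only that one bucket,
# unlike count, where removing an element shifts every later index.
resource "aws_s3_bucket" "team" {
  for_each = var.teams
  bucket   = "logs-${each.key}"   # hypothetical naming scheme
}
```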
- Use count when instances are basically identical and you just need N copies.
- Use for_each when each instance has distinct values, names, or lifecycle.
- for_each is safer for changes: adding or removing one key usually affects only that instance.
- count can cause index shifting, which may recreate resources if the list order changes.
- I reach for for_each for anything tied to named environments, users, subnets, or configs, and count for simple toggles like count = var.enabled ? 1 : 0.

Provisioners are Terraform’s way to run scripts or commands during resource creation or destruction, like local-exec, remote-exec, or file copy. They are meant as a last resort when a provider cannot model something directly.
They’re discouraged because:
- They break Terraform’s declarative model, since side effects are hard to track.
- They’re often non-idempotent, so reruns can fail or drift.
- They add brittle dependencies on SSH, WinRM, timing, and network access.
- Terraform cannot fully reason about or detect changes from provisioner actions.
They’re still justified in narrow cases:
- Bootstrapping a system before a config tool like Ansible can take over.
- Calling a legacy API or script when no provider exists.
- Short-lived glue logic during migrations, ideally temporary and documented.
Best practice is to prefer cloud-init, image baking, provider-native resources, or external automation first.
I’d answer it in two parts: the mechanics, then the real-world pitfalls.
- Run terraform import <address> <real-world-id> to attach state to that resource.
- Run terraform plan to see drift, and update the config until the plan is clean.

The biggest challenges are incomplete config, wrong IDs, and provider quirks. I’ve seen imports succeed but plans still want to recreate resources because of defaults, tags, or computed fields. Another common issue is dependencies, like importing a subnet before the VPC module is modeled cleanly. My approach is to import small batches, inspect state carefully, and normalize config until Terraform becomes the source of truth.
The clean distinction is: managed resources create or change infrastructure, data sources only read existing information.
- A resource like aws_instance is in Terraform state because Terraform manages its lifecycle: create, update, destroy.
- A data source like data.aws_ami queries something that already exists and returns attributes for use elsewhere.

To avoid confusion, I use naming conventions like data_ for lookups, keep reads and creates in separate sections, and ask one question: "Should Terraform own this object?" If yes, use a resource. If no, use a data source.
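A minimal sketch of the pairing: a data source reads an existing AMI, and a resource owns the instance built from it (the AMI filter is a common Ubuntu lookup; instance details are illustrative):

```hcl
# Read-only: Terraform does not manage this AMI, it just looks it up.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Managed: Terraform creates, updates, and destroys this instance.
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
```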
Dynamic blocks let you generate nested configuration blocks inside a resource, module, provider, or data source based on a collection. Think of them like a for_each, but for repeated child blocks such as ingress, egress, or setting. You define dynamic "block_name", give it a for_each, and use content {} to describe each generated block.
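A hedged sketch of a dynamic ingress block on a security group (rule names, ports, and CIDRs are hypothetical):

```hcl
variable "ingress_rules" {
  type = map(object({ port = number, cidr = string }))
  default = {
    https = { port = 443, cidr = "0.0.0.0/0" }
    ssh   = { port = 22, cidr = "10.0.0.0/8" }
  }
}

resource "aws_security_group" "app" {
  name = "app-sg"

  # One ingress block generated per map entry.
  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = [ingress.value.cidr]
    }
  }
}
```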
If the logic gets hard to read, for_each on resources, or separate resources, are clearer.

Local values in Terraform are named expressions you define in a locals block, then reuse as local.name throughout the configuration. Think of them as temporary variables for a module. They do not create infrastructure, they just help you avoid repeating logic.
Example, you might define locals { common_tags = { env = var.env, team = "platform" } } and reference local.common_tags across resources.
I use conditionals sparingly and push complexity into locals, so the resource blocks stay readable. The goal is to make the decision once, name it clearly, then reuse it.
- Use inline ternaries like var.env == "prod" ? 3 : 1 for small value changes.
- Move anything more complex into locals, for example local.instance_count or local.enable_backup.
- Use count or for_each with booleans carefully; count = var.enabled ? 1 : 0 is fine for simple cases.
- Add validation blocks so conditionals do not need to defend against bad values.

A good rule is: if someone cannot understand the condition in a few seconds, it probably belongs in a local or a different design.
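A minimal sketch combining a validation block with a named local (variable and local names are hypothetical):

```hcl
variable "env" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.env)
    error_message = "env must be one of: dev, staging, prod."
  }
}

locals {
  # Decide once, name it clearly, reuse everywhere.
  instance_count = var.env == "prod" ? 3 : 1
}
```

Because var.env is validated at plan time, the ternary never has to defend against unexpected values.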
Lifecycle meta-arguments let you control how Terraform handles resource changes when the default behavior would be risky or noisy.
- create_before_destroy tells Terraform to build the replacement first, then remove the old one, useful for things like load balancers, autoscaling groups, or DNS-backed app servers where downtime matters.
- prevent_destroy blocks accidental deletion, which I’ve used on production databases, stateful storage, and critical networking resources.
- ignore_changes tells Terraform to stop managing specific attributes after creation, helpful when something is updated externally, like tags from a policy engine, autoscaling desired counts, or rotated secrets.

I’ve used create_before_destroy during instance type migrations, prevent_destroy on RDS and S3, and ignore_changes when external systems or cloud defaults kept causing unnecessary drift in plans.
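A sketch of those lifecycle settings on a hypothetical database resource (identifier and sizing are placeholders):

```hcl
resource "aws_db_instance" "main" {
  identifier        = "prod-db"        # hypothetical
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20

  lifecycle {
    prevent_destroy = true    # any plan that would destroy this resource fails
    ignore_changes  = [tags]  # tags managed by an external policy engine
  }
}
```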
depends_on creates an explicit dependency, telling Terraform "wait for this resource or module before acting on that one." Normally Terraform infers order from references, like if resource A uses an attribute from resource B, B gets created first automatically.
You need depends_on when the dependency is real but not visible in configuration.
Use it sparingly. Too much depends_on makes plans more conservative and can slow applies by reducing parallelism.
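A sketch of a hidden dependency, assuming the IAM role and instance profile are defined elsewhere in the configuration:

```hcl
resource "aws_iam_role_policy_attachment" "s3_read" {
  role       = aws_iam_role.app.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

resource "aws_instance" "app" {
  ami                  = "ami-0abc1234"  # placeholder
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # Nothing above references the attachment, so Terraform cannot infer
  # that the instance needs it; depends_on makes the ordering explicit.
  depends_on = [aws_iam_role_policy_attachment.s3_read]
}
```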
I treat versioning as a guardrail, not an afterthought. The goal is predictable plans across teams, while still allowing controlled upgrades.
- Pin required_version in every root module, usually a bounded range like >= 1.6, < 1.9.
- Pin providers in required_providers, typically with ~> for stable minor pinning, like AWS ~> 5.0.
- Commit .terraform.lock.hcl, so every team and CI uses the same provider build checksums.
- Standardize CLI versions with tfenv, asdf, or a pinned Docker image in CI.

For upgrades, I do them centrally through PRs, run plans in lower environments first, then promote. That avoids one team silently drifting to a newer Terraform or provider version and breaking everyone else.
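A typical pinning block, with the version numbers as examples only:

```hcl
terraform {
  required_version = ">= 1.6, < 1.9"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"   # any 5.x release, never 6.0
    }
  }
}
```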
.terraform.lock.hcl is Terraform’s dependency lock file. It records the exact provider versions Terraform selected, plus checksums, after terraform init. Think of it like package-lock.json or go.sum, but for Terraform providers.
Why it matters:
- Reproducibility, everyone installs the same provider versions across machines and CI.
- Team consistency, it prevents "works on my laptop" issues caused by silent provider upgrades.
- Supply chain safety, checksums verify the downloaded provider binaries are the expected ones.
- Controlled upgrades, provider versions only change when you intentionally run terraform init -upgrade.
- It should usually be committed to Git, especially for shared modules, environments, and pipelines.
Without it, Terraform may resolve newer acceptable provider versions over time, which can lead to unexpected plan or apply differences.
I treat upgrades like a small migration project, not a casual version bump. The key is to reduce blast radius, test early, and make rollback easy.
- Run terraform init -upgrade, then plan, and compare state or diff output carefully.
- Use terraform state mv or moved blocks to avoid resource recreation.

A lot of surprise recreations come down to Terraform seeing a field as ForceNew, unstable inputs, or state drift. I’d investigate in this order: plan output, provider schema/docs, then current state versus real infrastructure.
- ForceNew attributes changed, like names, regions, subnet settings, or immutable IDs, so Terraform must replace the resource.
- Unstable for_each keys shifted resource addresses.
- Renames without moved blocks make Terraform think old resources were destroyed and new ones created.

To prevent it, keep inputs deterministic, pin provider versions, avoid ephemeral values, use lifecycle carefully (ignore_changes only when justified), run terraform plan in CI, and inspect terraform state show plus provider docs anytime you see "must be replaced".
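A minimal moved-block sketch (Terraform 1.1+), with hypothetical resource addresses:

```hcl
# Without this block, renaming the resource in code looks like
# "destroy aws_instance.web, create aws_instance.frontend" to Terraform.
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}
```

After the rename has been applied everywhere, the moved block can be deleted.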
I review a plan in layers: blast radius, change type, and confidence in intent. The goal is to catch destructive or surprising changes before apply.
- First, I scan for destroy or replace actions, since replacement can cause downtime or data loss.
- Then I check forces replacement, dependency chains, and whether a rename is really a recreate.
- I cross-check terraform state, module inputs, and sometimes run in a lower environment first. If anything feels ambiguous, I stop and clarify before applying.

I treat production Terraform like application code, with multiple safety layers before apply.
- Require terraform plan in CI before approval.
- Use lifecycle { prevent_destroy = true } for critical resources, and sometimes create_before_destroy.

I use outputs as the contract between a module and its consumers. The goal is to expose only stable, useful data, not internal implementation details.
- Expose stable identifiers like vpc_id, not every subnet attribute unless consumers truly need them.
- Mark secret outputs with sensitive = true, but remember that only hides CLI display, it does not remove them from state.

Big risks are security leakage and brittle dependencies. Sensitive values can end up in state files, logs, CI systems, or downstream modules. Tight coupling happens when consumers depend on low-level outputs, then small internal refactors break multiple stacks.
I’d answer this with a quick STAR structure, context, what I built, the design choices, and the outcome.
I built a reusable Terraform module for AWS application stacks, mainly VPC integration, ALB, ECS service, IAM, autoscaling, and standard observability. It was adopted by 20+ teams because we designed it to be flexible without becoming a mess. The big design decisions: opinionated defaults for 80 percent of use cases, with escape hatches for advanced teams via optional variables. We kept the interface small, grouped inputs by concern, and exposed only stable outputs. We also versioned it strictly, wrote example implementations, and added validation, preconditions, and good docs so teams could self-serve. Success came from balancing standardization with flexibility, plus making the safe path the easy path.
I’d answer this with a quick STAR structure, situation, failure point, recovery, and what I changed after.
In one case, an apply failed halfway through while updating AWS networking. Terraform had already created a new security group and changed part of a route table, but then hit an IAM permissions error on a NAT gateway update. First, I stopped all further applies and checked the state versus real AWS resources to see what had actually changed. I used terraform state list, refreshed state, and verified drift in the console. Then I fixed the IAM issue, imported one resource that had been created outside of state, and ran a targeted plan to confirm only the incomplete pieces would change. After that, I ran a full plan and apply. To prevent repeats, I tightened pre-apply permission checks, reduced risky targeted changes, and made sure remote state locking was enforced.
I’d treat that as Terraform drift and handle it in a controlled way, not by blindly applying.
- Run terraform plan and compare with the real resource state to see exactly what changed.
- If the manual change was not intended, revert it with terraform apply so code stays the source of truth.
- If it was intended, update the code, then use terraform import, terraform refresh if appropriate, or state commands carefully to reconcile.

In interviews, I’d emphasize source of truth, impact assessment, reconciliation, and prevention.
It’s basically a tradeoff between simplicity and blast radius.
Split states also need to share data through terraform_remote_state or another contract, which can create coupling if not designed carefully. A practical pattern is to split by lifecycle, ownership, or blast radius, like networking, platform, and apps, instead of making states either fully monolithic or excessively granular.
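A sketch of that cross-stack contract, with hypothetical bucket, key, and output names:

```hcl
# App stack reads the networking stack's published outputs.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-team-tf-state"              # hypothetical
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0abc1234"  # placeholder
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

The app stack now depends only on the networking stack's declared outputs, not its internals, which is exactly the coupling you want to keep deliberate.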
I usually wire Terraform into CI/CD as a gated workflow: PRs handle validation and review, merges to protected branches can trigger apply, and production applies often need manual approval. I keep remote state in S3 plus DynamoDB locking, use separate workspaces or accounts per environment, and inject cloud credentials through OIDC or short-lived secrets, not static keys.
- PRs run terraform fmt -check, init, validate, and plan.
- Applies require manual approval whenever the plan shows destructive changes.

I’ve used most of that stack in real Terraform workflows, usually layered in CI so each tool catches a different class of issue.
- terraform fmt for consistent style, usually enforced pre-commit and in CI.
- terraform validate for syntax and provider-level config checks before planning.
- tflint for Terraform-specific linting, unused declarations, provider best practices, and custom rules.
- tfsec and Checkov for security scanning, things like open security groups, missing encryption, or weak IAM patterns.
- Sentinel in Terraform Cloud for org guardrails, like required tags, approved regions, and instance type restrictions.
- OPA with Conftest when teams want policy-as-code outside Terraform Cloud, especially in mixed toolchains.

My usual answer in an interview is: formatting keeps code clean, validation checks correctness, linting catches quality issues, and policy/security tools enforce standards before apply.
I’d enforce it in layers, because Terraform alone is not enough.
- Module-level guardrails: validation, precondition/postcondition, and typed inputs to fail fast on bad values like disallowed instance types or invalid names.
- CI checks: tflint, tfsec or Checkov, plus mandatory plan reviews.

In practice, I usually combine opinionated modules plus CI policy checks, then use cloud policies as the final safety net.
I treat Terraform module testing like a pipeline, starting small and adding confidence at each stage before anyone else depends on it.
The first stage is static checks: terraform fmt, terraform validate, linting via tflint, and security checks like tfsec or Checkov.

I’d answer this by splitting it into validation, isolation, and mocking, because Terraform testing is strongest when you combine all three.
- Validation: terraform fmt -check, terraform validate, and linting like tflint, plus policy checks with Sentinel or OPA if needed.
- Isolation: terraform plan in CI to verify the dependency graph and outputs without always applying real infrastructure.

I decide boundaries around ownership, lifecycle, and blast radius. A good rule is: if two things change together and are owned by the same team, keep them in the same Terraform root module. If they have different approval paths, release cadence, or failure impact, split them.
In practice, I also define a contract, what a team owns, what they can change, and what inputs/outputs are supported, then enforce it with repo structure, permissions, and CI policy checks.
I’ve used Terraform Cloud for team based workflows, mainly remote state, VCS driven runs, policy checks, and workspace level variable management. I have lighter exposure to Terraform Enterprise, mostly understanding it as the self hosted version for organizations that need private networking, custom integrations, or stricter compliance controls.
They tighten the feedback loop and standardize how teams ship infrastructure.
With hosted runs, plan and apply execute in a consistent environment, so no one depends on local creds, local Terraform versions, or "works on my laptop" setups.

I’d answer this with a quick situation, decision, action, result flow.
On one project, Terraform was managing AWS infrastructure for an app platform, but the team wanted it to also handle app deployment steps, database migrations, and some one-time bootstrap scripts. I recognized it was the wrong fit because Terraform kept showing noisy diffs, retries were awkward, and failures left us with unclear state around procedural tasks. Terraform is strongest at declaring long-lived infrastructure, not orchestrating imperative workflows.
So we split responsibilities. Terraform kept VPCs, IAM, RDS, ECS, and secrets wiring. We moved application deployment to the CI/CD pipeline, used a migration job in the release process, and handled bootstrap logic with a configuration management script. That made plans cleaner, reduced drift, and made failures easier to retry safely.
I reduce the cognitive load for reviewers and make the impact obvious. The goal is not just "review the HCL," it is "understand what will change, why, and at what risk."
I attach the terraform plan output, usually summarized into creates, updates, destroys, plus any risky replacements. If reviewers are unsure, I do a quick walkthrough, explain the plan line by line, and treat review as shared learning, not a gatekeeping exercise.
I have used Terraform in both multi-cloud and platform automation setups, usually by separating concerns by provider and state, then wiring them together with remote outputs or data sources.
For Kubernetes, I’ve used the kubernetes and helm providers for namespaces, RBAC, ingress, and app releases.

The biggest pain points are timing, ownership, and drift. Kubernetes is eventually consistent, so Terraform might try to create a resource before the API, CRD, namespace, or controller is actually ready. Also, Kubernetes controllers mutate objects after apply, which makes Terraform think something changed even when the cluster is healthy.
- Use explicit depends_on, plus readiness checks for CRDs, ingress, or webhooks.
- Use lifecycle.ignore_changes for controller-updated metadata or replicas when appropriate.
- To control drift, limit ad hoc kubectl changes, enforce process, run plan in CI, and use policy or admission controls.

I’d use a layered module strategy. Standardize the 80 percent through opinionated modules, then leave controlled extension points for team-specific needs. The trick is to make the default path easy, and the custom path possible, without turning every module into a giant switchboard.
Extension points can be small, like an extra_iam_policies variable. If a module has 60 variables, I usually split it. That is a sign standardization is becoming accidental complexity.
Yes. I usually answer this with a quick STAR format: situation, key problems, actions, and measurable outcome.
At one company, I inherited Terraform spread across a few giant root modules with lots of copy paste, hardcoded ARNs, weak variable typing, and inconsistent naming. State was local in some places, providers were unpinned, and there were no clear module boundaries, so simple changes caused noisy plans and occasional drift. I split reusable pieces into modules, added typed variables and outputs, pinned provider and Terraform versions, moved state to remote backends with locking, and standardized tags and naming. I also added terraform fmt, validate, tflint, and plan checks in CI. The result was smaller plans, safer reviews, faster onboarding, and far fewer surprise changes during apply.
I treat module docs like a product handoff, they should show intent, safe defaults, and the happy path fast.
- A README in every module: purpose, architecture diagram, required providers, version constraints, inputs, outputs, and at least one real example.
- Generated docs via terraform-docs, so docs stay aligned with code.
- Runnable examples like examples/basic and examples/production, so engineers can copy working patterns.
- Guardrails in code: validation, precondition, sensible defaults, so misuse fails early instead of relying only on docs.

At team level, I usually pair this with a short adoption guide in the platform repo and a PR template that links the approved module examples.
I’d judge Terraform maturity by a mix of delivery speed, safety, consistency, and recoverability, not just “do they use modules?”
A mature org also has clear boundaries, app teams self-serve safely, while platform teams provide guardrails.
I’d start by finding the biggest sources of risk and friction, then standardize the minimum set of practices that improves safety without slowing delivery.
That usually means fmt, validate, tflint, tfsec or Checkov, plans in CI, approvals, and drift detection.

I’d answer this with a tight STAR structure (situation, task, action, result), then keep the metrics sharp.
One strong example is a multi-account AWS landing zone I built with Terraform for a fintech team.
- My role was lead IaC engineer: I designed the module structure, remote state strategy, CI/CD workflow, and review standards.
- Constraints were strict compliance, zero manual changes in prod, separate dev/stage/prod accounts, and a short migration window.
- I built reusable modules for VPC, IAM, EKS, RDS, CloudTrail, GuardDuty, and baseline policies, using Terraform Cloud workspaces and policy checks.
- I also introduced drift detection, tagging standards, and a promotion model so the same code moved across environments safely.
- Outcome: infra provisioning time dropped from days to under an hour, audit findings went to zero for that scope, and onboarding new apps became much faster and more consistent.