The Night I Deleted a Production Database with Terraform
It was 2:17 AM on a Tuesday. I had just run terraform apply on what I thought was a routine security group change. Thirty seconds later, our monitoring dashboard lit up like a Christmas tree. I had accidentally modified a lifecycle block on our primary RDS instance. Terraform, being the obedient tool it is, decided the cleanest path to my desired state was to destroy the database and recreate it. Which it did. In production. With real customer data.
I sat there watching the apply output scroll by, and by the time my sleep-deprived brain registered what was happening, the database was already gone. We had backups. We recovered. But those were the longest four hours of my career, and I learned something that every Terraform user eventually learns the hard way: this tool will do exactly what you tell it to do, and it will not second-guess you.
I am telling you this story not to scare you away from Terraform. I am telling you because after seven years of using it across dozens of projects and three different employers, I still think it is the best infrastructure-as-code tool available. But I think it is wildly irresponsible to review Terraform without being honest about its sharp edges. Too many reviews read like marketing copy. This is not going to be one of those.
Why I Keep Coming Back
Let me be clear about my bias upfront. I have written more HCL than I care to admit. My muscle memory types terraform init before my brain finishes thinking. So take my opinions with that context.
But here is the thing: I have also tried the alternatives. Seriously tried them. I spent three months on a Pulumi project in TypeScript. I have suffered through CloudFormation YAML that made my eyes water. I even kicked the tires on CDK and Crossplane. I keep coming back to Terraform, and the reason is annoyingly simple: it works, and everyone else knows how to use it too.
That second part matters more than people realize. When I am hiring for a platform team, I can find twenty candidates who know Terraform for every one who knows Pulumi. When I onboard a new engineer, they can read our Terraform configs by day two. When something breaks at 3 AM, the person on call can figure out what happened because the terraform state and plan output tell a clear story. The ecosystem effect is real, and it is Terraform's biggest moat. Bigger than any technical feature.
Confession #1: HCL Is Weird and I Kind of Love It
HCL gets a lot of heat. "It is not a real programming language." "Why can't I just use Python?" "Who actually likes writing in a DSL?" I have heard every complaint, and honestly, I used to agree with most of them. Then I spent time debugging a Pulumi program where someone had written a clever abstraction using TypeScript generics to dynamically generate AWS resources, and the resulting stack trace was completely useless.
HCL's limitations are its strength. You cannot write a for loop that calls an API. You cannot hide infrastructure decisions inside three layers of class inheritance. What you see in an HCL file is what you get. Every resource is declared. Every dependency is visible. When a junior engineer reviews a Terraform PR, they can actually understand what it does without being a software architect.
That said, HCL will make you want to throw your laptop sometimes. Try writing conditional logic for a resource that should only exist in certain environments. The ternary-in-count hack (setting the count meta-argument to 1 or 0 with a conditional expression) is ugly and everyone knows it. The for_each meta-argument improved things, but complex dynamic blocks still feel like you are fighting the language rather than working with it. I have written Terraform modules where the locals block is longer than the actual resource definitions, just to massage data into the right shape. It is not elegant. But it is readable, and readability wins in infrastructure code because the person reading it at 2 AM is probably panicking.
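For anyone who has not met the count hack yet, here is a minimal sketch of it next to the for_each version. The variable names, topic names, and the choice of SNS topics are all made up for illustration.

```hcl
variable "environment" {
  type        = string
  description = "Hypothetical environment name, such as \"dev\" or \"prod\"."
}

variable "alarm_topics" {
  type        = map(string)
  description = "Hypothetical map of topic key => display name."
  default     = {}
}

# The count ternary: one instance in prod, zero everywhere else.
resource "aws_sns_topic" "pager" {
  count = var.environment == "prod" ? 1 : 0
  name  = "pager-${var.environment}"
}

# count turns the resource into a list, so every reference has to cope with
# the list being empty. one() returns the single element, or null if none.
output "pager_topic_arn" {
  value = one(aws_sns_topic.pager[*].arn)
}

# The for_each version: instances are keyed by name instead of by index,
# and an empty map means "create nothing".
resource "aws_sns_topic" "alarms" {
  for_each = var.environment == "prod" ? var.alarm_topics : {}

  name         = "${each.key}-${var.environment}"
  display_name = each.value
}
```

Both forms work. The for_each form tends to age better because adding or removing one key does not shift the addresses of the other instances, which is exactly the kind of thing that produces a surprise destroy in a plan.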
Confession #2: The Provider Ecosystem Is Unbeatable (and That Is Terrifying)
There are over 4,000 providers in the Terraform Registry. Think about that number. You can manage your AWS infrastructure, your Cloudflare DNS, your Datadog monitors, your PagerDuty escalation policies, your GitHub repository settings, and your Snowflake data warehouse all from the same set of config files. I have a project right now that manages resources across AWS, Azure, and Cloudflare in a single workspace. A Cloudflare DNS record that points to an AWS ALB that serves traffic for an app whose CI/CD pipelines are defined in GitHub Actions -- all in one terraform plan.
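A rough sketch of what that cross-provider wiring can look like follows. Everything named here (the load balancer name, the zone ID variable, the record name) is a placeholder. It also assumes the 4.x line of the Cloudflare provider, because resource and attribute names have changed between major versions of that provider, so check the docs for the version you actually pin.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0" # assumption: 4.x schema, see note above
    }
  }
}

variable "cloudflare_zone_id" {
  type        = string
  description = "Hypothetical Cloudflare zone ID for the app's domain."
}

# Look up an existing load balancer by name (placeholder name).
data "aws_lb" "app" {
  name = "example-app-alb"
}

# A DNS record in one vendor's API whose value comes from another vendor's
# API. Terraform orders the two operations from the reference alone.
resource "cloudflare_record" "app" {
  zone_id = var.cloudflare_zone_id
  name    = "app"
  type    = "CNAME"
  value   = data.aws_lb.app.dns_name
  proxied = true
}
```

The point is the single expression `data.aws_lb.app.dns_name`: two unrelated APIs, one dependency graph, one plan.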
Nothing else comes close. Pulumi has decent coverage for the big three clouds but drops off sharply after that. CloudFormation is AWS-only, period. Crossplane is Kubernetes-native, which is great if your entire world is Kubernetes, and limiting if it is not.
But here is the terrifying part. You become dependent on these providers. And provider quality varies wildly. The AWS provider is excellent -- it is maintained by HashiCorp and AWS together, it gets updates within days of new AWS service launches, and its documentation is solid. The Azure provider is good but occasionally lags behind Azure feature releases. Some community providers? You are at the mercy of whoever maintains them. I have had provider bugs that took months to fix because the maintainer had moved on to other projects. When your production infrastructure depends on a community-maintained provider, that is a real risk.
Confession #3: State Management Will Humble You
If HCL is the language of Terraform, state is its soul. And its curse. The state file is a JSON document that maps your config to real infrastructure. It is how Terraform knows what exists, what changed, and what needs to happen next. Without it, Terraform is blind.
Here is my honest take: state management is the single biggest source of Terraform problems in the real world. Corrupted state. Locked state from a failed apply that nobody cleaned up. State that drifted because someone made a manual change in the console. State files that accidentally got committed to git with secrets in them. I have seen all of it, and I have caused some of it.
Remote state backends help. Using S3 with DynamoDB locking, or Azure Blob with lease locking, or just using Terraform Cloud -- these make the multi-person workflow workable. But the operational overhead is real. You need to set up the backend. You need to configure locking. You need a process for when locks get stuck. You need to train your team that terraform state mv and terraform state rm are power tools, not everyday commands. On one project, a well-meaning engineer ran terraform state rm on a resource they thought was unused. It was the NAT gateway for our production VPC. Terraform happily removed it from state, and on the next apply, it tried to create a new one -- which failed because the elastic IP was already allocated. The resulting mess took half a day to untangle.
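One way to take some of the danger out of those power tools is to express state surgery as config, so it goes through a PR and shows up in a plan like any other change. This is a sketch with hypothetical resource addresses. It assumes Terraform 1.1 or later for moved blocks and 1.7 or later for removed blocks.

```hcl
# Instead of `terraform state mv`: record a rename in code.
# Assumes the resource block now lives at aws_nat_gateway.egress.
moved {
  from = aws_nat_gateway.main
  to   = aws_nat_gateway.egress
}

# Instead of `terraform state rm`: stop managing a resource without
# destroying it. Assumes the matching resource block has been deleted
# from the configuration.
removed {
  from = aws_instance.legacy_bastion

  lifecycle {
    destroy = false # forget the resource, leave the real thing running
  }
}
```

Neither block stops someone from removing the wrong thing, but a reviewer now gets to see "Terraform will forget the production NAT gateway" in a plan before it happens.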
Terraform Cloud (rebranded as HCP Terraform in 2024) solves a lot of this by managing state for you. And honestly, if you are a team of more than three people, you should probably be using it or something like it (Spacelift, env0, Scalr). The CLI-only workflow with remote state backends works, but it requires more discipline than most teams have.
Confession #4: The License Change Stung
In August 2023, HashiCorp switched Terraform from MPL 2.0 to BSL 1.1. If you are not a licensing nerd, the short version is: Terraform is still free to use, but competitors cannot offer managed Terraform services without HashiCorp's permission. This spawned OpenTofu, a community fork under the Linux Foundation that keeps the original open-source license.
I was genuinely conflicted about this. On one hand, I understand why HashiCorp did it. Companies like Spacelift, env0, and Scalr were building businesses on top of Terraform without contributing much back. On the other hand, the open-source community built a huge part of what makes Terraform valuable. The providers, the modules, the tutorials, the Stack Overflow answers -- all of that was community effort under an open-source contract. Changing the terms felt like a betrayal, even if it was legally within HashiCorp's rights.
Then IBM announced a $6.4 billion acquisition of HashiCorp in April 2024 (the deal closed in early 2025), and the conversation shifted again. IBM's track record with acquisitions is... mixed. Some people I respect have moved their teams to OpenTofu on principle. Others, including me, are taking a wait-and-see approach. The reality is that for most users, the license change has zero practical impact. You can still download Terraform, use it for free, and build whatever you want. The BSL only restricts competing products. But the philosophical damage to the community is real, and it matters in ways that are hard to quantify.
The Pulumi Debate: Let's Actually Have It
Every Terraform review is required by law to compare it with Pulumi. Fine. Let me give you my honest, nuanced take instead of the usual "Pulumi uses real languages, Terraform uses HCL" summary.
Pulumi is a better tool for software engineers who happen to do infrastructure. If your team writes TypeScript or Python every day, Pulumi feels natural. You get real IDE support. You get type checking. You get unit tests that actually test your infrastructure logic. You get the full power of a programming language when you need conditional logic or complex transformations. I built a Pulumi project that dynamically generated infrastructure based on a YAML config file, and the resulting code was clean, testable, and maintainable.
Terraform is a better tool for infrastructure engineers who need to collaborate with everyone else. HCL is readable by non-experts. Terraform plans are auditable by compliance teams. The constraint of a DSL means that configs stay focused on what they are supposed to describe: infrastructure. The massive ecosystem means you will almost never hit a wall where a provider does not exist.
My honest recommendation: if your team is smaller than ten people and has strong software engineering DNA, try Pulumi seriously. You might find it more productive. If your team is larger, or includes people with varying technical backgrounds, or operates in a regulated industry where auditability matters, stick with Terraform. The ecosystem and talent pool advantages are just too significant to ignore.
CloudFormation: The Vendor Lock-In Trap
I am going to be blunt. If you are using CloudFormation in 2025 and you are not exclusively an AWS shop with zero plans to ever touch another cloud, you are making a mistake. CloudFormation's YAML syntax is verbose to the point of cruelty. Its error messages are legendarily bad. When a stack update fails, the rollback behavior can leave you in states that require manual intervention to fix. And the worst part? It only works with AWS.
I will give CloudFormation one thing: first-party support for new AWS services. When AWS launches something, CloudFormation support is often there at or near launch. Terraform's AWS provider usually follows within days or weeks, but there is a gap. If you are on the bleeding edge of AWS services, that matters. For everyone else, Terraform is the better AWS IaC tool, even if you never use another cloud provider. The syntax alone justifies the switch.
What Terraform Gets Right in 2025
The plan-and-apply workflow. I keep coming back to this because it is the single most important safety feature in any IaC tool. Before Terraform changes anything, it shows you exactly what it will do. Color-coded. Resource by resource. You can pipe that plan output into your PR review. You can require approvals. You can sleep slightly better at night knowing that nobody can accidentally destroy your database without at least seeing the word "destroy" in big red letters first. (They can still ignore it. Ask me how I know.)
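For the case in that parenthetical, where someone sees the red text and types "yes" anyway, a lifecycle guard is a useful backstop. Here is a minimal sketch. The identifiers and sizes are placeholders, and it assumes a recent AWS provider that supports managed master passwords.

```hcl
resource "aws_db_instance" "primary" {
  identifier        = "example-primary" # placeholder
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100

  username                    = "app"
  manage_master_user_password = true # keeps the password out of config and state

  # AWS-side guard: the API refuses to delete the instance while this is on.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-primary-final"

  # Terraform-side guard: any plan that would destroy this resource,
  # including destroy-and-recreate, fails with an error.
  lifecycle {
    prevent_destroy = true
  }
}
```

One caveat: prevent_destroy lives in the resource block, so deleting the whole block from the config deletes the guard with it. That is why the AWS-side deletion_protection flag is worth setting as well.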
Modules that actually work. Terraform's module system lets you package infrastructure patterns into reusable components. I have a VPC module, a Kubernetes cluster module, and an application deployment module that I have used across four different companies. The terraform-aws-modules VPC module on the registry (community-maintained, despite how often it gets called official) is genuinely excellent -- it handles the mind-numbing complexity of subnets, route tables, NAT gateways, and internet gateways with sensible defaults that you can override when needed.
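To give a sense of how little you have to write, here is roughly what calling the terraform-aws-modules VPC module looks like. The name, CIDR ranges, and availability zones are placeholders, and the version constraint is an assumption, so pin whatever you have actually tested.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumption: pin to the major version you have tested

  name = "example-prod"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  # One NAT gateway per AZ costs more but avoids a single point of failure.
  enable_nat_gateway = true
  single_nat_gateway = false

  tags = {
    Environment = "prod"
  }
}

# Other modules consume the outputs instead of caring how the VPC was built.
output "private_subnet_ids" {
  value = module.vpc.private_subnets
}
```

Those twenty-odd lines expand into dozens of resources in the plan, which is both the appeal and the reason to read the plan carefully the first time you apply it.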
The import story has improved dramatically. Getting existing infrastructure into Terraform used to be painful. Terraform 1.5 added import blocks in config files, and the -generate-config-out flag on terraform plan can auto-generate HCL for the resources being imported. It is not perfect -- the generated code often needs cleanup -- but it has reduced what used to be a multi-day migration effort to a few hours.
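As a concrete sketch, importing a hand-created S3 bucket looks something like this. The bucket name and resource address are hypothetical, and it assumes Terraform 1.5 or later.

```hcl
# Declare the import in config so it is reviewed like any other change.
import {
  to = aws_s3_bucket.legacy_assets # address the resource will have in state
  id = "example-legacy-assets"     # for S3 buckets, the ID is the bucket name
}

# If no resource block exists yet for aws_s3_bucket.legacy_assets, Terraform
# can draft one into a new file:
#
#   terraform plan -generate-config-out=generated.tf
#
# Review and tidy generated.tf, move the block wherever it belongs, then run
# a normal plan and apply to complete the import.
```

After a successful apply the import block has done its job and can be deleted, although leaving it in place is harmless.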
What to Actually Pay for Terraform
The CLI is free. Always has been. For solo projects or small teams that are comfortable managing their own state backend, you never have to pay HashiCorp a cent. Set up an S3 bucket with DynamoDB locking, and you are good to go.
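For reference, that self-managed setup is only a few lines of config. The bucket, key, and table names below are placeholders. The bucket and the DynamoDB table (with a string partition key named LockID) have to exist before you run terraform init.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"        # placeholder, must already exist
    key    = "prod/network/terraform.tfstate" # path of this state within the bucket
    region = "us-east-1"

    # State locking through a DynamoDB table with a "LockID" string key.
    dynamodb_table = "example-terraform-locks"

    # Server-side encryption for the state object. State often holds secrets.
    encrypt = true
  }
}
```

Newer Terraform releases also offer S3-native locking through the use_lockfile argument, so it is worth checking the backend documentation for your version before creating a DynamoDB table you may not need. Turn on bucket versioning either way. It is the cheapest undo button you will ever buy.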
Terraform Cloud's free tier covers up to 500 managed resources with remote state and VCS integration. That is enough for a small startup. The Standard tier at twenty bucks per user per month gets you remote execution, policy enforcement, and team management. It is worth it once your team hits four or five people and the "who has the latest state?" question starts causing problems.
For larger organizations, the Plus tier and Enterprise plans add drift detection, audit logging, SSO, and self-hosted runners. These are sold on a custom pricing basis, which means "call sales." I have negotiated these contracts and they are not cheap, but the alternative -- building and maintaining equivalent functionality yourself -- is more expensive in engineering time. The question is whether you want to pay HashiCorp (now IBM) or pay your own engineers. For most companies, paying HashiCorp is the right call.
Pros and Cons
Pros
- Provider ecosystem is unmatched -- 4,000+ providers covering every major cloud, SaaS tool, and niche service you can think of
- Plan-and-apply workflow is the best safety net in IaC, and it has saved my team from disaster more times than I can count
- HCL is readable by non-experts, which matters enormously for code review and onboarding
- The talent pool is massive -- finding Terraform-experienced engineers is ten times easier than finding Pulumi or CDK engineers
- Module system enables genuine reuse across projects and organizations
- Import capabilities have improved dramatically with 1.5+ features
- Documentation and community resources are the deepest of any IaC tool by a wide margin
Cons
- State management is an operational burden that every team underestimates until it bites them
- BSL license change and IBM acquisition have created real uncertainty about the project's direction
- HCL gets painful for complex conditional logic and dynamic patterns -- you will fight the language eventually
- Provider quality is inconsistent outside the big-three cloud providers
- Large configurations get slow -- plan times of 5-10 minutes for big environments are not unusual
- The learning curve is steep, and the mistakes you make while learning can have production consequences
Who Actually Needs Terraform
If you manage more than a handful of cloud resources and you do not want to do it through the AWS console like an animal, you need some IaC tool. Whether that tool should be Terraform depends on your situation.
You should use Terraform if your team manages multi-cloud or hybrid infrastructure. No discussion. Nothing else handles this as well. You should use Terraform if you work in a regulated industry where auditable infrastructure changes matter. The plan output plus version control creates an audit trail that compliance teams actually accept. You should use Terraform if you are building a platform team and need to create standardized infrastructure patterns for other teams to consume through modules.
You might not need Terraform if you run everything on a single cloud and want the simplest possible setup. CloudFormation or Azure Bicep might be enough. You probably do not need Terraform if your infrastructure is a single Kubernetes cluster and everything runs as pods -- Helm charts and Kustomize might serve you better. And if you are a solo developer running a side project, please do not terraform your hobby. Just click around in the console. Life is short.
Where I Land After Seven Years
Rating: 4.5 / 5
Terraform is not perfect. The state file will betray you. HCL will frustrate you. The license change was disappointing. And there is a very real chance that in five years, the IaC landscape looks completely different.
But right now, in 2025, Terraform is the most practical choice for most teams managing cloud infrastructure. Not because it is the most elegant tool, or the most innovative, or the one with the best pedigree. Because it has the biggest ecosystem, the largest talent pool, the most battle-tested track record, and a workflow that, despite its warts, makes infrastructure changes safer than any alternative.
I have broken production with Terraform. I have also prevented ten times as many production incidents because terraform plan showed me something scary before it happened. That math works out in Terraform's favor, and that is why it gets a 4.5 from me. Just please, for the love of all that is good, read your plan output before you type "yes."