How I Centralized Terraform States for a Healthcare Tech Firm, Improving Scalability and Collaboration

Eliminating Terraform State Sprawl with a Unified AWS S3 and DynamoDB Backend

The Challenge

A healthcare tech firm with 4 AWS accounts and a team of 12 engineers plus 2-3 contractors faced Terraform state sprawl. Over the years, the team had built POCs in separate repos, implemented backends inconsistently, and left orphaned infrastructure behind when projects were shut down. This resulted in:

  • Over 100 active Terraform state files spread across AWS accounts, covering 30+ projects with 2-3 environments each.
  • 35 separate S3 buckets and 35 DynamoDB tables dedicated to Terraform state management.
  • 205 abandoned state files from decommissioned projects, leaving orphaned infrastructure like old IAM roles, policies, and CloudWatch log groups.
  • Security and compliance concerns, as orphaned infrastructure was not always correctly decommissioned.
  • Operational overhead, with engineers wasting time managing state locations instead of focusing on feature development.
  • Custom scripts required to monitor the state of infrastructure across AWS accounts, increasing maintenance effort.

With the company growing, it needed a scalable, unified Terraform backend to streamline operations, enforce governance, and reduce wasted cloud resources.

How I Took Ownership

I proposed and led the initiative to consolidate all Terraform state management into a single AWS S3 backend with a shared DynamoDB state-locking table across all AWS accounts. My role included:

  • Conducting a full audit of all Terraform state files across their AWS accounts.
  • Designing a secure, scalable architecture for centralized state storage.
  • Leading the migration process, ensuring minimal disruption to ongoing development.
  • Implementing workspace management strategies to handle environment segregation properly.
  • Training the engineering team on best practices for maintaining Terraform states moving forward.

The Strategy

To transition from 35 fragmented backends to a single centralized one, I structured the migration into key phases:

  1. Assessment and Planning:
    • Conducted a deep dive into all AWS accounts and repositories.
    • Identified all existing Terraform states and mapped dependencies.
    • Prioritized state migrations based on criticality and risk.
  2. Configuration Setup:
    • Created a single S3 bucket shared across all AWS accounts.
    • Set up a single DynamoDB table for state locking across environments.
    • Defined IAM policies to enforce access control and security.
  3. State Migration:
    • Used Terraform state commands to migrate each project.
    • Applied consistent state naming conventions to improve visibility.
    • Updated Terraform configurations across all repositories to use the new backend.
  4. Workspace Management:
    • Implemented Terraform workspaces to maintain environment separation.
    • Created a scalable structure so that future projects followed a standardized state storage model.
  5. Validation and Testing:
    • Conducted thorough validation to ensure no state corruption or loss.
    • Ran automated Terraform plan/apply tests to verify infrastructure was intact.
  6. Monitoring and Maintenance:
    • Integrated Amazon CloudWatch alarms for tracking backend performance.
    • Removed 205 abandoned state files, cleaning up old IAM roles, policies, and unused CloudWatch log groups.
    • Eliminated the need for custom infrastructure monitoring scripts, reducing operational complexity.
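
The configuration setup and workspace phases above could be standardized with something like the following sketch. It is a small shell helper that prints the same S3 backend block for any project, so every repository ends up with an identical layout. The bucket name, lock table name, region, the "workspaces" prefix, and the render_backend helper itself are hypothetical placeholders, not the firm's actual values.

```shell
# Sketch: print a standardized S3 backend block for one project.
# All default values below are hypothetical placeholders.

render_backend() {
  local project="$1"
  local bucket="${TF_STATE_BUCKET:-example-central-tfstate}"
  local table="${TF_LOCK_TABLE:-example-terraform-locks}"
  local region="${TF_STATE_REGION:-us-east-1}"

  # One key per project; workspaces add an environment segment (see note).
  cat <<EOF
terraform {
  backend "s3" {
    bucket               = "${bucket}"
    key                  = "${project}/terraform.tfstate"
    region               = "${region}"
    dynamodb_table       = "${table}"
    encrypt              = true
    workspace_key_prefix = "workspaces"
  }
}
EOF
}

# Example (run inside a project repository):
#   render_backend payments-api > backend.tf
```

With this layout, the S3 backend stores each non-default workspace under workspace_key_prefix/workspace-name/key, so environments stay separated inside the single bucket while sharing one lock table.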

The Execution

Execution required close coordination with the engineering team to prevent disruption to ongoing work:

  • Inventorying Terraform states: Identified over 100 active state files, covering 30+ projects with 2-3 environments each.
  • Centralizing the backend: Reduced 35 S3 buckets and 35 DynamoDB tables down to 1 shared backend.
  • Migrating in phases: Migrated Terraform states in small, controlled batches to minimize risk.
  • Updating all repositories: Standardized backend configurations to point to the new centralized backend.
  • Running validation tests: Used Terraform plan/apply workflows to confirm successful migration.
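
As a rough illustration of the phased migration described above, the sketch below groups projects into small batches and prints the per-project commands for review instead of running them. The helper names, the backup file name, and the batch size are assumptions made for this example; the Terraform commands themselves (state pull, init -migrate-state, plan -detailed-exitcode) are standard CLI commands.

```shell
# Sketch: plan a batched migration; nothing here executes Terraform.
# Helper names and file names are hypothetical.

# Read project directories from stdin and print them in batches of $1.
batch_projects() {
  xargs -n "$1"
}

# Print the steps for one project in the order they would be run.
migration_steps() {
  local dir="$1"
  cat <<EOF
terraform -chdir=${dir} state pull > ${dir}/pre-migration.tfstate
# update the backend block in ${dir} to point at the shared backend
terraform -chdir=${dir} init -migrate-state
terraform -chdir=${dir} plan -detailed-exitcode
EOF
}

# Example:
#   ls -d projects/* | batch_projects 3
#   migration_steps projects/billing
```

The state pull comes first because it needs the old backend still configured, and it leaves a local backup in case the copy goes wrong. After the migration, plan -detailed-exitcode returning 0 indicates that Terraform sees no difference between the migrated state and the real infrastructure; an exit code of 2 means changes are pending and the project deserves a closer look.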

Architecture Diagram

The following architecture diagram illustrates our approach to migrating Terraform states to a shared remote backend using AWS S3 and DynamoDB:

The Results

The centralized Terraform backend dramatically improved the firm’s infrastructure management:

  • Reduced Terraform backends from 35 to 1 (single shared backend across accounts).
  • Eliminated 205 abandoned state files, cleaning up old infrastructure.
  • Reduced the company-wide S3 bucket count by 50% and the DynamoDB table count by 80%.
  • Cut debugging time by 40%, as engineers could now easily locate Terraform state files.
  • Improved collaboration, since all engineers worked from a single, well-documented backend.
  • Strengthened security and compliance, ensuring orphaned infrastructure no longer lingered.
  • Reduced operational overhead, allowing engineers to focus on building instead of fixing Terraform state issues.
  • Cut project startup time to a negligible level, making it faster to spin up new environments.
  • Eliminated reliance on custom scripts for infrastructure monitoring across AWS accounts.

Roadblocks & How I Overcame Them

  1. Legacy infrastructure uncertainty:
    • Many orphaned state files made it unclear which were still in use.
    • Solution: Conducted dependency mapping and ran terraform state list to confirm active resources before migration.
  2. Minimizing disruption to development workflows:
    • Solution: Used a phased migration strategy and applied changes incrementally.
  3. Ensuring team adoption of the new workflow:
    • Solution: Provided internal documentation and training to ensure engineers understood the new backend structure.
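
For the first roadblock, a first-pass filter over the output of terraform state list might look like the sketch below. It only counts the resource addresses a state still tracks; the classify_state name and its labels are hypothetical, and a non-empty state still needs the dependency mapping described above before anyone decides what to do with it.

```shell
# Sketch: classify a state from the output of `terraform state list`.
# Reads resource addresses on stdin, one per line.

classify_state() {
  local count
  # grep -c prints 0 and exits non-zero on empty input, hence `|| true`.
  count="$(grep -c . || true)"
  if [ "$count" -eq 0 ]; then
    echo "empty"
  else
    echo "active ($count resources)"
  fi
}

# Example (inside an initialized project directory):
#   terraform state list | classify_state
```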

Key Takeaways & Future Applications

This project reinforced several best practices for Terraform infrastructure management:

  • Centralized state management is critical for long-term scalability.
  • Enforcing consistent naming conventions and backend structure reduces technical debt.
  • Regular audits should be conducted to eliminate orphaned infrastructure.
  • Terraform workspaces help maintain environment isolation efficiently.

Moving forward, I recommend:

  • Automating Terraform state cleanup for old, unused infrastructure.
  • Proactively defining backend standards to prevent future state sprawl.
  • Rolling out AWS Control Tower for improved multi-account governance.
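
Automated cleanup could start from a simple report of state objects that have not been written to for a long time. The sketch below is one possible starting point, not a finished tool: the stale_state_keys helper, the bucket name, and the cutoff date are hypothetical, and an old timestamp is only a hint that a state may be abandoned, never proof.

```shell
# Sketch: list state keys whose last write is older than a cutoff date.
# Input rows on stdin: <key><TAB><ISO-8601 last-modified timestamp>.

stale_state_keys() {
  local cutoff="$1"
  # ISO-8601 UTC timestamps sort lexicographically, so a string
  # comparison is enough; the "" concatenation forces string mode.
  awk -F '\t' -v cutoff="$cutoff" '($2 "") < (cutoff "") { print $1 }'
}

# Example (bucket name is a placeholder):
#   aws s3api list-objects-v2 --bucket example-central-tfstate \
#     --query 'Contents[].[Key,LastModified]' --output text \
#     | stale_state_keys 2024-01-01
```

Each key in the report can then be reviewed by its owning team before the state and the infrastructure it tracks are removed.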

Get in touch

Is your Terraform backend a mess? If you’re dealing with fragmented state files and inefficiencies, let’s talk about how we can simplify your cloud operations with a centralized Terraform backend. Schedule a call today.
