Why immutable servers?
The DevOps community has embraced the immutable server concept, with good reason. Immutable servers are fully provisioned by automated tooling, and an engineer never needs to access the server to configure it manually. This results in predictable systems – the way a server is configured can be predicted near-exactly by what is in your source control. Immutable servers are faster to deploy, because computers can type more quickly than people.
They can also be more secure. When servers are predictable and fast to re-deploy, sysadmins don’t have much incentive to leave them running for long periods of time. If you’re in the habit of re-provisioning your servers daily (or even hourly, in the case of some companies who deploy multiple times a day), then an intruder who does make it onto your machine won’t have as much time to set up shop and explore. Long-running servers are risky: the Equifax breach occurred over a period of 76 days. Immutable servers with a short lifespan are a great addition to a strong defense-in-depth strategy.
What are the DevOps tools?
There are many approaches to creating immutable servers. When I joined my current project at Tandem, I encountered a new-to-me stack used to create immutable servers in the Amazon Web Services (AWS) GovCloud. While familiar with Terraform, I hadn’t used either Packer or Ansible before. Now that I’ve spent the better part of a year working with this stack, I’m a convert. While the stack is a bit complex to set up (at least compared to the good old “bash script that runs on new nodes” paradigm), the reliability, fast deployment speeds, and visibility it affords make the initial pain of setup well worth it.
Packer is a HashiCorp product. In their words, “Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration.” We use Packer to take US Government approved Amazon Machine Images (AMIs) running Red Hat Enterprise Linux (RHEL) 7 and produce new versions of these AMIs that have all the configuration and software we need to run our application securely in AWS.
Packer supports multiple “provisioners,” which handle the actual server configuration. These can be simple shell scripts or a more robust tool like Ansible. Packer handles the creation of the VM and its packaging as an AMI; Ansible handles the configuration of the virtual machine.
Ansible is a configuration management tool, now owned by Red Hat. It automates all software installation, package management, and configuration on our AWS EC2 hosts. Ansible ensures that any software, config file change, or cron job is installed the same way every time.
Ansible is agentless, so your build toolchain has fewer moving parts. (Probably the worst part of DevOps is becoming a sysadmin for the tools you build to replace you as a sysadmin.) The YAML syntax is incredible and well-documented. The Ansible Galaxy ecosystem is well-stocked with roles for all kinds of server management tasks, from deploying an Apache server to complying with federal system security guidelines.
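To make "installed the same way every time" concrete, here is a minimal Python sketch of the check-then-change pattern that configuration management tools rely on. This is not Ansible code and does not use any Ansible API; `ensure_line`, `demo`, and the `sshd_config` file name are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed, False if it was
    already in the desired state.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def demo() -> tuple[bool, bool, str]:
    """Apply the same task twice to a scratch file and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "sshd_config"  # hypothetical file name
        first = ensure_line(config, "PermitRootLogin no")
        second = ensure_line(config, "PermitRootLogin no")
        return first, second, config.read_text()


if __name__ == "__main__":
    print(demo())
```

The first run changes the file and the second run is a no-op, so running the task any number of times leaves the file in the same final state.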
Terraform is also a HashiCorp product. It allows us to “safely and predictably create, change, and improve infrastructure.” The term infrastructure refers to components like EC2 instances, load balancers, databases, and networking. Terraform translates our configuration code into AWS API calls that provision our architecture – everything from the networking to the app servers and S3 buckets. Once we’ve “baked” an AMI using Packer and Ansible, we use Terraform to deploy that AMI as an EC2 instance into our cloud environment. This results in a live server in under 3 minutes.
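The bake-then-deploy flow described above can be sketched as two CLI steps driven from Python. This is only a rough outline under stated assumptions: the template name `app.pkr.hcl`, the Terraform input variable `ami_id`, and the placeholder AMI ID are all hypothetical, and the post does not say how the AMI ID is actually handed from Packer to Terraform.

```python
import shlex
import subprocess


def bake_command(template: str) -> list[str]:
    """Command that asks Packer to build an image from a template file."""
    return ["packer", "build", template]


def deploy_command(ami_id: str) -> list[str]:
    """Command that applies the Terraform config with the new AMI ID.

    `ami_id` is a hypothetical Terraform input variable name.
    """
    return ["terraform", "apply", "-auto-approve", "-var", f"ami_id={ami_id}"]


def run(command: list[str]) -> None:
    """Run one step and raise if it fails, which stops the pipeline."""
    print("+", shlex.join(command))
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the steps rather than running them; the template name and
    # AMI ID below are placeholders, not values from a real build.
    for step in (bake_command("app.pkr.hcl"), deploy_command("ami-EXAMPLE")):
        print(shlex.join(step))
```

Keeping the two steps as separate functions mirrors the division of labor in the stack: the image is built once, and the deploy step only needs to know which image to launch.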
Should you choose this stack for your DevOps practice?
- Fast deployments – a code deploy (no server config changes) takes ~5 minutes
- Idempotent – If I want to know how a server is configured, I scan the relevant Ansible playbooks. SSHing into a server is often the third or fourth step I take to debug regressions after a configuration deployment. The first step is to check what changed recently in our Ansible / Terraform git logs.
- Easy recovery – Because of fast, idempotent deployments, it’s easy to bring the system back to a healthy state when a regression is introduced.
- Visible – By following trunk-based development policies and asking for code review of our DevOps code bases, we break down silos. Developers focused on app development can still follow along with system changes. Ansible’s DSL is simple and comprehensible. Developers from a wide variety of backgrounds are able to pick up the tooling easily.
- On the downside, the development cycle can be slow if you don’t have a good local dev setup. Ideally with Ansible, you’re able to test your changes on your development machine with a tool like Vagrant. This creates the fastest feedback loop.
- Another downside: Terraform, while popular and powerful, is stateful, which inevitably leads to headaches. If Terraform’s view of the system’s state is not in sync with the actual state, Terraform will sit down on the ground and refuse to get up and work until you find a way to match its understanding of reality with reality. There are a number of other challenges to using Terraform, and while the upsides are really up, in my view it’s the most controversial tool in this stack.
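The state problem in the last bullet comes down to two records of the same infrastructure disagreeing. The sketch below is a deliberately simplified illustration in Python, not Terraform's actual algorithm or state format; `find_drift` and the resource names are hypothetical.

```python
def find_drift(recorded: dict[str, dict], actual: dict[str, dict]) -> dict[str, str]:
    """Compare what the state file records against what really exists.

    Both arguments map a resource name to its attributes. The result
    maps each out-of-sync resource name to a short description.
    """
    drift = {}
    for name in sorted(recorded.keys() | actual.keys()):
        if name not in actual:
            drift[name] = "tracked in state but missing in reality"
        elif name not in recorded:
            drift[name] = "exists in reality but not tracked in state"
        elif recorded[name] != actual[name]:
            drift[name] = "attributes differ"
    return drift


if __name__ == "__main__":
    # Hypothetical example: someone changed the web server's AMI by hand,
    # deleted the database, and created a cache outside of the tooling.
    recorded = {"web": {"ami": "ami-1"}, "db": {"size": "small"}}
    actual = {"web": {"ami": "ami-2"}, "cache": {"size": "small"}}
    for name, problem in find_drift(recorded, actual).items():
        print(f"{name}: {problem}")
```

Any non-empty result is the kind of mismatch that has to be reconciled, either by changing reality back or by updating the recorded state, before deployments can proceed safely.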
So, should you buy into immutable servers? Absolutely! Should you buy into this stack for building them? It’s not a bad choice! If you have access to cloud services that allow you to go serverless entirely, consider that. For those of us who are restricted to rolling our own EC2 instances, the Packer / Ansible / Terraform stack is worth investigating.