The lessons learned from automated testing also apply to IaC

We all know the pitfalls of manually testing code when developing a product or a service. Here’s a refresher:

  • We can’t get good code coverage
  • Humans are error-prone, especially when we have to repeat the same mechanical steps many times
  • It’s not practical to repeat all the tests after every small change, so tests are conducted only after bigger releases, which delays the feedback we get about erroneous code
  • If human testers are unavailable for any period, it slows down the entire development flow or delays feedback even further
  • Making major changes to the codebase (e.g. refactoring) is scary with manual testing, but such changes are necessary to keep technical debt in check

Ham Vocke, software developer at ThoughtWorks, says:

Instead of having myriads of manual software testers, development teams have moved towards automating the biggest portion of their testing efforts. Automating their tests allows teams to know whether their software is broken in a matter of seconds and minutes instead of days and weeks.

I know I’m preaching to the choir here; we’re already sold on why automated tests are crucial to the productivity of development teams. The point of this post is to emphasise that automating the provisioning, deployment and management of infrastructure with code brings benefits similar to those of automated testing.

What is IaC?

Infrastructure as Code (IaC) means provisioning and managing infrastructure through code (configuration code, programming code, etc.) rather than through interactive tools (web interfaces, manually typed CLI commands, etc.).

You can also check the Wikipedia article on IaC.

This is rather terse, so let us understand it in more detail.

Why IaC?

The classic approach to provisioning infrastructure is for a development or testing team to open a ticket (or request resources by other means, such as email), and for the operations team to respond by manually allocating the resources through some kind of UI (web interface, CLI, etc.). This works to an extent if we need only a few resources and the tasks are performed infrequently.

Competition drives organisations to build software that has more features, better performance and higher reliability, while costing less and taking less time to deliver; otherwise, someone else will accomplish the same goals and dominate the market. Today we have cloud environments that are predominantly API/SDK-driven, which allow us to provision hardware for hours or days instead of making long-term commitments to data centres.

Organisations get a big edge if they can control, at a granular level, how many resources they allocate and then modify that allocation as often as they want, instead of renting hardware for multiple years at once based on rough estimates. Paying by the hour means we want to allocate more resources when the load is high and tear them down when the load drops. The elasticity these API-driven cloud environments offer is hard to take advantage of if we have to manually create and destroy resources all the time, multiple times a day.

The goal of IaC is to solve this dilemma: how do we replace manual interaction with a web interface, CLI, etc. with code that provisions the same infrastructure automatically? IaC pushes us to capture the steps needed to provision and manage infrastructure as code, which can be written once and then run as many times and as often as necessary, by any person or tool.

Armon Dadgar, co-founder and CTO at HashiCorp, says:

The real idea behind infrastructure as code is: How do we take the process — in some sense, the things that we were pointing and clicking to achieve — how do we take that and capture that in a codified way? So if I need to do it one time, ten times, or a thousand times, I can automate that.
That really becomes the value. It’s really the versioning of it, the reusability of the code, and the ability to then do automation on top of it.

Here’s a code snippet that provisions a cluster of five VMs on AWS:
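A minimal sketch of such a configuration, assuming the official AWS provider, might look like the following. It declares five EC2 instances through the `aws_instance` resource and its `count` meta-argument. The region, AMI ID, instance type and tag names are placeholder assumptions, and AWS credentials are assumed to be configured outside the code.

```hcl
# Assumed: AWS credentials come from the environment or a shared profile, not from this file.
provider "aws" {
  region = "us-east-1" # placeholder region
}

# A single resource block; count = 5 makes Terraform manage five identical VMs.
resource "aws_instance" "cluster_node" {
  count         = 5
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID, replace with a real one
  instance_type = "t3.micro"              # placeholder instance type

  tags = {
    # count.index runs from 0 to 4, giving each VM a distinct name
    Name = "cluster-node-${count.index}"
  }
}
```

Running `terraform init` and then `terraform apply` against a file like this would create the instances; changing `count` and re-applying adds or removes instances to match.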


The above code is written in `HCL` (HashiCorp Configuration Language), the language used by the IaC tool `Terraform`. I’ll explain how it works, and how you can write such code, in a future post.

Some Benefits of IaC

Documentation: Not only does IaC automate the process, it also documents how the infrastructure is supposed to be provisioned. Because code can be version-controlled, every change is recorded: each commit documents itself and leaves a log for auditing or debugging.

Quality control: With IaC, server configurations can go through all the well-established good practices that normal code does (e.g. versioning, testing, code reviews, refactoring).

Experimentation: Just like unit tests, IaC gives you the freedom to make rapid, large-scale changes without fearing that a small mistake will cost you a lot of time.

Rollbacks: If something starts behaving strangely in the deployed version of your code, it can be difficult to pinpoint the source of the error. Did someone update the version of a package? Did a teammate change the value of an environment variable? With IaC you can look back through the commit history and find out exactly what changed between the previous version and the current one. And if you can’t quickly fix the change that introduced the strange behaviour, you can roll back to the previous version.
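In Terraform, for instance, one way to make a version bump show up in that history is to pin the tool and provider versions in code, so that any upgrade becomes a reviewable diff rather than a silent change. A minimal sketch; the specific version constraints below are illustrative assumptions, not recommendations:

```hcl
terraform {
  # Refuse to run with a Terraform CLI older than the team expects.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0" allows any 5.x release but not 6.0; changing this line is a visible commit.
      version = "~> 5.0"
    }
  }
}
```

Committing the `.terraform.lock.hcl` dependency lock file that `terraform init` generates additionally records the exact provider versions that were selected.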

Better collaboration: It’s hard for two team members to collaborate on a task when infrastructure is provisioned manually via a UI; it’s a lot easier when provisioning is done via code. One person can check in their code, and another can check it out and change it with little to no coordination needed. We can even use code written by third parties in the form of downloadable libraries/packages.

Reduction in duplication of effort: It’s easier to avoid duplicated effort between team members, e.g. once one person has written the code to deploy a web server on a VM, nobody else has to repeat that work.

Minimisation of risk: Imagine one team member being the only person fluent in deploying a part of your product. What is the fallback if that person suddenly leaves the project or organisation? With IaC, that knowledge lives in the codebase rather than in one person’s head.

Re-usability: A lot of the time you need to provision the same combination of VMs, storage devices, load balancers, etc., with minor differences (e.g. for dev, staging and production). You can write code in the form of modules/helper functions so that you define the overall setup once, then reuse the same modules/functions with different values to set up multiple environments. Not only does this save time, it also makes the process easier to maintain and significantly reduces the number of ways something can go wrong.
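A minimal sketch of that pattern in Terraform, assuming a hypothetical local module at `./modules/web_cluster`: the module is written once, and each environment calls it with different values. The module path, variable names, AMI ID and sizes are all illustrative assumptions, and provider configuration is omitted for brevity.

```hcl
# --- modules/web_cluster/main.tf (hypothetical module, written once) ---
variable "environment" {
  type = string
}

variable "instance_count" {
  type = number
}

variable "instance_type" {
  type = string
}

resource "aws_instance" "web" {
  count         = var.instance_count
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = var.instance_type

  tags = {
    Name        = "web-${var.environment}-${count.index}"
    Environment = var.environment
  }
}

# --- main.tf in the root configuration: the same module, different values ---
module "web_dev" {
  source         = "./modules/web_cluster"
  environment    = "dev"
  instance_count = 1
  instance_type  = "t3.micro"
}

module "web_prod" {
  source         = "./modules/web_cluster"
  environment    = "production"
  instance_count = 4
  instance_type  = "t3.large"
}
```

In practice, environments are often kept in separate root configurations or workspaces so they don’t share state; both module calls sit in one file here only to keep the sketch short.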

Pipeline integration: IaC allows us to add jobs that provision, modify and destroy infrastructure to our CI pipelines (turning them into CI/CD pipelines), so that everything from committing code to a repository to seeing the changes in the development, staging or production environment becomes a streamlined, rapid and fully automated process.

Challenges of IaC

One big challenge for a team considering IaC is the learning curve: potentially learning a new language (HCL, YAML, etc.), a new set of tools and new paradigms. Hitting speed bumps while trying to accomplish tasks we’ve been doing easily via a UI can feel slow and frustrating. But once that initial hurdle is overcome, the benefits are substantial. I plan to highlight them with practical examples in my next few posts.

References: