Is there anything more stressful than a server that has been running like clockwork for a long time, but has accumulated a system upgrade here and numerous deployments there? Every new deployment becomes a delicate task whose main goal is to avoid collapsing the house of cards that this kind of machine inevitably becomes, whether physical or virtual. That is where Immutable Infrastructure comes in.
A small manual hack in an init script or a configuration file, a risky attempt to backport that change into the configuration management system (assuming there is one in place), and you are already caught in the gears.
With the advent of virtualization and cloud solutions, a new approach is being formalized: immutable infrastructure. Let’s look together at its advantages, disadvantages, and technical challenges.
Let it Crash
Development teams generally treat their deliverables as immutable bricks. These deliverables travel through the validation environments, launched with different configurations, until they make their way to production.
The Immutable Infrastructure approach follows the same principles, but takes as its basic brick the one suited to a cloud or virtualized system: the server image. The server image is the new building block from which we build our larger system.
Automating
The analogy has been repeated at various conferences in recent months: in the cloud age, servers should no longer be treated as pets but as cattle. A faulty server instance must be able to be safely stopped and replaced with a fresh instance.
To reach this level of confidence, and thus terminate servers without remorse, a certain maturity in automation is required. Provisioning tools such as Puppet, Ansible, Chef, or Salt let you provision the application stack when a new machine starts. The main disadvantage is the time it takes to make a machine operational. If we add up the time needed for:
- detecting a failure on a server;
- taking the faulty instance out of traffic;
- starting a new instance;
- running the configuration management system on it;
- bringing the new instance into standard traffic;
we arrive, at best, at a handful of minutes and, at worst, beyond 10 minutes. Pure automation is therefore not sufficient once the system’s requirements reach a certain level of criticality.
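To make the sequence above concrete, here is a minimal sketch of that replace-on-failure flow, assuming an AWS environment driven with boto3 and a load balancer target group; the identifiers (AMI, subnet, security group, target group ARN) are purely illustrative. Step 3 is where the configuration management run happens on a bare image, and it is the part that dominates the total time.

```python
import boto3

ec2 = boto3.client("ec2")
elb = boto3.client("elbv2")

# Hypothetical identifiers, for illustration only.
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/web/abc123"
BASE_AMI = "ami-0123456789abcdef0"
SUBNET_ID = "subnet-0123456789abcdef0"
SECURITY_GROUPS = ["sg-0123456789abcdef0"]


def replace_instance(faulty_instance_id: str) -> str:
    """Take a faulty instance out of traffic and start a fresh one."""
    # 1. Set the faulty instance aside from the traffic.
    elb.deregister_targets(
        TargetGroupArn=TARGET_GROUP_ARN,
        Targets=[{"Id": faulty_instance_id}],
    )

    # 2. Start a new instance from the base image.
    new_id = ec2.run_instances(
        ImageId=BASE_AMI,
        InstanceType="t3.small",
        MinCount=1,
        MaxCount=1,
        SubnetId=SUBNET_ID,
        SecurityGroupIds=SECURITY_GROUPS,
    )["Instances"][0]["InstanceId"]

    # 3. Wait for it to come up. With a bare image, the configuration
    #    management run (Puppet, Ansible, ...) happens at this point and
    #    is what pushes the total time past several minutes.
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # 4. Put the new instance back into standard traffic.
    elb.register_targets(
        TargetGroupArn=TARGET_GROUP_ARN,
        Targets=[{"Id": new_id}],
    )

    # 5. Retire the faulty instance.
    ec2.terminate_instances(InstanceIds=[faulty_instance_id])
    return new_id
```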
Anticipation
The scenario described above assumes that you start from a single generic server image, which is then assigned an application stack based on its placement in security groups, subnets, or other configuration management policies.
The next step taken by many teams is to run this configuration phase upstream and maintain a catalog of server images that can be started directly in the production environment.
This amounts to adding a pre-provisioned imaging phase at the end of your continuous delivery chain.
To automate this imaging phase, there are several tools available to you depending on your deployment target:
- VMBuilder
- Veewee
- ImageFactory
- Packer
If you do not know which one to pick, I encourage you to start with Packer, which tends to eclipse its peers through its technical completeness and ease of use.
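To see what this imaging phase boils down to without a dedicated tool, here is a rough boto3 sketch that bakes an AMI from an instance that has just been provisioned by your configuration management run; the instance, version, and naming scheme are assumptions. Packer essentially automates this start, provision, snapshot, terminate cycle across multiple target platforms.

```python
import time

import boto3

ec2 = boto3.client("ec2")


def bake_image(provisioned_instance_id: str, version: str) -> str:
    """Snapshot a freshly provisioned instance into a reusable server image."""
    # Create an AMI from the provisioned instance. NoReboot=False gives a
    # consistent file system at the cost of a short reboot.
    image_id = ec2.create_image(
        InstanceId=provisioned_instance_id,
        Name=f"app-server-{version}-{int(time.time())}",
        NoReboot=False,
    )["ImageId"]

    # Wait until the image is available before publishing it to the catalog.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Tag it so the deployment pipeline can pick the right image later.
    ec2.create_tags(
        Resources=[image_id],
        Tags=[{"Key": "app-version", "Value": version}],
    )

    # The source instance has served its purpose: dispose of it.
    ec2.terminate_instances(InstanceIds=[provisioned_instance_id])
    return image_id
```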
Persistence
A common objection in debates about the Immutable Infrastructure approach is that a system is never static; a totally immutable system would not be very useful. Yet even if part of the system keeps state that changes over time, many intermediate components can safely be treated as stateless bricks that benefit from this approach.
For bricks that require state management, such as databases, several approaches are possible:
- Object storage: accessed by the server instance at runtime to read and persist data.
- Block storage mounted on the server instance concerned: simple and efficient. You will need to put snapshot management and retention policies in place (see the sketch after this list).
- A distributed file system with a sufficient replication factor to ensure data continuity even when instances are lost; Ceph, HDFS, or ZFS can play this role.
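As an illustration of the block storage option, here is a small boto3 sketch of a snapshot routine for a data volume; the volume ID and the seven-day retention are assumptions, and in practice you would trigger it from a scheduler rather than by hand.

```python
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")

DATA_VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical data volume
RETENTION = timedelta(days=7)             # example retention policy


def snapshot_data_volume() -> str:
    """Snapshot the stateful volume so the instances themselves stay disposable."""
    snap = ec2.create_snapshot(
        VolumeId=DATA_VOLUME_ID,
        Description="periodic snapshot of the application data volume",
    )
    return snap["SnapshotId"]


def prune_old_snapshots() -> None:
    """Apply the retention policy: delete snapshots older than RETENTION."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [DATA_VOLUME_ID]}],
    )["Snapshots"]
    for snap in snapshots:
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```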
Incident Analysis
Since the reaction to any failure is to delete the instance and start a new one, diagnosing the machine itself becomes impossible. To be able to analyze incidents after the fact, you need an effective log centralization system in place. The post mortem of the machine then takes place at the central log collection point, in a much more serene way.
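As a trivial application-side illustration, a Python service can ship its logs to a central collector rather than to the local disk; the collector address below is an assumption, and in a real setup you would more likely rely on an agent such as rsyslog or Fluentd, or on a hosted logging stack.

```python
import logging
import logging.handlers

# Hypothetical central log collector, reachable over syslog/UDP.
CENTRAL_LOG_HOST = ("logs.internal.example", 514)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

# Everything the instance logs is sent off-box as soon as it is emitted,
# so the post mortem survives the instance itself.
handler = logging.handlers.SysLogHandler(address=CENTRAL_LOG_HOST)
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("instance started, serving traffic")
```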
Since we are in a cloud environment, it is also possible to automate not the destruction of faulty server instances but their “quarantine”. Suspending the instance’s activity and then shelving it in a zone shielded from standard traffic allows the technical teams to come back later and analyze the incident. Here again we take advantage of the extreme plasticity of cloud environments.
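Here is one possible sketch of such a quarantine routine with boto3: take the instance out of traffic, swap its security groups for one that is only reachable from the admin network, and tag it for later analysis. The target group ARN and quarantine security group are assumptions.

```python
import boto3

ec2 = boto3.client("ec2")
elb = boto3.client("elbv2")

# Hypothetical identifiers, for illustration only.
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/web/abc123"
QUARANTINE_SG = "sg-0fedcba9876543210"  # only reachable from the admin network


def quarantine_instance(instance_id: str, reason: str) -> None:
    """Shelve a faulty instance away from standard traffic instead of deleting it."""
    # Suspend its activity: stop sending it traffic.
    elb.deregister_targets(
        TargetGroupArn=TARGET_GROUP_ARN,
        Targets=[{"Id": instance_id}],
    )

    # Move it into a protected zone: restrict it to the quarantine security group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])

    # Leave a trail so the team knows why the instance was set aside.
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": "quarantine", "Value": "true"},
            {"Key": "quarantine-reason", "Value": reason},
        ],
    )
```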
Conclusion
This is an approach that does not suit every technical context. Even so, it is a natural evolution for information systems that make use of cloud or virtualization, and the benefits it brings should not be neglected.
Just as you must know the inner workings of the OS to make deployments that get the most out of an infrastructure, you need to master the cloud environment on which that infrastructure is deployed, in order to design a better infrastructure and put it at the service of your business goals.
Let us know if you have any questions in the comments below!