If your team has been experiencing an unacceptably long pipeline, finding a high number of bugs in production, dealing with flaky tests, and/or struggling with disaster recovery and rollback, you might need to reevaluate your approach to continuous integration and continuous deployment (CI/CD). Properly implementing CI/CD requires not only embracing automation, but also evaluating (and probably changing) some of your development processes.
I have been a developer for ten years. I have worked on a range of projects, from implementing bare metal deployments to fully-automated ten-minute deployments and everything in between. Most of the teams I’ve worked on have been interested in deployment automation, so I’ve helped implement CI/CD practices in a range of industries. In my experience, these are the four most common CI/CD-related problems that teams encounter:
The pipeline takes too long.
When the pipeline is longer than desired, it is often because teams have not properly implemented the Agile test automation pyramid. I’ve seen instances where developers lean too heavily on slow, long-running tests instead of unit tests for the bulk of their testing. Waiting long periods for tests to complete bloats the pipeline and causes a cascade of delays.
In other cases, the system being used makes it difficult for developers to get the code from their local environment to deployment. Having a cumbersome process, especially one that is not automated or minimally automated, discourages developers from making the small, incremental changes fundamental to good CI/CD.
Incremental changes mean faster feedback loops. That delivers value to users sooner and gives developers validation that features are being built correctly. Plus, these changes are much easier to roll back if that becomes necessary.
It is difficult to integrate a whole slew of features at once and not end up with some bugs. That means “big bang” changes will often result in a higher tolerance for bugs in production, because the inherently long wait time between development and delivery means greater risk of regression.
In all honesty, I cannot think of a single situation where teams should not be merging in small increments. That does not mean deployment has to occur at the same frequency. Rather, continuous integration on its own can tighten the pipeline significantly.
Testing automation is flaky.
Regardless of how fast or slow the pipeline might be, when a suite of tests regularly returns indeterminate or inconsistent results, it is often a sign that a team does not take testing seriously. It might sound counterintuitive that a team would continue relying on unreliable tests, but it does happen. It is most common in situations where the team neglects to include testing as part of their development process.
Not only are unreliable tests a waste of time, but a lack of proper testing makes it challenging for developers to have confidence in the code. Building robust, accurate testing is best achieved by applying a DevOps mentality to bring testing to the forefront and incorporating test-driven development (TDD) into team practice. Treating testing as a first-class citizen and making it a priority really improves the CI/CD pipeline.
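One way to make "flaky" concrete is to run the same test repeatedly against unchanged code and see whether the results ever disagree. The sketch below is only an illustration of that idea, not a replacement for a real test runner; `is_flaky`, `stable_test`, and `make_random_test` are hypothetical names invented for this example.

```python
import random
from typing import Callable


def is_flaky(test: Callable[[], bool], runs: int = 20) -> bool:
    """Return True if repeated runs of the same test disagree with each other."""
    outcomes = {test() for _ in range(runs)}
    return len(outcomes) > 1


def stable_test() -> bool:
    # Deterministic: the same inputs give the same result on every run.
    return sum([1, 2, 3]) == 6


def make_random_test(rng: random.Random) -> Callable[[], bool]:
    # Hypothetical flaky test: the result depends on a random value,
    # standing in for timing, ordering, or shared-state problems.
    def test() -> bool:
        return rng.random() < 0.5

    return test
```

A test that passes every time or fails every time is at least telling you something. A test whose result changes with no change to the code is the kind worth fixing or removing before anyone learns to ignore it.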
There is not enough automation.
Automation is not only for testing. Too little automation also means an increased likelihood of human error in the integration and deployment process. Whether the team lacks the resources to implement automation or the automation it has is unreliable, spending too much manual effort getting code into a mergeable state is a poor use of time.
When it comes to packaging and deployment, the process could involve anywhere from 10 to 30 steps. Each one of these steps is open to human error if not automated. Having developers spend hours to achieve the same outcome as a robot is a waste of resources.
Solutions that provide this automation exist. In fact, this problem has already been solved many times over.
There is no good reason not to have automation in your integration and deployment processes.
Time is valuable. Automate as much as possible.
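To illustrate the point about multi-step packaging and deployment, here is a minimal sketch of what "automating the steps" can mean at its simplest: run each command in a fixed order and stop at the first failure. In practice a CI/CD tool would do this for you. The `run_steps` function and the placeholder commands in `EXAMPLE_STEPS` are assumptions made for this example, not part of any particular product.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run each command in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step failed: {' '.join(command)}", file=sys.stderr)
            break
        completed += 1
    return completed


# Placeholder steps so the sketch runs anywhere Python is installed.
# A real pipeline would list its own build, test, package, and deploy commands.
EXAMPLE_STEPS = [
    [sys.executable, "-c", "print('build')"],
    [sys.executable, "-c", "print('test')"],
    [sys.executable, "-c", "print('package')"],
]
```

Calling `run_steps(EXAMPLE_STEPS)` runs all three placeholder commands in order. Because the sequence lives in code rather than in someone's memory, every run performs the same steps in the same order.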
There is no easy way to roll back deployment.
Regardless of how much automation you have in place, bugs will occur. If CI/CD is set up correctly, it should be easy to roll back a deployment with the push of a button. In teams that have little to no automation, flaky testing, and long pipelines, chances are that there will be no easy rollback capability, either. (Where there’s smoke, there’s fire.)
On top of the firsthand issues that each of these problems creates, they compound each other in a production deployment. Consider, especially, what that means in finance- and healthcare-related industries.
There is an intricacy in the way a team works together that allows something like rollback to be easy. That includes which version control strategy and organizational standards the team has implemented. It’s like building a team process with Jenga blocks instead of Lincoln Logs. If the team process is haphazardly stacked, it’ll be a mess: there’s no way to pull things out without everything crashing down. But if you slot everything together correctly, you can remove and replace a piece without it affecting the entire structure.
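As a rough sketch of what "push-button" rollback relies on, the example below keeps an ordered history of deployed versions so the previous one can always be restored. It assumes every release is a versioned, immutable artifact that can be redeployed as-is. The `ReleaseHistory` class and the version strings are hypothetical and exist only for illustration.

```python
from typing import List


class ReleaseHistory:
    """Tracks deployed versions so the previous one can be restored."""

    def __init__(self) -> None:
        self._versions: List[str] = []

    def deploy(self, version: str) -> None:
        # Record the newly deployed version as the current one.
        self._versions.append(version)

    @property
    def current(self) -> str:
        if not self._versions:
            raise RuntimeError("nothing has been deployed yet")
        return self._versions[-1]

    def rollback(self) -> str:
        # Drop the current version and return to the one before it.
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return self.current
```

The bookkeeping is trivial. What makes it work is the team discipline behind it: each release must be small, versioned, and independent enough that going back one step is a safe operation.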
Conclusion
Every team will be different. What CI/CD looks like in terms of solutions will depend primarily on team size. In a larger team, CI/CD will require more complicated processes and tools, while simpler solutions serve smaller teams well. Regardless of team size, though, addressing these four key problem areas (pipeline, testing, automation, and rollback) should put your team on the path to successful CI/CD.
Remember: it is critical to follow the testing pyramid. Make sure not to invest the bulk of test coverage and effort in a big suite of expensive tests. Instead, spend the majority of resources at the unit level. That will not only improve test accuracy and confidence in the code, but will also speed up the pipeline.
If you are looking for more resources on CI/CD, I recommend this survey report from the DevOps Research & Assessment group.
You can also read this blog post by partner Rob O'Brien about how process bottlenecks are hindering true CI/CD.