CI/CD can add significant benefits to operations and the development process, but what it is and how best to implement it are often misunderstood. Stride sat down with Aaron Foster Breilyn (AFB), Principal Consultant Software Developer, and Eric Schoenfeld (ES), Partner, to talk about some of the pitfalls and promises of CI/CD.
What is a misconception you frequently hear when it comes to CI/CD?
AFB: People often say they want CI/CD, but a lot of the time what they are envisioning is that CI/CD is a plug-and-play pipeline that they can apply to their existing system. It’s not. What they are actually asking for is a specific outcome—a solution for something that is causing pain for their development team or the business. Maybe they have had a lot of bugs make it to production, and they’ve heard that CI/CD decreases bugs. Or maybe their teams are siloed and the deployment team only pushes to production once every week or two, and CI/CD is supposed to increase the frequency of deployment. So, they ask for CI/CD. But without being able to articulate the outcome, generic “CI/CD” doesn’t address the pain points. In the first scenario, what they’re really asking for is automated testing, which is a characteristic process involved in CI/CD and can significantly reduce the number of bugs that make it to production. In the second scenario, there are multiple possible routes. One of them might be requiring unit tests and automating as much testing as possible, at high frequency, to support more frequent deployments.
ES: To build on what AFB said, what’s missing from these initial discussions is really understanding and articulating those pain points and deciding how they need to change, that is, what the outcome should be. Often, companies that say they want to implement CI/CD are not asking the right questions. From a business standpoint, the main goal of CI/CD is to decouple features from product releases: to enable small code changes at high frequency, with high confidence. Whenever I’m asked to provide Agile coaching to an organization, I will look at their deployment frequency as an early indicator of how effective coaching will be. If the deployment cycle is too long, it means there are more important underlying problems to fix first.
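As a rough illustration of using deployment frequency as an early indicator, the sketch below computes the average gap between production deployments from a list of timestamps. The function name and the sample dates are hypothetical, and where the timestamps come from (a deploy log, a CI system) is left as an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def mean_days_between_deploys(deploy_times: List[datetime]) -> Optional[float]:
    """Return the average number of days between consecutive deployments.

    Returns None when fewer than two deployments are known, because no
    interval can be measured from a single point.
    """
    if len(deploy_times) < 2:
        return None
    ordered = sorted(deploy_times)
    span = ordered[-1] - ordered[0]
    # n deployments define n - 1 intervals between them.
    intervals = len(ordered) - 1
    return (span / timedelta(days=1)) / intervals


# Hypothetical deploy history: one release every two weeks.
history = [datetime(2024, 1, 1), datetime(2024, 1, 15), datetime(2024, 1, 29)]
print(mean_days_between_deploys(history))
```

A number like this is only a starting point for the conversation about outcomes; it says how long the cycle is, not why.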
In your experience, what is the best way for companies to implement CI/CD?
ES: If we work backward from the end goal: Deploying at high frequency, with high confidence, depends on more testing and tighter integrations. The feedback loop gets tightened as a result, so bugs can also get fixed more quickly. I think it’s important to emphasize here that this does not mean fewer bugs will exist, but instead that fewer will make it into production.
AFB: I think the best way to see big benefits and lay the groundwork for CI/CD is to test, test, test. I’d say the lesser-known cousin of CI/CD is “continuous testing” because it’s like what Eric said about needing to have confidence in what you’re building. That’s why we test drive things! It’s not to achieve some arbitrary metrics. We test because we want to know the code we’re shipping is doing the right thing, that it can go to prod and not detract from the experience. So, part of CI/CD is establishing all of those processes and practices that build confidence in the code. If you don’t have a robust testing suite running in an automated fashion in your pipeline, you’re missing the chance to catch small problems before they become bigger ones.
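To make the idea of an automated test suite running in the pipeline concrete, here is a minimal sketch of a pipeline that runs its stages in order and refuses to continue once one fails, so a deploy stage is never reached after a failing test stage. The `run_pipeline` function and the stage names are hypothetical; a real pipeline would normally be defined in a CI tool's own configuration, with each step invoking a test runner or deploy script.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            print(f"stage '{name}' failed; later stages will not run")
            break
        completed.append(name)
    return completed


# Hypothetical stages: the integration tests fail, so deploy never runs.
result = run_pipeline([
    ("unit tests", lambda: True),
    ("integration tests", lambda: False),
    ("deploy", lambda: True),
])
print(result)
```

Ordering the stages from fastest to slowest keeps the feedback loop short: a small problem caught by a unit test stops the run before any slower or riskier step begins.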
Where does continuous integration fit into high-frequency, high-confidence deployments?
AFB: I’ll back up a bit first, because I think it’s important to create some context. Code that is written but not merged to production incurs risk in two ways. The first is a lack of delivery of features, which means sad users. The second is technical divergence, which adds technical debt and can increase the chance of bugs and prolong the deployment cycle. CI/CD seeks to solve both problems. So far, though, we’ve mostly focused on high-frequency deployments, which get features to users faster and hopefully result in more happy users, or at least fewer sad ones! But CI is the part of CI/CD that helps development teams, because continuous integration keeps the code as unified as possible, which means that developers can be confident that the code they’re working on is as close as possible to the code another developer is working on. And because the development team is working on the same code, less variation means less rework in trying to solve merge conflicts. Not to mention that having continuous deployment along with continuous integration ensures that the code the developers are working on is also the same code that the users are seeing.
Any tips for companies wanting to implement CI/CD?
AFB: Testing! I know I’ve said it over and over, but it really is one of the most critical and impactful components of CI/CD. Beyond that, I would also caution that once you start automating unit tests and seeing your deploy frequency speed up, it can feel really exciting and you’ll probably want to do it everywhere, but it’s important to keep in mind that you won’t be able to integrate everything into the CI/CD process. Mainframe testing stands out as an example here. In those cases, the best way to handle it is to separate out those exceptions: draw a box around each one and start to build ways to test the integration of that box into the rest of the process.
ES: I second AFB’s emphasis on testing, from unit tests at the individual developer level to A/B testing at deployment. I am a big proponent of lean product development and Eric Ries’s philosophy that we should think about products from a scientific standpoint. Every feature should be split-tested and proven to have value (or proven not to have value and subsequently killed). CI/CD is what allows companies to do this kind of testing, because split testing requires the confidence that any code commit can be pushed into production without fear of breaking something.
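One common way to split-test a feature is to assign each user to a variant deterministically, so the same user always sees the same version. The sketch below does this by hashing the experiment name together with a user ID; the function name, experiment name, and user IDs are hypothetical, and measuring whether the feature actually has value is a separate step not shown here.

```python
import hashlib


def assign_variant(experiment: str, user_id: str, treatment_percent: int) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    The same (experiment, user_id) pair always maps to the same bucket,
    and treatment_percent sets roughly what share of users get the feature.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Take the first 8 hex digits of the hash as an integer in [0, 2**32).
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


# Hypothetical experiment: expose a new checkout flow to half of users.
for user in ["user-1", "user-2", "user-3"]:
    print(user, assign_variant("new-checkout", user, 50))
```

Including the experiment name in the hash keeps bucket assignments independent across experiments, so a user in the treatment group for one feature is not automatically in the treatment group for every other one.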
Speaking of breaking something, should rollback be included as part of CI/CD?
ES: There are times, like with disaster recovery, where having a production rollback capability will be important, but rollback isn’t – and really shouldn’t be – a part of the CI/CD process. Because on the flip side, there are times when you can’t possibly do a rollback, like with data migration, so if you’re doing CI/CD properly, you should always only be rolling forward in production and never rolling back.
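The roll-forward idea can be illustrated with database migrations, where undoing a change is often not possible. The sketch below applies only migrations newer than the recorded schema version and has no "down" step at all, so a mistake is corrected by appending a new migration. The table names, the `MIGRATIONS` list, and the `migrate_forward` function are all hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

# Ordered, append-only list of (version, SQL statement). Nothing is ever
# removed or reversed; a fix is shipped as the next version.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate_forward(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the recorded version; return the new version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            current = version
    conn.commit()
    return current


conn = sqlite3.connect(":memory:")
print(migrate_forward(conn))  # applies versions 1 and 2
print(migrate_forward(conn))  # nothing new to apply, version unchanged
```

Running the function twice is safe because already-applied versions are skipped, which is what lets the same step run on every deployment.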
AFB: I think the best way to make sure you don’t push code to production that causes some kind of catastrophe is to include a staging phase before production deployment. The staging area should be as production-like as possible so that if you make an infrastructure change that blows up in staging, you can roll it back there without impacting the business. In a local environment, developers should be able to roll back whenever they need to – that should be a part of the CI process.
Do you have any CI/CD success stories you can share?
AFB: Years ago, I was working on an app that was in beta. We were doing some user testing, which included in-person user interviews. There was one particular interview where we got some great feedback, and because we were using CI/CD, I was able to write a follow-up email later that same day to the interviewee, telling them that the feature they wanted during the user interview had already been implemented! That felt really good, and it was only possible because the process was set up to allow these sorts of big changes with high confidence.
Is there any reason for a business not to implement CI/CD?
AFB: CI/CD is almost always a good idea. The way you get there varies, though. There are organizations that, for whatever reason, can’t continuously deploy, but they can add in automated testing, continuous integration, and continuous delivery. It all comes down to what we talked about at the start: What’s the outcome and how can you best achieve it?
ES: Exactly. And that outcome is very important because it also justifies the investment a company makes in CI/CD. Think about a team that deploys relatively large change sets, infrequently – getting them to the holy grail of deploying multiple times per day with high confidence is expensive. Ultimately, outcomes have to be defined and measured in order to justify spending the money to achieve those outcomes.
At the end of the day, every CI/CD implementation is different. You can’t compare two organizations and say one is doing it better than the other when the codebases, the teams, the complexity, and the abilities are all different. But the goal is the same.