The Software Architect and DevOps

A brief summary of the article
A description of the article’s relevance to Software security
Research and reflect on the development of this topic in the past 5 years. Based on your research and personal experience, what do you think is the direction of this topic in the next 5 years? cite examples of past and future work to strengthen your argum

IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY | JANUARY/FEBRUARY 2018
THE PRAGMATIC ARCHITECT
Editor: Eoin Woods
Endava
The Software Architect and DevOps
Len Bass
Etsy updates its production
servers more than 50 times a day [1].
How long does your organization
take to get an update into production? Google has a team that only
handles incidents [2]. Why? How does
your organization handle incidents?
SourceD builds an Amazon Web
Services deployment script automatically with input from both the
application developers and security
engineers (see Figure 1).
These are three examples of practices that fall under the name DevOps.
They deal with the velocity of releases,
how fast incidents are handled, and
the enforcement of organizationally
specified security practices. All these
are critical for success in today’s environment, and the architect is critical
for success in adopting DevOps practices. Here, I explain why.
The Architect’s Importance
“DevOps is a set of practices intended to reduce the time between
committing a change to a system and
the change being placed into normal production, while ensuring high
quality” [3]. These practices involve
four main concerns:
• Quickly getting a change into
production.
• Finding errors through automated testing.
• Reducing or eliminating errors
that occur during deployment.
• Quickly finding and repairing
faults in the system.
All these concerns have organizational, cultural, and technical
aspects. For example, one practice
for quickly getting a change into
production is to do only automated
testing. This will mean a change in
the quality assurance team’s role or
possibly its elimination. Changing
a team’s role is an organizational
change, adapting to differing roles
is a cultural change, and developing
the appropriate set of test cases is a
technical change.
In all these changes, the software
architect is important. The architect
is usually a role model for the rest
of the team, so his or her reaction to
these changes will impact the team’s
morale. In this article, I focus on
the technical aspects of the software architect’s role
with respect to DevOps
practices, but the organizational
and cultural aspects shouldn’t be
ignored. In addition, DevOps practices rely heavily on tool support
and automation. So, the architect
must work with the team to ensure there are personnel to support
the tools and that the team is familiar with the tools that support
production.
The Deployment Pipeline
Assume a deployment pipeline that
consists of a development environment, build environment, staging
environment, and production environment. Figure 2 illustrates a
simplified pipeline. A code commit
comes from the development environment, undergoes integration and
certain tests in the build environment, undergoes other tests in the
staging environment, and is placed
into production.
All these environments should, as
much as possible, be the same. All
team members should develop using
the same OS version, the same version of the supporting libraries, and
the same IDE version. Furthermore,
these should be the same in the production environment and intermediate test environments. The architect
must work with the team to ensure
that this is the case.
Provisioning tools such as Vagrant will create identically provisioned virtual machines (VMs)
for development. Every team member can use a Vagrant script (version controlled, of course) to create
a new identical VM with a single
command. If the VM’s elements
change—for example, owing to a
new release of an OS—then updating the provisioning script and recreating the appropriate VMs will keep
the team synchronized. Other tools,
such as AWS CloudFormation, are used
to provision the environments themselves. Tools
such as Chef, Puppet, or Ansible
can be used to maintain the desired
configuration in environments other
than the development environment.
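
The goal of keeping environments the same can also be checked mechanically. The following is a minimal sketch, assuming each environment can export its OS and library versions as a simple dictionary. The manifest format, the function `find_drift`, and the version numbers are hypothetical and are not part of Vagrant, CloudFormation, Chef, Puppet, or Ansible.

```python
from typing import Dict, Tuple

Manifest = Dict[str, str]  # component name -> version string (hypothetical format)


def find_drift(reference: Manifest, other: Manifest) -> Dict[str, Tuple[str, str]]:
    """Return components whose versions differ between two environments.

    A component missing from one side is reported as "<missing>".
    """
    drift = {}
    for name in sorted(set(reference) | set(other)):
        ref_version = reference.get(name, "<missing>")
        other_version = other.get(name, "<missing>")
        if ref_version != other_version:
            drift[name] = (ref_version, other_version)
    return drift


# Illustrative manifests; the version numbers are made up.
dev = {"os": "ubuntu-16.04", "python": "3.6.3", "libssl": "1.0.2"}
prod = {"os": "ubuntu-16.04", "python": "3.6.1", "libssl": "1.0.2"}

drift = find_drift(dev, prod)
for component, (dev_version, prod_version) in drift.items():
    print(f"{component}: development={dev_version} production={prod_version}")
```

A check like this could run as an early pipeline step, failing the build when development and production have drifted apart.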
Continuous Deployment
The DevOps practice that probably
gets the most press is continuous deployment or continuous delivery [4].
This means that after a developer
commits code to the version control
system, it’s automatically subjected
to a variety of tests and, assuming
the tests are passed, placed into production. Modern systems can be deployed multiple times a day. At that
rate, manual testing cannot keep up, so all testing must be
automated. Even if only one deployment a day were necessary, the testing
load would overwhelm human testers.
So, test automation is critical for
rapid deployment.
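
The gating behavior described here, where a commit reaches production only if every automated stage passes, can be sketched as follows. This is an illustration under stated assumptions, not a real CI tool's API: `run_pipeline`, the stage names, and the stand-in check functions are all hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]  # (stage name, check taking a commit id)


def run_pipeline(commit: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run the stages in order and stop at the first failure.

    Returns (deployed, names of the stages that passed).
    """
    passed = []
    for name, check in stages:
        if not check(commit):
            return False, passed
        passed.append(name)
    return True, passed


# Hypothetical stand-ins for real automated test suites.
def build_tests(commit: str) -> bool:
    return True


def integration_tests(commit: str) -> bool:
    return not commit.endswith("-broken")


def staging_tests(commit: str) -> bool:
    return True


stages = [
    ("build", build_tests),
    ("integration", integration_tests),
    ("staging", staging_tests),
]

print(run_pipeline("abc123", stages))
print(run_pipeline("def456-broken", stages))
```

Because no person sits between the stages, the quality of the deployed system depends entirely on what the checks cover.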
The architect works with the
team to select test cases. Test cases
must be complete enough for good
coverage of both the system’s functions and its qualities, such as performance and security. Tests can
take a long time to run, so it is critical to have
the smallest set of them that still provides good coverage of functions
and qualities. Choosing
the test cases thus becomes important. One side effect of the ability to
deploy rapidly is that if a developer
discovers an error, he or she can repair it and deploy the fix quickly.
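
One way to approximate "the minimal number of tests with good coverage" is a greedy selection. The sketch below assumes the team already knows which functions and qualities each test exercises. The function `select_tests` and the test and requirement names are hypothetical, and a greedy choice is only an approximation that may not yield the smallest possible set.

```python
from typing import Dict, List, Set


def select_tests(coverage: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Greedily pick tests until every required item is covered.

    Greedy selection is an approximation: it does not always find the
    smallest possible set. Raises ValueError if some item cannot be covered.
    """
    remaining = set(required)
    chosen = []
    while remaining:
        # Pick the test covering the most still-uncovered items;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            raise ValueError(f"no test covers: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen


# Illustrative data: which functions and qualities each test exercises.
coverage = {
    "test_login": {"auth", "session"},
    "test_checkout": {"cart", "payment", "session"},
    "test_latency": {"performance"},
    "test_session_only": {"session"},
}
required = {"auth", "session", "cart", "payment", "performance"}

selected = select_tests(coverage, required)
print(selected)
```

Here `test_session_only` is never selected because the other tests already cover everything it exercises.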
The ability to continuously deploy
also depends on the system architecture and the processes the developers
use. Suppose that a new feature is to
be added to the system that involves
several development teams. Continuous deployment means that Team A’s
changes can be deployed regardless
of the state of Team B’s changes. Architectural choices such as using a microservice architecture can enable this
practice. So can process activities such
as ensuring backward compatibility
and graceful handling of unknown
method invocations. (For more details, see Chapter 6 of DevOps: A
Software Architect’s Perspective [3].)
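
Graceful handling of unknown method invocations can be illustrated with a small dispatcher. This is a sketch under assumptions, not the book's implementation: the `Service` class, the request shape, and the method names are hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]


class Service:
    """Dispatches named operations and tolerates ones it does not know."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, method: str, handler: Handler) -> None:
        self._handlers[method] = handler

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        method = request.get("method")
        handler = self._handlers.get(method)
        if handler is None:
            # An older version of the service receives a call it does not
            # implement yet: report it instead of failing.
            return {"status": "unsupported", "method": method}
        return {"status": "ok", "result": handler(request.get("params", {}))}


# Team A's service is still on the old version and only knows "get_price".
pricing = Service()
pricing.register("get_price", lambda params: 10 * params.get("quantity", 1))

old_call = pricing.handle({"method": "get_price", "params": {"quantity": 3}})
new_call = pricing.handle({"method": "get_price_with_tax", "params": {"quantity": 3}})
print(old_call)
print(new_call)
```

A caller from Team B that receives `"unsupported"` can fall back to the older method, so neither team has to wait for the other to deploy.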
The architect must not only
guide the architectural choices but
also work with the team so that
developers understand and correctly
implement the prescribed processes.
For example, backward compatibility of services can be prescribed to
allow for independent deployment
of different services of a new feature.
However, maintaining backward
compatibility is difficult and might
not happen owing to time pressure.
This leads to incompatibilities and
errors. Tests can help enforce the
necessary processes.
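
As an example of a test that enforces such a process, the sketch below checks that a new version of a service response keeps every field of the previous version with the same type. It assumes a response can be described as a mapping from field name to type name. That representation and the function `compatibility_violations` are hypothetical.

```python
from typing import Dict, List

Schema = Dict[str, str]  # field name -> type name (hypothetical representation)


def compatibility_violations(old: Schema, new: Schema) -> List[str]:
    """List the ways `new` breaks clients written against `old`.

    Adding fields is allowed; removing a field or changing its type is not.
    """
    problems = []
    for field_name, old_type in sorted(old.items()):
        if field_name not in new:
            problems.append(f"removed field: {field_name}")
        elif new[field_name] != old_type:
            problems.append(
                f"changed type of {field_name}: {old_type} -> {new[field_name]}"
            )
    return problems


v1 = {"id": "int", "name": "str"}
v2_compatible = {"id": "int", "name": "str", "email": "str"}
v2_breaking = {"id": "str"}

print(compatibility_violations(v1, v2_compatible))
print(compatibility_violations(v1, v2_breaking))
```

Running a check like this in the build environment turns a prescribed process into something the pipeline can reject automatically, even under time pressure.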
[Figure 1 diagram. Recoverable labels: Operations repository (Git); Application repository (Git); application code, application CF, load-balancing setup; Virtual Private Cloud, subnets, security groups, CF frameworks, best practice; Atlassian Bamboo; CloudFormation (CF) script.]
FIGURE 1. Creating an Amazon Web Services deployment script automatically with
input from developers and operations. Such enforcement of organizationally specified
security practices falls into the category of DevOps.
[Figure 2 diagram. Recoverable labels: Developers; Precommit tests; Commit; Build image and perform integration tests; User acceptance testing, staging, and performance tests; Deploy to production; Promote to normal production.]
FIGURE 2. A deployment pipeline in which the movement from commit to production
is automatic. Each environment in the pipeline should, as much as possible, be the
same.
Once a new version of a service
has been deployed, the developers
should monitor its behavior carefully
until they gain confidence that it has
no problems. “Monitor carefully”
means that developers should be able
to examine measures such as latency,
failure rates, and the number of requests serviced per unit time. This
applies not only to the newly deployed service but also to the system as
a whole, because a new version of a
service might impact the whole system. This monitoring involves both
measures normally collected by the
infrastructure, such as CPU utilization of a VM or container, and business- and service-specific measures.
In other words, the service or new
feature should be instrumented to
record a variety of information and
make that information visible so that
the effectiveness of the change or
new feature can be determined. The
architecture of the service and system must support the collecting and
reporting of a variety of measures.
So, determining what’s monitored
and reported and how to accomplish
this is something the architect must
do in conjunction with the team.
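
The measures named above (latency, failure rate, and request counts) can be collected by instrumenting the service itself. The following is a minimal in-process sketch. The `ServiceMetrics` class is hypothetical and not the API of any monitoring product, and a real system would export these values to a monitoring backend rather than print them.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ServiceMetrics:
    """Counts requests and failures and accumulates latency for one service."""

    def __init__(self) -> None:
        self.requests = 0
        self.failures = 0
        self.total_latency = 0.0

    def observe(self, operation: Callable[[], T]) -> T:
        """Run one request, recording its latency and whether it failed."""
        start = time.perf_counter()
        self.requests += 1
        try:
            return operation()
        except Exception:
            self.failures += 1
            raise
        finally:
            self.total_latency += time.perf_counter() - start

    def snapshot(self) -> Dict[str, float]:
        """Return the current measures; rates are 0.0 before any request."""
        count = self.requests
        return {
            "requests": count,
            "failure_rate": self.failures / count if count else 0.0,
            "mean_latency_seconds": self.total_latency / count if count else 0.0,
        }


metrics = ServiceMetrics()
metrics.observe(lambda: "ok")
try:
    metrics.observe(lambda: 1 / 0)  # a deliberately failing request
except ZeroDivisionError:
    pass
print(metrics.snapshot())
```

Requests per unit time can be derived by sampling `requests` at fixed intervals and taking the difference between samples.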
Dealing with Errors
A final DevOps practice involving
the architect is dealing with errors
once the system is in production. The
person on call, possibly a developer,
has a short-term task and a longer-term task.
The short-term task is to find a
work-around for the error so that
the system can continue to operate.
This task requires understanding the
system, its services, and how they interact. Although typically the architect won’t be the person doing this,
it exercises skills an architect needs.
So, the architect should nurture team
members who show such skills. For
a thorough discussion of this aspect,
see Site Reliability Engineering: How
Google Runs Production Systems [2].
The longer-term task is to determine the root cause and propose
a process so that this type of error
doesn’t recur. This task depends on
the system elements’ traceability. Every service in production should be
traceable back to its constituent elements. The following aspects are
also relevant and should be recorded:
• the environment in which the
service is executing,
• the test cases to which the service was subjected,
• the service’s dependencies, and
• the versions of the tools in the
pipeline that were used to deploy
the system.
This should be done automatically
as part of the build-and-deployment
process.
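
A build step could record these aspects in a manifest stored alongside the deployed artifact. The sketch below is one possible shape under stated assumptions. The field names, the function `build_manifest`, and the sample values are hypothetical, and the environment it captures is the host running the script, standing in for the environment in which the service executes.

```python
import json
import platform
import sys
from typing import Any, Dict, List


def build_manifest(
    service: str,
    version: str,
    dependencies: Dict[str, str],
    test_cases: List[str],
    tool_versions: Dict[str, str],
) -> Dict[str, Any]:
    """Collect the traceability data for one build of one service."""
    return {
        "service": service,
        "version": version,
        # Captured from the host running this script (an assumption; a real
        # pipeline would record the target execution environment).
        "environment": {
            "os": platform.platform(),
            "python": sys.version.split()[0],
        },
        "dependencies": dict(sorted(dependencies.items())),
        "test_cases": sorted(test_cases),
        "pipeline_tools": dict(sorted(tool_versions.items())),
    }


# Illustrative values; the names and version numbers are made up.
manifest = build_manifest(
    service="orders",
    version="1.4.2",
    dependencies={"requests": "2.18.4"},
    test_cases=["test_checkout", "test_auth"],
    tool_versions={"bamboo": "6.2"},
)
print(json.dumps(manifest, indent=2))
```

Given such a manifest, the person investigating a root cause can answer "what exactly was running?" without reconstructing it by hand.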
A second type of traceability involves recording the sequence of
events that led to the error. This sequence includes the user request that
triggered an error, the services that
processed that request, and the services that the failing service invoked.
Component traceability is the responsibility of the various tools in
the pipeline, but determining the sequence of events that led to the error
requires recording information similar to the monitoring information I
discussed earlier. This type of recording is architectural. Tools such
as Zipkin or Jaeger are useful in this
context.
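
The idea behind such tracing can be shown without any tracing library. In the sketch below, every service that handles a request records a span carrying a shared trace ID and the name of its caller, so the chain leading to a failure can be reconstructed afterward. This is not the Zipkin or Jaeger API: the `Span` and `TraceLog` classes are hypothetical, and the sketch assumes each service appears at most once per trace.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str
    service: str
    parent: Optional[str]  # name of the calling service, None for the entry point
    error: Optional[str] = None


@dataclass
class TraceLog:
    spans: List[Span] = field(default_factory=list)

    def record(
        self,
        trace_id: str,
        service: str,
        parent: Optional[str] = None,
        error: Optional[str] = None,
    ) -> None:
        self.spans.append(Span(trace_id, service, parent, error))

    def path_to_error(self, trace_id: str) -> List[str]:
        """Return the services from the entry point down to the failing one."""
        by_service: Dict[str, Span] = {
            s.service: s for s in self.spans if s.trace_id == trace_id
        }
        current = next((s for s in by_service.values() if s.error), None)
        path = []
        while current is not None:
            path.append(current.service)
            current = by_service.get(current.parent) if current.parent else None
        return list(reversed(path))


log = TraceLog()
trace_id = str(uuid.uuid4())  # generated once, when the user request arrives
log.record(trace_id, "gateway")
log.record(trace_id, "orders", parent="gateway")
log.record(trace_id, "billing", parent="orders")
log.record(trace_id, "inventory", parent="orders", error="timeout")

print(log.path_to_error(trace_id))
```

For this to work, every service must accept and forward the trace ID, which is why this kind of recording has to be designed into the architecture rather than added by the pipeline tools.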
Architects who can play the
role I described here will
become real leaders of
change, as they enable fundamental
improvements in their organization’s
capability through DevOps implementation.
References

[1] J. Miranda, “How Etsy Deploys More Than 50 Times a Day,” InfoQ, 17 Mar. 2014; www.infoq.com/news/2014/03/etsy-deploy-50-times-a-day.
[2] B. Beyer et al., Site Reliability Engineering: How Google Runs Production Systems, O’Reilly Media, 2016.
[3] L. Bass, I. Weber, and L. Zhu, DevOps: A Software Architect’s Perspective, Addison-Wesley, 2015.
[4] J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Addison-Wesley, 2010.
ABOUT THE AUTHOR
LEN BASS is an adjunct faculty member for Carnegie Mellon University’s
Master of Software Engineering program. Contact him at lenbass@cmu.edu.
