Agile services - Part 2: Horizons
This is the second post in the series about how to apply Agile methods to run services based on existing software.
The previous post describing the problem at some length is here, but the TL;DR is:
I don't think there are great resources on how to apply agile methods to run and develop services based on existing (open source) software. We have struggled with it, and I try to write down practices that work for us, mostly taken from Scrum and SRE.
This post is more on the service lifecycle side and how it affects the business.
In this post I don't go deep into applying agile methodologies to run services based on existing software; it is more a post on when to apply them.
Normal disclaimer: these are my own opinions based on my own experiences.
Horizon models
I think I first read about the McKinsey Horizon Model in the Lean Enterprise book (I recommend the book). It made a lot of sense, and I think Scaled Agile Framework (SAFe) has made good improvements on it in their Horizon model. SAFe is - as the name says - a framework for scaling up agile in the organization. More on this in a later blog post, but if you're interested, take a look at their web page.
I think the Horizon model is also a great way to think about the service lifecycle, and the mappings between service lifecycle and the Horizon model are quite logical.
Edit for clarity: What Horizons? I see this as a good mental model to look at a service for its whole lifecycle, from idea to its eventual shutdown. At different stages of the service the risks, investments, goals and work methods differ. If we want our services to be successful, we should acknowledge this in some way.
I'll try to write down one interpretation of how the service lifecycle maps to the SAFe Horizon model. Please note that the SAFe Horizon model is very business oriented, works on a high level, and often concerns scales wider than this post. When we look at individual services the scale may be much smaller (but then again, may not), but a lot of the practices still apply. The timescales between the SAFe model and this post may also differ a lot, depending on the scale of the service.
Why do we care about the horizon model when we are talking about using agile methods for running services? I think it's important for two main reasons. The first one is to know when in the service lifecycle we need to form agile teams. The second is to look at the team's workload as a function of the horizon we're in. This allows us to have better cost expectations from the business side, and to make sure we actually get the benefits we expect from the service.
Horizons
Let's take a look at a picture to set the ground before digging deeper.
(Edit: updated to matplotlib chart)
Note that this example is fictional, but describes a service that goes through heavy growth after the release. The work does not scale linearly with the service use, but the work does grow. We see that in Horizon 3 and 2, we have testing and development, with growing effort when we're nearing production. In the main lifecycle of the service - Horizon 1 - we see a large increase in effort. This is partly due to adding a lot of operational tasks, and handling a (maybe strongly growing) customer base. However, a lot more expectations are put on the service at this point too, which leads to an increased development effort.
Please note that the shape of the curve depends a lot on the actual service you operate, and its scale.
Below I'll try to expand a bit on what and how much work is done in different horizons. The lists of tasks for each horizon are by no means exhaustive, and the scope may depend on the service. They mainly serve as examples.
Horizon 3 - Feasibility
Main purpose of this horizon
We should have an answer to the question
Is it feasible to develop this idea further?
What do we do in this horizon?
When you think about deploying a new service, you start here. This step may still be a grassroots movement which is driven by a specific need. E.g. a team needs CI/CD tooling and starts setting it up, and at some point realizes that it's inefficient to do this only for one team.
This can just as well start from business needs, where a service is seen as potentially important and it needs to be explored.
There are some important things to find out in this step.
What are the options for deploying this software (or what other solutions may exist)?
Will this be technically feasible? Can this scale if we run it for many customers? Are the security controls in place to run it as a service? Does it seem that this software has a future? Can it be integrated with our supporting services (e.g. authentication/authorization)?
We also need some idea of the business feasibility of running this service. How much do we estimate it will cost? How many customers do we expect to have? What would be the price point?
I think it's really hard to do a very detailed business feasibility estimate at this point, but we should at least have some idea before proceeding.
How do we work in this horizon?
This work does not need to be very organized, and - depending on the scale - may even be done as somebody's side project. A single enthusiastic admin can do a lot here, but while this step includes a lot of technical work, we should also look into business feasibility. This may need help from other parts of our organization.
If we have existing agile service teams with spare cycles, there is nothing preventing us from giving the Horizon 3 task for a new service to an existing team.
Some of the work done in this horizon:
Operations | Development | Team/personal development | Communications |
---|---|---|---|
Testing | Learning, evaluating technologies | Future stakeholder discussions | |
Setting up prototypes | | | |
Documenting results | | | |
Before moving on
While some may be tempted to say "Hey! We have installed the service, let's take in users!", we're way too early at this stage, and we should probably scrap anything we did (except knowledge gathering and documentation). We're not at the minimum viable service point yet.
If the new service does not look feasible after Horizon 3, this is a good time to shut it down. We haven't spent a lot of resources on it yet, and we have gained a decent understanding of what it needs.
If we do move on, the next step requires a lot more resources and it can't (well, shouldn't) be done at the grassroots level any more.
Horizon 2 - Minimum Viable Service
The main purpose of this horizon
In this horizon we want to build a Minimum Viable Service we can release.
What do we do in this horizon?
So, now we know it's most likely technically feasible to implement this service, and the costs make some kind of sense.
We need to make sure we build something we can launch. This is much more work than what we did previously. We need to design the service implementation. How do we do customer onboarding (hopefully automatically)? How do we do scaling? How do we do high availability? How about disaster recovery and backups? How do we do service operations? We probably deploy this with the infrastructure as code approach. This means we need code.
We may also need to work on a lot of integration with our existing organizational systems, monitoring, accounting, access control, etc.
In addition, we should decide on how the finances will work, e.g. how we set a price on the service (if we set a price). With luck, we're in a state where the financial models already exist within the organization, and we can mostly just decide on the pricing.
Then there is end-user documentation, marketing, organizing support for the service etc. etc.
In Horizon 2 we probably also launch something to selected end users when we're nearing production readiness. It's important that we do good expectation management at this point, and we may even say "You can test and play around with the service, but you'll lose all the data you put in the system in 4 months." Testing our deployment on a limited crowd with low expectations helps us get information on how the service works, and how the operations work, which helps us plan the actual first release.
The result of this step should be a usable service we can run, but it's not perfect: it's a minimum viable service. The point of this step is not that we develop the service to its final state. The point of this step is to have a starting point which can start producing value. Up to now, all we have had is cost.
I refrain from using the term minimum viable product on purpose, as I think it may steer thinking into the direction that if the core software is set up, we're good to go. A minimum viable service has more requirements, from monitoring to documentation and accounting. The quality requirements on the minimum viable service may already be relatively high, depending on what the service is.
How do we work in this horizon?
As opposed to Horizon 3, where we can have a single person working on it, now we need to start building a team, or bringing in an existing team with spare cycles. Having experienced team members helps a lot in this Horizon. We make a lot of architectural and process decisions here, and they have a big impact on the cost of running the service. While we possibly (or probably) completely re-architect the service at some point in Horizon 1, it's a lot harder to do with customers on the service.
Here we start taking agile practices into use - if we don't already use them - and we make sure that several people take part in the design and development of the service. It's also important that the team at this stage carries over to Horizon 1. If we lose the team (or a significant part of it) after Horizon 2, we lose the experience in developing and running the service.
Some of the work done in this horizon:
Operations | Development | Team/personal development | Communications |
---|---|---|---|
Small amount of customer tickets | Architecture planning | Onboarding members | Stakeholder requirements gathering |
Minor updates | Service automation | Learning the software/service | Building roadmaps |
Operations planning | Building team practices | Marketing | |
Financial planning | Documentation | | |
Certification / Compliance work | Monitoring | | |
Before moving on
We should now have a good idea about how the service works and is set up. Hopefully we have development, test, and production systems, and we have torn down and deployed all of them many, many times. We have also practiced some normal operations and disaster recoveries.
Depending on how deep in the service stack we are, after we launch, we may never be able to have scheduled downtime for the whole service. We have to know the service well enough to dare to make changes. Test systems which closely mirror production are a great help, so we can verify our changes and our processes.
There is still time to kill the whole service at this point. Horizon 1 will need even more resources, and we may e.g. take in customer data that is very hard to get rid of, which makes it hard to shut down the service.
Horizon 1 - Production
The main purpose of this horizon
To run the service in a sustainable way.
What do we do in this horizon?
We're live! Now we can start producing value. (Note: I really dislike the term, and try to avoid it. It's too vague, and can mean tons of different things. Sometimes it is the commonly understood word though.)
This is the main lifecycle of the service. We do everything here. Most of the development and operation of the service is done here. In a long running service I would be surprised if less than 80% of the development of the service is done in Horizon 1.
When we launch the service, we get a lot more work. Up until now, we didn't really have to care about customers, nor our processes for scaling. We get a big chunk of operational work, and in addition we now have to start following up with updates, changes in our organization's services and policies, and security updates.
One of the more enjoyable and interesting things that comes up is that when you get more users, they will probably start using features you've never heard of, let alone tested. You don't have time to verify all parts of the upstream software after all. These requests often result in somewhat long deep dives into the unknown, which can be fun as long as you have time for it.
In addition to operational work, it's critical to continue active development. If we truly developed a minimum viable service, the first touch with customers should help show us the direction in which we should move. As we grow, unless we do active development on and automation of the operational tasks, scaling is going to take a lot of resources. So constantly improving our operations is important. The service probably doesn't exist in a vacuum and a static environment either, so we need to react to changes in our surroundings. There might also be more certification work, depending on your service.
While the work amount increases, we should also get more and more benefit out of the service as it goes on, and if it is successful, it will scale up. Our work will not (should not) scale linearly with the usage growth, which means while the absolute costs of running the service grow, the unit costs can shrink a lot.
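To make the difference between absolute cost and unit cost concrete, here is a minimal sketch in Python. The effort model (a fixed base plus a term that grows with the square root of the user count), the hourly rate and all the numbers are my own assumptions for illustration, not measurements from any real service.

```python
import math

# All numbers below are hypothetical and only illustrate the shape of the argument.
BASE_HOURS = 160.0          # assumed fixed monthly effort, independent of usage
HOURS_PER_SQRT_USER = 8.0   # assumed growth factor for the usage-dependent part
HOURLY_RATE = 50.0          # assumed cost of one hour of work


def monthly_effort_hours(users: int) -> float:
    """Assumed sublinear model: effort grows with sqrt(users), not with users."""
    return BASE_HOURS + HOURS_PER_SQRT_USER * math.sqrt(users)


def total_cost(users: int) -> float:
    """Absolute monthly cost of running the service."""
    return monthly_effort_hours(users) * HOURLY_RATE


def unit_cost(users: int) -> float:
    """Monthly cost per user."""
    return total_cost(users) / users


if __name__ == "__main__":
    for users in (100, 1_000, 10_000):
        print(f"{users:>6} users: total {total_cost(users):>8.0f}, "
              f"per user {unit_cost(users):>7.2f}")
```

With these made-up numbers, going from 100 to 10 000 users quadruples the monthly cost, while the cost per user drops from 120 to 4.8. A real service will have a different curve, but the shape of the argument is the same as long as the work grows slower than the usage.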
As we hopefully run the service for a longer time, we have to take into account personnel changes. Experienced people may leave, and this will affect the productiveness of the team. We need to book a decent amount of time for training new people for either growth or replacing team members. This also helps us control the Bus Factor and greatly reduce business risk.
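One simple way to keep an eye on the Bus Factor is to write down who can do which task and look for the task with the fewest people behind it. The sketch below is only one possible way to do this; the task names, the people and the `bus_factor` helper are hypothetical.

```python
def bus_factor(skills: dict[str, set[str]]) -> int:
    """Smallest number of people whose departure leaves some task uncovered.

    `skills` maps a task to the set of people able to do it. The task with
    the fewest capable people decides the result. Raises ValueError if
    `skills` is empty.
    """
    return min(len(people) for people in skills.values())


# Hypothetical skills matrix before any cross-training.
before = {
    "major upgrades": {"alice"},
    "backup restore": {"alice", "bob"},
    "incident response": {"alice", "bob", "carol"},
}

# Same team after carol has been trained to do major upgrades.
after = {task: set(people) for task, people in before.items()}
after["major upgrades"].add("carol")

if __name__ == "__main__":
    print("before training:", bus_factor(before))
    print("after training:", bus_factor(after))
```

In this example a single departure (alice) would leave nobody able to do major upgrades. After training one more person, it takes two departures before any task is uncovered.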
How do we work in this horizon?
The next post will go into more depth on how to apply agile methods in Horizon 1 services.
Some of the work done in this horizon:
Operations | Development | Team/personal development | Communications |
---|---|---|---|
Incident response | Architecture updates | Onboarding team members | Stakeholder communication |
CVE tracking/response | Service automation | Learning the software/service | Building roadmaps |
Customer tickets | Operations automation | Developing team practices | Marketing |
Minor and major updates | Developing financial planning | Managing team growth | Documentation |
Operational maintenance | Feature development | Customer communication | |
Adapting to organizational/environment change | | | |
Paying back technical debt | | | |
Certification / Compliance work | | | |
Monitoring | | | |
Before the last horizon
At some point every service will come to the end of its life. The thing we aim for is that we can control when this happens.
There are many reasons why we might not be able to control our move to Horizon 0. If our transition to Horizon 1 - or even Horizon 2 - was not controlled, we probably have little control over the transition to Horizon 0. We probably won't get the value (here I go again...) we expected out of the service, and we may wonder why, and have to shut it down. We may not have allocated enough resources to run the service, which makes it unfeasible to run in production, or may artificially limit scaling.
If the team changes between Horizon 2 and Horizon 1, this may kill most of the development of the service. The new team will not have had the design discussions, they don't know the architecture as well, they don't know important items which were postponed, etc. They also have less ownership of both the code and the service.
And if at any point of the service lifecycle we stop the development of the service, we're in Horizon 0 very quickly. If we think that the service development is completed after Horizon 2, and stop all the development, we basically skip Horizon 1 completely, and jump directly to Horizon 0. We may still extract some value out of the service, but we basically can't sustain that, and lose the whole benefit of developing this service in the first place.
Horizon 0 - Sunset
I claim that whenever you stop active development of the service (for any reason), you are in Horizon 0 whether you want it or not. Every running service accrues technical debt, and requires constant development to pay it off. If you are left too far behind, it will be hard to ever catch up. The best thing you can do is hope to milk the service a bit more before shutting it down. I would not confuse this with the "Extracting" phase in the SAFe Horizon model. That happens in Horizon 1.
The world also changes around us. If we don't apply development to our services, we can't adapt to changes, and our services become obsolete.
We can also, of course, gracefully enter Horizon 0, by a decision to replace, or deprecate the service.
The main purpose of this horizon
To deprecate the service in a controlled fashion.
What do we do in this horizon?
I must admit that I have fewer answers in this horizon. That said, having a plan for the complete shutdown of the service is important, and may be legally required. If we had any personal data in the service, and are bound by GDPR, we have a responsibility to make sure that data gets properly removed.
If we plan the service deprecation well, it avoids nasty surprises later, when we realize that we left stale administrative accounts on systems, or kept resources running that incur - maybe significant - cost, or had bad communications towards customers, who were taken by surprise.
Some of the work done in this horizon:
Operations | Development | Team/personal development | Communications |
---|---|---|---|
Incident response | Financial planning | Team dissolution processes | Stakeholder communications |
CVE tracking/response | Customer communication, expectation setting | | |
Customer tickets | Documentation | | |
Transferring workloads | | | |
Deprecating systems | | | |
Summary
In the next post, I'll go deeper into running teams in Horizon 1 (and 2, maybe 3). I hope that a lot of our experiences can be generalized.
Geek. Product Owner @CSCfi