One of the more common things we do in the Cloud Operations team at Red Hat Mobile is facilitate changes to environments hosted on the Red Hat Mobile Application Platform, either on behalf of our customers or for our own internal operational purposes.
These are normally done within what is commonly known as a “Change Window”, which is a predetermined period of time during which specific changes are allowed to be made to a system, in the knowledge that fewer people will be using the system or where some level of service impact (or diminished performance) has been deemed acceptable by the business owner.
We have used a number of different models for managing Change Windows over the years, but one of our favourite approaches (that adapts equally well to both simple and complex changes and that is easy for our customers and internal stakeholders to understand) is this 5-phase model.
Planning
The planning phase is basically about identifying (and documenting) a solid plan that will serve as a rule book for all the other elements in this model (below). In addition to specifying the (technical) steps required to make (and validate) the necessary changes, your plan should also include additional (non-technical) information that you will most likely need to share externally so as to set the appropriate expectations with the affected users. This includes specifying:
- What changes are you planning to make?
- When are you proposing to make them?
- How long will they take to complete?
- What will the impact (if any) be on the users of the system before, during and after the changes are made?
- Is there anything your customers/users need to do beforehand or afterwards?
- Why are you making these changes?
Your planning phase should also include a provision for formally communicating the key elements of your plan (above) with those interested in (or affected by) it.
Commencement
The commencement phase is about executing on the elements of your plan that can be done ahead of time (i.e. in the hours or minutes before the Change Window formally opens) but that do not involve any actual changes.
Examples include:
- Capturing the current state of the system (before it is changed) so that you can verify the system has returned to this state afterwards.
- Issuing a final communication notice to your users, confirming that the Change Window is still going ahead.
- Configuring any monitoring dashboards so that the progress (and impact) of the changes can be analysed in real time once they commence.
The commencement phase can be a very effective way to maximise the time available during the formal Change Window itself, giving you extra time to test your changes or handle any unexpected issues that arise.
Execution
The execution phase is where the planned changes actually take place. Ideally, this will involve iterating through a predefined set of commands (or steps) in accordance with your plan.
One important mantra which has stood us in good stead here over the years is, “stick to the plan”. By this we mean, within reason, try not to get distracted by minor variations in system responses which could consume valuable time, to the point where you run out of time and have to abandon (or roll back) your changes.
It’s also strongly recommended that the input to (and outputs from) all commands/steps are recorded for reference. This data can be invaluable later on if there is a delayed impact on the system and steps need to be retraced.
Validation
Again this phase should be about iterating through a predefined set of verification steps that may include examining various monitoring dashboards, running automated acceptance/regression test tooling, all in accordance with two very basic principles:
- Have the changes achieved what they were designed to (i.e. does the new functionality work)?
- Have there been any unintended consequences of the changes (i.e. does all the old functionality still work, or have you broken something)?
Again, it’s very important to capture evidence of the outcomes from validation phase, both as evidence to confirm the changes have been completed successfully and that the system has returned to it’s original state.
All Clear
This phase is very closely linked to the validation phase but is slightly more abstract (and usually less technical) in nature. It’s primary purpose is to act as a higher-level checklist of tasks that must to be completed, in order that the final, formal communication to the customer (or users) can be sent, confirming that the work has been completed and verified successfully.