Azure Update Management - Part 0 - Introduction
Hello there, the goal of this series is to describe a real-world implementation of Azure Update Management. This service was designed to update any machine in your infrastructure, whether it is hosted on Azure or elsewhere, provided its operating system is supported.
So that every reader can understand the solution and find answers to their needs, I'll try to give comprehensive feedback from my experience, and share tips about design and architecture, automation, effectiveness, troubleshooting, and so on!
Happy reading!
Plan
- Part 0 - Introduction (you’re here)
- Part 1 - Architecture
- Part 2 - Azure Policy
- Part 3 - Azure ARC
- Part 4 - Log Analytics agents
- Part 5 - Automation accounts
- Part 6 - Monitoring
- Part 7 - Security patches on Azure ARC
- Part 8 - Security patches on CentOS machines
Introduction
For the past year, I worked as an Operational Security Engineer on Azure Update Management. My client was a large international retail company, so its IT environment was heterogeneous, with many varied constraints that made the work more complicated. Since I spent a lot of time on this project and faced complex use cases, I decided to share my experience, and hopefully save you some time if you also try to set up the solution.
Of course, I will present the solution I implemented, and you may be working on a different architecture, but I hope that much of it can be reused in any context.
Scenario
This scenario may differ from your use cases, but keep in mind that it is a real-world scenario! You may therefore find useful tips and information to build a service that fits your needs. Because this is an overview, this section contains a lot of information, and you may not yet have a clear picture of how to set all of it up. Don't worry: the technical details will be described in the next posts.
Update policy
It is important to define an update policy at an early stage of the project. In my case, the update policy is as follows:
- We want to apply only security or critical patches. Here are the descriptions of each type, from the Microsoft documentation:
  - Critical updates - A widely released fix for a specific problem that addresses a critical, non-security-related bug.
  - Security updates - An update that collects all the new security updates for a given month and for a given product, addressing security-related vulnerabilities. It's distributed through Windows Server Update Services (WSUS), System Center Configuration Manager and Microsoft Update Catalog. Security vulnerabilities are rated by their severity. The severity rating is indicated in the Microsoft security bulletin as critical, important, moderate, or low. This Security-only update would be displayed under the title Security Only Quality Update when you download or install the update. It will be classified as an Important update.
- We want to apply those patches once a week. This can be considered a high update frequency: the reason for this choice is that the security team required defined SLAs for handling and mitigating vulnerabilities, based on their CVSS score. By applying patches once a week, we ensure that critical vulnerabilities for which a patch exists are patched within a week.
  - For a vulnerability with CVSS = 10, remediate as soon as possible
  - For a vulnerability with CVSS >= 9, remediate within 2 weeks
  - For a vulnerability with CVSS >= 7, remediate within 3 months
  - For a vulnerability with CVSS < 7, remediate within 6 months
- We want to reboot only if needed. One of the major constraints regarding patches is the need to reboot the VM to apply them, because servers may host critical business applications and must ensure service continuity. Although security patches usually don't require a reboot, it is still necessary to define maintenance schedules, i.e. timeframes during which the server can be rebooted. In our case, the maintenance window lasts 2 hours. If the machine can't be updated within this timeframe, the patching process is stopped.
- We want to update only machines that are backed up. In case the applied updates have an unwanted side effect, we want to make sure that all the machines are backed up, so we can roll back at any time. A concrete example on Windows Server 2016 is the C01A001D error.
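The CVSS-based SLA tiers above can be encoded as a small lookup, which makes it easy to check a patching cadence against them. This is a minimal sketch under stated assumptions: CVSS base scores range from 0.0 to 10.0, the top tier is read as a score of 10.0, "as soon as possible" is modelled as a zero-day delay, and months are approximated as 30 days. The names `SLA_TIERS`, `remediation_deadline` and `weekly_cycle_meets_sla` are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# SLA tiers from the update policy: (minimum CVSS score, maximum remediation delay).
# Assumptions: the top tier is read as a score of 10.0, "as soon as possible"
# is modelled as a zero-day delay, and months are approximated as 30 days.
SLA_TIERS: List[Tuple[float, timedelta]] = [
    (10.0, timedelta(days=0)),
    (9.0, timedelta(weeks=2)),
    (7.0, timedelta(days=90)),   # 3 months
    (0.0, timedelta(days=180)),  # 6 months
]

# Worst-case delay between a patch becoming available and its deployment
# when update deployments run once a week.
PATCH_INTERVAL = timedelta(weeks=1)


def remediation_deadline(cvss: float, published: date) -> date:
    """Return the date by which a vulnerability published on `published` must be remediated."""
    if not 0.0 <= cvss <= 10.0:
        raise ValueError(f"CVSS score out of range: {cvss}")
    # Tiers are sorted by decreasing threshold, so the first match is the strictest SLA.
    for threshold, delay in SLA_TIERS:
        if cvss >= threshold:
            return published + delay
    raise AssertionError("unreachable: the 0.0 tier matches every valid score")


def weekly_cycle_meets_sla(cvss: float) -> bool:
    """True if a patch available on publication day is deployed within the SLA, in the worst case."""
    published = date(2021, 1, 1)  # arbitrary reference date; only the difference matters
    return published + PATCH_INTERVAL <= remediation_deadline(cvss, published)


print(remediation_deadline(9.8, date(2021, 3, 1)))  # 2021-03-15
```

Under these assumptions, the weekly cycle satisfies every tier except "as soon as possible", which no fixed schedule can guarantee by construction.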
Note: keep in mind that update management is not the only way to keep your VMs safe. An OS may no longer be supported, and some vulnerabilities must be mitigated by configuring specific settings rather than by updating the machine.
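The update policy described above (critical and security classifications, a weekly cadence, a 2-hour maintenance window, reboot only if required) maps fairly directly onto an Azure Automation update deployment. The sketch below builds a request body for the `Microsoft.Automation/automationAccounts/softwareUpdateConfigurations` resource. The property names are given as we understand that REST API and should be checked against its reference; `START_TIME`, `TIME_ZONE`, the weekday and the helper `build_update_deployment` are hypothetical. Target machines and the backup prerequisite are deliberately left out.

```python
import json

# Hypothetical schedule values: replace them with your own.
START_TIME = "2021-06-06T22:00:00+00:00"
TIME_ZONE = "UTC"


def build_update_deployment(operating_system: str, weekday: str) -> dict:
    """Build a request body for a weekly, critical/security-only update deployment.

    Property names follow the softwareUpdateConfigurations REST resource as we
    understand it; verify them against the official REST reference.
    """
    if operating_system == "Windows":
        os_settings = {"windows": {
            "includedUpdateClassifications": "Critical, Security",
            "rebootSetting": "IfRequired",  # reboot only if needed
        }}
    elif operating_system == "Linux":
        os_settings = {"linux": {
            "includedPackageClassifications": "Critical, Security",
            "rebootSetting": "IfRequired",
        }}
    else:
        raise ValueError(f"unsupported operating system: {operating_system}")

    return {
        "properties": {
            "updateConfiguration": {
                "operatingSystem": operating_system,
                "duration": "PT2H",  # 2-hour maintenance window, ISO 8601 duration
                **os_settings,
            },
            "scheduleInfo": {
                "frequency": "Week",
                "interval": 1,  # once a week
                "startTime": START_TIME,
                "timeZone": TIME_ZONE,
                "advancedSchedule": {"weekDays": [weekday]},
            },
        }
    }


print(json.dumps(build_update_deployment("Windows", "Sunday"), indent=2))
```

Keeping one builder for both OS families makes it harder for Windows and Linux deployments to drift away from the shared policy.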
Environment
In this scenario, we assume we have about a thousand servers distributed across Azure and multiple other cloud and non-cloud environments (OCI, OVH, GCP, on-premises, etc.). Since Update Management is an Azure service, you may think that it is easier to set up for Azure VMs, and that is true.
- For Azure VMs, we only need to deploy one agent, the Log Analytics agent, which collects the machine's logs and reports which updates are missing.
- For non-Azure VMs, we first need to establish a link between the VM and Azure using the Azure ARC agent, so that the VM can be managed like any other Azure resource. Once this is done, we only need to deploy the Log Analytics agent, just as for Azure VMs.
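The onboarding order described above can be sketched as a per-machine plan, which is handy when scripting the rollout over a large fleet. This is a minimal illustration only: the `Machine` type, the `onboarding_steps` helper and the machine names are hypothetical, and the returned strings simply name the agents to deploy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    name: str
    hosted_on_azure: bool  # False for OCI, OVH, GCP, on-premises, etc.


def onboarding_steps(machine: Machine) -> List[str]:
    """Return, in order, the agents to deploy so Update Management can assess the machine."""
    steps: List[str] = []
    if not machine.hosted_on_azure:
        # Non-Azure machines must first be linked to Azure through Azure ARC.
        steps.append("Azure ARC agent")
    # Every machine then needs the Log Analytics agent to report missing updates.
    steps.append("Log Analytics agent")
    return steps


fleet = [Machine("vm-azure-01", True), Machine("srv-onprem-01", False)]
for machine in fleet:
    print(machine.name, "->", ", ".join(onboarding_steps(machine)))
```

The only branch is the hosting location: once a non-Azure machine is linked through Azure ARC, it follows the same path as an Azure VM.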
In terms of OSs, you can find a list of supported OSs in the Microsoft documentation:
Windows OS | Linux OS |
---|---|
Windows Server 2019 | CentOS 6, 7, and 8 |
Windows Server 2016 | Oracle Linux 6.x, 7.x, 8.x |
Windows Server 2012 R2 | Red Hat Enterprise Linux 6, 7, and 8 |
Windows Server 2012 | SUSE Linux Enterprise Server 12, 15, and 15.1 |
Windows Server 2008 R2 | Ubuntu 14.04 LTS, 16.04 LTS, 18.04 LTS, and 20.04 LTS |
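With around a thousand servers, it helps to check the inventory against this table before onboarding, so that unsupported OSs are spotted early and handled separately. The sketch below transcribes the table into a lookup. The `(family, version)` input format, the family names, the rule that minor releases match for families listed by major version only, and the inventory entries are all assumptions for illustration; always check the current list in the Microsoft documentation.

```python
from typing import Dict, Set

# Supported OS versions, transcribed from the table above.
SUPPORTED: Dict[str, Set[str]] = {
    "Windows Server": {"2019", "2016", "2012 R2", "2012", "2008 R2"},
    "CentOS": {"6", "7", "8"},
    "Oracle Linux": {"6", "7", "8"},
    "Red Hat Enterprise Linux": {"6", "7", "8"},
    "SUSE Linux Enterprise Server": {"12", "15", "15.1"},
    "Ubuntu": {"14.04", "16.04", "18.04", "20.04"},
}
# Families listed by major version only: any minor release of a listed major version matches.
MAJOR_ONLY = {"CentOS", "Oracle Linux", "Red Hat Enterprise Linux"}


def is_supported(family: str, version: str) -> bool:
    """Return True if the (family, version) pair appears in the supported OS table."""
    versions = SUPPORTED.get(family)
    if versions is None:
        return False
    if version in versions:
        return True
    return family in MAJOR_ONLY and version.split(".")[0] in versions


# Hypothetical inventory entries: (machine name, OS family, OS version).
inventory = [
    ("srv-web-01", "Windows Server", "2016"),
    ("srv-db-02", "CentOS", "7.9"),
    ("srv-legacy-03", "Windows Server", "2003"),
]
unsupported = [name for name, family, version in inventory if not is_supported(family, version)]
print(unsupported)  # ['srv-legacy-03']
```

Machines flagged this way are candidates for the alternative mitigations mentioned in the note on the update policy.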