Hello there! The goal of this series is to describe a real-world implementation of Azure Update Management. This service is designed to update any machine in your infrastructure, whether it is hosted on Azure or elsewhere, provided its operating system is supported.
So that every reader can understand and find answers to their needs, I’ll try to give comprehensive feedback from my experience, and share tips about design and architecture, automation, efficiency, troubleshooting, and so on!
In the previous posts, we saw how to dynamically patch both Azure VMs and Azure ARC VMs. However, as mentioned before, there is an issue with CentOS VMs: we can’t apply only security and critical patches, which can be a serious problem in a production environment (e.g. service disruption after an update). Let’s see why we encounter this issue, and of course how to work around it!
Context
If you have CentOS machines, you have probably faced issues when trying to apply security updates: a plain yum update may break your applications because it installs every available update, not only security ones, while yum update --security won’t update any package, because it relies on package metadata that is not published for CentOS packages. Alternatively, you could enable only security repositories, but this implies additional maintenance whenever you install non-security-related software on your machines.
When using Azure Automation Update Management, the issue remains. If you take a look at the Microsoft documentation, you’ll see that Update Management classifies updates into three categories: Security, Critical or Others. This is fine, but you will also read: “Unlike other distributions, CentOS does not have classification data available from the package manager. If you have CentOS machines configured to return security data for the following command, Update Management can patch based on classifications.” The command to test is shown below; by default, it does not return anything. This is a real problem, because for this command to work you need to add the metadata to packages yourself, or, more commonly, pay for a service that does it.
When I observed this for the first time, I couldn’t believe it! There is no native or simple way to set up security-only updates on CentOS.
Architecture
A short word about the architecture: if you read my previous posts, you may remember that the architecture is split by environment at the subscription level, and by CSP at the resource group level. As you can see in the diagram below, we have one script per CSP. We didn’t want a single script to handle all of our Azure ARC VMs, mainly because we wanted each automation account to have write permissions on itself only, and not on other automation accounts.
Solution
Of course, the reason I wrote this post is that I spent some time working on it, and I have a solution! In short, the solution is to run a script once a week, for example from an Azure Automation runbook. This script creates a deployment schedule for each CentOS machine that needs security or critical updates. Simple, right?
To be more specific, I actually wrote two very similar scripts: one for Azure VMs, the other for Azure ARC VMs. Why? Because a single script didn’t fit my architecture, but you could merge them into one.
In both cases, the script behaves the same way and checks that each VM meets the following two criteria:
The machine must be a CentOS machine;
The machine must be tagged with a valid patch tag.
If a VM doesn’t meet both criteria, it won’t be patched.
An additional word about tags: in my case, we defined a tag policy so that each machine is updated every week on a specific schedule. Here is the tag value pattern: ^CENTOS-[PQR]-(MON|TUE|WED|THU|FRI|SAT|SUN)-(03|12|22):00$.
[PQR] is the environment: P for Production, Q for Qualif (qualification), R for Recette (acceptance testing). Basically, if we run the script in the production environment, machines whose tag carries the P environment letter (e.g. CENTOS-P-MON-03:00) are updated, and the others are ignored.
(MON|TUE|WED|THU|FRI|SAT|SUN) is the day of the week when the machine must be updated.
(03|12|22):00 is the time of day at which the machine is updated.
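To make the tag format concrete, here is a small Python sketch that validates a tag value against the pattern above and extracts its three parts. It relies only on the standard library; the PatchWindow type and the parse_patch_tag helper are hypothetical names, not taken from the original scripts.

```python
import re
from typing import NamedTuple, Optional

# Tag value pattern from this post, with named groups for each part.
PATCH_TAG_PATTERN = re.compile(
    r"^CENTOS-(?P<env>[PQR])-(?P<day>MON|TUE|WED|THU|FRI|SAT|SUN)-(?P<hour>03|12|22):00$"
)


class PatchWindow(NamedTuple):
    env: str   # "P", "Q" or "R"
    day: str   # three-letter day code, e.g. "MON"
    hour: int  # 3, 12 or 22


def parse_patch_tag(value: str) -> Optional[PatchWindow]:
    """Return the patch window encoded in a tag value, or None if invalid."""
    # fullmatch rejects values with trailing characters, including a newline.
    match = PATCH_TAG_PATTERN.fullmatch(value)
    if match is None:
        return None
    return PatchWindow(match["env"], match["day"], int(match["hour"]))
```

For instance, CENTOS-P-MON-03:00 parses to a production window on Monday at 03:00, while CENTOS-P-MON-04:00 is rejected because 04:00 is not an allowed hour.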
Solution for Azure VMs
Step 1 - Iterate over all machines
First of all, we need to import quite a few Azure dependencies, because our script uses the Azure Python SDK to manage our machines. If you decide to run this code in a runbook, take a look at the Microsoft documentation to import the Python packages into your Automation Account. You will also find references to the relevant documentation in the code.
In this next piece of code, we load values from the Automation Account variables (check the Microsoft documentation to learn how to create them). Here, the variables are the current automation account name, its resource group and its subscription. There may be a way to retrieve these values dynamically from within the runbook, but we decided to declare them explicitly as variables.
Here, we create credentials using DefaultAzureCredential(), which uses the automation account managed identity to make the API calls. We will see later which permissions must be granted to the automation account. Then, we instantiate the different clients used to communicate with the Azure API.
This next piece of code shows two variables that are used later in the code. DAYS is used to find the next available schedule to update the machine. The regex is used to evaluate machine tags: as seen earlier, the pattern differs between production and non-production environments.
Okay, here begins the interesting part: we iterate over all subscriptions, and for each subscription, we iterate over all Azure VMs (there is a small variation for Azure ARC VMs, which I’ll discuss later). In this double loop, we ensure that the patch tag matches the pattern we defined, and that the VM is a CentOS VM. Note: when checking the OS, we need to check both custom images and Azure-provided (marketplace) images.
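A minimal sketch of this loading step, assuming the automationassets module that Azure Automation makes available to Python runbooks, with a fallback to environment variables so the same code can be tested locally. The variable names (AUTOMATION_ACCOUNT_NAME and so on) are hypothetical; use the names you defined in your Automation Account.

```python
import os


def get_variable(name: str) -> str:
    """Read an Automation Account variable, or an environment variable of the
    same name when running outside Azure Automation."""
    try:
        # Only available inside Azure Automation Python runbooks.
        import automationassets
    except ImportError:
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"variable {name!r} is not defined")
        return value
    return automationassets.get_automation_variable(name)


def load_settings() -> dict:
    # Hypothetical variable names: adapt them to your Automation Account.
    return {
        "automation_account": get_variable("AUTOMATION_ACCOUNT_NAME"),
        "resource_group": get_variable("AUTOMATION_RESOURCE_GROUP"),
        "subscription_id": get_variable("AUTOMATION_SUBSCRIPTION_ID"),
    }
```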
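The client setup could look like the following sketch. The imports live inside the function so the module still loads where the Azure packages (azure-identity, azure-mgmt-resource, azure-mgmt-automation, azure-monitor-query) are not installed; the function name and the returned dictionary are illustrative choices, not the author’s code. The compute client is deliberately left out: in the subscription loop, it has to be created once per subscription with ComputeManagementClient(credential, subscription_id).

```python
def build_clients(automation_subscription_id: str) -> dict:
    """Create the shared credential and SDK clients used by the script."""
    # Imported lazily so this file can be loaded without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.automation import AutomationClient
    from azure.mgmt.resource import SubscriptionClient
    from azure.monitor.query import LogsQueryClient

    # Resolves to the Automation Account managed identity when running in
    # Azure, or to a developer login (e.g. Azure CLI) when running locally.
    credential = DefaultAzureCredential()
    return {
        "credential": credential,
        "subscriptions": SubscriptionClient(credential),
        "automation": AutomationClient(credential, automation_subscription_id),
        "logs": LogsQueryClient(credential),
    }
```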
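As an illustration, these two variables could be defined as follows. This is a sketch: tag_regex is a hypothetical helper that rebuilds the tag pattern for a single environment letter, so that a production runbook only matches CENTOS-P-... tags. DAYS is ordered like Python’s datetime.weekday(), where Monday is 0.

```python
import re

# Day codes used in patch tags, in datetime.weekday() order (Monday == 0).
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def tag_regex(environment: str) -> "re.Pattern[str]":
    """Build the tag regex for one environment letter: P, Q or R."""
    if environment not in ("P", "Q", "R"):
        raise ValueError(f"unknown environment {environment!r}")
    return re.compile(rf"^CENTOS-{environment}-({'|'.join(DAYS)})-(03|12|22):00$")
```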
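The filtering logic of this double loop can be sketched without any Azure dependency. Machine is a hypothetical stand-in for the SDK objects: with the Azure SDK, image_offer would typically come from vm.storage_profile.image_reference.offer (marketplace images, possibly None for custom images) and os_name from the VM instance view (custom images). Note that vm.tags may be None, so read it as vm.tags or {}. The PatchSchedule tag name is also an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

PROD_TAG = re.compile(r"^CENTOS-P-(MON|TUE|WED|THU|FRI|SAT|SUN)-(03|12|22):00$")
PATCH_TAG_NAME = "PatchSchedule"  # hypothetical tag name


@dataclass
class Machine:
    """Minimal stand-in for the fields read from an Azure VM."""
    resource_id: str
    tags: Dict[str, str] = field(default_factory=dict)
    image_offer: Optional[str] = None  # marketplace image offer, e.g. "CentOS"
    os_name: Optional[str] = None      # guest-reported OS, useful for custom images


def is_centos(machine: Machine) -> bool:
    # Check the marketplace image first, then the guest-reported OS name,
    # so that VMs built from custom images are not missed.
    for value in (machine.image_offer, machine.os_name):
        if value and "centos" in value.lower():
            return True
    return False


def machines_to_patch(machines: Iterable[Machine], pattern=PROD_TAG) -> List[Machine]:
    """Keep CentOS machines whose patch tag matches the environment pattern."""
    selected = []
    for machine in machines:
        tag = machine.tags.get(PATCH_TAG_NAME, "")
        if pattern.match(tag) and is_centos(machine):
            selected.append(machine)
    return selected
```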
Step 2 - Retrieve all updates
We now have a list of machines to update, and we need to get all available updates for each of them. For this, we send a KQL query to the Log Analytics workspace that collects update information, and store the result in the df variable.
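Here is a hedged sketch of this step. The KQL query assumes the standard Update table columns (Computer, Product, Classification, UpdateState, OSType) and that your workspace actually reports a classification for your CentOS packages; adjust the filter to what your data contains. The Python part only groups the returned rows per computer, roughly the role of the df variable. Once rows are fetched (for example with LogsQueryClient.query_workspace from azure-monitor-query) and turned into dictionaries, they can be passed to packages_by_computer, a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical query: keep the latest assessment per computer and package,
# then only missing security or critical updates.
UPDATE_QUERY = """
Update
| where TimeGenerated > ago(1d) and OSType == "Linux"
| summarize arg_max(TimeGenerated, *) by Computer, Product
| where UpdateState == "Needed"
| where Classification in ("Security Updates", "Critical Updates")
| project Computer, Product, Classification
"""


def packages_by_computer(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group query rows into {computer: sorted list of unique package names}."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["Computer"]].add(row["Product"])
    return {computer: sorted(packages) for computer, packages in grouped.items()}
```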
Step 3 - Deploy or update deployment schedules
In this last part of the script, we iterate over the machines to update, and check in our df variable whether the current machine needs updates:
if it doesn’t, we ignore it and go directly to the next loop iteration;
if it does, we calculate the next time the machine should be updated, based on its patch tag, and then we instantiate the objects needed to create the deployment schedule in our automation account.
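Computing the next schedule from a patch tag boils down to a bit of date arithmetic. The sketch below is one way to do it, not the original implementation: it assumes the tag was already validated, uses naive datetimes expressed in the time zone you later set on the deployment schedule, and adds an arbitrary 10-minute margin so the start time is never in the past when the schedule is created.

```python
from datetime import datetime, timedelta

# Same order as datetime.weekday(): Monday == 0.
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def next_run(tag: str, now: datetime, min_lead: timedelta = timedelta(minutes=10)) -> datetime:
    """Return the next datetime matching a tag such as 'CENTOS-P-MON-03:00'.

    The result is at least `min_lead` after `now`.
    """
    _, _, day, time_part = tag.split("-")
    hour = int(time_part.split(":")[0])
    # Days until the tagged weekday (0 if it is today).
    days_ahead = (DAYS.index(day) - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    # This week's slot already passed (or is too close): use next week's.
    if candidate < now + min_lead:
        candidate += timedelta(days=7)
    return candidate
```

For example, on Monday 2024-01-01 at 10:00, a CENTOS-P-MON-03:00 tag yields Monday 2024-01-08 at 03:00, because this week’s slot has already passed.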
Permissions
For this script to work, the identity used to run it must have the following permissions:
Virtual Machine Contributor role on each subscription or resource group containing CentOS VMs to update;
Log Analytics Reader role on the Log Analytics workspace that collects update data;
Automation Contributor role on the Automation Account you use for patch management.
If you run the script on Azure, you should grant these permissions to the Automation Account system-assigned managed identity (see the Microsoft documentation).
Run the script
Once you have run the script, it deploys one deployment schedule per VM, as shown below. If you click on a deployment schedule, then go to Include/exclude updates, you will see the explicit list of all updates to be installed on the VM. When a deployment schedule is configured like this, Azure no longer uses the yum install --security command, but the yum install <package1> <package2> <packagen> command instead!
Solution for Azure ARC VMs
The script behavior is almost the same for Azure ARC VMs. The only things that change are the object properties, because we now work with the Microsoft.HybridCompute/machines resource type rather than Microsoft.Compute/virtualMachines. Here is the code.
Step 1 - Iterate over all machines
As we did for the Azure VM version, we need to import quite a few Azure dependencies, because our script uses the Azure Python SDK to manage our machines. If you decide to run this code in a runbook, take a look at the Microsoft documentation to import the Python packages accordingly. You will also find references to the relevant documentation in the code.
In this next piece of code, we load values from the Automation Account variables (check the Microsoft documentation to learn how to create them). Here, the variables are the current automation account name, its resource group and its subscription. There may be a way to retrieve these values dynamically from within the runbook, but we decided to declare them explicitly as variables.
Here, we create credentials using DefaultAzureCredential(), which uses the automation account managed identity to make the API calls. We will see later which permissions must be granted to the automation account. Then, we instantiate the different clients used to communicate with the Azure API.
This next piece of code shows two variables that are used later in the code. DAYS is used to find the next available schedule to update the machine. The regex is used to evaluate machine tags: as seen earlier, the pattern differs between production and non-production environments.
Here begins the interesting part: in my architecture, all Azure ARC VMs coming from OVH are deployed in an OVH resource group. For this reason, the script iterates over all the machines belonging to a single resource group. In this simple loop, we ensure that the patch tag matches the pattern we defined, and that the VM is a CentOS VM.
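A sketch of this single-resource-group loop follows. It assumes the azure-mgmt-hybridcompute package and its machines.list_by_resource_group operation; since the OS fields exposed on Arc machines depend on the API version, the code reads os_sku defensively with getattr and falls back to os_name (which may only say "linux", in which case the machine is simply not selected). The PatchSchedule tag name and both helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List

PATCH_TAG_NAME = "PatchSchedule"  # hypothetical tag name
PROD_TAG = re.compile(r"^CENTOS-P-(MON|TUE|WED|THU|FRI|SAT|SUN)-(03|12|22):00$")


def list_arc_machines(credential, subscription_id: str, resource_group: str) -> Iterator[Dict]:
    """Yield simplified records for the Arc machines of one resource group."""
    # Imported lazily so the filtering logic can be tested without the SDK.
    from azure.mgmt.hybridcompute import HybridComputeManagementClient

    client = HybridComputeManagementClient(credential, subscription_id)
    for machine in client.machines.list_by_resource_group(resource_group):
        yield {
            "name": machine.name,
            "tags": machine.tags or {},
            # os_sku may not exist depending on the SDK/API version.
            "os": getattr(machine, "os_sku", None) or getattr(machine, "os_name", None) or "",
        }


def select_arc_machines(machines: Iterable[Dict], pattern=PROD_TAG) -> List[str]:
    """Names of CentOS machines carrying a valid patch tag."""
    return [
        m["name"]
        for m in machines
        if pattern.match(m["tags"].get(PATCH_TAG_NAME, "")) and "centos" in m["os"].lower()
    ]
```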
Step 2 - Retrieve all updates
We now have a list of machines to update, and we need to get all available updates for each of them. For this, we send a KQL query to the Log Analytics workspace that collects update information, and store the result in the df variable.
Step 3 - Deploy or update deployment schedules
In this last part of the script, we iterate over the machines to update, and check in our df variable whether the current machine needs updates:
if it doesn’t, we ignore it and go directly to the next loop iteration;
if it does, we calculate the next time the machine should be updated, based on its patch tag, and then we instantiate the objects needed to create the deployment schedule in our automation account.
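For reference, a deployment schedule is a software update configuration of the Automation Account. The sketch below builds its request body as a plain dictionary. The field names follow our understanding of the Azure Automation softwareUpdateConfigurations REST API and should be checked against the API reference; the two-hour duration, the reboot setting and the schedule naming scheme are hypothetical choices. We assume Arc machines are targeted by computer name through nonAzureComputerNames, whereas Azure VMs would be listed by resource ID in azureVirtualMachines.

```python
from datetime import datetime
from typing import List, Optional


def schedule_name(computer: str) -> str:
    # Hypothetical naming scheme: one deterministic name per machine.
    return f"centos-security-{computer}"


def build_schedule_body(
    packages: List[str],
    start: datetime,
    time_zone: str = "UTC",
    azure_vm_ids: Optional[List[str]] = None,
    non_azure_names: Optional[List[str]] = None,
) -> dict:
    """Body for a one-time Linux deployment that installs an explicit package list."""
    return {
        "properties": {
            "updateConfiguration": {
                "operatingSystem": "Linux",
                "duration": "PT2H",  # ISO 8601 maintenance window length
                "linux": {
                    # Explicit package names: no reliance on classifications.
                    "includedPackageNameMasks": packages,
                    "rebootSetting": "IfRequired",
                },
                "azureVirtualMachines": azure_vm_ids or [],
                "nonAzureComputerNames": non_azure_names or [],
            },
            "scheduleInfo": {
                "frequency": "OneTime",
                "startTime": start.isoformat(),
                "timeZone": time_zone,
            },
        }
    }
```

Deriving the name from the machine means a later run targets the same schedule again, instead of creating a new one each week.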
Permissions
For this script to work, the identity used to run it must have the following permissions:
Azure Connected Machine Resource Administrator role on each subscription or resource group containing CentOS machines to update;
Log Analytics Reader role on the Log Analytics workspace that collects update data;
Automation Contributor role on the Automation Account you use for patch management.
If you run the script on Azure, you should grant these permissions to the Automation Account system-assigned managed identity (see the Microsoft documentation).
Run the script
Once you have run the script, it deploys one deployment schedule per VM, as shown below. If you click on a deployment schedule, then go to Include/exclude updates, you will see the explicit list of all updates to be installed on the VM. When a deployment schedule is configured like this, Azure no longer uses the yum install --security command, but the yum install <package1> <package2> <packagen> command instead!
Deployment schedule lifecycle
What I haven’t told you yet is that these deployment schedules run only once, because they contain a specific list of packages that probably won’t need patching the following week. Once a deployment schedule has been executed, you will see that its Next run time is set to None. You have two options.
Option 1 - Clean deployment schedules
To remove the deployment schedules that will no longer run, you can use this simple script:
Option 2 - Keep deployment schedules
The other option is simply to leave your deployment schedules as they are. Why? Because the next time you run the CentOS patch script, the schedules it deploys will simply update the existing ones, with a new list of packages to patch and a new start time.
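A minimal sketch of such a cleanup, kept independent of the SDK: it receives the schedules as dictionaries with a name and a next_run value, plus a delete callback. The centos-security- prefix is a hypothetical naming convention used to avoid touching unrelated schedules. With azure-mgmt-automation, the list and delete calls would come from the client’s software_update_configurations operations; check their exact signatures in the SDK reference before wiring them in.

```python
from typing import Callable, Iterable, List, Mapping, Optional


def stale_schedules(items: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Names of deployment schedules whose next run time is empty."""
    return [item["name"] for item in items if not item.get("next_run")]


def clean_schedules(
    items: Iterable[Mapping[str, Optional[str]]],
    delete: Callable[[str], None],
    prefix: str = "centos-security-",  # hypothetical naming convention
) -> List[str]:
    """Delete stale schedules created by the patch script; return their names."""
    removed = []
    for name in stale_schedules(items):
        if name.startswith(prefix):  # leave unrelated schedules alone
            delete(name)
            removed.append(name)
    return removed
```

Passing the deletion as a callback keeps the selection logic testable, and makes it easy to start with a dry run (e.g. delete=print) before actually removing anything.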
I hope this will help you patch your CentOS machines and leverage all the Azure Automation Updates capabilities. Thanks for reading!