Automating upgrade of Cisco Switches using Ansible

What's the challenge?

Upgrading 1–15 switches can be done manually. But what about 250+ switches of different models and versions? It is challenging and time-consuming, and a simple human error can lead to many hours of troubleshooting. This post explains the steps to automate this process using Ansible. At the end of this implementation, the following benefits can be achieved:

  • Automated verifications

Build your inventory first…

Before anything else, an Ansible inventory should be built for the infrastructure. The following YAML template is used to build the inventory.

all:
  children:
    switches:
      children:
        Prod_Sites:
    Prod_Sites:
      children:
        ABC:
        DEF:
        GHI:
      vars:
        ansible_connection: network_cli
        ansible_network_os: ios
        ansible_user: XXXXXX
        ansible_password: XXXXXX
    ABC:
      hosts:
        SITE - ABC | SW - ABC-SW1 | IP - 10.10.10.1:
          ansible_host: '10.10.10.1'
        SITE - ABC | SW - ABC-SW2 | IP - 10.10.10.2:
          ansible_host: '10.10.10.2'
    DEF:
      hosts:
        SITE - DEF | SW - DEF-SW1 | IP - 10.10.20.1:
          ansible_host: '10.10.20.1'
        SITE - DEF | SW - DEF-SW2 | IP - 10.10.20.2:
          ansible_host: '10.10.20.2'
    GHI:
      hosts:
        SITE - GHI | SW - GHI-SW1 | IP - 10.10.30.1:
          ansible_host: '10.10.30.1'
        SITE - GHI | SW - GHI-SW2 | IP - 10.10.30.2:
          ansible_host: '10.10.30.2'

First, a branch is created to define the production switches.

  children:
    switches:
      children:
        Prod_Sites:

Then the credentials and variables common to all switches are defined under Prod_Sites. Prod_Sites includes many sites, as shown below (ABC, DEF and GHI are example site codes). In our case, we have 40+ sites.

    Prod_Sites:
      children:
        ABC:
        DEF:
        GHI:
      vars:
        ansible_connection: network_cli
        ansible_network_os: ios
        ansible_user: XXXXXX
        ansible_password: XXXXXX

The following variables have been defined here in addition to the SSH credentials.

Connection Method — ansible_connection

OS of the connected host — ansible_network_os

Next, the host IPs of each switch have been defined. Site code, switch hostname and IP have been added for easy reference.

    ABC:
      hosts:
        SITE - ABC | SW - ABC-SW1 | IP - 10.10.10.1:
          ansible_host: '10.10.10.1'
        SITE - ABC | SW - ABC-SW2 | IP - 10.10.10.2:
          ansible_host: '10.10.10.2'

If you only have 10–20 switches, you can easily create this inventory in YAML format by hand. But what if you have 250+ switches, or 1000+? That is not practical. There is an easy way to populate your inventory YAML file using another simple Ansible script. Read this blog post. It explains how to convert the infrastructure CSV inventory into YAML.

Folder Structure…

Now the inventory has been populated for all sites with 250+ switches. Let’s look at the folder structure of my upgrade folder.

There is a sub-folder called vars. It holds a set of YAML files, each named after a switch model. In our case, there are 15+ different models, and there is a separate YAML file in the vars folder for each model. The content of each YAML file is explained in the next steps.

Next… upgrade script…

Let’s start the upgrade script…

Create the Backup Folder

These tasks will be run on localhost.

Get ansible date/time facts

- hosts: localhost
  tasks:
    - name: Get ansible date/time facts
      setup:
        filter: "ansible_date_time"
        gather_subset: "!all"

Here, date and time information is collected from the localhost. In the next step, it is saved as a variable.

Store DTG as fact

    - name: Store DTG as fact
      set_fact:
        DTG: "{{ ansible_date_time.date }}"

Create a config backup folder

Next, a folder is created to back up the switch configurations. The folder is named after the current date.

    - name: Create a config backup folder
      file:
        path: /mnt/c/Users/Public/ios-upgrade-automation/backups/{{hostvars.localhost.DTG}}
        state: directory
      run_once: true

Configuration Backup

These tasks will be run on each cisco switch.

- name: Main Play for Cisco IOS Upgrade
  hosts: # Enter Site Code Here
  serial: 1
  gather_facts: false
  connection: local
  vars_files:
    - ["vars/{{ ansible_net_model }}.yml"]

There are a few key parameters to be defined here.

hosts:

  • Site code should be defined. As per our inventory, it can be ABC, DEF or GHI. The script will only run on the specified site.

serial:

  • This is used to serialize the execution. If this is removed, the script runs on all switches at the site in parallel. If 1 is defined here, automation runs on one switch first. Once it is successfully upgraded, it runs on the next switch in the inventory.

vars_files:

  • As mentioned earlier, there are several different models in the infrastructure. The IOS file, upgrade version and MD5 value of the IOS file vary depending on the switch model, so one variable file is loaded per model.

  vars_files:
    - ["vars/{{ ansible_net_model }}.yml"]

Each variable file in the vars folder has the following content.

---
ios_version: 15.XXXX
ios_file: c2960s-universalk9XXXXX.bin
ios_md5: ea604d030b378b6c5c3xxxxxxxxxxx
ios_size: 20000
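
The model name only becomes known once ios_facts has run, so one way to make the load order explicit is an include_vars task placed after the facts task. The following is a minimal sketch of that idea, not the script used in this post. It assumes the cisco.ios collection is installed, the play name is made up, and ABC is simply the example site code from the inventory.

```yaml
# Sketch only: explicit per-model variable loading.
# Assumes the cisco.ios collection is installed and vars/<model>.yml exists.
- name: Load upgrade variables per switch model (illustrative play)
  hosts: ABC          # example site code from the inventory
  serial: 1
  gather_facts: false
  tasks:
    - name: Collect IOS facts so that ansible_net_model is defined
      cisco.ios.ios_facts:

    - name: Load the variable file that matches the switch model
      ansible.builtin.include_vars:
        file: "vars/{{ ansible_net_model }}.yml"

    - name: Make sure every expected key was loaded
      ansible.builtin.assert:
        that:
          - ios_version is defined
          - ios_file is defined
          - ios_md5 is defined
          - ios_size is defined
        fail_msg: "vars/{{ ansible_net_model }}.yml is missing a required key"
```

The assert at the end is optional. It only catches a variable file with a missing key before any upgrade step runs.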

Backup switch running-config

Next, a set of tasks runs under the tasks section. The first task collects Cisco IOS facts. Then the switch running configuration is backed up using ios_command and registered to a variable called config. After that, the registered output is saved to a file in the backup folder. The file is named using the IP address of the switch and the date.

  tasks:
    # Collect IOS facts
    - name: Collect IOS facts
      ios_facts:
    # Backup switch running config
    - name: Backup switch running config
      ios_command:
        commands: show run
      register: config
    - name: Save output to /mnt/c/Users/Public/ios-upgrade-automation/backups/
      copy:
        content: "{{config.stdout[0]}}"
        dest: "/mnt/c/Users/Public/ios-upgrade-automation/backups/{{hostvars.localhost.DTG}}/{{ ansible_host }}-{{hostvars.localhost.DTG}}-runningconfig.txt"
      tags:
        - runconfig
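
As an alternative to the show run plus copy pair, the ios_config module has its own backup option that writes the running config to a local file. The sketch below is only an illustration of that option and is not part of the original script. It assumes the cisco.ios collection is installed and reuses the DTG fact and backup path from the steps above.

```yaml
# Sketch only: let ios_config write the running-config backup itself.
# Assumes the cisco.ios collection; path and file name follow the post's naming.
- name: Back up the running config with the module's backup option
  cisco.ios.ios_config:
    backup: true
    backup_options:
      dir_path: "/mnt/c/Users/Public/ios-upgrade-automation/backups/{{ hostvars.localhost.DTG }}"
      filename: "{{ ansible_host }}-{{ hostvars.localhost.DTG }}-runningconfig.txt"
  tags:
    - runconfig
```

This covers the running config only. The startup config still needs the show start approach described next.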

Backup switch startup-config

Next, the startup-config is backed up in the same way. Once that is done, the running config is saved using ios_config with save_when.

    - name: Backup switch startup config
      ios_command:
        commands: show start
      register: config
    - name: Save output to /mnt/c/Users/Public/ios-upgrade-automation/backups/
      copy:
        content: "{{config.stdout[0]}}"
        dest: "/mnt/c/Users/Public/ios-upgrade-automation/backups/{{hostvars.localhost.DTG}}/{{ ansible_host }}-{{hostvars.localhost.DTG}}-startupconfig.txt"
      tags:
        - startconfig
    # Save the running config
    - name: Save running config
      ios_config:
        save_when: always
      vars:
        ansible_command_timeout: 120

Next, we come to our first verification. The current status of the boot variable is checked using the ‘show boot | i BOOT’ command, and the output is registered to a variable called bootvar. This is used later to skip switches where the boot variable is already set to the new IOS file.

Check boot path

    - name: Check boot path
      ios_command:
        commands: 'show boot | i BOOT'
      register: bootvar
      tags:
        - bootvar

This is the output when this command is run on switch CLI.

X-XXX-01-SW1#show boot | i BOOT
BOOT path-list : flash:cxxxx-universalk9-mz.xxxx.xx.bin;flash2:cxxxx-universalk9-mz.xxxx.xx.bin

Identify Stack Switches

Next, the main upgrade block starts. It checks free space, copies the IOS, sets the boot variable and upgrades the switch. First, the output of “show file systems | include flash” is saved to a variable called flash_values. This is used in the next steps of the script to identify stack switches.

Check availability of flashes

    - name: Main Block - Copy/Verify/Reboot when switch IOS version is non-compliant
      block:
        - name: Check availability of flashes
          ios_command:
            commands: "show file systems | include flash"
          register: flash_values

Here is the CLI output on a 3-switch stack:

X-XXX-01-SW1#show file systems | include flash
* 122185728 52075008 flash rw flash: flash1:
122185728 46614528 flash rw flash2:
122185728 46624768 flash rw flash3:
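
To see how the stack members can be read out of this output, the following sketch runs on localhost against a copy of the sample output above, so no switch is needed. The variable names sample_flash_output and flash_list are made up for this illustration, and the regular expression is an assumption about how the file systems are named (flash:, flash1:, flash2: and so on).

```yaml
# Sketch only: derive the list of flash file systems from the command output.
- name: Identify flash file systems from sample output
  hosts: localhost
  gather_facts: false
  vars:
    # Copy of the CLI output shown above (hypothetical variable name)
    sample_flash_output: |
      * 122185728 52075008 flash rw flash: flash1:
      122185728 46614528 flash rw flash2:
      122185728 46624768 flash rw flash3:
  tasks:
    - name: Extract every flash file system name
      ansible.builtin.set_fact:
        flash_list: "{{ sample_flash_output | regex_findall('flash[0-9]*:') | unique }}"

    - name: Show the detected file systems
      ansible.builtin.debug:
        var: flash_list
```

The word flash in the type column is not followed by a colon, so it is not picked up. In the real play the same filter could be applied to flash_values.stdout[0].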

Skip if IOS is already copied

In our environment, there are single switches and stacks of 2, 3 and 4 switches. Before the IOS is copied to each available flash, it should be verified that the IOS is not already there. To check that, the ‘show flash: | include {{ ios_file }}’ command is run first and the output is registered to a variable.

There are 5 tasks here. Each task checks a specific flash. The first one checks flash:, the next flash1:, then flash2:, and so on.

Check if IOS is already present on the flash

        - name: Check if IOS is already present on the flash
          ios_command:
            commands: 'show flash: | include {{ ios_file }}'
          register: ios_in_flash0
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is already present on the flash1
          ios_command:
            commands: 'show flash1: | include {{ ios_file }}'
          register: ios_in_flash1
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash1:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is already present on the flash2
          ios_command:
            commands: 'show flash2: | include {{ ios_file }}'
          register: ios_in_flash2
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash2:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is already present on the flash3
          ios_command:
            commands: 'show flash3: | include {{ ios_file }}'
          register: ios_in_flash3
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash3:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is already present on the flash4
          ios_command:
            commands: 'show flash4: | include {{ ios_file }}'
          register: ios_in_flash4
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash4:" in flash_values.stdout[0]'
          tags:
            - availabilityofios

There is a when condition added under each task. Each task will only be executed when the conditions are satisfied.

          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash:" in flash_values.stdout[0]'

In a previous step, output of ‘show boot | i BOOT’ command was registered to a variable called bootvar.

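  • ‘“{{ ios_file }}” not in bootvar.stdout[0]’: This condition makes sure that the task runs only when the new IOS file name is not in the output of bootvar. Therefore, if the boot variable is already set to the new IOS, the task is skipped.

  • ‘“flash:” in flash_values.stdout[0]’: This condition makes sure that the task runs only when that flash appears in the output saved in flash_values, in other words when that stack member exists.

The above two conditions are used in many tasks in this script. Next, we start checking free space before copying the IOS.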
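
The five near-identical checks could also be written as one looped task. The sketch below is an illustration only and is not the script used in this post. It assumes the cisco.ios collection is installed, reuses bootvar, flash_values and ios_file from the steps above, and introduces a made-up register name, ios_in_flash. The regular expression is an assumption about how the flash file systems are named.

```yaml
# Sketch only: one looped task in place of the five per-flash checks.
- name: Check every detected flash file system for the new IOS file
  cisco.ios.ios_command:
    commands: "show {{ item }} | include {{ ios_file }}"
  register: ios_in_flash          # hypothetical name; results land in ios_in_flash.results
  loop: "{{ flash_values.stdout[0] | regex_findall('flash[0-9]*:') | unique }}"
  when:
    - ios_file not in bootvar.stdout[0]

- name: Report the file systems that still need the image
  ansible.builtin.debug:
    msg: "{{ item.item }} is missing {{ ios_file }}"
  loop: "{{ ios_in_flash.results }}"
  loop_control:
    label: "{{ item.item }}"
  when:
    - item is not skipped
    - ios_file not in item.stdout[0]
```

With this layout, later tasks would have to read each result from ios_in_flash.results instead of ios_in_flash0 to ios_in_flash4, so the rest of this post keeps the explicit per-flash tasks.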

Check Free Space

Delete the current IOS if current space is not sufficient

        - name: Delete the current IOS if current space is not sufficient
          cli_command:
            command: "delete /force {{ ansible_net_image }}"
          when:
            - ansible_net_filesystems_info['flash:']['spacefree_kb'] < (ios_size | int)
            - '"{{ ios_file }}" not in ios_in_flash0.stdout[0]'

First, the existing IOS is deleted if there is not enough space to copy the new IOS file. A value collected by ios_facts, ansible_net_filesystems_info[‘flash:’][‘spacefree_kb’], is compared against the ios_size variable in the var file. The when condition makes sure that the IOS is deleted only when there is not enough space.

        - name: Assert that there is enough flash space for upload
          # spacefree_kb comes from the facts gathered earlier; re-run ios_facts
          # after a delete if the free space needs to be read again.
          assert:
            that:
              - ansible_net_filesystems_info['flash:']['spacefree_kb'] > (ios_size | int)
            msg: "There is not enough space left on the device's flash...stopping the upgrade!!!"
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"{{ ios_file }}" not in ios_in_flash0.stdout[0]'
          tags:
            - enoughflashspace

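Next, the Ansible assert module is used to make sure that there is enough space to copy the new IOS. If this is not the case, the assertion fails and the upgrade of that switch stops before anything is copied. Note that with serial: 1 a failed host is the whole batch, so Ansible normally ends the play at that point rather than moving on to the next host.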

Once the space check is successfully completed, it starts copying IOS to all available flashes using FTP.

Copy IOS to flash

Copy new IOS to target device flash

        - name: Copy new IOS to target device flash - This could take up to 4 minutes
          block:
            - name: Copy new IOS to target device flash - This could take up to 4 minutes
              ios_command:
                commands:
                  - command: copy ftp://10.X.X.X/{{ ios_file }} flash:/{{ ios_file }}
                    prompt: '{{ ios_file }}'
                    answer: "\r"
              vars:
                ansible_command_timeout: 3600
              when:
                - '"{{ ios_file }}" not in ios_in_flash0.stdout[0]'
          when:
            - '"flash:" in flash_values.stdout[0]'
        - name: Copy new IOS to target device flash1 - This could take up to 4 minutes
          block:
            - name: Copy new IOS to target device flash1 - This could take up to 4 minutes
              ios_command:
                commands:
                  - command: copy ftp://10.X.X.X/{{ ios_file }} flash1:/{{ ios_file }}
                    prompt: '{{ ios_file }}'
                    answer: "\r"
              vars:
                ansible_command_timeout: 3600
              when:
                - '"{{ ios_file }}" not in ios_in_flash1.stdout[0]'
          when:
            - '"flash1:" in flash_values.stdout[0]'
        - name: Copy new IOS to target device flash2 - This could take up to 4 minutes
          block:
            - name: Copy new IOS to target device flash2 - This could take up to 4 minutes
              ios_command:
                commands:
                  - command: copy ftp://10.X.X.X/{{ ios_file }} flash2:/{{ ios_file }}
                    prompt: '{{ ios_file }}'
                    answer: "\r"
              vars:
                ansible_command_timeout: 3600
              when:
                - '"{{ ios_file }}" not in ios_in_flash2.stdout[0]'
          when:
            - '"flash2:" in flash_values.stdout[0]'
        - name: Copy new IOS to target device flash3 - This could take up to 4 minutes
          block:
            - name: Copy new IOS to target device flash3 - This could take up to 4 minutes
              ios_command:
                commands:
                  - command: copy ftp://10.X.X.X/{{ ios_file }} flash3:/{{ ios_file }}
                    prompt: '{{ ios_file }}'
                    answer: "\r"
              vars:
                ansible_command_timeout: 3600
              when:
                - '"{{ ios_file }}" not in ios_in_flash3.stdout[0]'
          when:
            - '"flash3:" in flash_values.stdout[0]'
        - name: Copy new IOS to target device flash4 - This could take up to 4 minutes
          block:
            - name: Copy new IOS to target device flash4 - This could take up to 4 minutes
              ios_command:
                commands:
                  - command: copy ftp://10.X.X.X/{{ ios_file }} flash4:/{{ ios_file }}
                    prompt: '{{ ios_file }}'
                    answer: "\r"
              vars:
                ansible_command_timeout: 3600
              when:
                - '"{{ ios_file }}" not in ios_in_flash4.stdout[0]'
          when:
            - '"flash4:" in flash_values.stdout[0]'

Once copying is completed, the ‘show flash: | include {{ ios_file }}’ command is run again on each available flash. Then, the Ansible assert module is used to make sure that the new IOS is now available in each flash.

Check if IOS is now present on the flash

        - name: Check if IOS is now present on the flash
          ios_command:
            commands: 'show flash: | include {{ ios_file }}'
          register: ios_in_flash0
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is now present on the flash1
          ios_command:
            commands: 'show flash1: | include {{ ios_file }}'
          register: ios_in_flash1
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash1:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is now present on the flash2
          ios_command:
            commands: 'show flash2: | include {{ ios_file }}'
          register: ios_in_flash2
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash2:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is now present on the flash3
          ios_command:
            commands: 'show flash3: | include {{ ios_file }}'
          register: ios_in_flash3
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash3:" in flash_values.stdout[0]'
          tags:
            - availabilityofios
        - name: Check if IOS is now present on the flash4
          ios_command:
            commands: 'show flash4: | include {{ ios_file }}'
          register: ios_in_flash4
          when:
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
            - '"flash4:" in flash_values.stdout[0]'
          tags:
            - availabilityofios

Assert that the new IOS is now available in flash

        - name: Assert that the new IOS is now available in flash
          assert:
            that:
              - '"{{ ios_file }}" in ios_in_flash0.stdout[0]'
            msg: "New ios is not available in flash...stopping the upgrade!!!"
          when:
            - '"flash:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Assert that the new IOS is now available in flash1
          assert:
            that:
              - '"{{ ios_file }}" in ios_in_flash1.stdout[0]'
            msg: "New ios is not available in flash1...stopping the upgrade!!!"
          when:
            - '"flash1:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Assert that the new IOS is now available in flash2
          assert:
            that:
              - '"{{ ios_file }}" in ios_in_flash2.stdout[0]'
            msg: "New ios is not available in flash2...stopping the upgrade!!!"
          when:
            - '"flash2:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Assert that the new IOS is now available in flash3
          assert:
            that:
              - '"{{ ios_file }}" in ios_in_flash3.stdout[0]'
            msg: "New ios is not available in flash3...stopping the upgrade!!!"
          when:
            - '"flash3:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Assert that the new IOS is now available in flash4
          assert:
            that:
              - '"{{ ios_file }}" in ios_in_flash4.stdout[0]'
            msg: "New ios is not available in flash4...stopping the upgrade!!!"
          when:
            - '"flash4:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'

Check & Verify MD5

Next, MD5 verification will happen. First, ios_command module is used to run “verify /md5 flash:{{ ios_file }}” command on each available flash and register the output to a different variable.

Check MD5 value of copied IOS in flash

        - name: Check MD5 value of copied IOS in flash
          ios_command:
            commands: "verify /md5 flash:{{ ios_file }}"
          vars:
            ansible_command_timeout: 3600
          register: var_ios_md5_final0
          when:
            - '"flash:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Check MD5 value of copied IOS in flash1
          ios_command:
            commands: "verify /md5 flash1:{{ ios_file }}"
          vars:
            ansible_command_timeout: 3600
          register: var_ios_md5_final1
          when:
            - '"flash1:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Check MD5 value of copied IOS in flash2
          ios_command:
            commands: "verify /md5 flash2:{{ ios_file }}"
          vars:
            ansible_command_timeout: 3600
          register: var_ios_md5_final2
          when:
            - '"flash2:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Check MD5 value of copied IOS in flash3
          ios_command:
            commands: "verify /md5 flash3:{{ ios_file }}"
          vars:
            ansible_command_timeout: 3600
          register: var_ios_md5_final3
          when:
            - '"flash3:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Check MD5 value of copied IOS in flash4
          ios_command:
            commands: "verify /md5 flash4:{{ ios_file }}"
          vars:
            ansible_command_timeout: 3600
          register: var_ios_md5_final4
          when:
            - '"flash4:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'

Then the registered MD5 values are compared against the MD5 value in the variable file. If an MD5 value does not match, the script breaks and the upgrade fails with the “MD5 is not matching with original…stopping the upgrade!!!” error message.

Verify MD5 is matching with the original IOS

        # ios_md5 is referenced as a bare variable: an unquoted {{ ... }} at the
        # start of a list item is not valid YAML.
        - name: Verify MD5 is matching with original IOS - flash
          assert:
            that:
              - ios_md5 == var_ios_md5_final0.stdout[0].split(' = ')[1]
            msg: "MD5 is not matching with original...stopping the upgrade!!!"
          when:
            - '"flash:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Verify MD5 is matching with original IOS - flash1
          assert:
            that:
              - ios_md5 == var_ios_md5_final1.stdout[0].split(' = ')[1]
            msg: "MD5 is not matching with original...stopping the upgrade!!!"
          when:
            - '"flash1:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Verify MD5 is matching with original IOS - flash2
          assert:
            that:
              - ios_md5 == var_ios_md5_final2.stdout[0].split(' = ')[1]
            msg: "MD5 is not matching with original...stopping the upgrade!!!"
          when:
            - '"flash2:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Verify MD5 is matching with original IOS - flash3
          assert:
            that:
              - ios_md5 == var_ios_md5_final3.stdout[0].split(' = ')[1]
            msg: "MD5 is not matching with original...stopping the upgrade!!!"
          when:
            - '"flash3:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
        - name: Verify MD5 is matching with original IOS - flash4
          assert:
            that:
              - ios_md5 == var_ios_md5_final4.stdout[0].split(' = ')[1]
            msg: "MD5 is not matching with original...stopping the upgrade!!!"
          when:
            - '"flash4:" in flash_values.stdout[0]'
            - '"{{ ios_file }}" not in bootvar.stdout[0]'
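
The comparison logic can be tried without a switch. The sketch below runs on localhost against a made-up line in the shape that the split on ' = ' expects. The hash, the file name and the variable sample_verify_output are placeholders for this illustration only, not real values.

```yaml
# Sketch only: check the MD5 parsing logic against sample text on localhost.
- name: Illustrate the MD5 comparison
  hosts: localhost
  gather_facts: false
  vars:
    # Placeholder hash, quoted so that YAML always reads it as a string
    ios_md5: "0123456789abcdef0123456789abcdef"
    # Placeholder line in the assumed "... = <hash>" shape
    sample_verify_output: "verify /md5 (flash:example.bin) = 0123456789abcdef0123456789abcdef"
  tasks:
    - name: Compare the expected hash with the text after the equals sign
      ansible.builtin.assert:
        that:
          - ios_md5 == (sample_verify_output.split(' = ')[1] | trim)
        fail_msg: "MD5 is not matching with original"
        success_msg: "MD5 matches"
```

The trim filter is an extra safeguard against trailing whitespace in the command output. Quoting the hash in the variable file is also worth considering, since a hash made up only of digits would otherwise be read as a number.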

Once all MD5 verifications have been passed, switch boot variable is set to the new IOS file.

Change Boot Variable

Change boot variable to new image

        - name: Change boot variable to new image
          ios_config:
            commands:
              - "boot system flash:{{ ios_file }}"
            save_when: always
          vars:
            ansible_command_timeout: 120
          when:
            - ansible_net_version != "{{ ios_version }}"
            - '"{{ ios_file }}" not in bootvar.stdout[0]'

Once it is configured and saved, the following verification steps are run to assure that the boot variable has been accurately configured.

Assert that the boot path is set to the new IOS

        - name: Check Boot path
          ios_command:
            commands: 'show boot | i BOOT'
          register: bootvar
          when:
            - ansible_net_version != "{{ ios_version }}"
          tags:
            - bootvar
        - name: Assert that the boot path is set to the new IOS
          assert:
            that:
              - '"{{ ios_file }}" in bootvar.stdout[0]'
            msg: "Boot path is not set to the new image...stopping the upgrade!!!"
          when:
            - ansible_net_version != "{{ ios_version }}"
          tags:
            - bootvar

Reboot the Switch

Then the switch is rebooted. Localhost will wait and check until switch comes up again — it usually takes around 5–7 mins.

Reload the Device

        - name: Reload the Device
          cli_command:
            command: reload
            prompt:
              - confirm
            answer:
              - 'y'
          when:
            - ansible_net_version != "{{ ios_version }}"
        - name: Reset connection to the switch
          meta: reset_connection
        - name: Wait for device to come back online
          wait_for:
            host: "{{ ansible_host }}"
            port: 22
            delay: 240
            timeout: 1800
          delegate_to: localhost
          connection: local

Final Verification

Finally, once the switch is live, ios_facts are collected again and the running version is verified against the expected version. For documentation purposes, a debug output has been added to the script to print the details of the upgraded switch on the console.

Assure that version is correct

        - name: Collect IOS facts
          ios_facts:
        - name: Check image version after the reboot
          ios_facts:
        - name: Assure that version is correct
          assert:
            that:
              - ios_version == ansible_net_version
        - debug:
            msg:
              - "SOFTWARE UPGRADE IS SUCCESSFULLY COMPLETED!!!"
              - "HOSTNAME - {{ ansible_net_hostname }}"
              - "IP ADDRESS - {{ ansible_net_all_ipv4_addresses }}"
              - "MODEL - {{ ansible_net_model }}"
              - "SERIAL NUMBER - {{ ansible_net_stacked_serialnums }}"
              - "IOS VERSION - {{ ansible_net_version }}"
              - "IMAGE USED - {{ ansible_net_image }}"

All done!!!

The switch is now successfully upgraded. Automation will move on to the next switch in the inventory.

Thanks !