Cumulus Switch Upgrade Guide

Summary

This document outlines the steps for upgrading a Cumulus switch. Follow these steps carefully to ensure a smooth upgrade process.

For a graceful upgrade, we shut down the server-facing ports first and then back up the configuration while the ports are down. This ensures that the ports remain down after the switch configuration is restored. Once the restore completes successfully, we bring the ports back up and check connectivity.

Switches with Cumulus OS 5.10 or later

ℹ️ Info

Cumulus 5.10 introduces a greatly simplified upgrade process.

  1. Verify that /var has at least 3 GB of free space (use df -h /var)

  2. Copy the upgrade image to the switch

  3. Copy the image from its upload location to /var/images

    sudo cp cumulus-linux-5.12.1-mlx-amd64.bin /var/images/cumulus-linux-5.12.1-mlx-amd64.bin
  4. Upgrade the switch

    nv action install system image files cumulus-linux-5.12.1-mlx-amd64.bin
  5. Set the second partition

    nv action boot-next system image other
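
The free-space requirement from step 1 can be checked in a script before the image is copied. The following is a minimal sketch, not part of the Cumulus tooling: the names REQUIRED_KB and has_enough_space are hypothetical, and it assumes the POSIX output format of df -Pk, where the available space is the fourth field of the second line.

```shell
# Sketch only: REQUIRED_KB and has_enough_space are hypothetical names.
REQUIRED_KB=$((3 * 1024 * 1024))   # 3 GB expressed in 1 KiB blocks

# Reads `df -Pk <path>` output on stdin. Succeeds if the "Available"
# column (4th field of the 2nd line) is at least REQUIRED_KB.
has_enough_space() {
  awk -v need="$REQUIRED_KB" '
    NR == 2 { ok = ($4 + 0 >= need) }
    END     { exit ok ? 0 : 1 }
  '
}

# Usage on the switch:
#   df -Pk /var | has_enough_space && echo "OK to copy image" || echo "Free up /var first"
```

The function only reads text on stdin, so it can be tried against saved df output before it is used on a switch.
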
  6. Shut down the ports to nodes or BGP neighbors

SPDK (CNode-to-DNode communication) copes better when the node-facing ports go down before the switch maintenance and come back up only once the switch is stable, so that the links do not flap.

FOR LEAF SWITCHES: Deactivate the node-facing ports

nv show interface
nv set interface swp1-18,33-37 link state down
cumulus@leaf-1-1:mgmt:~$ nv config diff
- set:
    interface:
      swp1-18,33-37:
        link:
          state:
            down: {}
cumulus@leaf-1-1:mgmt:~$ nv config apply
applied [rev_id: 6]

FOR SPINE SWITCHES: Deactivate the BGP ports

This command shuts down the BGP neighbors at the protocol layer, which is a cleaner break than taking the swp ports down or rebooting. The sessions use a 3-second keepalive timer and a 9-second hold timer; closing the sessions explicitly avoids waiting for those timers to expire.

nv show interface
nv set vrf default router bgp neighbor swp1-28 shutdown on
nv config diff
nv config apply
sudo vtysh -c "show ip bgp summary"  # make sure the BGP neighbors show Idle in the State column
  7. Reboot (sudo reboot)

  8. Bring ports up and run health checks

FOR LEAF SWITCHES: Enable Ports to CNodes and DNodes

Re-enable the same ports that you shut down in step 6.

cumulus@cumulus:mgmt:~$ nv set interface swp1-18,swp33-37 link state up
cumulus@cumulus:mgmt:~$ nv config diff
- unset:
    interface:
      swp1-18,33-37:
        link:
          state:
            down:
- set:
    interface:
      swp1-18,33-37:
        link:
          state:
            up: {}
cumulus@cumulus:mgmt:~$ nv config apply
cumulus@cumulus:mgmt:~$ nv config save

FOR SPINE SWITCHES: Enable Ports for BGP Traffic on Spine Switches

nv unset vrf default router bgp neighbor swp1-28 shutdown
nv config diff
nv config apply
nv config save
sudo vtysh -c "show ip bgp summary"  # State/PfxRcd should now show a number (for example 1) instead of Idle, meaning the neighbor was learned

FOR SPINE SWITCHES: Validate BGP (MANDATORY!)

Query the BGP summary through vtysh

sudo vtysh -c "show ip bgp summary"

Validate BGP states.

cumulus@vast-test-lab-31-leaf02:mgmt:~$ sudo vtysh -c "show ip bgp summary"
[sudo] password for cumulus: 
IPv4 Unicast Summary (VRF default):
BGP router identifier 10.10.10.20, local AS number 65290 vrf-id 0
BGP table version 73
RIB entries 39, using 8736 bytes of memory
Peers 13, using 258 KiB of memory
Peer groups 1, using 64 bytes of memory
Neighbor                            V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
peerlink.4094                       4          0         0         0        0    0    0    never         Idle        0 N/A
vast-test-lab-16-spine01(swp49) 4      65298   1193147   1192582        0    0    0 1d17h12m           13       20 N/A
vast-test-lab-16-spine01(swp50) 4      65298   1193147   1192616        0    0    0 1d17h12m           13       20 N/A
vast-test-lab-19-spine02(swp51) 4      65298   1190113   1189748        0    0    0 01:19:28           14       20 N/A
vast-test-lab-19-spine02(swp52) 4      65298   1190113   1189748        0    0    0 01:19:28           14       20 N/A
vast-test-lab-22-spine03(swp53) 4      65298   1182361   1183023        0    0    0 02:29:16           12       20 N/A
vast-test-lab-22-spine03(swp54) 4      65298   1182322   1183001        0    0    0 02:29:16           12       20 N/A
vast-test-lab-25-spine04(swp55) 4      65298   1190126   1189799        0    0    0 00:55:53           14       20 N/A
vast-test-lab-25-spine04(swp56) 4      65298   1190126   1189794        0    0    0 00:55:53           14       20 N/A
vast-test-lab-28-spine05(swp57) 4      65298   1187436   1187191        0    0    0 01:53:39           14       20 N/A
vast-test-lab-28-spine05(swp58) 4      65298   1187436   1187192        0    0    0 01:53:38           14       20 N/A
vast-test-lab-31-spine06(swp59) 4      65298   1190515   1190301        0    0    0 00:33:13           14       20 N/A
vast-test-lab-31-spine06(swp60) 4      65298   1190515   1190304        0    0    0 00:33:13           14       20 N/A
Total number of neighbors 13
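
Scanning the State/PfxRcd column by eye is error-prone on switches with many neighbors. The following is a minimal sketch of a filter that prints only the neighbors whose state is not a prefix count. The helper name bgp_not_established is hypothetical, and the sketch assumes the column layout shown above, with neighbor names that contain no spaces.

```shell
# Sketch only: bgp_not_established is a hypothetical helper name.
# Reads `show ip bgp summary` output on stdin and prints "<neighbor> <state>"
# for every table row whose State/PfxRcd column (10th field) is not a number.
bgp_not_established() {
  awk '
    /^Neighbor/                  { in_table = 1; next }   # header row starts the table
    /^Total number of neighbors/ { in_table = 0 }         # footer row ends it
    in_table && NF >= 10 && $10 !~ /^[0-9]+$/ { print $1, $10 }
  '
}

# Usage on the switch:
#   sudo vtysh -c "show ip bgp summary" | bgp_not_established
```

Applied to the sample output above, the filter should report only the peerlink.4094 row, which is in the Idle state. Whether a given Idle neighbor is expected depends on the deployment, so treat the result as a list to review rather than a pass/fail verdict.
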

ℹ️ Info

Don’t forget to exit if using “sudo vtysh”!

vast-test-lf01# exit
  9. Proceed with the next switch


Procedure docs from NVIDIA

https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-512/Installation-Management/Upgrading-Cumulus-Linux/#image-upgrade


Pre 5.10

1. Shutdown Server-Facing Ports

List all interfaces

nv show interface | grep up | grep swp

Filter out switch-facing ports

The filter relies on the spine switches having "Spine" in their names; adjust the pattern to match your naming.

nv show interface | grep up | grep swp | grep -v Spine

Deactivate server-facing ports

# Shut down every swp port that is up and whose row does not mention a spine
for port in $(nv show interface | grep up | grep swp | grep -v Spine | awk '{print $1}'); do
  nv set interface "$port" link state down
done

Verify configuration change

nv config diff

Apply the configuration

nv config apply
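
The ports that the loop above shuts down are the same ports that must be re-enabled in step 7, so it helps to record them before applying the change. The following is a minimal sketch. The helper name join_ports and the file name ports-down.txt are hypothetical, and the file should be copied off the switch together with the configuration backup.

```shell
# Sketch only: join_ports and ports-down.txt are hypothetical names.
# Reads one port name per line on stdin and prints them comma-separated,
# which is the list form used by the `nv set interface` command in step 7.
join_ports() {
  paste -sd, -
}

# Usage on the switch, before shutting the ports down:
#   nv show interface | grep up | grep swp | grep -v Spine | awk '{print $1}' > /home/cumulus/ports-down.txt
# Later, once the file is back on the switch, bring the same ports up:
#   nv set interface "$(join_ports < /home/cumulus/ports-down.txt)" link state up
```
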

2. Backup Configuration

Save the configuration

nv config save

Copy configuration off the switch

- On the cumulus switch
sudo cp /etc/nvue.d/startup.yaml /home/cumulus
sudo chown cumulus /home/cumulus/startup.yaml
- From the cnode/your laptop
scp cumulus@<cumulus IP>:/home/cumulus/startup.yaml <switch_name>.yaml
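
Because the restore in step 6 depends on this file, it can be worth confirming that the copy matches the original before installing the new image. The following is a minimal sketch that compares SHA-256 digests. The helper name digest_of and the variables SWITCH_IP and SWITCH_NAME are hypothetical, and it assumes sha256sum is available on both machines.

```shell
# Sketch only: digest_of, SWITCH_IP and SWITCH_NAME are hypothetical names.
# Prints just the SHA-256 digest of the file given as the first argument.
digest_of() {
  sha256sum "$1" | awk '{print $1}'
}

# Usage from the CNode/laptop, after the scp above:
#   remote=$(ssh "cumulus@$SWITCH_IP" "sha256sum /home/cumulus/startup.yaml" | awk '{print $1}')
#   [ "$remote" = "$(digest_of "$SWITCH_NAME.yaml")" ] && echo "backup verified"
```
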

Unset specific settings (if upgrading between certain versions)

nv unset system config auto-save enable on
nv unset service snmp-server enable on

3. Retain Connectivity

Set DHCP on the relevant port

nv set interface eth0 ip address dhcp

Connect using a terminal (if required)

sudo minicom -b 115200 -D /dev/ttyUSB0

If the serial device is locked by a stale session

sudo ps aux | grep /dev/ttyUSB0 | grep -Ev "grep|sudo"
sudo kill -9 <PID>

4. Upgrade

Copy the new image to the switch

scp cumulus-linux-<version>.bin cumulus@<switch IP>:/tmp

Create an empty ZTP file. This prevents the ports from coming up after the upgrade before the new configuration is applied.

cat > /tmp/empty <<'_EOF'
#!/bin/sh

#CUMULUS-AUTOPROVISIONING

exit 0
_EOF
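
ZTP looks for the CUMULUS-AUTOPROVISIONING marker in the script, so a typo in the file above could leave the script ignored. The following is a minimal sketch of a sanity check to run before the install. The helper name is_ztp_script is hypothetical, and it checks only for the shebang line and the marker, not for anything else ZTP may require.

```shell
# Sketch only: is_ztp_script is a hypothetical helper name.
# Succeeds if the file given as the first argument starts with a shebang
# line and contains the CUMULUS-AUTOPROVISIONING marker.
is_ztp_script() {
  head -n 1 "$1" | grep -q '^#!' && grep -q 'CUMULUS-AUTOPROVISIONING' "$1"
}

# Usage on the switch:
#   is_ztp_script /tmp/empty || echo "ZTP file is malformed"
```
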

Install the new image

sudo -S onie-install -z /tmp/empty -af -i /tmp/cumulus-linux-<version>.bin 

Reboot

sudo reboot

5. Complex Password Management

Disable password hardening (optional)

nv set system security password-hardening state disabled
nv config apply

Set a new password

nv set system aaa user cumulus password
nv config apply

6. Restore Configuration

Set a static IP address

nv unset interface eth0 ip address dhcp
nv set interface eth0 ip address <IP>/<prefix>
nv set interface eth0 ip gateway <gateway>
nv config apply

Copy the backup file to the switch

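- From the cnode/your laptop (the backup was saved as <switch_name>.yaml in step 2)
scp <switch_name>.yaml cumulus@<switch IP>:/tmp/startup.yaml
- On the cumulus switch
cp /tmp/startup.yaml /home/cumulus/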

Load and apply the configuration

nv config patch /home/cumulus/startup.yaml
nv config apply

7. Enable Server-Facing Ports

Activate ports (replace the example list with the ports you shut down in step 1)

nv set interface swp1s0,swp2s0,swp3s0-1,swp4s0-1 link state up

Verify changes

nv config diff

Apply and save the configuration

nv config apply
nv config save

 


Notes

  • Keep a backup of the original configuration and image for rollback if necessary.