Summary
If CNode or DNode hardware has been idle for long periods, its onboard real-time clock (RTC) may drift substantially from the current time. Once the deviation is large (more than about 5-10 minutes), chronyd cannot step or slew the system clock back into sync in a reasonable amount of wall-clock time (it can take weeks to years), and it may not accept the responses from the NTP server at all. The quickest solution is to:
Disable NTP on the affected nodes.
Manually set the time.
Re-enable NTP.
Restart chronyd on ALL nodes.
This brings the affected nodes to within a few minutes of the current NTP time, so chronyd can finish synchronizing quickly.
Symptoms
During an Install, Expansion, Upgrade, Node Replacement, or other procedure, the installer is asked to run the following commands:
Configure the time zone and time
# Restart chronyd. This ensures chrony is actually using the configured NTP servers and not the OS default.
clush -a sudo systemctl restart chronyd
# Set UTC and confirm time is in sync (NTP should be working).
clush -a sudo timedatectl set-timezone UTC
clush -aB date
Example Output
---------------
172.16.3.[1-12,100-105] (18)
---------------
Sat Apr 30 03:38:18 GMT 2022
Issue
In some circumstances, the new node(s) may show substantial time drift:
clush -aB date
Example Output
---------------
172.16.3.[1-8,100-105] (14)
---------------
Sat Apr 30 03:38:18 GMT 2022
---------------
172.16.3.[9-12] (4)
---------------
Sat Apr 30 03:05:18 GMT 2022
In this example, nodes 172.16.3.[9-12] are 33 minutes behind the current NTP time.
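To put a number on the drift instead of comparing timestamps by eye, a small helper can subtract two `date` outputs. This is a minimal sketch that assumes GNU `date` (whose `-d` option parses its own default output format); the function name `drift_seconds` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: absolute difference, in seconds, between two timestamps
# printed in date's default format, e.g. "Sat Apr 30 03:38:18 GMT 2022".
# Assumes GNU date, whose -d option can parse that format.
drift_seconds() {
  local a b d
  a=$(date -d "$1" +%s) || return 1
  b=$(date -d "$2" +%s) || return 1
  d=$(( a - b ))
  echo "${d#-}"   # drop the sign so the argument order does not matter
}

drift_seconds "Sat Apr 30 03:38:18 GMT 2022" "Sat Apr 30 03:05:18 GMT 2022"   # prints 1980
```

The two group timestamps from the example differ by 1980 seconds, which is the 33 minutes of drift described above.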
Solution
First, double-check all configurations and verify that NTP servers are reachable.
Confirm chrony config
Confirm that every node's configuration lists the same chrony servers and that they are correct:
clush -aB grep ^server /etc/chrony.conf
If the server lists differ or are incorrect, correct them so that every node matches, then restart chronyd again.
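`grep ^server` only catches `server` lines, while chrony also accepts `pool` directives, and trailing options such as `iburst` can make lines for the same server look different. The sketch below (a hypothetical function `chrony_servers`, assuming `awk` and `sort` are available) prints only the directive and address, so configurations can be compared directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NTP sources configured in a chrony.conf,
# one "directive address" pair per line, sorted. Matches both "server" and
# "pool" directives, skips commented-out lines, and drops trailing options
# such as "iburst" so only the configured sources are compared.
chrony_servers() {
  awk '$1 == "server" || $1 == "pool" { print $1, $2 }' "${1:-/etc/chrony.conf}" | LC_ALL=C sort
}
```

Because `clush -B` groups identical output, running the equivalent `awk` on every node, for example `clush -aB "awk '\$1==\"server\"||\$1==\"pool\"{print \$1,\$2}' /etc/chrony.conf | sort"`, should produce a single group when the server lists agree.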
Ping the NTP Server(s)
Replace NTP1 and NTP2 with the customer's NTP server addresses.
ping -c 4 -i 0.2 -M do -s 1000 NTP1 | grep loss
ping -c 4 -i 0.2 -M do -s 1000 NTP2 | grep loss
If you cannot reach the NTP servers, there may be an issue on the Management Network.
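When checking several servers from a script, the loss percentage can be pulled out of ping's summary line to decide pass or fail. This is a minimal sketch assuming the iputils `ping` summary format ("N packets transmitted, N received, X% packet loss, ..."); `ping_loss_pct` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage from the iputils summary line, e.g.
#   "4 packets transmitted, 4 received, 0% packet loss, time 603ms" -> 0
# Prints nothing if no summary line is present (for example, unknown host).
ping_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example usage (not run here; NTP1/NTP2 are the placeholders used above):
# for ntp in NTP1 NTP2; do
#   loss=$(ping -c 4 -i 0.2 -M do -s 1000 "$ntp" | ping_loss_pct)
#   [ "$loss" = "0" ] || echo "WARNING: $ntp loss=${loss:-unknown}"
# done
```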
Disable NTP, manually set the time, then re-enable NTP and restart chronyd
# Disable NTP on all affected nodes (timedatectl refuses to set the time while NTP synchronization is enabled)
clush -w 172.16.3.[9-12] 'sudo timedatectl set-ntp false'
# Manually set the time (replace with the current UTC time, e.g. taken from an in-sync node)
clush -w 172.16.3.[9-12] 'sudo timedatectl set-time "2022-04-30 03:38:18"'
# Enable NTP on all affected nodes
clush -w 172.16.3.[9-12] 'sudo timedatectl set-ntp true'
# Restart chronyd on all nodes to force an NTP sync.
clush -a 'sudo systemctl restart chronyd'
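Rather than typing the time by hand, the value for `timedatectl set-time` can be derived from a node that is already in sync. `timedatectl set-time` interprets its argument in the node's local time zone, which is UTC after the earlier `set-timezone UTC` step, so this sketch formats a Unix timestamp in UTC. `set_time_arg` is a hypothetical helper, 172.16.3.1 is only an example of an in-sync node from the output above, and GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format a Unix timestamp as the "YYYY-MM-DD HH:MM:SS"
# string accepted by `timedatectl set-time`. UTC is used because the nodes
# were set to the UTC time zone earlier. Assumes GNU date (for -d @EPOCH).
set_time_arg() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

set_time_arg 1651289898   # prints 2022-04-30 03:38:18

# Example usage (not run here): read the time from an in-sync node, then apply it.
# ref=$(ssh 172.16.3.1 date -u +%s)
# clush -w 172.16.3.[9-12] "sudo timedatectl set-time '$(set_time_arg "$ref")'"
```

The few seconds between reading the reference time and applying it are well within the "few minutes" that chronyd can then correct on its own.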
Check your work
clush -aB date
Example Output
---------------
172.16.3.[1-12,100-105] (18)
---------------
Sat Apr 30 03:38:18 GMT 2022
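Because `clush -B` merges nodes with identical output into one group, a quick scripted check is to count the groups: a single group means every node reported the same time. This sketch assumes the separator lines are runs of dashes, as in the example output; `clush_group_count` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the output groups in `clush -b`/`-B` output read
# from stdin. clush prints each group as a dashed separator line, the node set,
# a second separator line, then the shared output, so each group contributes
# exactly two separator lines.
clush_group_count() {
  local seps
  seps=$(grep -c -- '^-\{3,\}$')
  echo $(( seps / 2 ))
}

# Example usage (not run here):
# groups=$(clush -aB date 2>&1 | clush_group_count)
# [ "$groups" -eq 1 ] || echo "Nodes disagree on the time ($groups groups)"
```

Nodes that are in sync can occasionally split into two groups one second apart if the command straddles a second boundary, so re-run the check before treating two groups as drift.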
Drifts of less than 5 minutes should resolve within 3-5 minutes of restarting chronyd. If issues persist after these steps, collect all logs and the SSH terminal history, and contact CS.
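For a finer-grained check than comparing `date` output, chrony reports its own view of the clock offset on the `System time` line of `chronyc tracking`. The sketch below assumes that line's format (for example `System time     : 0.000006523 seconds slow of NTP time`); `offset_within` and its default 1-second limit are hypothetical choices, not values from this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `chronyc tracking` output on stdin and return
# success only if the "System time" offset is at most LIMIT seconds
# (default 1). chronyc reports the offset as a magnitude followed by
# "fast" or "slow", so the fourth field is always non-negative.
offset_within() {
  local limit=${1:-1}
  awk -v limit="$limit" '
    /^System time/ { found = 1; ok = ($4 + 0 <= limit + 0) }
    END { exit !(found && ok) }'
}

# Example usage (not run here):
# chronyc tracking | offset_within 1 && echo "in sync"
# clush -aB 'chronyc tracking | grep "^System time"'
```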