I’ve done hundreds of ASA upgrades, both remotely and over the console, and I thought I had seen it all, but apparently not. This time it was a remote upgrade from version 9.4.x to 9.6.2.x, and nothing seemed out of the ordinary.
Cisco ASA: 9.4.x > 9.6.2
I followed the standard practice of upgrading the standby unit first, rebooting it, and then failing over to verify functionality before proceeding to the primary. After I issued the reload, the standby took a strangely long time to come back online, and no log messages on the active unit indicated the state of the other one. I was connected by VPN to the primary unit, so all of my access seemed fine. As a precaution, though, I tried to reach something over a site-to-site VPN tunnel from an internal resource. My suspicion was confirmed: I was losing 50% of the packets to anything traversing the primary ASA, so something was drastically wrong.
One of the primary causes of packet loss is asymmetric packet flow, which led me to believe the other unit was not down but actually up, with both units behaving as active. The only choice I had was to shut down the switch ports leading to the standby unit, and that fixed the packet loss. I still needed to find the root cause, so I had someone dispatched to console into the standby ASA. Nothing stood out right away, so I had it rebooted and watched the log.
Reading from flash…
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!……………………………………………….. ……………………..ERROR: Failed to apply IP address to interface GigabitEthernet0/7, as the network overlaps with interface Internal-Data0/3. Two interfaces cannot be in the same subnet.
*** Output from config line 3299, “failover interface ip Fa…”
Cross-referencing the interface configuration, I found that my failover interface was configured with a link-local IP address.
failover link Failover-Interface GigabitEthernet0/7
failover interface ip Failover-Interface 169.254.1.1 255.255.255.252 standby 169.254.1.2
Apparently, in version 9.6.2 this address range is used by the internal data plane (the Internal-Data0/3 interface named in the error), so the conflicting IP configuration was not applied to the failover interface. With no working failover link, the standby unit went active.
After I replaced the link-local addresses with RFC 1918 addresses on both the active and standby units, the failover pair normalized and traffic started flowing normally.
To make this change on a production unit, follow this article to avoid disrupting traffic flow.
failover interface ip Failover-Interface 192.168.255.1 255.255.255.252 standby 192.168.255.2
- Do not use link-local addresses (169.254.x.x) on the failover interface
- If an upgrade does not go as planned and you have to assume one of the units is down, always verify connectivity through the ASA
- Have console access available; it may not be needed 99% of the time, but that 1% will get you eventually
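The first lesson can be checked before an upgrade rather than discovered after it. Below is a minimal sketch, assuming you have a saved copy of the running configuration as text; the function name find_link_local_failover_ips and the sample string are hypothetical and only for illustration. It looks for IPv4 "failover interface ip" lines in the form shown above and flags any address in 169.254.0.0/16. It is not a Cisco tool and does not handle the IPv6 form of the command.

```python
import ipaddress
import re

# Matches the IPv4 form of the command as it appears in this post, e.g.:
#   failover interface ip NAME 169.254.1.1 255.255.255.252 standby 169.254.1.2
FAILOVER_IP_RE = re.compile(
    r"^\s*failover interface ip (\S+) (\S+) (\S+) standby (\S+)\s*$"
)


def find_link_local_failover_ips(config_text):
    """Return (interface_name, address) pairs that fall in 169.254.0.0/16."""
    findings = []
    for line in config_text.splitlines():
        match = FAILOVER_IP_RE.match(line)
        if not match:
            continue
        name, active_ip, _mask, standby_ip = match.groups()
        for addr in (active_ip, standby_ip):
            try:
                is_link_local = ipaddress.ip_address(addr).is_link_local
            except ValueError:
                # Not a parsable IP address; skip rather than guess.
                continue
            if is_link_local:
                findings.append((name, addr))
    return findings


# Sample input copied from the configuration shown earlier in this post.
sample = """\
failover link Failover-Interface GigabitEthernet0/7
failover interface ip Failover-Interface 169.254.1.1 255.255.255.252 standby 169.254.1.2
"""

for name, addr in find_link_local_failover_ips(sample):
    print(f"{name}: {addr} is link-local, consider an RFC 1918 address")
```

Running the same function against the corrected line with 192.168.255.1 and 192.168.255.2 returns an empty list, so it can act as a quick sanity check in a pre-upgrade checklist.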