Cisco access point firmware corruption issues and how to remediate them
Over the last few years, some nasty bugs involving flash corruption have accumulated in Cisco lightweight APs. At first, APs lost their configuration (coming up with the default name and role); later, some refused to boot or got stuck in a boot loop. This could happen whenever the AP rebooted: after a power outage, a software upgrade, and so on. The most common error I have seen:
*Mar 1 00:00:30.483: %SYS-5-RELOAD: Reload requested by Init. Reload Reason:
Radio FW image download failed ....
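If you save the console output to a file, the reload reason can be pulled out with a few lines of Python. This is a minimal sketch, assuming the `%SYS-5-RELOAD` message format shown above, where the reason may be wrapped onto the next console line; it is not a Cisco tool and only knows about this one message.

```python
import re

# "%SYS-5-RELOAD: Reload requested by <who>. Reload Reason: <reason>"
# \s* after "Reload Reason:" also swallows a newline, so a reason that the
# console wrapped onto the following line is still captured.
RELOAD_RE = re.compile(
    r"%SYS-5-RELOAD: Reload requested by (?P<by>[^.]+)\.\s*"
    r"Reload Reason:\s*(?P<reason>[^\n]+)"
)


def reload_reasons(console_text: str) -> list[tuple[str, str]]:
    """Return (requested_by, reason) pairs for every reload message found."""
    results = []
    for m in RELOAD_RE.finditer(console_text):
        # Strip the trailing dots the console prints after the reason
        reason = m.group("reason").strip().rstrip(".").strip()
        results.append((m.group("by").strip(), reason))
    return results
```

For example, `reload_reasons(open("console.log").read())` (the file name is hypothetical) returns one pair per reload the AP logged, which makes a boot loop easy to spot.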
A number of bugs caused these behaviours, affecting software versions 8.0 through 8.5. Cisco Field Notice FN - 70330 lists the corresponding bug IDs, software versions and AP models (including the 1700, 2700 and 3700 series, which seem to have been hit hardest).
If you are running one of the affected versions, are seeing these problems and want to upgrade, Cisco provides a Python tool that tries to fix the flash before you upgrade, saving you from stranded APs. You can read how to use it in their document Understanding Various AP-IOS Flash Corruption Issues.
If you already have non-functional APs, they can probably be fixed, just not remotely. Here’s how:
Since the AP probably isn’t booting anymore, you will have to take it down and attach a serial console to it. Set your console to 9600 8N1 and power the AP.
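Any terminal program will do for this. If you would rather script the console, the 9600 8N1 setting maps onto POSIX termios flags roughly as in this minimal sketch. It uses only the Python standard library and assumes a Linux or macOS host; the device path you pass in (USB serial adapters often appear as `/dev/ttyUSB0` on Linux) is an assumption about your setup.

```python
import os
import termios


def configure_8n1(fd: int, speed: int = termios.B9600) -> None:
    """Put an open tty into raw mode at 8 data bits, no parity, 1 stop bit."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    # 8N1: clear character size, parity and two-stop-bit flags, then set CS8
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    # CREAD enables the receiver; CLOCAL ignores modem control lines
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    # No hardware flow control (CRTSCTS is not defined on every platform)
    cflag &= ~getattr(termios, "CRTSCTS", 0)
    # No XON/XOFF flow control and no CR-to-NL translation on input
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY | termios.ICRNL)
    # Raw mode: no line buffering, echo, signal keys or output post-processing
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    oflag &= ~termios.OPOST
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, speed, speed, cc])


def open_console(path: str) -> int:
    """Open a serial device (without making it our controlling tty) at 9600 8N1."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    configure_8n1(fd)
    return fd
```

After `fd = open_console("/dev/ttyUSB0")`, plain `os.read`/`os.write` on `fd` talk to the AP console.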
The output should show why it won't boot anymore, probably something along the lines of an image download failure. To revive the AP and make it a happy little access point again, press and hold the mode button. Keep it pressed until you see this line:
button is pressed, wait for button to be released...
Then release the button. You will now boot to an “ap:” prompt:
button pressed for 30 seconds
process_config_recovery: set IP address and config to default 10.0.0.1
process_config_recovery: image recovery
image_recovery: Download default IOS tar image tftp://255.255.255.255/ap3g2-k9w7-tar.default
examining image...
DPAA Set for Independent Mode
tide_boot_speed = 1000
DPAA_INIT = 0x0
%Error opening tftp://255.255.255.255/ap3g2-k9w7-tar.default (connection timed out)
ap:
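This output tells you a few things: how long the button was held, the default IP address (10.0.0.1) the AP fell back to, and the tar file it tried to fetch via TFTP from the broadcast address before timing out and dropping to the ap: prompt. The sketch below extracts those fields from a captured log; the regular expressions assume exactly the message wording shown above and may need adjusting for other AP models or bootloader versions.

```python
import re


def parse_recovery_output(text: str) -> dict:
    """Pull the key facts out of the mode-button recovery console output."""
    info = {"button_seconds": None, "fallback_ip": None,
            "default_image": None, "tftp_timed_out": False}
    m = re.search(r"button pressed for (\d+) seconds", text)
    if m:
        info["button_seconds"] = int(m.group(1))
    # Address the bootloader assigns itself during config recovery
    m = re.search(r"set IP address and config to default (\d+\.\d+\.\d+\.\d+)", text)
    if m:
        info["fallback_ip"] = m.group(1)
    # File name the bootloader requests from the TFTP server
    m = re.search(r"Download default IOS tar image tftp://[^/\s]+/(\S+)", text)
    if m:
        info["default_image"] = m.group(1)
    info["tftp_timed_out"] = "connection timed out" in text
    return info
```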
Depending on what exactly went wrong with the flash, some APs came back to life with as little work as booting the recovery image that was still on the flash:
ap: dir flash:
Directory of flash:/
2 -rwx 12312 private-multiple-fs
3 -rwx 56200 event.log
4 -rwx 116551 event.r0
5 -rwx 210 env_vars
38 drwx 576 ap3g2-rcvk9w8-mx
74 drwx 0 configs
75 -rwx 64 sensord_CSPRNG0
76 -rwx 64 sensord_CSPRNG1
77 -rwx 8 sensord_crashWer0
78 -rwx 100 capwap-saved-config-bak
79 -rwx 99 capwap-saved-config
80 drwx 2496 ap3g2-k9w8-mx.153-3.JC15
145 -rwx 137639 event.r1
19586560 bytes available (21572096 bytes used)
ap: boot flash:/ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-mx
The AP reboots with the recovery image, connects to the controller, downloads the new image, and is happy again.
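Finding the right boot path is just a matter of spotting the recovery directory in the `dir flash:` listing. The sketch below does that from captured output. Two assumptions, both taken only from the ap3g2 example above: the recovery directory name contains `rcvk9w8`, and the bootable file inside it has the same name as the directory. Other AP models may use different names.

```python
import re
from typing import Optional

# One "dir flash:" entry: index, permissions, size, name
ENTRY_RE = re.compile(r"^\s*(\d+)\s+([-d]rwx)\s+(\d+)\s+(\S+)\s*$")


def recovery_boot_command(dir_output: str, marker: str = "rcvk9w8") -> Optional[str]:
    """Return the ap: boot command for the recovery image, or None if absent."""
    for line in dir_output.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue  # header, summary line or anything unexpected
        perms, name = m.group(2), m.group(4)
        # Assumed layout: flash:/<dir>/<dir>, as in the listing above
        if perms.startswith("d") and marker in name:
            return f"boot flash:/{name}/{name}"
    return None
```

A `None` result suggests the recovery image itself is gone, which is the case where the format-and-TFTP route is needed.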
Other APs needed more attention. For those, I formatted the flash completely and transferred the recovery image via TFTP (from the server at 10.7.7.7 in this example):
ap: format flash:
Are you sure you want to format "flash:" (all data will be lost) (y/n)?y
flashfs[0]: 0 files, 1 directories
flashfs[0]: 0 orphaned files, 0 orphaned directories
flashfs[0]: Total bytes: 41158656
flashfs[0]: Bytes used: 1024
flashfs[0]: Bytes available: 41157632
flashfs[0]: flashfs fsck took 28 seconds.
Filesystem "flash:" formatted
ap: set IP_ADDR 10.10.6.233
ap: set NETMASK 255.255.255.0
ap: set DEFAULT_ROUTER 10.10.6.1
ap: tftp_init
ap: ether_init
Initializing ethernet port 0...
Ethernet speed is 1000 Mb - FULL Duplex
ap: tar -xtract tftp://10.7.7.7/ap3g2-rcvk9w8-tar.153-3.JH.tar flash:
extracting info (274 bytes)
ap3g2-rcvk9w8-mx/ (directory) 0 (bytes)
extracting ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-mx (229940 bytes)..................................................
extracting ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-tx (73 bytes)
extracting ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-xx (7313476 bytes)......................[output shortened].........................................
extracting ap3g2-rcvk9w8-mx/info (274 bytes)
extracting ap3g2-rcvk9w8-mx/file_hashes (438 bytes)
extracting ap3g2-rcvk9w8-mx/final_hash (141 bytes)
extracting ap3g2-rcvk9w8-mx/final_hash.sig (512 bytes)
extracting ap3g2-rcvk9w8-mx/img_sign_rel.cert (1375 bytes)
extracting ap3g2-rcvk9w8-mx/img_sign_rel_sha2.cert (1371 bytes)
extracting info.ver (274 bytes)
ap: set BOOT flash:/ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-mx
ap: boot flash:/ap3g2-rcvk9w8-mx/ap3g2-rcvk9w8-mx
Rebooting system to reset DPAA...
The AP reboots, boots the recovery image, connects to the controller, downloads the correct image, and is a happy little access point again.
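When you have several APs to revive, it can help to generate the command sequence above per AP and sanity-check the addressing first. In the example the TFTP server (10.7.7.7) is outside the AP's 10.10.6.0/24 subnet, so the bootloader reaches it through DEFAULT_ROUTER, and a gateway outside the subnet is an easy mistake. This is a hypothetical helper, not a Cisco tool; the recovery directory name must be taken from the `(directory)` line of the tar output, and `format flash:` still asks for a y/n confirmation.

```python
import ipaddress


def tftp_recovery_commands(ap_ip: str, netmask: str, gateway: str,
                           tftp_server: str, tar_name: str,
                           image_dir: str) -> list[str]:
    """Return the ap: commands used above, after checking the addressing."""
    iface = ipaddress.IPv4Interface(f"{ap_ip}/{netmask}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not in {iface.network}")
    if gw == iface.ip:
        raise ValueError("gateway must differ from the AP address")
    ipaddress.IPv4Address(tftp_server)  # raises ValueError if malformed
    boot_path = f"flash:/{image_dir}/{image_dir}"
    return [
        "format flash:",  # answer y at the confirmation prompt
        f"set IP_ADDR {iface.ip}",
        f"set NETMASK {iface.netmask}",
        f"set DEFAULT_ROUTER {gw}",
        "tftp_init",
        "ether_init",
        f"tar -xtract tftp://{tftp_server}/{tar_name} flash:",
        f"set BOOT {boot_path}",
        f"boot {boot_path}",
    ]
```

Calling it with the values from this post (10.10.6.233, 255.255.255.0, 10.10.6.1, 10.7.7.7, ap3g2-rcvk9w8-tar.153-3.JH.tar, ap3g2-rcvk9w8-mx) reproduces the sequence typed at the ap: prompt above.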
With the “fixed” controller software versions (e.g. 8.2.170.0 or 8.5.140.0), I have not had any problems like this so far. Knock on wood for fixed bugs!