Downtown Doug Brown » What happened with ASUS routers this morning?

When I woke up today around 6:45 AM PDT, I didn’t seem to have internet service available. My phone told me that I was connected to my Wi-Fi network, but it didn’t have connectivity. “Hmm, that’s weird,” I thought. Maybe a fiber cut in the area or something? I looked at my IRC client on my desktop Windows PC, which is nice because it records timestamps of when I lose my connection.

My connection had been down for over 3 hours at this point. Weird! I figured I’d log into my ASUS RT-AC86U router’s web interface and see what was going on. Something happened that I wasn’t expecting at all: the page wouldn’t fully load. Portions of it showed the little “sad page” icon indicating a connection error.
I tried to SSH into the router instead. The first few connection attempts failed, and then finally I got in. What I found, though, was that I couldn’t run any commands. It just spit this error back at me:
-sh: cannot fork
OK, so something was really messed up. I decided to power cycle the router at this point. Maybe some weird glitch happened or something. Which would be odd, since this router has been pretty rock solid since I’ve had it, aside from 2.4 GHz Wi-Fi issues over time. That’s another story I don’t want to get into today.
Anyway, when the router came back up everything seemed fine. But then, 40 minutes later, my connection dropped again with the same symptoms.

The fact that both disconnects happened at exactly 23 seconds past the minute is probably just a crazy coincidence. I was starting to panic a bit at this point. I really didn’t think an issue like this could be my ISP’s fault, but I hadn’t changed a single thing about my network setup. I hadn’t updated my router firmware for quite a while either; I had automatic updates turned off, and last I had checked, ASUS hadn’t released a new update for it.
I was able to successfully SSH into the router this time, and I did a few quick diagnostics. I used top to show me what was going on. I unfortunately didn’t take any screenshots, but I noticed that a process called asd was taking up 50% of my CPU. The CPU is dual-core according to /proc/cpuinfo, so 50% likely means one core was fully pegged.
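The core-count arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes top reports usage as a share of all cores combined, and the SAMPLE_CPUINFO text and both function names are made up for the example rather than taken from the router.

```python
def count_cores(cpuinfo_text: str) -> int:
    """Count logical CPUs by counting 'processor' entries in /proc/cpuinfo text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def per_core_percent(total_percent: float, cores: int) -> float:
    """Convert a share of all cores combined into the equivalent share of one core."""
    return total_percent * cores


# Hypothetical, trimmed-down /proc/cpuinfo contents for a dual-core CPU.
SAMPLE_CPUINFO = """\
processor\t: 0
BogoMIPS\t: 100.00

processor\t: 1
BogoMIPS\t: 100.00
"""

cores = count_cores(SAMPLE_CPUINFO)
# 50% of two cores is the same amount of work as 100% of one core.
print(cores, per_core_percent(50.0, cores))
```

If top on a given system reports per-core percentages instead, the multiplication would not apply.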
My first instinct was to search for asd (which was difficult with a non-working internet connection) but I found that it’s an ASUS security daemon. This made me feel a little bit better, but I still felt like it had to be involved in the problem. Normally when I SSH into my router, top doesn’t show anything using anywhere close to 50% of the CPU.
I started searching on Reddit and Twitter to see if anyone else had run into anything similar, and that’s when I saw this tweet by @stevecantsmell:
Anyone with an ASUS router having connection issues since 6am (-0400)?
We’re finding people needing to restart and manually update the firmware to keep a stable connection.
- SteveP (@stevecantsmell), May 17, 2023
The way he worded it, it sounds like he works for an ISP. This sounded so similar to my issue, even down to the timeframe! That would correspond to 3 AM in my time zone. I followed his advice. I quickly rebooted the router and went right into the firmware update page in its web UI. Sure enough, I was running version 3.0.0.4.386.48260 and there was an update available for 3.0.0.4.386.51529 which was released last month. It turns out I had also missed a firmware release that came out in March. I do like to keep my router up to date, but I had been checking at a slower interval since there hadn’t been an update for about a year.
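For anyone who wants to double-check the time zone math, here is a minimal Python sketch. It assumes the tweet's "6am (-0400)" refers to May 17, 2023 (the date on the tweet) and uses fixed UTC offsets instead of a time zone database.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # the -0400 offset stated in the tweet
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time, UTC-7

# Assumed date: the tweet is dated May 17, 2023.
tweet_time = datetime(2023, 5, 17, 6, 0, tzinfo=EDT)
local_time = tweet_time.astimezone(PDT)

print(local_time.strftime("%Y-%m-%d %H:%M %Z"))  # 2023-05-17 03:00 PDT
```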
I was able to install the update. The router rebooted on its own after the update finished and everything has been fine since then. asd is no longer using 50% of the CPU either. In the hours since this problem occurred, I’ve heard of numerous other people who ran into this exact same issue with a variety of ASUS routers. More people chimed in on the Twitter thread linked above, and there have been several posts on Reddit and SNBForums. In some cases a beta firmware was required to fix the issue. It was comforting to know that I wasn’t alone, but also incredibly frustrating to hear that so many people were affected. I bet ISP tech support employees had a “great” day today.
So…what exactly happened early this morning to set this whole thing off? Did ASUS’s asd program download some kind of faulty file from their servers that caused it to hang? Was someone attempting a mass exploit of a vulnerability that was recently patched by ASUS? Did updating the firmware really fix the issue or did it just stop a chain of events that might restart itself again soon?
I don’t know, but here’s what I’ve been able to gather so far. It appears that the file /jffs/asd.log (and /jffs/asd.log.1, which I think is the rolled-over version containing earlier entries) on my router was being filled with thousands of lines of the following error message:
1684335272[chknvram_action] Invalid string
The number appears to be a UNIX timestamp, corresponding to 7:54 AM PDT this morning, which would be right around the time that I finally installed the firmware update. I’m guessing this was constantly being written to this log as soon as the problem began at 3:24 AM.
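Here is a small Python sketch of that conversion. It assumes the leading digits really are UNIX seconds (that is my guess, not something ASUS documents) and uses a fixed UTC-7 offset for PDT; parse_asd_line is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # fixed offset, assumed for mid-May


def parse_asd_line(line: str):
    """Split an asd.log line into (local datetime, message).

    Assumes the digits before the first '[' are UNIX seconds.
    """
    stamp, bracket, rest = line.partition("[")
    when = datetime.fromtimestamp(int(stamp), tz=PDT)
    return when, bracket + rest


when, message = parse_asd_line("1684335272[chknvram_action] Invalid string")
print(when.strftime("%Y-%m-%d %H:%M:%S %Z"), "|", message)
# 2023-05-17 07:54:32 PDT | [chknvram_action] Invalid string
```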
I also found these interesting messages in /jffs/syslog.log-1 at around the time the connection was first lost:
May 17 03:18:14 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware info
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7702)]fimrware update check first time
May 17 03:18:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7733)]no need to upgrade firmware
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware info
May 17 03:18:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7707)]fimrware update check once
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware info
May 17 03:19:21 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7707)]fimrware update check once
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7687)]retrieve firmware info
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7702)]fimrware update check first time
May 17 03:19:51 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7733)]no need to upgrade firmware
May 17 03:22:58 kernel: CPU: 1 PID: 12870 Comm: touch Tainted: P O 4.1.27 #2
May 17 03:22:58 kernel: Hardware name: Broadcom-v8A (DT)
May 17 03:22:58 kernel: task: ffffffc01730eb00 ti: ffffffc0126dc000 task.ti: ffffffc0126dc000
May 17 03:22:58 kernel: PC is at 0xf6dcc90c
May 17 03:22:58 kernel: LR is at 0xf6dccfd4
May 17 03:22:58 kernel: pc : [<00000000f6dcc90c>] lr : [<00000000f6dccfd4>] pstate: 400f0010
May 17 03:22:58 kernel: sp : 00000000fff3d154
May 17 03:22:58 kernel: x12: 00000000fff3d188
May 17 03:22:58 kernel: x11: 00000000fff3d1d4 x10: 00000000f76334c0
May 17 03:22:58 kernel: x9 : 00000000fff3d189 x8 : 00000000fff3d184
May 17 03:22:58 kernel: x7 : 000000000000000b x6 : 0000000000000000
May 17 03:22:58 kernel: x5 : 00000000fff3e8bc x4 : 00000000fff3d420
May 17 03:22:58 kernel: x3 : 000000006e69622f x2 : 00000000fff3e8bc
May 17 03:22:58 kernel: x1 : 00000000fff3d420 x0 : fffffffffffffff2
May 17 03:23:17 kernel: CPU: 1 PID: 12894 Comm: touch Tainted: P O 4.1.27 #2
May 17 03:23:17 kernel: Hardware name: Broadcom-v8A (DT)
May 17 03:23:17 kernel: task: ffffffc01730eb00 ti: ffffffc01151c000 task.ti: ffffffc01151c000
May 17 03:23:17 kernel: PC is at 0xf6dcc90c
May 17 03:23:17 kernel: LR is at 0xf6dccfd4
May 17 03:23:17 kernel: pc : [<00000000f6dcc90c>] lr : [<00000000f6dccfd4>] pstate: 400f0010
May 17 03:23:17 kernel: sp : 00000000fff3d154
May 17 03:23:17 kernel: x12: 00000000fff3d188
May 17 03:23:17 kernel: x11: 00000000fff3d1d4 x10: 00000000f76334c0
May 17 03:23:17 kernel: x9 : 00000000fff3d189 x8 : 00000000fff3d184
May 17 03:23:17 kernel: x7 : 000000000000000b x6 : 0000000000000000
May 17 03:23:17 kernel: x5 : 00000000fff3e8bc x4 : 00000000fff3d420
May 17 03:23:17 kernel: x3 : 000000006e69622f x2 : 00000000fff3e8bc
May 17 03:23:17 kernel: x1 : 00000000fff3d420 x0 : fffffffffffffff4
May 17 03:23:51 watchdog: restart_firewall due DST time changed(1->0)
May 17 03:23:51 rc_service: watchdog 1807:notify_rc restart_firewall
May 17 03:23:51 rc_service: watchdog 1807:notify_rc restart_wan
May 17 03:23:51 rc_service: waitting "restart_firewall" via watchdog ...
May 17 03:23:51 firewall: apply rules error(2857)
May 17 03:23:51 firewall: apply rules error(2892)
May 17 03:23:51 services: apply rules error(17779)
May 17 03:23:51 firewall: apply rules error(4580)
May 17 03:23:52 miniupnpd[5322]: shutting down MiniUPnPd
May 17 03:23:53 DualWAN: skip single wan wan_led_control - WANRED off
May 17 03:23:58 dnsmasq-dhcp[5503]: failed to write /var/lib/misc/dnsmasq.leases: No space left on device (retry in 60s)
So it did an auto firmware update check at 3:18 AM (again, I have auto updates turned off) and then 3 minutes later, the kernel got mad about something. As you can see at the bottom, other things started to fail too. The dnsmasq error clearly indicates that there was no space available in /var/lib/misc. /var is mounted as a tmpfs, so I think this means the router was out of RAM.
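One speculative way to squeeze a bit more out of the two register dumps: the x0 values look like small negative numbers in 64-bit two's complement. The sketch below reinterprets them as signed integers and looks up the matching errno names. Treating x0 as a negated errno is purely an assumption here; the log itself doesn't say what the register held, and as_signed64 is a hypothetical helper.

```python
import errno


def as_signed64(hex_value: str) -> int:
    """Reinterpret a 64-bit hex string as a two's-complement signed integer."""
    value = int(hex_value, 16)
    return value - (1 << 64) if value >= (1 << 63) else value


# x0 from the 03:22:58 dump, then x0 from the 03:23:17 dump.
for raw in ("fffffffffffffff2", "fffffffffffffff4"):
    signed = as_signed64(raw)
    # ASSUMPTION: x0 holds a negated errno value; the log does not confirm this.
    name = errno.errorcode.get(-signed, "unknown")
    print(raw, signed, name)
```

If that reading is right, the values would be -14 and -12. On Linux, errno 12 is ENOMEM, which would at least be consistent with the out-of-memory theory, but it's far from proof.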
It looks like the auto firmware check is pretty common to see every morning in the log, although it did fail on Monday if that’s relevant:
May 15 03:18:05 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:19:17 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]couldn't retrieve firmware info: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
May 15 03:19:46 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:21:11 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]couldn't retrieve firmware info: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
May 15 03:21:40 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7669)]do webs_update
May 15 03:22:44 WATCHDOG: [FAUPGRADE][auto_firmware_check:(7737)]couldn't retrieve firmware info: webs_state_update = 1, webs_state_error = 1, webs_state_dl_error = 0, webs_state_info.len = 23
It’s unclear to me if the auto firmware check is even related to when the problem first started. Maybe it’s one of several periodic tasks that run at around that time? It looks like sometimes I see this message about 30-40 minutes after the auto firmware check:
ahs: [read_json]Update ahs JSON file.
This seems to be related to “ASUS Healing System” which I don’t even know if I have enabled or not. I also saw the auto update check and ahs JSON message show up again in the log after my first router reboot, at around 6:47 AM. Not too long after that, the dnsmasq.leases “no space left on device” error happened again, so I think it was out of RAM again. Perhaps asd was gobbling up CPU time and RAM.
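For checking this kind of thing on a Linux box where Python is available (the router itself almost certainly doesn't ship it), a rough sketch might look like the following. The SAMPLE_MOUNTS text is a made-up /proc/mounts excerpt, not a copy of my router's mount table, and find_fstype and free_mib are hypothetical helper names.

```python
import shutil
from typing import Optional

# Hypothetical /proc/mounts excerpt; a real router's mount table may differ.
SAMPLE_MOUNTS = """\
ubi:rootfs_ubifs / ubifs ro,relatime 0 0
tmpfs /var tmpfs rw,relatime 0 0
/dev/mtdblock9 /jffs jffs2 rw,noatime 0 0
"""


def find_fstype(mounts_text: str, path: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point that contains path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def free_mib(path: str) -> float:
    """Free space on the filesystem holding path, in MiB."""
    return shutil.disk_usage(path).free / 2**20


print(find_fstype(SAMPLE_MOUNTS, "/var/lib/misc/dnsmasq.leases"))
print(round(free_mib("."), 1))  # on the real system you would pass "/var" here
```

One caveat: a tmpfs mount can also have its own size cap, so a full tmpfs is a strong hint of memory pressure but not a guarantee that all RAM was exhausted.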
Does anyone have any further info on what happened here? My two theories are: either asd downloaded a bad file from ASUS that caused it to crash, or someone was exploiting a vulnerability that was patched in one of ASUS’s two most recent updates for my RT-AC86U router. If it’s the latter, it’s obviously my bad for not keeping my firmware up to date, but I can’t help but wonder if an automatic file download in the middle of the night caused it. I’m very curious about what happened! Did anyone with an ASUS router not run into a similar problem today?