[OpenWrt-Devel] Sysupgrade possibly broken in recent development snapshots: "message": "Firmware image couldn't be validated"
Hannu Nyman
hannu.nyman at iki.fi
Thu Jan 2 10:48:08 EST 2020
Petr Štetiar wrote on 1.1.2020 at 22.46:
> Petr Novák <petrn at me.com> [2020-01-01 21:11:30]:
>
>> But how come the workaround was to use an older libubox and ubus - was there
>> any new check which was not there before?
> I don't have definitive answer, as I would need RPi-4 (or any other real
> hardware with Cortex-A72 core) to find the actual bit in the libubox which
> caused this change in the behavior, but here is a part of the commit
> description[1] which might help answering that:
>
> It seems like the recent fixes in the libubox library, particularly in
> the jshn sub-component (which empowers json_dump used in the shell
> script executed by the child process) made the execution somehow faster,
> thus exposing this racy behaviour in the validate_firmware_image_call at
> least on RPi-4 (Cortex-A72) target.
>
> As I was unable to trigger this issue even in QEMU/Cortex-A72, I assume
> that it was simply some kind of race that needed specific timing, provided
> precisely only by that RPi-4 hardware.

I think there may have been an older race condition that has now just
surfaced more visibly on the RPi4 after the recent changes. It has earlier
manifested itself on some routers, but more rarely.

I have seen an occasional sysupgrade failure on one of my routers since
October (ar71xx or ath79 / WNDR3700v2). I wrote about that to the mailing
list in November, although at the time I thought that it might be just a
"force" option failure:
http://lists.infradead.org/pipermail/openwrt-devel/2019-November/019996.html

Others have seen it too, based on forum discussion:
https://forum.openwrt.org/t/build-for-wndr3700v1-v2-wndr3800/64/295

Petr Novák describes something similar to my error: "it does just reboot but
does not flash anything."

I have tried to debug this on my WNDR3800, which has a serial console
connection, but have not managed to reproduce the error on it. On the 3800
sysupgrade has always succeeded. However, on my 3700v2 (which has identical
hardware except for the RAM size) on the other side of the building, I still
occasionally see a LuCI-based sysupgrade start ok, but the router then boots
back into the same firmware after an invisible error. After that reboot the
next sysupgrade attempt via LuCI usually works fine. (It sounds like a
sysupgrade from a recently booted system usually works, but sysupgrading a
system after some runtime sometimes fails.)

I first thought that it was related to using force in the ar71xx/ath79 jump,
but it has also been present in normal sysupgrades.

Possibly this is a manifestation of the same race condition in
sysupgrade/procd/libubox, so hopefully your patches will fix that as well.
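
For illustration only, here is a minimal Python sketch of one generic way a
timing-dependent failure like this can arise: a parent process that parses a
child's output must keep reading until the child closes its stdout, otherwise
the result depends on how fast the child writes. This is not the actual
rpcd/procd/libubox code, and whether this is the mechanism behind the failures
described above is not established in this thread. The names run_validator and
FAKE_VALIDATOR and the "valid" JSON key are hypothetical.

```python
import json
import subprocess
import sys


def run_validator(cmd):
    """Run a child process and parse the JSON it prints on stdout.

    communicate() reads until the child closes stdout (EOF) and then reaps
    the child, so the result does not depend on how fast the child writes.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    out, _ = proc.communicate()
    if not out.strip():
        raise RuntimeError("validator produced no output")
    return proc.returncode, json.loads(out)


# Hypothetical stand-in for a validation script: it emits its JSON in two
# chunks with a pause in between, which a single early read() could truncate.
FAKE_VALIDATOR = [
    sys.executable, "-c",
    "import sys, time\n"
    "sys.stdout.write('{\"valid\": '); sys.stdout.flush()\n"
    "time.sleep(0.2)\n"
    "sys.stdout.write('true}\\n')\n",
]
```

Calling run_validator(FAKE_VALIDATOR) returns the exit status together with the
complete parsed object, even though the child delivers its output in pieces.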
_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel