[OpenWrt-Devel] Notes on (dangerous ?) sysupgrade

Sun Jan 13 09:30:48 EST 2019

On 13.01.2019 at 14:31, Jo-Philipp Wich wrote:
> Hi Reiner,
> 
>> After having several unpleasant encounters using sysupgrade, I had a
>> quick glance at the code, after more or less successfully implementing
>> workarounds for incomplete sysupgrades, resulting in inconsistent systems.
>> My questions are:
>> - Is it safe, simply to kill running processes during sysupgrade ? As
>> there might be services, restarted automatically (by procd ?).
> 
> Roughly, the sysupgrade process is as follows:
> 
> 1) /sbin/sysupgrade (shell script)
> 
> Parses arguments, sets defaults, assembles conffiles to back up, runs
> partial scripts in /lib/upgrade, checks the image, ends with `ubus call
> system sysupgrade`. All fatal exit conditions (such as invalid image)
> should be handled here.
> 
> 2) ubus call system sysupgrade (procd ubus procedure)
> 
> Invokes a procedure in procd that instructs procd to terminate itself
> and exec into /sbin/upgraded (which has been copied to a ramdisk at
> /tmp/root first), turning /tmp/root/sbin/upgraded into pid 1 and
> releasing the pid 1 use of /.
> 
> 3) /tmp/root/sbin/upgraded (binary)
> 
> Functions as pid 1 placeholder to prevent the kernel from panicking. It
> does two things: it keeps serving the watchdog to prevent spontaneous
> resets, and it executes /lib/upgrade/stage2.
> 
> 4) /lib/upgrade/stage2 (shell script)
> 
> Assemble backup tarball, write image, append backup tarball to just
> written image. The exact procedure depends on the platform.
> 
> 
> So yes, it is safe to simply kill processes in the sense that there will
> be no procd running anymore at this point which would relaunch them.
> 
> Merely killing processes instead of shutting them down through their
> respective init scripts is not ideal, though; that eventually needs rework.
> 
> Ideally sysupgrade should try to cleanly stop as many services through
> their respective init scripts as possible before invoking stage2, then
> only do the 'kill TERM; sleep 3; kill KILL' sequence on processes that
> somehow failed to stop initially (buggy init scripts, timeouts, ...).
> 
>> - What about a killed process that simply takes some time to shut down?
>> (example: squid closing a lot of open files on a block device, with an
>> internal shutdown timer of 30s by default)
> 
> Such services are not gracefully handled at the moment, see above.
> 
>> - What about open swap file on block-device ?
> 
> From a cursory look, it does not appear that sysupgrade currently
> performs any swapoff at all. Adding a `swapoff -a` after the process
> termination would certainly make sense.
> 
>> - What about mounted block-device for mass storage ?
> 
> Same as swap, there is no umount handling either as far as I can see. I
> think this should be added as well along with the swapoff. Since the
> sysupgrade runs off a pivot_root'ed /tmp/root at this point, all fses
> should be free to umount. (Might still need two or three cycles due to
> layered mounts).
> 
>> - What about (slow) wwan connection, managed by pppd. When killed by
>> sysupgrade, will netifd restart pppd ?
> 
> It should not happen. Theoretically it could be that pppd is killed
> first while netifd is still running, netifd will then try to restart
> pppd shortly before netifd itself will get killed, but the second KILL
> loop three seconds later should catch this rare circumstance.
> 
> However, as discussed above a graceful service shutdown would be better.
> 
>> As a workaround, before calling sysupgrade I
>> - explicitly use /etc/init.d/most_services stop
>> - explicitly kill squid and wait for termination
>> - explicitly disable swap
>> - explicitly dismount mounted block-device
>> - ifdown wwan
> 
> That certainly makes a lot of sense and most of this should probably go
> into sysupgrade (stage1 aka /sbin/sysupgrade) directly. A slight
> difficulty I see is how to identify "most_services" but I guess a
> hardcoded whitelist for things like "dropbear", "openssh" or "telnetd"
> will do.
> 
> As for awaiting squid termination - I think if it's not already the case,
> the squid init script should be reworked so that /etc/init.d/squid stop
> does not return (successfully) before squid is actually stopped.
> 
>> Before I had several cases, that
>> sysupgrade -n -v -f /tmp/newfiles.tar.gz /tmp/new_fw.bin
>> updated all files from /tmp/newfiles.tar.gz, but did not do the flash of
>> new_fw.bin
> 
> This is quite strange as appending the /tmp/newfiles.tar.gz archive will
> only happen after /tmp/new_fw.bin has been written. I could only imagine
> that the image write procedure itself somehow failed, but appending the
> archive still worked.
> 
> How exactly this could fail depends on the platform. Can you provide
> some more details about the device this issue occurred on?
> 
> ~ Jo
> 
> 

I had these observations on my ZBT WE1026-5g.
I am running several special services, like squid, collectd, chilli, nginx, uhttpd, openvpn.
The WE1026-5g includes an SD card, used for the swap file, squid caching and logfiles (from squid and nginx).
A Quectel EC25 is used for wwan (serial, 3g), but I _think_ I saw the same effects using wan instead of wwan, too.
Additionally, I have several simple private processes, like a continuous ping to keep wwan active.
And some cron jobs are defined.
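
For this setup, the workaround quoted above (stop services, disable swap, unmount the SD card, ifdown wwan) can be sketched roughly as below. This is only an illustration, not what sysupgrade does: the init script names in SERVICES, the mount point in SD_MOUNT and the run/prepare_for_sysupgrade helpers are assumptions. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: service names and mount point are assumptions for illustration.
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them
SERVICES="squid nginx collectd chilli openvpn cron"
SD_MOUNT="/mnt/sdcard"                   # hypothetical mount point of the SD card

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_for_sysupgrade() {
    # Stop services through their init scripts first.
    for svc in $SERVICES; do
        run "/etc/init.d/$svc" stop
    done
    # The swap file lives on the SD card, so disable swap before unmounting.
    run swapoff -a
    run umount "$SD_MOUNT"
    # Finally take the wwan interface down.
    run ifdown wwan
}
```

With DRY_RUN=0, calling prepare_for_sysupgrade and then the usual sysupgrade command line would perform the steps for real.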

Note: When killing processes in sysupgrade, it might be a good idea to wait for some time, to make sure the process
(here: squid) has shut down completely. It is good practice not to rely upon certain settings in squid.conf or similar.
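
What I mean by waiting could look roughly like this. It is a minimal sketch, not existing sysupgrade code: stop_and_wait is a hypothetical helper, and the timeout in seconds is an assumption (30 would match squid's default shutdown timer mentioned above).

```shell
# Hypothetical helper: send TERM, wait up to $2 seconds for the process to
# exit, and send KILL only if it is still alive after that.
# Returns 0 if the process exited on its own, 1 if it had to be killed.
stop_and_wait() {
    pid="$1"
    timeout="${2:-30}"

    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 only checks whether the process still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

For squid this would be called as stop_and_wait "$(pidof squid)" 30, assuming a single squid process.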

And it might be a good idea to include some type of persistent error report in case sysupgrade fails, e.g. by creating a
special "/etc/sysupgrade_error.info".
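
A rough sketch of such a report follows. The helper names report_upgrade_error and run_step are hypothetical, and nothing like this exists in sysupgrade as far as I know. It can also only help for failures that happen while /etc is still writable.

```shell
# File name as proposed above; overridable for testing.
ERROR_FILE="${ERROR_FILE:-/etc/sysupgrade_error.info}"

# Hypothetical helper: append a short record about a failed step.
# $1: name of the step, $2: its exit code
report_upgrade_error() {
    {
        echo "time: $(date)"
        echo "step: $1"
        echo "exit code: $2"
    } >> "$ERROR_FILE"
    sync    # flush to storage before a possible reboot
}

# Hypothetical wrapper: run a command and record it if it fails.
# $1: name of the step, remaining arguments: the command to run
run_step() {
    name="$1"
    shift
    "$@"
    rc=$?
    [ "$rc" -eq 0 ] || report_upgrade_error "$name" "$rc"
    return "$rc"
}
```

A step would then be wrapped like run_step "disable swap" swapoff -a, and the file could be checked after the next boot.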

Cheers,

Reiner
