[PATCH] kernel: fix mtd partition erase<parent_erasesize writing to wrong address

Adrian Schmutzler mail at adrianschmutzler.de
Sat Sep 19 07:09:13 EDT 2020


Hi,

> -----Original Message-----
> From: openwrt-devel [mailto:openwrt-devel-bounces at lists.openwrt.org]
> On Behalf Of John Thomson
> Sent: Mittwoch, 5. August 2020 23:14
> To: openwrt-devel at lists.openwrt.org
> Cc: John Thomson <git at johnthomson.fastmail.com.au>
> Subject: [PATCH] kernel: fix mtd partition erase<parent_erasesize writing to
> wrong address
> 
> This bug applied where mtd partition end address, or erase start address,
> was not cleanly divisible by parent mtd erasesize.
> 
> This error would cause the bits following the end of the partition to the next
> erasesize block boundary to be erased, and this partition-overflow data to be
> written to the partition erase address (missing additional partition offset
> address) of the mtd (top) parent device.

We had a longer discussion about that on IRC yesterday, where we also discussed (stylistically) different solutions.

As it appears, kernel did a major overhaul of mtdpart.c in a more recent major kernel version:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef

So, any cosmetic improvement would probably be in vain anyway. Therefore, I decided to merge the patch in its current (tested) state.

However, this essentially means we will meet the same problem again with the next kernel bump, and it won't become easier.
In the discussion, jow had the idea to circumvent these problems and handle the non-aligned partitions in user-space. Maybe one can exploit the mtd-rw tool (available as kmod-mtd-rw in packages repo) for unlocking the relevant areas and then do everything else in user-space.

If that seems suitable, we could use the merged patch for the 20.xx release, and then migrate master to a better solution when we have one.

I put the discussion below in case somebody feels inspired to play around with that (first half is the general discussion, second half mentions the user-space solution).

Best

Adrian

---

[...]
12:31 jow: adrianschmutzler: I took a look at the difference and in hindsight it seems to make a lot of sense
12:33 jow: adrianschmutzler: plain diff: (Link: https://pastebin.com/AqE89TXB)https://pastebin.com/AqE89TXB - code context: (Link: https://pastebin.com/Hn2ny4Lt)https://pastebin.com/Hn2ny4Lt
12:34 jow: adrianschmutzler: so the "instr->addr -= part->offset;" address manipulation is moved after the _write() which makes sense since we're supposed to both erase and (partially) re-write the same block
12:34 jow: (my second paste shows the old, not the patched code)
12:35 jow: the way the code is right now, it would erase the entire block partly occupied by a partition, then overwrite the first block of the mtd device
12:35 jow: this is my interpretation at least, I am not an mtd subsystem expert
12:38 adrianschmutzler: okay, thanks. will have a look at the older kernel now, as in the PR it is called "a merge bug  in 4.19 & 5.4" ...
12:39 jow: yeah, my first thought was the same, probably a merge/patch rebase quirk
12:41 jow: tbh, I'd refactor the code while at it, directly substract instr->addr after part->parent->_erase(part->parent, instr); like it is done in the current broken code and explicitly pass instr->addr + part->offset and instr->addr + instr->len + part->offset respectively to part->parent->_write() to mirror what is done for mtd_read() a few further lines up
12:41 adrianschmutzler: unfortunately upstream also doesn't seem to be interested in commenting on the patches
12:43 jow: mtd_read() and part->parent->_write() take explicit offsets as arguments (part->offset + instr->addr + ...) while part->parent->_erase() uses the ->addr member from instr, hence the address manipulation
12:43 jow: and I'd revert the address to its original value as soon as possible and continue passing calculated offsets later
12:44 adrianschmutzler: jow: So, it's actually bad design after all?
12:45 jow: this would be my fix on top of the current broken code: (Link: https://pastebin.com/diff/EjUdXshw)https://pastebin.com/diff/EjUdXshw
12:45 adrianschmutzler: the need to increase/decrease instr->addr ...
12:47 adrianschmutzler: jow: okay. IIRC, they stirred around mtd in one of the recent kernel anyway, and also did some fairly similarly looking changes with offsets there as well
12:47 jow: I do not know enough about the struct erase_info and part->parent->_erase() semantics to understand why _erase() apparently uses instr->addr relative to the partition offset while part->parent->_write() expects an absolute offset relative to the mtd block
12:48 jow: that seems inconsitent to me, maybe the _erase() operates on a wrong offset as well
12:53 jow: adrianschmutzler: I took at the upstream code as well: (Link: https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197)https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197 and I think I got a better idea now. So the various part_*() functions expect an erase_info struct with partition-relative addresses, they then interally manipulate those addresses to be mtd-relative, invoke the lowlevel whole mtd ops, then revert the address back to their
12:53 jow: original partition-relative value
12:54 jow: so the originally proposed patch which simply moves the reversal (instr->addr -= part->offset;) directly before the `return ret;` is fine and matches the style of the kernel part_*() functions
12:58 jow: adrianschmutzler: just for completeness, and even better fix would be (Link: https://pastebin.com/diff/mFSb8Uqp)https://pastebin.com/diff/mFSb8Uqp
13:02 adrianschmutzler: jow: why is the additional mtd_to_part(mtd) superior?
13:03 jow: it should be optimized away by the compiler, the code is superior because the intention is clearer
13:03 jow: likewise the mtd_read() at the beginning could be changed to part_read()  and the `part->offset + ` removed in each invocation
13:04 jow: this way the only "address fiddling" that remains is in the original function body of part_erase()
13:04 jow: (we can't obviously call part_erase() from within part_erase() since that'd recurse forever)
13:05 jow: but this is all just a matter of style and taste, not sure what the subsystem maintainers would prefer
13:06 jow: I guess however that this partial erase will likely will never go upstream
13:06 adrianschmutzler: okay, I see your point now
13:06 adrianschmutzler: probably that's why they remained silent on the other issues
13:07 jow: I actually wonder why this can't be done in userspace
13:08 jow: or in wahtever code using the mtd api
13:08 jow: after all it is just a matter of reading an entire block, holding it in memory, erasing the entire block and then simply partially write back the parts we did not want to erase
13:09 jow: the only obstacle would be the write protection for non-eb aligned parts I guess
13:10 adrianschmutzler: no idea; but I will throw that idea towards the affected people, maybe they want to go into that direction
13:11 jow: in the meanwhile I'd say we should apply the proposed fix
13:11 jow: it makes sense
13:11 jow: and its not the submitters fault that the orginal code is not beautiful to begin with
13:11 adrianschmutzler: sysupgrade won't be a problem for a user-space solution, as other stuff there happens in user-space as well?
13:12 jow: if I remember correctly, that partial erase thing was introduced years ago to support sysupgrade on Redboot systems
13:13 adrianschmutzler: that's one of currently broken systems and the patch fixes them apparently
13:13 jow: there we wanted to write accross the FIS table dictated kernel/rootfs boundary using the mtd util and nbd implemented a kernel patch and corresponding functionality in mtd to do exactly that
13:13 jow: and I believe one of (the?) problem was that the kernel mtd subsystem write-protects all non-aligned partitions by default
13:14 adrianschmutzler: but that sounds more like the subsequent patch:
13:14 adrianschmutzler: 412-mtd-partial_eraseblock_unlock.patch
13:15 adrianschmutzler: which states it's for ixp4xx only, which probably isn't true anymore ...
13:16 jow: adrianschmutzler: I think both is needed
13:19 jow: adrianschmutzler: I think the partial erase was needed to rewrite the FIS partition while keeping the Redboot config (often both share one eraseblock)
13:20 adrianschmutzler: okay, then since this is tested and waiting some time already, and since part_write is removed with newer kernel anyway, I merge the existing patch.
13:20 jow: adrianschmutzler: and the partial erase unlock patch covers cases where the partition you manipulate is locked because its not spanning an entire eb
13:21 jow: at least this is my understanding
13:22 adrianschmutzler: jow: okay. but then the 412 patch is probably relevant to all devices with redboot partitions and the ixp4xx references should be removed from the description ...
13:22 jow: or rather, it allows unlocking non-aligned parts by simply tweaking the flash area begin and end offsets to start at the previous eb boundary and end on the next
13:22 jow: so you're not only unlocking the target area but a little bit of stuff before and after too
13:23 jow: otherwise the mtd subsystem would reject the unlock
13:24 adrianschmutzler: so, it's effectively a hack
13:24 jow: yes
13:24 adrianschmutzler: btw just found that Rafał wanted to remove these scripts already in March: (Link: https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/)https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/
13:25 adrianschmutzler: scripts->patches
13:26 jow: I think that if we somehow have a mechanism to forcibly unlock all flash areas (which could be done by a much leaner and more isolated patch I suppose) we could handle all the other edge cases in userspace
13:26 jow: by disregarding any partitioning and simply treating the entire mtd as continuous block device with offsets
13:27 jow: like you would e.g. dd an x86 image to the blockdev
13:28 jow: ah, just remebered this: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw
13:28 jow: something like this could be a solution for sysupgrade
13:28 jow: not sure if its still compatible to 5.4 but if yes the we wouldn't even need a patch at all
13:29 adrianschmutzler: mtd-rw is in packages repo and works fine AFAIK
13:29 jow: a slightly less dangerous solution would be a kernel module or some other facility which would allow us to create mtd partitions on demand, not sure if something like this is possible
13:30 jow: one could thne simply "spwan" a new mtd partition for the flash area relevant for sysupgrade, then "mtd write" in there
13:30 adrianschmutzler: I just hesitate to include and enable a tool like that by default
13:30 jow: while "masking" other sensible parts like bootloader etc. to avoid bricks if something goes wrong
13:33 jow: hm well, yeah I understand your concerns
13:33 jow: but on x86 for example you can also happily dd over your entire disk
13:33 jow: granted, it is way easier to recover, but still...
13:34 adrianschmutzler: I will push that forward to the Mikrotik people anyway, as they seem to be the most affected at the moment. Maybe they have time to play around with idea going into that direction
13:34 adrianschmutzler: "Mikrotik people" -> contributors working with Mikrotik devices, not the vendor of course
13:37 jow: I could imagine that adding mtd-rw by defualt and adding some kind of sysctl to trigger it would be good enough
13:37 adrianschmutzler: together with Tomasz on the Redboot front there should be enough people interested
13:37 adrianschmutzler: hmm, I really have to think about that
13:37 jow: that sysctl could then be toggled by "mtd unlock" which never actually worked for me when I needed it
13:38 jow: or some other equivalent commend to not break existing "mtd unlock" users
13:38 jow: *command
13:38 adrianschmutzler: I fear this may end up in a lot of scripts then, where people will use it to take shortcuts ...
13:39 adrianschmutzler: but if it solves the problem, we should consider it
13:40 adrianschmutzler: because maintaining these patches won't become easier
13:41 jow: I agree
14:12 rmilecki: adrianschmutzler: jow: omg, that parial erase size crap again?
14:13 jow: rmilecki: we've been discussing its original intnetion and possible, non-invasive alternatives
14:14 jow: rmilecki: current working idea is to ship some kind of mtd unlock mechanism and handling the overlapping write/erase things in userspace
14:15 jow: rmilecki: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw plus some clever scripting should already achive that
14:15 adrianschmutzler: rmilecki: the problem is that this partial erase stuff is needed for the redboot devices and all of the mikrotik ones as well (which currently all have "broken" sysupgrade)
14:16 rmilecki: the problem is it exists in a current form
14:16 rmilecki: afair it enables those partial wries for all ops, without checking if access is aligned or not
14:18 rmilecki: adrianschmutzler: i see jow posted few improvement ideas, do you want to implement them? if so, as part of submitted patch? or on top of that (later)?
14:21 adrianschmutzler: rmilecki: most of jow's improvement were about style. however, in (Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef)https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef kernel changes partition handling anyway
14:21 rmilecki: oh, that one is a hure change
14:21 adrianschmutzler: so, I don't think it's worth to care too much about style when we have to do a different implementation for next stable kernel anyway.
14:21 rmilecki: it's in some recent kernel though, right?
14:21 adrianschmutzler: and the existing patch is tested and known to be working on the affected devices
14:22 rmilecki: adrianschmutzler: ok, i agree, let's just make it work
14:22 adrianschmutzler: haven't checked explicitly, but somewhere after 5.4 of course
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openpgp-digital-signature.asc
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.openwrt.org/pipermail/openwrt-devel/attachments/20200919/48641814/attachment-0001.sig>


More information about the openwrt-devel mailing list