Thanks for tracking this down and pointing out where the error is.
There are actually a handful of errors in that code.

Assume a source value sv, source alpha sa, mask alpha ma,
destination value dv, and destination alpha da.
The values sv and dv are stored premultiplied by their
corresponding alphas (the one true way).
Given those values, the correct new values for the destination
pixel in an S over D op are:

    dv = (sv*ma + dv*(255-sa*ma)) / 255
    da = (sa*ma + da*(255-sa*ma)) / 255

Bug #1: The current draw.c does the division separately
on the two halves:

    dv = (sv*ma)/255 + (dv*(255-sa*ma))/255

This can be off by one if the remainders from the two
divisions sum to >= 255.

Bug #2: The MUL0123 macro assumes four values are packed
into a 32-bit int and runs them in simultaneous pairs as
32-bit operations (MUL02 and MUL13) operating on 16-bit
halves of the word. Those two don't use the right rounding
for the bitwise op implementation of /255.
On a single value, the implementation is

    x / 255 == (t = x+1, (t+(t>>8))>>8)
    (x+127) / 255 == (t = x+128, (t+(t>>8))>>8)

These calculations only need 16 bits so you can run two
of them in the two halves of a 32-bit word.
The second implements round-to-nearest and is what the
draw code tries to do in this case.
But it only adds 128 (0x80), so it only rounds the bottom half correctly.
It needs to add 0x00800080, which would round both of them.
This explains:

    src  rFF gFF bFF αFF
    mask kFF αFF
    dst  r00 g00 b00 αFF
    dst after calc  rFE gFE bFF αFF

Bug #3: MUL0123 is enabled whenever the src and dst both
have 32-bit pixel width, but there is no check that the
sub-channels are in the same order. You don't say what the
image chans were in your test, but this:

    src  rFF g00 b00 αFF
    mask kFF αFF
    dst  r00 g00 b00 αFF
    dst after calc  rFE gFE b00 α00

would be explained by, say, src==ARGB and dst==RGBA.
The A and R values in src became the R and G chans in dst.
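
To make Bugs #2 and #3 concrete, here is a small C sketch of the
shift-based /255 and its two-at-a-time packed form. It is not the
code from draw.c: the names div255_trunc, div255_round, mul_pair
and mul_quad, and the explicit bias parameter, are made up for
illustration, and the byte layout (first-named channel in the top
byte) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating x/255 using shifts; checked below for 0 <= x <= 255*255. */
uint32_t div255_trunc(uint32_t x)
{
	uint32_t t = x + 1;
	return (t + (t >> 8)) >> 8;
}

/* Round-to-nearest form, i.e. (x+127)/255, over the same range. */
uint32_t div255_round(uint32_t x)
{
	uint32_t t = x + 128;
	return (t + (t >> 8)) >> 8;
}

/* Exhaustively compare both shortcuts against real division. */
void check_div255(void)
{
	uint32_t x;

	for (x = 0; x <= 255u * 255u; x++) {
		assert(div255_trunc(x) == x / 255);
		assert(div255_round(x) == (x + 127) / 255);
	}
}

/* Mask selecting the low byte of each 16-bit half of a 32-bit word. */
#define LANES 0x00FF00FFu

/*
 * Scale the two bytes held in bits 0-7 and 16-23 of v by a/255 (a in 0..255),
 * doing both at once in the two 16-bit halves of one word.
 * bias is what gets added before the shift-based division:
 * 0x00800080 rounds both halves, 0x80 rounds only the bottom one.
 */
uint32_t mul_pair(uint32_t a, uint32_t v, uint32_t bias)
{
	uint32_t t = a * (v & LANES) + bias;
	return ((t + ((t >> 8) & LANES)) >> 8) & LANES;
}

/* Scale all four bytes of a 32-bit pixel by a/255, as two packed pairs. */
uint32_t mul_quad(uint32_t a, uint32_t px, uint32_t bias)
{
	return (mul_pair(a, px >> 8, bias) << 8) | mul_pair(a, px, bias);
}
```

Under those assumptions, mul_quad(255, 0xFFFFFFFF, 0x80) comes out as
0xFEFEFFFF, the rFE gFE bFF αFF result above, while the same call with
bias 0x00800080 gives 0xFFFFFFFF. Feeding in an ARGB source word of
0xFFFF0000 with the 0x80 bias gives 0xFEFE0000, which read back as
RGBA is rFE gFE b00 α00.
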
(In fact, since the dst R and G are FE not FF, that's almost
certainly the scenario, modulo the little-endian draw names.)

This one is similarly explained:

    src  rCC g00 b00 αCC
    mask kFF αFF
    dst  r00 g00 b00 αFF
    dst after calc  rCB gCB b00 α33

The destination got scaled by 0x33/0xFF, leaving 00 00 00 33,
and then the source, cc 00 00 cc, was added in the wrong place,
using the incorrect rounding, to produce cb cb 00 33.

I have corrected these bugs in the plan9port copy of
src/libmemdraw/draw.c and finally get the right answer
for Andrey's test (attached).

I leave it as an exercise to the interested reader to port the
changes to the other dozen copies of libmemdraw that are
floating around, or to unify them all.

Russ