Dithering and Per-pixel Ops

Dithering is a good example of how cogen can make an abstract implementation run fast. This section sketches a procedure that dithers a 24-bit RGB image down to a color cube of any size. The channels of the RGB image may have any linear organization. A software cache converts loop code that does byte loads into code that does word loads, shifts, and masks; it works by keeping the low two bits of the loop index static. The ditherer is built from four components: a 1D linear looper, the software cache, a hash-noise dither, and a byte permuter.
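The dithering step itself can be sketched in Python as a plain, unspecialized function. This is a minimal sketch, not the author's implementation: the names `dither`, `quantize`, and `hash_noise` are hypothetical, the hash is an arbitrary stand-in because the text does not specify one, and a "linear organization" is assumed to mean per-channel offsets plus pixel and row strides. Each 8-bit channel is scaled to n levels, the noise byte decides whether it rounds up or down, and the three levels index an n×n×n cube.

```python
from typing import Optional


def hash_noise(x: int, y: int) -> int:
    """Pseudo-random byte (0..255) from pixel coordinates; an arbitrary stand-in hash."""
    h = (x * 73856093) ^ (y * 19349663)
    h = ((h ^ (h >> 13)) * 0x5BD1E995) & 0xFFFFFFFF
    return (h ^ (h >> 15)) & 0xFF


def quantize(v: int, t: int, n: int) -> int:
    """Map channel value v (0..255) to a level in 0..n-1 using noise byte t.

    Computes floor(v*(n-1)/255 + t/256) in integer arithmetic.
    """
    return (v * (n - 1) * 256 + t * 255) // (255 * 256)


def dither(buf: bytes, width: int, height: int, n: int,
           r_off: int = 0, g_off: int = 1, b_off: int = 2,
           pixel_stride: int = 3, row_stride: Optional[int] = None) -> list:
    """Dither a 24-bit image to indices into an n*n*n color cube.

    Channel c of pixel (x, y) is read from c_off + y*row_stride + x*pixel_stride,
    which covers interleaved (rgb, bgr, ...) and planar layouts.
    """
    if row_stride is None:
        row_stride = width * pixel_stride
    out = []
    for y in range(height):
        for x in range(width):
            base = y * row_stride + x * pixel_stride
            t = hash_noise(x, y)  # same threshold for all three channels
            r = quantize(buf[base + r_off], t, n)
            g = quantize(buf[base + g_off], t, n)
            b = quantize(buf[base + b_off], t, n)
            out.append((r * n + g) * n + b)
    return out
```

Because the layout is only offsets and strides, the same function handles interleaved and planar buffers. In the RTCG setting, these layout parameters and n are the values that become known at runtime and could be folded into a specialized loop.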

Why RTCG: The size of the destination color cube depends on how many unallocated colors the X server has available, so it is not known until runtime. Furthermore, the user may want to display images from different sources with different channel layouts.

In this section, the binding-time division of the partially static integers, and of the operations on them, is done manually. This appears to be a new form of binding-time improvement. It seems likely that applying the idea behind partially static structures to partially static bit-fields could automate this division. Eventually, an analogue of variable splitting might be used to further optimize a number of bit-preserving operations.

If x is a dynamic variable, then x₂ denotes the low two bits of x, which are static.
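One way to picture such a partially static integer is to write x = 4·h + l, where h is dynamic and l = x₂ is static. The hypothetical `PSInt` class below is an illustration, not the author's representation: it keeps l as a Python int and h as the name of a residual variable. Adding a static constant updates l at specialization time and emits residual code only for the carry into h.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSInt:
    """Hypothetical partially static integer with value 4*hi + lo.

    hi: name of a dynamic residual variable (unknown until run time).
    lo: the static low two bits, a Python int in 0..3.
    """
    hi: str
    lo: int

    def add_static(self, k: int, emit: list) -> "PSInt":
        """Add a static constant k, appending residual code to emit only for the carry."""
        total = self.lo + k
        carry, lo = total >> 2, total & 3  # floor division and mod 4, also for negative k
        if carry == 0:
            return PSInt(self.hi, lo)  # purely static: no residual code
        new_hi = f"h{len(emit) + 1}"  # fresh name; assumes the root variable is h0
        op = "+" if carry > 0 else "-"
        emit.append(f"{new_hi} = {self.hi} {op} {abs(carry)}")
        return PSInt(new_hi, lo)
```

For example, starting from PSInt("h0", 1), adding 2 produces no code at all, while adding 1 more carries into h and emits a single residual addition.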

Eliminating the cache test requires variable splitting, together with a binding-time improvement to the eq? function so that (eq? v v) reduces to #t even when v is dynamic.
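The cache test and its elimination can be sketched as follows, assuming little-endian 32-bit words and a hypothetical byte-summing loop as the consumer. `cached_byte_reader` is a generic software cache: every byte load compares the word address against the cached one. `specialize_sum` plays the role of the specializer when the loop index is split as 4·h + lo with lo static. The cached word sits at h + off with off static, so comparing it to h + off' reduces to comparing off with off' at generation time (the (eq? h h) → #t step), and only word loads, shifts, and masks survive in the residual code. Both function names are illustrative assumptions.

```python
def cached_byte_reader(words):
    """Generic software cache: byte loads served from little-endian 32-bit words."""
    cache_addr = None
    cache_word = 0

    def load_byte(i: int) -> int:
        nonlocal cache_addr, cache_word
        addr = i >> 2
        if addr != cache_addr:  # the cache test, executed on every byte load
            cache_addr, cache_word = addr, words[addr]
        return (cache_word >> (8 * (i & 3))) & 0xFF

    return load_byte


def specialize_sum(lo: int, count: int) -> str:
    """Emit residual code that sums `count` bytes starting at byte index 4*h + lo.

    h is dynamic and lo is static. The cached word lives at h + cached_off with
    cached_off static, so the cache test (h + off == h + cached_off) is decided
    here, at generation time, because h is compared with itself.
    """
    lines = ["def residual(words, h):", "    total = 0"]
    cached_off = None
    for j in range(count):
        off, bit = divmod(lo + j, 4)  # both static
        if off != cached_off:  # static cache miss: emit a word load
            lines.append(f"    w = words[h + {off}]")
            cached_off = off
        lines.append(f"    total += (w >> {8 * bit}) & 0xFF")
    lines.append("    return total")
    return "\n".join(lines)
```

For lo = 2 and count = 7, the residual function performs three word loads and seven shift-and-mask steps, with no cache tests left.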

When the compiler runs, it could produce, e.g.:

Similar optimizations allow the user to write a filter function for a single channel and then apply it to grayscale, RGB, and RGBA images without loss of performance. The next step is applying such a function to an 8-bit indexed image.
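A one-channel filter can be lifted to grayscale, RGB, and RGBA images by generating a loop specialized to the channel count. In this Python sketch, `exec` merely stands in for run-time code generation; `make_filter` and its `skip_alpha` option are hypothetical, and tabulating the filter into a 256-entry lookup table assumes it maps bytes to bytes.

```python
from typing import Callable


def make_filter(f: Callable[[int], int], channels: int, skip_alpha: bool = False):
    """Build a filter for interleaved 8-bit images with `channels` channels.

    The one-channel function f is tabulated once (bytes() raises ValueError if
    f leaves 0..255), and the per-pixel body is unrolled over the channel count
    at build time. The buffer length must be a multiple of `channels`.
    """
    lut = bytes(f(v) for v in range(256))
    filtered = channels - 1 if skip_alpha else channels
    body = [f"        out[i + {c}] = lut[buf[i + {c}]]" for c in range(filtered)]
    body += [f"        out[i + {c}] = buf[i + {c}]" for c in range(filtered, channels)]
    src = "\n".join([
        "def run(buf):",
        "    out = bytearray(len(buf))",
        f"    for i in range(0, len(buf), {channels}):",
        *body,
        "    return bytes(out)",
    ])
    ns = {"lut": lut}
    exec(src, ns)  # stand-in for run-time code generation
    return ns["run"]
```

The generated loop body is unrolled over the channels, so the per-pixel work carries no test on the channel count.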