Picobit native endian
If I'd been paying attention, I'd have noticed that Picobit's memory access primitives emulate big-endian byte ordering. This sort of makes sense for constants and the bytecode: it makes Picobit's images portable. I'm not actually interested in portability, however: I build binaries for specific platforms. Even if I were, it would still be preferable not to do this for RAM.
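To make the distinction concrete, here is a minimal sketch of the two styles of 16-bit RAM access. The names (`ram`, `ram_get16_be`, `ram_get16_native` and so on) are made up for illustration and are not Picobit's actual primitives; the real code may be structured quite differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical VM heap; the size and name are assumptions for this sketch. */
static uint8_t ram[64];

/* Big-endian emulation: the word is assembled from two byte accesses,
   so the stored layout is the same on every host. */
static uint16_t ram_get16_be(uint16_t addr) {
    return (uint16_t)((ram[addr] << 8) | ram[addr + 1]);
}

static void ram_set16_be(uint16_t addr, uint16_t value) {
    ram[addr]     = (uint8_t)(value >> 8);   /* high byte first */
    ram[addr + 1] = (uint8_t)(value & 0xff); /* then low byte */
}

/* Native endian: copy the word in whatever order the host uses.
   memcpy keeps this safe for unaligned addresses, and compilers
   typically reduce it to a single load or store. */
static uint16_t ram_get16_native(uint16_t addr) {
    uint16_t value;
    memcpy(&value, &ram[addr], sizeof value);
    return value;
}

static void ram_set16_native(uint16_t addr, uint16_t value) {
    memcpy(&ram[addr], &value, sizeof value);
}
```

Both pairs round-trip a value correctly; they differ only in the byte layout left in `ram` and in how much work each access takes.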
After a bit of hacking and debugging I've made RAM access native endian. Actually I've only tested it on little-endian systems: x86-64, and ARM in little-endian mode (which is the default for ChibiOS). I have a very limited performance test: 11 queens. The difference this makes varies depending on the optimisation settings. Picobit suggests -Os. I've also tried -O2. These are the results for amd64:
| | -O2 size | -O2 time | -Os size | -Os time |
|---|---|---|---|---|
| Native-endian | 53040 | 3.119 | 49864 | 6.340 |
| Big-endian | 56848 | 3.733 | 50192 | 7.908 |
| Difference | -3800 | -16% | -328 | -19% |
You get a lot of bang for your buck with -O2 vs -Os: the run time roughly halves. I guess the main thing here is that code size differs much less between -O2 and -Os when using native endian, while the performance benefit stays large. It does seem that if I converted ROM access to native endian as well, the code size difference between -Os and -O2 might almost disappear.
I think this is just because -O2 inlines the memory access primitives, and with big-endian emulation each of those inlined copies is bigger.
I haven't got a sensible way of timing this on a microcontroller yet, but this result is enough to merge the change and move on to adding an endianness flag to the compiler, so that I can switch constant access to native endianness too. That should shrink the executable further, even if it doesn't have much effect on performance. I don't think there's any point changing the bytecode though: not many instructions include full word values.
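One way the VM side of such a switch might look is a compile-time flag that selects the ROM accessor. This is only a sketch under assumptions: `ROM_NATIVE_ENDIAN`, `rom_get16` and `rom_emit16` are hypothetical names, not existing Picobit options or functions, and `rom_emit16` merely stands in for whatever the compiler does when it writes a constant into the image.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical build flag, not a real Picobit option.
   1 = constants stored in host byte order, 0 = big-endian emulation. */
#ifndef ROM_NATIVE_ENDIAN
#define ROM_NATIVE_ENDIAN 1
#endif

/* Read a 16-bit constant from the image at byte offset addr. */
static inline uint16_t rom_get16(const uint8_t *rom, uint16_t addr) {
#if ROM_NATIVE_ENDIAN
    uint16_t value;
    memcpy(&value, rom + addr, sizeof value); /* single host-order load */
    return value;
#else
    return (uint16_t)((rom[addr] << 8) | rom[addr + 1]);
#endif
}

/* Stand-in for the compiler side: write a constant in the matching order. */
static inline void rom_emit16(uint8_t *image, uint16_t addr, uint16_t value) {
#if ROM_NATIVE_ENDIAN
    memcpy(image + addr, &value, sizeof value);
#else
    image[addr]     = (uint8_t)(value >> 8);
    image[addr + 1] = (uint8_t)(value & 0xff);
#endif
}
```

The important property is that the writer and the reader agree on the flag: an image emitted with one setting has to be read by a VM built with the same setting, which is exactly the portability that big-endian emulation was buying.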