Inspired by the question Difference in initalizing and zeroing an array in c/c++ ?, I decided to actually examine the assembly of, in my case, an optimized release build for Windows Mobile Professional (ARM processor, from the Microsoft Optimizing Compiler). What I found was somewhat surprising, and I wonder if someone can shed some light on my questions concerning it.
These two examples are examined:
byte a[10] = { 0 };
byte b[10];
memset(b, 0, sizeof(b));
They are used in the same function, so the stack looks like this:
[ ] // padding byte to reach DWORD boundary
[ ] // padding byte to reach DWORD boundary
[ ] // b[9] (last element of b)
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ] // b[0] = sp + 12 (stack pointer + 12 bytes)
[ ] // padding byte to reach DWORD boundary
[ ] // padding byte to reach DWORD boundary
[ ] // a[9] (last element of a)
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
[ ] // a[0] = sp (stack pointer, at bottom)
The generated assembly with my comments:
; byte a[10] = { 0 };
01: mov r3, #0 // r3 = 0
02: mov r2, #9 // 3rd arg to memset: 9 bytes, note that sizeof(a) = 10
03: mov r1, #0 // 2nd arg to memset: 0-initializer
04: add r0, sp, #1 // 1st arg to memset: &a[1] = a + 1, since only 9 bytes will be set
05: strb r3, [sp] // a[0] = r3 = 0, sets the first element of a
06: bl memset // continue in memset
; byte b[10];
; memset(b, 0, sizeof(b));
07: mov r2, #0xA // 3rd arg to memset: 10 bytes, sizeof(b)
08: mov r1, #0 // 2nd arg to memset: 0-initializer
09: add r0, sp, #0xC // 1st arg to memset: sp + 12 bytes (the 10 elements
// of a + 2 padding bytes for alignment) = &b[0]
10: bl memset // continue in memset
Now, there are two things that confuse me:
- What's the point of lines 02 and 05? Why not just give &a[0] and 10 bytes to memset?
- Why aren't the padding bytes of a 0-initialized? Is that only for padding in structs?
Edit: I was too curious to not test the struct case:
struct Padded
{
DWORD x;
byte y;
};
The assembler for 0-initializing it:
; Padded p1 = { 0 };
01: mov r3, #0
02: str r3, [sp]
03: mov r3, #0
04: str r3, [sp, #4]
; Padded p2;
; memset(&p2, 0, sizeof(p2));
05: mov r3, #0
06: str r3, [sp]
07: andcs r4, r0, #0xFF
08: str r3, [sp, #4]
Here we see in line 04 that the padding is indeed initialized, since str (as opposed to strb) is used. Right?
-
Lines 02 and 05 are there because you specified a 0 in the array initializer. The compiler initializes all explicitly listed constants, then zero-fills the rest using memset. If you were to put two zeros in your initializer, you'd see it strw (word instead of byte) then memset 8 bytes.
As for the padding, it's only used to align memory accesses -- the data shouldn't be used under normal circumstances, so memsetting it is wasteful.
Edit: For the record, I may be wrong about the strw assumption above. 99% of my ARM experience is reversing code generated by GCC/LLVM on the iPhone, so my assumption may not carry over to MSVC.
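The split described in this answer (store the explicitly listed element, then memset the remainder) can be sketched in C++ as follows. This is only an illustration of what the ARM listing appears to do, not the compiler's actual logic; the function names init_split, init_whole and all_zero are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char byte;

// Returns true if every element of the array is zero.
bool all_zero(const byte (&a)[10]) {
    for (std::size_t i = 0; i < sizeof(a); ++i)
        if (a[i] != 0) return false;
    return true;
}

// Mirrors the listing for "byte a[10] = { 0 };":
// one byte store for a[0], then memset over the remaining 9 bytes.
void init_split(byte (&a)[10]) {
    a[0] = 0;                              // strb r3, [sp]
    std::memset(a + 1, 0, sizeof(a) - 1);  // memset(&a[1], 0, 9)
}

// What the question expected: a single memset over all 10 bytes.
void init_whole(byte (&a)[10]) {
    std::memset(a, 0, sizeof(a));          // memset(&a[0], 0, 10)
}

// Both strategies leave the array in the same state.
void check_equivalence() {
    byte x[10], y[10];
    std::memset(x, 0xFF, sizeof(x));       // start from non-zero garbage
    std::memset(y, 0xFF, sizeof(y));
    init_split(x);
    init_whole(y);
    assert(all_zero(x) && all_zero(y));
}
```

Both functions produce the same array contents; the only difference is how the work is divided between a direct store and the memset call.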
-
Both bits of code are bug-free. The two lines mentioned aren't smart, but you're just proving that this compiler is emitting suboptimal code.
Padding bytes are usually only initialized if that simplifies the assembly or speeds up the code. For example, if you have padding between two zero-filled members, it's often easier to zero-fill the padding as well. Also, if you have padding at the end and your memset() is optimized for multi-byte writes, it may be faster to overwrite that padding too.
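To make the padding in the question's struct concrete, here is a minimal C++ sketch. It assumes DWORD is a 32-bit unsigned integer and byte is an 8-bit unsigned integer, so std::uint32_t and std::uint8_t stand in for them; tail_padding and memset_clears_padding are hypothetical helper names. The exact amount of padding is ABI-dependent, although 3 trailing bytes is what common 32-bit and 64-bit ABIs give.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Padded {
    std::uint32_t x;  // stands in for DWORD (assumed 32-bit)
    std::uint8_t  y;  // stands in for byte
};

// Number of padding bytes that follow y at the end of the struct.
constexpr std::size_t tail_padding() {
    return sizeof(Padded) - (offsetof(Padded, y) + sizeof(std::uint8_t));
}

// memset writes every byte of the object, so the padding is cleared too.
bool memset_clears_padding() {
    Padded p;
    std::memset(&p, 0xFF, sizeof(p));  // fill with non-zero bytes first
    std::memset(&p, 0, sizeof(p));     // then zero the whole object

    unsigned char raw[sizeof(Padded)];
    std::memcpy(raw, &p, sizeof(p));   // inspect the raw object bytes
    for (unsigned char b : raw)
        if (b != 0) return false;
    return true;
}

void check_padding() {
    // Members plus tail padding account for the whole object.
    assert(offsetof(Padded, y) + 1 + tail_padding() == sizeof(Padded));
    assert(memset_clears_padding());
}
```

By contrast, the language only guarantees that an initializer such as `Padded p = { 0 };` zeroes the members; whether the padding bytes are also written is left to the compiler.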
Cody Brocious : Actually, this code very well could be optimal. The way instructions are pipelined on ARM could easily make it more efficient to strb then branch off and loop. That said, the performance difference would likely be negligible, and you're using an extra 4 bytes, so who knows.
MSalters : Unlikely. You've got unaligned memory accesses (one byte and 9 bytes - ARM often has a 16-bit bus. That means read/modify/write!). Also, you've got extra register pressure: you need R3 as well.
-
Some quick testing indicates that Microsoft's x86 compiler generates different assembly if the initializer list is empty, compared to when it contains a zero. Maybe their ARM compiler does too. What happens if you do this?
byte a[10] = { };
Here's the assembly listing I got (with options /EHsc /FAs /O2 on Visual Studio 2008). Note that including a zero in the initializer list causes the compiler to use unaligned memory accesses to initialize the array, while the empty initializer list version and the memset() version both use aligned memory accesses:
; unsigned char a[10] = { };
xor eax, eax
mov DWORD PTR _a$[esp+40], eax
mov DWORD PTR _a$[esp+44], eax
mov WORD PTR _a$[esp+48], ax
; unsigned char b[10] = { 0 };
mov BYTE PTR _b$[esp+40], al
mov DWORD PTR _b$[esp+41], eax
mov DWORD PTR _b$[esp+45], eax
mov BYTE PTR _b$[esp+49], al
; unsigned char c[10];
; memset(c, 0, sizeof(c));
mov DWORD PTR _c$[esp+40], eax
mov DWORD PTR _c$[esp+44], eax
mov WORD PTR _c$[esp+48], ax
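A test file along the following lines could be used to compare the three variants on another compiler. This is a sketch, not the source that produced the listing above: the function names are hypothetical, and the generated assembly will depend on the compiler version and flags. An optimizer may also fold these small functions away entirely, in which case the arrays would have to be passed to a function it cannot see into.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char byte;

// Sums the bytes so that each array is read after it is initialized.
static unsigned sum10(const byte* p) {
    unsigned s = 0;
    for (std::size_t i = 0; i < 10; ++i) s += p[i];
    return s;
}

// Variant 1: empty initializer list (valid C++; all elements are zeroed).
unsigned empty_list() {
    byte a[10] = {};
    return sum10(a);
}

// Variant 2: explicit zero for the first element, rest zeroed implicitly.
unsigned explicit_zero() {
    byte b[10] = { 0 };
    return sum10(b);
}

// Variant 3: uninitialized array cleared with memset.
unsigned via_memset() {
    byte c[10];
    std::memset(c, 0, sizeof(c));
    return sum10(c);
}

// All three variants yield an all-zero array; only the codegen may differ.
void check_all_zero() {
    assert(empty_list() == 0);
    assert(explicit_zero() == 0);
    assert(via_memset() == 0);
}
```

Compiling this with an assembly listing option (such as /FAs with the Microsoft compiler, as used above) lets the three initializations be compared side by side.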
Matt Joiner : wooh!! why on earth does it do that? :P at least you'd expect the explicit 0 initialization to first copy the value in al, to all bytes in eax. it's like an optimization was half-done for explicit initialization using 0.