Thursday, September 27, 2012

Evil C++ 6: Default Method Return Values!?

Caveat: Contents certainly platform-dependent -- these results are from GCC 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3). Your mileage may vary. If you get different results on other platforms, I'd love to hear about it in the comments!

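Today I came across a wonderfully devious C++ gotcha, which tops my "evil C++" charts so far: if you fail to return a value at the end of a non-void member function, what you get back (on this platform, at least) is the address of the instance (i.e. this), truncated to fit the return type of the function. Formally, flowing off the end of a non-void function is undefined behavior, so nothing guarantees this particular result.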

Of course, one should always return something from a non-void function. But it's an easy mistake to make, whether through a typo or a misconception.

Consider, for example:



Quite surprisingly, this compiles with no errors or warnings by default!

Enabling -Wreturn-type (which comes with -Wall) does get us this:

$ g++ -o null_member null_member.cpp -g -O0 -Wall
null_member.cpp: In member function ‘int foo::thinger()’:
null_member.cpp:5:24: warning: no return statement in function returning non-void [-Wreturn-type]

It seems like this should always be an error... there is never a time that this is a good idea.
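For what it's worth, GCC can be told to treat this particular warning as an error, either with -Werror=return-type on the command line or from within the source via a diagnostic pragma. A minimal sketch, assuming GCC or Clang (other compilers may ignore or complain about the pragma); the class mirrors the foo named in the warning, but its body here is an assumption:

```cpp
// Promote "no return statement" from a warning to a hard error for the rest
// of this translation unit. Understood by GCC and Clang.
#pragma GCC diagnostic error "-Wreturn-type"

struct foo {
    // With the pragma in effect, deleting this return statement should make
    // the build fail instead of merely printing a warning.
    int thinger() { return 42; }
};
```

Calling thinger() on a foo instance then yields 42, and forgetting the return can no longer slip through a default build.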


In any case, the program outputs:

$ ./null_member 
f @ 0x7fffc30e54cf
f.thinger() = c30e54cf

Clearly, nothing in the C++ code is doing this.


But having studied Apple IIe assembly for half a semester back in high school, I thought I'd try my luck with the GDB disassembler.

$ gdb ./null_member
Reading symbols from null_member...done.
(gdb) disassemble main
Dump of assembler code for function main(int, char**):
   0x00000000004004f4 <+0>:     push   %rbp
   0x00000000004004f5 <+1>:     mov    %rsp,%rbp
   0x00000000004004f8 <+4>:     sub    $0x20,%rsp
   0x00000000004004fc <+8>:     mov    %edi,-0x14(%rbp)
   0x00000000004004ff <+11>:    mov    %rsi,-0x20(%rbp)
   0x0000000000400503 <+15>:    lea    -0x1(%rbp),%rax
   0x0000000000400507 <+19>:    mov    %rax,%rdi
   0x000000000040050a <+22>:    callq  0x400544 <foo::thinger()>
   0x000000000040050f <+27>:    mov    %eax,-0x8(%rbp)
   0x0000000000400512 <+30>:    lea    -0x1(%rbp),%rax
   0x0000000000400516 <+34>:    mov    %rax,%rsi
   0x0000000000400519 <+37>:    mov    $0x40063c,%edi
   0x000000000040051e <+42>:    mov    $0x0,%eax
   0x0000000000400523 <+47>:    callq  0x4003f0 <printf@plt>
   0x0000000000400528 <+52>:    mov    -0x8(%rbp),%eax
   0x000000000040052b <+55>:    mov    %eax,%esi
   0x000000000040052d <+57>:    mov    $0x400644,%edi
   0x0000000000400532 <+62>:    mov    $0x0,%eax
   0x0000000000400537 <+67>:    callq  0x4003f0 <printf@plt>
   0x000000000040053c <+72>:    mov    $0x0,%eax
   0x0000000000400541 <+77>:    leaveq 
   0x0000000000400542 <+78>:    retq   
End of assembler dump.
(gdb) disassemble foo::thinger
Dump of assembler code for function foo::thinger():
   0x0000000000400544 <+0>:     push   %rbp
   0x0000000000400545 <+1>:     mov    %rsp,%rbp
   0x0000000000400548 <+4>:     mov    %rdi,-0x8(%rbp)
   0x000000000040054c <+8>:     pop    %rbp
   0x000000000040054d <+9>:     retq   
End of assembler dump.


Compare the last bit to the assembly for a thinger that returns 42:

Dump of assembler code for function foo::thinger():
   0x0000000000400544 <+0>:     push   %rbp
   0x0000000000400545 <+1>:     mov    %rsp,%rbp
   0x0000000000400548 <+4>:     mov    %rdi,-0x8(%rbp)
   0x000000000040054c <+8>:     mov    $0x2a,%eax
   0x0000000000400551 <+13>:    pop    %rbp
   0x0000000000400552 <+14>:    retq   
End of assembler dump.


In main(), at instruction 0x40050f, register %eax is copied into -0x8(%rbp), the stack slot of variable i; %eax, the lower 32 bits of the accumulator register %rax, is where the callee is expected to leave an int return value. This is simply the x86-64 calling convention on Linux: integer results come back in a known register rather than being pushed onto the stack.

In the latter thinger, 0x2a (42) is written to %eax at instruction 0x40054c, but in the former, we don't do anything. %eax is just whatever it happened to be... essentially an uninitialized variable.

In the case of a normal (non-member or static) function call, this is true -- %eax is just some leftover junk. But for a method call, GCC generates the following:

   0x00000000004004d2 <+30>:    lea    -0x1(%rbp),%rax
   0x00000000004004d6 <+34>:    mov    %rax,%rdi
   0x00000000004004d9 <+37>:    callq  0x4004e6 <foo::thinger()>
   0x00000000004004de <+42>:    mov    %eax,-0x8(%rbp)

lea (load effective address) copies the address of f (here, one byte before the base pointer %rbp) into %rax for temporary storage, and the following mov passes it along in %rdi as the hidden this argument. Since %rax is a 64-bit wide register of which %eax is the lower half, %eax is left with the lower 32 bits of the address of f, and that's what gets interpreted as the return value. The call to foo::thinger was expected to overwrite it, but didn't.
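To make the truncation concrete, here is a small sketch (the helper name low32 is made up for illustration) that keeps only the lower 32 bits of a 64-bit value, which is what reading %eax instead of %rax amounts to. Feeding it the address of f from the program output gives exactly the bogus value that thinger() appeared to return.

```cpp
#include <cstdint>

// Hypothetical helper: keep only the lower 32 bits of a 64-bit value,
// mimicking a read of %eax after a full address was written to %rax.
constexpr std::uint32_t low32(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax & 0xffffffffULL);
}

// Address of f from the program output, and the value thinger() "returned".
static_assert(low32(0x7fffc30e54cfULL) == 0xc30e54cfU,
              "lower half of the address matches the bogus return value");
```

The check is done at compile time, so the snippet only needs to build; nothing here depends on the undefined behavior itself.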


It's a wonderful piece of evil. It's a plausible typo bug which compiles without error and causes functions to return corrupt data. It depends on compiler- and machine-specific, assembly-level implementation details, invisible in the source code. Bug reports will vary by platform. And I have no evidence of this, but "no return means return NULL-ish" sure sounds like a common C++ misconception.

I caught it because I happened to be returning a pointer, which caused a segfault on dereference. But a float could easily go unnoticed, as could a pointer to the same class as this.

Happy coding, and beware of C++!