Thursday, September 27, 2012

Evil C++ 6: Default Method Return Values!?

Caveat: Contents certainly platform-dependent -- these results are from GCC 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3). Your mileage may vary. If you get different results on other platforms, I'd love to hear about it in the comments!

Today I came across a wonderfully devious C++ gotcha, which tops my "evil C++" charts so far: if you fail to return a value at the end of a non-void member function, the return value you get is the address of the instance (i.e. this), cast to the return type of the function.

Of course, one should always return something from a non-void function. But it's an easy mistake to make, due to typo or misconception.

Consider, for example:



Quite surprisingly, this compiles with no errors or warnings by default!

Enabling -Wreturn-type (which comes with -Wall) does get us this:

$ g++ -o null_member null_member.cpp -g -O0 -Wall
null_member.cpp: In member function ‘int foo::thinger()’:
null_member.cpp:5:24: warning: no return statement in function returning non-void [-Wreturn-type]

It seems like this should always be an error... there is never a time that this is a good idea. (GCC will treat it as one if you pass -Werror=return-type.)


In any case, the program outputs:

$ ./null_member 
f @ 0x7fffc30e54cf
f.thinger() = c30e54cf

Clearly, nothing in the C++ code is doing this.


But having studied Apple IIe assembly for half a semester back in high school, I thought I'd try my luck with the GDB disassembler.

$ gdb ./null_member
Reading symbols from null_member...done.
(gdb) disassemble main
Dump of assembler code for function main(int, char**):
   0x00000000004004f4 <+0>:     push   %rbp
   0x00000000004004f5 <+1>:     mov    %rsp,%rbp
   0x00000000004004f8 <+4>:     sub    $0x20,%rsp
   0x00000000004004fc <+8>:     mov    %edi,-0x14(%rbp)
   0x00000000004004ff <+11>:    mov    %rsi,-0x20(%rbp)
   0x0000000000400503 <+15>:    lea    -0x1(%rbp),%rax
   0x0000000000400507 <+19>:    mov    %rax,%rdi
   0x000000000040050a <+22>:    callq  0x400544 <foo::thinger()>
   0x000000000040050f <+27>:    mov    %eax,-0x8(%rbp)
   0x0000000000400512 <+30>:    lea    -0x1(%rbp),%rax
   0x0000000000400516 <+34>:    mov    %rax,%rsi
   0x0000000000400519 <+37>:    mov    $0x40063c,%edi
   0x000000000040051e <+42>:    mov    $0x0,%eax
   0x0000000000400523 <+47>:    callq  0x4003f0 <printf@plt>
   0x0000000000400528 <+52>:    mov    -0x8(%rbp),%eax
   0x000000000040052b <+55>:    mov    %eax,%esi
   0x000000000040052d <+57>:    mov    $0x400644,%edi
   0x0000000000400532 <+62>:    mov    $0x0,%eax
   0x0000000000400537 <+67>:    callq  0x4003f0 <printf@plt>
   0x000000000040053c <+72>:    mov    $0x0,%eax
   0x0000000000400541 <+77>:    leaveq 
   0x0000000000400542 <+78>:    retq   
End of assembler dump.
(gdb) disassemble foo::thinger
Dump of assembler code for function foo::thinger():
   0x0000000000400544 <+0>:     push   %rbp
   0x0000000000400545 <+1>:     mov    %rsp,%rbp
   0x0000000000400548 <+4>:     mov    %rdi,-0x8(%rbp)
   0x000000000040054c <+8>:     pop    %rbp
   0x000000000040054d <+9>:     retq   
End of assembler dump.


Compare the last bit to the assembly for a thinger that returns 42:

Dump of assembler code for function foo::thinger():
   0x0000000000400544 <+0>:     push   %rbp
   0x0000000000400545 <+1>:     mov    %rsp,%rbp
   0x0000000000400548 <+4>:     mov    %rdi,-0x8(%rbp)
   0x000000000040054c <+8>:     mov    $0x2a,%eax
   0x0000000000400551 <+13>:    pop    %rbp
   0x0000000000400552 <+14>:    retq   
End of assembler dump.


In main(), at instruction 0x40050f, register %eax is copied into -0x8(%rbp), the location of variable i; the 32-bit accumulator register %eax is used to store the return value. This is the calling convention on x86-64 Linux: integer return values come back in %rax (or its lower half, %eax), and writing to a known register is quicker than pushing a return value onto the stack.

In the latter thinger, 0x2a (42) is written to %eax at instruction 0x40054c, but in the former, we don't do anything. %eax is just whatever it happened to be... essentially an uninitialized variable.

In the case of a normal (static) function call, this is true -- %eax is just some leftover junk. But for a method call, GCC generates the following:

   0x00000000004004d2 <+30>:    lea    -0x1(%rbp),%rax
   0x00000000004004d6 <+34>:    mov    %rax,%rdi
   0x00000000004004d9 <+37>:    callq  0x4004e6 <foo::thinger()>
   0x00000000004004de <+42>:    mov    %eax,-0x8(%rbp)

lea (load effective address) copies the address of f (here, one byte before the base pointer %rbp) into %rax for temporary storage, and the following mov passes it to the function in %rdi as the implicit this argument. Since %rax is a 64-bit wide register of which %eax comprises the lower half, %eax is left with the low 32 bits of the address of f, and that's what gets interpreted as the return value. The call to foo::thinger was expected to modify it, but didn't.


It's a wonderful piece of evil. It's a plausible typo bug which compiles without error and causes functions to return corrupt data. It depends on compiler- and machine-specific, assembly-level implementation details, invisible in the source code. Bug reports will vary by platform. And I have no evidence of this, but "no return means return NULL-ish" sure sounds like a common C++ misconception.

I caught it because I happened to be returning a pointer, which caused a segfault on dereference. But a float could easily go unnoticed, as could a pointer of the same class as self.

Happy coding, and beware of C++!

Wednesday, September 26, 2012

CouchDB Clustering and BigCouch

I'm a big fan of CouchDB as a quick and easy document store backend. However, the advertised feature list is sometimes a bit... ambitious. You "can" do a lot of things which you just shouldn't, where a naive application will almost certainly lead to trouble in production.

One such thing is master-master replication. It is easy to set up bi-directional continuous replication between two CouchDB instances. Throw a load balancer in front of them, and in certain, limited situations this could perform as a high-availability cluster.

A serious issue with this setup is handling conflicts when a document is updated on multiple nodes. The CouchDB documentation has a detailed explanation of how it handles conflicts, and suggests some client-side code for getting the right version of a conflicting document.

In practice, I've had a lot of trouble with this scheme in CouchDB 1.1.0. I have seen view indexes fail to update with the merge "winner" on replication, leading to a situation where the view results do not represent the underlying data. And since this "can't happen," there is no easy way to force a reindex.

In any case, it would be much better to have the cluster do the work for us.

Clustering

The CouchDB Guide has a chapter on clustering, which recommends using the CouchDB Lounge clustering framework. Lounge is a proxy which sits in front of several CouchDB servers. It has a few parts: a "dumb" proxy which redirects non-view requests to any node, a "smart" proxy which fans out views across several nodes, and a replication tool to make data redundant.

I find the guide's recommendation surprising, as there are a few problems with Lounge:

  • The deploy process is very tied to (now-defunct) Meebo's production platform, based on RPMs
  • It hasn't been touched in over three years, despite promises for fixes in "the near future"
  • It relies on CouchDB's built-in conflict resolution
  • It relies on a custom patch to CouchDB

Enter BigCouch

Fortunately, the fine people at Cloudant needed CouchDB clusters to actually work in order to make money, so they developed BigCouch to solve these problems.

BigCouch is not without its own issues, but these are mostly political -- since its inception a few years ago there has been talk of merging BigCouch back into CouchDB, yet the passive-aggressive Twitter arguments continue. Also, they do not seem to be particularly worried about build tests.

Installation

Setting up a BigCouch cluster on Ubuntu is very easy; I was up and running on precise in minutes. The only hangup was that bigcouch isn't packaged for precise, nor is the version of libicu (exactly 4.4) on which it depends.
  1. Get libicu44
    1. Download the package from Natty Narwhal: http://packages.ubuntu.com/natty/libicu44
    2. dpkg -i libicu44_4.4.2-2ubuntu0.11.04.1_amd64.deb # or i386
  2. Add the Cloudant repository for Oneiric Ocelot:
    1. echo "deb http://packages.cloudant.com/ubuntu oneiric main" | sudo tee /etc/apt/sources.list.d/cloudant.list
    2. apt-get update
  3. apt-get install bigcouch
Oddly, and unlike most services, it starts automatically when you install it. The service is managed by sv (see /etc/services/bigcouch). You can start and stop it with sv up bigcouch and sv down bigcouch. Configuration, libraries, and binaries are installed into /opt/bigcouch.

Configuration

Now, there is a little configuration (described in more detail in Installing & Using BigCouch).

Edit /opt/bigcouch/etc/vm.args and change:
  • -name bigcouch@thiscomputer.example.com
  • -setcookie some_secret_string
etc/defaults.ini and etc/local.ini work much like in CouchDB -- here you can fiddle with ports, enable SSL, etc. Note that the [chttpd] section describes the user-facing CouchDB server, and [httpd] describes the BigCouch "backdoor" used for administration and cluster coordination.

Your nodes will need to talk. Configure your firewall so that they can see each other's ports 5984 (CouchDB), 5986 (BigCouch), and 4369 (Erlang port mapper), plus any used for SSL.

Building the Cluster

To add a node to the cluster, you use the admin JSON API exposed on port 5986:

curl -X PUT http://thiscomputer.example.com:5986/nodes/bigcouch@othernode.example.com -d {}

(And vice versa)

Now, you should be able to interact with the CouchDB server on port 5984 of either node (ignoring completely the BigCouch-ness going on) and changes should appear immediately on both nodes.

The internal sharding and replication mechanism is based on Amazon's Dynamo paper, which is generally regarded as the definitive guide to robust cluster storage.

The API to BigCouch necessarily differs a bit from CouchDB's: for example, _stats lives on the admin side since it is now cluster-wide. The differences are outlined in the BigCouch API docs.

High Availability, etc.

The nodes talk in the background to ensure consistency and all are peers, so you can connect to any one you want. For high availability, put a load balancer or other such routing in front of the nodes to create a single point of entry. BigCouch recommends HAProxy. (And for an illuminating discussion of why you might use keepalived but not heartbeat, see this post).

    Thursday, May 10, 2012

    Creating custom directives in a Python docutils ReST parser

    Parsing some ReStructured Text with Python's docutils, and want to make your own custom directives? Here's a minimal example.

    Tuesday, April 24, 2012

    Intel RS2WC080 + 3TB Hitachi DeskStar + Ubuntu 10.04 LTS

    Ubuntu 10.04 LTS is good for servers (long lifecycle), but the packaged drivers for LSI MegaRAID may be too old for the latest PCI-E LSI RAID chipsets (SAS200x, etc.). Mostly these chipsets are found in other vendors' products, and driver source availability varies.

    Test system:

    • Intel DX68SO2 Motherboard
    • Intel Xeon W3565 CPU
    • Intel RS2WC080 RAID Controller
    • 1x 32 GB OCZ Onyx SSD
    • 12x 3 TB Hitachi Deskstar H3IK30003254SW
    • Ubuntu 10.04.3 LTS
    where the OS lives on the SSD and the 12x comprise a (software) RAID 5 array. Eight Deskstars are on the Intel RAID controller; the remaining four are on the motherboard SATA controller.

    The RAID controller found the eight attached drives and automatically configured them as individual JBODs. dmesg shows that the controller initialized properly (module megaraid_sas v4.7) and lists the drives, but they are not enumerated.

    Since Intel only provided RPMs, I went to LSI's site to grab the latest driver source. They don't explicitly provide downloads for "Intel" products, but it's all the same driver, so I found a similar card (this one) and grabbed the zip file for Ubuntu 10.04 LTS (v5.30 at time of writing).

    The zip file contains three tarballs (whatever!): compiled modules for two specific kernels, and the source. Following the included instructions for "recompiling,"
    1. apt-get install build-essential libncurses5 libncurses5-dev linux-headers-`uname -r`
    2. ln -s /usr/src/linux-headers-`uname -r` /usr/src/linux
    3. unzip Ubuntu_10.04_LTS_05.30.zip
    4. tar zxvf megaraid_sas-v00.00.05.30-src.gz; cd megaraid_sas-v00.00.05.30
    5. make -C /lib/modules/`uname -r`/build/ M=`pwd`
    6. cp megaraid_sas.ko /lib/modules/`uname -r`/kernel/drivers/scsi/megaraid
    7. mv /boot/initrd.img-`uname -r` ~/initrd.img.backup
    8. update-initramfs -c -k `uname -r`
    Note that LSI's instructions get step 6 wrong -- the driver lives in scsi/megaraid/, not in scsi/.

    You should now be able to install the new module (no rebooting):
    1. rmmod megaraid_sas
    2. cd /lib/modules/`uname -r`/kernel/drivers/scsi/megaraid
    3. insmod megaraid_sas.ko
    `modinfo megaraid_sas` should report the correct version number (5.30), and dmesg should show the drives enumerated.

    Now, RAID 'em up! For example (assuming drives have been partitioned), mdadm -Cv /dev/md0 -l5 -n12 /dev/sd[bcdefghijklm]1. Party on.

    Monday, January 23, 2012

    Evil C++ #5: Member function pointer special

    Function pointers are one of the most esoteric and misunderstood features of C, and little was done in C++ to improve that situation (until C++11). I recently came to appreciate the peculiar evil of member function pointers in C++.

    A friend of mine has a class, call it Foo, with a member function, call it threadMe. When constructed, Foo should create a new pthreads-like thread running threadMe, which will have access to this Foo's members.

    That's all I know. Maybe there's a way to refactor it, but this is inside a tangle of library GUI code, which is always a mess (especially when that library is ROOT), and it's academically interesting.

    Phase 0: Introduction
    Before thinking about this, I had to refresh my memory on function pointers. Here is a simple example:

    Even the somewhat arcane (*pointer)(args,args) syntax isn't so bad. You're clearly dereferencing a pointer to a function.

    Phase 1: Denial
    The extension from the above globally-scoped functions to class member functions seems obvious. If the member is static, I should be able to do this:

    int (*pStaticMemberFunction)(int);
    pStaticMemberFunction = &(Foo::threadMe);

    (Full program here). In C++, static member functions behave exactly like non-member functions.

    For a non-static member function, one would reasonably expect the following syntax (full program):

      Foo* o = new Foo;
      int (*pMemberFunction)(int);
      pMemberFunction = &(o->threadMe);

    NOPE. This isn't valid:

      error: ISO C++ forbids taking the address of a bound
      member function to form a pointer to member function.
      Say ‘&Foo::threadMe’ 

    But... but... compiler. I don't want any Foo's threadMe. I want o's threadMe. Of course, if you foolishly follow GCC's advice anyway, hoping for some kind of magic, no. Just no.

      error: invalid use of non-static member function
      ‘int Foo::threadMe(int)’

    Saw that one coming.

    It appears that the &(object->memberFunction) syntax was actually considered for C++ at one point, as described in this heartbreaking 1994 article in the "Callbacks Using Template Functors" section. But in the end, this feature was deemed too complicated for the language and the compilation thereof, and left for libraries to implement. Some have, notably Boost.

    Phase 2: Bad Hacks
    There appears to be no general solution to the above problem, which is more than a syntactic issue, but has deep roots in the language's representation of function pointers. To proceed, I clearly need a pointer to a function with external linkage. The goal is to use such a function as a proxy: somehow, I call it and it calls the real (non-static member) function for me.

    Bad Hack #1: Inadvisable Casting: The C++ FAQ strongly discourages casting a function pointer to a void pointer, but such a cast could allow me to construct a static wrapper function which would take an instance and a cleverly disguised member function pointer as an argument, solving the problem. At least for me, this casting works. However, there is no way to cast the void* back to a member function pointer that you could actually use. If C++ allowed a cast to be used as an lvalue (i.e. on the left-hand side of an assignment), you could do something like

    (void*) fnPtr = voidPtrArgument;

    and perhaps get away with it, but it would be really evil C++. At any rate, it doesn't work.

    Bad Hack #2: The Class With Incredible Foresight / Sledgehammer Approach: Another bad option is to create a mysterious superclass with virtual member functions having a wide variety of prototypes. For each of these prototypes, you can use a typedef to create a type meaning "pointer to a function like this one," e.g.

    typedef int (SuperClass::*pSuperClassMemFnTakesIntReturnsInt)(int);
    typedef void (SuperClass::*pSuperClassMemFnTakesIntReturnsVoid)(int);

    Then, for each of these types, create a global launcher function, like

    void* launchMember(SuperClass* o, pSuperClassMemFnTakesIntReturnsInt p)

    Rather unwieldy. But, with this vast library of types, you can do whatever you want. Subclass "SuperClass" and implement member functions of your choosing, then call

    launchMember(superClassInstance, &SuperClass::whateverFunction)

    with abandon. An example "solving" the original problem is here.

    Bad Hack #3: One Member Function to Rule Them All: The least kludgy option, I think. Sacrifice some generality and pick one exact function prototype (name and all), which will be the thready one. Then you can at least use the launcher for any class that has that prototype. This is cheating: the global launcher function doesn't take an instance and a member function pointer, it just takes an instance and we "know" which function to call.

    A working example can be found here. This approach was partly inspired by this thread.

    Phase 3: The Solution, But You Won't Like It
    The best, most general solution is to change the language, which they just did in the new C++11 standard. The biggest deal is the introduction of a proper native threading library. Lambda functions also provide the abstraction needed to make creating threads with ad-hoc functions pretty slick. In all its glory:


    This is great if you're running GCC >= 4.6. Otherwise, I guess Bad Hack #3 is probably best. If anyone else has better ideas, let me know in the comments!