the mips abi reserves stack space equal to the size of the in-register
args for the callee to save the args, if desired. this would cause the
beginning of the thread structure to be clobbered...
the old code worked in qemu app-level emulation, but not on real
kernels where the clone syscall does not copy the register values to
the new thread. save arguments on the new thread stack instead.