The new implementation is capable of iterating the keyspace but also
sets, hashes, and sorted sets, and can be used to implement SSCAN, ZSCAN
and HSCAN.
Sometimes when we resurrect a cached master after a successful partial
resynchronization attempt, there is pending data in the output buffers
of the client structure representing the master (likely REPLCONF ACK
commands).
If we don't reinstall the write handler, it will never be installed
again by addReply*() family functions as they'll assume that if there is
already data pending, the write handler is already installed.
This bug caused some slaves after a successful partial sync to never
send REPLCONF ACK, and continuously being detected as timing out by the
master, with a disconnection / reconnection loop.
Since we started sending REPLCONF ACK from slaves to masters, the
lastinteraction field of the client structure is always refreshed as
soon as there is room in the socket output buffer, so masters in timeout
are detected with too much delay (the socket buffer takes a lot of time
to be filled by small REPLCONF ACK <number> entries).
This commit only counts data received as interactions with a master,
solving the issue.
There was a bug that over-esteemed the amount of backlog available,
however this could only happen when a slave was asking for an offset
that was in the "future" compared to the master replication backlog.
Now this case is handled well and logged as an incident in the master
log file.
The code freed a reply object that was never created, resulting in a
segfault every time randomkey returned a key that was deleted before we
queried it for size.
Multiple missing calls to lua_pop prevented the error handler function
pushed on the stack for lua_pcall() to be popped before returning,
causing a memory leak in almost all the code paths of EVAL (both
successful calls and calls returning errors).
This caused two issues: Lua leaking memory (and this was very visible
from INFO memory output, as the 'used_memory_lua' field reported an
always increasing amount of memory used), and as a result slower and
slower GC cycles resulting in all the CPU being used.
Thanks to Tanguy Le Barzic for noticing something was wrong with his 2.8
slave, and for creating a testing EC2 environment where I was able to
investigate the issue.
When no encoding is possible, at least try to reallocate the sds string
with one that does not waste memory (with free space at the end of the
buffer) when the string is large enough.
We are sure that a string that is longer than 21 chars cannot be
represented by a 64 bit signed integer, as -(2^64) is 21 chars:
strlen(-18446744073709551616) => 21