It looks safer to return C_OK from freeMemoryIfNeeded() when clients are
paused because returning C_ERR may prevent success of writes. It is
possible that there is no difference in practice since clients cannot
execute writes while clients are paused, but it looks more correct this
way, at least conceptually.
Related to PR #4028.
1. brpop last key index, thus checking all keys for slots.
2. Memory leak in clusterRedirectBlockedClientIfNeeded.
3. Remove while loop in clusterRedirectBlockedClientIfNeeded.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given tha time_t is word-sized it should be safe in all the
archs we support as it is.
On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.
This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
The __sync builtin can be correctly detected by Helgrind so to force it
is useful for testing. The API in the INFO output can be useful for
debugging after problems are reported.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
The master client cleanup was incomplete: resetClient() was missing and
the output buffer of the client was not reset, so pending commands
related to the previous connection could be still sent.
The first problem caused the client argument vector to be, at times,
half populated, so that when the correct replication stream arrived the
protcol got mixed to the arugments creating invalid commands that nobody
called.
Thanks to @yangsiran for also investigating this problem, after
already providing important design / implementation hints for the
original PSYNC2 issues (see referenced Github issue).
Note that this commit adds a new function to the list library of Redis
in order to be able to reset a list without destroying it.
Related to issue #3899.
Normally we never check for OOM conditions inside Redis since the
allocator will always return a pointer or abort the program on OOM
conditons. However we cannot have control on epool_create(), that may
fail for kernel OOM (according to the manual page) even if all the
parameters are correct, so the function aeCreateEventLoop() may indeed
return NULL and this condition must be checked.
During the review of the fix for #3899, @yangsiran identified an
implementation bug: given that the offset is now relative to the applied
part of the replication log, when we cache a master, the successive
PSYNC2 request will be made in order to *include* the transaction that
was not completely processed. This means that we need to discard any
pending transaction from our replication buffer: it will be re-executed.