6231 Commits

Author SHA1 Message Date
cbgbt
d048f9721c cli: Only print elapsed time on OUTPUT_STANDARD 2017-06-20 16:54:44 +02:00
Aric Huang
b5f22939c2 (fix) Update create-cluster README
Fix a few typos/adjust wording in `create-cluster` README
2017-06-20 16:54:44 +02:00
antirez
0b7ba621b7 SLOWLOG: log offending client address and name. 2017-06-15 17:02:16 +02:00
Antonio Mallia
1fbc90fe03 Removed duplicate 'sys/socket.h' include 2017-06-15 17:02:16 +02:00
Antonio Mallia
c7a6b711f3 Fixed comment in clusterMsg version field 2017-06-15 17:02:16 +02:00
Qu Chen
73d358f79f Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:16:24 +02:00
antirez
c782d1894b Fix PERSIST expired key resuscitation issue #4048. 2017-06-13 10:37:28 +02:00
antirez
cb548bf3c0 More informative -MISCONF error message. 2017-05-19 12:04:11 +02:00
antirez
8cd6a2bd86 Collect fork() timing info only if fork succeeded. 2017-05-19 12:03:54 +02:00
antirez
a3941aa569 redis-cli --bigkeys: show error when TYPE fails.
Close #3993.
2017-05-15 11:23:55 +02:00
antirez
6b21cebd3d Modules TSC: use atomic var for server.unixtime.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given that time_t is word-sized it should be safe in all the
archs we support as it is.

On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.

This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
2017-05-11 16:44:46 +02:00
antirez
54bd224f0e atomicvar.h: show used API in INFO. Add macro to force __sync builtin.
The __sync builtin can be correctly detected by Helgrind so to force it
is useful for testing. The API in the INFO output can be useful for
debugging after problems are reported.
2017-05-11 16:44:46 +02:00
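The compile-time selection described in this commit can be sketched as follows. This is a minimal illustration, not the contents of atomicvar.h: the names `COUNTER_API`, `counter_incr`, `counter_get`, `FORCE_SYNC_BUILTINS`, and `counter_api_name` are assumptions made for the example. It picks the `__atomic` builtins when available, falls back to the `__sync` builtins (or uses them when forced, which is the case Helgrind can follow), and otherwise uses a mutex. The chosen API name is exposed as a string so it could be shown in a status report.

```c
#include <assert.h>

/* Hypothetical names throughout: COUNTER_API, counter_incr, counter_get and
 * FORCE_SYNC_BUILTINS are illustrative, not taken from atomicvar.h. */
#if !defined(FORCE_SYNC_BUILTINS) && defined(__ATOMIC_RELAXED)
/* GCC/Clang __atomic builtins are available and not overridden. */
#define COUNTER_API "atomic-builtin"
#define counter_incr(var, n) __atomic_add_fetch(&(var), (n), __ATOMIC_RELAXED)
#define counter_get(var) __atomic_load_n(&(var), __ATOMIC_RELAXED)
#elif defined(__GNUC__)
/* __sync builtins: the fallback, also selected when forced for testing. */
#define COUNTER_API "sync-builtin"
#define counter_incr(var, n) __sync_add_and_fetch(&(var), (n))
#define counter_get(var) __sync_add_and_fetch(&(var), 0)
#else
/* Last resort: serialize every access with one global mutex. */
#include <pthread.h>
#define COUNTER_API "pthread-mutex"
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter_add_locked(long *p, long n) {
    long v;
    pthread_mutex_lock(&counter_mutex);
    *p += n;
    v = *p;
    pthread_mutex_unlock(&counter_mutex);
    return v;
}
#define counter_incr(var, n) counter_add_locked(&(var), (n))
#define counter_get(var) counter_add_locked(&(var), 0)
#endif

/* Name of the API chosen at compile time, e.g. for a status report. */
const char *counter_api_name(void) {
    return COUNTER_API;
}

/* Small self-check: increments a counter twice and verifies the result. */
long counter_demo(void) {
    long hits = 0;
    counter_incr(hits, 2);
    counter_incr(hits, 3);
    assert(counter_get(hits) == 5);
    return counter_get(hits);
}
```

Building with `-DFORCE_SYNC_BUILTINS` switches the sketch to the `__sync` path without touching any call site, which is the property the commit message relies on for testing.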
antirez
a864d25c6e zmalloc.c: remove thread safe mode, it's the default way. 2017-05-11 16:44:46 +02:00
antirez
b338f2b908 Modules TSC: Add mutex for server.lruclock.
Only useful for when no atomic builtins are available.
2017-05-11 16:44:46 +02:00
antirez
7e9c658d13 Modules TSC: Improve inter-thread synchronization.
More work to do with server.unixtime and similar. Need to write Helgrind
suppression file in order to suppress the false positives.
2017-05-11 16:44:46 +02:00
antirez
e69af32fd7 Simplify atomicvar.h usage by having the mutex name implicit. 2017-05-11 16:44:46 +02:00
antirez
26e57f177d Lazyfree: fix lazyfreeGetPendingObjectsCount() race reading counter. 2017-05-11 16:44:46 +02:00
antirez
2acf003c05 Modules TSC: HELLO.KEYS reply format fixed. 2017-05-11 16:44:46 +02:00
antirez
12fd298fe7 Modules TSC: put the client in the pending write list. 2017-05-11 16:44:26 +02:00
antirez
5b1afa4a22 adlist: fix final list count in listJoin(). 2017-05-11 16:44:26 +02:00
antirez
717b2eeab3 adlist: fix listJoin() to handle empty lists. 2017-05-11 16:44:26 +02:00
antirez
a839036a1d Modules: remove unused var in example module. 2017-05-11 16:44:26 +02:00
antirez
eda5ee5e91 Modules TSC: HELLO.KEYS example draft finished. 2017-05-11 16:44:26 +02:00
antirez
fb8734fe9c Module: fix RedisModule_Call() "l" specifier to create a raw string. 2017-05-11 16:44:26 +02:00
antirez
c4b884958e Modules TSC: Release the GIL for all the time we are blocked.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
2017-05-11 16:44:26 +02:00
antirez
fcd9a07df0 Modules TSC: Export symbols of the new API. 2017-05-11 16:44:26 +02:00
antirez
8affa3e78f Modules TSC: Handling of RM_Reply* functions. 2017-05-11 16:44:03 +02:00
antirez
31b1f3c1ae Modules TSC: Basic TS context creation and handling. 2017-05-11 16:44:03 +02:00
antirez
74f3a84390 Modules TSC: GIL and cooperative multi tasking setup. 2017-05-11 16:44:03 +02:00
antirez
5021fda2b9 Regression test for #3899 fixed. 2017-05-11 16:43:46 +02:00
antirez
166bdbda03 Regression test for PSYNC2 issue #3899 added.
Experimentally verified that it can trigger the issue reverting the fix.
At least on my system... The bug being time/backlog dependent, it is
very hard to tell if this test will be able to trigger the problem
consistently, however even if it triggers the problem once in a while,
we'll see it in the CI environment at http://ci.redis.io.
2017-04-28 10:40:44 +02:00
antirez
b506eb74ac Check event loop creation return value. Fix #3951.
Normally we never check for OOM conditions inside Redis since the
allocator will always return a pointer or abort the program on OOM
conditions. However we have no control over epoll_create(), which may
fail for kernel OOM (according to the manual page) even if all the
parameters are correct, so the function aeCreateEventLoop() may indeed
return NULL and this condition must be checked.
2017-04-28 10:40:44 +02:00
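The check described above can be sketched roughly as follows. This is not the Redis source: `event_loop`, `event_loop_create`, `server_init_event_loop`, and the injected `backend_create_fn` are hypothetical names, and the backend is passed in as a function pointer only so that the failure path can be exercised without a real kernel OOM. The backend is assumed to return -1 on failure, as epoll_create() does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, stripped-down event loop state. */
typedef struct event_loop {
    int backend_fd;
    int setsize;
} event_loop;

/* Hypothetical backend constructor: returns a descriptor, or -1 on
 * failure (the convention epoll_create() follows). */
typedef int (*backend_create_fn)(int setsize);

/* Returns NULL when either the allocation or the backend setup fails. */
event_loop *event_loop_create(int setsize, backend_create_fn create_backend) {
    event_loop *el = malloc(sizeof(*el));
    if (el == NULL) return NULL;
    el->setsize = setsize;
    el->backend_fd = create_backend(setsize);
    if (el->backend_fd == -1) {
        free(el);
        return NULL;
    }
    return el;
}

/* Two stand-in backends used to exercise both paths. */
int failing_backend(int setsize) { (void)setsize; return -1; }
int working_backend(int setsize) { (void)setsize; return 3; }

/* Caller side: check the return value instead of assuming success. */
int server_init_event_loop(backend_create_fn create_backend, event_loop **out) {
    assert(out != NULL);
    *out = event_loop_create(1024, create_backend);
    if (*out == NULL) {
        fprintf(stderr, "Failed creating the event loop\n");
        return -1;
    }
    return 0;
}
```

A real server would most likely log the error and exit at this point, since nothing useful can run without the event loop. The sketch returns -1 so the caller decides.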
antirez
806905627c PSYNC2: fix master cleanup when caching it.
The master client cleanup was incomplete: resetClient() was missing and
the output buffer of the client was not reset, so pending commands
related to the previous connection could be still sent.

The first problem caused the client argument vector to be, at times,
half populated, so that when the correct replication stream arrived the
protocol got mixed with the arguments, creating invalid commands that nobody
called.

Thanks to @yangsiran for also investigating this problem, after
already providing important design / implementation hints for the
original PSYNC2 issues (see referenced Github issue).

Note that this commit adds a new function to the list library of Redis
in order to be able to reset a list without destroying it.

Related to issue #3899.
2017-04-27 17:08:53 +02:00
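Resetting a list without destroying it, as mentioned in the note above, could look roughly like this. The types and names (`list`, `node`, `list_reset`, and so on) are hypothetical and much simpler than the Redis list library. The point is only that the nodes are released while the list header, and any callbacks stored in it, stay valid for reuse.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal doubly linked list. */
typedef struct node {
    struct node *prev, *next;
    void *value;
} node;

typedef struct list {
    node *head, *tail;
    unsigned long len;
    void (*free_value)(void *value); /* optional per-value destructor */
} list;

list *list_create(void) {
    list *l = malloc(sizeof(*l));
    if (l == NULL) return NULL;
    l->head = l->tail = NULL;
    l->len = 0;
    l->free_value = NULL;
    return l;
}

/* Appends a value at the tail. Returns 0 on success, -1 on OOM. */
int list_push_tail(list *l, void *value) {
    node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
    return 0;
}

/* Frees every node but keeps the list header, so the same list
 * (with the same free_value callback) can be filled again. */
void list_reset(list *l) {
    assert(l != NULL);
    node *cur = l->head;
    while (cur != NULL) {
        node *next = cur->next;
        if (l->free_value) l->free_value(cur->value);
        free(cur);
        cur = next;
    }
    l->head = l->tail = NULL;
    l->len = 0;
}
```

Destroying the list would then amount to calling `list_reset()` and freeing the header afterwards.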
antirez
8c4b0f411f Defrag: test currently disabled, too many false positives.
Related to #3786.
2017-04-22 16:00:16 +02:00
antirez
6839c759b8 Reformat 4.0 RC3 change log. 2017-04-22 13:49:41 +02:00
antirez
51b12ed1b5 Defrag: fix test false positive.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55 that hopefully is still enough to
test that the fragmentation went down from 1.7 but without incurring in
issues, however the test setup may be still fragile so certain times this
may lead to false positives again, it's hard to test for these things
in a deterministic way.

Related to #3786.
4.0-rc3
2017-04-22 13:23:27 +02:00
antirez
635bbe573a Redis 4.0.0-RC3 (3.9.103). 2017-04-22 13:16:41 +02:00
oranagra
94a7090705 add test for active defrag 2017-04-22 13:16:01 +02:00
antirez
1a7a532e96 Revert "Jemalloc updated to 4.4.0."
This reverts commit 36c1acc222d29e6e2dc9fc25362e4faa471111bd.
2017-04-22 13:12:42 +02:00
antirez
6bc6bd4c38 PSYNC2: discard pending transactions from cached master.
During the review of the fix for #3899, @yangsiran identified an
implementation bug: given that the offset is now relative to the applied
part of the replication log, when we cache a master, the successive
PSYNC2 request will be made in order to *include* the transaction that
was not completely processed. This means that we need to discard any
pending transaction from our replication buffer: it will be re-executed.
2017-04-20 07:58:24 +02:00
antirez
a91cc5bc2d Fix PSYNC2 incomplete command bug as described in #3899.
This bug was discovered by @kevinmcgehee and constituted a major hidden
bug in the PSYNC2 implementation, caused by the propagation from the
master of incomplete commands to slaves.

The bug had several results:

1. Borrowing from Kevin's text in the issue: "Given that slaves blindly
copy over their master's input into their own replication backlog over
successive read syscalls, it's possible that with large commands or
small TCP buffers, partial commands are present in this buffer. If the
master were to fail before successfully propagating the entire command
to a slave, the slaves will never execute the partial command (since the
client is invalidated) but will copy it to replication backlog which may
relay those invalid bytes to its slaves on PSYNC2, corrupting the
backlog and possibly other valid commands that follow the failover.
Simple command boundaries aren't sufficient to capture this, either,
because in the case of a MULTI/EXEC block, if the master successfully
propagates a subset of the commands but not the EXEC, then the
transaction in the backlog becomes corrupt and could corrupt other
slaves that consume this data."

2. As identified by @yangsiran later, there is another effect of the
bug. For the same mechanism of the first problem, a slave having another
slave, could receive a full resynchronization request with an already
half-applied command in the backlog. Once the RDB is ready, it will be
sent to the slave, and the replication will continue sending to the
sub-slave the other half of the command, which is not valid.

The fix, designed by @yangsiran and @antirez, and implemented by
@antirez, uses a secondary buffer in order to feed the sub-masters and
update the replication backlog and offsets, only when a given part of
the query buffer is actually *applied* to the state of the instance,
that is, when the command gets processed and the command is not pending
in the Redis transaction buffer because of CLIENT_MULTI state.

Given that now the backlog and offsets representation are in agreement
with the actual processed commands, both issue 1 and 2 should no longer
be possible.

Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
identifying and designing a fix for this problem.
2017-04-20 07:58:22 +02:00
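The idea behind the fix can be modelled in a few lines. This is a toy model under stated assumptions, not the Redis implementation: `master_link`, `link_on_read`, and `link_on_command_processed` are hypothetical names, buffers are fixed-size, and command parsing is left to the caller. Bytes read from the master only accumulate in a pending buffer. They reach the backlog, and the applied offset advances, only once a command has been processed and the link is not inside a MULTI block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAP 256

/* Hypothetical model of a slave's link to its master. */
typedef struct master_link {
    char pending[BUF_CAP];    /* bytes read but not yet applied */
    size_t pending_len;
    size_t consumed;          /* prefix of pending parsed into commands */
    char backlog[BUF_CAP];    /* bytes that may be relayed to sub-slaves */
    size_t backlog_len;
    long long read_offset;    /* advanced on every read */
    long long applied_offset; /* advanced only when commands are applied */
    int in_multi;             /* inside a MULTI block waiting for EXEC */
} master_link;

void link_init(master_link *m) {
    memset(m, 0, sizeof(*m));
}

/* A read only stores the bytes: nothing reaches the backlog yet. */
int link_on_read(master_link *m, const char *data, size_t len) {
    if (m->pending_len + len > BUF_CAP) return -1;
    memcpy(m->pending + m->pending_len, data, len);
    m->pending_len += len;
    m->read_offset += (long long)len;
    return 0;
}

/* Called after a complete command of cmd_len bytes was processed.
 * Queued commands of a MULTI block are held back until EXEC. */
void link_on_command_processed(master_link *m, size_t cmd_len,
                               int is_multi, int is_exec) {
    assert(m->consumed + cmd_len <= m->pending_len);
    m->consumed += cmd_len;
    if (is_multi) m->in_multi = 1;
    if (is_exec) m->in_multi = 0;
    if (m->in_multi) return;

    /* Applied: move the consumed prefix to the backlog in one step. */
    assert(m->backlog_len + m->consumed <= BUF_CAP);
    memcpy(m->backlog + m->backlog_len, m->pending, m->consumed);
    m->backlog_len += m->consumed;
    m->applied_offset += (long long)m->consumed;
    memmove(m->pending, m->pending + m->consumed,
            m->pending_len - m->consumed);
    m->pending_len -= m->consumed;
    m->consumed = 0;
}
```

In this model a partially received command, or a MULTI block without its EXEC, never appears in `backlog`, so a sub-slave fed from it cannot observe half of a command.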
antirez
278972ceb1 Fix getKeysUsingCommandTable() in cluster mode.
Close #3940.
2017-04-20 07:57:44 +02:00
张文康
2028501776 update block->free after some diff data are written to the child process 2017-04-20 07:57:44 +02:00
Jan-Erik Rediger
05ac217f2f Reorder to make dict-benchmark compile on Linux
Fixes #3944
2017-04-18 16:31:57 +02:00
antirez
8d44c52ae3 Fix #3848 by closing the descriptor on error. 2017-04-18 16:24:50 +02:00
antirez
5c107c627e Clarify why we save ziplist elements in reverse order.
Also get rid of variables that are now kinda redundant, since the
dictionary iterator was removed.

This is related to PR #3949.
2017-04-18 16:18:06 +02:00
spinlock
2299641417 rdb: saving skiplist in reversed order to accelerate the deserialisation process 2017-04-18 16:18:01 +02:00
antirez
d98ef35a11 Cluster: discard pong times in the future.
However we allow for 500 milliseconds of tolerance, in order to
avoid often discarding semantically valid info (the node is up)
because of natural few milliseconds desync among servers even when
NTP is used.

Note that anyway we should ping the node from time to time regardless and
discover if it's actually down from our point of view, since no update
is accepted while we have an active ping on the node.

Related to #3929.
2017-04-18 16:17:45 +02:00
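The tolerance rule described above reduces to a single comparison. A minimal sketch, assuming millisecond timestamps; the function and constant names are hypothetical and only the 500 millisecond figure comes from the commit message:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant name; the 500 ms value is from the commit message. */
#define PONG_FUTURE_TOLERANCE_MS 500

/* Returns 1 if a pong time reported by another node should be accepted,
 * 0 if it lies too far in the future compared to our local clock. */
int pong_time_is_acceptable(int64_t now_ms, int64_t reported_pong_ms) {
    assert(now_ms >= 0 && reported_pong_ms >= 0);
    return reported_pong_ms <= now_ms + PONG_FUTURE_TOLERANCE_MS;
}
```

With `now_ms` at 10000, a reported pong time of 10500 is still accepted, while 10501 is discarded.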
antirez
e47c8e3f6a Test: fix, hopefully, false PSYNC failure like in issue #2715.
And many other related Github issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask people that can reproduce if they now see this as fixed as
well.
2017-04-18 16:17:42 +02:00
antirez
1e659a04cf Cluster: always add PFAIL nodes at end of gossip section.
To rely on the fact that nodes in PFAIL state will be shared around by
randomly adding them in the gossip section is a weak assumption,
especially after changes related to sending less ping/pong packets.

We want to always include gossip entries for all the nodes that are in
PFAIL state, so that the PFAIL -> FAIL state promotion can happen much
faster and reliably.

Related to #3929.
2017-04-18 16:17:39 +02:00
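The ordering described above, a random sample first and every PFAIL node after it, can be sketched like this. The names `cluster_node` and `build_gossip` are hypothetical, and the random sampling is assumed to happen in the caller, which passes in the sampled indices (assumed unique). Skipping PFAIL nodes in the sampled part is a choice of this sketch to avoid duplicate entries, since they are always appended at the end anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal view of a known node. */
typedef struct cluster_node {
    int id;
    int pfail; /* non-zero if we consider the node possibly failing */
} cluster_node;

/* Fills out_ids with the ids to gossip about: first the sampled nodes
 * that are not in PFAIL state, then every PFAIL node. Returns the
 * number of entries written, never more than out_cap. */
size_t build_gossip(const cluster_node *nodes, size_t n,
                    const size_t *sampled, size_t nsampled,
                    int *out_ids, size_t out_cap) {
    size_t count = 0;
    assert(nodes != NULL && out_ids != NULL);

    /* Part 1: the randomly sampled entries, chosen by the caller. */
    for (size_t i = 0; i < nsampled && count < out_cap; i++) {
        assert(sampled[i] < n);
        const cluster_node *node = &nodes[sampled[i]];
        if (node->pfail) continue; /* added below, avoid duplicates */
        out_ids[count++] = node->id;
    }

    /* Part 2: always append all the PFAIL nodes. */
    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (nodes[i].pfail) out_ids[count++] = nodes[i].id;
    }
    return count;
}
```

Because the second loop does not depend on the sample, every packet carries the full set of PFAIL nodes as long as `out_cap` allows it, which is what lets failure reports accumulate quickly.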