5664 Commits

Author SHA1 Message Date
antirez
0a45fbc302 Ability of slave to announce arbitrary ip/port to master.
This feature is useful, especially in deployments using Sentinel in
order to setup Redis HA, where the slave is executed with NAT or port
forwarding, so that the auto-detected port/ip addresses, as listed in
the "INFO replication" output of the master, or as provided by the
"ROLE" command, don't match the real addresses at which the slave is
reachable for connections.
2016-07-28 13:05:19 +02:00
antirez
c3982c0905 redis-benchmark: new option to show server errors on stdout.
Disabled by default, can be activated with -e. Maybe the reverse was
more safe but departs from the past behavior.
2016-07-28 13:04:42 +02:00
antirez
fdafe23315 Multiple GEORADIUS bugs fixed.
By grepping the continuous integration errors log a number of GEORADIUS
tests failures were detected.

Fortunately when a GEORADIUS failure happens, the test suite logs enough
information in order to reproduce the problem: the PRNG seed,
coordinates and radius of the query.

By reproducing the issues, three different bugs were discovered and
fixed in this commit. This commit also improves the already good
reporting of the fuzzer and adds the failure vectors as regression
tests.

The issues found:

1. We need larger squares around the poles in order to cover the area
requested by the user. There were already checks in order to use a
smaller step (larger squares) but the limit set (+/- 67 degrees) is not
enough in certain edge cases, so 66 is used now.

2. Even near the equator, when the search area center is very near the
edge of the square, the north, south, west or ovest square may not be
able to fully cover the specified radius. Now a test is performed at the
edge of the initial guessed search area, and larger squares are used in
case the test fails.

3. Because of rounding errors between Redis and Tcl, sometimes the test
signaled false positives. This is now addressed.

Whenever possible the original code was improved a bit in other ways. A
debugging example stanza was added in order to make the next debugging
session simpler when the next bug is found.
2016-07-27 12:11:31 +02:00
antirez
a1bfe22a80 Replication: when possible start RDB saving ASAP.
In a previous commit the replication code was changed in order to
centralize the BGSAVE for replication trigger in replicationCron(),
however after further testings, the 1 second delay imposed by this
change is not acceptable.

So now the BGSAVE is only delayed if the AOF rewriting process is
active. However past comments made sure that replicationCron() is always
able to trigger the BGSAVE when needed, making the code generally more
robust.

The new code is more similar to the initial @oranagra patch where the
BGSAVE was delayed only if an AOF rewrite was in progress.

Trivia: delaying the BGSAVE uncovered a minor Sentinel issue that is now
fixed.
2016-07-27 12:08:32 +02:00
antirez
7ca69aff26 Sentinel: new test unit 07 that tests master down conditions. 2016-07-27 12:08:25 +02:00
antirez
5b5e65203f Sentinel: check Slave INFO state more often when disconnected.
During the initial handshake with the master a slave will report to have
a very high disconnection time from its master (since technically it was
disconnected since forever, so the current UNIX time in seconds is
reported).

However when the slave is connected again the Sentinel may re-scan the
INFO output again only after 10 seconds, which is a long time. During
this time Sentinels will consider this instance unable to failover, so
a useless delay is introduced.

Actaully this hardly happened in the practice because when a slave's
master is down, the INFO period for slaves changes to 1 second. However
when a manual failover is attempted immediately after adding slaves
(like in the case of the Sentinel unit test), this problem may happen.

This commit changes the INFO period to 1 second even in the case the
slave's master is not down, but the slave reported to be disconnected
from the master (by publishing, last time we checked, a master
disconnection time field in INFO).

This change is required as a result of an unrelated change in the
replication code that adds a small delay in the master-slave first
synchronization.
2016-07-27 12:08:17 +02:00
antirez
21cffc2681 Avoid simultaneous RDB and AOF child process.
This patch, written in collaboration with Oran Agra (@oranagra) is a companion
to 780a8b1. Together the two patches should avoid that the AOF and RDB saving
processes can be spawned at the same time. Previously conditions that
could lead to two saving processes at the same time were:

1. When AOF is enabled via CONFIG SET and an RDB saving process is
   already active.

2. When the SYNC command decides to start an RDB saving process ASAP in
   order to serve a new slave that cannot partially resynchronize (but
   only if we have a disk target for replication, for diskless
   replication there is not such a problem).

Condition "1" is not very severe but "2" can happen often and is
definitely good at degrading Redis performances in an unexpected way.

The two commits have the effect of always spawning RDB savings for
replication in replicationCron() instead of attempting to start an RDB
save synchronously. Moreover when a BGSAVE or AOF rewrite must be
performed, they are instead just postponed using flags that will try to
perform such operations ASAP.

Finally the BGSAVE command was modified in order to accept a SCHEDULE
option so that if an AOF rewrite is in progress, when this option is
given, the command no longer returns an error, but instead schedules an
RDB rewrite operation for when it will be possible to start it.
2016-07-27 12:08:12 +02:00
antirez
017378eca9 Replication: start BGSAVE for replication always in replicationCron().
This makes the replication code conceptually simpler by removing the
synchronous BGSAVE trigger in syncCommand(). This also means that
socket and disk BGSAVE targets are handled by the same code.
2016-07-27 12:08:08 +02:00
antirez
940be9ab54 Regression test for issue #3333. 2016-07-06 11:50:13 +02:00
antirez
21736b41a2 getLongLongFromObject: use string2ll() instead of strict_strtoll().
strict_strtoll() has a bug that reports the empty string as ok and
parses it as zero.

Apparently nobody ever replaced this old call with the faster/saner
string2ll() which is used otherwise in the rest of the Redis core.

This commit close #3333.
2016-07-06 11:47:07 +02:00
antirez
0b748e9139 redis-cli: check SELECT reply type just in state updated.
In issues #3361 / #3365 a problem was reported / fixed with redis-cli
not updating correctly the current DB on error after SELECT.

In theory this bug was fixed in 0042fb0e, but actually the commit only
fixed the prompt updating, not the fact the state was set in a wrong
way.

This commit removes the check in the prompt update, now that hopefully
it is the state that is correct, there is no longer need for this check.
2016-07-05 17:40:32 +02:00
sskorgal
1158386bb8 Fix for redis_cli printing default DB when select command fails. 2016-07-05 17:40:32 +02:00
antirez
026f9fc7b0 Sentinel: fix cross-master Sentinel address update.
This commit both fixes the crash reported with issue #3364 and
also properly closes the old links after the Sentinel address for the
other masters gets updated.

The two problems where:

1. The Sentinel that switched address may not monitor all the masters,
   it is possible that there is no match, and the 'match' variable is
   NULL. Now we check for no match and 'continue' to the next master.

2. By ispecting the code because of issue "1" I noticed that there was a
   problem in the code that disconnects the link of the Sentinel that
   needs the address update. Basically link->disconnected is non-zero
   even if just *a single link* (cc -- command link or pc -- pubsub
   link) are disconnected, so to check with if (link->disconnected)
   in order to close the links risks to leave one link connected.

I was able to manually reproduce the crash at "1" and verify that the
commit resolves the issue.

Close #3364.
2016-07-04 18:50:40 +02:00
antirez
11523b3e0e CONFIG GET is now no longer case sensitive.
Like CONFIG SET always was. Close #3369.
2016-07-04 16:09:07 +02:00
antirez
f5a7f4f2d9 Fix test for new RDB checksum failure message. 2016-07-04 12:41:25 +02:00
antirez
4c6ff74c07 Make tcp-keepalive default to 300 in internal conf.
We already changed the default in the redis.conf template, but I forgot
to change the internal config as well.
2016-07-04 12:33:29 +02:00
antirez
27dbec2a36 In Redis RDB check: more details in error reportings. 2016-07-04 12:33:28 +02:00
antirez
41f300473a In Redis RDB check: log decompression errors. 2016-07-04 12:24:15 +02:00
antirez
278fe3e965 In Redis RDB check: log object type on error. 2016-07-04 12:24:08 +02:00
antirez
a117dfa807 Added a trivial program to randomly corrupt RDB files in /utils. 2016-07-04 12:24:05 +02:00
antirez
f5110c3c7c In Redis RDB check: minor output message changes. 2016-07-04 12:24:02 +02:00
antirez
35b18bfba3 In Redis RDB check: better error reporting. 2016-07-04 12:23:59 +02:00
antirez
f578f08544 In Redis RDB check: initial POC.
So far we used an external program (later executed within Redis) and
parser in order to check RDB files for correctness. This forces, at each
RDB format update, to have two copies of the same format implementation
that are hard to keep in sync. Morover the former RDB checker only
checked the very high-level format of the file, without actually trying
to load things in memory. Certain corruptions can only be handled by
really loading key-value pairs.

This first commit attempts to unify the Redis RDB loadig code with the
task of checking the RDB file for correctness. More work is needed but
it looks like a sounding direction so far.
2016-07-04 12:23:47 +02:00
tielei
7f1e1caee7 A string with 21 chars is not representable as a 64-bit integer. 2016-07-04 12:10:22 +02:00
antirez
7a3a595fb4 Test: new randomized stress tester for #3343 alike bugs. 2016-06-30 16:50:12 +02:00
antirez
c75ca104f4 Stress tester WIP. 2016-06-30 16:50:10 +02:00
antirez
2c3fcf87cc Regression test for issue #3343 exact min crash sequence.
Note: it was verified that it can crash the test suite without the patch
applied.
2016-06-30 16:50:06 +02:00
antirez
704196790e Fix quicklistReplaceAtIndex() by updating the quicklist ziplist size.
The quicklist takes a cached version of the ziplist representation size
in bytes. The implementation must update this length every time the
underlying ziplist changes. However quicklistReplaceAtIndex() failed to
fix the length.

During LSET calls, the size of the ziplist blob and the cached size
inside the quicklist diverged. Later, when this size is used in an
authoritative way, for example during nodes splitting in order to copy
the nodes, we end with a duplicated node that may contain random
garbage.

This commit should fix issue #3343, however several problems were found
reviewing the quicklist.c code in search of this bug that should be
addressed soon or later.

For example:

1. To take a cached ziplist length is fragile since failing to update it
leads to this kind of issues.

2. The node splitting code needs auditing. For example it works just for
a side effect of ziplistDeleteRange() to be able to cope with a wrong
count of elements to remove. The code inside quicklist.c assumes that
-1 means "delete till the end" while actually it's just a count of how
many elements to delete, and is an unsigned count. So -1 gets converted
into the maximum integer, and just by chance the ziplist code stops
deleting elements after there are no more to delete.

3. Node splitting is extremely inefficient, it copies the node and
removes elements from both nodes even when actually there is to move a
single entry from one node to the other, or when the new resulting node
is empty at all so there is nothing to copy but just to create a new
node.

However at least for Redis 3.2 to introduce fresh code inside
quicklist.c may be even more risky, so instead I'm writing a better
fuzzy tester to stress the internals a bit more in order to anticipate
other possible bugs.

This bug was found using a fuzzy tester written after having some clue
about where the bug could be. The tester eventually created a ~2000
commands sequence able to always crash Redis. I wrote a better version
of the tester that searched for the smallest sequence that could crash
Redis automatically. Later this smaller sequence was minimized by
removing random commands till it still crashed the server. This resulted
into a sequence of 7 commands. With this small sequence it was just a
matter of filling the code with enough printf() to understand enough
state to fix the bug.
2016-06-27 18:12:12 +02:00
antirez
04c7261f03 Redis 3.2.1. 3.2.1 2016-06-17 15:15:21 +02:00
oranagra
8207e82804 config set list-max-ziplist-size didn't support negative values, unlike config file 2016-06-17 14:49:37 +02:00
antirez
6ad0371c9b Fix Sentinel pending commands counting.
This bug most experienced effect was an inability of Redis to
reconfigure back old masters to slaves after they are reachable again
after a failover. This was due to failing to reset the count of the
pending commands properly, so the master appeared fovever down.

Was introduced in Redis 3.2 new Sentinel connection sharing feature
which is a lot more complex than the 3.0 code, but more scalable.

Many thanks to people reporting the issue, and especially to
@sskorgal for investigating the issue in depth.

Hopefully closes #3285.
2016-06-16 19:24:34 +02:00
antirez
58f1d446c3 redis-cli: really connect to the right server.
I recently introduced populating the autocomplete help array with the
COMMAND command if available. However this was performed before parsing
the arguments, defaulting to instance 6379. After the connection is
performed it remains stable.

The effect is that if there is an instance running on port 6339,
whatever port you specify is ignored and 6379 is connected to instead.
The right port will be selected only after a reconnection.

Close #3314.
2016-06-16 17:25:13 +02:00
Jan-Erik Rediger
b6007b324b Remove debug printing 2016-06-16 17:18:06 +02:00
antirez
f592b4d317 RESTORE: accept RDB dumps with older versions.
Reference issue #3218.

Checking the code I can't find a reason why the original RESTORE
code was so opinionated about restoring only the current version. The
code in to `rdb.c` appears to be capable as always to restore data from
older versions of Redis, and the only places where it is needed the
current version in order to correctly restore data, is while loading the
opcodes, not the values itself as it happens in the case of RESTORE.

For the above reasons, this commit enables RESTORE to accept older
versions of values payloads.
2016-06-16 15:56:29 +02:00
oranagra
047ced4473 CLIENT error message was out of date 2016-06-16 12:59:26 +02:00
oranagra
14e04847ac fix georadius returns multiple replies 2016-06-16 12:58:18 +02:00
antirez
bd23ea3f9f Minor aesthetic fixes to PR #3264.
Comment format fixed + local var modified from camel case to underscore
separators as Redis code base normally does (camel case is mostly used
for global symbols like structure names, function names, global vars,
...).
2016-06-16 12:56:28 +02:00
oranagra
2a3ee58ec7 check WRONGTYPE in BITFIELD before looping on the operations.
optimization: lookup key only once, and grow at once to the max need
fixes #3259 and #3221, and also an early return if wrongtype is discovered by SET
2016-06-16 12:56:17 +02:00
oranagra
a2e27b810e fix crash in BITFIELD GET on non existing key or wrong type see #3259
this was a bug in the recent refactoring: bee963c4459223d874e3294a0d8638a588d33c8e
2016-06-16 12:56:17 +02:00
MOON_CLJ
26555f5e00 fix check when can't send the command to the promoted slave 2016-06-15 17:24:43 +02:00
antirez
f1c237cb6a Test TOUCH and new TTL / TYPE behavior about object access time. 2016-06-15 17:16:13 +02:00
antirez
d4831e3287 GETRANGE: return empty string with negative, inverted start/end. 2016-06-15 16:05:54 +02:00
antirez
9942070f5a Remove additional round brackets from fix for #3282. 2016-06-15 16:05:39 +02:00
wenduo
f45fa5d05f bitcount bug:return non-zero value when start > end (both negative) 2016-06-15 16:05:33 +02:00
antirez
0cb86064e6 Regression test for #3282. 2016-06-15 16:04:44 +02:00
antirez
b23aa6706a TTL and TYPE LRU access fixed. TOUCH implemented. 2016-06-15 09:18:55 +02:00
antirez
6e4204fec9 redis-cli help.h updated. 2016-06-14 14:45:48 +02:00
antirez
bb43f4cab2 Fix GEORADIUS wrong output with radius > Earth radius.
Close #3266
2016-06-13 12:11:15 +02:00
antirez
16102bc0af Geo: fix typo in geohashEstimateStepsByRadius().
I'm the author of this line but I can't see a good reason for it to
don't be a typo, a step of 26 should be valid with 52 bits per
coordinate, moreover the line was:

    if (step > 26) step = 25;

So a step of 26 was actually already used, except when one of 27 was
computed (which is invalid) only then it was trimmed to 25 instead of
26.

All tests passing after the change.
2016-06-13 12:11:02 +02:00
antirez
014bf80442 Avoid undefined behavior in BITFIELD implementation.
Probably there is no compiler that will actaully break the code or raise
a signal for unsigned -> signed overflowing conversion, still it was
apparently possible to write it in a more correct way.

All tests passing.
2016-06-13 12:10:58 +02:00