3447 Commits

Author SHA1 Message Date
antirez
418d3d358a Clarify a comment in slaveTryPartialResynchronization(). 2014-01-08 14:11:02 +01:00
antirez
0a1a236e3e Log disconnection with slave only when ip:port is available. 2013-12-25 18:41:10 +01:00
antirez
27d06111db anetPeerToString / SockName: port can be NULL on errors too. 2013-12-25 18:39:49 +01:00
antirez
5b7c16137d anetTcpGenericConnect() bug introduced in 9d19977 fixed.
During a refactoring I misspelled _port for port.
This is one of the reasons I never used _varname myself.
2013-12-25 18:38:33 +01:00
antirez
d07d4a876c Remove useless goto from anetTcpGenericConnect(). 2013-12-25 18:24:04 +01:00
antirez
9d1997706c anetTcpGenericConnect() code improved + 1 bug fix.
Now the socket is closed if anetNonBlock() fails, and in general the
code structure makes it harder to introduce this kind of bug in the
future.

Reference: pull request #1059.
2013-12-25 18:16:46 +01:00
antirez
4ad219adc8 Fix CONFIG REWRITE handling of unknown options.
There were two problems with the implementation.

1) "save" was not correctly processed when no save point was configured,
   as reported in issue #1416.
2) The way the code checked if an option existed in the "processed"
   dictionary was wrong, as we add the element as a key associated
   with a NULL value, so dictFetchValue() can't be used to check for
   existence, but dictFind() must be used, that returns NULL only if the
   entry does not exist at all.
2013-12-23 12:50:52 +01:00
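The second problem can be illustrated outside of C. What follows is a minimal sketch in Python, not the Redis code: a plain dict stands in for the "processed" dictionary, `.get()` plays the role of dictFetchValue() and the `in` operator plays the role of dictFind(). The helper names are hypothetical.

```python
# Hypothetical stand-in for the "processed" dictionary: option names are
# stored as keys whose associated value is None (NULL in the C code).
processed = {}

def mark_processed(option):
    processed[option] = None

def seen_by_value(option):
    # Analogous to checking the result of dictFetchValue(): the stored
    # value is None, so a present key looks the same as a missing one.
    return processed.get(option) is not None

def seen_by_entry(option):
    # Analogous to checking the result of dictFind(): tests for the
    # entry itself, regardless of the value attached to it.
    return option in processed

mark_processed("save")
print(seen_by_value("save"))  # False: the option looks unprocessed
print(seen_by_entry("save"))  # True
```

The value-based check reports an option as missing even right after it was added, which is the kind of wrong answer described in point 2.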
antirez
c6db326d1d Configuring port to 0 disables IP socket as specified.
This was no longer the case with 2.8 because of a bug introduced with
the IPv6 support. Now it is fixed.

This fixes issue #1287 and #1477.
2013-12-23 11:34:15 +01:00
antirez
4456ee1173 Make new masters inherit replication offsets.
Currently replication offsets could be used in a limited way in order
to understand, out of a set of slaves, which is the one with the most
updated data. For example this comparison is possible if N slaves
were all replicating with the same master.

However the replication offset was not transferred from master to slaves
(that are later promoted as masters) in any way, so for instance if
there were three instances A, B, C, with A master and B and C
replicating from A, the following could happen:

C disconnects from A.
B is turned into master.
A is turned into a slave of B.
B receives some write.

In this context there was no way to compare the offset of A and C,
because B would use its own local master replication offset as
replication offset to initialize the replication with A.

With this commit what happens is that when B is turned into master it
inherits the replication offset from A, making A and C comparable.
In the above case assuming no inconsistencies are created during the
disconnection and failover process, A will show to have a replication
offset greater than C.

Note that this does not mean offsets are always comparable to understand
which is, in a set of instances, the most updated one, since in more
complex examples the replica with the higher replication offset could be
partitioned away when picking the instance to elect as new master.
However this in general improves the ability of a system to try to pick
a good replica to promote to master.
2013-12-22 11:54:10 +01:00
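The A, B, C scenario above can be played through with plain integers. This is only a sketch under stated assumptions, not the Redis implementation: the byte counts are made up, and the old behavior is modeled by letting B's own counter start from 0, since the message only says it was B's own local offset.

```python
def failover_offsets(inherit_offset):
    """Return (offset_of_A, offset_of_C) after the scenario above."""
    # A is master; B and C replicate from it and have processed 100 bytes.
    a = b = c = 100
    # C disconnects from A: its offset stays frozen at 100.
    # B is turned into master.
    if not inherit_offset:
        # Assumption for the sketch: B's own counter as a master is
        # unrelated to A's stream, modeled here as starting from 0.
        b = 0
    # A starts replicating from B and adopts B's offset.
    a = b
    # B receives a 50 byte write, which is propagated to A.
    b += 50
    a = b
    return a, c

print(failover_offsets(inherit_offset=False))  # (50, 100)
print(failover_offsets(inherit_offset=True))   # (150, 100)
```

Without inheritance A appears to be behind C even though it has processed more data. With inheritance A shows the greater offset, as the message describes.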
antirez
5fa937d956 Slave disconnection is an event worth logging. 2013-12-22 10:16:02 +01:00
antirez
9ba9d78ada Log when a slave loses the connection with its master. 2013-12-21 00:24:29 +01:00
antirez
d33f3689d7 Clarify include directive behavior in example redis.conf. 2013-12-19 16:03:52 +01:00
antirez
fb8a480f54 CONFIG REWRITE: no special handling for include and rename-command.
CONFIG REWRITE is now wiser and does not touch what it does not
understand inside redis.conf.
2013-12-19 16:03:49 +01:00
Yubao Liu
d62bbfda16 CONFIG REWRITE: don't throw away some options on config rewrite
Those options will be thrown away without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 16:03:37 +01:00
antirez
090bcc946f CONFIG REWRITE: old development comments removed. 2013-12-19 16:02:50 +01:00
antirez
ddd529bc3b CONFIG REWRITE: don't wipe unknown options.
With this commit options not explicitly rewritten by CONFIG REWRITE are
not touched at all. These include new options that may not have support
for REWRITE, and other special cases like rename-command and include.
2013-12-19 16:02:46 +01:00
antirez
673c42bb4e Example redis.conf formatted to better show appendfilename option. 2013-12-19 10:19:13 +01:00
antirez
33f6f35f34 Makefile.dep updated. 2013-12-13 13:14:47 +01:00
antirez
8eb1cb3b52 SDIFF iterator misuse bug regression test added.
See commit c00453d for more info about the bug.
2013-12-13 11:37:35 +01:00
antirez
993e0ede76 SDIFF iterator misuse fixed in diff algorithm #1.
The bug could be easily triggered by:

    SADD foo a b c 1 2 3 4 5 6
    SDIFF foo foo

When the key was the same in two sets, an unsafe iterator was used to
check existence of elements in the same set we were iterating.
Usually this would just result in wrong output, however with the
dict.c API misuse protection we have in place, the result was actually
a failed assertion that was triggered by the CI test, while creating
random datasets for the "MASTER and SLAVE consistency" test.
2013-12-13 11:29:59 +01:00
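For reference, the expected result of the commands above can be sketched in Python. This is an illustrative model of a "check each element of the first set against the others" difference, not the Redis code. Python set lookups do not interfere with iteration, so the sketch only shows the correct output and cannot reproduce the iterator misuse itself.

```python
def sdiff_sketch(sets):
    """Elements of sets[0] that are not members of any following set."""
    result = set()
    for element in sets[0]:
        # Look the element up in every other set, including the case
        # where another set is the very same object being iterated.
        if not any(element in other for other in sets[1:]):
            result.add(element)
    return result

foo = {"a", "b", "c", "1", "2", "3", "4", "5", "6"}
print(sdiff_sketch([foo, foo]))  # set()
```

The difference of a set with itself is empty, which is what `SDIFF foo foo` should return.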
antirez
2507e366b2 Sentinel: dead code removed. 2013-12-13 11:01:20 +01:00
antirez
f9e9448a5b Makefile: remove odd syntax not compatible with some make versions.
See issue #1448.
2013-12-12 15:19:29 +01:00
Yubao Liu
45c60a903a fix typo in redis.conf and sentinel.conf 2013-12-12 11:32:05 +01:00
codeeply
5e75e681fe comment mistake fixed 2013-12-12 11:25:48 +01:00
antirez
8995634014 Redis 2.8.3. 2.8.3 2013-12-11 15:31:57 +01:00
antirez
9a8ae5a553 Replication: publish the slave_repl_offset when disconnected from master.
When a slave was disconnected from its master the replication offset was
reported as -1. Now it is reported as the replication offset of the
previous master, so that failover can be performed using this value in
order to try to select a slave with more processed data from a set of
slaves of the old master.
2013-12-11 15:24:50 +01:00
Yossi Gottlieb
0ff078d8d0 Return proper error on requests with an unbalanced number of quotes. 2013-12-11 13:21:55 +01:00
Yossi Gottlieb
25ba2e9607 Fix wrong repldboff type which causes dropped replication in rare cases. 2013-12-11 11:37:58 +01:00
antirez
563d6b3f98 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:42:22 +01:00
antirez
b6610a569d dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optional callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:18:36 +01:00
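The idea of an optional callback invoked at a fixed interval while emptying a table can be sketched as follows. This is a Python model under assumptions, not the dict.c code: the function name, the `interval` parameter and the use of a Python dict are all hypothetical, and the interval is counted in deletions as the message describes.

```python
def empty_with_callback(table, callback=None, interval=65536):
    """Delete every entry, calling callback() once every `interval` deletions."""
    deleted = 0
    while table:
        table.popitem()
        deleted += 1
        # The callback is optional; when given, it gets a chance to do
        # some incremental work at a fixed interval.
        if callback is not None and deleted % interval == 0:
            callback()
    return deleted

calls = []
table = {i: None for i in range(10)}
# A small interval is used here only to keep the example short.
print(empty_with_callback(table, lambda: calls.append(1), interval=4))  # 10
print(len(calls))  # 2
```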
antirez
26cf5c8ac6 Log empty DB + Loading data into two separated messages. 2013-12-10 17:51:16 +01:00
antirez
303cc97ff9 Don't send more than 1 newline/sec while loading RDB. 2013-12-10 17:50:26 +01:00
antirez
75bf5a4a4a Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since it does not receive back the
   REPLCONF ACK acknowledgements.

Here the problem is that the master has no way to know how long the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for the 0.1% of operation time we are forced to use a timeout that is
not what is suited for 99.9% of operation time.

This commit tries to fix this problem with a solution that is a bit of
a hack, but that modifies little of the replication internals, in order
to be back ported to 2.8 safely.

During the RDB loading time, we send the master newlines to avoid
being sensed as timed out. This is the same thing the master already does
while saving the RDB file to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent while we don't have a client structure for the
master or in similar situations just with write(2).
2013-12-10 15:40:36 +01:00
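A rate-limited newline heartbeat of this kind can be sketched in a few lines. This is a Python model, not the Redis code: the class name is hypothetical, an in-memory stream stands in for the socket to the master, and the one second limit follows the "Don't send more than 1 newline/sec while loading RDB" entry in this log. The clock is passed in explicitly so that the example is deterministic.

```python
import io

class LoadingHeartbeat:
    """Writes a single newline to the link, at most once per interval."""

    def __init__(self, stream, min_interval=1.0):
        self.stream = stream
        self.min_interval = min_interval
        self.last_sent = None

    def maybe_send(self, now):
        # Skip the write if the previous newline was sent too recently.
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.stream.write(b"\n")
        self.last_sent = now
        return True

link = io.BytesIO()  # stand-in for the connection to the master
heartbeat = LoadingHeartbeat(link)
# Called from the loading loop far more often than once per second.
for t in (0.0, 0.2, 0.9, 1.0, 1.5, 2.1):
    heartbeat.maybe_send(t)
print(link.getvalue())  # b'\n\n\n'
```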
antirez
8d0083ba25 Handle inline requests terminated with just \n. 2013-12-10 15:40:33 +01:00
antirez
fba0b23e72 Sentinel: fix reported role info sampling.
The way the role change was recorded was not sane and too
convoluted, causing the role information to be not always updated.

This commit fixes issue #1445.
2013-12-06 12:49:27 +01:00
antirez
dceaca1f69 Sentinel: fix reported role fields when master is reset.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.

Otherwise if the role was wrong, it will be sensed as wrong even after
the address switch, and for enough time according to the role change
time, for Sentinel to consider the master SDOWN.

This fixes issue #1446, that describes the effects of this bug in
practice.
2013-12-06 12:49:23 +01:00
antirez
bf307cfb4d Fixed typo in redis.conf. 2013-12-06 10:48:40 +01:00
Anurag Ramdasan
4f9d30b33b Grammar fix. 2013-12-05 18:54:01 +01:00
Anurag Ramdasan
b67f39da09 fixed typo 2013-12-05 17:18:37 +01:00
Anurag Ramdasan
4c53178c6c Fixed grammar: 'usually' to 'usual' 2013-12-05 16:42:28 +01:00
antirez
7f6743a581 Fixed grammar: before H the article is a, not an. 2013-12-05 16:37:21 +01:00
antirez
edae78999c Fixed typos in redis.conf file. 2013-12-05 16:30:15 +01:00
antirez
3d7263aa41 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:15 +01:00
antirez
333453646c Grammar fix in freeClient(). 2013-12-03 13:40:51 +01:00
antirez
4d650a3baa Redis 2.8.2. 2.8.2 2013-12-02 16:07:46 +01:00
antirez
dd0ac4ac72 Sentinel: don't write HZ when flushing config.
See issue #1419.
2013-12-02 15:56:42 +01:00
antirez
75347ada7f Sentinel: better time desynchronization.
Sentinels are now desynchronized in a better way changing the time
handler frequency between 10 and 20 HZ. This way on average a
desynchronization of 25 milliseconds is produced that should be large
enough compared to network latency, avoiding most split-brain conditions
during the vote.

Now that the clocks are desynchronized, larger random delays when
performing operations can be easily achieved in the following way.
Take as example the function that starts the failover, that is
called with a frequency between 10 and 20 HZ and will start the
failover every time the conditions are met. By just adding as an
additional condition something like rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.

See issue #1419.
2013-12-02 15:56:39 +01:00
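The two ingredients described above, a randomized timer frequency and an extra random gate, can be sketched as follows. This is a Python model under assumptions, not the Sentinel code: the function names are hypothetical, and whether the upper bound of 20 HZ is inclusive is not stated in the message.

```python
import random

def pick_timer_hz(rng):
    # Choose the time handler frequency between 10 and 20 HZ, so that
    # different instances are unlikely to tick in lockstep.
    return rng.randint(10, 20)

def should_start_failover(rng, conditions_met):
    # Extra gate in the spirit of rand()%4 == 0: even when the
    # conditions are met, act only on about one call out of four.
    return conditions_met and rng.randrange(4) == 0

rng = random.Random()
hz = pick_timer_hz(rng)
print(10 <= hz <= 20)                     # True
print(should_start_failover(rng, False))  # False: conditions not met
```

With the gate in place, an instance waits on average four timer calls before acting, which spreads the moment at which different instances start the same operation.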
antirez
83333b08d0 Stop writes on MISCONF only if instance is a master.
From the point of view of the slave, not accepting writes from the master
can only create a bigger consistency issue.
2013-11-28 16:25:49 +01:00
antirez
50d140e90b SLAVEOF command refactored into a proper API.
We now have replicationSetMaster() and replicationUnsetMaster() that can
be called in other contexts (for instance Redis Cluster).
2013-11-28 16:19:16 +01:00
antirez
2eb8a46061 Reply to PING with error when there is a MISCONF state. 2013-11-28 16:16:58 +01:00