113 Commits

Author SHA1 Message Date
antirez
6bcc370c9d Add all the configurable fields to addReplySentinelRedisInstance().
Note: the auth password with the master is voluntarily not exposed.
2014-01-13 16:39:15 +01:00
antirez
ea2bffa030 Trim comment to 80 cols in SentinelCommand(). 2014-01-13 16:39:11 +01:00
antirez
2507e366b2 Sentinel: dead code removed. 2013-12-13 11:01:20 +01:00
antirez
b6610a569d dict.c: added optional callback to dictEmpty().
The Redis hash table implementation has many non-blocking features like
incremental rehashing. However, while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optional callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:18:36 +01:00
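A minimal sketch of the idea described in b6610a569d, not the dict.c code itself. It frees a large container and, if a callback is supplied, invokes it once every 65536 deletions so the caller can do some incremental work. The toy_* names, the linked-list container, and the exact trigger condition are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy container: a singly linked list standing in for a
 * large hash table. This is not the dict.c data structure. */
typedef struct toy_entry {
    struct toy_entry *next;
} toy_entry;

typedef void toy_empty_callback(void *privdata);

/* How often the callback fires. 65536 mirrors the "65k" in the commit
 * message; the exact trigger condition is an assumption. */
#define TOY_CALLBACK_INTERVAL 65536

/* Build a list of n entries (helper for the example only). */
toy_entry *toy_build(size_t n) {
    toy_entry *head = NULL;
    for (size_t i = 0; i < n; i++) {
        toy_entry *e = malloc(sizeof(*e));
        assert(e != NULL);
        e->next = head;
        head = e;
    }
    return head;
}

/* Free every entry. If callback is non-NULL, call it once every
 * TOY_CALLBACK_INTERVAL deletions. Returns the number of entries freed. */
size_t toy_empty(toy_entry *head, toy_empty_callback *callback,
                 void *privdata) {
    size_t deleted = 0;
    while (head != NULL) {
        toy_entry *next = head->next;
        free(head);
        head = next;
        deleted++;
        if (callback != NULL && deleted % TOY_CALLBACK_INTERVAL == 0)
            callback(privdata);
    }
    return deleted;
}

/* Example callback: counts how many times it was invoked. */
void toy_count_calls(void *privdata) {
    (*(size_t *)privdata)++;
}
```

Passing NULL as the callback keeps the old behavior, which is presumably why the argument is optional.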
antirez
fba0b23e72 Sentinel: fix reported role info sampling.
The way the role change was recorded was not sane and too convoluted,
so the role information was not always updated.

This commit fixes issue #1445.
2013-12-06 12:49:27 +01:00
antirez
dceaca1f69 Sentinel: fix reported role fields when master is reset.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.

Otherwise, if the role was wrong, it will still be sensed as wrong after
the address switch, and for long enough, according to the role change
time, for Sentinel to consider the master SDOWN.

This fixes issue #1446, which describes the effects of this bug in
practice.
2013-12-06 12:49:23 +01:00
antirez
7f6743a581 Fixed grammar: before H the article is a, not an. 2013-12-05 16:37:21 +01:00
antirez
dd0ac4ac72 Sentinel: don't write HZ when flushing config.
See issue #1419.
2013-12-02 15:56:42 +01:00
antirez
75347ada7f Sentinel: better time desynchronization.
Sentinels are now desynchronized in a better way by changing the time
handler frequency between 10 and 20 HZ. On average this produces a
desynchronization of 25 milliseconds, which should be large enough
compared to network latency to avoid most split-brain conditions
during the vote.

Now that the clocks are desynchronized, larger random delays when
performing operations can easily be achieved in the following way.
Take as an example the function that starts the failover, which is
called with a frequency between 10 and 20 HZ and will start the
failover whenever the conditions are met. By just adding an
additional condition such as rand()%4 == 0, we can easily amplify the
desynchronization between Sentinel instances.

See issue #1419.
2013-12-02 15:56:39 +01:00
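A rough sketch of the two mechanisms described in 75347ada7f, under stated assumptions: the sketch_* names and the base value of 10 are illustrative, and this version picks a frequency in the range 10 to 19 inclusive, which may differ from the real bounds. The rand()%4 == 0 gate is taken from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: 10 is the lower bound mentioned in the commit message. */
#define SKETCH_BASE_HZ 10

/* Pick a timer frequency in [SKETCH_BASE_HZ, 2 * SKETCH_BASE_HZ), so
 * that different instances drift apart instead of ticking in lockstep. */
int sketch_next_hz(void) {
    return SKETCH_BASE_HZ + rand() % SKETCH_BASE_HZ;
}

/* Timer period in milliseconds for a given frequency (integer division). */
int sketch_period_ms(int hz) {
    assert(hz > 0);
    return 1000 / hz;
}

/* Extra random gate: even when the failover conditions hold, act on
 * roughly one call out of four, which spreads instances further apart. */
int sketch_should_start_failover(int conditions_met) {
    return conditions_met && (rand() % 4 == 0);
}
```

With the gate in place, a function called 10 to 20 times per second starts the failover only on about a quarter of the calls where the conditions hold, so each instance waits a different random number of ticks.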
antirez
a46f841df3 Sentinel: log vote received from other Sentinels. 2013-11-28 15:23:51 +01:00
huangz1990
e5c577e679 fix a bug in sentinel.c about pub/sub link 2013-11-26 15:11:58 +01:00
antirez
d13635b2a9 Sentinel: fixes inverted strcmp() test preventing config updates.
The result of this one-char bug was pretty serious: if the new master
had the same port as the previous master, but just a different IP
address, non-leader Sentinels would not be able to recognize the
configuration change.

This commit fixes issue #1394.

Many thanks to @shanemadden who reported the bug and helped
investigate it.
2013-11-25 10:57:20 +01:00
antirez
d240202261 Sentinel: fix type specifier for Hello msg generation.
This fixes issue #1395.
2013-11-25 10:25:10 +01:00
antirez
312ca4dacc Sentinel: different comments updated to new implementation. 2013-11-21 16:23:11 +01:00
antirez
c84275ee60 Sentinel: cleanup around SENTINEL_INFO_VALIDITY_TIME. 2013-11-21 16:06:07 +01:00
antirez
0b2639123b Sentinel: removed mem leak and useless code. 2013-11-21 15:44:10 +01:00
antirez
750b007d7c Sentinel: manual failover works again. 2013-11-21 15:24:27 +01:00
antirez
4bd1dd1c53 Sentinel: test for writable config file.
This commit introduces a function called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 15:24:22 +01:00
antirez
46e2f3468c Sentinel: check for disconnected links in sentinelSendHello().
Does not fix any bug as the test is performed by the caller, but better
to have the check.
2013-11-21 15:24:15 +01:00
antirez
87098a39f0 Sentinel: Hello message sending code refactored. 2013-11-21 15:24:11 +01:00
antirez
9ecd819ecb Sentinel: select slave with best (greater) replication offset. 2013-11-21 15:24:08 +01:00
antirez
eeb8cb3305 Sentinel: take the replication offset in slaves state. 2013-11-21 15:24:03 +01:00
antirez
812f76a850 Sentinel: distinguish between is-master-down-by-addr requests.
Some are just to know if the master is down, and in this case the runid
in the request is set to "*". Others are actually sent in order to seek
a vote and get elected. In the latter case the runid is set to the runid
of the instance seeking the vote.
2013-11-21 15:22:55 +01:00
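The distinction described in 812f76a850 can be sketched as a small classifier. This is an illustration only: the enum and function names are hypothetical, and the real request carries more fields than the runid.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the two kinds of request. */
typedef enum {
    SKETCH_REQ_STATE_QUERY, /* only asks whether the master is down */
    SKETCH_REQ_VOTE         /* asks for a vote for the given runid */
} sketch_req_kind;

/* Classify an is-master-down-by-addr request by its runid argument:
 * "*" means a plain state query, anything else is the runid of the
 * instance seeking the vote. */
sketch_req_kind sketch_classify_request(const char *runid) {
    assert(runid != NULL);
    return strcmp(runid, "*") == 0 ? SKETCH_REQ_STATE_QUERY
                                   : SKETCH_REQ_VOTE;
}
```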
antirez
fc93198ff9 Sentinel: various fixes to leader election implementation. 2013-11-21 15:22:52 +01:00
antirez
44b3684633 Sentinel: failover script execution fixed. 2013-11-21 15:22:48 +01:00
antirez
6d0400f569 Sentinel: no longer used defines removed. 2013-11-21 15:22:44 +01:00
antirez
1bbce5bf01 Sentinel: when writing config on disk, remember sentinels runid. 2013-11-21 15:22:27 +01:00
antirez
679dc8b09d Sentinel: arity of known-sentinel/slave is 4 not 3. 2013-11-21 15:22:21 +01:00
antirez
3e1fc6278d Sentinel: rewriteConfigSentinelOption() sub-iterators var typo fixed. 2013-11-21 15:22:18 +01:00
antirez
9a9c0cfaa6 Sentinel: call sentinelFlushConfig() to persist state when needed.
Also the sentinel configuration rewriting was modified in order to
account for a failover in progress, where we need to provide the promoted
slave address as the master address, and the old master address as one of
the slave addresses.
2013-11-21 15:22:14 +01:00
antirez
93d924ff1c Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync. 2013-11-21 15:22:11 +01:00
antirez
a52909c5f2 Sentinel: CONFIG REWRITE support for Sentinel config. 2013-11-21 15:22:07 +01:00
antirez
8c3e197040 Sentinel: can-failover option removed, many comments fixed. 2013-11-21 15:22:02 +01:00
antirez
0b9853ecdc Sentinel: added config options useful to take state on config rewrite.
We'll use CONFIG REWRITE (internally) in order to store the new
configuration of a Sentinel after the internal state changes. In order
to do so, we need configuration options (that usually the user will not
touch at all) about config epoch of the master, Sentinels and Slaves
known for this master, and so forth.
2013-11-21 15:21:55 +01:00
antirez
737062745d Sentinel: failover abort function simplified. 2013-11-21 15:21:50 +01:00
antirez
66b03c1a40 Sentinel: slaves reconfig delay modified.
The time Sentinel waits after a slave is detected to be configured to
the wrong master, before reconfiguring it, is now the failover_timeout
time, as this makes more sense in order to give the Sentinel performing
the failover enough time to reconfigure the slaves slowly (if required
by the configuration).

Also we now PUBLISH the new configuration more frequently, as this allows
switching the reappearing master back to slave faster.
2013-11-21 15:21:46 +01:00
antirez
8ba31c218b Sentinel: failover restart time is now multiple of failover timeout.
Also the default failover timeout changed to 3 minutes, as the failover
is a fairly fast procedure most of the time, unless there is a very large
number of slaves and the user chose to configure them sequentially (in
that case the user should change the failover timeout accordingly).
2013-11-21 15:21:41 +01:00
antirez
ccaba966bc Sentinel: state machine and timeouts simplified. 2013-11-21 15:21:37 +01:00
antirez
3c4497e83c Sentinel: election timeout define. 2013-11-21 15:21:34 +01:00
antirez
e15ba6a697 Sentinel: fix address of master in Hello messages.
Once we switched configuration during a failover, we should advertise
the new address.

This was a serious race condition, as the Sentinel performing the
failover for a moment advertised the old address with the new
configuration epoch: once transmitted to the other Sentinels, the broken
configuration would remain there forever, until the next failover
(because a greater configuration epoch is required to overwrite an older
one).
2013-11-21 15:21:30 +01:00
antirez
1a6abe7d79 Sentinel: master address selection in get-master-address refactored. 2013-11-21 15:21:26 +01:00
antirez
0eeb0a0782 Sentinel: fix conditional to only affect slaves with wrong master. 2013-11-21 15:21:21 +01:00
antirez
64c8de8657 Sentinel: simplify and refactor slave reconfig code. 2013-11-21 15:20:56 +01:00
antirez
782f9cacaf Sentinel: reconfigure slaves to right master. 2013-11-21 15:20:44 +01:00
antirez
7dbc0a63f5 Sentinel: remember last time slave changed master. 2013-11-21 15:20:40 +01:00
antirez
612dbb2a91 Sentinel: redirect-to-master is not ok with new algorithm.
Now Sentinel believes the current configuration is always the winner and
should be applied by Sentinels instead of trying to adapt our view of
the cluster based on what we observe.

So the only way to modify what a Sentinel believes to be the truth is to
win an election and advertise the new configuration via Pub / Sub with a
greater configuration epoch.
2013-11-21 15:20:36 +01:00
antirez
4ccf807abc Sentinel: safer slave reconfig, master reported role should match. 2013-11-21 15:20:31 +01:00
antirez
e98d82c639 Sentinel: role reporting fixed and added in SENTINEL output. 2013-11-21 15:20:25 +01:00
antirez
be19e5450c Sentinel: being a master and reporting as slave is considered SDOWN. 2013-11-21 15:20:20 +01:00
antirez
8c551d65a1 Sentinel: make sure role_reported is always updated. 2013-11-21 15:20:15 +01:00