44c92f5aeb
Cluster: slave failover implemented.
2013-03-15 16:11:34 +01:00
1d8f302e0d
Cluster: election -> promotion in two comments.
2013-03-15 15:44:49 +01:00
bf82195467
Cluster: added function to broadcast pings.
...
See the function's top comment for why this is sometimes useful.
2013-03-15 15:43:58 +01:00
892e98548a
Cluster: don't broadcast messages to HANDSHAKE nodes.
...
Also don't check for NOADDR: we already check that node->link is not
NULL, and that's enough.
2013-03-15 15:36:36 +01:00
76a3954f4a
Cluster: fix clusterHandleSlaveFailover() conditional: quorum is enough.
2013-03-15 13:20:34 +01:00
90e99a2082
Cluster: two lame bugs fixed in FAILOVER AUTH messages generation.
2013-03-14 21:27:12 +01:00
aeacaa57e6
Cluster: code to process messages moved in the right if-else chain.
2013-03-14 21:21:58 +01:00
35f05c66b6
Cluster: handle FAILOVER_AUTH_ACK messages.
...
That's trivial, as we just need to increment the count of masters that
replied with an ACK.
2013-03-14 16:43:13 +01:00
c2595500ac
Cluster: request failover authorization, log if we have quorum.
...
However, the failover is not actually performed yet.
2013-03-14 16:39:02 +01:00
7fa42b801d
Cluster: clusterSendFailoverAuth() implementation.
2013-03-14 16:31:57 +01:00
f59ff6fe61
Cluster: clusterSendFailoverAuthIfNeeded() work in progress.
2013-03-13 19:08:03 +01:00
44f6fdab60
Cluster: handle FAILOVER_AUTH_REQUEST in clusterProcessPacket().
...
However currently the control is passed to a function doing nothing at
all.
2013-03-13 18:38:08 +01:00
ece95b2dea
Cluster: sanity check FAILOVER_AUTH_REQUEST messages for proper length.
2013-03-13 17:31:26 +01:00
66144337bf
Cluster: use 'else if' for mutually exclusive conditionals.
2013-03-13 17:27:06 +01:00
db7c17e969
Cluster: FAILOVER_AUTH_REQUEST message type introduced.
...
This message is sent to the other nodes by a slave that is ready to fail
over its master, in order to get authorization from the majority of masters.
2013-03-13 17:21:20 +01:00
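The authorization flow described in this entry (a slave asks the masters for permission and proceeds once a majority agrees) can be sketched roughly as follows. This is a minimal illustration, not the Redis implementation: the class name `FailoverAuth`, its methods, and the use of a set of master IDs (rather than a plain counter, so duplicate ACKs are not counted twice) are all assumptions made for the example.

```python
class FailoverAuth:
    """Hypothetical tracker for failover authorization ACKs on a slave."""

    def __init__(self, num_masters):
        # Assumption: quorum is a strict majority of the known masters.
        self.needed = num_masters // 2 + 1
        self.acks = set()  # IDs of masters that acknowledged the request

    def on_ack(self, master_id):
        """Record an ACK from a master; return True once quorum is reached."""
        self.acks.add(master_id)
        return self.has_quorum()

    def has_quorum(self):
        return len(self.acks) >= self.needed


auth = FailoverAuth(num_masters=5)
for master in ("m1", "m2", "m2", "m3"):
    reached = auth.on_ack(master)
```

With five masters the sketch needs three distinct ACKs, so the repeated ACK from `m2` does not bring the slave closer to quorum.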
575cbc9990
Cluster: clusterHandleSlaveFailover() stub.
2013-03-13 13:10:49 +01:00
3d448bda39
Cluster: call clusterHandleSlaveFailover() when our master is down.
2013-03-13 12:44:02 +01:00
f0b807cd47
Cluster: update cluster state on PFAIL flag set/cleared on nodes.
2013-03-07 15:40:53 +01:00
299b8f76c2
Cluster: mark cluster state as fail if majority of masters is unreachable.
2013-03-07 15:36:59 +01:00
abf06fd5ff
Cluster: log global cluster state change.
2013-03-07 15:22:32 +01:00
3dad8196b7
Cluster: clusterUpdateState() function simplified.
...
Also the NEEDHELP Cluster state was removed as it will no longer be
used by Redis Cluster.
2013-03-06 18:25:40 +01:00
011fa89ac9
Cluster: sdssplitargs_free() -> sdsfreesplitres().
2013-03-06 12:38:06 +01:00
1025dd7786
Cluster: connect to our master ASAP after startup if we are a slave node.
2013-03-05 16:12:08 +01:00
bac57ad14b
Cluster: more robust FAIL flag cleanup.
...
If we have a master in FAIL state that's reachable again, and apparently
no one is going to serve its slots, clear the FAIL flag and let the
cluster continue with its operations again.
2013-03-05 15:05:32 +01:00
1a02b7440a
Cluster: new node field fail_time.
...
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.
The idea is to use it to make the cluster more robust, for instance to
revert a FAIL state that is long-standing while slots are still assigned
to this node, that is, when apparently no one is going to take over
these slots.
2013-03-05 13:15:05 +01:00
e4b481a5f6
Cluster: A comment updated in clusterCron().
2013-03-05 12:17:30 +01:00
d728ec6dee
Cluster: send a ping to every node we never contacted in timeout/2 seconds.
...
Usually we try to send just 1 ping every second, however when we detect
we are going to have unreliable failure detection because we can't ping
some node in time, send an additional ping.
This should only happen with very large clusters or when the node
timeout is set to a very low value.
2013-03-05 12:16:02 +01:00
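The extra-ping rule in this entry can be illustrated with a small sketch. The `Node` fields and the function name below are hypothetical, and the exact condition (no ping already in flight, and no reply seen for more than half the node timeout) is an assumption about how such a check might look, not a copy of the Redis code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_pong: float            # time of the last reply received (seconds)
    ping_in_flight: bool = False


def nodes_needing_extra_ping(nodes, now, node_timeout):
    """Return nodes not heard from in more than node_timeout / 2 seconds.

    Nodes with a ping already in flight are skipped, since a second ping
    would not make failure detection any more reliable.
    """
    return [
        n for n in nodes
        if not n.ping_in_flight and now - n.last_pong > node_timeout / 2
    ]


nodes = [Node("a", 0.0), Node("b", 8.0), Node("c", 0.0, ping_in_flight=True)]
stale = nodes_needing_extra_ping(nodes, now=10.0, node_timeout=10.0)
```

Here only node `a` qualifies: `b` replied recently, and `c` already has a ping pending.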
e7628be2a7
Cluster: set node->slaveof correctly when a node state is updated.
2013-03-05 11:50:11 +01:00
d6457577d4
Cluster: don't perform startup slots sanity check for slaves.
...
If we are a slave node, the DB content will not match our configured
slots. Don't do the check at all.
2013-03-04 19:47:00 +01:00
d334897e80
Cluster: fix maximum line length when loading config.
...
There are pathological cases where the line can be even longer: a single
node may contain all the slots in importing/migrating state.
2013-03-04 19:45:36 +01:00
b8a28bf442
Cluster: actually setup replication in CLUSTER REPLICATE.
2013-03-04 15:27:58 +01:00
0c01088b51
Cluster: REPLICATE subcommand and stub for clusterSetMaster().
2013-03-04 13:15:09 +01:00
bc84c399f8
adding check error code
2013-03-04 11:20:11 +01:00
caf9b24a7d
Cluster: don't set the slot as unassigned because of PONG info.
...
As stated in the comment this is usually due to a resharding in progress
so the client should be still redirected to the old node that will
handle the redirection elsewhere.
2013-02-28 15:54:29 +01:00
0d77440b26
Cluster: better handling of slots changes in PONG packets.
...
The new code makes sure that the node slots bitmap is always consistent
with the cluster->slots array.
2013-02-28 15:41:54 +01:00
5f8fd27ace
Cluster: refactoring of clusterNode*Bit to use helper bitmap functions.
2013-02-28 15:23:09 +01:00
d21d6b666f
Cluster: use node->numslots instead of popcount() where possible.
2013-02-28 15:13:32 +01:00
4521115b17
Cluster: new field in cluster node structure, "numslots".
...
Previously, a relatively slow popcount() operation was needed every time
we needed to get the number of slots served by a given cluster node.
Now we just need to check an integer that is kept in sync with the
bitmap.
2013-02-28 15:11:05 +01:00
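The idea of keeping a counter in sync with the bitmap, instead of recounting bits on every query, can be sketched as below. The class `SlotBitmap` and its method names are hypothetical; only the field name `numslots` comes from the commit message, and the slot count used in the example is arbitrary.

```python
class SlotBitmap:
    """Bitmap of slots with a cached count of set bits (illustrative only)."""

    def __init__(self, num_slots):
        self.bits = bytearray((num_slots + 7) // 8)
        self.numslots = 0  # always equal to the number of set bits

    def test(self, slot):
        return bool(self.bits[slot >> 3] & (1 << (slot & 7)))

    def set(self, slot):
        """Set the bit for slot; return its previous value."""
        old = self.test(slot)
        if not old:
            self.bits[slot >> 3] |= 1 << (slot & 7)
            self.numslots += 1
        return old

    def clear(self, slot):
        """Clear the bit for slot; return its previous value."""
        old = self.test(slot)
        if old:
            self.bits[slot >> 3] &= ~(1 << (slot & 7)) & 0xFF
            self.numslots -= 1
        return old

    def popcount(self):
        """Slow path: recount every bit. Used here only as a cross-check."""
        return sum(bin(b).count("1") for b in self.bits)


bm = SlotBitmap(64)
bm.set(3)
bm.set(3)    # setting twice must not count twice
bm.set(10)
bm.clear(3)
bm.clear(5)  # clearing an unset bit must not decrement
```

Because the counter only changes when a bit actually flips, `numslots` and `popcount()` agree after any sequence of `set` and `clear` calls.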
a2566d6618
Cluster: don't gossip about nodes that are not useful to the cluster.
2013-02-28 15:00:09 +01:00
d45d184118
Cluster: CLUSTER FORGET implemented.
2013-02-27 17:55:59 +01:00
d2b8281b3f
Cluster: added a missing return on CLUSTER SETSLOT.
2013-02-27 17:53:48 +01:00
d20dea3eb7
Cluster: blank node address when flagging it as NOADDR.
2013-02-27 17:09:33 +01:00
2dcb5ab72b
Cluster: add comments in sub-sections of CLUSTER command.
2013-02-27 16:12:59 +01:00
f9b5ca29fd
Use GCC printf format attribute for redisLog().
...
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
d0992d6e8b
Cluster: a few random fixes to the new failure detection.
2013-02-26 15:15:44 +01:00
f288b07563
Cluster: log the event when we clear the FAIL flag.
2013-02-26 15:03:38 +01:00
97ffcd351b
Cluster: use the failure report API to reimplement failure detection.
...
The new system detects a failure only when there is quorum from masters.
2013-02-26 14:58:39 +01:00
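A rough sketch of quorum-based failure detection on top of a failure report store follows. All names here (`FailureReports`, `has_failure_quorum`, the `validity` window) are assumptions for illustration. The sketch also assumes that reports come only from masters, that quorum means a strict majority of masters, and that reports older than the validity window are dropped.

```python
class FailureReports:
    """Hypothetical per-node store of 'this node looks down' reports."""

    def __init__(self, validity):
        self.validity = validity  # seconds a report stays valid
        self.reports = {}         # reporter ID -> time of latest report

    def add(self, reporter, now):
        self.reports[reporter] = now

    def remove(self, reporter, now):
        """Drop one reporter's report, then purge stale ones too."""
        self.reports.pop(reporter, None)
        self.cleanup(now)

    def cleanup(self, now):
        stale = [r for r, t in self.reports.items() if now - t > self.validity]
        for r in stale:
            del self.reports[r]

    def count(self, now):
        self.cleanup(now)
        return len(self.reports)


def has_failure_quorum(reports, num_masters, now):
    """True if a strict majority of masters currently report the failure."""
    return reports.count(now) >= num_masters // 2 + 1


r = FailureReports(validity=10)
r.add("m1", now=0)
r.add("m2", now=5)
r.add("m3", now=8)
quorum_early = has_failure_quorum(r, num_masters=5, now=9)
quorum_late = has_failure_quorum(r, num_masters=5, now=12)
```

At time 9 all three reports are still valid, which is a majority of five masters. By time 12 the report from `m1` has expired, so the quorum is lost.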
1b1b3f6c06
Cluster: swap two function declarations into a more natural order.
2013-02-26 11:19:48 +01:00
d5e8b0a47f
Cluster: cleanup idle failure reports every time we remove one.
...
This is not very important, since the cleanup is performed anyway whenever
the function counting the number of reports is called. However, with this
change, if only some of the nodes that reported the failure later report
that the node is back, we clean up the older entries ASAP. In complex
netsplit scenarios, and when dealing with clusters on the order of
~1000 nodes, this can save some CPU.
2013-02-26 11:15:18 +01:00
9cb578ced0
Cluster: new function clusterNodeDelFailureReport() for failure reports.
...
This is the missing part of the API that will be used to reimplement
failure detection of Cluster nodes.
2013-02-25 19:13:22 +01:00