Commit Graph

573 Commits

Author SHA1 Message Date
d7374032c0 Redis Cluster: handshake code refactoring + Gossip IP switch detection.
This commit makes it simple to start an handshake with a specific node
address, and uses this in order to detect a node IP change and start a
new handshake in order to fix the IP if possible.
2013-12-20 12:38:03 +01:00
a2c938c834 Redis Cluster: delay state change when in the majority again.
As specified in the Redis Cluster specification, when a node can reach
the majority again after a period in which it was partitioend away with
the minorty of masters, wait some time before accepting queries, to
provide a reasonable amount of time for other nodes to upgrade its
configuration.

This lowers the probabilities of both a client and a master with not
updated configuration to rejoin the cluster at the same time, with a
stale master accepting writes.
2013-12-20 09:56:18 +01:00
7a666ac419 Cluster: set n->slaves to NULL in clusterNodeResetSlaves().
The value was otherwise undefined, so next time the node was promoted
again from slave to master, adding a slave to the list of slaves
would likely crash the server or result into undefined behavior.
2013-12-17 14:50:24 +01:00
fda91dbde3 Cluster: check link is valid before sending UPDATE. 2013-12-17 12:28:37 +01:00
f57bb36ce7 Cluster: initialize todo_before_sleep flags to 0. 2013-12-17 12:22:02 +01:00
c70c0c6db7 Cluster: use proper type mstime_t for ping delay var. 2013-12-17 10:27:36 +01:00
47815d38e0 Fixed clearNodeFailureIfNeeded() time type to mstime_t.
This prevented 32bit cluster instances from clearing the FAIL flag when
needed.
2013-12-17 09:45:52 +01:00
e88e6a6334 Cluster: use long long for timestamps in clusterGenNodesDescription().
Ping sent and pong received fields need to be casted to long long to be
printed correctly into 32 bit systems.
2013-12-17 09:38:11 +01:00
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
6fa42b7507 Cluster: nodes re-addition blacklist API. 2013-12-02 11:12:23 +01:00
8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
3db825fde4 Cluster: some code about clusterHandleSlaveFailover() marginally improved.
80 cols friendly, some minor change to the code to make it simpler.
2013-11-29 16:17:05 +01:00
a5e7358a12 Cluster: removed not needed newline at end of redisLog() msg. 2013-11-08 17:28:02 +01:00
28071caf38 Cluster: send a single UPDATE packet for now. 2013-11-08 17:25:49 +01:00
d289c628b1 Cluster: replace hardcoded 4096 for bus msg len with sizeof(). 2013-11-08 17:19:19 +01:00
94a07d5901 Cluster: slots update refactored + UPDATE msg processing.
Now there is a function that handles the update of the local slot
configuration every time we have some new info about a node and its set
of served slots and configEpoch.

Moreoever the UPDATE packets are now processed when received (it was a
work in progress in the previous commit).
2013-11-08 17:02:10 +01:00
dc43f66eac Cluster: UPDATE msg data structure and sending function. 2013-11-08 16:26:50 +01:00
6c6572be95 Cluster: refactoring of slots update code and more.
The commit also introduces detection of nodes publishing not updated
configuration. More work in progress to send an UPDATE packet to inform
of the config change.
2013-11-08 10:32:16 +01:00
1a0cea33a0 Cluster: initialize senderConfigEpoch and senderCurrentEpoch for warnings suppression. 2013-11-05 12:01:07 +01:00
0c9f60a628 Cluster: there is a lower limit for the handshake timeout. 2013-10-11 10:34:32 +02:00
1447d28c0f Cluster: data_age conversion to milliseconds fixed. 2013-10-09 16:36:06 +02:00
573c2fea91 Cluster: clusterCron() freq is now 10h. Still ping 1 node every sec.
After the change in clusterCron() frequency of call, we still want to
ping just one random node every second.
2013-10-09 16:29:17 +02:00
ba42428633 Cluster: time switched from seconds to milliseconds.
All the internal state of cluster involving time is now using mstime_t
and mstime() in order to use milliseconds resolution.

Also the clusterCron() function is called with a 10 hz frequency instead
of 1 hz.

The cluster node_timeout must be also configured in milliseconds by the
user in redis.conf.
2013-10-09 16:19:26 +02:00
929b6a4480 Cluster: cluster stuff moved from redis.h to cluster.h. 2013-10-09 15:38:05 +02:00
ae2763f564 Cluster: masters don't vote for a slave with stale config.
When a slave requests our vote, the configEpoch he claims for its master
and the set of served slots must be greater or equal to the configEpoch
of the nodes serving these slots in the current configuraiton of the
master granting its vote.

In other terms, masters don't vote for slaves having a stale
configuration for the slots they want to serve.
2013-10-08 12:45:35 +02:00
f7d6ad4366 Cluster: fix slave data age computation when master is still connected. 2013-10-07 16:07:13 +02:00
2c3301b9f5 Cluster: log message improved when FAIL is cleared from a slave node. 2013-10-07 15:44:58 +02:00
72f38cd70f Cluster: slave nodes advertise master slots bitmap and configEpoch. 2013-10-07 11:31:12 +02:00
7afc0dd59a Cluster: new clusterDoBeforeSleep() API.
The new API is able to remember operations to perform before returning
to the event loop, such as checking if there is the failover quorum for
a slave, save and fsync the configuraiton file, and so forth.

Because this operations are performed before returning on the event
loop we are sure that messages that are sent in the same event loop run
will be delivered *after* the configuration is already saved, that is a
requirement sometimes. For instance we want to publish a new epoch only
when it is already stored in nodes.conf in order to avoid returning back
in the logical clock when a node is restarted.

This new API provides a big performance advantage compared to saving and
possibly fsyncing the configuration file multiple times in the same
event loop run, especially in the case of big clusters with tens or
hundreds of nodes.
2013-10-03 09:58:06 +02:00
211dcbe339 Cluster: update cluster config when slave changes master. 2013-10-02 12:27:12 +02:00
6c4d904baf Cluster: bus messages stats in CLUSTER info. 2013-10-02 10:10:08 +02:00
abe81781ae Cluster: FAIL messages from unknown senders are handled better.
Previously the event was not logged but instead the node reported an
unknown packet type received.
2013-10-02 09:42:45 +02:00
7970ebd80a Cluster: senderCurrentEpoch == node currentEpoch was too strict.
We can accept a vote as long as its epoch is >= the epoch at which we
started the voting process. There is no need for it to be exactly the
same.
2013-10-01 17:21:28 +02:00
f1bfd8233b Cluster: fix typo in clusterProcessPacket() comment. 2013-10-01 15:40:20 +02:00
1dedf9aa36 Cluster: time field removed from cluster messages header.
The new algorithm does not check replies time as checking for the
currentEpoch in the reply ensures that the reply is about the current
election process.
2013-09-30 16:19:44 +02:00
2d0844ee37 Cluster: log message shortened. 2013-09-30 11:51:58 +02:00
4dc247eb31 Cluster: detect cluster reconfiguration when master slots drop to 0.
The old algorithm used a PROMOTED flag and explicitly checks about
slave->master convertions. Wit the new cluster meta-data propagation
algorithm we just look at the configEpoch to check if we need to
reconfigure slots, then:

1) If a node is a master but it reaches zero served slots becuase of
reconfiguration.
2) If a node is a slave but the master reaches zero served slots because
of a reconfiguration.

We switch as a replica of the new slots owner.
2013-09-30 11:45:26 +02:00
62b1591439 Cluster: re-order failover operations to make it safer.
We need to:

1) Increment the configEpoch.
2) Save it to disk and fsync the file.
3) Broadcast the PONG with the new configuration.

If other nodes will receive the updated configuration we need to be sure
to restart with this new config in the event of a crash.
2013-09-30 10:16:48 +02:00
b187517719 Cluster: when upading the configEpoch for a node, save config on disk ASAP. 2013-09-30 10:16:25 +02:00
03ca903983 Cluster: fsync data when saving the cluster config. 2013-09-30 10:13:07 +02:00
026e63392e Cluster: update the node configEpoch when newer is detected. 2013-09-27 09:55:41 +02:00
7c4b8f29e7 Cluster: react faster when a slave wins an election. 2013-09-26 16:54:43 +02:00
42fa46e49a Cluster: removed an old source of delay to start the slave failover. 2013-09-26 13:28:19 +02:00
a445aa30a0 Cluster: master node now uses new protocol to vote. 2013-09-26 13:00:41 +02:00
fb9b76fe14 Cluster: slave node now uses the new protocol to get elected. 2013-09-26 11:13:17 +02:00
32b5410af9 Cluster: add currentEpoch to CLUSTER INFO. 2013-09-25 12:38:36 +02:00
6ec795d2cf Cluster: update our currentEpoch when a greater one is seen. 2013-09-25 12:36:29 +02:00
d426ada891 Cluster: broadcast currentEpoch and configEpoch in packets header. 2013-09-25 11:53:35 +02:00
12483b0061 Cluster: configEpoch added in cluster nodes description. 2013-09-25 11:47:13 +02:00
3c9bb8751a Cluster: PFAIL -> FAIL transition allowed for slaves.
First change: now there is no need to be a master in order to detect a
failure, however the majority of masters signaling PFAIL or FAIL is needed.

This change is important because it allows slaves rejoining the cluster
after a partition to sense the FAIL condition so that eventually all the
nodes agree on failures.
2013-09-20 11:26:44 +02:00