433 Commits

Author SHA1 Message Date
antirez
8c6e86805c Cluster: use clusterSetNodeAsMaster() during slave failover.
clusterHandleSlaveFailover() was reimplementing what
clusterSetNodeAsMaster() without any good reason.
2014-05-20 17:45:50 +02:00
antirez
41a7241603 Cluster: clear todo_before_sleep flags when executing actions.
Thanks to this change, when there is some code like:

    clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|...);
    ... and later before returning to the event loop ...
    clusterUpdateState();

The clusterUpdateState() function will clar the flag and will not be
repeated in the clusterBeforeSleep() function. This especially important
for config save/fsync flags which are slow to execute and not a good
idea to repeat without a good reason.

This is implemented for all the CLUSTER_TODO flags.
2014-05-20 17:45:50 +02:00
antirez
5685d15af1 Fixed typo in CLUSTER RESET implementation. 2014-05-20 17:45:50 +02:00
antirez
5efa5501e8 CLUSTER RESET implemented.
The new command is able to reset a cluster node so that it starts again
as a fresh node. By default the command performs a soft reset (the same
as calling it as CLUSTER RESET SOFT), and the following steps are
performed:

1) All slots are set as unassigned.
2) The list of known nodes is flushed.
3) Node is set as master if it is a slave.

When an hard reset is performed with CLUSTER RESET HARD the following
additional operations are performed:

4) A new Node ID is created at random.
5) Epochs are set to 0.

CLUSTER RESET is useful both when the sysadmin wants to reconfigure a
node with a different role (for example turning a slave into a master)
and for testing purposes.

It also may play a role in automatically provisioned Redis Clusters,
since it allows to reset a node back to the initial state in order to be
reconfigured.
2014-05-20 17:45:50 +02:00
antirez
687f84a377 Remove trailing spaces from cluster.c file. 2014-05-20 17:45:50 +02:00
antirez
be1594905f Cluster: don't accept cluster bus connections during startup. 2014-05-20 17:45:50 +02:00
antirez
b8a71e5a77 Cluster: better handling of stolen slots.
The previous code handling a lost slot (by another master with an higher
configuration for the slot) was defensive, considering it an error and
putting the cluster in an odd state requiring redis-cli fix.

This was changed, because actually this only happens either in a
legitimate way, with failovers, or when the admin messed with the config
in order to reconfigure the cluster. So the new code instead will try to
make sure that the keys stored match the new slots map, by removing all
the keys in the slots we lost ownership from.

The function that deletes the keys from the lost slots is called only
if the node does not lose all its slots (resulting in a reconfiguration
as a slave of the node that got ownership). This is an optimization
since the replication code will anyway flush all the instance data in
a faster way.
2014-05-20 17:45:49 +02:00
antirez
e10ee0728a Cluster: fixed data_age computation / check integer overflow. 2014-05-20 17:45:49 +02:00
antirez
e84dcabf83 Cluster: forced failover implemented.
Using CLUSTER FAILOVER FORCE it is now possible to failover a master in
a forced way, which means:

1) No check to understand if the master is up is performed.
2) No data age of the slave is checked. Evan a slave with very old data
   can manually failover a master in this way.
3) No chat with the master is attempted to reach its replication offset:
   the master can just be down.
2014-05-20 17:45:49 +02:00
antirez
b5cdd42bf3 Cluster: bypass data_age check for manual failovers.
Automatic failovers only happen in Redis Cluster if the slave trying to
be elected was disconnected from its master for no more than 10 times
the node-timeout value. However there should be no such a check for
manual failovers, since these are initiated by the sysadmin that, in
theory, knows what she is doing when a slave is selected to be promoted.
2014-05-20 17:45:49 +02:00
antirez
0a707babb1 RESTORE: reply with -BUSYKEY special error code.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
2014-05-12 15:56:17 +02:00
antirez
fafe29cd4c CLUSTER MEET: better error messages when address is invalid.
Fixes issue #1734.
2014-05-12 15:56:17 +02:00
antirez
83c92d50a6 Cluster: bulk-accept new nodes connections.
The same change was operated for normal client connections. This is
important for Cluster as well, since when a node rejoins the cluster,
when a partition heals or after a restart, it gets flooded with new
connection attempts by all the other nodes trying to form a full
mesh again.
2014-05-09 15:05:26 +02:00
antirez
ed18587f52 Cluster: clusterAcceptHandler() comments updated to match the code. 2014-05-09 15:05:26 +02:00
antirez
c4c7389fea CLUSTER SET-CONFIG-EPOCH implemented.
Initially Redis Cluster accepted that after cluster creation all the
nodes were at configEpoch 0, evolving from zero as failovers happen.

However later the semantic was made more strict in order to make sure a
cluster has always all the master nodes with a different configEpoch,
which is more robust in some corner case (especially resulting from
errors by the system administrator).

To assign different configEpochs to different nodes at startup was a
task performed naturally by the config conflicts resolution algorithm
(see the Cluster specification). However this works well only for small
clusters or when there are actually just a few collisions, since it is
designed for exceptional cases.

When a large cluster is created hundred of nodes can be at epoch 0, so
the conflict resolution code is slow to provide an unique config to each
node. For this reason this new command was introduced. It can be called
only when a node is totally fresh: no other nodes known, and configEpoch
set to zero, so it is safe even against misuses.

redis-trib will use the new command in order to start the cluster
already setting an incremental unique config to every node.
2014-05-05 09:37:39 +02:00
antirez
b008863edb clusterLoadConfig() REDIS_ERR retval semantics refined.
We should return REDIS_ERR to signal we can't read the configuration
because there is no config file only after checking errno, othewise
we risk to rewrite an existing file that was not accessible for some
other reason.
2014-04-28 18:17:02 +02:00
antirez
71d71814a9 Lock nodes.conf to avoid multiple processes using the same file.
This was a common source of problems among users.
The solution adopted is not bullet-proof as if the user deletes the
nodes.conf file manually, and starts a new instance with the same
nodes.conf file path, two instances will use the same file. However
following this reasoning the user may drop a nuclear bomb into the
datacenter as well.
2014-04-28 18:17:02 +02:00
kingsumos
19ac9af09f fix cluster node description showing wrong slot allocation 2014-04-23 15:35:07 +02:00
antirez
437cddee0d Add casting to match printf format.
adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() that were
assuming uint64_t is the same as unsigned long long, which is true
probably for all the systems out there that we target, but still GCC
emitted a warning since technically they are two different types.
2014-04-16 15:09:46 +02:00
antirez
8c6ce3dd6a Cluster: last_vote_epoch -> lastVoteEpoch.
Use cammel case for epochs that are persisted on disk.
2014-04-16 15:09:44 +02:00
antirez
1080e272de Cluster: save/restore vars that must persist after recovery.
This fixes issue #1479.
2014-04-16 15:09:44 +02:00
antirez
47fbbd9fbc Cluster: handshake "already known" error logged to VERBOSE.
This is not really an error but something that always happens for
example when creating a new cluster, or if the sysadmin rejoins manually
a node that is already known.

Since useless logs don't help, moved to VERBOSE level.
2014-04-16 15:09:44 +02:00
antirez
6875a15858 Cluster: clusterHandleConfigEpochCollision() fixed.
New config epochs must always be obtained incrementing the currentEpoch,
that is itself guaranteed to be >= the max configEpoch currently known
to the node.
2014-04-16 15:09:44 +02:00
antirez
74fd89b321 Cluster: better logging for clusterUpdateSlotsConfigWith(). 2014-04-16 15:09:44 +02:00
antirez
cf8f72c184 Cluster: CLUSTER SETSLOT implementation comment updated.
Update the comment since the implementation details changed.
2014-04-16 15:09:44 +02:00
antirez
431573ae98 Cluster: configEpoch collisions resolution.
The slave election in Redis Cluster guarantees that slaves promoted to
masters always end with unique config epochs, however failures during
manual reshardings, software bugs and operational errors may in theory
cause two nodes to have the same configEpoch.

This commit introduces a mechanism to eventually always end with different
configEpochs if a collision ever happens.

As a (wanted) side effect, this also ensures that after a new cluster
is created, all nodes will end with a different configEpoch automatically.
2014-04-16 15:09:44 +02:00
antirez
8a3dde208d Cluster: stay within 80 cols. 2014-04-16 15:09:44 +02:00
antirez
10c8d86242 struct dictEntry -> dictEntry. 2014-03-21 09:57:02 +01:00
antirez
412a759a83 Cluster: update node configEpoch on UPDATE messages.
The UPDATE message contains the configEpoch of the node configuration
advertised in the packet. Update it if needed.
2014-03-11 15:02:41 +01:00
antirez
48702e0011 Cluster: set slot error if we receive an update for a busy slot.
By manually modifying nodes configurations in random ways, it is possible
to create the following scenario:

A is serving keys for slot 10
B is manually configured to serve keys for slot 10

A receives an update from B (or another node) where it is informed that
the slot 10 is now claimed by B with a greater configuration epoch,
however A still has keys from slot 10.

With this commit A will put the slot in error setting it in IMPORTING
state, so that redis-trib can detect the issue.
2014-03-11 15:02:41 +01:00
antirez
efd0346ea3 Cluster: clarified a comment in clusterUpdateSlotsConfigWith(). 2014-03-11 15:02:41 +01:00
antirez
47e3f1f16c Cluster: flush importing/migrating state when master is turned into slave. 2014-03-11 15:02:41 +01:00
antirez
117557192e Cluster: clusterCloseAllSlots() added. 2014-03-11 15:02:41 +01:00
antirez
a2a72b87e0 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-11 11:10:09 +01:00
antirez
d610d2343d Cluster: abort on port too high error.
It also fixes multi-line comment style to be consistent with the rest of
the code base.

Related to #1555.
2014-03-11 11:10:09 +01:00
antirez
2a951ce502 Cluster: be explicit about passing NULL as bind addr for connect.
The code was already correct but it was using that bindaddr[0] is set to
NULL as a side effect of current implementation if no bind address is
configured. This is not guarnteed to hold true in the future.
2014-03-11 11:10:09 +01:00
antirez
9c9914d779 Cluster: log error when anetTcpNonBlockBindConnect() fails. 2014-03-11 11:10:09 +01:00
antirez
3119f4f694 Cluster: better timeout and retry time for failover.
When node-timeout is too small, in the order of a few milliseconds,
there is no way the voting process can terminate during that time, so we
set a lower limit for the failover timeout of two seconds.

The retry time is set to two times the failover timeout time, so it is
at least 4 seconds.
2014-03-11 11:10:09 +01:00
antirez
afe28cfd75 Cluster: fix conditional generating TRYAGAIN error. 2014-03-11 11:10:09 +01:00
antirez
aa5898f53e Redis Cluster: support for multi-key operations. 2014-03-11 11:10:09 +01:00
Matt Stancliff
4b3c87a027 Remove redundant IP length definition
REDIS_CLUSTER_IPLEN had the same value as
REDIS_IP_STR_LEN.  They were both #define'd
to the same INET6_ADDRSTRLEN.
2014-03-11 11:10:09 +01:00
Matt Stancliff
7c8964a8cf Remove some redundant code
Function nodeIp2String in cluster.c is exactly
anetPeerToString with a pre-extracted fd.
2014-03-11 11:09:37 +01:00
Matt Stancliff
7c359449d5 Fix return value check for anetTcpAccept
anetTcpAccept returns ANET_ERR, not AE_ERR.

This isn't a physical error since both ANET_ERR
and AE_ERR are -1, but better to be consistent.
2014-03-11 11:09:37 +01:00
Matt Stancliff
9a7cf31960 Bind source address for cluster communication
The first address specified as a bind parameter
(server.bindaddr[0]) gets used as the source IP
for cluster communication.

If no bind address is specified by the user, the
behavior is unchanged.

This patch allows multiple Redis Cluster instances
to communicate when running on the same interface
of the same host.
2014-03-11 11:09:37 +01:00
Matt Stancliff
a0ea8f235e Cluster: error out quicker if port is unusable
The default cluster control port is 10,000 ports higher than
the base Redis port.  If Redis is started on a too-high port,
Cluster can't start and everything will exit later anyway.
2014-03-11 11:09:37 +01:00
antirez
e4833ed8bf Fix configEpoch assignment when a cluster slot gets "closed".
This is still code to rework in order to use agreement to obtain a new
configEpoch when a slot is migrated, however this commit handles the
special case that happens when the nodes are just started and everybody
has a configEpoch of 0. In this special condition to have the maximum
configEpoch is not enough as the special epoch 0 is not unique (all the
others are).

This does not fixes the intrinsic race condition of a failover happening
while we are resharding, that will be addressed later.
2014-03-05 10:22:07 +01:00
antirez
0725988a07 Cluster: clusterDelNode(): remove node from master's slaves. 2014-02-11 10:34:14 +01:00
antirez
4513d8fcd4 Cluster: UPDATE messages are the norm and verbose.
Logging them at WARNING level was of little utility and of sure disturb.
2014-02-11 10:22:05 +01:00
antirez
6d550f2de4 Cluster: configEpoch assignment in SETNODE improved.
Avoid to trash a configEpoch for every slot migrated if this node has
already the max configEpoch across the cluster.

Still work to do in this area but this avoids both ending with a very
high configEpoch without any reason and to flood the system with fsyncs.
2014-02-11 10:21:58 +01:00
antirez
585e9fb886 Cluster: clusterSetStartupEpoch() made more generally useful.
The actual goal of the function was to get the max configEpoch found in
the cluster, so make it general by removing the assignment of the max
epoch to currentEpoch that is useful only at startup.
2014-02-11 10:21:55 +01:00