4005 Commits

Author SHA1 Message Date
antirez
9997b51ff8 Sentinel failure detection implementation improved.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current current time.

PINGs were sent at a fixed interval of 1 second.

This works in a decent way, but does not scale well when we want to set
very small values of "down-after-milliseconds" (this is the node
timeout basically).

This commit reiplements the failure detection making a number of
changes. Some changes are inspired to Redis Cluster failure detection
code:

* A new last_ping_time field is added in representation of instances.
  If non zero, we have an active ping that was sent at the specified
  time. When a valid reply to ping is received, the field is zeroed
  again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used in order to check if the instance is
  timed out. This means that we can have a node timeout of 100
  milliseconds and yet the system will work well since the new check is
  not bound to the period used to send pings.
* Pings are now sent every second, or often if the value of
  down-after-milliseconds is less than one second. With a lower limit of
  10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
  reconnect the link when we are at 50% of the node timeout without a
  valid reply received yet. However the old code triggered unnecessary
  reconnections when the node timeout was very small. Now that should be
  ok.

The new code passes the tests but more testing is needed and more unit
tests stressing the failure detector, so currently this is merged only
in the unstable branch.
2014-03-21 09:57:02 +01:00
antirez
04cd7f773d Sentinel: use CLIENT SETNAME when connecting to Redis.
This makes debugging / monitoring of Sentinels simpler since you can
identify sentinels in CLIENT LIST output of Redis instances.
2014-03-21 09:57:02 +01:00
antirez
32f188cdf7 redis-trib: call MIGRATE via r.client.call as fix for redis-rb API changes.
See issue #1593.

Thanks to @badboy for suggesting the direct client.call fix.
2014-03-15 11:02:59 +01:00
Matt Stancliff
0c8f0079f9 Fix segfault from accessing array out of bounds
argc == 2; argv[2] == crash
2014-03-14 22:57:29 +01:00
antirez
668a95a583 Sentinel: be safe under crash-recovery assumptions.
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true Sentinels require to be authorized by a majority.
Additionally Sentinels require to do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires to
persist the master->leader_epoch on durable storage before to reply to
messages. This was not the case.

We also make sure to persist the current epoch in order to never reply
to stale votes requests from other Sentinels, after a recovery.

The configuration is persisted by making use of fsync(), this is
considered in the context of this code a good enough guarantee that
after a restart our durable state is restored, however this may not
always be the case depending on the kind of hardware and operating
system used.
2014-03-14 22:57:25 +01:00
antirez
ca2a3b3240 Sentinel: fake PUBLISH command to receive HELLO messages.
Now the way HELLO messages are received is unified.
Now it is no longer needed for Sentinels to converge to the higher
configuration for a master to be able to chat via some Redis instance,
the are able to directly exchanges configurations.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since for an error I
committed the change in the previous commit that refactored hello
messages processing into a separated function.
2014-03-14 11:09:32 +01:00
antirez
ee513ee958 Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). 2014-03-14 11:09:32 +01:00
antirez
9e82461af0 Linenoise updated, multiline mode enabled in redis-cli. 2014-03-13 15:11:41 +01:00
antirez
e0ef510630 Redis 2.9.51 (Redis 3.0.0 beta-2). 3.0.0-beta2 2014-03-11 16:13:39 +01:00
antirez
e917de12d0 Cluster: flag the transaction as dirty for the new redirections. 2014-03-11 15:19:00 +01:00
antirez
07036ad45e redis-trib: new subcommand 'call'. Exec command in all nodes.
Example:

./redis-trib.rb call 192.168.1.11:7000 config get cluster-node-timeout
2014-03-11 15:02:41 +01:00
antirez
a02424e29b redis-trib: create subcommand is now able to assign spare slaves.
Example: if the user will try to configure a cluster with 9 nodes,
asking for 1 slave for master, redis-trib will configure a 4 masters
cluster with 1 slave each as usually, but this time will assign the
spare node as a slave of one of the masters.
2014-03-11 15:02:41 +01:00
antirez
412a759a83 Cluster: update node configEpoch on UPDATE messages.
The UPDATE message contains the configEpoch of the node configuration
advertised in the packet. Update it if needed.
2014-03-11 15:02:41 +01:00
antirez
48702e0011 Cluster: set slot error if we receive an update for a busy slot.
By manually modifying nodes configurations in random ways, it is possible
to create the following scenario:

A is serving keys for slot 10
B is manually configured to serve keys for slot 10

A receives an update from B (or another node) where it is informed that
the slot 10 is now claimed by B with a greater configuration epoch,
however A still has keys from slot 10.

With this commit A will put the slot in error setting it in IMPORTING
state, so that redis-trib can detect the issue.
2014-03-11 15:02:41 +01:00
antirez
efd0346ea3 Cluster: clarified a comment in clusterUpdateSlotsConfigWith(). 2014-03-11 15:02:41 +01:00
antirez
47e3f1f16c Cluster: flush importing/migrating state when master is turned into slave. 2014-03-11 15:02:41 +01:00
antirez
117557192e Cluster: clusterCloseAllSlots() added. 2014-03-11 15:02:41 +01:00
antirez
01eee56f4f DEBUG ERROR implemented.
The new "error" subcommand of the DEBUG command can reply with an user
selected error, specified as its sole argument:

    DEBUG ERROR "LOADING please wait..."

The error is generated just prefixing the command argument with a "-"
character, and replacing newlines with spaces (since error replies can't
include newlines).

The goal of the command is to help in Client libraries unit tests by
making simple to simulate a command call triggering a given error.
2014-03-11 11:10:33 +01:00
antirez
ab8e1bbcdc DEBUG CMDKEYS: provide some guarantee to getKeysFromCommand().
getKeysFromCommand() is designed to be called with the command arguments
passing the basic arity checks described in the command table.

DEBUG CMDKEYS must provide the same guarantees for calling
getKeysFromCommand() to be safe.
2014-03-11 11:10:33 +01:00
antirez
94129415bf Cluster: make sortGetKeys() able to handle multiple STORE options.
It does not make sense to pass multiple store options, so, better to
handle it ;-)
2014-03-11 11:10:33 +01:00
antirez
db0d9f4326 DEBUG CMDKEYS added for getKeysFromCommand() testing.
Examples:

    redis 127.0.0.1:6379> debug cmdkeys set foo bar
    1) "foo"
    redis 127.0.0.1:6379> debug cmdkeys mget a b c
    1) "a"
    2) "b"
    3) "c"
    redis 127.0.0.1:6379> debug cmdkeys zunionstore foo 2 a b
    1) "a"
    2) "b"
    3) "foo"
    redis 127.0.0.1:6379> debug cmdkeys ping
    (empty list or set)
2014-03-11 11:10:33 +01:00
antirez
3b80e0a41d Cluster: don't allow BY option of SORT as well.
There is the exception of a "constant" BY pattern that is used in order
to signal to don't sort at all. In this case no lookup is needed so it
is possible to support this case in Cluster mode.
2014-03-11 11:10:33 +01:00
antirez
399fca8f45 Cluster: SORT get keys helper implemented. 2014-03-11 11:10:33 +01:00
antirez
0c38b3c934 Cluster: evalGetKeys() fixed: was not setting keys count. 2014-03-11 11:10:33 +01:00
antirez
d7451a0110 Cluster: don't allow GET option in cluster mode.
The commit also refactors a bit the error handling during SORT option
parsing.
2014-03-11 11:10:33 +01:00
antirez
c3aff0c20a Fixed memory leak in SORT LIMIT option argument parsing on error. 2014-03-11 11:10:33 +01:00
antirez
fdf737e132 Cluster: getKeysFromCommand() top comment improved. 2014-03-11 11:10:33 +01:00
antirez
81efa0d296 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
2014-03-11 11:10:09 +01:00
antirez
0618d26b89 Cluster: getKeysFromCommand() and related: top-comments added. 2014-03-11 11:10:09 +01:00
antirez
a2a72b87e0 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-11 11:10:09 +01:00
antirez
c8485703f5 Cluster: some zunionInterGetKeys() comment trimmed.
Everything was pretty clear again from the initial statements.
2014-03-11 11:10:09 +01:00
antirez
d610d2343d Cluster: abort on port too high error.
It also fixes multi-line comment style to be consistent with the rest of
the code base.

Related to #1555.
2014-03-11 11:10:09 +01:00
antirez
2a951ce502 Cluster: be explicit about passing NULL as bind addr for connect.
The code was already correct but it was using that bindaddr[0] is set to
NULL as a side effect of current implementation if no bind address is
configured. This is not guarnteed to hold true in the future.
2014-03-11 11:10:09 +01:00
antirez
9c9914d779 Cluster: log error when anetTcpNonBlockBindConnect() fails. 2014-03-11 11:10:09 +01:00
antirez
3119f4f694 Cluster: better timeout and retry time for failover.
When node-timeout is too small, in the order of a few milliseconds,
there is no way the voting process can terminate during that time, so we
set a lower limit for the failover timeout of two seconds.

The retry time is set to two times the failover timeout time, so it is
at least 4 seconds.
2014-03-11 11:10:09 +01:00
Matt Stancliff
14f77b343a Fix key extraction for z{union,inter}store
The previous implementation wasn't taking into account
the storage key in position 1 being a requirement (it
was only counting the source keys in positions 3 to N).

Fixes antirez/redis#1581
2014-03-11 11:10:09 +01:00
antirez
63fc5dc8b1 Typo in sentinel.conf, exists -> exits. 2014-03-11 11:10:09 +01:00
antirez
afe28cfd75 Cluster: fix conditional generating TRYAGAIN error. 2014-03-11 11:10:09 +01:00
antirez
aa5898f53e Redis Cluster: support for multi-key operations. 2014-03-11 11:10:09 +01:00
Matt Stancliff
c0915ad1a0 Reset op_sec_last_sample_ops when reset requested
This value needs to be set to zero (in addition to
stat_numcommands) or else people may see
a negative operations per second count after they
run CONFIG RESETSTAT.

Fixes antirez/redis#1577
2014-03-11 11:10:09 +01:00
Matt Stancliff
4b3c87a027 Remove redundant IP length definition
REDIS_CLUSTER_IPLEN had the same value as
REDIS_IP_STR_LEN.  They were both #define'd
to the same INET6_ADDRSTRLEN.
2014-03-11 11:10:09 +01:00
Matt Stancliff
7c8964a8cf Remove some redundant code
Function nodeIp2String in cluster.c is exactly
anetPeerToString with a pre-extracted fd.
2014-03-11 11:09:37 +01:00
Matt Stancliff
7c359449d5 Fix return value check for anetTcpAccept
anetTcpAccept returns ANET_ERR, not AE_ERR.

This isn't a physical error since both ANET_ERR
and AE_ERR are -1, but better to be consistent.
2014-03-11 11:09:37 +01:00
Jan-Erik Rediger
6766fc561e Small typo fixed 2014-03-11 11:09:37 +01:00
Matt Stancliff
9a7cf31960 Bind source address for cluster communication
The first address specified as a bind parameter
(server.bindaddr[0]) gets used as the source IP
for cluster communication.

If no bind address is specified by the user, the
behavior is unchanged.

This patch allows multiple Redis Cluster instances
to communicate when running on the same interface
of the same host.
2014-03-11 11:09:37 +01:00
zhanghailei
503938022f refer to updateLRUClock's comment REDIS_LRU_CLOCK_MAX is 22 bits,but #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) only 21 bits 2014-03-11 11:09:37 +01:00
zhanghailei
7eec424953 FIXED a typo more thank should be more than 2014-03-11 11:09:37 +01:00
zhanghailei
0abe98cb4d According to context,the size should be 16 rather than 64 2014-03-11 11:09:37 +01:00
Matt Stancliff
a0ea8f235e Cluster: error out quicker if port is unusable
The default cluster control port is 10,000 ports higher than
the base Redis port.  If Redis is started on a too-high port,
Cluster can't start and everything will exit later anyway.
2014-03-11 11:09:37 +01:00
Matt Stancliff
6f4b5ef6d5 Fix "can't bind to address" error reporting.
Report the actual port used for the listening attempt instead of
server.port.

Originally, Redis would just listen on server.port.
But, with clustering, Redis uses a Cluster Port too,
so we can't say server.port is always where we are listening.

If you tried to launch Redis with a too-high port number (any
port where Port+10000 > 65535), Redis would refuse to start, but
only print an error saying it can't connect to the Redis port.

This patch fixes much confusions.
2014-03-11 11:09:37 +01:00