4535 Commits

Author SHA1 Message Date
Matt Stancliff
4af72d326e Cluster: Restore proper trib master iteration
This got removed in 2e5c394 during a new feature addition.

The prior commit had "break if masters.length == masters_count"
but we are guaranteed to aready have that condition met since
otherwise we would haven't gotten this far.

Without this break statement, it's possible some masters may
be forgotten and have zero replicas while other masters have
more than their requested number of replicas.

Thanks to carlos for pointing out this regression at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 21:10:00 +01:00
Matt Stancliff
ee18135a19 Cluster: Fix trib create when masters==replicas
This bug was introduced in 2e5c394f during a refactor.

It took me a while to understand what was going on with
the code, so I've refactored it further by:
  - Replacing boolean values with meaningful symbols
  - Replacing 'i' with a meaningful variable name
  - Adding the proper abort check
  - Factoring out now duplicated conditionals
  - Adding optional verbose logging (we're inside *four*
    different looping constructs, so it takes a while to
    figure out where all the moving parts are)
  - Updating comment for the section

This fixes a problem when the number of master instances
equaled the number of replica instances.  Before, when
there were equal numbers of both, nodes_count would go to
zero, but the while loop would spin in i < @replicas because
i would never be updated (because the nodes_list of each ip
was length == 0, which triggered an endless loop of
next -> i = 0 -> 0 < 1? -> true -> next -> i = 0 ...)

Thanks to carlo who found this problem at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 21:10:00 +01:00
Jan-Erik Rediger
9fa96697a9 Fixed a few typos. 2014-03-24 21:10:00 +01:00
Matt Stancliff
a7478079a1 Cluster: remove variable causing warning
GCC-4.9 warned about this, but clang didn't.

This commit fixes warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;
2014-03-24 21:10:00 +01:00
Jan-Erik Rediger
46904ae22b Finally fix the install_server.sh script.
Includes changes from a dozen bug reports and pull requests.
Was tested on Ubuntu, Debian and CentOS.
2014-03-24 18:12:55 +01:00
antirez
2dd8c46249 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:03:41 +01:00
antirez
ae211be98b sdscatvprintf(): Try to use a static buffer.
For small content the function now tries to use a static buffer to avoid
a malloc/free cycle that is too costly when the function is used in the
context of performance critical code path such as INFO output generation.

This change was verified to have positive effects in the execution speed
of the INFO command.
2014-03-24 10:23:55 +01:00
antirez
571d6b01de Cache uname() output across INFO calls.
Uname was profiled to be a slow syscall. It produces always the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make too much sense.

The uname utsname structure was modified as a static variable. At the
same time a static integer was added to check if we need to call uname
the first time.
2014-03-24 10:02:55 +01:00
antirez
06b2a389eb sdscatvprintf(): guess buflen using format length.
sdscatvprintf() uses a loop where it tries to output the formatted
string in a buffer of the initial length, if there was not enough room,
a buffer of doubled size is tried and so forth.

The initial guess for the buffer length was very poor, an hardcoded
"16". This caused the printf to be processed multiple times without a
good reason. Given that printf functions are already not fast, the
overhead was significant.

The new heuristic is to use a buffer 4 times the length of the format
buffer, and 32 as minimal size. This appears to be a good balance for
typical uses of the function inside the Redis code base.

This change improved INFO command performances 3 times.
2014-03-24 09:44:49 +01:00
antirez
fd70c68c97 Add test-lru.rb to utils.
This is a program useful to evaluate the Redis LRU algorithm behavior.
2014-03-21 09:57:02 +01:00
antirez
24c350245d Use getLRUClock() instead of server.lruclock to create objects.
Thanks to Matt Stancliff for noticing this error. It was in the original
code but somehow I managed to remove the change from the commit...
2014-03-21 09:57:02 +01:00
antirez
7f274e3bfe The default maxmemory policy is now noeviction.
This is safer as by default maxmemory should just set a memory limit
without any key to be deleted, unless the policy is set to something
more relaxed.
2014-03-21 09:57:02 +01:00
antirez
a219339ee0 Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-21 09:57:02 +01:00
antirez
9928b53102 Default LRU samples is now 5. 2014-03-21 09:57:02 +01:00
antirez
fd5e8c0132 Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degradates a bit in my tests, but since the API is
faster, it allows to raise the number of samples, and overall is a win.
2014-03-21 09:57:02 +01:00
antirez
10c8d86242 struct dictEntry -> dictEntry. 2014-03-21 09:57:02 +01:00
antirez
26292670fa Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from an
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speedup Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-21 09:57:02 +01:00
antirez
c641074a21 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-21 09:57:02 +01:00
antirez
c9ac817cc0 Fix OBJECT IDLETIME return value converting to seconds.
estimateObjectIdleTime() returns a value in milliseconds now, so we need
to scale the output of OBJECT IDLETIME to seconds.
2014-03-21 09:57:02 +01:00
antirez
205c2ccc0c Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-21 09:57:02 +01:00
antirez
561e793472 Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-21 09:57:02 +01:00
antirez
8f0b74910a Specify LRU resolution in milliseconds. 2014-03-21 09:57:02 +01:00
antirez
63aacbe86c Set LRU parameters via REDIS_LRU_BITS define. 2014-03-21 09:57:02 +01:00
antirez
8b6a674a5d Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-21 09:57:02 +01:00
antirez
128dcee43d Sentinel: sentinelRefreshInstanceInfo() minor refactoring.
Test sentinel.tilt condition on top and return if it is true.
This allows to remove the check for the tilt condition in the remaining
code paths of the function.
2014-03-21 09:57:02 +01:00
antirez
b104015fb3 Sentinel test: 02 unit better coverage + refactoring. 2014-03-21 09:57:02 +01:00
antirez
f5f281f906 Sentinel test: foreach_instance_id implements 'break'. 2014-03-21 09:57:02 +01:00
antirez
f308f67703 Sentinel: instance_is_killed proc added to sentinel.tcl. 2014-03-21 09:57:02 +01:00
antirez
0774d4928a Sentinel: propagate down-after-ms changes to slaves and sentinels. 2014-03-21 09:57:02 +01:00
antirez
a86e24de93 Sentinel: down-after-milliseconds is not master-specific.
addReplySentinelRedisInstance() modified so that this field is displayed
for all the kind of instances: Sentinels, Masters, Slaves.
2014-03-21 09:57:02 +01:00
antirez
9997b51ff8 Sentinel failure detection implementation improved.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current current time.

PINGs were sent at a fixed interval of 1 second.

This works in a decent way, but does not scale well when we want to set
very small values of "down-after-milliseconds" (this is the node
timeout basically).

This commit reiplements the failure detection making a number of
changes. Some changes are inspired to Redis Cluster failure detection
code:

* A new last_ping_time field is added in representation of instances.
  If non zero, we have an active ping that was sent at the specified
  time. When a valid reply to ping is received, the field is zeroed
  again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used in order to check if the instance is
  timed out. This means that we can have a node timeout of 100
  milliseconds and yet the system will work well since the new check is
  not bound to the period used to send pings.
* Pings are now sent every second, or often if the value of
  down-after-milliseconds is less than one second. With a lower limit of
  10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
  reconnect the link when we are at 50% of the node timeout without a
  valid reply received yet. However the old code triggered unnecessary
  reconnections when the node timeout was very small. Now that should be
  ok.

The new code passes the tests but more testing is needed and more unit
tests stressing the failure detector, so currently this is merged only
in the unstable branch.
2014-03-21 09:57:02 +01:00
antirez
04cd7f773d Sentinel: use CLIENT SETNAME when connecting to Redis.
This makes debugging / monitoring of Sentinels simpler since you can
identify sentinels in CLIENT LIST output of Redis instances.
2014-03-21 09:57:02 +01:00
antirez
32f188cdf7 redis-trib: call MIGRATE via r.client.call as fix for redis-rb API changes.
See issue #1593.

Thanks to @badboy for suggesting the direct client.call fix.
2014-03-15 11:02:59 +01:00
Matt Stancliff
0c8f0079f9 Fix segfault from accessing array out of bounds
argc == 2; argv[2] == crash
2014-03-14 22:57:29 +01:00
antirez
668a95a583 Sentinel: be safe under crash-recovery assumptions.
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true Sentinels require to be authorized by a majority.
Additionally Sentinels require to do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires to
persist the master->leader_epoch on durable storage before to reply to
messages. This was not the case.

We also make sure to persist the current epoch in order to never reply
to stale votes requests from other Sentinels, after a recovery.

The configuration is persisted by making use of fsync(), this is
considered in the context of this code a good enough guarantee that
after a restart our durable state is restored, however this may not
always be the case depending on the kind of hardware and operating
system used.
2014-03-14 22:57:25 +01:00
antirez
ca2a3b3240 Sentinel: fake PUBLISH command to receive HELLO messages.
Now the way HELLO messages are received is unified.
Now it is no longer needed for Sentinels to converge to the higher
configuration for a master to be able to chat via some Redis instance,
the are able to directly exchanges configurations.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since for an error I
committed the change in the previous commit that refactored hello
messages processing into a separated function.
2014-03-14 11:09:32 +01:00
antirez
ee513ee958 Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). 2014-03-14 11:09:32 +01:00
antirez
9e82461af0 Linenoise updated, multiline mode enabled in redis-cli. 2014-03-13 15:11:41 +01:00
antirez
e0ef510630 Redis 2.9.51 (Redis 3.0.0 beta-2). 3.0.0-beta2 2014-03-11 16:13:39 +01:00
antirez
e917de12d0 Cluster: flag the transaction as dirty for the new redirections. 2014-03-11 15:19:00 +01:00
antirez
07036ad45e redis-trib: new subcommand 'call'. Exec command in all nodes.
Example:

./redis-trib.rb call 192.168.1.11:7000 config get cluster-node-timeout
2014-03-11 15:02:41 +01:00
antirez
a02424e29b redis-trib: create subcommand is now able to assign spare slaves.
Example: if the user will try to configure a cluster with 9 nodes,
asking for 1 slave for master, redis-trib will configure a 4 masters
cluster with 1 slave each as usually, but this time will assign the
spare node as a slave of one of the masters.
2014-03-11 15:02:41 +01:00
antirez
412a759a83 Cluster: update node configEpoch on UPDATE messages.
The UPDATE message contains the configEpoch of the node configuration
advertised in the packet. Update it if needed.
2014-03-11 15:02:41 +01:00
antirez
48702e0011 Cluster: set slot error if we receive an update for a busy slot.
By manually modifying nodes configurations in random ways, it is possible
to create the following scenario:

A is serving keys for slot 10
B is manually configured to serve keys for slot 10

A receives an update from B (or another node) where it is informed that
the slot 10 is now claimed by B with a greater configuration epoch,
however A still has keys from slot 10.

With this commit A will put the slot in error setting it in IMPORTING
state, so that redis-trib can detect the issue.
2014-03-11 15:02:41 +01:00
antirez
efd0346ea3 Cluster: clarified a comment in clusterUpdateSlotsConfigWith(). 2014-03-11 15:02:41 +01:00
antirez
47e3f1f16c Cluster: flush importing/migrating state when master is turned into slave. 2014-03-11 15:02:41 +01:00
antirez
117557192e Cluster: clusterCloseAllSlots() added. 2014-03-11 15:02:41 +01:00
antirez
01eee56f4f DEBUG ERROR implemented.
The new "error" subcommand of the DEBUG command can reply with an user
selected error, specified as its sole argument:

    DEBUG ERROR "LOADING please wait..."

The error is generated just prefixing the command argument with a "-"
character, and replacing newlines with spaces (since error replies can't
include newlines).

The goal of the command is to help in Client libraries unit tests by
making simple to simulate a command call triggering a given error.
2014-03-11 11:10:33 +01:00
antirez
ab8e1bbcdc DEBUG CMDKEYS: provide some guarantee to getKeysFromCommand().
getKeysFromCommand() is designed to be called with the command arguments
passing the basic arity checks described in the command table.

DEBUG CMDKEYS must provide the same guarantees for calling
getKeysFromCommand() to be safe.
2014-03-11 11:10:33 +01:00
antirez
94129415bf Cluster: make sortGetKeys() able to handle multiple STORE options.
It does not make sense to pass multiple store options, so, better to
handle it ;-)
2014-03-11 11:10:33 +01:00