3040 Commits

Author SHA1 Message Date
antirez
374eed7d2a Sentinel: abort failover if no good slave is available.
The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.

What we do now when the slave selection fails is to abort the failover
and return back monitoring the master. If the ODOWN condition is still
present a new failover will be triggered and so forth.

This commit also refactors the code we use to abort a failover.
2012-09-27 13:04:28 +02:00
antirez
2085fdb1f4 Sentinel: reset pending_commands in a more generic way. 2012-09-27 13:04:24 +02:00
antirez
f8a19e32e2 Prevent a spurious +sdown event on switch.
When we reset the master we should start with clean timestamps for ping
replies otherwise we'll detect a spurious +sdown event, because on
+master-switch event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.

Also this commit makes sure to explicitly reset the count of pending
commands, now we can do this because of the new way the hiredis link
is closed.
2012-09-27 13:04:19 +02:00
antirez
7c39b55d42 Sentinel: debugging message removed. 2012-09-27 13:04:16 +02:00
antirez
e47236d8d4 Sentinel: changes to connection handling and redirection.
We disconnect the Redis instances hiredis link in a more robust way now.
Also we change the way we perform the redirection for the +switch-master
event, that is not just an instance reset with an address change.

Using the same system we now implement the +redirect-to-master event
that is triggered by an instance that is configured to be master but
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.
2012-09-27 13:04:12 +02:00
antirez
8ab7e998d1 Sentinel: check that instance still exists in reply callbacks.
We can't be sure the instance object still exists when the reply
callback is called.
2012-09-27 13:04:08 +02:00
antirez
e01a415d37 Sentinel: more robust failover detection as observer.
Sentinel observers detect failover checking if a slave attached to the
monitored master turns into its replication state from slave to master.
However while this change may in theory only happen after a SLAVEOF NO
ONE command, in practie it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successfull failover.

This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.

This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).

The commit also fixes a problem in the disconnect handler that assumed
that the instance object always existed, that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.
2012-09-27 13:04:02 +02:00
antirez
d26a8fb4db Fixed an error in the example sentinel.conf. 2012-09-27 13:03:55 +02:00
antirez
5b5eb192f5 Typo. 2012-09-27 13:03:49 +02:00
antirez
120ba3922d First implementation of Redis Sentinel.
This commit implements the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.

More info at http://redis.io/topics/sentinel
2012-09-27 13:03:41 +02:00
antirez
2812b945f0 Test for SRANDMEMBER with <count>. 2012-09-21 11:56:04 +02:00
antirez
31fe053a62 SRANDMEMBER <count> leak fixed.
For "CASE 4" (see code) we need to free the element if it's already in
the result dictionary and adding it failed.
2012-09-21 11:56:00 +02:00
antirez
dd94771578 Added the SRANDMEMBER key <count> variant.
SRANDMEMBER called with just the key argument can just return a single
random element from a Redis Set. However many users need to return
multiple unique elements from a Set, this is not a trivial problem to
handle in the client side, and for truly good performance a C
implementation was required.

After many requests for this feature it was finally implemented.

The problem implementing this command is the strategy to follow when
the number of elements the user asks for is near to the number of
elements that are already inside the set. In this case asking random
elements to the dictionary API, and trying to add it to a temporary set,
may result into an extremely poor performance, as most add operations
will be wasted on duplicated elements.

For this reason this implementation uses a different strategy in this
case: the Set is copied, and random elements are returned to reach the
specified count.

The code actually uses 4 different algorithms optimized for the
different cases.

If the count is negative, the command changes behavior and allows for
duplicated elements in the returned subset.
2012-09-21 11:55:56 +02:00
antirez
8b6b1b27cc Fix compilation on FreeBSD. Thanks to @koobs on twitter. 2012-09-17 12:45:57 +02:00
antirez
4403862623 Redis 2.5.13 (2.6.0 RC7). 2.6.0-rc7 2012-09-17 11:02:49 +02:00
antirez
174518ffb7 .gitignore modified to be more general with less entries. 2012-09-17 10:50:07 +02:00
antirez
f444e2afdc A reimplementation of blocking operation internals.
Redis provides support for blocking operations such as BLPOP or BRPOP.
This operations are identical to normal LPOP and RPOP operations as long
as there are elements in the target list, but if the list is empty they
block waiting for new data to arrive to the list.

All the clients blocked waiting for th same list are served in a FIFO
way, so the first that blocked is the first to be served when there is
more data pushed by another client into the list.

The previous implementation of blocking operations was conceived to
serve clients in the context of push operations. For for instance:

1) There is a client "A" blocked on list "foo".
2) The client "B" performs `LPUSH foo somevalue`.
3) The client "A" is served in the context of the "B" LPUSH,
synchronously.

Processing things in a synchronous way was useful as if "A" pushes a
value that is served by "B", from the point of view of the database is a
NOP (no operation) thing, that is, nothing is replicated, nothing is
written in the AOF file, and so forth.

However later we implemented two things:

1) Variadic LPUSH that could add multiple values to a list in the
context of a single call.
2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH"
side effect when receiving data.

This forced us to make the synchronous implementation more complex. If
client "B" is waiting for data, and "A" pushes three elemnents in a
single call, we needed to propagate an LPUSH with a missing argument
in the AOF and replication link. We also needed to make sure to
replicate the LPUSH side of BRPOPLPUSH, but only if in turn did not
happened to serve another blocking client into another list ;)

This were complex but with a few of mutually recursive functions
everything worked as expected... until one day we introduced scripting
in Redis.

Scripting + synchronous blocking operations = Issue #614.

Basically you can't "rewrite" a script to have just a partial effect on
the replicas and AOF file if the script happened to serve a few blocked
clients.

The solution to all this problems, implemented by this commit, is to
change the way we serve blocked clients. Instead of serving the blocked
clients synchronously, in the context of the command performing the PUSH
operation, it is now an asynchronous and iterative process:

1) If a key that has clients blocked waiting for data is the subject of
a list push operation, We simply mark keys as "ready" and put it into a
queue.
2) Every command pushing stuff on lists, as a variadic LPUSH, a script,
or whatever it is, is replicated verbatim without any rewriting.
3) Every time a Redis command, a MULTI/EXEC block, or a script,
completed its execution, we run the list of keys ready to serve blocked
clients (as more data arrived), and process this list serving the
blocked clients.
4) As a result of "3" maybe more keys are ready again for other clients
(as a result of BRPOPLPUSH we may have push operations), so we iterate
back to step "3" if it's needed.

The new code has a much simpler semantics, and a simpler to understand
implementation, with the disadvantage of not being able to "optmize out"
a PUSH+BPOP as a No OP.

This commit will be tested with care before the final merge, more tests
will be added likely.
2012-09-17 10:26:50 +02:00
antirez
b58f03a0e8 Make sure that SELECT argument is an integer or return an error.
Unfortunately we had still the lame atoi() without any error checking in
place, so "SELECT foo" would work as "SELECT 0". This was not an huge
problem per se but some people expected that DB can be strings and not
just numbers, and without errors you get the feeling that they can be
numbers, but not the behavior.

Now getLongFromObjectOrReply() is used as almost everybody else across
the code, generating an error if the number is not an integer or
overflows the long type.

Thanks to @mipearson for reporting that on Twitter.
2012-09-11 10:34:48 +02:00
antirez
efb54f0593 Match printf format with actual type in genRedisInfoString(). 2012-09-10 12:43:20 +02:00
antirez
58889867bb BITCOUNT regression test for #582 fixed for 32 bit target.
Bug #582 was not present in 32 bit builds of Redis as
getObjectFromLong() will return an error for overflow.

This commit makes sure that the test does not fail because of the error
returned when running against 32 bit builds.
2012-09-05 17:50:24 +02:00
Haruto Otake
4c3d419013 BITCOUNT: fix segmentation fault.
remove unsafe and unnecessary cast.
until now, this cast may lead segmentation fault when end > UINT_MAX

setbit foo 0 1
bitcount  0 4294967295
=> ok
bitcount  0 4294967296
=> cause segmentation fault.

Note by @antirez: the commit was modified a bit to also change the
string length type to long, since it's guaranteed to be at max 512 MB in
size, so we can work with the same type across all the code path.

A regression test was also added.
2012-09-05 16:20:21 +02:00
Saj Goonatilleke
0671d88cab Bug fix: slaves being pinged every second
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves.  This period, which defaults
to 10, is measured in seconds.

Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.

The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*.  (master_last_io_seconds_ago never
approaches 10.)  I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
2012-09-05 16:01:01 +02:00
antirez
5ddee9b7d5 Scripting: Force SORT BY constant determinism inside SORT itself.
SORT is able to return (faster than when ordering) unordered output if
the "BY" clause is used with a constant value. However we try to play
well with scripting requirements of determinism providing always sorted
outputs when SORT (and other similar commands) are called by Lua
scripts.

However we used the general mechanism in place in scripting in order to
reorder SORT output, that is, if the command has the "S" flag set, the
Lua scripting engine will take an additional step when converting a
multi bulk reply to Lua value, calling a Lua sorting function.

This is suboptimal as we can do it faster inside SORT itself.
This is also broken as issue #545 shows us: basically when SORT is used
with a constant BY, and additionally also GET is used, the Lua scripting
engine was trying to order the output as a flat array, while it was
actually a list of key-value pairs.

What we do know is to recognized if the caller of SORT is the Lua client
(since we can check this using the REDIS_LUA_CLIENT flag). If so, and if
a "don't sort" condition is triggered by the BY option with a constant
string, we force the lexicographical sorting.

This commit fixes this bug and improves the performance, and at the same
time simplifies the implementation. This does not mean I'm smart today,
it means I was stupid when I committed the original implementation ;)
2012-09-05 01:19:47 +02:00
antirez
fd2a8951bf Send an async PING before starting replication with master.
During the first synchronization step of the replication process, a Redis
slave connects with the master in a non blocking way. However once the
connection is established the replication continues sending the REPLCONF
command, and sometimes the AUTH command if needed. Those commands are
send in a partially blocking way (blocking with timeout in the order of
seconds).

Because it is common for a blocked master to accept connections even if
it is actually not able to reply to the slave requests, it was easy for
a slave to block if the master had serious issues, but was still able to
accept connections in the listening socket.

For this reason we now send an asynchronous PING request just after the
non blocking connection ended in a successful way, and wait for the
reply before to continue with the replication process. It is very
unlikely that a master replying to PING can't reply to the other
commands.

This solution was proposed by Didier Spezia (Thanks!) so that we don't
need to turn all the replication process into a non blocking affair, but
still the probability of a slave blocked is minimal even in the event of
a failing master.

Also we now use getsockopt(SO_ERROR) in order to check errors ASAP
in the event handler, instead of waiting for actual I/O to return an
error.

This commit fixes issue #632.
2012-09-03 11:48:27 +02:00
antirez
42a239b888 Scripting: Reset Lua fake client reply_bytes after command execution.
Lua scripting uses a fake client in order to run commands in the context
of a client, accumulate the reply, and convert it into a Lua object
to return to the caller. This client is reused again and again, and is
referenced by the server.lua_client globally accessible pointer.

However after every call to redis.call() or redis.pcall(), that is
handled by the luaRedisGenericCommand() function, the reply_bytes field
of the client was not set back to zero. This filed is used to estimate
the amount of memory currently used in the reply. Because of the lack of
reset, script after script executed, this value used to get bigger and
bigger, and in the end on 32 bit systems it triggered the following
assert:

    redisAssert(c->reply_bytes < ULONG_MAX-(1024*64));

On 64 bit systems this does not happen because it takes too much time to
reach values near to 2^64 for users to see the practical effect of the
bug.

Now in the cleanup stage of luaRedisGenericCommand() we reset the
reply_bytes counter to zero, avoiding the issue. It is not practical to
add a test for this bug, but the fix was manually tested using a
debugger.

This commit fixes issue #656.
2012-08-31 11:08:53 +02:00
antirez
851ac9d072 Sentinel: added documentation about slave-priority in redis.conf 2012-08-31 10:30:51 +02:00
antirez
48d26a483d Sentinel: Redis-side support for slave priority.
A Redis slave can now be configured with a priority, that is an integer
number that is shown in INFO output and can be get and set using the
redis.conf file or the CONFIG GET/SET command.

This field is used by Sentinel during slave election. A slave with lower
priority is preferred. A slave with priority zero is never elected (and
is considered to be impossible to elect even if it is the only slave
available).

A next commit will add support in the Sentinel side as well.
2012-08-31 10:30:29 +02:00
antirez
edfaa64f49 Scripting: require at least one argument for redis.call().
Redis used to crash with a call like the following:

    EVAL "redis.call()" 0

Now the explicit check for at least one argument prevents the problem.

This commit fixes issue #655.
2012-08-31 10:28:36 +02:00
antirez
13732168a5 Incrementally flush RDB on disk while loading it from a master.
This fixes issue #539.

Basically if there is enough free memory the OS may buffer the RDB file
that the slave transfers on disk from the master. The file may
actually be flused on disk at once by the operating system when it gets
closed by Redis, causing the close system call to block for a long time.

This patch is a modified version of one provided by yoav-steinberg of
@garantiadata (the original version was posted in the issue #539
comments), and tries to flush the OS buffers incrementally (every 8 MB
of loaded data).
2012-08-28 12:47:35 +02:00
antirez
06bd3b9acd Fix a forget zmalloc_oom() -> zmalloc_oom_handler() replacement. 2012-08-24 15:41:49 +02:00
antirez
5de75120ba Better Out of Memory handling.
The previous implementation of zmalloc.c was not able to handle out of
memory in an application-specific way. It just logged an error on
standard error, and aborted.

The result was that in the case of an actual out of memory in Redis
where malloc returned NULL (In Linux this actually happens under
specific overcommit policy settings and/or with no or little swap
configured) the error was not properly logged in the Redis log.

This commit fixes this problem, fixing issue #509.
Now the out of memory is properly reported in the Redis log and a stack
trace is generated.

The approach used is to provide a configurable out of memory handler
to zmalloc (otherwise the default one logging the event on the
standard output is used).
2012-08-24 13:03:40 +02:00
antirez
32095c4057 redis-benchmark: disable big buffer cleanup in hiredis context.
This new hiredis features allows us to reuse a previous context reader
buffer even if already very big in order to maximize performances with
big payloads (Usually hiredis re-creates buffers when they are too big
and unused in order to save memory).
2012-08-22 11:34:03 +02:00
antirez
7fcba9fd9a hiredis library updated.
This version of hiredis merges modifications of the Redis fork with
latest changes in the hiredis repository.

The same version was pushed on the hiredis repository and will probably
merged into the master branch in short time.
2012-08-22 11:33:57 +02:00
Pieter Noordhuis
2f44452612 Set p to its new offset before modifying it 2012-08-22 11:33:52 +02:00
Pieter Noordhuis
89bf6f58fd Add ziplist test for deleting next to last entries 2012-08-22 11:33:47 +02:00
Tobias Schwab
013189e7db Fix version numbers 2012-08-02 14:38:42 +02:00
antirez
73d3e8751b Redis 2.5.12 (2.6 RC6). 2.6.0-rc6 2012-08-01 12:06:03 +02:00
Michael Parker
628890e43e Use correct variable name for value to convert.
Note by @antirez: this code was never compiled because utils.c lacked the
float.h include, so we never noticed this variable was mispelled in the
past.

This should provide a noticeable speed boost when saving certain types
of databases with many sorted sets inside.
2012-07-31 11:50:51 +02:00
Saj Goonatilleke
4c0c1fff5a Truncate short write from the AOF
If Redis only manages to write out a partial buffer, the AOF file won't
load back into Redis the next time it starts up.  It is better to
discard the short write than waste time running redis-check-aof.
2012-07-31 10:58:16 +02:00
Saj Goonatilleke
f00b0844c9 New in INFO: aof_last_bgrewrite_status
Behaves like rdb_last_bgsave_status -- even down to reporting 'ok' when
no rewrite has been done yet.  (You might want to check that
aof_last_rewrite_time_sec is not -1.)
2012-07-31 10:58:12 +02:00
Steeve Lennmark
889e443ce5 Check that we have connection before enabling pipe mode 2012-07-22 17:19:05 +02:00
antirez
3bb3f12539 Allow Pub/Sub in contexts where other commands are blocked.
Redis loading data from disk, and a Redis slave disconnected from its
master with serve-stale-data disabled, are two conditions where
commands are normally refused by Redis, returning an error.

However there is no reason to disable Pub/Sub commands as well, given
that this layer does not interact with the dataset. To allow Pub/Sub in
as many contexts as possible is especially interesting now that Redis
Sentinel uses Pub/Sub of a Redis master as a communication channel
between Sentinels.

This commit allows Pub/Sub to be used in the above two contexts where
it was previously denied.
2012-07-22 17:18:12 +02:00
antirez
82675c86a6 Don't assume that "char" is signed.
For the C standard char can be either signed or unsigned, it's up to the
compiler, but Redis assumed that it was signed in a few places.

The practical effect of this patch is that now Redis 2.6 will run
correctly in every system where char is unsigned, notably the RaspBerry
PI and other ARM systems with GCC.

Thanks to Georgi Marinov (@eesn on twitter) that reported the problem
and allowed me to use his RaspBerry via SSH to trace and fix the issue!
2012-07-18 12:01:43 +02:00
jokea
8a8e01f4a7 mark fd as writable when EPOLLERR or EPOLLHUP is returned by epoll_wait. 2012-07-09 12:15:07 +02:00
antirez
d3d567428a Typo in comment. 2012-07-07 17:24:40 +02:00
antirez
dbd8c753c4 REPLCONF internal command introduced.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.

The initial motivation for this command, and the only reason currently
it is used by the implementation, is to let the slave instance
communicate its listening port to the slave, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.

This allows clients to auto discover and query all the slaves attached
into a master.

Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:

    REPLCONF listening-prot 6380
    SYNC

Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.

In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.

NOTE: This commit also fixes a bug: the INFO outout already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
2012-07-07 17:24:33 +02:00
antirez
b3f28b90d2 Fixed comment typo into time_independent_strcmp(). 2012-06-21 14:26:00 +02:00
antirez
4b3865cbdb Fixed a timing attack on AUTH (Issue #560).
The way we compared the authentication password using strcmp() allowed
an attacker to gain information about the password using a well known
class of attacks called "timing attacks".

The bug appears to be practically not exploitable in most modern systems
running Redis since even using multiple bytes of differences in the
input at a time instead of one the difference in running time in in the
order of 10 nanoseconds, making it hard to exploit even on LAN. However
attacks always get better so we are providing a fix ASAP.

The new implementation uses two fixed length buffers and a constant time
comparison function, with the goal of:

1) Completely avoid leaking information about the content of the
password, since the comparison is always performed between 512
characters and without conditionals.
2) Partially avoid leaking information about the length of the
password.

About "2" we still have a stage in the code where the real password and
the user provided password are copied in the static buffers, we also run
two strlen() operations against the two inputs, so the running time
of the comparison is a fixed amount plus a time proportional to
LENGTH(A)+LENGTH(B). This means that the absolute time of the operation
performed is still related to the length of the password in some way,
but there is no way to change the input in order to get a difference in
the execution time in the comparison that is not just proportional to
the string provided by the user (because the password length is fixed).

Thus in practical terms the user should try to discover LENGTH(PASSWORD)
looking at the whole execution time of the AUTH command and trying to
guess a proportionality between the whole execution time and the
password length: this appears to be mostly unfeasible in the real world.

Also protecting from this attack is not very useful in the case of Redis
as a brute force attack is anyway feasible if the password is too short,
while with a long password makes it not an issue that the attacker knows
the length.
2012-06-21 12:01:07 +02:00
antirez
0c9cf45270 Redis 2.5.11 (2.6 RC5). 2.6.0-rc5 2012-06-15 13:44:17 +02:00
antirez
6fe9c402a2 Fix c->reply_bytes computation in setDeferredMultiBulkLength()
In order to implement reply buffer limits introduced in 2.6 and useful
to close the connection under user-selected circumastances of big output
buffers (for instance slow consumers in pub/sub, a blocked slave, and so
forth) Redis takes a counter with the amount of used memory in objects
inside the output list stored into c->reply.

The computation was broken in the function setDeferredMultiBulkLength(),
in the case the object was glued with the next one. This caused the
c->reply_bytes field to go out of sync, be subtracted more than needed,
and wrap back near to ULONG_MAX values.

This commit fixes this bug and adds an assertion that is able to trap
this class of problems.

This problem was discovered looking at the INFO output of an unrelated
issue (issue #547).
2012-06-15 10:11:27 +02:00