363 Commits

Author SHA1 Message Date
antirez
172eac504a More explicit panic message on out of memory. 2013-04-19 15:11:59 +02:00
antirez
ed2d988192 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself with new BGSAVE attempts
at every subsequent cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five second limit, assuming a log entry of about 200 bytes, will use
less than 4 MB of disk space per day, which is reasonable; the sysadmin
should notice the problem before any catastrophic event, especially
since by default Redis will stop serving write queries after the first
failed BGSAVE.

This fixes issue #849
2013-04-02 14:13:03 +02:00
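
A minimal sketch of the throttle described above, in C, using illustrative
field names (ok, last_try) rather than the actual ones from the Redis source:

    #include <time.h>

    #define BGSAVE_RETRY_DELAY 5    /* seconds to wait after a failed BGSAVE */

    struct bgsave_state {
        int ok;                     /* 1 if the last BGSAVE succeeded */
        time_t last_try;            /* when we last attempted a BGSAVE */
    };

    /* Return 1 if an automatic BGSAVE may be attempted now, 0 if the
     * previous attempt failed too recently and we should wait. */
    int allow_auto_bgsave(const struct bgsave_state *s, time_t now) {
        if (!s->ok && now - s->last_try < BGSAVE_RETRY_DELAY) return 0;
        return 1;
    }
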
antirez
d785413d86 Extended SET command implemented (issue #931). 2013-03-28 16:45:37 +01:00
antirez
02b9a72bae DEBUG set-active-expire added.
We need the ability to disable the activeExpireCycle() (active
expired key collection) call for testing purposes.
2013-03-28 12:48:47 +01:00
antirez
6cb7860658 Flag PUBLISH as read-only in the command table. 2013-03-27 09:07:43 +01:00
antirez
6818905406 Allow SELECT while loading the DB.
Fixes issue #1024.
2013-03-26 13:59:00 +01:00
antirez
4f8b18f3dd Replication: master_link_down_since_seconds initial value should be huge.
server.repl_down_since used to be initialized to the current time at
startup. This is wrong since the replication never started. Clients
testing this field to check whether data is up to date should never
believe the data is recent if we never connected to our master.
2013-03-13 12:55:06 +01:00
Damian Janowski
b9f8c2a5b0 Abort when opening the RDB file results in an error other than ENOENT.
This fixes cases where the RDB file does exist but can't be accessed for
any reason. For instance, when the Redis process doesn't have enough
permissions on the file.
2013-03-13 10:08:47 +01:00
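
The check boils down to treating ENOENT as "no dump file, start empty" and
any other open error as fatal; a sketch using the standard errno convention:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Open the RDB file: a missing file is fine (fresh instance), any
     * other error (e.g. EACCES on insufficient permissions) is fatal. */
    FILE *open_rdb_or_abort(const char *filename) {
        FILE *fp = fopen(filename, "r");
        if (fp == NULL && errno != ENOENT) {
            fprintf(stderr, "Fatal error opening '%s': %s\n",
                    filename, strerror(errno));
            exit(1);
        }
        return fp;   /* NULL only when the file simply does not exist */
    }
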
antirez
18d16f8592 Set default for stop_writes_on_bgsave_err in initServerConfig().
It was mistakenly placed in initServer(), which is called after the
configuration is already loaded, causing issue #1000.
2013-03-12 18:36:07 +01:00
antirez
48f4f77189 activeExpireCycle() smarter with many DBs and under expire pressure.
activeExpireCycle() tries to test just a few DBs per iteration so that
it scales if there are many configured DBs in the Redis instance.
However this commit makes it a bit smarter for the case where one or a
few of those DBs are under expiration pressure and there are many, many
keys to expire.

What we do is remember whether the last iteration had to return because
we ran out of time. In that case the next iteration tests all the
configured DBs, so that we are sure to test the DB under pressure again.

Before this commit, after a mass-expire in a given DB the function
tested just a few of the following DBs, possibly empty ones, per
iteration, so it took a long time to reach the DB under pressure again.
This resulted in a lot of memory being used by keys that were already
expired and never accessed by clients.
2013-03-11 11:34:49 +01:00
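
A sketch of the "remember we ran out of time" idea, with illustrative names
(DBS_PER_CALL, timelimit_exit, expire_some) and state kept across calls; it
is not the actual activeExpireCycle() code:

    #define DBS_PER_CALL 16

    /* expire_some(db, deadline) is an assumed callback that expires keys
     * in one DB and returns nonzero if it stopped because of the deadline. */
    void active_expire_cycle_sketch(int total_dbs, long long deadline,
                                    int (*expire_some)(int, long long)) {
        static int timelimit_exit = 0;   /* did the previous call time out? */
        static int current_db = 0;       /* resume point across calls */
        int dbs_to_test = timelimit_exit ? total_dbs : DBS_PER_CALL;

        if (dbs_to_test > total_dbs) dbs_to_test = total_dbs;
        timelimit_exit = 0;
        while (dbs_to_test--) {
            int db = current_db++ % total_dbs;
            if (expire_some(db, deadline)) {   /* ran out of time again */
                timelimit_exit = 1;
                return;
            }
        }
    }
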
antirez
b3281e93c3 In databasesCron() never test more DBs than we have. 2013-03-11 11:34:45 +01:00
antirez
2677c97d39 Make comment name match var name in activeExpireCycle(). 2013-03-11 11:34:34 +01:00
antirez
1d426bf53b Optimize inner loop of activeExpireCycle() for no-expires case. 2013-03-11 11:34:18 +01:00
antirez
da2dd8991b REDIS_DBCRON_DBS_PER_SEC -> REDIS_DBCRON_DBS_PER_CALL 2013-03-11 11:34:14 +01:00
antirez
13f84841b5 activeExpireCycle(): process only a small number of DBs per iteration.
This small number of DBs is set to 16, so with the default configuration
Redis should behave exactly like in the past.
However the difference is that when the user configures a very large
number of DBs we no longer perform an O(N) operation, which consumed a
non trivial amount of CPU per serverCron() iteration.
2013-03-11 11:34:10 +01:00
antirez
ef3a95fa6f Use unsigned integers for DB ids, for defined wrap-to-zero. 2013-03-11 11:34:05 +01:00
antirez
665b819eb4 Only resize/rehash a few databases per cron iteration.
This is the first step to lower the CPU usage when many databases are
configured. The other is to also process a limited number of DBs per
call in the active expire cycle.
2013-03-11 11:33:59 +01:00
antirez
bb562a6414 Actually call databasesCron() inside serverCron(). 2013-03-11 11:30:26 +01:00
antirez
e166abad10 Move Redis databases background processing to databasesCron(). 2013-03-11 11:30:21 +01:00
antirez
aec5ea5d07 serverCron() frequency is now a runtime parameter (was REDIS_HZ).
REDIS_HZ is the frequency our serverCron() function is called with.
A more frequent call to this function results in lower latency when the
server is trying to handle very expensive background operations like
mass expiration of a lot of keys at the same time.

Redis 2.4 used to have an HZ of 10. This was good enough with almost
every setup, but the incremental key expiration algorithm was working a
bit better under *extreme* pressure when HZ was set to 100 for Redis
2.6.

However for most users a latency spike of 30 milliseconds when millions
of keys are expiring at the same time is acceptable; on the other hand a
default HZ of 100 in Redis 2.6 caused idle instances to use some CPU
time compared to Redis 2.4. The CPU usage was in the order of 0.3% for
an idle instance, which is a shame since the server consumes more
energy, even if no important resources are involved.

This commit introduces HZ as a runtime parameter, that can be queried by
INFO or CONFIG GET, and can be modified with CONFIG SET. At the same
time the default frequency is set back to 10.

In this way we default to a sane value of 10, but allow users to easily
switch to values up to 500 for near real-time applications if needed and
if they are willing to pay this small CPU usage penalty.
2013-03-11 11:28:55 +01:00
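
The essential relation is that the cron reschedules itself 1000/hz
milliseconds in the future on every call; a sketch (the clamping bounds are
illustrative, 500 being the upper value mentioned above):

    /* hz = 10 means one serverCron()-style call every 100 ms,
     * hz = 100 one call every 10 ms. */
    int server_cron_period_ms(int hz) {
        if (hz < 1) hz = 1;
        if (hz > 500) hz = 500;
        return 1000 / hz;
    }

At runtime the value can then be read back with CONFIG GET hz and changed
with CONFIG SET hz <value>, as the message above notes.
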
antirez
e7a61d287e API to lookup commands with their original name.
A new server.orig_commands table was added to the server structure; it
contains a copy of the command table unaffected by rename-command
statements in redis.conf.

A new API lookupCommandOrOriginal() was added that checks both tables,
new first, old later, so that rewriteClientCommandVector() and friends
can look up commands by their new or original name in order to fix the
client->cmd pointer when the argument vector is rewritten.

This fixes the segfault of issue #986, but does not fix a wider range
of problems resulting from renaming commands that actually operate on
data and are written to the AOF file or propagated to slaves... That is,
command renaming should be handled with care.
2013-03-06 16:36:52 +01:00
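
A self-contained sketch of the "renamed table first, original table later"
lookup; linear arrays stand in for the real command dictionaries:

    #include <stddef.h>
    #include <strings.h>

    struct command { const char *name; /* ... handler, flags ... */ };

    /* Linear scan stands in for the real hash table lookup. */
    static struct command *lookup_in(struct command *tbl, size_t n,
                                     const char *name) {
        for (size_t i = 0; i < n; i++)
            if (strcasecmp(tbl[i].name, name) == 0) return &tbl[i];
        return NULL;
    }

    /* Check the live (possibly renamed) table first, then the pristine copy. */
    struct command *lookup_command_or_original(struct command *cmds, size_t ncmds,
                                               struct command *orig, size_t norig,
                                               const char *name) {
        struct command *c = lookup_in(cmds, ncmds, name);
        return c ? c : lookup_in(orig, norig, name);
    }
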
antirez
a3273dbfa6 Allow AUTH while loading the DB in memory.
While Redis is loading the AOF or RDB file in memory only a subset of
commands are allowed. This commit adds AUTH to this subset.
2013-03-06 11:51:26 +01:00
Stam He
74373cd930 add a check for aeCreateTimeEvent
Add a check for aeCreateTimeEvent in the initServer() function.
2013-03-04 10:52:53 +01:00
antirez
e1e8b1cd29 Set SO_KEEPALIVE on client sockets if configured to do so. 2013-02-11 11:47:23 +01:00
Pierre Chapuis
c7b9a57fb2 fix comments forgotten in #285 (zipmap -> ziplist) 2013-02-11 11:47:10 +01:00
antirez
d5d0b467db LASTSAVE is a "random" command. 2013-02-07 19:13:09 +01:00
antirez
cf0191dc3c TCP_NODELAY after SYNC: changes to the implementation. 2013-02-05 12:05:04 +01:00
charsyam
45b1b2f7ad Turn off TCP_NODELAY on the slave socket after SYNC.
Further details from @antirez:

It was reported by @StopForumSpam on Twitter that the Redis replication
link was strangely using multiple TCP packets for multiple commands.
This wastes a lot of bandwidth and is due to the TCP_NODELAY option we
enable on the socket after accepting a new connection.

However the master -> slave channel is a one-way channel, since Redis
replication is asynchronous, so there is no point in trying to reduce
the latency: we should aim to reduce the bandwidth. For this reason this
commit introduces the ability to disable the Nagle algorithm on the
socket after a successful SYNC.

This feature is off by default because the delay can be up to 40
milliseconds with normally configured Linux kernels.
2013-02-05 12:05:01 +01:00
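
For reference, re-enabling Nagle's algorithm on an already connected socket
is a single standard setsockopt() call; a sketch of the post-SYNC step:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Clear TCP_NODELAY (i.e. re-enable Nagle's algorithm) on the
     * master->slave socket once SYNC is done, so that many small writes
     * get coalesced into fewer TCP packets. */
    int enable_nagle(int fd) {
        int off = 0;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &off, sizeof(off));
    }
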
antirez
da540228d8 Whitelist SIGUSR1 to avoid auto-triggering errors.
This commit fixes issue #875 that was caused by the following events:

1) There is an active child doing BGSAVE.
2) FLUSHALL is called (or any other condition occurs that makes Redis
kill the saving child process).
3) Redis senses an error because the child exited abnormally (killed by
a signal), and stops accepting write commands until a BGSAVE is executed
successfully.

Whitelisting SIGUSR1 and making sure Redis always uses this signal in
order to kill its own children fixes the issue.
2013-01-19 13:30:41 +01:00
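
A sketch of the whitelist idea, assuming bysignal carries the signal number
extracted from the wait3() status (names illustrative):

    #include <signal.h>

    /* A child killed by SIGUSR1 (the signal Redis uses to stop its own
     * children) is a deliberate interruption, not a save error; only
     * other abnormal exits should make Redis refuse writes. */
    int bgsave_really_failed(int exitcode, int bysignal) {
        if (bysignal == SIGUSR1) return 0;      /* we killed it on purpose */
        return exitcode != 0 || bysignal != 0;  /* any other failure counts */
    }
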
antirez
9e5b70162c Clear server.shutdown_asap on failed shutdown.
When a SIGTERM is received Redis schedules a shutdown. However if it
fails to perform the shutdown it must clear the shutdown_asap flag,
otherwise it will try again and again, possibly making the server
unusable.
2013-01-19 13:20:58 +01:00
antirez
dbde1d85ea Slowlog: don't log EXEC but just the executed commands.
The Redis Slow Log always used to log the slow commands executed inside
a MULTI/EXEC block. However EXEC was also logged at the end, which is
perfectly useless.

Now EXEC is no longer logged and a test was added to test this behavior.

This fixes issue #759.
2013-01-19 12:55:12 +01:00
guiquanz
560e049947 Fixed many typos. 2013-01-19 11:08:43 +01:00
antirez
023cbc3787 Comment in the call() function clarified a bit. 2013-01-10 12:04:52 +01:00
antirez
9120275dc9 EVALSHA is now case insensitive.
EVALSHA used to crash if the SHA1 was not lowercase (Issue #783).
Fixed by using a case insensitive dictionary type for the sha -> script
map used for replication of scripts.
2012-11-22 15:51:01 +01:00
antirez
41f0f927c9 Safer handling of MULTI/EXEC on errors.
After the transaction starts with MULTI, the previous behavior was to
return an error on problems such as the maxmemory limit being reached,
but still to execute the transaction on EXEC with the subset of commands
that were queued successfully.

While it is true that the client could check for errors, distinguishing
QUEUED from an error reply, MULTI/EXEC in most client implementations
uses pipelining for speed, so all the commands and EXEC are sent without
waiting for the replies.

With this change:

1) EXEC fails if at least one command was not queued because of an
error. The EXECABORT error is used.
2) A generic error is always reported on EXEC.
3) The client DISCARDs the MULTI state after a failed EXEC, otherwise
pipelining multiple transactions would be basically impossible:
After a failed EXEC the next transaction would be simply queued as
the tail of the previous transaction.
2012-11-22 10:36:20 +01:00
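
A sketch of the new EXEC behavior, with illustrative flag and structure
names: a command that fails to queue marks the transaction dirty, and a
dirty EXEC aborts and discards the MULTI state:

    #define TXN_DIRTY_QUEUE_ERR (1 << 0)

    struct txn { int flags; /* ... queued commands ... */ };

    /* Called when a command cannot be queued after MULTI. */
    void mark_txn_dirty(struct txn *t) { t->flags |= TXN_DIRTY_QUEUE_ERR; }

    /* On EXEC: a dirty transaction is aborted and the MULTI state is
     * discarded, so the next pipelined MULTI starts clean. */
    const char *exec_sketch(struct txn *t) {
        if (t->flags & TXN_DIRTY_QUEUE_ERR) {
            t->flags = 0;   /* implicit DISCARD */
            return "-EXECABORT Transaction discarded because of previous errors.";
        }
        /* ... otherwise execute the queued commands and reply with a
         * multi-bulk containing one entry per command ... */
        return "(multi-bulk reply)";
    }
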
antirez
d288ee655f BSD license added to every C source and header file. 2012-11-08 18:33:13 +01:00
antirez
36f026a3a0 More robust handling of AOF rewrite child.
After the wait3() syscall we used to do something like this:

    if (pid == server.rdb_child_pid) {
        backgroundSaveDoneHandler(exitcode,bysignal);
    } else {
        ....
    }

So the AOF rewrite was handled in the else branch without actually
checking that the pid really matches. This commit makes the check
explicit and logs at WARNING level if the pid returned by wait3()
matches neither the RDB nor the AOF rewrite child.
2012-11-01 22:41:57 +01:00
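
The explicit check described above looks roughly like the following; the AOF
handler name and the aof_child_pid field are assumed by analogy with the RDB
case shown in the snippet:

    if (pid == server.rdb_child_pid) {
        backgroundSaveDoneHandler(exitcode,bysignal);
    } else if (pid == server.aof_child_pid) {
        backgroundRewriteDoneHandler(exitcode,bysignal);
    } else {
        redisLog(REDIS_WARNING,
            "Warning, detected child with unmatched pid: %ld", (long)pid);
    }
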
Yecheng Fu
ed44a74e72 fix typo in comments (redis.c, networking.c) 2012-11-01 18:14:55 +08:00
antirez
99d7dbe669 A field called slave_read_only added to the INFO output.
This was an important piece of information missing from the INFO output
in the replication section.

It obviously reflects if the slave is read only or not.
2012-10-22 19:22:48 +02:00
antirez
a25b25f4ef Default memory limit for 32 bit instances moved from 3.5 GB to 3 GB.
In some systems, notably OS X, the 3.5 GB limit was too high and unable
to prevent an out of memory crash. The 3 GB limit works better, and it
is still a lot of memory within a 4 GB theoretical limit, so it's not
going to bother anyone :-)

This fixes issue #711
2012-10-22 10:45:55 +02:00
antirez
2164523244 Fix MULTI / EXEC rendering in MONITOR output.
Before of this commit it used to be like this:

MULTI
EXEC
... actual commands of the transaction ...

Because after all that is the natural order of things. Transaction
commands are queued and executed *only after* EXEC is called.

However this makes debugging with MONITOR a mess, so the code was
modified to provide a coherent output.

What happens is that MULTI is rendered in the MONITOR output as soon as
possible, while EXEC is propagated only after the transaction is
actually executed, or even when it fails because of WATCH; in that case
you'll simply see:

MULTI
EXEC

An empty transaction.
2012-10-16 17:41:39 +02:00
antirez
c3ff470889 Merge remote-tracking branch 'origin/2.6' into 2.6 2012-10-11 18:36:18 +02:00
antirez
0e25c0ccf4 Allow AUTH when Redis is busy because of timedout Lua script.
If the server is password protected we need to accept AUTH when the
server is in a busy (-BUSY) condition, otherwise it will be impossible
to send SHUTDOWN NOSAVE or SCRIPT KILL.

This fixes issue #708.
2012-10-11 18:35:52 +02:00
antirez
05e06e1543 Warn when configured maxmemory value seems odd.
This commit warns the user with a log at "warning" level if:

1) After the server startup the maxmemory limit was found to be < 1MB.
2) After a CONFIG SET command modifying the maxmemory setting the limit
is set to a value that is smaller than the currently used memory.

The behaviour of the Redis server is unmodified, and this will not make
the CONFIG SET command or a wrong configuration in redis.conf less
likely to create problems, but at least it will make most users aware of
a possible error they committed, without resorting to external help.

However no warning is issued if, as a result of loading the AOF or RDB
file, we are very near the maxmemory setting, or key eviction will be
needed in order to go under the specified maxmemory setting. The reason
is that in servers configured as a cache with an aggressive
maxmemory-policy, most of the time restarting the server will cause this
condition to happen if persistence is not switched off.

This fixes issue #429.
2012-10-05 10:56:35 +02:00
antirez
c4cbffa3a1 Final merge of Sentinel into 2.6.
After cherry-picking the Sentinel commits a few spurious references to
Redis Cluster remained, which is not present in the 2.6 branch.
2012-09-27 13:10:36 +02:00
antirez
b65f3c2176 Sentinel: add Redis execution mode to INFO output.
The new "redis_mode" field in the INFO output will show if Redis is
running in standalone mode, cluster, or sentinel mode.
2012-09-27 13:05:53 +02:00
mrb
f1057534e7 Fix warning in redis.c for sentinel config load 2012-09-27 13:04:37 +02:00
antirez
120ba3922d First implementation of Redis Sentinel.
This commit implements the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.

More info at http://redis.io/topics/sentinel
2012-09-27 13:03:41 +02:00
antirez
dd94771578 Added the SRANDMEMBER key <count> variant.
SRANDMEMBER called with just the key argument can just return a single
random element from a Redis Set. However many users need to return
multiple unique elements from a Set: this is not a trivial problem to
handle on the client side, and for truly good performance a C
implementation was required.

After many requests for this feature it was finally implemented.

The problem in implementing this command is the strategy to follow when
the number of elements the user asks for is close to the number of
elements already inside the set. In this case asking the dictionary API
for random elements and trying to add them to a temporary set may result
in extremely poor performance, as most add operations would be wasted on
duplicate elements.

For this reason this implementation uses a different strategy in this
case: the Set is copied, and random elements are returned to reach the
specified count.

The code actually uses 4 different algorithms optimized for the
different cases.

If the count is negative, the command changes behavior and allows for
duplicated elements in the returned subset.
2012-09-21 11:55:56 +02:00
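
A sketch of the "copy and shrink" strategy used when count is close to the
set size; plain arrays and rand() stand in for the real dict and PRNG:

    #include <stdlib.h>
    #include <string.h>

    /* Copy all members, then evict random ones until only `count` remain.
     * The caller owns (and must free) the returned array. */
    char **srandmember_near_full(char **members, size_t size, size_t count) {
        char **copy = malloc(size * sizeof(*copy));
        if (copy == NULL) return NULL;
        memcpy(copy, members, size * sizeof(*copy));
        while (size > count) {
            size_t victim = (size_t)rand() % size;
            copy[victim] = copy[size - 1];   /* swap-remove, O(1) */
            size--;
        }
        return copy;   /* the first `count` entries are the answer */
    }
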
antirez
f444e2afdc A reimplementation of blocking operation internals.
Redis provides support for blocking operations such as BLPOP or BRPOP.
These operations are identical to the normal LPOP and RPOP operations as long
as there are elements in the target list, but if the list is empty they
block waiting for new data to arrive to the list.

All the clients blocked waiting for the same list are served in a FIFO
way, so the first that blocked is the first to be served when there is
more data pushed by another client into the list.

The previous implementation of blocking operations was conceived to
serve clients in the context of push operations. For instance:

1) There is a client "A" blocked on list "foo".
2) The client "B" performs `LPUSH foo somevalue`.
3) The client "A" is served in the context of the "B" LPUSH,
synchronously.

Processing things in a synchronous way was useful because if "B" pushes
a value that is immediately served to "A", from the point of view of the
database it is a NOP (no operation): nothing is replicated, nothing is
written to the AOF file, and so forth.

However later we implemented two things:

1) Variadic LPUSH that could add multiple values to a list in the
context of a single call.
2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH"
side effect when receiving data.

This forced us to make the synchronous implementation more complex: if
client "A" is waiting for data and "B" pushes three elements in a single
call, we needed to propagate an LPUSH with one argument missing over the
AOF and replication link. We also needed to make sure to replicate the
LPUSH side of BRPOPLPUSH, but only if it did not in turn happen to serve
another blocking client on another list ;)

This was complex, but with a few mutually recursive functions
everything worked as expected... until one day we introduced scripting
in Redis.

Scripting + synchronous blocking operations = Issue #614.

Basically you can't "rewrite" a script to have just a partial effect on
the replicas and AOF file if the script happened to serve a few blocked
clients.

The solution to all this problems, implemented by this commit, is to
change the way we serve blocked clients. Instead of serving the blocked
clients synchronously, in the context of the command performing the PUSH
operation, it is now an asynchronous and iterative process:

1) If a key that has clients blocked waiting for data is the subject of
a list push operation, we simply mark the key as "ready" and put it into
a queue.
2) Every command pushing items on lists, be it a variadic LPUSH, a
script, or whatever else, is replicated verbatim without any rewriting.
3) Every time a Redis command, a MULTI/EXEC block, or a script completes
its execution, we scan the list of keys that became ready to serve
blocked clients (because more data arrived), and process it, serving the
blocked clients.
4) As a result of "3" more keys may become ready again for other clients
(since serving BRPOPLPUSH may trigger push operations), so we iterate
back to step "3" if needed.

The new code has much simpler semantics and an easier to understand
implementation, with the disadvantage of not being able to "optimize
out" a PUSH+BPOP pair as a no-op.

This commit will be tested with care before the final merge; more tests
will likely be added.
2012-09-17 10:26:50 +02:00
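
A sketch of the ready-keys flow described in points 1-4 above, with
illustrative structures; the real implementation obviously works on Redis
objects and client structures rather than plain strings:

    #include <stdlib.h>
    #include <string.h>

    /* Step 1: a push on a key with blocked clients only records the key. */
    struct ready_key { char *key; struct ready_key *next; };
    static struct ready_key *ready_head = NULL;

    void signal_key_as_ready(const char *key) {
        struct ready_key *rk = malloc(sizeof(*rk));
        if (rk == NULL) return;          /* allocation failure ignored here */
        rk->key = strdup(key);
        rk->next = ready_head;
        ready_head = rk;
    }

    /* Steps 3 and 4: once the command, MULTI/EXEC block or script is done,
     * drain the queue; serving a BRPOPLPUSH may make more keys ready, so
     * keep looping until the queue is empty. */
    void handle_clients_blocked_on_lists(void (*serve)(const char *key)) {
        while (ready_head != NULL) {
            struct ready_key *rk = ready_head;
            ready_head = rk->next;
            serve(rk->key);              /* may call signal_key_as_ready() */
            free(rk->key);
            free(rk);
        }
    }
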