528 Commits

Author SHA1 Message Date
Matt Stancliff
60d62a99d2 Cluster: Add COMMANDS command
COMMANDS returns a nested multibulk reply for each
command in the command table.  The reply for each
command contains:
  - command name
  - arity
  - array of command flags
  - start key position
  - end key position
  - key offset step
  - optional: if the keys are not deterministic and
    Redis uses an internal key evaluation function,
    an extra field (index 6, counting from zero) appears
    and is defined as a status reply of: REQUIRES ARGUMENT PARSING

Cluster clients need to know where the keys are in each
command to implement proper routing to cluster nodes.

Redis commands can have multiple keys, keys at offset steps, or other
issues where you can't always assume the first element after
the command name is the cluster routing key.

Using the information exposed by COMMANDS, client implementations
can have live, accurate key extraction details for all commands.

Also implements COMMANDS INFO [commands...] to return only a
specific set of commands instead of all 160+ commands live in Redis.
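
As a rough illustration of how a client might use the start, end, and step fields, here is a minimal sketch in C. The helper name `key_positions` is hypothetical and not part of Redis. It assumes that a start position of 0 means "no keys" and that a negative end position counts from the end of the argument list (-1 being the last argument).

```c
#include <stddef.h>

/* Hypothetical helper, not part of Redis: given the "start key position",
 * "end key position" and "key offset step" fields of a COMMANDS entry,
 * store the argv indexes of the keys in 'out' and return how many were
 * found. argc counts the command name as argv[0].
 * Assumptions: first == 0 means the command takes no keys, and a negative
 * 'last' counts from the end of the arguments (-1 = last argument). */
size_t key_positions(int first, int last, int step, int argc,
                     int *out, size_t outlen) {
    size_t n = 0;
    if (first <= 0 || step <= 0) return 0;   /* no keys to extract */
    if (last < 0) last = argc + last;        /* resolve relative end */
    for (int i = first; i <= last && i < argc && n < outlen; i += step)
        out[n++] = i;
    return n;
}
```

For a command shaped like MSET k1 v1 k2 v2 (argc 5, start 1, end -1, step 2) this yields positions 1 and 3. Commands flagged as requiring argument parsing cannot be handled this way.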
2014-06-27 18:39:33 +02:00
antirez
b411678679 No more trailing spaces in Redis source code. 2014-06-26 18:48:47 +02:00
Matt Stancliff
ddad61bfcc Cancel SHUTDOWN if initial AOF is being written
Fixes #1826 (and many other reports of the same problem)
2014-06-23 10:09:22 +02:00
antirez
82caeee12e Allow to call ROLE in LOADING state. 2014-06-21 15:18:33 +02:00
antirez
b7dd451892 Assign a unique non-repeating ID to each new client.
This will be used by CLIENT KILL and is also a good way to ensure a
given client is still the same across CLIENT LIST calls.

The output of CLIENT LIST was modified to include the new ID, but this
change is considered to be backward compatible as the API does not imply
you can do positional parsing, since each field has a different name.
2014-06-21 15:18:32 +02:00
antirez
d97d1a6430 Client types generalized.
Because of output buffer limits Redis internals had this idea of type of
clients: normal, pubsub, slave. It is possible to set different output
buffer limits for the three kinds of clients.

However all the macros and API were named after output buffer limit
classes, while the idea of a client type is a generic one that can be
reused.

This commit does two things:

1) Rename the API and defines with more general names.
2) Change the class of clients executing the MONITOR command from "slave"
   to "normal".

"2" is a good idea because you want to have very special settings for
slaves, that are not a good idea for MONITOR clients that are instead
normal clients even if they are conceptually slave-alike (since it is a
push protocol).

The backward-compatibility breakage resulting from "2" is considered
too minor to matter, since MONITOR is a debugging command, and because
this change is not going to break the format or the behavior, only the
point at which a connection is closed because of a big output buffer.
2014-06-21 15:18:32 +02:00
antirez
9b460e803a ROLE command added.
The new ROLE command is designed to provide a client with
information about the replication in a fast and easy-to-use way
compared to the INFO command, where the same information is also
available.
2014-06-21 15:18:31 +02:00
antirez
9f5ab8699d Don't process min-slaves-to-write for slaves.
Replication is totally broken when a slave has this option, since it
stops accepting updates from masters.

This fixes issue #1434.
2014-06-05 10:50:33 +02:00
antirez
90bbf1159b redisLogFromHandler() format changed to match new logs format. 2014-05-23 09:26:04 +02:00
antirez
9996c37e74 Tag every log line with role.
Every log contains, just after the pid, a single character that provides
information about the role of an instance:

S - Slave
M - Master
C - Writing child
X - Sentinel
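
A minimal sketch of how such a tag could be chosen. The function name and its boolean parameters are hypothetical, and the precedence between the cases (Sentinel first, then child, then slave or master) is an assumption of this sketch rather than something the commit message states.

```c
/* Hypothetical helper, not the Redis implementation: pick the single
 * role character to print after the pid in a log line.
 * Assumed precedence: Sentinel, then writing child, then slave/master. */
char log_role_char(int is_sentinel, int is_child, int is_slave) {
    if (is_sentinel) return 'X';
    if (is_child) return 'C';
    return is_slave ? 'S' : 'M';
}
```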
2014-05-23 09:26:04 +02:00
antirez
33f63ff988 Cluster: slave validity factor is now user configurable.
Check the commit changes in the example redis.conf for more information.
2014-05-23 09:26:04 +02:00
antirez
0a707babb1 RESTORE: reply with -BUSYKEY special error code.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
2014-05-12 15:56:17 +02:00
antirez
357b039fdb Accept multiple clients per iteration.
When the listening socket's readable event is fired, we have the chance
to accept multiple clients instead of accepting a single one. This makes
Redis more responsive when there is a mass-connect event (for example
after the server startup), and in workloads where a connect-disconnect
pattern is used often, so that multiple clients are waiting to be
accepted continuously.

As a side effect, this commit makes the LOADING, BUSY, and similar
errors much faster to deliver to the client, making Redis more
responsive when it has to return errors to inform the clients that the
server is blocked in a non-interruptible operation.
2014-04-28 18:17:02 +02:00
antirez
00726b2e0e PFCOUNT support for multi-key union. 2014-04-18 16:14:34 +02:00
antirez
816e22bc6f ZREMRANGEBYLEX implemented. 2014-04-18 16:14:34 +02:00
antirez
5e21acec54 ZLEXCOUNT implemented.
Like ZCOUNT for lexicographical ranges.
2014-04-16 15:09:47 +02:00
antirez
e4c2afb9c0 User-defined switch point between sparse-dense HLL encodings. 2014-04-16 15:09:47 +02:00
antirez
64fbd15706 Mark PFDEBUG as write command in the commands table.
It is safer since it is able to have side effects.
2014-04-16 15:09:47 +02:00
antirez
2c4a1eccda PFDEBUG added, PFGETREG removed.
PFDEBUG will be the interface to do debugging tasks with a key
containing an HLL object.
2014-04-16 15:09:46 +02:00
antirez
437cddee0d Add casting to match printf format.
adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() were
assuming uint64_t is the same as unsigned long long, which is probably
true for all the systems out there that we target, but GCC still
emitted a warning since technically they are two different types.
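
The kind of cast described here can be sketched as follows. The function `format_u64` is a hypothetical example, not code from the commit.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical example: format a uint64_t with %llu. The explicit cast
 * makes the argument type match the format specifier even where uint64_t
 * is a typedef for unsigned long rather than unsigned long long. */
int format_u64(char *buf, size_t len, uint64_t value) {
    return snprintf(buf, len, "%llu", (unsigned long long) value);
}
```

An alternative that avoids the cast is the PRIu64 macro from inttypes.h; the sketch follows the cast approach the commit title names.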
2014-04-16 15:09:46 +02:00
antirez
109ec9c6a5 ZRANGEBYLEX and ZREVRANGEBYLEX implementation. 2014-04-16 15:09:46 +02:00
antirez
aecb59b0b2 PFGETREG added for testing purposes.
The new command makes it possible to get a dump of the registers stored
in a HyperLogLog data structure for testing / debugging purposes.
2014-04-16 15:09:45 +02:00
antirez
08516c1ab6 HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-04-16 15:09:45 +02:00
antirez
e262442fd4 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-04-16 15:09:45 +02:00
antirez
691846e38d HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
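
A minimal sketch of the per-register maximum. It assumes, for clarity only, that each register is stored as one byte; the function name `hll_merge_max` is hypothetical and this is not the Redis implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: merge 'src' into 'dst' by keeping, for each
 * register index i, the larger of the two values. One byte per register
 * is an assumption made to keep the example short. */
void hll_merge_max(uint8_t *dst, const uint8_t *src, size_t nregs) {
    for (size_t i = 0; i < nregs; i++)
        if (src[i] > dst[i]) dst[i] = src[i];
}
```

Merging N structures is then a matter of folding each one into the same destination in turn.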
2014-04-16 15:09:45 +02:00
antirez
93e5876a72 HLLCOUNT implemented. 2014-04-16 15:09:44 +02:00
antirez
bb57749a86 HLLADD implemented. 2014-04-16 15:09:44 +02:00
antirez
c7344144f4 HLLSELFTEST command implemented.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
2014-04-16 15:09:44 +02:00
antirez
046e691a6f Fix off by one bug in freeMemoryIfNeeded() eviction pool.
Bug found by the continuous integration test running Redis
under valgrind:

==6245== Invalid read of size 8
==6245==    at 0x4C2DEEF: memcpy@GLIBC_2.2.5 (mc_replace_strmem.c:876)
==6245==    by 0x41F9E6: freeMemoryIfNeeded (redis.c:3010)
==6245==    by 0x41D2CC: processCommand (redis.c:2069)

memmove() size argument was accounting for an extra element, going
outside the bounds of the array.
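
This class of bug can be illustrated with a small fixed-size array. The names `POOL_SIZE` and `pool_insert` are hypothetical, and the sketch is not the code from redis.c.

```c
#include <string.h>

#define POOL_SIZE 4  /* hypothetical pool size for this sketch */

/* Insert 'value' at index 'k' of a full fixed-size array, shifting the
 * elements at k..POOL_SIZE-2 one slot to the right and dropping the last
 * one. Exactly POOL_SIZE-k-1 elements move; a count of POOL_SIZE-k would
 * read and write one element past the end of the array. */
void pool_insert(int pool[POOL_SIZE], int k, int value) {
    memmove(pool + k + 1, pool + k, sizeof(pool[0]) * (POOL_SIZE - k - 1));
    pool[k] = value;
}
```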
2014-03-25 10:32:50 +01:00
antirez
c69c40a4ae adjustOpenFilesLimit() refactoring.
In this commit:
* Decrement steps are semantically differentiated from the reserved FDs.
  Previously both values were 32 but the meaning was different.
* Make it clear that we save setrlimit errno.
* Don't explicitly handle wrapping of 'f', but prevent it from
  happening.
* Add comments to make the function flow more readable.

This integrates PR #1630
2014-03-25 09:07:17 +01:00
Matt Stancliff
b2a02cc92d Fix potentially incorrect errno usage
errno may be reset by the previous call to redisLog, so capture
the original value for proper error reporting.
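
The pattern can be sketched as follows. `noisy_log` is a hypothetical stand-in for a logging call that may clobber errno, and `report_failure` is likewise an illustrative name, not a Redis function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a logging call that may clobber errno. */
static void noisy_log(const char *msg) {
    (void) msg;
    errno = 0;  /* simulate a library call resetting errno */
}

/* Save errno immediately after the failing call, before anything else
 * runs, and build the error report from the saved copy. Returns the
 * saved error code. */
int report_failure(char *buf, size_t len) {
    int saved_errno = errno;
    noisy_log("operation failed");
    snprintf(buf, len, "error: %s", strerror(saved_errno));
    return saved_errno;
}
```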
2014-03-25 09:07:17 +01:00
Matt Stancliff
01fe750c6a Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-25 09:07:17 +01:00
Matt Stancliff
1e7b99809b Fix infinite loop on startup if ulimit too low
Fun fact: rlim_t is an unsigned long long on all platforms.

Continually subtracting from a rlim_t makes it get smaller
and smaller until it wraps, then you're up to 2^64-1.

This was causing an infinite loop on Redis startup if
your ulimit was extremely (almost comically) low.

The case of (f > oldlimit) would never be met in a case like:

    f = 150
    while (f > 20) f -= 128

Since f is unsigned, it can't go negative and would
take on values of:

    Iteration 1: 150 - 128 => 22
    Iteration 2:  22 - 128 => 18446744073709551510
    Iterations 3-∞: ...

To catch the wraparound, we use the previous value of f
stored in limit.rlim_cur.  If we subtract from f and
get a larger number than the value it had previously,
we print an error and exit since we don't have enough
file descriptors to help the user at this point.
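
The wraparound check can be sketched in isolation like this. It uses uint64_t as a stand-in for rlim_t, and the function name `lower_limit` and its parameters are hypothetical, not the Redis code.

```c
#include <stdint.h>

/* Hypothetical sketch: lower 'f' in steps of 'step' until it is no
 * greater than 'floor'. Returns 0 and stores the result in *out, or -1
 * if a subtraction wrapped around, which is detected by the new value
 * being larger than the previous one. uint64_t stands in for rlim_t. */
int lower_limit(uint64_t f, uint64_t floor, uint64_t step, uint64_t *out) {
    if (step == 0) return -1;          /* would never terminate */
    while (f > floor) {
        uint64_t prev = f;
        f -= step;
        if (f > prev) return -1;       /* unsigned wraparound detected */
    }
    *out = f;
    return 0;
}
```

With the values from the example above (f = 150, floor = 20, step = 128) the second subtraction wraps and the function returns -1 instead of looping.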

Thanks to @bs3g for the inspiration to fix this problem.
Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
parts of Redis simultaneously, so I didn't get a chance to use them.
2014-03-25 09:07:17 +01:00
Matt Stancliff
f701a347b8 Improve error handling around setting ulimits
The log messages about open file limits have always
been slightly opaque and confusing.  Here's an attempt to
fix their wording, detail, and meaning.  Users will have a
better understanding of how to fix very common problems
with these reworded messages.

Also, we handle a new error case when maxclients becomes less
than one, essentially rendering the server unusable.  We
now exit on startup instead of leaving the user with a server
unable to handle any connections.

This fixes antirez#356 as well.
2014-03-25 09:07:17 +01:00
Matt Stancliff
6f4be45997 Replace magic 32 with REDIS_EVENTLOOP_FDSET_INCR
32 was the additional number of file descriptors Redis
would reserve when managing a too-low ulimit.  The
number 32 was in too many places statically, so now
we use a macro instead that looks more appropriate.

When Redis sets up the server event loop, it uses:
    server.maxclients+REDIS_EVENTLOOP_FDSET_INCR

So, when reserving file descriptors, it makes sense to
reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
of only 32.  Currently, REDIS_EVENTLOOP_FDSET_INCR is
set to 128 in redis.h.

Also, I replaced the static 128 in the while f < old loop
with REDIS_EVENTLOOP_FDSET_INCR as well, which results
in no change since it was already 128.

Impact: Users now need at least maxclients+128 as
their open file limit instead of maxclients+32 to obtain
actual "maxclients" number of clients.  Redis will carve
the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
needs out of the "maxclients" range instead of failing
to start (unless the local ulimit -n is too low to accommodate
the request).
2014-03-25 09:07:17 +01:00
Jan-Erik Rediger
9fa96697a9 Fixed a few typos. 2014-03-24 21:10:00 +01:00
antirez
2dd8c46249 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not need to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:03:41 +01:00
antirez
571d6b01de Cache uname() output across INFO calls.
Uname was profiled to be a slow syscall. It always produces the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make much sense.

The uname utsname structure was changed to a static variable. At the
same time a static integer was added to check whether we need to call
uname the first time.
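
A minimal sketch of this caching pattern on a POSIX system. The function name `cached_uname` is hypothetical, and the sketch is not thread-safe.

```c
#include <sys/utsname.h>

/* Hypothetical sketch: return a pointer to a cached utsname, calling
 * uname() only on the first invocation. If uname() fails, the static
 * structure simply stays zero-filled. Not thread-safe. */
const struct utsname *cached_uname(void) {
    static struct utsname name;
    static int call_uname = 1;   /* 1 until the first call has happened */
    if (call_uname) {
        uname(&name);
        call_uname = 0;
    }
    return &name;
}
```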
2014-03-24 10:02:55 +01:00
antirez
fd5e8c0132 Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degrades a bit in my tests, but since the API is
faster, it allows raising the number of samples, and overall is a win.
2014-03-21 09:57:02 +01:00
antirez
10c8d86242 struct dictEntry -> dictEntry. 2014-03-21 09:57:02 +01:00
antirez
c641074a21 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

For a given number of samples, it approximates LRU eviction
better than the previous algorithm.
2014-03-21 09:57:02 +01:00
antirez
205c2ccc0c Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution LRU
clock, so that scripts running for just a few seconds can be used to
experiment with how the eviction algorithm works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to a high-resolution LRU clock.
2014-03-21 09:57:02 +01:00
antirez
8f0b74910a Specify LRU resolution in milliseconds. 2014-03-21 09:57:02 +01:00
antirez
8b6a674a5d Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-21 09:57:02 +01:00
antirez
e917de12d0 Cluster: flag the transaction as dirty for the new redirections. 2014-03-11 15:19:00 +01:00
antirez
399fca8f45 Cluster: SORT get keys helper implemented. 2014-03-11 11:10:33 +01:00
antirez
81efa0d296 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two sets of commands can no longer be served
by a single keys-extraction function.
2014-03-11 11:10:09 +01:00
antirez
a2a72b87e0 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-11 11:10:09 +01:00
antirez
aa5898f53e Redis Cluster: support for multi-key operations. 2014-03-11 11:10:09 +01:00
Matt Stancliff
6f4b5ef6d5 Fix "can't bind to address" error reporting.
Report the actual port used for the listening attempt instead of
server.port.

Originally, Redis would just listen on server.port.
But, with clustering, Redis uses a Cluster Port too,
so we can't say server.port is always where we are listening.

If you tried to launch Redis with a too-high port number (any
port where Port+10000 > 65535), Redis would refuse to start, but
only print an error saying it can't connect to the Redis port.

This patch clears up much confusion.
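
The port arithmetic described above can be sketched as a small validity check. The names `CLUSTER_PORT_INCR` and `cluster_port_for` are hypothetical and chosen for this illustration only.

```c
/* Offset between the client port and the cluster port, taken from the
 * "Port+10000" rule in the message above. The macro name is hypothetical. */
#define CLUSTER_PORT_INCR 10000

/* Hypothetical helper: return the cluster port for 'port', or -1 if
 * either port would fall outside the valid 1..65535 range. */
int cluster_port_for(int port) {
    int cport = port + CLUSTER_PORT_INCR;
    return (port > 0 && cport <= 65535) ? cport : -1;
}
```

Reporting the port that actually failed (the value computed here rather than the configured one) is what makes the resulting error message unambiguous.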
2014-03-11 11:09:37 +01:00