375 Commits

Author SHA1 Message Date
Matt Stancliff
771f8ad0e7 Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-25 09:07:21 +01:00
Matt Stancliff
611372fa97 Fix maxclients error handling
Everywhere in the Redis code base, maxclients is treated
as an int with (int)maxclients or `maxclients = atoi(source)`,
so let's make maxclients an int.

This fixes a bug where someone could specify a negative maxclients
on startup and it would work (as well as set maxclients very high)
because:

    unsigned int maxclients;
    char *update = "-300";
    maxclients = atoi(update);
    if (maxclients < 1) goto fail;

But, (maxclients < 1) can only catch the case when maxclients
is exactly 0.  maxclients happily sets itself to -300, which isn't
-300, but rather 4294966996, which isn't < 1, so... everything
"worked."

maxclients config parsing checks for the case of < 1, but maxclients
CONFIG SET parsing was checking for case of < 0 (allowing
maxclients to be set to 0).  CONFIG SET parsing is now updated to
match config parsing of < 1.

It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.

These changes were inspired by antirez#356, but this doesn't
fix that issue.
2014-03-25 09:07:21 +01:00
antirez
4ebc7e3738 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:03:44 +01:00
antirez
001775f5fb Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-21 11:19:28 +01:00
antirez
c68189a19f Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-21 11:17:52 +01:00
antirez
ff8c81873a Set LRU parameters via REDIS_LRU_BITS define. 2014-03-21 11:17:36 +01:00
antirez
e3b71a1c02 Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-21 11:16:12 +01:00
zhanghailei
ca0720b694 refer to updateLRUClock's comment REDIS_LRU_CLOCK_MAX is 22 bits,but #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) only 21 bits 2014-03-11 10:12:00 +01:00
antirez
25e2791ec3 Initial implementation of BITPOS.
It appears to work but more stress testing, and both unit tests and
fuzzy testing, is needed in order to ensure the implementation is sane.
2014-02-27 15:54:31 +01:00
antirez
85492dcfef Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not updated the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:35 +01:00
antirez
fadbbdd3f4 AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:57:13 +01:00
antirez
ddcf160309 Move mstime_t define outside sentinel.c.
The define is now used in other parts of Redis 2.8 tree instead of long
long.

A nice side effect is that now 2.8 and unstable sentinel.c files are
identical as it should be.
2014-02-03 16:34:46 +01:00
antirez
3da5cbe5bb Scripting: use mstime() and mstime_t for lua_time_start.
server.lua_time_start is expressed in milliseconds. Use mstime_t instead
of long long, and populate it with mstime() instead of ustime()/1000.

Functionally identical but more natural.
2014-02-03 15:46:47 +01:00
antirez
917b851491 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 15:03:22 +01:00
Nenad Merdanovic
8dda9dbef0 Add support for listen(2) backlog definition
In high RPS environments, the default listen backlog is not sufficient, so
giving users the power to configure it is the right approach, especially
since it requires only minor modifications to the code.
2014-01-31 15:03:19 +01:00
antirez
4ea91aadfb Set REDIS_AOF_REWRITE_MIN_SIZE to 64mb.
64mb is the default value in redis.conf. For some reason instead the
hard-coded default was 1mb that is too small.
2014-01-14 11:28:18 +01:00
antirez
2a1a31ca9d Don't send REPLCONF ACK to old masters.
Masters not understanding REPLCONF ACK will reply with errors to our
requests causing a number of possible issues.

This commit detects a global replication offest set to -1 at the end of
the replication, and marks the client representing the master with the
REDIS_PRE_PSYNC flag.

Note that this flag was called REDIS_PRE_PSYNC_SLAVE but now it is just
REDIS_PRE_PSYNC as it is used for both slaves and masters starting with
this commit.

This commit fixes issue #1488.
2014-01-08 14:27:49 +01:00
Yubao Liu
d62bbfda16 CONFIG REWRITE: don't throw some options on config rewrite
Those options will be thrown without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 16:03:37 +01:00
Yossi Gottlieb
25ba2e9607 Fix wrong repldboff type which causes dropped replication in rare cases. 2013-12-11 11:37:58 +01:00
antirez
563d6b3f98 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:42:22 +01:00
antirez
b6610a569d dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optiona callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:18:36 +01:00
antirez
3d7263aa41 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:15 +01:00
antirez
50d140e90b SLAVEOF command refactored into a proper API.
We now have replicationSetMaster() and replicationUnsetMaster() that can
be called in other contexts (for instance Redis Cluster).
2013-11-28 16:19:16 +01:00
antirez
4bd1dd1c53 Sentinel: test for writable config file.
This commit introduces a funciton called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 15:24:22 +01:00
antirez
93d924ff1c Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync. 2013-11-21 15:22:11 +01:00
antirez
a52909c5f2 Sentinel: CONFIG REWRITE support for Sentinel config. 2013-11-21 15:22:07 +01:00
antirez
060d56e7eb SCAN code refactored to parse cursor first.
The previous implementation of SCAN parsed the cursor in the generic
function implementing SCAN, SSCAN, HSCAN and ZSCAN.

The actual higher-level command implementation only checked for empty
keys and return ASAP in that case. The result was that inverting the
arguments of, for instance, SSCAN for example and write:

    SSCAN 0 key

Instead of

    SSCAN key 0

Resulted into no error, since 0 is a non-existing key name very likely.
Just the iterator returned no elements at all.

In order to fix this issue the code was refactored to extract the
function to parse the cursor and return the error. Every higher level
command implementation now parses the cursor and later checks if the key
exist or not.
2013-11-05 17:25:29 +01:00
antirez
35816a2e4d ZSCAN implemented. 2013-10-29 16:21:18 +01:00
antirez
1cc7e646af HSCAN implemented. 2013-10-29 16:21:13 +01:00
antirez
24ecabc995 SSCAN implemented. 2013-10-29 16:21:01 +01:00
Pieter Noordhuis
0dd95e23ab Add SCAN command 2013-10-29 16:18:24 +01:00
antirez
fa48b1fa32 Add per-db average TTL information in INFO output.
Example:

db0:keys=221913,expires=221913,avg_ttl=655

The algorithm uses a running average with only two samples (current and
previous). Keys found to be expired are considered at TTL zero even if
the actual TTL can be negative.

The TTL is reported in milliseconds.
2013-08-06 15:36:43 +02:00
antirez
00c8cfef74 Some activeExpireCycle() refactoring. 2013-08-06 15:36:35 +02:00
antirez
500155b91b Draft #1 of a new expired keys collection algorithm.
The main idea here is that when we are no longer to expire keys at the
rate the are created, we can't block more in the normal expire cycle as
this would result in too big latency spikes.

For this reason the commit introduces a "fast" expire cycle that does
not run for more than 1 millisecond but is called in the beforeSleep()
hook of the event loop, so much more often, and with a frequency bound
to the frequency of executed commnads.

The fast expire cycle is only called when the standard expiration
algorithm runs out of time, that is, consumed more than
REDIS_EXPIRELOOKUPS_TIME_PERC of CPU in a given cycle without being able
to take the number of already expired keys that are yet not collected
to a number smaller than 25% of the number of keys.

You can test this commit with different loads, but a simple way is to
use the following:

Extreme load with pipelining:

redis-benchmark -r 100000000 -n 100000000  \
        -P 32 set ele:rand:000000000000 foo ex 2

Remove the -P32 in order to avoid the pipelining for a more real-world
load.

In another terminal tab you can monitor the Redis behavior with:

redis-cli -i 0.1 -r -1 info keyspace

and

redis-cli --latency-history

Note: this commit will make Redis printing a lot of debug messages, it
is not a good idea to use it in production.
2013-08-06 15:36:23 +02:00
yoav
9f6f436a51 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:59 +02:00
antirez
18fabeb264 SORT ALPHA: use collation instead of binary comparison.
Note that we only do it when STORE is not used, otherwise we want an
absolutely locale independent and binary safe sorting in order to ensure
AOF / replication consistency.

This is probably an unexpected behavior violating the least surprise
rule, but there is currently no other simple / good alternative.
2013-07-12 13:39:44 +02:00
antirez
d8fcbb6645 Fixed compareStringObject() and introduced collateStringObject().
compareStringObject was not always giving the same result when comparing
two exact strings, but encoded as integers or as sds strings, since it
switched to strcmp() when at least one of the strings were not sds
encoded.

For instance the two strings "123" and "123\x00456", where the first
string was integer encoded, would result into the old implementation of
compareStringObject() to return 0 as if the strings were equal, while
instead the second string is "greater" than the first in a binary
comparison.

The same compasion, but with "123" encoded as sds string, would instead
return a value < 0, as it is correct. It is not impossible that the
above caused some obscure bug, since the comparison was not always
deterministic, and compareStringObject() is used in the implementation
of skiplists, hash tables, and so forth.

At the same time, collateStringObject() was introduced by this commit, so
that can be used by SORT command to return sorted strings usign
collation instead of binary comparison. See next commit.
2013-07-12 13:39:40 +02:00
antirez
3472d045d8 getClientPeerId() refactored into two functions. 2013-07-11 17:09:14 +02:00
antirez
da18366609 getClientPeerId() now reports errors.
We now also use it in CLIENT KILL implementation.
2013-07-11 17:09:10 +02:00
antirez
4fa68b2815 getClientPeerID introduced.
The function returns an unique identifier for the client, as ip:port for
IPv4 and IPv6 clients, or as path:0 for Unix socket clients.

See the top comment in the function for more info.
2013-07-11 17:09:04 +02:00
Geoff Garside
68d72aa5b1 Add macro to define clusterNode.ip buffer size.
Add REDIS_CLUSTER_IPLEN macro to define the size of the clusterNode ip
character array. Additionally use this macro in inet_ntop(3) calls where
the size of the array was being defined manually.

The REDIS_CLUSTER_IPLEN is defined as INET_ADDRSTRLEN which defines the
correct size of a buffer to store an IPv4 address in. The
INET_ADDRSTRLEN macro itself is defined in the <netinet/in.h> header
file and should be portable across the majority of systems.
2013-07-11 17:04:32 +02:00
antirez
fc022ca300 Binding multiple IPs done properly with multiple sockets. 2013-07-08 10:30:54 +02:00
antirez
6dabd34ad0 Ability to bind multiple addresses. 2013-07-08 10:27:21 +02:00
antirez
8090c37f71 CONFIG SET maxclients. 2013-07-01 10:44:08 +02:00
antirez
bb0d0fd479 function renamed: popcount_binary -> redisPopcount. 2013-06-26 15:24:58 +02:00
antirez
cdf79c063f Don't disconnect pre PSYNC replication clients for timeout.
Clients using SYNC to replicate are older implementations, such as
redis-cli --slave, and are not designed to acknowledge the master with
REPLCONF ACK commands, so we don't have any feedback and should not
disconnect them on timeout.
2013-06-26 15:24:30 +02:00
antirez
545fe0c318 Use the RSC to replicate EVALSHA unmodified.
This commit uses the Replication Script Cache in order to avoid
translating EVALSHA into EVAL whenever possible for both the AOF and
slaves.
2013-06-26 15:23:29 +02:00
antirez
9d894b1b8c Replication of scripts as EVALSHA: sha1 caching implemented.
This code is only responsible to take an LRU-evicted fixed length cache
of SHA1 that we are sure all the slaves received.

In this commit only the implementation is provided, but the Redis core
does not use it to actually send EVALSHA to slaves when possible.
2013-06-26 15:23:15 +02:00
antirez
8328d993e1 New API to force propagation.
The old REDIS_CMD_FORCE_REPLICATION flag was removed from the
implementation of Redis, now there is a new API to force specific
executions of a command to be propagated to AOF / Replication link:

    void forceCommandPropagation(int flags);

The new API is also compatible with Lua scripting, so a script that will
execute commands that are forced to be propagated, will also be
propagated itself accordingly even if no change to data is operated.

As a side effect, this new design fixes the issue with scripts not able
to propagate PUBLISH to slaves (issue #873).
2013-06-26 15:21:55 +02:00
antirez
a8f1474dfd PUBSUB command implemented.
Currently it implements three subcommands:

PUBSUB CHANNELS [<pattern>]    List channels with non-zero subscribers.
PUBSUB NUMSUB [channel_1 ...]  List number of subscribers for channels.
PUBSUB NUMPAT                  Return number of subscribed patterns.
2013-06-26 15:21:08 +02:00