2244 Commits

Author SHA1 Message Date
antirez
a10bfded15 PFCOUNT: always unshare/decode the object.
This will be a no-op most of the time since the object will already be
unshared / decoded; however, it is more technically correct to start this
way since the object may be decoded even in the read-only code path.
2014-04-16 15:26:09 +02:00
antirez
ee764b0f8d Changed HyperLogLog hash seed to a non-zero value.
Using a seed of zero has the side effect of having the empty string
hashing to what is a very special case in the context of HyperLogLog: a
very long run of zeroes.

This did not influence the correctness of the result with 16k registers
because of the harmonic mean, but it is still inconvenient that such an
obvious value maps to such a special hash.

The seed 0xadc83b19 is used instead, which is the first 32 bits of the
SHA1 of an empty line (a single newline character).

Reference: issue #1657.
2014-04-16 15:26:09 +02:00
antirez
a97675923d Return "WRONGTYPE" error on PF* type mismatch. 2014-04-16 15:26:09 +02:00
antirez
faa7f259cc Fix PFADD infinite loop.
We need to guarantee that the last bit is 1, otherwise an element may
hash to just zeroes with probability 1/(2^64) and trigger an infinite
loop.

See issue #1657.
2014-04-16 15:26:09 +02:00
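The guard described in faa7f259cc can be sketched as follows. This is a minimal illustration, not the Redis source: the function name and the parameter `p` (the number of low hash bits assumed to select the register, with 0 < p < 64) are hypothetical. A sentinel bit is forced to 1 just above the usable bits, so the scan always terminates even if every usable bit of the hash is zero.

```c
#include <stdint.h>

/* Hypothetical helper: length of the run of zero bits, counting the
 * terminating 1, in a 64-bit hash after dropping the low `p` bits.
 * Assumes 0 < p < 64. */
int zero_run_length(uint64_t hash, int p) {
    uint64_t bits = hash >> p;            /* drop the register-index bits */
    bits |= (uint64_t)1 << (64 - p);      /* sentinel: a 1 is always found */
    int count = 1;                        /* the run length includes the "1" */
    while ((bits & 1) == 0) {
        count++;
        bits >>= 1;
    }
    return count;
}
```

Without the sentinel line, an all-zero input would never satisfy the loop's exit condition, which is the infinite loop the commit describes.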
antirez
bdd8701c60 Remove HyperLogLog type checking duplicated code. 2014-04-16 15:26:09 +02:00
antirez
b3baa51403 PFGETREG added for testing purposes.
The new command makes it possible to dump the registers stored
in a HyperLogLog data structure for testing / debugging purposes.
2014-04-16 15:26:09 +02:00
antirez
348d5ea246 PFCOUNT: unshare the object when cached cardinality is modified. 2014-04-16 15:26:09 +02:00
antirez
ba0dc0be2c PFSELFTEST improved to test the approximation error. 2014-04-16 15:26:09 +02:00
antirez
ff551fad24 HyperLogLog: added magic / version.
This will allow future changes like compressed representations.
Currently the magic is not checked for performance reasons but this may
change in the future, for example if we add new types encoded in strings
that may have the same size as HyperLogLogs.
2014-04-16 15:26:09 +02:00
Raymond Myers
e08100a9ba Fixed pfadd/pfcount commands emitting hll* events instead of pf* events 2014-04-16 15:26:09 +02:00
Raymond Myers
23acae6e1c Change HLL* to PF* in error messages 2014-04-16 15:26:09 +02:00
antirez
e3773b0a0e Include redis.h before other stuff in hyperloglog.c.
Otherwise fmacros.h is included later and this may break compilation on
different systems.
2014-04-16 15:26:09 +02:00
antirez
badf23f57b HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-04-16 15:26:03 +02:00
antirez
6e71459c73 Makefile.dep updated with hyperloglog.o deps. 2014-04-16 15:25:28 +02:00
antirez
2ee2bec2d6 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-04-16 15:24:15 +02:00
antirez
1e90c0066c HLLMERGE fixed by adding a... missing loop! 2014-04-16 15:23:21 +02:00
antirez
f2e707b679 HyperLogLog apply bias correction using a polynomial.
Better results can be achieved by compensating for the bias of the raw
approximation just after 2.5m (when LINEARCOUNTING is no longer used) by
using a polynomial that approximates the bias at a given cardinality.

The curve used was found using this web page:

    http://www.xuru.org/rt/PR.asp

That page performs polynomial regression on a given set of values.
2014-04-16 15:23:21 +02:00
antirez
be7fe2b92b HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-04-16 15:23:21 +02:00
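The merge rule in be7fe2b92b (take the max of every M[i] register) can be sketched like this. The sketch assumes one register per byte for readability, whereas the commits in this log pack registers into a bitfield; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: merge `src` into `dst` by keeping, for each of the
 * `m` registers, the larger of the two values. */
void hll_merge_max(uint8_t *dst, const uint8_t *src, size_t m) {
    for (size_t i = 0; i < m; i++) {
        if (src[i] > dst[i]) dst[i] = src[i];
    }
}
```

Merging N structures is then a matter of calling this once per source, which also shows why a missing loop (see 1e90c0066c) would leave most registers unmerged.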
antirez
39dd1f648d HLLCOUNT is technically a write command
When we update the cached value, we need to propagate the command and
signal the key as modified for WATCH.
2014-04-16 15:23:21 +02:00
antirez
f07b1514bb HLLADD: propagate write when only variable name is given.
The following form is given:

    HLLADD myhll

No element is provided in the above case so if 'myhll' var does not
exist the result is to just create an empty HLL structure, and no update
will be performed on the registers.

In this case, the DB should still be set dirty and the command
propagated.
2014-04-16 15:23:21 +02:00
antirez
aad5959ddc HyperLogLog: use LINEARCOUNTING up to 3m.
The HyperLogLog original paper suggests using LINEARCOUNTING for
cardinalities < 2.5m, however for P=14 the median / max error
curves show that a value of '3' is the best pick for m = 16384.
2014-04-16 15:23:21 +02:00
antirez
c3df5965f3 HyperLogLog approximated cardinality caching.
The more elements we add to a HyperLogLog counter, the smaller the
probability that we actually update some register.

From this observation it is easy to see how it is possible to use
caching of a previously computed cardinality and reuse it to serve
HLLCOUNT queries as long as no register was updated in the data
structure.

This commit does exactly this, using just 8 additional bytes in the
data structure to store the cached cardinality as a 64 bit unsigned
integer. When the most significant bit of the 64 bit integer is set,
it means that the value computed is no longer usable since at least a
single register was modified and we need to recompute it at the next
call of HLLCOUNT.

The value is always stored in little endian format regardless of the
actual CPU endianness.
2014-04-16 15:23:21 +02:00
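The caching scheme from c3df5965f3 can be sketched as below, under the assumption that the 8 cache bytes are handled as a plain byte array. All function names are hypothetical. Because the value is stored little endian, the most significant bit of the 64 bit integer lives in the last byte.

```c
#include <stdint.h>

/* Store v in little endian order regardless of host byte order. */
void cache_store(uint8_t card[8], uint64_t v) {
    for (int i = 0; i < 8; i++) card[i] = (uint8_t)(v >> (8 * i));
}

/* Read the 64 bit value back, again independent of host byte order. */
uint64_t cache_load(const uint8_t card[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)card[i] << (8 * i);
    return v;
}

/* Mark the cache stale by setting the most significant bit. */
void cache_invalidate(uint8_t card[8]) { card[7] |= 0x80; }

/* The cached value is usable only while the most significant bit is clear. */
int cache_is_valid(const uint8_t card[8]) { return (card[7] & 0x80) == 0; }
```

A register update would call `cache_invalidate()`, and a count query would recompute and `cache_store()` only when `cache_is_valid()` returns 0.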
antirez
9e178afae9 String value unsharing refactored into proper function.
All the Redis functions that need to modify the string value of a key in
a destructive way (APPEND, SETBIT, SETRANGE, ...) require the object to
be unshared (if refcount > 1) and encoded in raw format (if the encoding
is not already REDIS_ENCODING_RAW).

This was cut & pasted many times in multiple places of the code. This
commit puts the small logic needed into a function called
dbUnshareStringValue().
2014-04-16 15:22:56 +02:00
antirez
1c795db9d0 Use endian neutral hash function for HyperLogLog.
We need to be sure that you can save a dataset in a Redis instance,
reload it in a different architecture, and continue to count in the same
HyperLogLog structure.

So 32 and 64 bit, little or big endian, must all guarantee to output the
same hash for the same element.
2014-04-16 15:19:40 +02:00
antirez
7f30998432 HyperLogLog internal representation modified.
The new representation is more obvious, starting from the LSB of the
first byte and using bits going to MSB, and passing to next byte as
needed.

There was also a subtle error: the first two bits were unused, so
everything was shifted right by two bits. It still worked only because
the code requires one extra byte at the end.

During the rewrite the code was made safer by avoiding undefined
behavior due to shifting a uint8_t by more than 8 bits.
2014-04-16 15:19:40 +02:00
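The LSB-first layout described in 7f30998432 can be sketched as follows. This is an illustration, not the HLL_GET_REGISTER / HLL_SET_REGISTER macros themselves: the 6 bit register width and all names are assumptions. Bytes are promoted to `unsigned` before shifting, and the second byte is touched only when a register actually straddles a byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_BITS 6                        /* assumed register width */
#define REG_MAX  ((1u << REG_BITS) - 1)

/* Read register i: its bits start at bit (i * REG_BITS), counting from
 * the LSB of byte 0 and continuing into the next byte as needed. */
unsigned reg_get(const uint8_t *regs, size_t i) {
    size_t byte = i * REG_BITS / 8;
    unsigned shift = (unsigned)(i * REG_BITS % 8);
    unsigned v = regs[byte];              /* promote before shifting */
    if (shift + REG_BITS > 8) v |= (unsigned)regs[byte + 1] << 8;
    return (v >> shift) & REG_MAX;
}

/* Write register i without disturbing its neighbours. */
void reg_set(uint8_t *regs, size_t i, unsigned val) {
    size_t byte = i * REG_BITS / 8;
    unsigned shift = (unsigned)(i * REG_BITS % 8);
    unsigned mask = REG_MAX << shift;     /* at most 12 bits wide */
    unsigned v = (val & REG_MAX) << shift;
    regs[byte] = (uint8_t)((regs[byte] & ~mask) | v);
    if (shift + REG_BITS > 8) {
        regs[byte + 1] = (uint8_t)((regs[byte + 1] & ~(mask >> 8)) | (v >> 8));
    }
}
```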
antirez
430d0ade75 Remove a few useless operations from hllCount() fast path. 2014-04-16 15:19:40 +02:00
antirez
83650a72a1 HLLCOUNT 3x faster taking fast path for default params. 2014-04-16 15:19:40 +02:00
antirez
3afd25ecc7 Use processor base types in HLL_(GET|SET)_REGISTER.
This speeds up the macros by a noticeable factor.
2014-04-16 15:19:40 +02:00
antirez
0ffd5e4a30 HyperLogLog: use precomputed table for 2^(-M[i]). 2014-04-16 15:19:40 +02:00
antirez
42592a7faa HyperLogLog algorithm fixed in two ways.
There was an error in the computation of 2^register, and the sequence of
zeroes computed after the hashing did not include the "1".
2014-04-16 15:19:40 +02:00
antirez
3035075b2c HLLCOUNT implemented. 2014-04-16 15:19:40 +02:00
antirez
959d0f012a HLLADD implemented. 2014-04-16 15:19:33 +02:00
antirez
ac29bb2ae4 hllAdd() low level HyperLogLog "add" implemented. 2014-04-16 15:19:05 +02:00
antirez
2a91f548c4 HyperLogLog: redefine constants using "P". 2014-04-16 15:19:05 +02:00
antirez
3f159bbd44 HLL_SET_REGISTER fixed.
There was an error in the first version of the macro.
Now the HLLSELFTEST test reports success.
2014-04-16 15:19:05 +02:00
antirez
a18e880bf1 Use REDIS_HLL_REGISTER_MAX when possible. 2014-04-16 15:19:05 +02:00
antirez
00409d8742 HLL_(SET|GET)_REGISTER types fixed. 2014-04-16 15:19:05 +02:00
antirez
de3f821af6 HLLSELFTEST command implemented.
Testing the set/get macros of the bitfield array of counters from the
Redis Tcl suite is hard, so a specialized command that is able to test
the internals was developed.
2014-04-16 15:18:57 +02:00
antirez
02d88fb201 HyperLogLog: initial sketch of registers access. 2014-04-16 15:17:26 +02:00
antirez
d2e59c2715 Redis 2.8.8. 2014-03-25 11:30:42 +01:00
antirez
3580bb485a adjustOpenFilesLimit() refactoring.
In this commit:
* Decrement steps are semantically differentiated from the reserved FDs.
  Previously both values were 32 but the meaning was different.
* Make it clear that we save setrlimit errno.
* Don't explicitly handle wrapping of 'f', but prevent it from
  happening.
* Add comments to make the function flow more readable.

This integrates PR #1630
2014-03-25 09:07:21 +01:00
Matt Stancliff
c3510af1c0 Fix potentially incorrect errno usage
errno may be reset by the previous call to redisLog, so capture
the original value for proper error reporting.
2014-03-25 09:07:21 +01:00
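The pattern behind c3510af1c0 can be sketched as below. `noisy_log()` is a hypothetical stand-in for a logging call that may reset errno as a side effect; it is not redisLog(). The point is only that errno is copied into a local before anything else runs.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for a logging call that clobbers errno. */
void noisy_log(const char *msg) {
    fputs(msg, stderr);
    errno = 0;
}

/* Capture errno first, log second, then report the captured value. */
int report_failure(void) {
    int saved = errno;
    noisy_log("operation failed\n");
    return saved;
}
```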
Matt Stancliff
771f8ad0e7 Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-25 09:07:21 +01:00
Matt Stancliff
3ce742d1d5 Fix infinite loop on startup if ulimit too low
Fun fact: rlim_t is an unsigned long long on all platforms.

Continually subtracting from a rlim_t makes it get smaller
and smaller until it wraps, then you're up to 2^64-1.

This was causing an infinite loop on Redis startup if
your ulimit was extremely (almost comically) low.

The case of (f > oldlimit) would never be met in a case like:

    f = 150
    while (f > 20) f -= 128

Since f is unsigned, it can't go negative and would
take on values of:

    Iteration 1: 150 - 128 => 22
    Iteration 2:  22 - 128 => 18446744073709551510
    Iterations 3-∞: ...

To catch the wraparound, we use the previous value of f
stored in limit.rlimit_cur.  If we subtract from f and
get a larger number than the value it had previously,
we print an error and exit since we don't have enough
file descriptors to help the user at this point.

Thanks to @bs3g for the inspiration to fix this problem.
Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
parts of Redis simultaneously, so I didn't get a chance to use them.
2014-03-25 09:07:21 +01:00
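The wraparound in 3ce742d1d5 can be avoided with a check before each subtraction. The sketch below uses `uint64_t` in place of rlim_t and a hypothetical function name; it follows the "prevent the wrap" variant rather than comparing against the previous value as the commit message describes.

```c
#include <stdint.h>

/* Lower `f` in steps of `step` until it is no greater than `limit`.
 * Returns 0 and stores the result in *out, or -1 if the next
 * subtraction would wrap around to a huge unsigned value. */
int step_down(uint64_t f, uint64_t limit, uint64_t step, uint64_t *out) {
    while (f > limit) {
        if (f < step) return -1;   /* f - step would wrap */
        f -= step;
    }
    *out = f;
    return 0;
}
```

With the values from the commit message (f = 150, limit 20, step 128), the first iteration yields 22 and the second is rejected instead of wrapping.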
Matt Stancliff
b59585396b Improve error handling around setting ulimits
The log messages about open file limits have always
been slightly opaque and confusing.  Here's an attempt to
fix their wording, detail, and meaning.  Users will have a
better understanding of how to fix very common problems
with these reworded messages.

Also, we handle a new error case when maxclients becomes less
than one, essentially rendering the server unusable.  We
now exit on startup instead of leaving the user with a server
unable to handle any connections.

This fixes antirez#356 as well.
2014-03-25 09:07:21 +01:00
Matt Stancliff
6826af1b50 Replace magic 32 with REDIS_EVENTLOOP_FDSET_INCR
32 was the additional number of file descriptors Redis
would reserve when managing a too-low ulimit.  The
number 32 was in too many places statically, so now
we use a macro instead that looks more appropriate.

When Redis sets up the server event loop, it uses:
    server.maxclients+REDIS_EVENTLOOP_FDSET_INCR

So, when reserving file descriptors, it makes sense to
reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
of only 32.  Currently, REDIS_EVENTLOOP_FDSET_INCR is
set to 128 in redis.h.

Also, I replaced the static 128 in the while f < old loop
with REDIS_EVENTLOOP_FDSET_INCR as well, which results
in no change since it was already 128.

Impact: Users now need at least maxclients+128 as
their open file limit instead of maxclients+32 to obtain
actual "maxclients" number of clients.  Redis will carve
the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
needs out of the "maxclients" range instead of failing
to start (unless the local ulimit -n is too low to accommodate
the request).
2014-03-25 09:07:21 +01:00
Matt Stancliff
611372fa97 Fix maxclients error handling
Everywhere in the Redis code base, maxclients is treated
as an int with (int)maxclients or `maxclients = atoi(source)`,
so let's make maxclients an int.

This fixes a bug where someone could specify a negative maxclients
on startup and it would work (as well as set maxclients very high)
because:

    unsigned int maxclients;
    char *update = "-300";
    maxclients = atoi(update);
    if (maxclients < 1) goto fail;

But, (maxclients < 1) can only catch the case when maxclients
is exactly 0.  maxclients happily sets itself to -300, which isn't
-300, but rather 4294966996, which isn't < 1, so... everything
"worked."

maxclients config parsing checks for the case of < 1, but maxclients
CONFIG SET parsing was checking for case of < 0 (allowing
maxclients to be set to 0).  CONFIG SET parsing is now updated to
match config parsing of < 1.

It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.

These changes were inspired by antirez#356, but this doesn't
fix that issue.
2014-03-25 09:07:21 +01:00
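The signed versus unsigned behaviour described in 611372fa97 can be reproduced with a small sketch. Both function names are hypothetical; each returns 1 when the `< 1` check lets the value through.

```c
#include <stdlib.h>

/* Unsigned storage: a negative input converts to a huge positive value,
 * so the check only rejects an input of exactly 0. */
int accepts_unsigned(const char *s) {
    unsigned int maxclients = (unsigned int)atoi(s);
    return !(maxclients < 1);
}

/* Signed storage: negative inputs stay negative and are rejected. */
int accepts_signed(const char *s) {
    int maxclients = atoi(s);
    return !(maxclients < 1);
}
```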
Matt Stancliff
3bd3240660 Sentinel: remove variable causing warning
GCC-4.9 warned about this, but clang didn't.

This commit fixes warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;
2014-03-25 08:06:04 +01:00
antirez
79349affd5 Fixed undefined variable value with certain code paths.
In sentinelFlushConfig() fd could be undefined when the following if
statement was true:

        if (rewrite_status == -1) goto werr;

This could cause random file descriptors to get closed.
2014-03-25 08:03:11 +01:00
Matt Stancliff
80dec5e4df Sentinel: Notify user when config can't be saved 2014-03-25 08:03:07 +01:00