mirror of
https://github.com/fluencelabs/redis
synced 2025-06-16 10:41:22 +00:00
PSYNC2: different improvements to Redis replication.
The gist of the changes is that partial resynchronizations between slaves and masters (without the need for a full resync with RDB transfer and so forth) now work in a number of cases where they were impossible in the past. For instance:

1. When a slave is promoted to master, the slaves of the old master can partially resynchronize with the new master.

2. Chained slaves (slaves of slaves) can be moved to replicate from other slaves or from the master itself, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to partially resynchronize with the new master when it joins replication again.

To obtain this, the following main changes were made:

* Slaves also keep a replication backlog, not just masters.

* Same stream replication for all the slaves and sub-slaves. The replication stream is identical from the top-level master to its slaves, and is also the same from the slaves to their sub-slaves and so forth. This means that if a slave is later promoted to master, it has the same replication backlog and can partially resynchronize with its slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of a Redis node. There is instead a `replication ID`, which changes every time the instance has a new history that is no longer coherent with the past one. So, for example, slaves publish the same replication history as their master. However, when they are turned into masters they publish a new replication ID but still remember the old ID, so that they are able to partially resynchronize with slaves of the old master (up to a given offset).

* The replication protocol was slightly modified so that a new extended +CONTINUE reply from the master is able to inform the slave of a replication ID change.

* REPLCONF CAPA is used to notify masters that a slave is able to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to select a given DB after loading in the slave, so that the slave can continue receiving the replication stream from the point where it was disconnected, without requiring the master to insert "SELECT" statements. This is useful to guarantee the "same stream" property, because the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form when the top-level master is disconnected, so that they do not interfere with the replication stream. We just use out-of-band "\n" bytes, as in other parts of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However, the implementation is not identical to the description, because several changes were needed during the implementation work in order to make things work well.
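The replication ID rule described above (a promoted slave publishes a new ID but still honors the old one up to a given offset) can be sketched as a small decision function. This is only an illustration under stated assumptions: the struct, its field names, and `can_partial_resync()` are hypothetical and are not taken from this commit; the sketch also assumes the backlog is described by a start offset and a length.

```c
#include <string.h>

#define REPLID_LEN 40

/* Hypothetical view of the replication state of one instance. */
struct repl_state {
    char replid[REPLID_LEN+1];      /* Current replication ID. */
    char prev_replid[REPLID_LEN+1]; /* ID inherited from the old master. */
    long long prev_valid_offset;    /* Old ID is accepted up to this offset. */
    long long backlog_start;        /* Offset of the first byte in the backlog. */
    long long backlog_len;          /* Number of bytes held in the backlog. */
};

/* Return 1 if a slave asking for (id, offset) can be served with a
 * partial resync, 0 if a full resync is needed. */
int can_partial_resync(const struct repl_state *st,
                       const char *id, long long offset)
{
    /* The slave shares our current history. */
    int same_history = strcmp(id, st->replid) == 0;
    /* The slave followed the old master: valid only up to the offset
     * at which our own history diverged from it. */
    int old_history = strcmp(id, st->prev_replid) == 0 &&
                      offset <= st->prev_valid_offset;

    if (!same_history && !old_history) return 0;

    /* The requested offset must still be covered by the backlog. */
    if (offset < st->backlog_start ||
        offset > st->backlog_start + st->backlog_len) return 0;
    return 1;
}
```

With a backlog covering offsets 100 to 250 and an old ID valid up to offset 200, a request for the old ID at offset 180 is accepted, while the same ID at offset 220 falls back to a full resync.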
@@ -352,6 +352,14 @@ void addReplySds(client *c, sds s) {
     }
 }
 
+/* This low level function just adds whatever protocol you send it to the
+ * client buffer, trying the static buffer initially, and using the string
+ * of objects if not possible.
+ *
+ * It is efficient because does not create an SDS object nor an Redis object
+ * if not needed. The object will only be created by calling
+ * _addReplyStringToList() if we fail to extend the existing tail object
+ * in the list of objects. */
 void addReplyString(client *c, const char *s, size_t len) {
     if (prepareClientToWrite(c) != C_OK) return;
     if (_addReplyToBuffer(c,s,len) != C_OK)
@@ -1022,7 +1030,7 @@ int processInlineBuffer(client *c) {
     char *newline;
     int argc, j;
     sds *argv, aux;
-    size_t querylen;
+    size_t querylen, protolen;
 
     /* Search for end of line */
     newline = strchr(c->querybuf,'\n');
@@ -1035,6 +1043,7 @@ int processInlineBuffer(client *c) {
         }
         return C_ERR;
     }
+    protolen = (newline - c->querybuf)+1; /* Total protocol bytes of command. */
 
     /* Handle the \r\n case. */
     if (newline && newline != c->querybuf && *(newline-1) == '\r')
@@ -1057,6 +1066,15 @@ int processInlineBuffer(client *c) {
     if (querylen == 0 && c->flags & CLIENT_SLAVE)
         c->repl_ack_time = server.unixtime;
 
+    /* Newline from masters can be used to prevent timeouts, but should
+     * not affect the replication offset since they are always sent
+     * "out of band" directly writing to the socket and without passing
+     * from the output buffers. */
+    if (querylen == 0 && c->flags & CLIENT_MASTER) {
+        c->reploff -= protolen;
+        while (protolen--) chopReplicationBacklog();
+    }
+
     /* Leave data after the first line of the query in the buffer */
     sdsrange(c->querybuf,querylen+2,-1);
 
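The hunk above undoes the effect of an out-of-band newline by lowering the replication offset and removing the same number of bytes from the tail of the backlog. The body of `chopReplicationBacklog()` is not part of this diff, so the following is only a minimal sketch of the idea on a toy circular buffer; `struct backlog`, `backlog_feed()`, `backlog_chop()`, and `discard_out_of_band()` are hypothetical names.

```c
#include <stddef.h>

#define BACKLOG_SIZE 16

/* Toy circular backlog (hypothetical layout, not the Redis one). */
struct backlog {
    char buf[BACKLOG_SIZE];
    size_t idx;       /* Position where the next byte is written. */
    size_t histlen;   /* Number of valid bytes currently stored. */
    long long offset; /* Replication offset after the last byte. */
};

/* Append len bytes, overwriting the oldest data when full. */
void backlog_feed(struct backlog *b, const char *p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        b->buf[b->idx] = p[i];
        b->idx = (b->idx + 1) % BACKLOG_SIZE;
        if (b->histlen < BACKLOG_SIZE) b->histlen++;
    }
    b->offset += (long long)len;
}

/* Drop the most recently written byte, if any. */
void backlog_chop(struct backlog *b) {
    if (b->histlen == 0) return;
    b->idx = (b->idx + BACKLOG_SIZE - 1) % BACKLOG_SIZE;
    b->histlen--;
}

/* Forget protolen bytes that turned out to be out of band: the offset
 * goes back and the bytes leave the backlog, mirroring the hunk above. */
void discard_out_of_band(struct backlog *b, size_t protolen) {
    b->offset -= (long long)protolen;
    while (protolen--) backlog_chop(b);
}
```

Feeding `"PING\r\n"` followed by a lone `"\n"` and then discarding one byte leaves the backlog and the offset exactly as they were after the `PING`, which is what keeps the stream identical on every node.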
@@ -1321,7 +1339,11 @@ void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
 
     sdsIncrLen(c->querybuf,nread);
     c->lastinteraction = server.unixtime;
-    if (c->flags & CLIENT_MASTER) c->reploff += nread;
+    if (c->flags & CLIENT_MASTER) {
+        c->reploff += nread;
+        replicationFeedSlavesFromMasterStream(server.slaves,
+                c->querybuf+qblen,nread);
+    }
     server.stat_net_input_bytes += nread;
     if (sdslen(c->querybuf) > server.client_max_querybuf_len) {
         sds ci = catClientInfoString(sdsempty(),c), bytes = sdsempty();
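In the last hunk, only the bytes just read from the master (`c->querybuf+qblen`, `nread` bytes) are forwarded to the sub-slaves, unmodified. The body of `replicationFeedSlavesFromMasterStream()` is not shown in this diff, so the sketch below only illustrates the "same stream" idea with fixed-size output buffers; `struct sub_slave` and `feed_sub_slaves()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define OUT_CAP 64

/* Hypothetical sub-slave with a fixed-size output buffer. */
struct sub_slave {
    char out[OUT_CAP];
    size_t outlen;
};

/* Copy the same raw bytes to every sub-slave, unchanged, so that all
 * of them see the stream exactly as the master sent it.
 * Returns the number of sub-slaves that accepted the data. */
int feed_sub_slaves(struct sub_slave *slaves, size_t count,
                    const char *buf, size_t len)
{
    int fed = 0;
    for (size_t i = 0; i < count; i++) {
        /* Sketch only: skip a sub-slave whose buffer would overflow. */
        if (slaves[i].outlen + len > OUT_CAP) continue;
        memcpy(slaves[i].out + slaves[i].outlen, buf, len);
        slaves[i].outlen += len;
        fed++;
    }
    return fed;
}
```

A caller would remember the query buffer length before the read (`qblen` in the diff) and pass only the new tail, so bytes that were already forwarded are never sent twice.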