226 lines
8.5 KiB
Rust
Raw Normal View History

// Copyright 2019 Parity Technologies (UK) Ltd.
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
// OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS IN THE SOFTWARE.
#![cfg(test)]
Improve XOR metric. (#1108) There are two issues with the current definition and use of Kademlia's XOR metric: 1. The distance is currently equated with the bucket index, i.e. `distance(a,b) - 1` is the index of the bucket into which either peer is put by the other. The result is a metric that is not unidirectional, as defined in the Kademlia paper and as implemented in e.g. libp2p-go and libp2p-js, which is to interpret the result of the XOR as an integer in its entirety. 2. The current `KBucketsPeerId` trait and its instances allow computing distances between types with differing bit lengths as well as between types that hash all inputs again (i.e. `KadHash`) and "plain" `PeerId`s or `Multihash`es. This can result in computed distances that are either incorrect as per the requirement of the libp2p specs that all distances are to be computed from the XOR of the SHA256 of the input keys, or even fall outside of the image of the metric used for the `KBucketsTable`. In the latter case, such distances are not currently used as a bucket index - they can only occur in the context of comparing distances for the purpose of sorting peers - but that still seems undesirable. These issues are addressed here as follows: * Unidirectionality of the XOR metric is restored by keeping the "full" integer representation of the bitwise XOR. The result is an XOR metric as defined in the paper. This also opens the door to avoiding the "full table scan" when searching for the keys closest to a given key - the ideal order in which to visit the buckets can be computed with the help of the distance bit string. * As a simplification and to make it easy to "do the right thing", the XOR metric is only defined on an opaque `kbucket::Key` type, partially derived from the current `KadHash`. `KadHash` and `KBucketsPeerId` are removed.
2019-05-17 17:27:57 +02:00
use crate::{Kademlia, KademliaOut, kbucket::{self, Distance}};
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
use futures::{future, prelude::*};
use libp2p_core::{
PeerId,
Swarm,
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
Transport,
identity,
transport::{MemoryTransport, boxed::Boxed},
nodes::Substream,
multiaddr::{Protocol, multiaddr},
muxing::StreamMuxerBox,
upgrade,
};
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
use libp2p_secio::SecioConfig;
use libp2p_yamux as yamux;
use rand::random;
use std::{io, u64};
use tokio::runtime::Runtime;
type TestSwarm = Swarm<
Boxed<(PeerId, StreamMuxerBox), io::Error>,
Kademlia<Substream<StreamMuxerBox>>
>;
/// Builds swarms, each listening on a port. Does *not* connect the nodes together.
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
fn build_nodes(num: usize) -> (u64, Vec<TestSwarm>) {
let port_base = 1 + random::<u64>() % (u64::MAX - num as u64);
let mut result: Vec<Swarm<_, _>> = Vec::with_capacity(num);
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
for _ in 0 .. num {
// TODO: make creating the transport more elegant ; literaly half of the code of the test
// is about creating the transport
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let local_key = identity::Keypair::generate_ed25519();
let local_public_key = local_key.public();
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let transport = MemoryTransport::default()
.with_upgrade(SecioConfig::new(local_key))
.and_then(move |out, endpoint| {
let peer_id = out.remote_key.into_peer_id();
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let yamux = yamux::Config::default();
upgrade::apply(out.stream, yamux, endpoint)
.map(|muxer| (peer_id, StreamMuxerBox::new(muxer)))
})
.map_err(|e| panic!("Failed to create transport: {:?}", e))
.boxed();
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let kad = Kademlia::new(local_public_key.clone().into_peer_id());
result.push(Swarm::new(transport, kad, local_public_key.into_peer_id()));
}
let mut i = 0;
for s in result.iter_mut() {
Swarm::listen_on(s, Protocol::Memory(port_base + i).into()).unwrap();
i += 1
}
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
(port_base, result)
}
#[test]
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
fn query_iter() {
Improve XOR metric. (#1108) There are two issues with the current definition and use of Kademlia's XOR metric: 1. The distance is currently equated with the bucket index, i.e. `distance(a,b) - 1` is the index of the bucket into which either peer is put by the other. The result is a metric that is not unidirectional, as defined in the Kademlia paper and as implemented in e.g. libp2p-go and libp2p-js, which is to interpret the result of the XOR as an integer in its entirety. 2. The current `KBucketsPeerId` trait and its instances allow computing distances between types with differing bit lengths as well as between types that hash all inputs again (i.e. `KadHash`) and "plain" `PeerId`s or `Multihash`es. This can result in computed distances that are either incorrect as per the requirement of the libp2p specs that all distances are to be computed from the XOR of the SHA256 of the input keys, or even fall outside of the image of the metric used for the `KBucketsTable`. In the latter case, such distances are not currently used as a bucket index - they can only occur in the context of comparing distances for the purpose of sorting peers - but that still seems undesirable. These issues are addressed here as follows: * Unidirectionality of the XOR metric is restored by keeping the "full" integer representation of the bitwise XOR. The result is an XOR metric as defined in the paper. This also opens the door to avoiding the "full table scan" when searching for the keys closest to a given key - the ideal order in which to visit the buckets can be computed with the help of the distance bit string. * As a simplification and to make it easy to "do the right thing", the XOR metric is only defined on an opaque `kbucket::Key` type, partially derived from the current `KadHash`. `KadHash` and `KBucketsPeerId` are removed.
2019-05-17 17:27:57 +02:00
fn distances(key: &kbucket::Key<PeerId>, peers: Vec<PeerId>) -> Vec<Distance> {
peers.into_iter()
.map(kbucket::Key::from)
.map(|k| k.distance(key))
.collect()
}
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
fn run(n: usize) {
// Build `n` nodes. Node `n` knows about node `n-1`, node `n-1` knows about node `n-2`, etc.
// Node `n` is queried for a random peer and should return nodes `1..n-1` sorted by
// their distances to that peer.
let (port_base, mut swarms) = build_nodes(n);
let swarm_ids: Vec<_> = swarms.iter().map(Swarm::local_peer_id).cloned().collect();
// Connect each swarm in the list to its predecessor in the list.
for (i, (swarm, peer)) in &mut swarms.iter_mut().skip(1).zip(swarm_ids.clone()).enumerate() {
swarm.add_address(&peer, Protocol::Memory(port_base + i as u64).into())
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
}
// Ask the last peer in the list to search a random peer. The search should
// propagate backwards through the list of peers.
let search_target = PeerId::random();
Improve XOR metric. (#1108) There are two issues with the current definition and use of Kademlia's XOR metric: 1. The distance is currently equated with the bucket index, i.e. `distance(a,b) - 1` is the index of the bucket into which either peer is put by the other. The result is a metric that is not unidirectional, as defined in the Kademlia paper and as implemented in e.g. libp2p-go and libp2p-js, which is to interpret the result of the XOR as an integer in its entirety. 2. The current `KBucketsPeerId` trait and its instances allow computing distances between types with differing bit lengths as well as between types that hash all inputs again (i.e. `KadHash`) and "plain" `PeerId`s or `Multihash`es. This can result in computed distances that are either incorrect as per the requirement of the libp2p specs that all distances are to be computed from the XOR of the SHA256 of the input keys, or even fall outside of the image of the metric used for the `KBucketsTable`. In the latter case, such distances are not currently used as a bucket index - they can only occur in the context of comparing distances for the purpose of sorting peers - but that still seems undesirable. These issues are addressed here as follows: * Unidirectionality of the XOR metric is restored by keeping the "full" integer representation of the bitwise XOR. The result is an XOR metric as defined in the paper. This also opens the door to avoiding the "full table scan" when searching for the keys closest to a given key - the ideal order in which to visit the buckets can be computed with the help of the distance bit string. * As a simplification and to make it easy to "do the right thing", the XOR metric is only defined on an opaque `kbucket::Key` type, partially derived from the current `KadHash`. `KadHash` and `KBucketsPeerId` are removed.
2019-05-17 17:27:57 +02:00
let search_target_key = kbucket::Key::from(search_target.clone());
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
swarms.last_mut().unwrap().find_node(search_target.clone());
// Set up expectations.
let expected_swarm_id = swarm_ids.last().unwrap().clone();
Improve XOR metric. (#1108) There are two issues with the current definition and use of Kademlia's XOR metric: 1. The distance is currently equated with the bucket index, i.e. `distance(a,b) - 1` is the index of the bucket into which either peer is put by the other. The result is a metric that is not unidirectional, as defined in the Kademlia paper and as implemented in e.g. libp2p-go and libp2p-js, which is to interpret the result of the XOR as an integer in its entirety. 2. The current `KBucketsPeerId` trait and its instances allow computing distances between types with differing bit lengths as well as between types that hash all inputs again (i.e. `KadHash`) and "plain" `PeerId`s or `Multihash`es. This can result in computed distances that are either incorrect as per the requirement of the libp2p specs that all distances are to be computed from the XOR of the SHA256 of the input keys, or even fall outside of the image of the metric used for the `KBucketsTable`. In the latter case, such distances are not currently used as a bucket index - they can only occur in the context of comparing distances for the purpose of sorting peers - but that still seems undesirable. These issues are addressed here as follows: * Unidirectionality of the XOR metric is restored by keeping the "full" integer representation of the bitwise XOR. The result is an XOR metric as defined in the paper. This also opens the door to avoiding the "full table scan" when searching for the keys closest to a given key - the ideal order in which to visit the buckets can be computed with the help of the distance bit string. * As a simplification and to make it easy to "do the right thing", the XOR metric is only defined on an opaque `kbucket::Key` type, partially derived from the current `KadHash`. `KadHash` and `KBucketsPeerId` are removed.
2019-05-17 17:27:57 +02:00
let expected_peer_ids: Vec<_> = swarm_ids.iter().cloned().take(n - 1).collect();
let mut expected_distances = distances(&search_target_key, expected_peer_ids.clone());
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
expected_distances.sort();
// Run test
Runtime::new().unwrap().block_on(
future::poll_fn(move || -> Result<_, io::Error> {
for (i, swarm) in swarms.iter_mut().enumerate() {
loop {
match swarm.poll().unwrap() {
Async::Ready(Some(KademliaOut::FindNodeResult {
key, closer_peers
})) => {
assert_eq!(key, search_target);
assert_eq!(swarm_ids[i], expected_swarm_id);
assert!(expected_peer_ids.iter().all(|p| closer_peers.contains(p)));
Improve XOR metric. (#1108) There are two issues with the current definition and use of Kademlia's XOR metric: 1. The distance is currently equated with the bucket index, i.e. `distance(a,b) - 1` is the index of the bucket into which either peer is put by the other. The result is a metric that is not unidirectional, as defined in the Kademlia paper and as implemented in e.g. libp2p-go and libp2p-js, which is to interpret the result of the XOR as an integer in its entirety. 2. The current `KBucketsPeerId` trait and its instances allow computing distances between types with differing bit lengths as well as between types that hash all inputs again (i.e. `KadHash`) and "plain" `PeerId`s or `Multihash`es. This can result in computed distances that are either incorrect as per the requirement of the libp2p specs that all distances are to be computed from the XOR of the SHA256 of the input keys, or even fall outside of the image of the metric used for the `KBucketsTable`. In the latter case, such distances are not currently used as a bucket index - they can only occur in the context of comparing distances for the purpose of sorting peers - but that still seems undesirable. These issues are addressed here as follows: * Unidirectionality of the XOR metric is restored by keeping the "full" integer representation of the bitwise XOR. The result is an XOR metric as defined in the paper. This also opens the door to avoiding the "full table scan" when searching for the keys closest to a given key - the ideal order in which to visit the buckets can be computed with the help of the distance bit string. * As a simplification and to make it easy to "do the right thing", the XOR metric is only defined on an opaque `kbucket::Key` type, partially derived from the current `KadHash`. `KadHash` and `KBucketsPeerId` are removed.
2019-05-17 17:27:57 +02:00
let key = kbucket::Key::from(key);
assert_eq!(expected_distances, distances(&key, closer_peers));
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
return Ok(Async::Ready(()));
}
Async::Ready(_) => (),
Async::NotReady => break,
}
}
}
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
Ok(Async::NotReady)
}))
.unwrap()
}
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
for n in 2..=8 { run(n) }
}
#[test]
fn unresponsive_not_returned_direct() {
// Build one node. It contains fake addresses to non-existing nodes. We ask it to find a
// random peer. We make sure that no fake address is returned.
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let (_, mut swarms) = build_nodes(1);
// Add fake addresses.
for _ in 0 .. 10 {
swarms[0].add_address(&PeerId::random(), Protocol::Udp(10u16).into());
}
// Ask first to search a random value.
let search_target = PeerId::random();
swarms[0].find_node(search_target.clone());
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
Runtime::new().unwrap().block_on(
future::poll_fn(move || -> Result<_, io::Error> {
for swarm in &mut swarms {
loop {
match swarm.poll().unwrap() {
Async::Ready(Some(KademliaOut::FindNodeResult { key, closer_peers })) => {
assert_eq!(key, search_target);
assert_eq!(closer_peers.len(), 0);
return Ok(Async::Ready(()));
}
Async::Ready(_) => (),
Async::NotReady => break,
}
}
}
Ok(Async::NotReady)
}))
.unwrap();
}
#[test]
fn unresponsive_not_returned_indirect() {
// Build two nodes. Node #2 knows about node #1. Node #1 contains fake addresses to
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
// non-existing nodes. We ask node #2 to find a random peer. We make sure that no fake address
// is returned.
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
let (port_base, mut swarms) = build_nodes(2);
// Add fake addresses to first.
let first_peer_id = Swarm::local_peer_id(&swarms[0]).clone();
for _ in 0 .. 10 {
swarms[0].add_address(
&PeerId::random(),
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
multiaddr![Udp(10u16)]
);
}
// Connect second to first.
swarms[1].add_address(&first_peer_id, Protocol::Memory(port_base).into());
// Ask second to search a random value.
let search_target = PeerId::random();
swarms[1].find_node(search_target.clone());
Fix self-dialing in Kademlia. (#1097) * Fix self-dialing in Kademlia. Addresses https://github.com/libp2p/rust-libp2p/issues/341 which is the cause for one of the observations made in https://github.com/libp2p/rust-libp2p/issues/1053. However, the latter is not assumed to be fully addressed by these changes and needs further investigation. Currently, whenever a search for a key yields a response containing the initiating peer as one of the closest peers known to the remote, the local node would attempt to dial itself. That attempt is ignored by the Swarm, but the Kademlia behaviour now believes it still has a query ongoing which is always doomed to time out. That timeout delays successful completion of the query. Hence, any query where a remote responds with the ID of the local node takes at least as long as the `rpc_timeout` to complete, which possibly affects almost all queries in smaller clusters where every node knows about every other. This problem is fixed here by ensuring that Kademlia never tries to dial the local node. Furthermore, `Discovered` events are no longer emitted for the local node and it is not inserted into the `untrusted_addresses` from discovery, as described in #341. This commit also includes a change to the condition for freezing / terminating a Kademlia query upon receiving a response. Specifically, the condition is tightened such that it only applies if in addition to `parallelism` consecutive responses that failed to yield a peer closer to the target, the last response must also either not have reported any new peer or the number of collected peers has already reached the number of desired results. In effect, a Kademlia query now tries harder to actually return `k` closest peers. Tests have been refactored and expanded. * Add another comment.
2019-05-02 21:43:29 +02:00
Runtime::new().unwrap().block_on(
future::poll_fn(move || -> Result<_, io::Error> {
for swarm in &mut swarms {
loop {
match swarm.poll().unwrap() {
Async::Ready(Some(KademliaOut::FindNodeResult { key, closer_peers })) => {
assert_eq!(key, search_target);
assert_eq!(closer_peers.len(), 1);
assert_eq!(closer_peers[0], first_peer_id);
return Ok(Async::Ready(()));
}
Async::Ready(_) => (),
Async::NotReady => break,
}
}
}
Ok(Async::NotReady)
}))
.unwrap();
}