P2P Time Sync for Nodes


The following has been adapted from the Technical Report and is indeed very technical in nature. In summary, it explains how NEM has deployed the first ever P2P time sync for nodes and the formulas behind how it works.

After having operated for almost one year on a live network, we are pleased to report that NEM's P2P time sync for nodes has performed well.

Introduction

Like most other cryptocurrencies, NEM relies on time stamps for transactions and blocks. Ideally, all nodes in the network should be synchronized with respect to time. Even though most modern operating systems have time synchronization integrated, nodes can still have local clocks that deviate from real time by more than a minute. This causes those nodes to reject valid transactions or blocks, which makes it impossible for them to synchronize with the network.

A synchronization mechanism is therefore needed to ensure that all nodes agree on time. There are basically two ways to do this:

  1. Use an existing protocol, such as NTP
  2. Use a custom protocol

The advantage of using an existing protocol like NTP is that it is easy to implement and the network time will always be near real time. The disadvantage is that the network relies on servers outside the network.

Using a custom protocol that relies only on the P2P network itself solves this problem, but there is a trade-off: it is impossible to guarantee that the network time stays near real time. NEM uses a custom protocol in order to be completely independent of any outside entity.

Gathering samples

Each node in the network manages an integer number called offset, which is initially set to 0. The node's network time (in milliseconds) is its local system time in milliseconds plus the offset (which can be negative).
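The relationship between system time, offset, and network time can be sketched as follows (a minimal illustration; the class and attribute names are my own, not NIS identifiers):

```python
import time

class NetworkTimeProvider:
    """Minimal sketch: network time = local system time + offset (in ms)."""

    def __init__(self):
        self.offset_ms = 0  # starts at 0; may become negative after sync rounds

    def system_time_ms(self):
        # local wall-clock time in milliseconds
        return int(time.time() * 1000)

    def network_time_ms(self):
        return self.system_time_ms() + self.offset_ms
```

Time synchronization rounds adjust only `offset_ms`; the local system clock itself is never touched.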

After a node completes its start up, the node (hereafter called the local node) selects up to 20 partner nodes for performing a time synchronization round. Only nodes that expose a minimum importance are considered as partners.

The local node sends each selected partner a request for its current network time and remembers the network time stamps at which the request was sent and at which the response was received. Each partner node responds with a sample that contains the time stamp of the arrival of the request and the time stamp of the response. The partner node uses its own network time to create the time stamps.

Using the time stamps, the local node can calculate the round trip time

r = (t4 − t1) − (t3 − t2)

where t1 is the network time stamp at which the request was sent, t2 the network time stamp at which the partner received the request, t3 the network time stamp at which the partner sent the response, and t4 the network time stamp at which the response was received. It can then estimate the offset o between the network times used by the two nodes as

o = ((t2 − t1) + (t3 − t4)) / 2

This is repeated for every time synchronization partner until the local node has a list of offset estimations.
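The sample arithmetic follows the standard two-way exchange used by NTP-style protocols and can be sketched like this (function names are illustrative):

```python
def round_trip_time(t1, t2, t3, t4):
    """Round trip time: total elapsed time at the local node minus
    the time the partner spent processing the request.
    t1: request sent (local), t2: request received (partner),
    t3: response sent (partner), t4: response received (local)."""
    return (t4 - t1) - (t3 - t2)

def offset_estimate(t1, t2, t3, t4):
    """Offset estimate: the average of the two one-way clock comparisons."""
    return ((t2 - t1) + (t3 - t4)) / 2

# Example: partner clock is 100 ms ahead, one-way delay is a symmetric
# 40 ms, and the partner spends 10 ms processing the request.
t1, t2, t3, t4 = 0, 140, 150, 90
```

With these numbers the round trip time comes out to 80 ms and the estimated offset to 100 ms, recovering the partner's 100 ms clock lead exactly because the delay is symmetric.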

Applying filters to remove bad data

Bad samples can occur for various reasons:

• A malicious node can supply incorrect time stamps.

• An honest node can have a clock far from real time without knowing it and without having synchronized yet.

• The round trip time can be highly asymmetric due to internet problems or one of the nodes being very busy. This is known as channel asymmetry and cannot be avoided.

Filters are applied that try to remove the bad samples. The filtering is done in 3 steps:

  1. If the response from a partner is not received within an expected time frame (i.e. if t4 − t1 > 1000ms) the sample is discarded.

  2. If the calculated offset is not within certain bounds, the sample is discarded. The allowable bounds decrease as a node’s uptime increases. When a node first joins the network, it tolerates a high offset in order to adjust to the already existing consensus of network time within the network. As time passes, the node gets less tolerant with respect to reported offsets. This ensures that malicious nodes reporting huge offsets are ignored after some time.

  3. The remaining samples are ordered by their offset and then alpha trimmed on both ends. In other words, on both sides a certain portion of the samples is discarded.
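The three filtering steps might be sketched as follows (the bounds and the alpha value are illustrative defaults; in the real protocol the tolerated offset shrinks as the node's uptime grows):

```python
def filter_samples(samples, max_rtt_ms=1000, tolerated_offset_ms=5000, alpha=0.4):
    """samples: list of dicts with keys 't1', 't4' (local time stamps, ms)
    and 'offset' (estimated offset to the partner, ms)."""
    # Step 1: discard samples whose response took too long (t4 - t1 > max_rtt_ms).
    kept = [s for s in samples if s["t4"] - s["t1"] <= max_rtt_ms]
    # Step 2: discard offsets outside the currently tolerated bounds.
    kept = [s for s in kept if abs(s["offset"]) <= tolerated_offset_ms]
    # Step 3: alpha-trim: sort by offset and drop a fraction at both ends.
    kept.sort(key=lambda s: s["offset"])
    trim = int(len(kept) * alpha / 2)
    return kept[trim: len(kept) - trim] if trim else kept
```

The alpha trimming in step 3 discards the most extreme remaining offsets on both sides, so a few surviving outliers cannot dominate the later weighted average.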

Calculation of the effective offset

The reported offset is weighted with the importance of the boot account of the node reporting the offset. This is done to prevent Sybil attacks.

An attacker that tries to influence the calculated offset by running many low-importance nodes reporting offsets close to the tolerated bound will therefore have no more influence than a single node with the same cumulative importance reporting the same offset. On a macro level, the attacker's influence equals that of the single node.

Also, the number of samples available and the cumulative importance of all partner nodes should be incorporated. Each offset is therefore multiplied by a scaling factor.

Let Ij be the importance of the node reporting the j-th offset oj and viewSize be the number of samples divided by the number of nodes that were eligible for the last PoI calculation.

Then the scaling factor s used is

s = min(1 / Σj Ij, 1 / viewSize)

This gives the formula for the effective offset o:

o = s · Σj Ij · oj

Note that the influence of an account with large importance is artificially limited because the viewSize caps the scale. Such an account can raise its influence on a macro level by splitting its NEM into accounts that are not capped. But, doing so will likely decrease its influence on individual partners because the probability that all of its split accounts are chosen as time-sync partners for any single node is low.
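Assuming a scale of the form s = min(1 / ΣIj, 1 / viewSize), the effective-offset calculation could be sketched like this (variable names are my own):

```python
def effective_offset(offsets, importances, view_size):
    """offsets[j]: j-th filtered offset estimate (ms);
    importances[j]: importance of the boot account reporting offsets[j];
    view_size: number of samples divided by the number of nodes that
    were eligible for the last PoI calculation.
    Sketch: scale s = min(1 / sum(Ij), 1 / viewSize)."""
    total_importance = sum(importances)
    s = min(1.0 / total_importance, 1.0 / view_size)
    # importance-weighted sum of the offsets, scaled by s
    return s * sum(i * o for i, o in zip(importances, offsets))
```

Because the scale never exceeds 1 / viewSize, a single very important partner cannot push the result arbitrarily far; this is the capping effect the paragraph above describes.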

Coupling and threshold

New nodes that have just joined the network need to adjust their offset quickly to the already established network time. In contrast, old nodes should behave much more rigidly so that they are not influenced too much by malicious nodes or newcomers.

To enable this, nodes only adjust by a portion of the reported effective offset: they multiply the effective offset by a coupling factor to obtain the final offset. Each node keeps track of the number of time synchronization rounds it has performed.

This is called the node age.

The formula for this coupling factor c is:

c = max(e^(−0.3 · a), 0.1)   with   a = max(nodeAge − 5, 0)

This ensures that the coupling factor will be 1 for 5 rounds and then decay exponentially to 0.1.

Finally, a node only adds the calculated final offset to its internal offset if its absolute value is above a given threshold (currently set to 75 ms). This is effective in preventing slow drifts of the network time caused by channel asymmetry in the communication between nodes.
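The coupling and threshold logic might look like the sketch below. The exponential decay rate 0.3 and the function names are assumptions consistent with the stated behaviour (coupling of 1 for the first 5 rounds, then exponential decay to a floor of 0.1):

```python
import math

def coupling(node_age):
    """node_age: number of time synchronization rounds performed so far.
    Full coupling (c = 1) for the first 5 rounds, then exponential
    decay down to a floor of 0.1. The 0.3 rate is an assumption."""
    a = max(node_age - 5, 0)
    return max(math.exp(-0.3 * a), 0.1)

def update_offset(current_offset_ms, effective_offset_ms, node_age, threshold_ms=75):
    """Apply the coupling factor and the 75 ms threshold to the
    effective offset, returning the node's new internal offset."""
    change = coupling(node_age) * effective_offset_ms
    # ignore tiny corrections to prevent slow network-time drift
    if abs(change) <= threshold_ms:
        return current_offset_ms
    return current_offset_ms + change
```

Young nodes (age ≤ 5) adopt the full effective offset, while mature nodes move at most 10% of it per round, and corrections under 75 ms are dropped entirely.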


This is a companion discussion topic for the original entry at http://blog.nem.io/first-ever-p2p-time-sync-for-nodes/

I have a question about how to synchronize time between P2P network nodes. As we know, the traditional way to synchronize time is NTP. However, NTP needs a central NTP server to correct the time, which is hard to scale. So NEM uses a distributed approach instead, based on setting an offset. My questions are:

  1. The offset is obtained from up to 20 neighbours, giving a list of offsets. What, then, is the real time that represents this node's time stamp?
  2. If a node is far from the real time, how should we set the time stamp according to the offset?

My understanding is as follows:
If the values in the offset list are all close, then we synchronize using (current timestamp = current time + average offset).
Otherwise, if some values in the offset list deviate seriously, we first detect whether it is the local node's time stamp that is not synchronized. If the local node is the cause, we use (current timestamp = current time of a normal node + average filtered offset); otherwise, (current timestamp = current time of the local node + average filtered offset).

All of this rests on the hypothesis that most of the 20 nodes are normal. If the time stamps of all 20 nodes are wrong, we almost certainly cannot get the right time stamp.

Is this right? Thanks!

Another question is about the wallet. I want to use a local NIS as a proxy for my NCC, which requires me to modify some configuration in the NIS folder. But I often change my IP address, and every time it changes I have to reconfigure, which should be fine. However, my wallet could not be opened after doing the reconfiguration. Afterwards, I found that in another place, where the NEM data and logs are stored, there is a JSON file that keeps the old IP address, which was the fundamental problem. So I have to change it in two places.
A little hard to use.

As explained in the blog post from Saul, it is a bit more complicated than you describe. The basic concepts are:

  1. All offsets from the neighbours are weighted by the importance of the neighbour. That way, a Sybil-type attack is not possible. The ‘average’ of the offsets is thus calculated as a weighted average.
  2. If a neighbour supplies a time stamp that is too far off, the neighbour is ignored.
  3. A local node that has just joined the network adjusts its offset very fast, i.e. it is willing to adapt to the offset it sees in the network for some synchronization attempts. Then the node changes strategy and behaves more rigidly, i.e. even if the node sees an offset to its neighbours, it will only incorporate 10% of that offset into its local time. That ensures that if there is a drift in network time, the drift is rather slow.

If a node encounters the situation that all 20 contacted nodes have a large offset relative to itself, there are two cases: if the offsets are larger than a certain threshold, the node will just ignore all of them and keep its local time; otherwise it will adapt to the time the other nodes have. Remember, there is no absolute time in a P2P network; the main purpose of synchronization is to reach a common standard for the network time, no matter how far that is from real time.

:smiley: Makes sense. Thanks!
When will the mobile wallet be published? I want to see it.

I don’t know, maybe someone else can answer that.

:cold_sweat:

Good explanation. The white paper is not enough on its own; I just skipped this section when I read the white paper. Now I understand the basic concept.