XEM full node stopping synchronization

Hi,
I don't know where to report a bug in the NEM full node, so I decided to write here. My problem is that after some time the full node stops synchronizing. A restart helps, but if I do not restart the process it stays stuck. When I look into the logs I see the following exception being thrown:

Dec 23 10:06:42 localhost nix.runNis.sh[10824]: 2017-12-23 09:06:42.637 WARNING forcibly aborting request to http://nijuichi.nem.ninja:7890/chain/score (org.nem.core.connect.HttpMethodClient lambda$sendRequest$2)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]: 2017-12-23 09:06:42.637 WARNING Timer SYNC raised exception: java.util.concurrent.CancellationException
Dec 23 10:06:42 localhost nix.runNis.sh[10824]: java.util.concurrent.CompletionException: java.util.concurrent.CancellationException
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2265)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.nem.core.connect.HttpMethodClient$HttpMethodClientFutureCallback.cancelled(HttpMethodClient.java:216)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.apache.http.concurrent.BasicFuture.cancel(BasicFuture.java:150)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.apache.http.concurrent.BasicFuture.cancel(BasicFuture.java:157)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.executionCancelled(DefaultClientExchangeHandlerImpl.java:112)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.cancel(AbstractClientExchangeHandler.java:432)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.apache.http.client.methods.AbstractExecutionAwareRequest.abort(AbstractExecutionAwareRequest.java:90)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.nem.core.connect.HttpMethodClient.lambda$sendRequest$2(HttpMethodClient.java:129)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at org.nem.core.async.SleepFuture$1.run(SleepFuture.java:24)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.TimerThread.mainLoop(Timer.java:555)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.TimerThread.run(Timer.java:505)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]: Caused by: java.util.concurrent.CancellationException
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         at java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2263)
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:         ... 14 more
Dec 23 10:06:42 localhost nix.runNis.sh[10824]:  (org.nem.core.async.NemAsyncTimerVisitor notifyOperationCompleteExceptionally)
Dec 23 10:06:44 localhost nix.runNis.sh[10824]: 2017-12-23 09:06:44.344 INFO calculating trust values (org.nem.peer.trust.CachedTrustProvider a)
Dec 23 10:06:44 localhost nix.runNis.sh[10824]: 2017-12-23 09:06:44.344 INFO trust calculation finished (1 values) (org.nem.peer.trust.CachedTrustProvider a)

How can I fix that? It happens every few days.

@BloodyRookie could you help here?
@Kamil92 maybe upgrading to the new version (0.6.95) will help? It's available here: https://bob.nem.ninja/

There have been a lot of transactions on the network over the last few days.
The burst of transactions has stopped for now, so if you restart the node now, the problem should go away.
However, some servers do not show this problem at all.
The difference is the amount of memory allocated to the NIS program.
Servers with a large memory allocation do not go down.
The fundamental cause is that the allocated memory is insufficient.
If there is not enough physical memory, you have to add more.
The new version of NIS, to be announced on the 25th, should make this problem less likely to occur.

@mizunashi So how much memory do I need to run the node without problems? The server I am using for the node has 32 GB of free RAM available. Could this issue be related to using an HDD instead of an SSD? The problems with that node started a week or two ago; before that it was running correctly.

When was the node last started before the problem occurred?
A node that has not been restarted for a long time can run into problems.
Enough physical memory is a prerequisite, but what matters is the amount of memory allocated to the NIS program.
I do not know the optimal value. I think a server with a lot of memory allocated to NIS will tolerate the load.

Did you also change the Java parameters in nix.runNis.sh and increase the memory for NIS? Could you paste the contents of that file from your NIS here?

Most likely, as @CryptoBeliever mentioned, you are using the default parameters when starting NIS, which means that only 1 GB of heap is given to NIS. Check the startup script and adjust the -Xms (initial heap size) and -Xmx (maximum heap size) parameters.
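
One quick way to see which heap settings a startup script actually passes to Java is to pull the -Xms/-Xmx flags out of it. This is only a minimal sketch: it assumes the script is called nix.runNis.sh (as mentioned above) and sits in the current directory, and the helper name heap_flags is made up for illustration.

```shell
#!/bin/sh
# heap_flags (hypothetical helper): print every -Xms/-Xmx option
# found in the text on stdin, one per line.
heap_flags() {
  grep -oE -e '-Xm[sx][0-9]+[kKmMgG]?'
}

# Inspect the startup script if it exists (file name is an assumption).
if [ -f ./nix.runNis.sh ]; then
  heap_flags < ./nix.runNis.sh || echo "no -Xms/-Xmx found in nix.runNis.sh"
fi
```

If nothing is printed for -Xmx, the script is not overriding the maximum heap size itself.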

@CryptoBeliever @BloodyRookie yep, you are right, I was using the default NIS parameters; I totally forgot about that script. I have now updated it to use the following config:

java -Xms4G -Xmx8G -cp ".:./*:../libs/*" org.nem.deploy.CommonStarter

I will see how it deals with the load.

With that much heap I would probably use the G1 garbage collector by adding

-XX:+UseG1GC -XX:MaxGCPauseMillis=200

to the startup script.
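
Put together with the heap settings posted earlier in the thread, the startup line might look roughly like this. It is a sketch only: the classpath and main class are copied from the command above, and the heap sizes are the ones chosen there, not recommended values.

```shell
# Heap of 4-8 GB plus the G1 collector with a 200 ms pause-time goal.
java -Xms4G -Xmx8G \
     -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
     -cp ".:./*:../libs/*" org.nem.deploy.CommonStarter
```

Note that MaxGCPauseMillis is a goal the collector tries to meet, not a hard limit.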