Archive node sync trouble [very very slow]

Hello,

I am trying to run a full archive node. The issue is that somewhere between block 2,000,000 and 2,312,324 (where I am now) the node slows down so dramatically that simple REST API calls take between one and five minutes to return. The node is currently consuming 84 GB of the RAM on the server. Can you provide me with the current requirements to run a full archive node with reasonable performance?

Any guidance would be appreciated.

My basic configuration is below:

I am running 0.6.97 on Linux using OpenJDK 8, but I am experiencing similar issues on Windows. My server has 96 GB of RAM and the node data is stored on an NVMe disk.

My config.properties file is set with:
nis.optionalFeatures = TRANSACTION_HASH_LOOKUP|HISTORICAL_ACCOUNT_DATA

and my nix.runNis.sh is configured like so:
java -Xms1G -Xmx70G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -cp ".:./:…/libs/" org.nem.deploy.CommonStarter

Are you actually using HISTORICAL_ACCOUNT_DATA? It is very resource-intensive. If you don’t use it, just turn it off. Most probably you don’t need it.

70 GB is too much. After removing HISTORICAL_ACCOUNT_DATA, change the maximum heap (-Xmx) to 32G.

Unfortunately, I need to be able to calculate a historical point-in-time balance for an address. From what I have read, HISTORICAL_ACCOUNT_DATA is required to do that. Is it possible to get, say, the balance of an address as of a given day and time without HISTORICAL_ACCOUNT_DATA being turned on?

OK. So if you need that, then HISTORICAL_ACCOUNT_DATA should be turned on.
I have heard about problems with node stability when this feature is on.
Maybe @BloodyRookie can help here.
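
For reference, here is a minimal Python sketch of how such a point-in-time lookup could be requested once HISTORICAL_ACCOUNT_DATA is enabled. It assumes the NIS historical account endpoint /account/historical/get with the parameters address, startHeight, endHeight and increment, and a node on the default port 7890; please check both against the API documentation for your version. The helper name and the address in the usage note are hypothetical. The lookup works by block height, so translating a day and time into a height is a separate step that is not shown here.

```python
from urllib.parse import urlencode

# Assumption: NIS is reachable locally on its default HTTP port.
NIS_BASE_URL = "http://127.0.0.1:7890"


def historical_balance_url(address, height, base_url=NIS_BASE_URL):
    """Build a historical-data request covering exactly one block height."""
    query = urlencode({
        "address": address,
        "startHeight": height,   # first height of the range
        "endHeight": height,     # same height, so a single data point
        "increment": 1,          # step between heights in the range
    })
    return f"{base_url}/account/historical/get?{query}"
```

Calling historical_balance_url("NEXAMPLEADDRESS", 2000000) gives a URL that can be fetched with any HTTP client; the address here is a placeholder, not a real account.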

I would set -Xms70G as well; there is no need to grow the heap step by step during operation.
70G should be enough right now.
Can you add
-XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:GCLogFileSize=25m -XX:NumberOfGCLogFiles=100 -XX:+PrintGC -Xloggc:"./gc.txt"
as startup params? That creates log files where you can see how long garbage collection phases took.
Are you experiencing the slow responses while syncing, or while loading the chain at startup?
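
If it helps to answer that, a small Python sketch like the one below could record how long a simple REST call takes next to the height the node reports, so the slowdown can be matched against the GC log timestamps. It assumes the node answers on the default local port 7890 and that /chain/height returns a JSON object with a "height" field; the function names are made up for this example.

```python
import json
import time
import urllib.request

# Assumption: default local NIS port and the chain height endpoint.
HEIGHT_URL = "http://127.0.0.1:7890/chain/height"


def fetch_json(url, timeout=600):
    """Fetch a URL and decode the JSON body (long timeout for a slow node)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def timed_call(url, fetch=fetch_json, clock=time.monotonic):
    """Return (elapsed_seconds, decoded_body) for one request."""
    start = clock()
    body = fetch(url)
    return clock() - start, body


def watch(url=HEIGHT_URL, interval_seconds=60):
    """Print one latency sample per interval until interrupted."""
    while True:
        elapsed, body = timed_call(url)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} height={body.get('height')} took {elapsed:.2f}s")
        time.sleep(interval_seconds)
```

Running watch() in a separate terminal while the node loads and then syncs should show in which phase the response times climb.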

The only node that has historical data on mainnet is a 128G server with -Xms100G -Xmx100G (HugeAlice), but as I said, 70G should be enough imo. A typical garbage collection cycle on that node looks like this:

2019-10-01T08:25:25.207+0200: 2907133.907: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 69917M->69759M(100G), 0.1616818 secs]
2019-10-01T08:25:25.369+0200: 2907134.068: [GC concurrent-root-region-scan-start]
2019-10-01T08:25:25.682+0200: 2907134.381: [GC concurrent-root-region-scan-end, 0.3126560 secs]
2019-10-01T08:25:25.682+0200: 2907134.381: [GC concurrent-mark-start]
2019-10-01T08:25:34.790+0200: 2907143.489: [GC concurrent-mark-end, 9.1084788 secs]
2019-10-01T08:25:34.791+0200: 2907143.490: [GC remark, 0.3268970 secs]
2019-10-01T08:25:35.118+0200: 2907143.818: [GC cleanup 70520M->51699M(100G), 0.1867877 secs]
2019-10-01T08:25:35.305+0200: 2907144.005: [GC concurrent-cleanup-start]
2019-10-01T08:25:35.311+0200: 2907144.010: [GC concurrent-cleanup-end, 0.0059125 secs]
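
To compare your own gc.txt against this, a rough Python sketch along these lines could pull out the stop-the-world phases (pause, remark, cleanup) and their durations. It skips the concurrent-* phases, which in G1 run alongside the application threads, so the 9 second concurrent mark above is not a 9 second stall. The regular expression is written against the log format shown here (JDK 8 with -XX:+PrintGC) and is an assumption for any other format; all names are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class Pause(NamedTuple):
    phase: str                      # "pause", "remark" or "cleanup"
    seconds: float                  # duration of the stop-the-world phase
    heap_before_mb: Optional[int]   # None when the line has no heap figures
    heap_after_mb: Optional[int]


_UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

_PAUSE_RE = re.compile(
    r"\[GC (?P<phase>pause|remark|cleanup)\b"        # skips concurrent-* lines
    r".*?"
    r"(?:(?P<before>\d+)(?P<bu>[KMG])->"             # optional heap before
    r"(?P<after>\d+)(?P<au>[KMG])\(\d+[KMG]\), )?"   # heap after (capacity)
    r"(?P<secs>\d+\.\d+) secs\]"
)


def _to_mb(amount, unit):
    return int(int(amount) * _UNIT_MB[unit])


def parse_pauses(lines):
    """Return a Pause for every stop-the-world line found in the log lines."""
    pauses = []
    for line in lines:
        m = _PAUSE_RE.search(line)
        if not m:
            continue
        has_heap = m.group("before") is not None
        pauses.append(Pause(
            phase=m.group("phase"),
            seconds=float(m.group("secs")),
            heap_before_mb=_to_mb(m.group("before"), m.group("bu")) if has_heap else None,
            heap_after_mb=_to_mb(m.group("after"), m.group("au")) if has_heap else None,
        ))
    return pauses


def longest_pause(pauses):
    """Return the Pause with the largest duration, or None if there are none."""
    return max(pauses, key=lambda p: p.seconds, default=None)
```

Feeding the lines of a log file into parse_pauses and then calling longest_pause on the result gives a quick first check of whether long pauses line up with the slow API responses. In the cycle above, every stop-the-world phase stays well under one second.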