Testing Update (11-Jan-2021)
Summary
- A new 500-node Testnet has been created, currently internal only
- Patch testing has gone well
- Stress Testing passed: 100 & 150tps
- Stress Testing failed: 400tps, due to a configuration issue; being re-run
- Testnet reset and full release expected shortly (after 400tps pass)
- Memory usage is much improved, even under very heavy load
This update follows on from: Symbol launch issues & Testnet update (06-Jan-2021)
As per @Jaguar0625’s tweet on 08-Jan: https://twitter.com/Jaguar0625/status/1347611656021532675
Patches have been completed and handed to the test team for validation. So far, these look good. The Core Developers and Test teams have also been working very closely on various configuration items to improve rollback handling.
The test team have created a new 500-node network, incorporating the learnings from the previous tests, the new patches and some minor configuration changes. The tests below were run over the weekend and through Monday:
Normal Running Tests - Passed up to 150tps
The following tests have been run over the past few days:
- Automation/regression testing on an internal dev environment
- Stress test on an internal dev environment at 100tps, increased to 150tps
- Stress test on the new 500 node testnet at 150tps for ~12 hours
Summary of the 150tps stress test:
- The 150tps test finished with ~10 million transactions over ~12 hours (see the quick arithmetic sketch below)
- MongoDB stayed at around 2 GB
- The core servers remained in sync and had no memory issues
- A pass here means the network kept functioning normally and no overload occurred
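For context on the headline numbers, sustained rate, duration and total transaction count are related by simple arithmetic. A minimal sketch (Python; the values plugged in are illustrative only, and real runs ramp up and down rather than holding a perfectly constant rate):

```python
# Relate a sustained transaction rate, test duration and total transaction count.
# Illustrative values only; not a restatement of the test data.

SECONDS_PER_HOUR = 3600

def total_transactions(tps, hours):
    """Transaction count produced by a constant rate held for the given duration."""
    return round(tps * hours * SECONDS_PER_HOUR)

def average_tps(total_txs, hours):
    """Average rate implied by a total count over the given duration."""
    return total_txs / (hours * SECONDS_PER_HOUR)

print(total_transactions(150, 12))         # 6,480,000 at a constant 150tps for 12 hours
print(round(average_tps(10_000_000, 12)))  # ~231tps average implied by 10M txs in 12 hours
```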
Prior to these fixes, the network overloaded at 130tps on the public Testnet and did not recover well, so the patches are delivering a clear functional improvement.
These results are from a controlled environment with known node sizes and performance; the tests will be re-run once the new Testnet is made public, to ensure behaviour is the same with community nodes present.
Overload Test - 400tps, failed due to config
A final test was run at 400tps, which passed for ~8-10 hours; the network capped throughput at 150-200tps, which shows the patches are working. However, toward the end of the test (the final ~2 hours) the run encountered issues believed to be configuration related: rollbacks and data/packet sizes meant some nodes fell behind and could not recover. As a result the test failed, but it is being re-run with configuration amendments and is expected to pass later today or tomorrow morning (UTC). The good news is that, until the issues toward the end of the test, memory usage was much improved with the patches and remained constant.
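The "capping" behaviour described above is, in general terms, load shedding: when the offered load exceeds what the network can sustain, excess transactions are throttled rather than being allowed to overload the nodes. As a rough conceptual illustration only (this is a generic token-bucket rate limiter, not Symbol's actual implementation):

```python
import time

class TokenBucket:
    """Generic token-bucket rate limiter: admits work up to a sustained rate
    (plus a small burst allowance) and sheds the excess instead of letting
    it queue without bound."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_admit(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True   # within the cap, admit the transaction
        return False      # over the cap, shed the transaction

# Usage sketch: a burst of 400 transactions arriving at once against a 200tps cap;
# roughly the first 200 are admitted and the rest are shed.
limiter = TokenBucket(rate_per_sec=200.0, burst=200.0)
admitted = sum(limiter.try_admit() for _ in range(400))
print(admitted)  # ~200
```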
The Testnet has also been brought back into sync, which forced usage of the Deep Rollback patches, and configurations have been adjusted in collaboration with the core devs for the next test.
Immediate Next Steps
The current plan is to rerun the 400tps test, which will complete, at the earliest, late on Tuesday (UTC). If it does not pass, an additional cycle may be required; if it does pass, a decision and plan can be made to release publicly and start community testing.
A further patch is being produced today, introducing a more aggressive node-banning approach in certain scenarios; it will be included in the next release and will most likely be included in the test above.
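For readers unfamiliar with the idea, node banning here means temporarily refusing to talk to peers that misbehave (for example, sending malformed data or repeatedly falling out of protocol). A generic sketch of the bookkeeping only, not the actual Core Server change:

```python
import time

class BanList:
    """Generic peer ban-list: record misbehaving peers and ignore them until a
    ban duration expires. Conceptual sketch only; not catapult code."""

    def __init__(self, ban_duration_secs: float):
        self.ban_duration = ban_duration_secs
        self._banned_until = {}  # peer id -> ban expiry (monotonic timestamp)

    def ban(self, peer_id: str) -> None:
        self._banned_until[peer_id] = time.monotonic() + self.ban_duration

    def is_banned(self, peer_id: str) -> bool:
        expiry = self._banned_until.get(peer_id)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._banned_until[peer_id]  # ban expired, drop the entry
            return False
        return True

# Usage sketch: ban a peer for 10 minutes after repeated bad responses.
bans = BanList(ban_duration_secs=600)
bans.ban("peer-203.0.113.5")
assert bans.is_banned("peer-203.0.113.5")
```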
Further memory profiling work on the Core Server is being undertaken by the Core Devs and the NGL Test/Dev team to see if additional optimisations are possible; any that are found will be assessed for inclusion.
A further update will be provided as soon as there is more information. We are now nearing the end of the resolution work and the outcome looks positive in terms of resilience and memory usage; fingers crossed for the final testing.