Thanks for the detailed update. Always appreciated.
Update - 05-Feb-2021 approx midday UTC
Further to this update, the Interim Load Test was started approx midnight UTC last night.
It appears to have caused issues on various nodes on the Testnet. It was planned to run for 12-24 hours but was stopped early as a result.
If you look on this page, the height column should ideally show most nodes within 1-2 blocks of each other, but currently doesn’t: symbol node list (testnet)
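The "within 1-2 blocks of each other" check can be sketched roughly as below. This is only an illustration: the function name `out_of_sync`, the default tolerance of 2 blocks, and the sample heights are all made up and are not taken from the node list site.

```python
from typing import Dict, List


def out_of_sync(heights: Dict[str, int], tolerance: int = 2) -> List[str]:
    """Return the names of nodes trailing the highest reported height
    by more than `tolerance` blocks (hypothetical helper)."""
    if not heights:
        return []
    tip = max(heights.values())
    return sorted(name for name, h in heights.items() if tip - h > tolerance)


# Made-up heights, for illustration only.
sample = {"node-a": 1000, "node-b": 999, "node-c": 640}
print(out_of_sync(sample))  # ['node-c']
```

Comparing against the highest reported height is a simplification; a node that is ahead on a fork would need a different check.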
The issue has affected the main NGL nodes which the Wallet etc. use as defaults, so you may find things like Wallets behave strangely while those nodes are resynchronised. The Faucet was affected and has been moved; it should be working again.
At this stage there is no estimate or confirmed root cause; that will have to wait until the logs etc. are reviewed. The team will be looking into the issue, and the steps to bring Testnet back into consistency, through today, and we will try to communicate updates as they become available.
Slack conversation for anyone on Slack: https://nem2.slack.com/archives/C9YKR0EUX/p1612516491133100
Not only did the network stop, but nodes are also failing to recover.
Although it is not certain, nodes with higher specs tend to be working.
Some VPS are wiped out.
We estimate that there are about 60 nodes that are currently working.
On the other hand, are the surviving nodes useful?
Is there any information I can provide?
Thanks @vistar, if you can join the conversation on Slack above, it would be helpful to keep things together.
Probably useful would be things like:
- Server spec
- Operating system
- Node type (peers, api, dual, voting etc)
- If you are using bootstrap or not
But definitely easiest to have in one slack channel for people
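For anyone who wants to gather the details listed above in one go, a minimal sketch using only the Python standard library follows. The function name `node_report` and the field names are hypothetical; node type and Bootstrap usage cannot be detected this way, so they are passed in by hand.

```python
import json
import os
import platform
import shutil


def node_report(node_type: str, uses_bootstrap: bool) -> dict:
    """Collect basic server details to paste into a report (hypothetical helper)."""
    root = os.path.abspath(os.sep)  # filesystem root of the current drive
    return {
        "os": platform.platform(),        # operating system and version
        "machine": platform.machine(),    # CPU architecture
        "cpu_count": os.cpu_count(),      # logical CPUs (may be None)
        "disk_total_gb": shutil.disk_usage(root).total // 10**9,
        "node_type": node_type,           # e.g. peer, api, dual, voting
        "uses_bootstrap": uses_bootstrap,
    }


print(json.dumps(node_report("dual", True), indent=2))
```

Memory size is left out because the standard library has no portable call for it.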
How did the Daoka-Canon retest pass, where I assume the same patch was in place and it ran at 6k TPS, when the current 400 TPS test has caused nodes to drop?
I think the symbol launch is best on 3/29
Symbol can be twins with nis1
It was a different kind of test: the Daoka-Canon retest was checking a specific set of steps, which are now working OK.
Update 05-Feb-2021 approx 17:30 UTC
Huge thanks to Wayon, Jag, Gimre and the rest of the team for taking the time to explain the below alongside the investigation work. I’ve tried to summarise the current state as I understand it from what is happening:
The team are still looking at the issue, the summary of currently known information is below, it is ongoing and subject to change as more is known:
- The issue affects api-broker; exactly why/how is being confirmed.
- The chain is still operating and Finality is still working. It can be checked on a known working API node: http://188.8.131.52:3000/chain/info
- On affected nodes the api-broker is down, so the REST gateway is not aware of the current state of the chain (MongoDB isn’t updated while the broker is down). REST therefore reports what it knew up until the broker went down, rather than the actual current state of the chain on the peer node.
- The node list site (https://symbolnodes.org/nodes_testnet) relies on REST calls for chain height, so it may have issues reporting the actual height while the broker-node(s) are down.
- The auto-recovery issue that is present in Bootstrap (#108) means just restarting isn’t quite enough. That issue was already known, we knew it needed to be addressed, and it has obviously now risen in priority.
- Resetting and resynchronising the node does appear to resolve the issue and bring it back online. This is the only concrete approach we know definitely fixes it, but we are still looking for other ones.
- The process appears conceptually to have been something like:
  - Api-broker had issues on some nodes (root cause still being fully identified).
  - Api-broker failed due to the above and stopped.
  - Bootstrap auto-recovery doesn’t allow it to restart, and the api-node ends up in a state that cannot be easily recovered.
  - Peer node is still functioning normally.
- The issue only affects Dual or API nodes; it just happens that most nodes are dual nodes, and most NGL nodes are dual and voting to simulate Mainnet in terms of SuperNodes.
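The symptom described above, where REST on an affected node keeps reporting the height it had when the broker stopped, could be checked by comparing a node's `/chain/info` height with a known working node. The sketch below is a rough illustration only: the helper names and the tolerance of 2 blocks are made up, and it assumes the `/chain/info` response carries a top-level `height` field (possibly encoded as a string).

```python
import json
from urllib.request import urlopen


def parse_height(chain_info: dict) -> int:
    """Extract the chain height from a parsed /chain/info response.

    Assumption: the response has a top-level "height" field,
    possibly encoded as a string.
    """
    return int(chain_info["height"])


def fetch_height(base_url: str, timeout: float = 5.0) -> int:
    """Query <base_url>/chain/info and return the reported height."""
    with urlopen(f"{base_url}/chain/info", timeout=timeout) as resp:
        return parse_height(json.load(resp))


def rest_looks_stale(node_height: int, reference_height: int,
                     tolerance: int = 2) -> bool:
    """True when the node's REST height trails the reference node
    by more than `tolerance` blocks."""
    return reference_height - node_height > tolerance


# Offline illustration with made-up responses (no network access needed).
reference = parse_height({"height": "120500"})
affected = parse_height({"height": "118200"})
print(rest_looks_stale(affected, reference))  # True
```

Against live nodes this would be something like `rest_looks_stale(fetch_height("http://<your-node>:3000"), fetch_height("http://188.8.131.52:3000"))`, where `<your-node>` is a placeholder. A stale height only suggests the broker has stopped; it does not confirm the cause.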
The work is going to continue today and over the weekend, we are likely to start resetting the NGL nodes in small batches soon and that will obviously take a day or two due to the number of nodes involved and not wishing to disrupt the chain or finality.
Edit: Just noticed a tweet from Jag so linking here as well: https://twitter.com/Jaguar0625/status/1357725263245762560
Hi @DaveH, can you update us on the state of the testnet right now? It looks like the nodes have recovered.
Is there any news about the issue that caused most nodes to drop, and is there a solution to fix what happened?
Is there anything new about GitHub issues #151-152, and what are the next steps the team will take?
Testnet is now back consistent across almost all nodes
The issues are being investigated and checked still, I’ll update when we have more on them today/tomorrow
Sorry for the question. I’m new to the group. Is there a projection date as to when the snapshot will take place and when the equaling symbol tokens would show up in my mobile symbol wallet?
Neither the snapshot date nor the launch date has been announced. As for seeing the tokens in your mobile Symbol wallet, they should be there after launch, presumably automatically if you have opted in with your mobile app.
If you want to dump go ahead. More cheap coins for me and others hahahahahhahahaaaa
Thanks. Yeah the mobile wallet is confirmed opted in and shows the total of NEM I currently hold. So I should be good. Just curious if there was any projection as to when the launch date was anticipated. Guess I’m starting to get a little frustrated that it’s taking so long and worried that it’s not going to happen. I’m considering offloading my NEM bag which is pretty sizeable (only considering it because I fear if they can’t get this XYM symbol thing figured out, I’m concerned NEM will plummet.) Appreciate you getting back to me.
I am very happy that the Symbol launch was delayed. I would not have been able to complete my goal of 10,000 XEM had not other alts spiked in price first. And I would not have set that goal had not XEM and Symbol appeared to have long term value beyond mere money. I can see clearly that those people who can only count the money they don’t have will never have enough, and that those who do not focus only on money will never go without.
“Happy that I could buy more” is the same as counting the money. This isn’t a talk about money anymore; it’s been 3 years since NEM started trying to launch Catapult. What makes you think the next date is the last one, or that it will happen at all? Just hopes. Do you really think that when the talk is about devs doing their work, a company, a startup, investments, it’s about hopes? Nope, it’s about professionalism, skill, trust. This is the sad and true story of NEM. So @Harpazo_Ready’s concerns are as legitimate as anyone else’s. But yeah, who cares: another week, another “next week” is the current statement. Being frustrated is sadly a normal thing for anyone who follows NEM.
Even the team knows they could always announce a snapshot date now, estimated for as soon as they finish the work, and announce the launch as soon as the work is complete. That way at least the snapshot would have a hard date, with a flexible launch date.
Any change for February Launch?
Whenever these threads start to deviate from their original purpose, I’ll be starting new ones, if you want to discuss snapshot dates, delays etc, probably sensible to use a thread intended for that rather than for technical updates.
It just makes it hard for others to follow who are interested in them
This topic is temporarily closed for at least 4 hours due to a large number of community flags.