NEM Symbol 0.9.6.2 Release

I am posting this on behalf of @kaiyzen and the development team to give them space to continue resolving an annoying issue. The intention is to explain where 0.9.6.2 is up to, why it wasn’t released by the 30th June as communicated here, and how we get to the release date.

In order to release, the following high-level steps need to be complete:

  1. 0.9.6.2 Core Code released: complete and released in public on 23rd June 2020, passed testing
  2. REST API code: released on time for testing
  3. SDK - TS/JS: released on time for testing, minor bug fix after
  4. SDK - Java: released 1 day late but testing caught up the time
  5. Testing of 2, 3 & 4 started on Mon 29th and completed on 30th June 2020
  6. End to end testing in an isolated environment using bootstrap completed 30th June

This means that the release passed all testing and all components were complete and ready for release on time. At this point, the release exists in the development/test environment and in private branch(es), waiting for the final sign-off to be released to the publicly visible GitHub with tags.

In parallel to the above steps, a new Testnet was created, the bootstrap and testnet-bootstrap were updated, and the Testnet was running.

The only steps after this point are to release to Testnet and scale the chain with a few nodes, run a quick automation test, and then sign off for release to the public GitHub repo.

However, when the team began adding nodes to the Testnet, the nodes were unable to synchronise, meaning no new nodes could join it. They have been working on this issue most of last night and today. We have held off this update until we had a reasonable idea of progress, to give as full a report as possible.

What has been found is:

  • Running a fresh/clean testnet in the isolated testing environment doesn’t have this issue
  • Copying data from the ‘public’ 0.9.6.2 testnet to the isolated one allows reproduction of the issue

There are two primary hypotheses just now: first, there may be a bug that only presents itself with the specific data on the chain that shows the issue; or second, there is an issue with the Testnet configuration/build.

Next steps will continue tonight and tomorrow. At this stage we expect these to conclude by Monday at the latest; if they complete before then, the release will happen sooner. Either way it will be announced as soon as practical:

  1. Reset the Testnet
  2. Re-run the automation tests
  3. If the above passes as it is expected to, release 0.9.6.2
  4. Continue investigating the issue in the reproduction environment and issue a hot fix if necessary later.
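For anyone who finds it easier to read as logic, the plan above can be sketched roughly as follows. This is only an illustration of the decision flow described in the steps; the names `GateResult` and `release_gate` are made up for this post and are not part of any Symbol tooling.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    release: bool             # should 0.9.6.2 be released now?
    keep_investigating: bool  # step 4 continues whatever the outcome
    reason: str


def release_gate(testnet_reset: bool, automation_passed: bool) -> GateResult:
    """Hypothetical sketch of steps 1-4: release only after a reset
    Testnet passes the automation tests; investigation carries on anyway."""
    if not testnet_reset:
        return GateResult(False, True, "Testnet has not been reset yet")
    if not automation_passed:
        return GateResult(False, True, "automation tests did not pass on the reset Testnet")
    return GateResult(True, True, "automation tests passed; hot fix later if necessary")
```

For example, `release_gate(True, True).release` is `True`, while a failed automation run keeps the release on hold without stopping the investigation.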

In parallel the issue will be investigated to try and identify the root cause.

The reason this approach is being taken is that until we can state categorically what the issue is, there is a chance it could occur on other private chain deployments, for example if it is data related. However, at this stage it appears that a chain either does or does not have the issue, i.e. a chain doesn’t develop it later. This means that if the Testnet is working, then apart from this issue it is ready for release and community use/testing.

We hope that providing the full situation and information is useful for people trying to understand why the release is late. Unfortunately the issue was caught late. The teams are working very hard to track it down and we expect to have an update by Monday, but possibly sooner.


The dev and QA teams have been making good progress on this issue. The root cause was identified as a minor code bug that shows under certain circumstances. It has now been fixed and is running through the testing process.

Progress so far is good, with the fix passing isolated automation testing. A bit more private-testnet-like testing is needed to ensure it is ready to release. This will take place over the next day or two, and the fix will then be considered for a public Testnet release. Further update to follow tomorrow/Wed as makes sense alongside testing progress.


Further minor update on the above: the testing is still passing and focus has shifted to preparing the Testnet for public use, followed by a sanity/automation test on the Testnet, as is always done.

The likely timeline from here (subject to change depending on findings) is that the Testnet will be sorted out today/tomorrow, and automated testing takes almost a day to run. Assuming things go smoothly, the release is looking more likely to be Friday, but may just come forward with some smooth sailing.


The team have just provided an update and the current state of this issue is below. It is all still progressing well:

  1. The issue is fixed (as per earlier update)
  2. It has passed isolated testing (as per earlier update)
  3. Testnet for 0.9.6.2 has been created (as per earlier update)
  4. The automated sanity testing on testnet has now completed and has all passed

It is now ready for release, but there are a few steps to complete to make it usable by the community:

Next Steps

  • Sort out the faucet on testnet
  • Finalise the release image for people to use for building nodes
  • Update the various repos with the fixes and new bootstraps etc
  • Create a release image and tags for GitHub to formally release publicly

The team are progressing well with the above. Depending on how smoothly this progresses through the day (across time zones), it will release either today or Monday. It will be announced on Slack as usual and I will post here when I see it as well.


This has now been successfully released, thanks to a huge effort on the part of some very dedicated people in the dev team.
