Ticket #7521 (closed: fixed)

Opened 7 years ago

Last modified 5 years ago

Speed up the slowest unit tests

Reported by: Nick Draper Owned by: Russell Taylor
Priority: major Milestone: Release 3.2
Component: Framework Keywords: Maintenance
Cc: Blocked By: #7655, #7656, #7707
Blocking: Tester: Wenduo Zhou

Description

Come up with a target list and distribute work out.

Attachments

UnitTestTimes.xlsx (99.2 KB) - added by Russell Taylor 7 years ago.

Change History

comment:1 Changed 7 years ago by Russell Taylor

  • Milestone changed from Backlog to Release 3.0

comment:2 Changed 7 years ago by Russell Taylor

  • Status changed from new to inprogress

comment:3 Changed 7 years ago by Russell Taylor

  • Blocked By 7655 added

comment:4 Changed 7 years ago by Russell Taylor

  • Blocked By 7656 added

Changed 7 years ago by Russell Taylor

comment:5 Changed 7 years ago by Russell Taylor

Sent the following email to mantid-developers:

One of my maintenance tasks this time around is to search for our slowest unit tests and look to get them speeded up. We want our unit tests to run quickly, so that we get quick feedback from the buildservers and so it is not too painful to run them ourselves. We would like none of our unit test suites to take more than a few seconds, and ideally much less. A good aim is something like 100ms per test method.

I’m going to get in and reiterate one of our main ways of doing this early: unit tests should not load real files/data. Please do not add new unit tests that do this. If you are testing a ticket and spot this, reopen it for this to be remedied. (Load* algorithms get a free pass on this one, though should still otherwise be kept fast.)

The attached spreadsheet shows that we are some way from this goal! The spreadsheet contains timings from a selection of builds and gives the wall-clock time for a test suite. In this way it includes any setup & teardown time, which is missing in the timings you will see within the Jenkins reports. In some cases there are wide variations between platforms. My thinking is that this is partly explained by the fact that we run tests in parallel on the buildservers and there could sometimes be contention for resources (e.g. the filesystem) – another reason not to load files in tests!

My proposal is that first time around we try to improve all the tests that took more than 20s in any one case, or more than 10s in three or more cases. That leaves us with the following 18 test suites:

AlgorithmsTest_MedianDetectorTestTest
PythonWorkflowAlgorithmsTest_SANSBeamFluxCorrectionTest
DataHandlingTest_LoadInstrumentTest
AlgorithmsTest_ConvertUnitsTest
AlgorithmsTest_CorrectFlightPathsTest
DataHandlingTest_LoadEventNexusTest
AlgorithmsTest_ApplyTransmissionCorrectionTest
CrystalTest_GoniometerAnglesFromPhiRotationTest
DataHandlingTest_LoadILLTest
CrystalTest_FindUBUsingLatticeParametersTest
APITest_ExperimentInfoTest
PythonWorkflowAlgorithmsTest_EQSANSQ2DTest
AlgorithmsTest_AppendSpectraTest
MDAlgorithmsTest_ConvertToDiffractionMDWorkspaceTest
CrystalTest_PeakHKLErrorsTest
DataHandlingTest_LoadEmptyInstrumentTest
AlgorithmsTest_ConjoinWorkspacesTest
MDEventsTest_MDBoxTest

If you recognize any of these as tests that you wrote, or have worked on, then do volunteer to speed it up! After that people will be volunteered!

As a postscript, one last thing I’ve noticed past the particularly slow tests is that the tests for the Crystal package are pretty uniformly slow. I think this might be because they almost all load something, but if anyone knows of another reason do let me know.

comment:6 Changed 7 years ago by Russell Taylor

  • Blocked By 7707 added

comment:7 Changed 7 years ago by Russell Taylor

  • Milestone changed from Release 3.0 to Backlog

I came to the conclusion that I was looking at the wrong thing by taking the timings from the build server jobs. The issue is that they run tests in parallel (ctest -j) and what you see is that tests can vary greatly in time if they load a file from disk, depending on whether they have to compete for that resource. It is desirable to reduce the number of tests that load from disk, but some legitimately can.

I propose to collect the timings again, from local builds, during the next maintenance period and see if there are any that should be addressed. Note that the really pathological ones shown up in the previous timings have already been fixed.

comment:8 Changed 7 years ago by Russell Taylor

  • Milestone changed from Backlog to Release 3.1

comment:9 Changed 7 years ago by Russell Taylor

  • Milestone changed from Release 3.1 to Backlog

comment:10 Changed 7 years ago by Russell Taylor

  • Milestone changed from Backlog to Release 3.2

comment:11 Changed 6 years ago by Russell Taylor

Did another check for any pathological tests by running the tests on a release build on each platform, not in parallel. The following tests came in the top 10 slowest tests on linux, windows & osx:

  • DirectEnergyConversionTest
  • LoadIsawSpectrumTest
  • LoadEventNexusTest
  • LoadILLTest

I don't think I can do much about the loaders (LoadEventNexusTest was tackled previously in #7656), but I'll try and improve the others before signing off on this ticket.

comment:12 Changed 6 years ago by Russell Taylor

Re #7521. Speed up test 3x by using a smaller IDF.

Changeset: d6f87b5e21742c415718a514f90e8af758dcfd99

comment:13 Changed 6 years ago by Russell Taylor

Re #7521. Remove check that accounts for 60% of test time.

I think we can manage without this as using MERLIN is tested in a system test.

Changeset: 9a24565b66081afe39500b88f56a0b2fd6a48365

comment:14 Changed 6 years ago by Russell Taylor

Re #7521. Speed up test.

This code isn't used at the SNS anyway (we use DGSReduction).

Changeset: 8d35ea2eb9da00631955e5f9683c9065c6d774aa

comment:15 Changed 6 years ago by Russell Taylor

Re #7521. Speed up test 3x by using a smaller IDF.

Changeset: 9bb926f8df05701ac355862cc28b5dd0cc021a63

comment:16 Changed 6 years ago by Russell Taylor

Re #7521. Remove check that accounts for 60% of test time.

I think we can manage without this as using MERLIN is tested in a system test.

Changeset: a20a2b069111ed91bf09fb62113586f50dae78e9

comment:17 Changed 6 years ago by Russell Taylor

Re #7521. Speed up test.

This code isn't used at the SNS anyway (we use DGSReduction).

Changeset: 9d0116c35a77ae2e495b04b41104bf7335f3c071

comment:18 Changed 6 years ago by Russell Taylor

There was a rebase that accounts for the duplicate commits above.

comment:19 Changed 6 years ago by Russell Taylor

  • Status changed from inprogress to verify
  • Resolution set to fixed

That will do for this one I think. The pathological ones have been addressed, mainly in the blocking tickets.

Things still aren't as fast as they could be. The main issue is the old one of touching the file system along with the fact that we run tests in parallel. This means that tests sometimes run fast, but sometimes very slowly if they have to fight with/wait for other tests that are also touching the file system. We have a lot of tests that load data still. The tests in the Crystal package seem to be particularly guilty of this.

comment:20 Changed 6 years ago by Wenduo Zhou

  • Status changed from verify to verifying
  • Tester set to Wenduo Zhou

comment:21 Changed 6 years ago by Wenduo Zhou

The 2 unit tests are sped up significantly. Thus ticket is closed.

comment:22 Changed 6 years ago by Wenduo Zhou

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/bugfix/7521_slow_unit_tests'

Full changeset: ef4ec7a00c2df6facbe0525a32313e4c6c744867

comment:23 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 8366

Note: See TracTickets for help on using tickets.