Ticket #10385 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Profile LoadNexus Slow for group workspaces

Reported by: Owen Arnold Owned by: Owen Arnold
Priority: major Milestone: Release 3.3
Component: Reflectometry Keywords:
Cc: Blocked By: #9848
Blocking: #10500 Tester: Pete Peterson

Description

Saving of large group workspaces is much faster now. Load times are still poor. Profiling should reveal why.

Attachments

Screen Shot 2014-10-20 at 13.09.48.png (174.5 KB) - added by Owen Arnold 6 years ago.
Screen Shot 2014-10-23 at 16.54.01.png (120.2 KB) - added by Owen Arnold 6 years ago.
updated profiler results
profiletimings.py (357 bytes) - added by Owen Arnold 6 years ago.

Change History

comment:1 Changed 6 years ago by Owen Arnold

  • Status changed from new to assigned

comment:2 Changed 6 years ago by Owen Arnold

Might be interesting to use the OSX instrument tools given that I know the load times are really bad on that platform. INTER00027729 is an offending run.

  • Load INTER00027729.raw (fast)
  • Save it via SaveNexus (relatively fast)
  • Load it LoadNexus (really slow)
Last edited 6 years ago by Owen Arnold (previous) (diff)

comment:3 Changed 6 years ago by Owen Arnold

  • Status changed from assigned to inprogress
  • Summary changed from Profile LoadISISNexus2 Slow for group workspaces to Profile LoadNexus Slow for group workspaces

Changed 6 years ago by Owen Arnold

comment:4 Changed 6 years ago by Owen Arnold

Attached profiler results for Load of INTER00027731.raw, saved via SaveNexus and then loaded.

comment:5 Changed 6 years ago by Owen Arnold

As a course-grain timing comparator, LoadNexus on INTER 27731 currently takes 5.94 to load all 51 periods on a release build on my macbook pro

comment:6 Changed 6 years ago by Owen Arnold

refs #10385. Extract spectrum info to object.

This is the first step towards reusing the same spectrum information rather than fetching it renewed each time. One caveat with this is that you will HAVE to be sure that all workspaces in the group are equivalent in terms of instrument detector mappings etc. What i intend to do next is:

1) For workspaces that look to relate to multiperiod group workspaces, default to reuse the spectrum-detectector mappings, once obtained on a lazy basis. 2) Put a boolean property with value in place (which defaults to true) called "PredictiveInstrumentLoading". If false, we would go back to old behaviour.

Changeset: 1978e1484b60c48538484ca46450103e829bbb7c

comment:7 Changed 6 years ago by Owen Arnold

refs #10385. Naming issue root problem.

There is a naming issue here, which is actually the root cause of the problem! The query that was being performed retrieved the number of workspace entries, NOT the number of periods. Going forward. If the number of periods exactly matches the number of workspace entries, we have a multiperiod group workspace, and can happily apply the optimisations we want.

Next step will be to actually extract the nperiods. To do that we need to try to fetch it out of the logs for The first entry.

Changeset: fdf6281af77bfb069315c3fae6937bad26e09bab

comment:8 Changed 6 years ago by Owen Arnold

refs #10385. Query default property.

Changeset: d0ba5d5f93dfb7b1c20474eece8952f88099963b

comment:9 Changed 6 years ago by Owen Arnold

refs #10385. Multiperiod query.

Changeset: 8a0c59958fdc0b0f05ecfb616f273b34c6764aa9

comment:10 Changed 6 years ago by Owen Arnold

refs #10385. Reuse the SpecInfo on multiperiod groups.

Changeset: 61959e5705be8423f54eb8a77034d8e739880dfb

comment:11 Changed 6 years ago by Owen Arnold

As a course-grain timing comparator, LoadNexus on INTER 27731 currently NOW takes 5.10 to load all 51 periods on a release build on my macbook pro. I'll need to tackle the other big areas highlighed by the profiler in order to achieve better results, I suspect I've moved the bottleneck now. I'm going to re-profile to see what effect this sort of strategy has had.

Changed 6 years ago by Owen Arnold

updated profiler results

comment:12 Changed 6 years ago by Owen Arnold

Updated profiler results, it now looks like the major problem has shifted to the number of calls to NXDataSet::open (see cycle estimation results). We now make +1 calls to loadEntry, so in my profiler example, we have 468 calls to the offending function, that's 9 calls per workspace entry. My next step will be to tackle this and reduce the number or redundant calls. Noting the success above, the multiperiod-only caching technique seems to be a good route to do this.

comment:13 Changed 6 years ago by Owen Arnold

My next step is going to be to try the following:

  1. For multiperiod runs, copy process the first load as per normal, then, clone the workspace each time, set dataY, dataE and load the logs only.
  2. Check the above as a proof of concept
  3. Refactor the above
  4. Add a property which would default to the old behaviour. Set this property to be false by default.

comment:14 Changed 6 years ago by Owen Arnold

refs #10385. Try to use clone and write technique.

Changeset: 9a677316ce4294e96ec8a2c793c9d9090dd13be8

comment:15 Changed 6 years ago by Owen Arnold

refs #10385. Test file is twice as fast now.

There are lots of TODOs still to resolve. We also need to add tests for this sort of multiperiod loading.

Changeset: ccc1c235df1cb2d613326730fd9c0421f0751754

comment:16 Changed 6 years ago by Owen Arnold

refs #10385. Add failing test.

We are not currently handling log extraction properly, which is why the test is failing.

File size is just over the threshold, but shouldn't be a problem. I've tried to trim it down as much as possible.

Changeset: 757b0046c62a550e10f005d04fa2284fa4f18286

comment:17 Changed 6 years ago by Owen Arnold

refs #10385. Remove log values

This fixes the regression test. We clear the logs off the temporary workspace, that way we don't have to do a similar thing in the loop, and we avoid copying them over each time. This is not a fully formed solution, but it's a start.

Changeset: ab1da07fcd610910084b5c5e6ee641efddba9322

comment:18 Changed 6 years ago by Owen Arnold

refs #10385. Reporting added and extra property.

Also better robustness around workspace type.

Changeset: 8e03062228667837665c57b4ef45c5f95f82ef35

comment:19 Changed 6 years ago by Owen Arnold

refs #10385. Extract spectrum info to object.

This is the first step towards reusing the same spectrum information rather than fetching it renewed each time. One caveat with this is that you will HAVE to be sure that all workspaces in the group are equivalent in terms of instrument detector mappings etc. What i intend to do next is:

1) For workspaces that look to relate to multiperiod group workspaces, default to reuse the spectrum-detectector mappings, once obtained on a lazy basis. 2) Put a boolean property with value in place (which defaults to true) called "PredictiveInstrumentLoading". If false, we would go back to old behaviour.

Changeset: 1978e1484b60c48538484ca46450103e829bbb7c

comment:20 Changed 6 years ago by Owen Arnold

refs #10385. Naming issue root problem.

There is a naming issue here, which is actually the root cause of the problem! The query that was being performed retrieved the number of workspace entries, NOT the number of periods. Going forward. If the number of periods exactly matches the number of workspace entries, we have a multiperiod group workspace, and can happily apply the optimisations we want.

Next step will be to actually extract the nperiods. To do that we need to try to fetch it out of the logs for The first entry.

Changeset: fdf6281af77bfb069315c3fae6937bad26e09bab

comment:21 Changed 6 years ago by Owen Arnold

refs #10385. Query default property.

Changeset: d0ba5d5f93dfb7b1c20474eece8952f88099963b

comment:22 Changed 6 years ago by Owen Arnold

refs #10385. Multiperiod query.

Changeset: 8a0c59958fdc0b0f05ecfb616f273b34c6764aa9

comment:23 Changed 6 years ago by Owen Arnold

refs #10385. Reuse the SpecInfo on multiperiod groups.

Changeset: 61959e5705be8423f54eb8a77034d8e739880dfb

comment:24 Changed 6 years ago by Owen Arnold

refs #10385. Try to use clone and write technique.

Changeset: 9a677316ce4294e96ec8a2c793c9d9090dd13be8

comment:25 Changed 6 years ago by Owen Arnold

refs #10385. Test file is twice as fast now.

There are lots of TODOs still to resolve. We also need to add tests for this sort of multiperiod loading.

Changeset: ccc1c235df1cb2d613326730fd9c0421f0751754

comment:26 Changed 6 years ago by Owen Arnold

refs #10385. Add failing test.

We are not currently handling log extraction properly, which is why the test is failing.

File size is just over the threshold, but shouldn't be a problem. I've tried to trim it down as much as possible.

Changeset: 757b0046c62a550e10f005d04fa2284fa4f18286

comment:27 Changed 6 years ago by Owen Arnold

refs #10385. Remove log values

This fixes the regression test. We clear the logs off the temporary workspace, that way we don't have to do a similar thing in the loop, and we avoid copying them over each time. This is not a fully formed solution, but it's a start.

Changeset: ab1da07fcd610910084b5c5e6ee641efddba9322

comment:28 Changed 6 years ago by Owen Arnold

refs #10385. Reporting added and extra property.

Also better robustness around workspace type.

Changeset: 8e03062228667837665c57b4ef45c5f95f82ef35

comment:29 Changed 6 years ago by Owen Arnold

refs #10385 Refactor new functionality to method.

Should read better now, although the entire algorithm is still a mess.

Changeset: aba603d5e0bef0c6a3c3babfef7edc0f46e955d8

comment:30 Changed 6 years ago by Owen Arnold

refs #10385. Remove instrument info caching

This does result in a speed up, but the caching is being bypassed at the moment since our strategy is to clone and overwrite. I've removed the caching code as part of the clean up, but I've kept the SpecInfo type around as it makes the usage clearer, and does not result in a performance penalty.

Changeset: d88bbe4abf5cdb342d58b79897e75486c14d9ce7

comment:31 Changed 6 years ago by Owen Arnold

refs #10385. Minor changes.

Changeset: 5b18d28967654bbbfa5f76fd32b1a2245977b7b1

comment:32 Changed 6 years ago by Owen Arnold

refs #10385. Put error identifer code in place.

Changeset: fb70f730ed01cb46d4a4e064fb989b95ce7e6aed

comment:33 Changed 6 years ago by Owen Arnold

refs #10385 Extend tests.

Extend tests. Do not process spectrum lists.

Changeset: ce10e438130d7a0d0aa3684f6ed5c90b9d9b5d0f

comment:34 Changed 6 years ago by Owen Arnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #56.

comment:35 Changed 6 years ago by Owen Arnold

refs #10385. Fix warnings.

Changeset: e084e3b8da4a930d627b8561edfde75d0a401df6

comment:36 Changed 6 years ago by Owen Arnold

Tester:

1) The most important thing to verify is that the load/save unit tests are passing and that the system tests loadlotsoffiles have passed with these changes.

2) Secondly, I removed a nxs data open which will affect workspaces with fractional area. So try, running http://docs.mantidproject.org/nightly/algorithms/SofQW3-v1.html, saving the results using SaveNexus, and reloading them using LoadProcessedNexus. The loaded workspaces should be have the fractional area information.

3) The attached python script would run slowly prior to these changes. It should now report faster load times.

Last edited 6 years ago by Owen Arnold (previous) (diff)

comment:37 Changed 6 years ago by Owen Arnold

refs #10385. Fix cpp check issue.

Changeset: 679c839829669c9252c7d0ea73caef984f968287

Changed 6 years ago by Owen Arnold

comment:38 Changed 6 years ago by Owen Arnold

  • Blocking 10500 added

comment:39 Changed 6 years ago by Pete Peterson

  • Status changed from verify to verifying
  • Tester set to Pete Peterson

comment:40 Changed 6 years ago by Pete Peterson

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/feature/10385_loadnexus_speed'

Full changeset: a8fe5a33c2cfb8321d62a8c15b09e2170cae1d52

comment:41 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 11227

Note: See TracTickets for help on using tickets.