Ticket #8172 (closed: fixed)
Nexus file-loading clean-up
Reported by: | Russell Taylor | Owned by: | Federico M Pouzols |
---|---|---|---|
Priority: | major | Milestone: | Release 3.4 |
Component: | Framework | Keywords: | Maintenance, CORE |
Cc: | Blocked By: | ||
Blocking: | #10619 | Tester: | Harry Jeffery |
Description
I noticed a number of things concerning the loading of nexus files (in LoadEventNexus) when working on #8108.
For example:
- If you choose to load monitors, the logs and instrument are loaded again. This seems unnecessary.
- If LoadIDFFromNexus is called as a child algorithm, the nexus file is opened again even though it's already open in the parent.
- ....I'll add more as I remember them
Attachments
Change History
Changed 7 years ago by Russell Taylor
- Attachment DataHandlingTest.LoadEventNexusTestPerformance.testDefaultLoad.runtime.v.revision.png added
comment:1 Changed 7 years ago by Russell Taylor
The performance test suggest that my changes in #8108 slowed LoadEventNexus (for the tested file at least) by about 10%. Looking at that should be a focus of this ticket as well.
comment:2 Changed 7 years ago by Nick Draper
- Status changed from new to assigned
Bulk move to assigned at the introduction of the triage step
comment:3 Changed 6 years ago by Federico M Pouzols
- Owner set to Federico M Pouzols
- Blocking 10619 removed
- Milestone changed from Backlog to Release 3.4
comment:4 Changed 6 years ago by Federico M Pouzols
- Blocking 10619 added
(In #10619) It seems that I removed the 'blocked by' inadvertently.
comment:6 Changed 6 years ago by Federico Montesino Pouzols
- Status changed from assigned to inprogress
avoid loading insturment twice, when loading as workspace2D, re #8172
Changeset: 0495042b9e8797897cd4dc03c02ee385b7fab125
comment:7 Changed 6 years ago by Federico Montesino Pouzols
more about not loading instrument twice, and comments, re #8172
Changeset: 7a6112c60cd89008504304b811434016c11a2cfb
comment:8 Changed 6 years ago by Federico Montesino Pouzols
for now, least disruptive way to avoid loading logs twice, re #8172
Changeset: 7d6b32e9bc1a97fc4feea58e86e83838ffce7b55
comment:9 Changed 6 years ago by Federico M Pouzols
Two issues were identified in this ticket (there are/may be more related issues, see #10619).
- The first issue (logs and instrument are loaded twice in LoadEventNexus if you choose to load monitors) is fixed with the changes done so far in this ticket's branch.
- The second issue (nexus file is opened again if LoadIDFFromNexus is called as a child algorithm) I think could stay as it is for now, and it might be alleviated in further LoadXXXNexus maintenance work. See explanation below.
In principle I would not worry much about the second issue for now. I would expect (and hope) that opening NeXus files is not expensive, because it is done many times for every file that is loaded.
In LoadEventNexus.cpp the nexus file is opened several times. In general, the file name is passed to all (helper) methods that need to read from the file, and the file is re-opened (sometimes in sequence, sometimes nested open calls):
- in LoadBankFromDiskTask::run()
- in LoadEventNexus::setTopEntryName()
- in LoadEventNexus::loadEvents() to load the start_time if not loading logs
- in LoadEventNexus::loadEvents() to load the instrument (loadInstrument() -> runLoadIDFFromNexus())
- in LoadEventNexus::loadEvents() again, now to load the events
- in LoadEventNexus::loadEvents() -> loadEntryMetadata()
- in LoadEventNexus::loadEvents() -> createSpectraMapping() -> loadSpectraMapping()
- in LoadEventNexus::loadEvents() -> loadTimeOfFlight()
- in LoadEventNexus::hasEventMonitors()
The file name is also passed as a property to child algorithms that open the nexus file again:
- LoadNexusMonitors
- LoadNexusLogs
- LoadMonitors
- LoadIDFFromNexus
- LoadInstrument
comment:10 Changed 6 years ago by Federico Montesino Pouzols
make sure to init m_allBanksPulseTimes irre logs are loaded, re #8172
Changeset: 9c04c12ce39ba5dccdb691e74206beda0ec69147
comment:11 Changed 6 years ago by Federico Montesino Pouzols
this if breaks SysTests...LETReduction, investigate, re #8172
Changeset: 01662d244c0fda0e568d585b13e0c6e6f8718874
comment:12 Changed 6 years ago by Martyn Gigg
Strangely the LETReduction test is now fixed on RHEL6 on incremental but not anywhere else - http://builds.mantidproject.org/view/Develop%20Builds/job/develop_systemtests_all/
comment:13 Changed 6 years ago by Federico M Pouzols
The offending if statement was breaking SystemTests.ISISDirectInelastic.LETReduction loading of LET00006278.nxs fails with the following message (monitors not loaded):
LoadEventNexus-[Error] Error while loading monitors as events from file: vector::_M_fill_insert LoadEventNexus-[Notice] LoadEventNexus successful, Duration 6.48 seconds
Which eventually produces this error in the AnalysisTests/ISIS_LETReduction.py system test script:
ValueError: Invalid value for property InputWorkspace2 (MatrixWorkspace) "w1_monitors": Workspace "w1_monitors" was not found in the Analysis Data Service
comment:14 Changed 6 years ago by Federico Montesino Pouzols
re-use instrument and logs for monitors workspace, re #8172
Changeset: 7587f0af6e71513fc92db881a9da8925ae8b5714
comment:15 Changed 6 years ago by Federico M Pouzols
I think this last commit definitely fixes the issue, but I'm not merging into develop for now, until we get clean builds of the system tests.
comment:16 Changed 6 years ago by Federico M Pouzols
Ok, now it's in develop and it seems that nothing has broken.
A simple 'benchmarking' with LET00006278.nxs shows that this change provides a not-that-bad speed-up. With the following settings on: LoadMonitors, MonitorsAsEvents, LoadLogs I get the following times on Debian:
Once the filed is cached:
- old LoadEventNexus (master): ~4.70 s (between 4.68 and 4.73 for 10 repetitions)
- this branch: ~3.97 s (between 3.95 and 3.99 for 10 repetitions)
First load:
- old LoadEventNexus (master): 9.88 s
- this branch: ~9.20 s
comment:17 Changed 6 years ago by Federico Montesino Pouzols
don't re-open the file, as far as possible for now, re #8172
Changeset: 5fd6bc65a0183e5557e7bac4f1d7d56671cd7bc4
comment:18 Changed 6 years ago by Federico M Pouzols
In this last commit I tried to avoid re-opening the NeXus file. As listed in #comment:9, there's plenty of places where the file is being opened again without really needing it (it was generally passed by name, and only in a few method as a reference to NeXus::File object. I added a m_file data member and re-used it as much as possible. There's three places where this was not possible though. The following four methods are static and called from other related algorithms:
- LoadEventNexus::LoadEntryMetadata: used by LoadTOFRawNexus, LoadNexusMonitors
- LoadEventNexus::runLoadIDFFromNexus: used by LoadISISNexus2
- LoadEventNexus::runLoadInstrument: used by LoadTOFRawNexus, LoadPreNexus
- LoadEventNexus::runLoadNexusLogs: used by LoadTOFRawNexus
This new ticket has been created to clarify this: #10975.
This means that a file re-open has been avoided in: setTopEntryName, inside loadEvents (2 or 3 times depending on whether monitors are loaded as events), hasEventMonitors, loadSpectraMapping, and loadtimeOfFlight.
Style note: in 3 methods, a few calls like file.openGroup(...) have been modified to m_file->openGroup(...) Alternatively we could use a reference at the beginning of the methods to keep those lines unmodified. I opted for making the use of the m_file data member more explicit everywhere.
As I'd expect this modification had only marginal performance effects.
- Run time of first load: seems to go down from ~9.20 to ~9.11
- Run time of subsequent loads: virtually same values for 10 repetitions.
comment:19 Changed 6 years ago by Federico M Pouzols
It seems to be working well. Steps to test:
- Double-check unit tests and system tests (especially with '-R Nexus', '-R ISIS'.
- Review code. For an explanation of what was done, please check #comment:9, but note that in the end the multiple file open operations were cut down, as explained in #comment:18.
- In performance tests, if any, this should produce an improvement.
This ticket is related to a few other ones that are in the 'blocking' list of #10619.
comment:20 Changed 6 years ago by Federico M Pouzols
- Status changed from inprogress to verify
- Resolution set to fixed
comment:21 Changed 6 years ago by Harry Jeffery
- Status changed from verify to verifying
- Tester set to Harry Jeffery
comment:22 Changed 6 years ago by Harry Jeffery
- Status changed from verifying to closed
Merge remote-tracking branch 'origin/bugfix/8172_nexus_file_loading_clean_up'
Full changeset: 2b9929870432c6cf51a54ea63213e98f2e16aeca
comment:23 Changed 6 years ago by Federico M Pouzols
- Keywords Maintenance, CORE added; Maintenance removed
Added CORE keyword as suggested by Anders.
comment:24 Changed 6 years ago by Harry Jeffery
Merge remote-tracking branch 'origin/bugfix/8172_nexus_file_loading_clean_up'
Full changeset: 2b9929870432c6cf51a54ea63213e98f2e16aeca
comment:25 Changed 5 years ago by Nick Draper
Somehow these slipped through without a resolution. Set to Fixed.
comment:26 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 9017