Ticket #10187 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Proper group processing behaviour in SaveNexus and SaveProcessedNexus

Reported by: Owen Arnold Owned by: Owen Arnold
Priority: critical Milestone: Release 3.3
Component: Framework Keywords:
Cc: Blocked By:
Blocking: #9848, #10188 Tester: mareuter

Description

There is a lot of wasted effort as part of this algorithm, because it fails to recognise the speedup that can be achieved when you write the file in one-shot. see #9848 for preliminary investigation. This is hitting GroupWorkspace IO hard currently.

  • The end result file should be the same
  • There is no change to the format
  • Everything should work as it does, now, it should just be faster.

Change History

comment:1 Changed 6 years ago by Owen Arnold

  • Blocking 10188 added

comment:2 Changed 6 years ago by Owen Arnold

Pointers for code implementation

We want to override processGroups and open the file once at the start, and close it once at the end, once we've finished.

Line 232 in SaveNexusProcess.cpp, nexusFile-openNexusWrite() results in an m_handle to the file being generated. We need to find a way to keep hold of this without having to call openNexusWrite to regenerate it. Noe that openNexusWrite does things relating to creating the top level group for each workspace. We do still need to do this, so we probably need to refactor openNexusWrite to spit out the functionality.

The close-out call is line 325. in SaveNexusProcessed.cpp. This should sit at the end of processGroups. The history writing just above that can also probably be delayed untill the end of processGroups.

This algorithm will still need to work for non-group cases, so best to refactor to methods where possible.

comment:3 Changed 6 years ago by Owen Arnold

  • Status changed from new to assigned

comment:4 Changed 6 years ago by Owen Arnold

  • Status changed from assigned to inprogress

refs #10187. Start refactoring processing logic

Make the same methods available for group processing as well as individual workspace processing.

Changeset: 6a87f37dfd31b3eaf9b1c6b83b4ef84695a0143a

comment:5 Changed 6 years ago by Owen Arnold

refs #10187. Start modifying for processgroup overload.

Changeset: 193339bde58418cbdeec9c859413c67d30d01a3f

comment:6 Changed 6 years ago by Owen Arnold

refs #10187. Not fixed yet.

Looks like there's an issue with the m_filehandler not being initialized properly for some reason!

Changeset: 803b3dc67dc6706f8bc8722ef27e705647776484

comment:7 Changed 6 years ago by Owen Arnold

refs #10187. Need to open the group first.

Changeset: cb0fc9d792904be2d5f1cb188fad9e9f87390faf

comment:8 Changed 6 years ago by Owen Arnold

x = [0,1,2,3]
names = str()
for i in range(0,3):
        name = 'ws_'+str(i)
        CreateWorkspace(DataX=x, DataY=[i,i,i], OutputWorkspace=name)
        names += name + ", "
p = GroupWorkspaces(InputWorkspaces=names)

comment:9 Changed 6 years ago by Owen Arnold

refs #10187. Fix SaveNexus.

Changeset: 3f8ab82017f3df36e1b185f2b67e75a4ffbc39b9

comment:10 Changed 6 years ago by Owen Arnold

refs #10187. Bug in test code.

Changeset: 3348bfb264168dbeb620b1fd1845d08b00d1fdad

comment:11 Changed 6 years ago by OwenArnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #37.

comment:12 Changed 6 years ago by Owen Arnold

refs #10187. History tracking fix.

The history tracking object must be initialized whether we are processing groups or not.

Changeset: 432d6f1652125c76fda275ffa6e3cf9ee8d23dc7

comment:13 Changed 6 years ago by Owen Arnold

refs #10187. Old gcc compiler complains about init order.

Changeset: b00c39a33ab01967149b3ba73f9c286544378e73

comment:14 Changed 6 years ago by Owen Arnold

Revert "refs #10187. Old gcc compiler complains about init order."

This reverts commit b00c39a33ab01967149b3ba73f9c286544378e73.

Changeset: 46e4eefd80c42659d67ae51b8e48f9ba40201717

comment:15 Changed 6 years ago by Owen Arnold

refs #10187. Compiler complains about init order.

Changeset: dec7b803a53342964e1c094a09a47a77f5e2236f

comment:16 Changed 6 years ago by Owen Arnold

refs #10187. More init order warning fixes.

Changeset: ad5c5e5d89bf36e2f3453cd51f4c32dd15ca49e0

comment:17 Changed 6 years ago by Owen Arnold

refs #10187. Sort out all ordering.

Changeset: b675a8faa8bc1f81a1349079348756777905b88a

comment:18 Changed 6 years ago by Owen Arnold

refs #10187. Always close group.

A few issues fixed here. Firstly. I found that the algorithm was failing when thrown a very large group workspace. This turned out to be because we needed to close the group within the doExec call. A group i always opened, so it is necessary to close it.

The second thing done is to fix SaveNexus so that it will pass through group workspaces intact to SaveNexusProcessed. Under the previous mechanism, this would not have happened, because the generic Algorithm group processing code would have exploded out the individual workspaces.

Changeset: 98f24b5007ce725ed0b03f87d787063a9f5cd86b

comment:19 Changed 6 years ago by Owen Arnold

I did some rough timings of the new code vs the old code, and they are roughly similar. The noticeable affect is that as more entries are written, and the file gets larger the processing time per group entry gets significantly larger.

I decided to profile the code to find the exact bottle neck. 94% of the processing time (for a moderate sized group workspace of 50 entries) is tied up in findMantidWSEntries. This is called from openNexusWrite, which is called by doExec, which is itself called for every group entry via processGroup. Therefore findMantidWSEntries is called the same number of times in the old code and the new code. This explains why the runtimes are the same.

The next area of attack is to reduce the number of calls to findMantidWSEntries. We know that if we are not appending, that we can formulate the group names ourselves. Since all that findMantidWSEntries is being used for is to determine the current number of workspace entries.

comment:20 Changed 6 years ago by Owen Arnold

refs #10187. Delete existing file if no append.

Changeset: 41200f2e8e4d74a76f2b87001c9e913790b21b98

comment:21 Changed 6 years ago by Owen Arnold

refs #10187. File var not initialized fix.

Changeset: 7bbec526b659b0291e37cddf0ace1d83037df4be

comment:22 Changed 6 years ago by Owen Arnold

refs #10187. stop calling searching for groups following POGO.

Use an optional entry number parameter instead of counting workspace group entries in the file. This was the main bottleneck acording to the previous POGO report. Seems to have generated a massive speedup. Can save as WG of size 512 in < 8 seconds rather than 138 seconds as previously.

Changeset: 9157ac1315bf77a002f7faf94ae983481c250209

comment:23 Changed 6 years ago by Owen Arnold

Tester:

The most important things are that the unit tests and system tests are still running OK. I haven't been able to refactor the existing code as much as I would have liked given time constraints, but the changes should be robust.

  1. Obtain the zip file here with test data http://trac.mantidproject.org/mantid/attachment/ticket/9848/multiperiod_crash.zip
  2. Take the file INTER00027731.raw. Load it into Mantid, and save it both using SaveNexus and SaveNexusProcessed. It should complete in less than 1 second in both cases. Load the nexus files back in, and check that they are the same as the original workspaces saved out.

For stress testing:

  1. Take the file INTER00027729.raw and load it. It contains 512 periods. If you have the last release of Mantid handy, running SaveNexus on this took well over two minutes (138 seconds on my machine). With the new changes it should save out in less than 10 seconds.

comment:24 Changed 6 years ago by Karl Palmen

  • Status changed from verify to verifying
  • Tester set to Karl Palmen

comment:25 Changed 6 years ago by Karl Palmen

I cannot download the zip file. Can you please E-mail it to me?

comment:26 Changed 6 years ago by mareuter

  • Tester changed from Karl Palmen to mareuter

comment:27 Changed 6 years ago by Owen Arnold

refs #10187. Report of group processing success.

Group workspaces don't indicate via the logs that they have finished processing. Normal execution does. I've fixed that here. I've also added a message about the fact that group processing completion has been done so that timings can be seen to relate to the overall processing. This will help where the native processGroups is used, and algorithm execution logs are reported in addition.

Changeset: 0a8b9e304f332dc907a2aeb9781ebfb69f6dee2f

comment:28 Changed 6 years ago by Owen Arnold

Revert "refs #10187. Report of group processing success."

This reverts commit 0a8b9e304f332dc907a2aeb9781ebfb69f6dee2f.

Changeset: a6952dc644a43e2a23200bf20cc96d7115f553a4

comment:29 Changed 6 years ago by Owen Arnold

Revert "Revert "refs #10187. Report of group processing success.""

This reverts commit a6952dc644a43e2a23200bf20cc96d7115f553a4.

Changeset: 754634edca8eed142da46aa2a862a276428319b1

comment:30 Changed 6 years ago by Owen Arnold

refs #10187. Return result.

Changeset: 1517a7579a7a7869e60d2f926187b500d815620a

comment:31 Changed 6 years ago by Owen Arnold

  • Status changed from verifying to closed

Merge branch 'master' into feature/10187_process_groups

Full changeset: 565381eabfc9a0fc19da59b120c2d5364f37ece0

comment:32 Changed 6 years ago by Owen Arnold

Merge branch 'feature/10187_process_groups' of github.com:mantidproject/mantid into feature/10187_process_groups

Full changeset: 045c1b30c5660ae39bc804a3fd3bf439602067c9

comment:33 Changed 6 years ago by Michael Reuter

Merge pull request #37 from mantidproject/feature/10187_process_groups

Feature/10187 process groups

Full changeset: 023e4e5a312aeaf58b1a04ad4593200da24eea8f

comment:34 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 11029

Note: See TracTickets for help on using tickets.