Ticket #3586 (closed: fixed)

Opened 9 years ago

Last modified 5 years ago

LoadSQW: optimize for very large files

Reported by: Janik Zikovsky
Owned by: Janik Zikovsky
Priority: critical
Milestone: Release 2.0
Component: Mantid
Keywords:
Cc: owen.arnold@…
Blocked By:
Blocking:
Tester: Owen Arnold

Description

Need to be able to load the large SQW files from Toby into an MDEventWorkspace (presumably a file-backed one).

Change History

comment:1 Changed 9 years ago by Owen Arnold

The 20 GB and 40 GB Fe files contain visible spin-waves when rebinned. We should also ensure that the MDEW rebinning algorithm can work effectively on this volume of data. Janik may want to split this into a separate ticket.

comment:2 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 30 to Iteration 31

Bulk move of tickets to iteration 31 at the iteration 30 code freeze

comment:3 Changed 9 years ago by Janik Zikovsky

  • Status changed from new to accepted

comment:4 Changed 9 years ago by Janik Zikovsky

In [14851]:

Refs #3586: LoadSQW works a lot better for SQW files that are large but still fit in memory. 6 GB SQW file loads in about 2 minutes.

comment:5 Changed 9 years ago by Janik Zikovsky

In [14855]:

Refs #3586: Version of LoadSQW that can load directly to a file-backed MDworkspace

comment:6 Changed 9 years ago by Janik Zikovsky

In [14856]:

Refs #3586: Refresh cache

comment:7 Changed 9 years ago by Janik Zikovsky

In [14860]:

Refs #3586: Parallelized part of the SQW loading, tweak to make loading to a file back-end significantly faster (~ twice the speed)

comment:8 Changed 9 years ago by Janik Zikovsky

In [14862]:

Refs #3586

comment:9 Changed 9 years ago by Janik Zikovsky

  • Status changed from accepted to verify
  • Resolution set to fixed

My latest test converted a 60 GB SQW file (fe_E1400_8K.sqw) into an 81 GB .nxs file with 1.76 billion events in 66 minutes. This is a good improvement over the previous time of ~190 minutes! The output workspace had 11 million boxes. Memory use was around 4 GB during this process.

This performance is in line with (though a bit slower than) that of the MergeMD algorithm, which processed 1 billion events in 25 minutes. Keep in mind that there is about 140 GB of disk IO, which would take ~25 minutes on its own, so while the algorithm is not quite IO-limited, it is within a factor of 2-3x of the IO limit.
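
For anyone wanting to check the arithmetic behind these figures, here is a rough sketch in Python. It only uses the numbers quoted in this comment; the function name and dictionary keys are made up for illustration, and GB is assumed to mean 1e9 bytes. It is not part of LoadSQW or Mantid.

```python
# Figures quoted in this comment. GB is taken as 1e9 bytes (assumption).
SQW_INPUT_GB = 60        # size of fe_E1400_8K.sqw
NXS_OUTPUT_GB = 81       # size of the resulting .nxs file
EVENTS = 1.76e9          # events written to the output workspace
LOAD_MINUTES = 66        # latest LoadSQW run
PREVIOUS_MINUTES = 190   # earlier LoadSQW run
IO_ONLY_MINUTES = 25     # estimate quoted for the raw disk IO alone

MERGEMD_EVENTS = 1.0e9   # MergeMD comparison figures
MERGEMD_MINUTES = 25


def summarize():
    """Derive throughput and ratios from the quoted figures (hypothetical helper)."""
    total_io_gb = SQW_INPUT_GB + NXS_OUTPUT_GB  # read + write, about 140 GB
    return {
        "total_io_gb": total_io_gb,
        # Disk bandwidth implied by the "~25 minutes for IO alone" estimate.
        "implied_disk_mb_per_s": total_io_gb * 1000 / (IO_ONLY_MINUTES * 60),
        # Bandwidth actually achieved over the 66 minute run.
        "achieved_mb_per_s": total_io_gb * 1000 / (LOAD_MINUTES * 60),
        # How far the run is from the IO-only estimate.
        "factor_over_io_limit": LOAD_MINUTES / IO_ONLY_MINUTES,
        "speedup_vs_previous": PREVIOUS_MINUTES / LOAD_MINUTES,
        "loadsqw_events_per_min": EVENTS / LOAD_MINUTES,
        "mergemd_events_per_min": MERGEMD_EVENTS / MERGEMD_MINUTES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

The factor over the IO-only estimate comes out at about 2.6, which is where the "2-3x" above comes from, and the event rate is roughly two thirds of the MergeMD rate.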

comment:10 Changed 9 years ago by Janik Zikovsky

In [14889]:

Refs #3586: Make Loading large files faster in paraview

comment:11 Changed 9 years ago by Owen Arnold

  • Status changed from verify to verifying
  • Tester set to Owen Arnold

comment:12 Changed 9 years ago by Owen Arnold

  • Status changed from verifying to closed

Works as expected. A test run loaded a 30 GB iron dataset in 3135 seconds. Acceptable given that SQW is soon to become a redundant format anyway.

comment:13 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 4433
