Ticket #9388 (assigned)

Opened 6 years ago

Last modified 5 years ago

MDImage Performance Investigation

Reported by: Owen Arnold
Owned by: Owen Arnold
Priority: major
Milestone: Backlog
Component: Framework
Keywords:
Cc:
Blocked By:
Blocking:
Tester:

Description (last modified by Nick Draper)

TESTER: DO NOT MERGE TO MASTER

MDHistoWorkspace provides a great interface for the existing MDWork, but also has great potential for imaging work in the near future.

We are now seeing that finding contiguous blocks of memory for the workspaces is proving very difficult and therefore the usability of this workspace type is limited.

An area of exploration would be to attempt to find some way of breaking the arrays into piecewise contiguous chunks. This is effectively what happens with a Workspace2D, which is very efficient in memory. This is an experimental piece of work. If we find that performance is badly affected, then it will not prove to be a suitable solution.

Here's my suggested approach to this problem:

  • Identify, or write, some performance tests that are heavily dependent on the MDHistoWorkspace performance. ConnectedComponentLabelingTestPerformance and IntegratePeaksUsingClustersPerformanceTest spring to mind.
  • Find a limiting case where we exhaust memory. Again, an example script such as this one http://trac.mantidproject.org/mantid/ticket/9360#comment:4 run several times with the Number of Bins argument to BinMD increased to, say, 800 for all would be a good example.
  • Extract the container types in MDHistoWorkspace to a new type MDHistoWorkspaceContainer. It may be a good idea to template this by the element type. Expose the subscript operator[] so that the consumer usage of these containers will still work.
  • Initially, implement the held container in MDHistoWorkspaceContainer as an array, then change it to a deque and see how the tests above fare. Could also try a piecewise contiguous array, or a vector with a custom allocator.
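
As a rough illustration of the last two steps, here is a minimal sketch of what a chunked MDHistoWorkspaceContainer might look like. This is not existing Mantid code: the chunk size, the constructor signature, and the size() and numChunks() members are assumptions made for the example. The only feature taken from the description above is the templated element type with operator[] exposed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not part of Mantid. Elements live in fixed-size
// chunks, so no single allocation has to span the whole image.
template <typename T> class MDHistoWorkspaceContainer {
public:
  // chunkSize is an assumed tuning parameter (elements per chunk).
  explicit MDHistoWorkspaceContainer(std::size_t size,
                                     std::size_t chunkSize = 1u << 20)
      : m_size(size), m_chunkSize(chunkSize) {
    assert(chunkSize > 0);
    const std::size_t nChunks = (size + chunkSize - 1) / chunkSize;
    m_chunks.reserve(nChunks);
    for (std::size_t i = 0; i < nChunks; ++i) {
      // The final chunk may be shorter than the others.
      const std::size_t remaining = size - i * chunkSize;
      const std::size_t len = remaining < chunkSize ? remaining : chunkSize;
      m_chunks.emplace_back(len, T());
    }
  }

  // Linear index is split into (chunk, offset); callers keep using [].
  T &operator[](std::size_t i) {
    assert(i < m_size);
    return m_chunks[i / m_chunkSize][i % m_chunkSize];
  }
  const T &operator[](std::size_t i) const {
    assert(i < m_size);
    return m_chunks[i / m_chunkSize][i % m_chunkSize];
  }

  std::size_t size() const { return m_size; }
  std::size_t numChunks() const { return m_chunks.size(); }

private:
  std::size_t m_size;
  std::size_t m_chunkSize;
  std::vector<std::vector<T>> m_chunks;
};
```

Because consumers only see operator[], the held storage could be swapped for a std::deque<T> or a std::vector<T> with a custom allocator without touching calling code. The extra division and modulo on every access is the kind of cost the performance tests above would need to measure.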

Change History

comment:1 Changed 6 years ago by Owen Arnold

It appears that all Image types, including IMDHistoWorkspace, fundamentally expose the raw arrays via a pointer to the first element. This forces the data to be served up in a SINGLE contiguous block because otherwise the pointer arithmetic would not work. Not sure what to do here. Options are:

  • Remove the raw access option. This will probably affect performance, but would make the interface better
  • Create an array to return on demand, but this will increase memory usage, not decrease it
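
To make the trade-off between the two options concrete, here is a small sketch assuming the data is held as a vector of chunks. The Chunks alias and the signalAt and flattenOnDemand functions are hypothetical names for illustration, not part of any Mantid interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunked storage: each inner vector is one contiguous piece.
using Chunks = std::vector<std::vector<double>>;

// Option 1 (sketch): replace raw pointer access with an indexed accessor.
// Pointer arithmetic from the first element cannot cross chunk boundaries,
// so every access has to locate its chunk first.
double signalAt(const Chunks &chunks, std::size_t chunkSize, std::size_t i) {
  assert(chunkSize > 0);
  return chunks[i / chunkSize][i % chunkSize];
}

// Option 2 (sketch): build a contiguous copy on demand for callers that
// still need a raw pointer. While the copy is alive, the data is held
// twice, and the copy itself needs one large contiguous allocation.
std::vector<double> flattenOnDemand(const Chunks &chunks) {
  std::size_t total = 0;
  for (const auto &chunk : chunks)
    total += chunk.size();
  std::vector<double> flat;
  flat.reserve(total);
  for (const auto &chunk : chunks)
    flat.insert(flat.end(), chunk.begin(), chunk.end());
  return flat;
}
```

The second function shows why that option works against the goal of the ticket: it reintroduces the single large allocation that chunking was meant to avoid.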

comment:2 Changed 6 years ago by Nick Draper

  • Keywords CORE removed
  • Status changed from new to assigned
  • type changed from enhancement to task
  • Description modified
  • Summary changed from MDImage Performance to MDImage Performance Investigation

After discussion with Owen, this has been converted into an investigation ticket. Therefore CORE removed, converted to task, and NO CODE CHANGES SHOULD BE MERGED TO MASTER.

In a local branch (not published), time the loading, saving and visualization of a dataset. Then remove the raw access option, do a simplistic repair of the affected functions, and perform the timing again.
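
For the before and after timings, a simple wall-clock helper may be enough. The sketch below uses only the C++ standard library. The name timeSeconds is hypothetical, and the operations to time (load, save, visualize) would be supplied by whoever runs the investigation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs the given operation once and returns the
// elapsed wall-clock time in seconds.
double timeSeconds(const std::function<void()> &operation) {
  assert(operation && "operation must be callable");
  const auto start = std::chrono::steady_clock::now();
  operation();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Running each operation several times and comparing the spread, not just a single number, would give a fairer comparison between the two branches.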

This should be timeboxed to 2 days maximum.

The results of this, if positive, may lead to a separate CORE design and implementation ticket.

comment:3 Changed 6 years ago by Nick Draper

  • Milestone changed from Release 3.2 to Backlog

Moved to Backlog at the code freeze of release 3.2

comment:4 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 10231
