Ticket #3118 (closed: fixed)

Opened 9 years ago

Last modified 5 years ago

Speed up CreateWorkspace

Reported by: Russell Taylor
Owned by: Russell Taylor
Priority: major
Milestone: Iteration 29
Component: Mantid
Keywords:
Cc:
Blocked By:
Blocking:
Tester: Janik Zikovsky

Description

From Garrett:

CreateWorkspace is particularly slow when creating large workspaces.

For a workspace with 115712 spectra and 8000 histograms, it is taking roughly 4 hours and 128 GB of RAM.
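For scale, here is a rough estimate of the raw data size. It assumes two things the report does not state: that each value is stored as an 8-byte double, and that "8000 histograms" means about 8000 values per spectrum in each of the X, Y and E arrays.

```python
# Back-of-envelope estimate. The 8-byte double and the three arrays
# (X, Y, E) per spectrum are assumptions, not figures from the report.
N_SPECTRA = 115712
VALUES_PER_SPECTRUM = 8000
BYTES_PER_VALUE = 8   # assumed float64
N_ARRAYS = 3          # assumed X, Y and E


def raw_size_bytes(n_spectra, values_per_spectrum, n_arrays=N_ARRAYS):
    """Bytes needed to hold the data once, with no temporary copies."""
    return n_spectra * values_per_spectrum * BYTES_PER_VALUE * n_arrays


total = raw_size_bytes(N_SPECTRA, VALUES_PER_SPECTRUM)
print(f"{total / 1e9:.1f} GB for one copy of the data")
```

Under those assumptions a single copy of the data is about 22 GB, so memory use of 128 GB would be consistent with several temporary copies being held at once.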

Change History

comment:1 Changed 9 years ago by Russell Taylor

  • Status changed from new to accepted

comment:2 Changed 9 years ago by Russell Taylor

(In [12194]) Reduce vector copying and resizing in CreateWorkspace algorithm. Re #3118.

comment:3 Changed 9 years ago by Russell Taylor

The previous change sped up exec() by 30%.

comment:4 Changed 9 years ago by Russell Taylor

(In [12326]) CreateWorkspace now 65% faster, and also has options to provide common X values only once for all the spectra and to skip the E values. Re #3118.
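The idea behind the two new options can be sketched in plain Python. This is an illustration only, not the Mantid implementation: `split_spectra` and its parameters are hypothetical names, and the real algorithm works on C++ vectors.

```python
def split_spectra(data_y, n_spec, data_x, data_e=None):
    """Split flat Y (and optional E) data into n_spec spectra.

    Hypothetical helper, not the Mantid API. data_x holds the X values
    for a single spectrum and is shared by reference, never copied.
    """
    if n_spec <= 0 or len(data_y) % n_spec != 0:
        raise ValueError("len(data_y) must be a positive multiple of n_spec")
    width = len(data_y) // n_spec
    if len(data_x) not in (width, width + 1):  # point data or bin edges
        raise ValueError("data_x does not match the spectrum length")
    if data_e is not None and len(data_e) != len(data_y):
        raise ValueError("data_e must match data_y in length")

    spectra = []
    for i in range(n_spec):
        lo, hi = i * width, (i + 1) * width
        spectra.append({
            "x": data_x,  # the same object for every spectrum
            "y": data_y[lo:hi],
            # E values are skipped entirely when none are supplied
            "e": data_e[lo:hi] if data_e is not None else None,
        })
    return spectra


x = [0.0, 1.0, 2.0, 3.0]            # bin edges for 3 bins
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # two spectra of 3 values each
ws = split_spectra(y, 2, x)
```

With common X values, the caller passes one spectrum's worth of X data instead of one copy per spectrum, which cuts the input size by roughly a third before any E values are dropped.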

comment:5 Changed 9 years ago by Mathieu Doucet

(In [12331]) Re #3118 Fix PythonAlgorithmTest

comment:6 Changed 9 years ago by Russell Taylor

(In [12390]) Add progress reporting to CreateWorkspace. Re #3118.

comment:7 Changed 9 years ago by Russell Taylor

  • Status changed from accepted to verify
  • Resolution set to fixed

The algorithm itself is much improved, but two problems remain: generating the history (see #3136) and running the algorithm from Python. Both require the (potentially very long) input vector properties to be converted into strings, which is very slow. Realistically, this algorithm is unlikely ever to suit workspaces of the size Garrett was creating, since it requires all data points to be concatenated into one very large contiguous array.

On my machine, creating a 10k x 10k workspace (as I do in the performance test that's been added) takes:

  • 2.3s as a C++ child algorithm (so no history)
  • 215s as a regular C++ algorithm
  • 133s as a Python subalgorithm, with the data created in NumPy arrays and passed in

It takes 81s to make a SEQ-sized workspace as a C++ child algorithm.
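The cost of turning vector properties into strings can be illustrated with a small stand-alone sketch. The comma-separated format and the helper names below are assumptions made for illustration; they are not Mantid's actual property serialization, and the timings will differ from the C++ figures above.

```python
import time


def to_property_string(values):
    """Serialize floats as a comma-separated string (assumed format)."""
    return ",".join(repr(v) for v in values)


def from_property_string(text):
    """Parse the comma-separated string back into floats."""
    return [float(tok) for tok in text.split(",")] if text else []


values = [i * 0.5 for i in range(200_000)]

# Path 1: every value is formatted to text and parsed back again.
start = time.perf_counter()
round_tripped = from_property_string(to_property_string(values))
string_seconds = time.perf_counter() - start

# Path 2: one plain copy, standing in for a binary hand-off.
start = time.perf_counter()
direct = list(values)
copy_seconds = time.perf_counter() - start

print(f"string round trip: {string_seconds:.4f}s, direct copy: {copy_seconds:.4f}s")
```

Both paths produce the same values; the difference is that the string path does per-element formatting and parsing, which is the kind of work the child-algorithm route avoids.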

comment:8 Changed 9 years ago by Janik Zikovsky

  • Status changed from verify to verifying
  • Tester set to Janik Zikovsky

comment:9 Changed 9 years ago by Janik Zikovsky

  • Status changed from verifying to closed

Works, with the caveats indicated by Russell. It takes 17 seconds from a Python script on my machine for a 1000x1000 workspace. Those massive strings are the real problem. In the longer term, we could consider a smarter way to set array properties from Python that uses the NumPy array directly instead of converting it to a string.
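One possible shape for such a direct hand-off is Python's buffer protocol, which NumPy arrays also expose. The sketch below uses the standard-library `array` module so that it runs without NumPy; `as_double_view` is a hypothetical helper, not an existing Mantid function.

```python
from array import array


def as_double_view(buf):
    """Return a zero-copy view of buf as C doubles.

    Hypothetical helper: any object exposing the buffer protocol
    (array.array here) can be viewed this way without being copied.
    """
    view = memoryview(buf)
    if view.format != "d":
        raise TypeError("expected a buffer of C doubles")
    return view


data = array("d", [1.0, 2.0, 3.0])
view = as_double_view(data)
data[1] = 20.0  # a write through the original is visible in the view
```

Because the view shares memory with the source, a property setter built this way would read the values in place, with no string conversion and no intermediate copy.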

comment:10 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 3965.
