Ticket #3118 (closed: fixed)
Speed up CreateWorkspace
Reported by: | Russell Taylor | Owned by: | Russell Taylor
---|---|---|---
Priority: | major | Milestone: | Iteration 29
Component: | Mantid | Keywords: |
Cc: | | Blocked By: |
Blocking: | | Tester: | Janik Zikovsky
Description
From Garrett:
CreateWorkspace is particularly slow when creating large workspaces.
For a workspace with 115712 spectra and 8000 histograms, it takes roughly 4 hours and 128 GB of RAM.
Change History
comment:7 Changed 9 years ago by Russell Taylor
- Status changed from accepted to verify
- Resolution set to fixed
The algorithm itself is much improved, but there are still issues with generating the history (see #3136) and with running it from Python; both require the (potentially very long) input vector properties to be turned into strings, which is very slow. Realistically, this algorithm is never likely to be suitable for workspaces of the size Garrett was creating, since it requires all data points to be concatenated into a single very large contiguous array.
On my machine, creating a 10k x 10k workspace (as I do in the performance test that's been added) takes:
- 2.3s as a C++ child algorithm (so no history)
- 215s as a regular C++ algorithm
- 133s as a Python subalgorithm creating the data in numpy arrays and passing those in
It takes 81s to make a SEQ-sized workspace as a C++ child algorithm.
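The string round-trip that makes the non-child-algorithm paths slow can be sketched outside Mantid. This is a toy illustration, not Mantid code: every element of a long vector property is formatted to text and concatenated, so both time and memory scale with the number of data points.

```python
# Toy illustration (NOT Mantid code) of why passing long vector
# properties as strings is slow: every value must be formatted
# and concatenated into one huge contiguous string.
n = 1_000_000  # scaled down; a 10k x 10k workspace holds 1e8 points
data = [1.5] * n

# The string-property path: serialise the whole vector to text.
as_string = ",".join(map(str, data))

# Even for this toy input the string is ~4 MB of text, which then
# still has to be re-parsed on the receiving side.
print(len(as_string))  # 3_999_999 characters
```

At the real 1e8-point scale the string alone would be hundreds of megabytes, before any parsing cost.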
comment:8 Changed 9 years ago by Janik Zikovsky
- Status changed from verify to verifying
- Tester set to Janik Zikovsky
comment:9 Changed 9 years ago by Janik Zikovsky
- Status changed from verifying to closed
Works with the caveats indicated by Russell. A 1000x1000 workspace takes 17 seconds from a Python script on my machine. Those massive strings are the real problem. In the longer term, a smarter way to set array properties from Python, one that uses the NumPy array directly instead of converting it to a string, could be considered.
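Janik's suggestion of using the array's memory directly rather than a string can be sketched with the standard library. The names below are illustrative only, not the Mantid property API; the point is the contrast between a text round-trip and a zero-copy buffer handover.

```python
from array import array

# 100k doubles, standing in for one long array property.
data = array("d", [1.5] * 100_000)

# Current slow path: format every value to text, then parse it all back.
as_string = ",".join(map(str, data))
round_tripped = array("d", (float(x) for x in as_string.split(",")))
assert round_tripped == data

# Suggested fast path: hand over the underlying buffer directly,
# with no formatting or parsing at all (zero copy).
view = memoryview(data)
assert view.nbytes == 8 * len(data)  # raw doubles, 8 bytes each
```

The buffer handover is constant-time regardless of array length, which is why a NumPy-aware property setter would sidestep the string bottleneck entirely.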
comment:10 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to GitHub issue 3965.