Ticket #9215 (closed: fixed)
Speed up data loading
Reported by: | Arturs Bekasovs | Owned by: | Raquel Alvarez Banos |
---|---|---|---|
Priority: | major | Milestone: | Backlog |
Component: | Muon | Keywords: | ALC |
Cc: | | Blocked By: | #9213, #11382 |
Blocking: | #11319 | Tester: | Karl Palmen |
Description (last modified by Arturs Bekasovs) (diff)
Data loading and integration take a very long time at the moment, which makes it painful to experiment with integration parameters or to change the set of runs. This ticket is for looking into that problem.
One suggested solution is to keep some of the loaded raw data in memory and re-use it for integration. Investigate how feasible this is, given the number of files we usually deal with.
Another thing to investigate: making PlotAsymmetryByLogValue use multiple threads. Loading and integrating each file is an independent operation and probably spends quite some time waiting for IO, so this should improve performance considerably on multi-core machines.
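To illustrate the multi-threading idea, here is a minimal Python sketch, not the actual PlotAsymmetryByLogValue code. The names `load_run`, `integrate`, `process_run` and `process_runs` are hypothetical stand-ins, and the "file" contents are made-up numbers. It only shows how independent per-run load-and-integrate tasks could be handed to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def load_run(run_number):
    """Hypothetical stand-in for reading one run file; returns fake counts."""
    return [float(run_number + i) for i in range(10)]


def integrate(counts, first_bin, last_bin):
    """Sum the counts in [first_bin, last_bin), standing in for the integration."""
    return sum(counts[first_bin:last_bin])


def process_run(run_number, first_bin, last_bin):
    """Load one run and integrate it; each call is independent of the others."""
    return run_number, integrate(load_run(run_number), first_bin, last_bin)


def process_runs(run_numbers, first_bin, last_bin, workers=4):
    """Process every run on a thread pool and return {run_number: integral}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda run: process_run(run, first_bin, last_bin), run_numbers
        )
        return dict(results)
```

Threads are a reasonable fit here only under the assumption stated in the description, namely that each task spends much of its time waiting for IO and that the file reader is safe to call from several threads.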
Attachments
Change History
comment:2 Changed 6 years ago by Arturs Bekasovs
- Keywords ALC added
- Description modified (diff)
- Summary changed from [ALC] Reduce loading times when playing with parameters to Reduce loading times when playing with parameters
comment:3 Changed 6 years ago by Arturs Bekasovs
- type changed from enhancement to task
- Description modified (diff)
- Summary changed from Reduce loading times when playing with parameters to Speed up data loading
comment:6 Changed 6 years ago by Anders Markvardsen
- Owner changed from Arturs Bekasovs to Anders Markvardsen
comment:7 Changed 6 years ago by Anders Markvardsen
- Owner changed from Anders Markvardsen to Karl Palmen
comment:8 Changed 6 years ago by Anders Markvardsen
- Owner changed from Karl Palmen to Raquel Alvarez Banos
comment:9 Changed 6 years ago by Raquel Alvarez Banos
I see different issues to address here:
1. Parallelize data loading
2. Parallelize asymmetry calculation
3. Add intelligence so that only new datasets are loaded (this is ticket #6931)
All three are related to PlotAsymmetryByLogValue and require some variables to be declared static so that their values persist from one call to the next. The first two (data loading and asymmetry calculation) currently sit in the same for loop, so they should be split into separate loops if we want the user to be able to play with integration limits without reloading all the datasets every time. I will create a new ticket for this task, which will be blocked by the current ticket and will block #6931.
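As a rough illustration of the split described above, the following Python sketch separates the loading loop from the integration loop and keeps loaded runs between calls. It is an assumption-laden toy, not the algorithm's code: `_loaded_runs` plays the role of the static variables, and `load_run`, `load_calls` and `integrate_runs` are hypothetical names with made-up data.

```python
# Module-level cache that survives between calls (the "static" state).
_loaded_runs = {}

# Records which runs were actually read, purely for illustration.
load_calls = []


def load_run(run_number):
    """Hypothetical stand-in for the expensive file read."""
    load_calls.append(run_number)
    return [float(run_number)] * 8


def integrate_runs(run_numbers, first_bin, last_bin):
    """Return {run_number: integral}, loading only runs not seen before."""
    # Loop 1: loading. Runs already in the cache are skipped.
    for run in run_numbers:
        if run not in _loaded_runs:
            _loaded_runs[run] = load_run(run)

    # Loop 2: integration. Always redone, since the limits may have changed.
    return {
        run: sum(_loaded_runs[run][first_bin:last_bin]) for run in run_numbers
    }
```

Calling `integrate_runs` a second time with different limits, or with one extra run, reads only the runs that are not yet cached. The cost is memory, which is why the description asks how feasible this is for the usual number of files.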
comment:11 Changed 6 years ago by Raquel Alvarez Banos
Just to clarify comment 9:
- In this ticket, I will be parallelizing PlotAsymmetryByLogValue, which means that data loading + asymmetry calculation will stay within the same loop.
- In ticket #11319, the two processes will be split into separate loops, and asymmetry calculation will be parallelized as well.
comment:12 Changed 6 years ago by Raquel Alvarez Banos
After discussion with Martyn, I am creating a new ticket that will be blocking this one. See issue #11324.
comment:13 Changed 6 years ago by Raquel Alvarez Banos
- Blocked By 11382 removed
And another one #11382
comment:16 Changed 6 years ago by Raquel Alvarez Banos
- Status changed from assigned to inprogress
Re #9215 Main loop parallelization
Changeset: ab13759a53435f722fbd3580ca99c091c0b787a1
comment:17 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Get rid of scoped workspaces which are not thread-safe
Changeset: da6b8da313ac10e9458c5b9dc31f024c27ea9c1a
comment:18 Changed 6 years ago by Raquel Alvarez
- Status changed from inprogress to verify
- Resolution set to fixed
This is being verified as pull request #449.
comment:19 Changed 6 years ago by Karl Palmen
- Status changed from verify to verifying
- Tester set to Karl Palmen
comment:20 Changed 6 years ago by Raquel Alvarez Banos
I have attached the two files I used to check performance: "run_plotasymmetry.py" is the Python script I ran in Mantid before and after this fix, and "PlotAsymmetryByLogValueTimes.txt" is a brief summary of the results.
comment:21 Changed 6 years ago by Raquel Alvarez Banos
It seems that I can't follow the approach I had planned (see commits in comments 16 and 17). The reason is that muon nexus files (e.g. MUSR... and HIFI...) are in the old HDF4 format, which cannot be safely accessed from multiple threads (this was causing the build to fail on all platforms except Windows). The only solution is to load the nexus files in serial and then analyse the workspaces in parallel.
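The "load in serial, analyse in parallel" pattern can be sketched as follows. This is a Python illustration under stated assumptions, not the C++ change itself: `load_run`, `analyse` and `load_then_analyse` are hypothetical names, and the loader just returns made-up counts in place of reading an HDF4 file.

```python
from concurrent.futures import ThreadPoolExecutor


def load_run(run_number):
    """Hypothetical stand-in for the non-thread-safe file read."""
    return [float(run_number)] * 6


def analyse(counts):
    """Stand-in for the per-workspace analysis; here just the mean count."""
    return sum(counts) / len(counts)


def load_then_analyse(run_numbers, workers=4):
    """Load every run on the calling thread, then analyse them on a pool."""
    # Step 1: serial loading, so the reader is never entered concurrently.
    loaded = [load_run(run) for run in run_numbers]

    # Step 2: parallel analysis of data that is already in memory.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(analyse, loaded))

    return dict(zip(run_numbers, values))
```

The trade-off is that all loaded data must be held in memory before the analysis step starts.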
comment:22 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Update algorithm to allow data to be loaded in serial
Changeset: b91c6d62425afd326281d20d35c10a84769519ce
comment:23 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Store loaded data in vectors
Changeset: ef53f97e213e6da01ecb78e42d46858d948e1f43
comment:24 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Apply corrections and grouping if requested
Changeset: cc53d936258140f12cfbffcff4fa017c74139872
comment:25 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Fix bug and clear vectors
Changeset: c472788e6eeea74c0ccef31a25fa9318c7232ed5
comment:26 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Move progress report to loading step
Changeset: a0b31c70515c15d82efc374d996e41e831d08626
comment:27 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Add comment
Changeset: ace077d737beafe931cde36c4e5e015164944961
comment:28 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Fix compilation error on rhel7
Changeset: 6641a885c7a2993654a93ae176d66e0079226432
comment:29 Changed 6 years ago by Raquel Alvarez Banos
Re #9215 Replace omp command by macro
Changeset: 067ce917f1f15ece54708e7e2b66cd0f8e11b5ff
comment:30 Changed 6 years ago by Raquel Alvarez
Jenkins, retest this please
comment:31 Changed 6 years ago by Karl Palmen
- Status changed from verifying to closed
Merge pull request #449 from mantidproject/9215_Speed_up_data_loading
Speed up data loading
Full changeset: 381e0374db3920e5ce13d700c0206f82f523f9ec
comment:32 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 10058