Ticket #9360 (closed: fixed)
Record and avoid duplicates in merging
Reported by: | Owen Arnold | Owned by: | Owen Arnold |
---|---|---|---|
Priority: | major | Milestone: | Release 3.2 |
Component: | Diffraction | Keywords: | |
Cc: | | Blocked By: | #9257 |
Blocking: | #9509 | Tester: | Michael Reuter |
Description
One of the problems with parallel implementations of CCL (#9257) is that when tracking clusters across boundaries, sets of labels may be duplicated many times. This means that unnecessary merge operations are performed in ClusterRegister::merge.
It should be faster to track pairs that have already been merged. To do this, you need to implement a hash operation so that (a, b) can be treated as equivalent to (b, a); boost::hash probably provides the answer here. Once you have an appropriate hash operation, you can track pairs in a set, and only process pairs that have not already been identified.
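For illustration, a minimal sketch of such an order-insensitive hash, assuming boost::hash_combine; the name pairHash and the label type are illustrative, not the actual ClusterRegister code:

```cpp
#include <boost/functional/hash.hpp>
#include <cstddef>
#include <utility>

// Illustrative sketch, not the Mantid implementation.
// Hash a pair of cluster labels so that (a, b) and (b, a) yield the
// same value: swap into a canonical order, then combine.
inline std::size_t pairHash(std::size_t a, std::size_t b) {
  if (b < a)
    std::swap(a, b);
  std::size_t seed = 0;
  boost::hash_combine(seed, a);
  boost::hash_combine(seed, b);
  return seed;
}
```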
Change History
comment:3 Changed 6 years ago by Owen Arnold
- Status changed from assigned to inprogress
refs #9360. Use hash cache pre-check.
We create a hash of each pair of labels to avoid processing duplicates. This makes a huge difference to performance: I'm seeing a consistent 6x speedup on my OS X build in debug mode with the IntegratePeaksUsingClustersPerformanceTest.
This pre-checking is particularly important where the Threshold for IntegratePeaksUsingClusters has been set low, because in that case there will be many neighbouring clusters and therefore many pairs with the same label IDs.
Changeset: ce732ee4256a5c42bf489d53f9c9c09ed458311e
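A hedged sketch of how such a pre-check can gate the merge work; the MergeJob type and processMerges loop are assumptions for illustration, not ClusterRegister's real interface:

```cpp
#include <boost/functional/hash.hpp>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Illustrative stand-in for a requested merge between two cluster labels.
struct MergeJob {
  std::size_t labelA;
  std::size_t labelB;
};

void processMerges(const std::vector<MergeJob> &jobs) {
  std::unordered_set<std::size_t> seen;
  boost::hash<std::pair<std::size_t, std::size_t>> hasher;
  for (const auto &job : jobs) {
    // Canonical order first, so (a, b) and (b, a) collide in the cache.
    const auto key = job.labelA < job.labelB
                         ? std::make_pair(job.labelA, job.labelB)
                         : std::make_pair(job.labelB, job.labelA);
    if (!seen.insert(hasher(key)).second)
      continue; // pair already handled once; skip the expensive merge
    // ... perform the actual merge for this pair ...
  }
}
```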
comment:4 Changed 6 years ago by Owen Arnold
Tester: Testing this ticket is easy.
1) Check code changes and static analysis.
2) Check all unit tests still pass. There is already extensive test coverage for these features, so broken behaviour will very likely be picked up.
3) Take the master nightly build of Mantid and run the script below with the log level set to at least notification. Record how long IntegratePeaksUsingClusters takes to run. Now take the latest development build and re-run the script, again recording how long IntegratePeaksUsingClusters takes. On my machine the timing goes from ~12 seconds down to less than 1 second.
ws_name = "TOPAZ_3132" filename = ws_name +"_event.nxs" ws = LoadEventNexus(Filename=filename,FilterByTofMin=3000, FilterByTofMax=16000) # Convert to Q space LabQ = ConvertToDiffractionMDWorkspace(InputWorkspace=ws, LorentzCorrection='0', OutputDimensions='Q (lab frame)', SplitInto=2, SplitThreshold=150) # Find peaks PeaksLattice = FindPeaksMD(InputWorkspace=LabQ,MaxPeaks=100) HistoMDQLab = BinMD(InputWorkspace=LabQ,AlignedDim0='Q_lab_x, 0, 8, 300',AlignedDim1='Q_lab_y, -10, 10, 300',AlignedDim2='Q_lab_z, 0, 10, 300') PeaksLattice_Integrated_Clusters, ClusterImage = IntegratePeaksUsingClusters(InputWorkspace=HistoMDQLab, PeaksWorkspace=PeaksLattice, Threshold=100000)
comment:5 Changed 6 years ago by Owen Arnold
- Status changed from inprogress to verify
- Resolution set to fixed
comment:6 Changed 6 years ago by Owen Arnold
refs #9360. Update wiki text.
Changeset: e7bb5c32fe8b1d293df8f08df7491821da367be3
comment:7 Changed 6 years ago by Michael Reuter
- Status changed from verify to verifying
- Tester set to Michael Reuter
comment:8 Changed 6 years ago by Owen Arnold
refs #9360. Fix VS C++ warning
Changeset: d364497c24008f42306147e7229d94e469964a93
comment:9 Changed 6 years ago by Michael Reuter
- Status changed from verifying to closed
Merge remote-tracking branch 'origin/feature/9360_ccl_pair_hash'
Full changeset: 65f76cb4f078c86f53ce1e12bea99274f9c930a5
comment:11 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to GitHub issue 10203
To implement this, a templated order method combined with the hash will do the job; the sketch below demonstrates it working. Applying the ordering to the arguments BEFORE hashing them is very important, otherwise (a, b) and (b, a) produce different hashes.
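The referenced code listing did not survive the transfer; what follows is a minimal reconstruction of the idea, where the templated order helper and the demo in main are assumptions rather than the original code:

```cpp
#include <boost/functional/hash.hpp>
#include <cstddef>
#include <iostream>
#include <unordered_set>
#include <utility>

// Put the two labels into a canonical order so that (a, b) and (b, a)
// collapse to the same pair before they are hashed.
template <typename T>
std::pair<T, T> order(const T &a, const T &b) {
  return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
}

int main() {
  std::unordered_set<std::size_t> seen;
  boost::hash<std::pair<int, int>> hasher;

  // Returns true only the first time a (unordered) pair is encountered.
  auto firstSighting = [&](int a, int b) {
    return seen.insert(hasher(order(a, b))).second;
  };

  std::cout << firstSighting(1, 2) << "\n"; // 1: new pair, process it
  std::cout << firstSighting(2, 1) << "\n"; // 0: duplicate of (1, 2), skip
}
```

Ordering before hashing is what makes the two argument orders collide in the set; hashing the raw arguments first would leave (a, b) and (b, a) as distinct entries.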