Ticket #9360 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Record and avoid duplicates in merging

Reported by: Owen Arnold Owned by: Owen Arnold
Priority: major Milestone: Release 3.2
Component: Diffraction Keywords:
Cc: Blocked By: #9257
Blocking: #9509 Tester: Michael Reuter

Description

One of the problems with parallel implementations of CCL (#9257) is that when tracking cluster across boundaries, sets of labels may be duplicated many times. This means that unnecessary merge operations are performed in ClusterRegister::merge.

It should be faster to track pairs that have already been merged. To do this, you need to implement a hash operation so that a, b can be treated as equivalent to b, a. boost::hash probably provides the answer here. Once you have an appropriate hash operation, you can track pairs in a set, and only process pairs that have not already been identified.

Change History

comment:1 Changed 6 years ago by Owen Arnold

To implement this: The following code demonstrates this working and the templated order method and the hash will do the job.

#include <iostream>
#include <limits>
#include <boost/functional/hash.hpp>

template
<typename T>
std::pair<T, T> ordered_pair(const T& a, const T& b)
{
    T min = std::min(a, b);
    T max = std::max(a, b);
    return std::pair<T, T>(min, max);
}

int main()
{
    int a = 1;
    int b = 2;
    boost::hash<std::pair<int, int> > hasher;
    size_t c = hasher(ordered_pair(a, b));
    size_t d = hasher(ordered_pair(b, a));

    
    std::cout << a << std::endl;
    std::cout << b << std::endl;
    std::cout << c << std::endl;
    std::cout << d << std::endl;
    
    return 0;

Apply the ordering to the arguments BEFORE hashing them is very important.

comment:2 Changed 6 years ago by Nick Draper

  • Status changed from new to assigned

comment:3 Changed 6 years ago by Owen Arnold

  • Status changed from assigned to inprogress

refs #9360. Use hash cache pre-check.

We create a hash of each pair of labels to avoid processing duplicates. This makes a huge difference to performance. I'm seeeing a consistent x6 speedup on my osx build in debug mode with the IntegratePeaksUsingClustersPerformanceTest.

This pre-checking is particularly important where the Threshold for IntegratePeaksUsingClusters has been set low, because in that instance there will be many neighbouring clusters and therefore many pairs with the same label ids.

Changeset: ce732ee4256a5c42bf489d53f9c9c09ed458311e

comment:4 Changed 6 years ago by Owen Arnold

Tester: Testing this ticket is easy.

1) Check code changes and static analysis.

2) Check all unit tests still pass. There is already extensive test coverage for these features, so broken behaviour will very likely be picked up.

3) Take the Master nightly build of Mantid and run the following with Logs at at least notification. Record how long it takes IntegratePeaksUsingClusters to run. Now take the latest development build and re-run the scrip. Record how long it takes IntegratePeaks using clusters to run. On my machine the timings go from ~12 seconds down to less than 1 second.

ws_name = "TOPAZ_3132"
filename = ws_name +"_event.nxs"
ws = LoadEventNexus(Filename=filename,FilterByTofMin=3000, FilterByTofMax=16000)

# Convert to Q space
LabQ = ConvertToDiffractionMDWorkspace(InputWorkspace=ws, LorentzCorrection='0',
        OutputDimensions='Q (lab frame)', SplitInto=2, SplitThreshold=150)
        
# Find peaks
PeaksLattice = FindPeaksMD(InputWorkspace=LabQ,MaxPeaks=100)
        
HistoMDQLab = BinMD(InputWorkspace=LabQ,AlignedDim0='Q_lab_x, 0, 8, 300',AlignedDim1='Q_lab_y, -10, 10, 300',AlignedDim2='Q_lab_z, 0, 10,  300')        

PeaksLattice_Integrated_Clusters, ClusterImage = IntegratePeaksUsingClusters(InputWorkspace=HistoMDQLab, PeaksWorkspace=PeaksLattice, Threshold=100000)

comment:5 Changed 6 years ago by Owen Arnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:6 Changed 6 years ago by Owen Arnold

refs #9360. Update wiki text.

Changeset: e7bb5c32fe8b1d293df8f08df7491821da367be3

comment:7 Changed 6 years ago by Michael Reuter

  • Status changed from verify to verifying
  • Tester set to Michael Reuter

comment:8 Changed 6 years ago by Owen Arnold

refs #9360. Fix VS C++ warning

Changeset: d364497c24008f42306147e7229d94e469964a93

comment:9 Changed 6 years ago by Michael Reuter

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/feature/9360_ccl_pair_hash'

Full changeset: 65f76cb4f078c86f53ce1e12bea99274f9c930a5

comment:10 Changed 6 years ago by Owen Arnold

  • Blocking 9509 added

comment:11 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 10203

Note: See TracTickets for help on using tickets.