Ticket #9257 (closed: fixed)

Opened 7 years ago

Last modified 5 years ago

Speed up IntegratePeaksUsingClusters

Reported by: Owen Arnold
Owned by: Owen Arnold
Priority: major
Milestone: Release 3.2
Component: Diffraction
Keywords:
Cc:
Blocked By: #9194
Blocking: #9360
Tester: Martyn Gigg

Description

Speed up the IntegratePeaksUsingClusters algorithm using the performance pointers described in #9194.

Attachments

IntegrateTopaz.py (1.9 KB) - added by Owen Arnold 6 years ago.
peak_animation.gif (380.8 KB) - added by Owen Arnold 6 years ago.

Change History

comment:1 Changed 7 years ago by Owen Arnold

refs #9257. Working new parallel implementation of clustering.

Clustering now works on a per-thread basis: each thread labels its own chunk, and labels between the individual results are then combined where their ids are otherwise equivalent. This seems to be working, at least in the 1D test cases I have provided, but the implementation will need further work, and I must unbreak the integration routines I have affected with these changes.
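
For context, a minimal sketch of the idea in Python (the real code is C++; all names here are illustrative, and only the 1D case is shown): each chunk is labelled independently using its own label range, and a union-find pass then merges labels that meet across chunk boundaries.

def label_chunk(signal, threshold, start_label):
    """Label runs of above-threshold elements within one chunk."""
    labels = [-1] * len(signal)            # -1 marks background (an "empty" label)
    current = None
    next_label = start_label
    for i, value in enumerate(signal):
        if value > threshold:
            if current is None:
                current = next_label
                next_label += 1
            labels[i] = current
        else:
            current = None
    return labels

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]      # path compression
        a = parent[a]
    return a

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

def parallel_label(signal, threshold, n_chunks):
    size = (len(signal) + n_chunks - 1) // n_chunks
    chunks = [signal[i * size:(i + 1) * size] for i in range(n_chunks)]
    # Each chunk gets a non-overlapping label range, so the chunks could be
    # labelled concurrently (done serially here for brevity).
    labelled = [label_chunk(chunk, threshold, i * size) for i, chunk in enumerate(chunks)]
    # Sequential merge: join labels that are equivalent across chunk boundaries.
    parent = {l: l for labels in labelled for l in labels if l != -1}
    for left, right in zip(labelled, labelled[1:]):
        if left and right and left[-1] != -1 and right[0] != -1:
            union(parent, left[-1], right[0])
    return [find(parent, l) if l != -1 else -1 for labels in labelled for l in labels]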

Changeset: 77b50477ce2ea558caea75c08ccca1603d1fe98a

comment:2 Changed 7 years ago by Owen Arnold

refs #9257. Extend tests to nThreads. Fix logic error.

Tests extended and refactored. The multithreaded scenario now has some good test cases. I had to fix a bug where empty labels were being processed via a label map: empty labels always have an id of -1, so they weren't being found in the map, and this was causing a null pointer exception further down the line.

Changeset: f5b7b9f9daeff678e0d217fed75d3dbd9558e338

comment:3 Changed 7 years ago by Owen Arnold

The multithreaded implementation of the CCL algorithm now seems to be logically correct, and I have good test cases for it. Performance can now be targeted safely.

There are a number of places to look for speedups:

1) The sequential section that builds clusters and handles the remote disjoint-element cases.

2) Following this, return the cluster objects and have the integration algorithm use those clusters.

3) Then look at swapping in the HardBackgroundThreshold for the PeakStrategy. This should massively speed up the integration algorithm.

Last edited 7 years ago by Owen Arnold (previous) (diff)

comment:4 Changed 7 years ago by Owen Arnold

refs #9257. Move more into parallel blocks.

Changeset: 4016109b6d2117199afe49557b12c574a0d23697

comment:5 Changed 7 years ago by Owen Arnold

refs #9257. More documentation.

Changeset: 5ec0859ef40dc7fbfcbf656ac67b59697812f1e8

comment:6 Changed 7 years ago by Owen Arnold

refs #9257. Track clusters.

Cluster extracted to its own type in its own physical structure. Clusters are now passed around. Tests added for the cluster type.

Changeset: ebe894c037201a32e3a895646df3d522067834fb

comment:7 Changed 7 years ago by Owen Arnold

Actually, it would be a lot quicker, and most accurate, to do the integration in the following way (see the sketch after the list):

1) Do not integrate the clusters, but return them from the CCL.

2) Get the coordinates of the Peak centre in the WS coordinates. May need the transform code to do this properly.

3) On the cluster workspace, get the label corresponding to that peak centre.

4) Look up the Cluster in the map via the peak label just found.

5) Now integrate only the clusters of interest.
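
Roughly, in pseudocode (the peak transform, label image, and cluster objects below are stand-ins for illustration, not the actual Mantid types):

def integrate_peaks(peaks, label_image, clusters, peak_transform):
    results = []
    for peak in peaks:
        # 2) Peak centre expressed in the workspace's coordinate frame.
        centre = peak_transform.transform(peak.centre())
        # 3) Label of the cluster sitting under that centre in the cluster image.
        label = label_image.value_at(centre)
        if label < 0:
            continue                       # background: nothing to integrate
        # 4) Look the cluster up by its label; 5) integrate only that cluster.
        results.append(clusters[label].integrate())
    return results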

Last edited 7 years ago by Owen Arnold (previous) (diff)

comment:8 Changed 7 years ago by Owen Arnold

refs #9257. Move PeakTransform to Mantid API.

This allows the PeakTransform to be used in multiple projects. It is required in Crystal where Peak coordinates must also be properly selected and transformed according to an IMDWorkspace setup.

Changeset: f8b01fec2860c5f06f6aae79badfa8b5f25dbba4

comment:9 Changed 7 years ago by Owen Arnold

refs #9257. Starting to fix integration.

Changeset: b3e26e719c3245d26c2961a3229edf0cdfab1e52

comment:10 Changed 7 years ago by Owen Arnold

refs #9257. Have the integration working.

Have the integration working in both parallel and non-parallel modes. The performance tests now run roughly twice as fast, at approximately half a second. The non-parallel version runs quicker than the parallel version, indicating that some work still needs to be done to improve the speed of the merging.

Changeset: 0c2d7a29bac0b8312cb21c9ff9efdae2dbd26896

comment:11 Changed 7 years ago by Owen Arnold

refs #9257. Make integration thread safe.

A very simple solution to a threading issue. We can safely process the peaks list in parallel if the integration is guaranteed to be thread safe. By making the integration a const member that simply returns its results, concurrent thread access causes no side effects. This allows us to handle the case where the user has selected a threshold such that two peaks end up sitting on the same cluster, so the integration method could be entered simultaneously. Functional programming had the solution.

Also, I've made some changes to reduce the amount of work done to determine which clusters are unresolved. I can avoid this duplication because, for each index pair that gets registered, there will always be a duplicate (there has to be, as each unreachable neighbour for one process index is itself an unreachable neighbour for another process index), and therefore only one cluster needs to be entered into the vector.
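
Roughly the first idea, sketched in Python rather than the actual C++ (names illustrative): integration is a pure/const operation that returns its result without touching shared state, so two threads entering it for the same cluster is harmless.

from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    signals: tuple
    errors_squared: tuple

    def integrate(self):
        # Pure function of the cluster's contents: nothing is modified, so
        # concurrent calls from several threads have no side effects.
        return sum(self.signals), sum(self.errors_squared)

Each peak in the list can then be processed on its own thread; if two peaks happen to map onto the same cluster, both threads simply read it and compute the same answer.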

Changeset: 8a99aee784d3387292d843402065ae76c1114014

comment:12 Changed 7 years ago by Owen Arnold

refs #9257. Do writeTo in parallel.

Changeset: 299ab7d862cae155dda1d990271d00e5e851bb37

comment:13 Changed 7 years ago by Owen Arnold

refs #9257. Extend tests for cluster type.

Changeset: 6dbdecc4f8c8e1b80d9b3b8df1cbacb667d0980a

comment:14 Changed 7 years ago by Owen Arnold

refs #9257. No need to specify the radius any more.

Part of the speedup comes from replacing the PeakBackground strategy with the simpler hard threshold. We also have no need to use the radius. This is nicer, as we just let the algorithm figure out the integrated region associated with a peak with minimal help (other than the threshold, of course).

Changeset: b077fb5059caa2f65b3d526c3d1f23e0c97f9ffa

comment:15 Changed 7 years ago by Owen Arnold

refs #9257. Add diagnostics

Changeset: e83580f1954c1c24571b9fe18812dd0dea5af3d9

comment:16 Changed 6 years ago by Owen Arnold

refs #9257. Fix critical bug.

Also added reporting back in.

Changeset: c2fa00e2756a21bd3e1ebb7dd64aebb99ef4b77b

comment:17 Changed 6 years ago by Owen Arnold

refs #9257. More performance work.

Also start fixing a bug whereby the cluster offsets can become larger than the range of an integer. This has been partially fixed by calculating more carefully the number of possible clusters that could exist. The full fix should also change the type of m_id in DisjointElement from int to size_t.

Another issue being worked on here is a faster clone of the MDWorkspace. We only care about its shape, so we just need something the right size to start putting data into.
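
For illustration only (my paraphrase of the bound, not the exact fix in the changeset): the per-thread label offsets stay small if each chunk's offset is based on a tight upper bound on how many clusters the chunk can possibly contain, roughly half its cells for a face-connected grid, rather than something far looser.

from math import ceil, prod

def max_clusters(chunk_shape):
    # A face-connected grid can hold at most about half its cells as separate
    # single-cell clusters (checkerboard pattern), so this bounds the labels.
    return ceil(prod(chunk_shape) / 2)

def label_offset(thread_index, chunk_shape):
    # Non-overlapping label ranges per thread, kept as small as possible.
    return thread_index * max_clusters(chunk_shape)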

Changeset: 24bb41c474025ec7b91f6e72f841acad116cba10

comment:18 Changed 6 years ago by Owen Arnold

refs #9257. Add clone method.

Changeset: a2793e40a400ba38701de0759896e522ff59391c


comment:19 Changed 6 years ago by Owen Arnold

refs #9257. Update documentation. Fix RHEL6 header issue.

Changeset: dafbd82a140e8aafec289ea963332d90ba749248

comment:20 Changed 6 years ago by Owen Arnold

refs #9257. Find and fix another bug.

Unit tests revealed two bugs in the updated merge logic. Tests now pass again.

Changeset: 4942bcdc4ae5f4d993524c8d3c6368c0b46bd438

comment:21 Changed 6 years ago by Owen Arnold

refs #9257. The way that logging works has changed.

Changeset: 4975bb8c42076468d68417a7fba4e8eb4e3ede76

comment:22 Changed 6 years ago by Owen Arnold

Dear Tester:

First off: apologies for the size of this ticket. I thought about splitting it up to make it more granular, but it really was a piece of work that had to be done as a whole, and breaking it up would have been confusing at best.

The changes here have reduced the processing time by at least half. Among other improvements over the original version of the algorithm, the user no longer needs to specify a guess at a PeakRadius. The entire set of clusters is processed, and only those that align with peaks are integrated (in parallel) when the time comes to do so.

You can test this using the script given in #9194, or the one attached here. If you use either of these on a machine with less than 16 GB of memory, change BinMD to bin to a 300 x 300 x 300 grid in both scripts before executing them (for example, as shown below).
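
For example, the low-memory variant of the BinMD call would look something like the lines below; the dimension names and extents are placeholders in the style of the reproduction script later in this ticket, so keep whatever the #9194 / attached script actually specifies and only change the bin counts to 300.

HistoMDQLab = BinMD(InputWorkspace=LabQ,
                    AlignedDim0='Q_lab_x, -15, 15, 300',
                    AlignedDim1='Q_lab_y, -15, 15, 300',
                    AlignedDim2='Q_lab_z, -15, 15, 300')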

comment:23 Changed 6 years ago by Owen Arnold

  • Status changed from new to assigned

comment:24 Changed 6 years ago by Owen Arnold

  • Status changed from assigned to verify
  • Resolution set to fixed

comment:25 Changed 6 years ago by Owen Arnold

  • Milestone changed from Backlog to Release 3.2

comment:26 Changed 6 years ago by Owen Arnold

  • Status changed from verify to reopened
  • Resolution fixed deleted

comment:27 Changed 6 years ago by Owen Arnold

Labeling across processor boundaries does not appear to be updated properly. The following script illustrates this:

ws_name = "TOPAZ_3132"
filename = ws_name +"_event.nxs"
ws = LoadEventNexus(Filename=filename,FilterByTofMin=3000, FilterByTofMax=16000)

# Convert to Q space
LabQ = ConvertToDiffractionMDWorkspace(InputWorkspace=ws, LorentzCorrection='0',
        OutputDimensions='Q (lab frame)', SplitInto=2, SplitThreshold=150)
        
# Find peaks
PeaksLattice = FindPeaksMD(InputWorkspace=LabQ,MaxPeaks=100)

        
# Find the UB matrix using the peaks and known lattice parameters
FindUBUsingLatticeParameters(PeaksWorkspace=PeaksLattice, a=10.3522, b=6.0768, c=4.7276,
                alpha=90, beta=90, gamma=90, NumInitial=20, Tolerance=0.12)
                
# And index to HKL            
IndexPeaks(PeaksWorkspace=PeaksLattice, Tolerance=0.12)

HistoMDQLab = BinMD(InputWorkspace=LabQ,AlignedDim0='Q_lab_x, 2.7, 3, 50',AlignedDim1='Q_lab_y, -0.9, -0.6, 50',AlignedDim2='Q_lab_z, 9.2, 9.5,  50')   

PeaksLattice_Integrated_Clusters, ClusterImage = IntegratePeaksUsingClusters(InputWorkspace=HistoMDQLab, PeaksWorkspace=PeaksLattice, Threshold=100000)

svw = plotSlice(ClusterImage)
sv = svw.getSlicer()
pp_all = sv.setPeaksWorkspaces([PeaksLattice_Integrated_Clusters])
pp = pp_all.getPeaksPresenter(PeaksLattice_Integrated_Clusters)
pp.zoomToPeak(31)
svw.zoomBy(0.01)

Now, in the Slice viewer, move forwards and backwards over the peak cluster. You will see a colour change, because the labels are not being resolved properly.

Last edited 6 years ago by Owen Arnold (previous) (diff)

comment:28 Changed 6 years ago by Martyn Gigg

comment:29 Changed 6 years ago by Owen Arnold

Also need to add a memory check. This would be incredibly useful, as the algorithm will grind to a halt if the image is too large for the available memory. See https://github.com/mantidproject/mantid/blob/a17060aa8b75e3c2d70bca8e0367808e8f0cd9d3/Code/Mantid/Vates/VatesAPI/src/LoadVTK.cpp
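
A minimal sketch of what such a check might look like (the free-memory helper here is a hypothetical stand-in; the real implementation would query Mantid's memory statistics as LoadVTK does):

def check_memory(image_shape, bytes_per_element, get_available_memory_bytes):
    # Estimate the size of the output cluster image and refuse to run if it
    # cannot fit in the memory currently available.
    required = bytes_per_element
    for n in image_shape:
        required *= n
    available = get_available_memory_bytes()
    if required > available:
        raise MemoryError("Cluster image needs %d bytes but only %d are available"
                          % (required, available))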


comment:30 Changed 6 years ago by Owen Arnold

  • Status changed from reopened to inprogress

refs #9257. Add test case for failure.

Changeset: ed5270c7a6c0004354fd82b7b89c3789615307a5

comment:31 Changed 6 years ago by Owen Arnold

refs #9257. Add composite cluster.

A composite cluster will fix the problem I've been having: the clusters are processed in parallel to get the uniform minimum, but the order in which they are processed affects the end labeling. If clusters are set up as composites beforehand, this will not be a problem.
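
Roughly the idea, sketched in Python (names illustrative): a composite holds its child clusters and reports a label derived from all of them, so the result no longer depends on the order in which the parallel pass happened to visit the clusters.

class Cluster:
    def __init__(self, label):
        self.label = label

    def get_label(self):
        return self.label

class CompositeCluster:
    def __init__(self, children):
        self.children = list(children)

    def get_label(self):
        # Order-independent: however the children were gathered, the
        # reported label is always the minimum over the group.
        return min(child.get_label() for child in self.children)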

Changeset: 75614849308c8749eed22d58422d06f2ac072ec0

comment:32 Changed 6 years ago by Martyn Gigg

comment:33 Changed 6 years ago by Owen Arnold

refs #9257. Add ClusterRegister.

Changeset: f05f257efc3523d948d04fb63c4e5d8e82e759a2

comment:34 Changed 6 years ago by Owen Arnold

refs #9257. Fix cpp check issues.

Changeset: e23618756e92fdcfd48148dfca4de8d048d84675

comment:35 Changed 6 years ago by Owen Arnold

refs #9257. Use ClusterRegister. Tests pass.

Changeset: 50a64af88772a1902e1af5eee9b04f43210aee6b

comment:36 Changed 6 years ago by Owen Arnold

refs #9257. Trying to fix a bug with merge on ClusterRegister

The problem does not get picked up in the unit tests and only occurs with a real dataset. The problem is that we get a stack overflow due to some clusters becoming their own children: when getLabel is then called on the composite cluster, the recursion blows the stack. I think this could be fixed by changing the logic of the merge method to use a member vector<set<label_id> > to track each cluster group for the incomplete clusters, and only generating the composites (which would only ever be one level deep) at the end of the process.
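
The set-based merge described above, sketched in Python (names illustrative): boundary pairs are folded into flat groups of label ids, and composites are only built from those groups at the very end, so a cluster can never end up as its own descendant.

def merge_groups(groups, label_a, label_b):
    # groups is a list of disjoint sets of label ids; fold the new boundary
    # pair into whichever existing groups it touches.
    merged = {label_a, label_b}
    touched = [g for g in groups if g & merged]
    for g in touched:
        merged |= g
        groups.remove(g)
    groups.append(merged)
    return groups

At the end of processing, each set in groups becomes one composite cluster (one level deep) over the clusters with those label ids.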

Changeset: c013609c035a4e17767cbca2dab1875063083251

comment:37 Changed 6 years ago by Owen Arnold

refs #9257. Working clusterregister.

ClusterRegister fixed. The unit test cases pass, as do the test scripts that I had previously marked as broken.

Changeset: cfeca1655c6c8c7dbf7305371f3e8ac3cd13f899

comment:38 Changed 6 years ago by Owen Arnold

refs #9257. Tidy up and add memory check.

Changeset: 3c4fe8436c4ccb3df3ea2634d0084861f98a6cfe

comment:39 Changed 6 years ago by Owen Arnold

refs #9257. Fix RHEL6 errors and warnings.

Add better documentation.

Changeset: 1d60bf388e18ece3d8c89daa98beb3ca8f76a199

comment:40 Changed 6 years ago by Owen Arnold

refs #9257. Fix GCC warning.

Changeset: 584810bd55b38c288ce5e117e739eb3d7ec10764

comment:41 Changed 6 years ago by Owen Arnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:42 Changed 6 years ago by Owen Arnold

  • Blocking 9360 added

comment:43 Changed 6 years ago by Martyn Gigg

  • Status changed from verify to reopened
  • Resolution fixed deleted

There's a cppcheck error in ConnectedComponentLabelling

comment:44 Changed 6 years ago by Owen Arnold

  • Status changed from reopened to inprogress

refs #9257. Delete unused variable.

Changeset: e104446a89039549a6371f852486fc61ea09a8e0

comment:45 Changed 6 years ago by Owen Arnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:46 Changed 6 years ago by Owen Arnold

refs #9257. Fix doxygen warnings.

Changeset: 5f0efa6514a46400b66c867ded4af61ab60fccbb

comment:47 Changed 6 years ago by Martyn Gigg

  • Status changed from verify to verifying
  • Tester set to Martyn Gigg

comment:48 Changed 6 years ago by Martyn Gigg

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/feature/9257_ccl_performance'

Full changeset: 0e40cbb96dc71ba0ba7e43eff83223d5a19bd256

comment:49 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 10100.
