Ticket #9194 (closed: fixed)
IntegratePeaksUsingClusters
Reported by: | Owen Arnold | Owned by: | Owen Arnold |
---|---|---|---|
Priority: | major | Milestone: | Release 3.3 |
Component: | Diffraction | Keywords: | |
Cc: | Blocked By: | #9115 | |
Blocking: | #9257 | Tester: | Vickie Lynch |
Description
Connected component analysis: This is an exploratory method for peak finding and integration. Theoretically both can be achieved in the same set of steps very rapidly. Arbitrary shaped peaks, in both detector space, and in reciprocal space can be identified.
The purpose of this ticket is to determine whether this approach is feasible and effective for SCD data. I will limit the initial implementations to work on MDHistoWorkspaces, as these will be the easiest to develop against.
Auxiliary classes for union-find will be required in addition to a new algorithm.
Attachments
Change History
comment:4 Changed 7 years ago by Owen Arnold
- Status changed from assigned to inprogress
refs #9194. First implementation of algorithm.
Lots of work still to do yet. But on some basic tests generates expected component images (MD workspaces). One thing that will need to be solved yet, is what strategy for background detection to use. Also speedups will be required across the board.
Changeset: aebfb6efebf6b0408b633f28dcaf5853d402a00a
comment:5 Changed 7 years ago by Owen Arnold
For benchmarking purposes, I've added the NaCl_updated script, which works on a real MDWorkspace. The integration algorithm currently takes 803 seconds to process this workspace.
comment:6 Changed 7 years ago by Owen Arnold
refs #9194. Algorithm seems to be working well.
Working on providing a test suite for the algorithm.
Changeset: 20213917d78d0deab49e9c05edcb0909747cec90
comment:7 Changed 7 years ago by Owen Arnold
refs #9194. Tests added.
Not a complete test suite yet.
Changeset: d3049f0ee4fed25174a82ed31a4d25040187b69d
comment:8 Changed 7 years ago by Owen Arnold
refs #9194. Complete set of functional tests.
Clearly more testing is always possible, but the functional tests present here, as well as those on the worker ConnectedComponentLabelingTest suite provide a decent and useful saftey-net for my upcoming refactoring and performance improvement work.
Changeset: bfc39ce3714b4207b8bd9d80140818a5628e0987
comment:9 Changed 7 years ago by Owen Arnold
refs #9194. Add performance tests.
We know that things aren't fast because we've paid no attention whatsoever to performance. This provides a top-level benchmark for making improvements.
Changeset: 6a3e59adc1f0a09cafcc397fff9893a7d44b6516
comment:10 Changed 7 years ago by Owen Arnold
Performance tests execute in 1.00 seconds on my Windows Debug build. at the time of writing.
comment:11 Changed 7 years ago by Owen Arnold
refs #9194. Add input validators.
Changeset: bc80092eaf2cc6164c6b013963a692a756f9ba75
comment:12 Changed 7 years ago by Owen Arnold
refs #9194. Refactor PeakBackground into separate physical location.
Changeset: 7102ddbde6f575f85306cd5c6c4fda60db54ffc8
comment:13 Changed 7 years ago by Owen Arnold
refs #9194. Move permutation calcuations to init.
These are dimensionality and grid size dependent. They can be performed once because the itertor and mdhistoworkspace at large has a fixed dimensionality. Avoids having to redo the calculation on every findNeighbour call.
Changeset: 0fb4e5074c0eb3d8b25af91dabc82e7fa0ae9aff
comment:14 Changed 7 years ago by Owen Arnold
refs #9194. Kill loops and parralelize remaining loops where possible.
Remove loops that are not required. Use OpenMP to parralelize existing loops. Major change is to pre-calculate background indexes in parallel since the calls to BackgroundStrategy.isBackground() could be expensive.
Changeset: aab6f686386c559145e4716138796d9eba54d341
comment:15 Changed 7 years ago by Owen Arnold
refs #9194. Reserve neighbours to avoid resize on pushback
Changeset: d6172c61073948f1dde404c0bd8ce2a6fbe07280
comment:16 Changed 7 years ago by Owen Arnold
refs #9194. executeAndIntegrate on ConnectedComponentType.
Move integration onto the ConnectedComponentLabel type. This eliminates the need to have two separate loops over the same data.
Changeset: 41405e6f2b39d8bec6a9d65a37e3009ce238f448
comment:17 Changed 7 years ago by Owen Arnold
refs #9194. Remove need for std::find.
Changeset: 347373918bb4f128e1388915e531d1b692e8959d
comment:18 Changed 7 years ago by Owen Arnold
refs #9194. Tests no longer match restrictions.
Changeset: b49329eb70a838acb7ef1877ea4d198b8f9f389a
comment:19 Changed 7 years ago by Owen Arnold
refs #9194. Missing method in test class.
Changeset: 8c71a11809c9c199d66ed9a08d325b7e13b4d1d7
comment:20 Changed 7 years ago by Owen Arnold
refs #9194. Try replace custom DisjointElement
Changeset: 7b2bd126eba59fba68a4d114381c64e5ff8d14de
comment:21 Changed 7 years ago by Owen Arnold
refs #9194. Get working with boost::disjoint_set.
Changeset: c33f672fd301ebba7dbba81742e4333f1d7bb86f
comment:22 Changed 7 years ago by Owen Arnold
refs #9194. Remove custom DisjointElement.
Changeset: 073d3cc41a8fb65c8fe16dea57d9975b8d408265
comment:23 Changed 7 years ago by Owen Arnold
I'm going to revert back to using my own DisjointElement implementation, because it is twice as fast (according to the ConnectedComponentLabelingPerformanceTest) as the boost implementation. This is probably because the boost implementation involves many map insertions that are not required in my implementation. I originally wanted to lean on the boost implementation for maintenance reasons, but I think speed is king here.
I am only intending on reverting the last three commits, so if required, I can still go back to the boost implementation.
comment:24 Changed 7 years ago by Owen Arnold
Revert "refs #9194. Remove custom DisjointElement."
This reverts commit 073d3cc41a8fb65c8fe16dea57d9975b8d408265.
Changeset: 0fb32737638cec96d9ef501a9df34cca5990c840
comment:25 Changed 7 years ago by Owen Arnold
Revert "refs #9194. Get working with boost::disjoint_set."
This reverts commit c33f672fd301ebba7dbba81742e4333f1d7bb86f.
Changeset: effa3657665c9b7eeeb61a49dbce83b853dd5d21
comment:26 Changed 7 years ago by Owen Arnold
Revert "refs #9194. Try replace custom DisjointElement"
This reverts commit 7b2bd126eba59fba68a4d114381c64e5ff8d14de.
Changeset: ac9a85f69af84990ceaf7a63a9946618a87a6e30
comment:27 Changed 7 years ago by Owen Arnold
refs #9194. Cleanup, document and speedups.
Changeset: c72305363a97580f2c3ca95f145d4102cdc857e7
comment:28 Changed 7 years ago by Martyn Gigg
There are a few compiler warnings.
comment:29 Changed 7 years ago by Owen Arnold
refs #9194. Minor tweaks around reporting
Changeset: 84de4be8e54b79269df2b6ecf466b83135929f1f
comment:30 Changed 7 years ago by Owen Arnold
refs #9194. Document code better.
Changeset: 9a266c544bfddba71bfc35c4e6d484c33bc9ea39
comment:31 Changed 7 years ago by Owen Arnold
refs #9194. Fix gcc warnings.
Changeset: 070489fd928afd9d58fc1bcb0ef153828d92cba6
comment:32 Changed 7 years ago by Owen Arnold
Tester:
The following script shows this working. The cluster output md workspace is used to demonstrate that the clustering algorithm is finding regions to integrate within the expected regions of peaks integrated via IntegratePeaksMD.
SXD23767=Load(Filename='SXD23767.raw',LoadMonitors='Exclude') QLab = ConvertToDiffractionMDWorkspace(InputWorkspace=SXD23767, OutputDimensions='Q (lab frame)', SplitThreshold=50, LorentzCorrection='1',MaxRecursionDepth='13',Extents='-15,15,-15,15,-15,15') peaks_qLab = FindPeaksMD(InputWorkspace='QLab', MaxPeaks=300, DensityThresholdFactor=10,PeakDistanceThreshold=1.0) integrated_via_peaks_md = IntegratePeaksMD(InputWorkspace=QLab, PeaksWorkspace=peaks_qLab, PeakRadius=0.2) binned = BinMD(InputWorkspace=QLab,AlignedDim0='Q_lab_x,-10,10,300',AlignedDim1='Q_lab_y,-10,10,300',AlignedDim2='Q_lab_z,-10,10,300') integrated_via_clusters, clusters = IntegratePeaksUsingClusters(InputWorkspace=binned, PeaksWorkspace=peaks_qLab, Threshold=10000, RadiusEstimate=0.2) svw = plotSlice(source=clusters) sv = svw.getSlicer() sv.setPeaksWorkspaces([integrated_via_clusters, integrated_via_peaks_md])
in the slice viewer order the peaks by Intensity in an descending order for both peaks listed. Clicking on the first few peaks, you should see that the integrated value are in the same ball park and that the most intense peaks have roughly the same integrated value.
comment:33 Changed 7 years ago by Owen Arnold
There is still some more performance work to do here. My performance measurements indicate that the following areas need to (and can) be improved.
- PeakBackground does not need to loop over the peaks workspace. Hard Background would be better. All that we would need to do afterwards is to find the cluster ids corresponding to the peaks centre. This is actually very easy to do.
- There are some parallel implementations of the CCL. With the current implementation I have only managed to parallelise some of the code, but because the labelling needs to by synched with labelling made to neighbouring cells, it cannot be done in parallel in its current form.
comment:35 Changed 7 years ago by Owen Arnold
- Status changed from inprogress to verify
- Resolution set to fixed
comment:36 Changed 7 years ago by Owen Arnold
- Summary changed from FindPeaksUsingClusters to IntegratePeaksUsingClusters
comment:37 Changed 7 years ago by Vickie Lynch
- Status changed from verify to verifying
- Tester set to Vickie Lynch
comment:38 Changed 7 years ago by Vickie Lynch
- Status changed from verifying to closed
Merge remote branch 'origin/feature/9194_connected_component_algorithm'
Full changeset: b08a9518b2ab93b818703fd179d8adc7f65eb81c
comment:40 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 10037