Ticket #9194 (closed: fixed)

Opened 7 years ago

Last modified 5 years ago

IntegratePeaksUsingClusters

Reported by: Owen Arnold Owned by: Owen Arnold
Priority: major Milestone: Release 3.3
Component: Diffraction Keywords:
Cc: Blocked By: #9115
Blocking: #9257 Tester: Vickie Lynch

Description

Connected component analysis: This is an exploratory method for peak finding and integration. Theoretically both can be achieved in the same set of steps very rapidly. Arbitrary shaped peaks, in both detector space, and in reciprocal space can be identified.

The purpose of this ticket is to determine whether this approach is feasible and effective for SCD data. I will limit the initial implementations to work on MDHistoWorkspaces, as these will be the easiest to develop against.

Auxiliary classes for union-find will be required in addition to a new algorithm.

Attachments

NaCl_updated.py (1.2 KB) - added by Owen Arnold 7 years ago.

Change History

comment:1 Changed 7 years ago by Owen Arnold

  • Blocking 9115 added

comment:2 Changed 7 years ago by Nick Draper

  • Status changed from new to assigned

comment:3 Changed 7 years ago by Owen Arnold

  • Blocking 9115 removed
  • Blocked By 9115 added

comment:4 Changed 7 years ago by Owen Arnold

  • Status changed from assigned to inprogress

refs #9194. First implementation of algorithm.

Lots of work still to do yet. But on some basic tests generates expected component images (MD workspaces). One thing that will need to be solved yet, is what strategy for background detection to use. Also speedups will be required across the board.

Changeset: aebfb6efebf6b0408b633f28dcaf5853d402a00a

Changed 7 years ago by Owen Arnold

comment:5 Changed 7 years ago by Owen Arnold

For benchmarking purposes, I've added the NaCl_updated script, which works on a real MDWorkspace. The integration algorithm currently takes 803 seconds to process this workspace.

comment:6 Changed 7 years ago by Owen Arnold

refs #9194. Algorithm seems to be working well.

Working on providing a test suite for the algorithm.

Changeset: 20213917d78d0deab49e9c05edcb0909747cec90

comment:7 Changed 7 years ago by Owen Arnold

refs #9194. Tests added.

Not a complete test suite yet.

Changeset: d3049f0ee4fed25174a82ed31a4d25040187b69d

comment:8 Changed 7 years ago by Owen Arnold

refs #9194. Complete set of functional tests.

Clearly more testing is always possible, but the functional tests present here, as well as those on the worker ConnectedComponentLabelingTest suite provide a decent and useful saftey-net for my upcoming refactoring and performance improvement work.

Changeset: bfc39ce3714b4207b8bd9d80140818a5628e0987

comment:9 Changed 7 years ago by Owen Arnold

refs #9194. Add performance tests.

We know that things aren't fast because we've paid no attention whatsoever to performance. This provides a top-level benchmark for making improvements.

Changeset: 6a3e59adc1f0a09cafcc397fff9893a7d44b6516

comment:10 Changed 7 years ago by Owen Arnold

Performance tests execute in 1.00 seconds on my Windows Debug build. at the time of writing.

comment:11 Changed 7 years ago by Owen Arnold

refs #9194. Add input validators.

Changeset: bc80092eaf2cc6164c6b013963a692a756f9ba75

comment:12 Changed 7 years ago by Owen Arnold

refs #9194. Refactor PeakBackground into separate physical location.

Changeset: 7102ddbde6f575f85306cd5c6c4fda60db54ffc8

comment:13 Changed 7 years ago by Owen Arnold

refs #9194. Move permutation calcuations to init.

These are dimensionality and grid size dependent. They can be performed once because the itertor and mdhistoworkspace at large has a fixed dimensionality. Avoids having to redo the calculation on every findNeighbour call.

Changeset: 0fb4e5074c0eb3d8b25af91dabc82e7fa0ae9aff

comment:14 Changed 7 years ago by Owen Arnold

refs #9194. Kill loops and parralelize remaining loops where possible.

Remove loops that are not required. Use OpenMP to parralelize existing loops. Major change is to pre-calculate background indexes in parallel since the calls to BackgroundStrategy.isBackground() could be expensive.

Changeset: aab6f686386c559145e4716138796d9eba54d341

comment:15 Changed 7 years ago by Owen Arnold

refs #9194. Reserve neighbours to avoid resize on pushback

Changeset: d6172c61073948f1dde404c0bd8ce2a6fbe07280

comment:16 Changed 7 years ago by Owen Arnold

refs #9194. executeAndIntegrate on ConnectedComponentType.

Move integration onto the ConnectedComponentLabel type. This eliminates the need to have two separate loops over the same data.

Changeset: 41405e6f2b39d8bec6a9d65a37e3009ce238f448

comment:17 Changed 7 years ago by Owen Arnold

refs #9194. Remove need for std::find.

Changeset: 347373918bb4f128e1388915e531d1b692e8959d

comment:18 Changed 7 years ago by Owen Arnold

refs #9194. Tests no longer match restrictions.

Changeset: b49329eb70a838acb7ef1877ea4d198b8f9f389a

comment:19 Changed 7 years ago by Owen Arnold

refs #9194. Missing method in test class.

Changeset: 8c71a11809c9c199d66ed9a08d325b7e13b4d1d7

comment:20 Changed 7 years ago by Owen Arnold

refs #9194. Try replace custom DisjointElement

Changeset: 7b2bd126eba59fba68a4d114381c64e5ff8d14de

comment:21 Changed 7 years ago by Owen Arnold

refs #9194. Get working with boost::disjoint_set.

Changeset: c33f672fd301ebba7dbba81742e4333f1d7bb86f

comment:22 Changed 7 years ago by Owen Arnold

refs #9194. Remove custom DisjointElement.

Changeset: 073d3cc41a8fb65c8fe16dea57d9975b8d408265

comment:23 Changed 7 years ago by Owen Arnold

I'm going to revert back to using my own DisjointElement implementation, because it is twice as fast (according to the ConnectedComponentLabelingPerformanceTest) as the boost implementation. This is probably because the boost implementation involves many map insertions that are not required in my implementation. I originally wanted to lean on the boost implementation for maintenance reasons, but I think speed is king here.

I am only intending on reverting the last three commits, so if required, I can still go back to the boost implementation.

comment:24 Changed 7 years ago by Owen Arnold

Revert "refs #9194. Remove custom DisjointElement."

This reverts commit 073d3cc41a8fb65c8fe16dea57d9975b8d408265.

Changeset: 0fb32737638cec96d9ef501a9df34cca5990c840

comment:25 Changed 7 years ago by Owen Arnold

Revert "refs #9194. Get working with boost::disjoint_set."

This reverts commit c33f672fd301ebba7dbba81742e4333f1d7bb86f.

Changeset: effa3657665c9b7eeeb61a49dbce83b853dd5d21

comment:26 Changed 7 years ago by Owen Arnold

Revert "refs #9194. Try replace custom DisjointElement"

This reverts commit 7b2bd126eba59fba68a4d114381c64e5ff8d14de.

Changeset: ac9a85f69af84990ceaf7a63a9946618a87a6e30

comment:27 Changed 7 years ago by Owen Arnold

refs #9194. Cleanup, document and speedups.

Changeset: c72305363a97580f2c3ca95f145d4102cdc857e7

comment:28 Changed 7 years ago by Martyn Gigg

There are a few compiler warnings.

comment:29 Changed 7 years ago by Owen Arnold

refs #9194. Minor tweaks around reporting

Changeset: 84de4be8e54b79269df2b6ecf466b83135929f1f

comment:30 Changed 7 years ago by Owen Arnold

refs #9194. Document code better.

Changeset: 9a266c544bfddba71bfc35c4e6d484c33bc9ea39

comment:31 Changed 7 years ago by Owen Arnold

refs #9194. Fix gcc warnings.

Changeset: 070489fd928afd9d58fc1bcb0ef153828d92cba6

comment:32 Changed 7 years ago by Owen Arnold

Tester:

The following script shows this working. The cluster output md workspace is used to demonstrate that the clustering algorithm is finding regions to integrate within the expected regions of peaks integrated via IntegratePeaksMD.

SXD23767=Load(Filename='SXD23767.raw',LoadMonitors='Exclude')

QLab = ConvertToDiffractionMDWorkspace(InputWorkspace=SXD23767, OutputDimensions='Q (lab frame)', SplitThreshold=50, LorentzCorrection='1',MaxRecursionDepth='13',Extents='-15,15,-15,15,-15,15')

peaks_qLab = FindPeaksMD(InputWorkspace='QLab', MaxPeaks=300, DensityThresholdFactor=10,PeakDistanceThreshold=1.0)

integrated_via_peaks_md = IntegratePeaksMD(InputWorkspace=QLab, PeaksWorkspace=peaks_qLab, PeakRadius=0.2)

binned = BinMD(InputWorkspace=QLab,AlignedDim0='Q_lab_x,-10,10,300',AlignedDim1='Q_lab_y,-10,10,300',AlignedDim2='Q_lab_z,-10,10,300')

integrated_via_clusters, clusters = IntegratePeaksUsingClusters(InputWorkspace=binned, PeaksWorkspace=peaks_qLab, Threshold=10000, RadiusEstimate=0.2)

svw = plotSlice(source=clusters)
sv = svw.getSlicer()
sv.setPeaksWorkspaces([integrated_via_clusters, integrated_via_peaks_md])

in the slice viewer order the peaks by Intensity in an descending order for both peaks listed. Clicking on the first few peaks, you should see that the integrated value are in the same ball park and that the most intense peaks have roughly the same integrated value.

comment:33 Changed 7 years ago by Owen Arnold

There is still some more performance work to do here. My performance measurements indicate that the following areas need to (and can) be improved.

  • PeakBackground does not need to loop over the peaks workspace. Hard Background would be better. All that we would need to do afterwards is to find the cluster ids corresponding to the peaks centre. This is actually very easy to do.
  • There are some parallel implementations of the CCL. With the current implementation I have only managed to parallelise some of the code, but because the labelling needs to by synched with labelling made to neighbouring cells, it cannot be done in parallel in its current form.

comment:34 Changed 7 years ago by Owen Arnold

  • Blocking 9257 added

comment:35 Changed 7 years ago by Owen Arnold

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:36 Changed 7 years ago by Owen Arnold

  • Summary changed from FindPeaksUsingClusters to IntegratePeaksUsingClusters

comment:37 Changed 7 years ago by Vickie Lynch

  • Status changed from verify to verifying
  • Tester set to Vickie Lynch

comment:38 Changed 7 years ago by Vickie Lynch

  • Status changed from verifying to closed

Merge remote branch 'origin/feature/9194_connected_component_algorithm'

Full changeset: b08a9518b2ab93b818703fd179d8adc7f65eb81c

comment:39 Changed 6 years ago by Nick Draper

  • Milestone changed from Backlog to Release 3.3

comment:40 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 10037

Note: See TracTickets for help on using tickets.