Ticket #5533 (closed: duplicate)

Opened 8 years ago

Last modified 5 years ago

IntegratePeaksMD seg faults sporadically

Reported by: Dennis Mikkelson
Owned by: Vickie Lynch
Priority: major
Milestone: Release 3.2
Component: Framework
Keywords:
Cc: petersonpf@…
Blocked By:
Blocking:
Tester: Michael Reuter

Description (last modified by Vickie Lynch)

When used repeatedly in a script, IntegratePeaksMD will seg fault, typically after processing between 2 and 6 TOPAZ runs. The crash is related to OpenMP: if the line PRAGMA_OMP( parallel for schedule(dynamic, 10) ) is commented out, IntegratePeaksMD runs reliably.
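For readers unfamiliar with the construct, the following is a minimal sketch of how a PRAGMA_OMP-style wrapper macro and a `parallel for schedule(dynamic, 10)` loop fit together. The macro names, the `Peak` struct and the `integrateOne`/`integrateAll` functions are hypothetical stand-ins, not the actual IntegratePeaksMD code. This loop is race-free only because each iteration writes to its own preallocated slot. Without OpenMP the macro expands to nothing and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper in the spirit of PRAGMA_OMP: expands to an OpenMP
// pragma when compiled with OpenMP support, and to nothing otherwise.
#define SKETCH_PRAGMA(x) _Pragma(#x)
#ifdef _OPENMP
#define SKETCH_PRAGMA_OMP(expression) SKETCH_PRAGMA(omp expression)
#else
#define SKETCH_PRAGMA_OMP(expression)
#endif

// Hypothetical stand-in for a peak to be integrated.
struct Peak {
  double radius;
};

// Stand-in for the per-peak work; it reads only its own argument.
double integrateOne(const Peak &peak) { return 3.0 * peak.radius; }

std::vector<double> integrateAll(const std::vector<Peak> &peaks) {
  // Preallocate the output so no thread ever resizes a shared container.
  std::vector<double> intensities(peaks.size(), 0.0);
  const long n = static_cast<long>(peaks.size());
  // Iterations are handed out in chunks of 10 to whichever thread is free.
  SKETCH_PRAGMA_OMP(parallel for schedule(dynamic, 10))
  for (long i = 0; i < n; ++i) {
    const std::size_t k = static_cast<std::size_t>(i);
    // Each iteration writes only to its own slot, so there is no data race.
    intensities[k] = integrateOne(peaks[k]);
  }
  return intensities;
}
```

Calling `integrateAll` on 25 peaks returns 25 intensities in peak order, whichever thread handled each chunk.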

Change History

comment:1 Changed 8 years ago by Dennis Mikkelson

Re 5533: Comment out OMP Pragma

If the OMP pragma is included in IntegratePeaksMD, the algorithm seg faults sporadically when processing multiple TOPAZ runs in a script on Scientific Linux 6.2. Typically it seg faults after 2 to 6 runs have been processed, though occasionally it processes all 8 runs requested in the script without crashing. Since the lower-level code already uses OpenMP, parallelizing at this level is only marginally useful, giving about a 5-10% speedup. Perhaps it should be removed permanently, but for now it is commented out to avoid the seg faults. Refs #5533

Changeset: a56c8967e4d65bd6c0502742ddc429c0c4a32b6e
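Commenting out the pragma is one way to serialize the loop. OpenMP's `if` clause offers a runtime alternative: when its expression is false, the parallel region executes on a single thread. The sketch below is hypothetical. The `sumSquares` function and `runInParallel` flag are illustrations, not IntegratePeaksMD code. It also uses a `reduction` clause so that a shared total is combined safely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: sum of squares, run in parallel only when requested.
double sumSquares(const std::vector<double> &values, bool runInParallel) {
  (void)runInParallel;  // silences an unused-parameter warning without OpenMP
  double total = 0.0;
  const long n = static_cast<long>(values.size());
  // When runInParallel is false the region runs on one thread, which has the
  // same effect as commenting the pragma out but can be chosen at runtime.
  // reduction(+ : total) gives each thread a private partial sum that is
  // combined at the end, so `total` is never updated concurrently.
#pragma omp parallel for if(runInParallel) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    const double v = values[static_cast<std::size_t>(i)];
    total += v * v;
  }
  return total;
}
```

Both `sumSquares(v, true)` and `sumSquares(v, false)` return the same result. Only the number of threads differs.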

comment:2 Changed 8 years ago by Russell Taylor

Dennis, you say the use of OpenMP provides a small benefit, yet the IntegratePeaksMD performance tests have shown a marked slowdown: 0.9 -> 2.7 s for one, 1.1 -> 4.0 s for the other. How many cores were you running on? (Presumably, the fewer the cores, the smaller the change.)

I'll have a look to see if I can spot where the race condition is.
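The ticket never identifies the actual race in IntegratePeaksMD. As an illustration only, one common way a `parallel for` loop produces sporadic seg faults is concurrent `push_back` on a shared `std::vector`. A reallocation triggered by one thread frees the buffer that another thread may still be writing to. The sketch below guards the shared container with `#pragma omp critical`. The function `collectAbove` and its threshold parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: collect the indices of values above a threshold.
std::vector<std::size_t> collectAbove(const std::vector<double> &values,
                                      double threshold) {
  std::vector<std::size_t> selected;  // shared by all threads
  const long n = static_cast<long>(values.size());
#pragma omp parallel for schedule(dynamic, 10)
  for (long i = 0; i < n; ++i) {
    if (values[static_cast<std::size_t>(i)] > threshold) {
      // Without this critical section two threads could call push_back at
      // once; a reallocation in one can leave the other writing to freed
      // memory, which shows up as an intermittent segmentation fault.
#pragma omp critical
      selected.push_back(static_cast<std::size_t>(i));
    }
  }
  // Threads finish in an unpredictable order, so sort for a stable result.
  std::sort(selected.begin(), selected.end());
  return selected;
}
```

Because such a race depends on thread timing, a crash that appears only after several runs, as reported here, is consistent with this kind of bug.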

comment:3 Changed 8 years ago by Dennis Mikkelson

comment:4 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.2 to Release 2.3

Moved at the end of release 2.2

comment:5 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.3 to Release 2.4

Moved to milestone 2.4

comment:6 Changed 8 years ago by Dennis Mikkelson

  • Owner changed from Dennis Mikkelson to Anyone
  • Status changed from new to assigned
  • Milestone changed from Release 2.4 to Release 2.5

comment:7 Changed 7 years ago by Nick Draper

  • Milestone changed from Release 2.5 to Release 2.6

Moved to r2.6 at the end of r2.5

comment:8 Changed 7 years ago by Nick Draper

  • Status changed from assigned to new

comment:9 Changed 7 years ago by Nick Draper

  • Component changed from Mantid to Framework

comment:10 Changed 7 years ago by Nick Draper

  • Milestone changed from Release 2.6 to Backlog

Moved to backlog at the code freeze for R2.6

comment:11 Changed 7 years ago by Nick Draper

  • Status changed from new to assigned

Bulk move to assigned at the introduction of the triage step

comment:12 Changed 7 years ago by Vickie Lynch

  • Owner changed from Anyone to Vickie Lynch
  • Description modified

comment:13 Changed 7 years ago by Vickie Lynch

Last edited 6 years ago by Vickie Lynch

comment:14 Changed 7 years ago by Vickie Lynch

  • Status changed from assigned to inprogress
Last edited 6 years ago by Vickie Lynch

comment:15 Changed 7 years ago by Vickie Lynch

Last edited 6 years ago by Vickie Lynch

comment:16 Changed 7 years ago by Vickie Lynch

Last edited 6 years ago by Vickie Lynch

comment:17 Changed 7 years ago by Russell Taylor

I took the last commit out of the develop branch because it was still causing the Mac build to hang. It also caused crashes in the performance and system tests, so it is clearly not thread-safe.

comment:18 Changed 6 years ago by Vickie Lynch

Refs #5533 limit number of threads

Changeset: 3d536d35a9bcab324aac9b32d9debda9da90725b
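The commit message says only that the number of threads was limited (and the commit was reverted shortly afterwards). It does not show how this was done. One standard OpenMP mechanism is the `num_threads` clause, sketched below under that assumption. `ScaleResult` and `scaleWithThreadLimit` are hypothetical names. The sketch also reports the team size actually granted, which may be smaller than the number requested.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type: the scaled values plus the team size used.
struct ScaleResult {
  std::vector<double> values;
  int threadsUsed;
};

// Hypothetical example: scale every value using at most maxThreads threads.
// maxThreads must be at least 1.
ScaleResult scaleWithThreadLimit(const std::vector<double> &input,
                                 double factor, int maxThreads) {
  (void)maxThreads;  // silences an unused-parameter warning without OpenMP
  ScaleResult result{std::vector<double>(input.size()), 1};
  const long n = static_cast<long>(input.size());
#pragma omp parallel num_threads(maxThreads)
  {
#ifdef _OPENMP
    // One thread records the team size; the runtime may grant fewer
    // threads than requested, never more.
#pragma omp single
    result.threadsUsed = omp_get_num_threads();
#endif
#pragma omp for schedule(dynamic, 10)
    for (long i = 0; i < n; ++i) {
      const std::size_t k = static_cast<std::size_t>(i);
      result.values[k] = factor * input[k];
    }
  }
  return result;
}
```

Limiting the thread count narrows the window for a race but does not remove it, which fits with the commit being reverted.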

comment:19 Changed 6 years ago by Vickie Lynch

Revert "Refs #5533 limit number of threads"

This reverts commit 3d536d35a9bcab324aac9b32d9debda9da90725b.

Changeset: 908e0beee54161bffe1d6582cba7a1b6e8396e7e

comment:20 Changed 6 years ago by Vickie Lynch

Refs #5533 try threads with fitting moved inside function

Changeset: 787a9cba192bf1e8c386ebfdd73ade3c8102ff04
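The commit message says only that the fitting was moved inside a function, and this change was also reverted. One reason such a refactoring can help is that objects created inside the called function are private to the calling thread, whereas a shared member object or a `static` local would be shared by all threads. In the sketch below, a hypothetical moment-based estimate of peak centre and width stands in for a real fit. `PeakShape`, `fitProfile` and `fitAll` are illustrative names only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical peak description produced by the stand-in "fit".
struct PeakShape {
  double centre;
  double width;
};

// Moment-based stand-in for a fit. Every accumulator is a local variable,
// so concurrent calls from different OpenMP threads cannot interfere.
// Making these accumulators `static`, or members of a shared fitting
// object, would reintroduce a race.
PeakShape fitProfile(const std::vector<double> &x,
                     const std::vector<double> &counts) {
  double sumW = 0.0, sumWX = 0.0, sumWXX = 0.0;
  for (std::size_t k = 0; k < x.size() && k < counts.size(); ++k) {
    sumW += counts[k];
    sumWX += counts[k] * x[k];
    sumWXX += counts[k] * x[k] * x[k];
  }
  if (sumW <= 0.0) return {0.0, 0.0};
  const double mean = sumWX / sumW;
  const double variance = sumWXX / sumW - mean * mean;
  return {mean, std::sqrt(variance > 0.0 ? variance : 0.0)};
}

// Fit many profiles in parallel; each call owns all of its mutable state.
std::vector<PeakShape> fitAll(const std::vector<std::vector<double>> &profiles,
                              const std::vector<double> &x) {
  std::vector<PeakShape> shapes(profiles.size());
  const long n = static_cast<long>(profiles.size());
#pragma omp parallel for schedule(dynamic, 10)
  for (long i = 0; i < n; ++i) {
    const std::size_t k = static_cast<std::size_t>(i);
    shapes[k] = fitProfile(x, profiles[k]);
  }
  return shapes;
}
```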

comment:21 Changed 6 years ago by Vickie Lynch

Refs #5533 fix merge conflict

Changeset: 1f776b496bf10088d41133e58a45ec14d9cf349c

comment:22 Changed 6 years ago by Vickie Lynch

Revert "Refs #5533 try threads with fitting moved inside function"

This reverts commit 787a9cba192bf1e8c386ebfdd73ade3c8102ff04.

Changeset: da7ae980f13a5454373c2e4b5edb088b7cf6f79e

comment:23 Changed 6 years ago by Vickie Lynch

  • Status changed from inprogress to verify
  • Resolution set to duplicate

There is nothing to test in this ticket. All the changes have been reverted, primarily because they caused the Mac build to hang. In ticket 7651 a parallel SCD reduction workflow will be written to replace the current parallel Python scripts, and in ticket 9228 IntegratePeaksMD2 will be refactored, which may fix the problem with the Mac builds.

comment:24 Changed 6 years ago by Vickie Lynch

  • Milestone changed from Backlog to Release 3.2

comment:25 Changed 6 years ago by Michael Reuter

  • Status changed from verify to verifying
  • Tester set to Michael Reuter

comment:26 Changed 6 years ago by Michael Reuter

  • Status changed from verifying to closed

Verified changes have been backed out.

comment:27 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 6379
