Ticket #2738 (closed: fixed)

Opened 10 years ago

Last modified 5 years ago

Random failures of binary operations on OS X

Reported by: Russell Taylor Owned by: Russell Taylor
Priority: major Milestone: Release 2.5
Component: Mantid Keywords:
Cc: Blocked By:
Blocking: #6193 Tester: Martyn Gigg

Description

A follow-up to ticket #2669. The problem has been fixed on Windows (or at least the test failures have stopped), but it's now showing up on the new Mac build (64-bit, Intel compiler). Here, building in debug mode does not make the failures go away. Turning off OpenMP does, but that's highly undesirable as a permanent solution.

Change History

comment:1 Changed 10 years ago by Russell Taylor

(In [10400]) Turn off OpenMP in binary operations ONLY for the Intel compiler until we understand what's going wrong. Re #2738.
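Here is a rough sketch of this kind of workaround. It assumes a plain `#pragma omp parallel for` loop. Changeset [10400] isn't reproduced here, and Mantid may well wrap OpenMP in its own macros. The idea is that OpenMP can be switched off for the Intel compiler alone by testing its predefined `__INTEL_COMPILER` macro. The function `plusAllSpectra` and its nested-vector layout are hypothetical and only illustrate the guard:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise "plus" over all spectra: out[i][j] += rhs[i][j].
// The outer loop is parallelised with OpenMP unless the Intel compiler is in
// use (__INTEL_COMPILER is predefined by icc/icpc), in which case it runs
// serially, mirroring the "OpenMP off ONLY for Intel" workaround.
void plusAllSpectra(std::vector<std::vector<double>>& out,
                    const std::vector<std::vector<double>>& rhs) {
  assert(out.size() == rhs.size());  // one rhs spectrum per output spectrum
  const long n = static_cast<long>(out.size());  // signed index for older OpenMP
#if !defined(__INTEL_COMPILER)
#pragma omp parallel for
#endif
  for (long i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < out[i].size(); ++j) {
      out[i][j] += rhs[i][j];
    }
  }
}
```

When built without OpenMP support enabled (e.g. no `-fopenmp`), the pragma is ignored and the loop simply runs serially, so the result is the same either way; only the parallelism changes.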

comment:2 Changed 10 years ago by Nick Draper

  • Status changed from new to assigned
  • Owner set to Russell Taylor

comment:3 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 28 to Iteration 29

Bulk move of tickets at the end of iteration 28

comment:4 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 29 to Iteration 30

Accepted and assigned tickets moved at iteration 29 code freeze

comment:5 Changed 9 years ago by Russell Taylor

  • Priority changed from critical to major

comment:6 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 30 to Iteration 31

Bulk move of tickets to iteration 31 at the iteration 30 code freeze

comment:7 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 32 to Iteration 33

Moved to iteration 33 at iteration 32 code freeze

comment:8 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.1 to Release 2.2

Moved at end of release 2.1

comment:9 Changed 8 years ago by Russell Taylor

Just checked, and re-enabling OpenMP still leads to unit test failures for Plus, Minus, Multiply & Divide.

comment:10 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.2 to Release 2.3

Moved at the end of release 2.2

comment:11 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.3 to Release 2.4

Moved to milestone 2.4

comment:12 Changed 8 years ago by Nick Draper

  • Milestone changed from Release 2.4 to Release 2.5

Moved at the code freeze for release 2.4

comment:13 Changed 8 years ago by Russell Taylor

Just tried again with the latest version (13.0/2013) of the Intel compiler - the tests still fail :(

comment:14 Changed 8 years ago by Russell Taylor

  • Status changed from assigned to accepted

OK, I think I've got it...

comment:15 Changed 8 years ago by Russell Taylor

Re #2738. Re-enable OpenMP in BinaryOperation on Mac.

The problem appears to have been that the Intel compiler evaluates function arguments left to right (L->R), whereas our other compilers do it right to left (R->L). If the LHS and output workspaces were the same one, the references obtained by the readY/E calls could be invalidated by other threads. With R->L evaluation, the dataY/E calls were made first and so ensured that readY/E returned a reference to the right data. In fact, the tests that exposed this are rather artificial, as it's extremely unlikely that data will be shared between Y and E vectors (unlike X vectors), but it's better to make sure it's correct.

Changeset: 6b4fcc9105a7cbff4096b5d282f3bfc1049712c4
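Here is a minimal sketch of the aliasing problem described above, assuming a simple reference-counted copy-on-write buffer. `CowVector`, `read()` and `data()` are hypothetical stand-ins for a workspace's shared Y/E storage and the readY/dataY accessors. This is not Mantid's code and not the actual changeset. The two helper functions spell out the two argument evaluation orders as separate, explicitly sequenced statements:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write container (NOT Mantid's real class). Copying a
// CowVector shares the underlying buffer until someone asks for write access.
class CowVector {
public:
  explicit CowVector(std::vector<double> values)
      : ptr_(std::make_shared<std::vector<double>>(std::move(values))) {}

  // Read-only access (cf. readY): a reference into a possibly shared buffer.
  const std::vector<double>& read() const { return *ptr_; }

  // Write access (cf. dataY): if the buffer is shared, detach by taking a
  // private copy. Any reference previously returned by read() now points at
  // the OLD buffer, which is kept alive only by the other sharers.
  std::vector<double>& data() {
    if (ptr_.use_count() > 1) {
      ptr_ = std::make_shared<std::vector<double>>(*ptr_);
    }
    assert(ptr_.use_count() == 1);  // single-threaded sketch: buffer is private
    return *ptr_;
  }

private:
  std::shared_ptr<std::vector<double>> ptr_;
};

// Mimics f(ws.read(), ws.data()) evaluated left to right, with LHS == output.
// Returns true if the read reference refers to ws's own (written) buffer.
bool readThenWriteAliases(CowVector& ws) {
  const std::vector<double>& in = ws.read();  // may point at the shared buffer
  std::vector<double>& out = ws.data();       // detach: ws gets a new buffer
  return &in == &out;
}

// Mimics the same call evaluated right to left: detach first, then read.
bool writeThenReadAliases(CowVector& ws) {
  std::vector<double>& out = ws.data();       // any detach happens up front
  const std::vector<double>& in = ws.read();  // refers to ws's own buffer
  return &in == &out;
}
```

The C++ standard leaves the evaluation order of function arguments unspecified, so relying on any particular order isn't portable. Calling `data()` in its own statement before any `read()`, as in `writeThenReadAliases`, pins the order down regardless of compiler. In this single-threaded sketch, the stale buffer stays alive because another `CowVector` still shares it. In the real multi-threaded case, another thread could drop the last reference to it, leaving the read reference dangling.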

comment:16 Changed 8 years ago by Russell Taylor

  • Blocking 6193 added

comment:17 Changed 8 years ago by Russell Taylor

Re #2738. See if the failures on Windows are related to the same thing as those on the Mac (see commit [6b4fcc91]).

Changeset: 0e77daf379f8000fe0706f474e6a9e492c58ee8a

Last edited 8 years ago by Russell Taylor

comment:18 Changed 8 years ago by Russell Taylor

The plan now is to wait a couple of weeks to see if the unit tests ever fail. If they don't (and nothing 'funny' is seen elsewhere that could relate to this), then I'll close the ticket.

comment:19 Changed 8 years ago by Russell Taylor

  • Status changed from accepted to verify
  • Resolution set to fixed

I haven't noticed any failures of these tests (Plus, Minus, Multiply, Divide) on either Mac or Windows over the past 2 weeks. I therefore declare victory!

To test: check that the tests have kept behaving in the Jenkins jobs, and inspect the code to satisfy yourself that what I've done should make a difference.

comment:20 Changed 8 years ago by Martyn Gigg

  • Status changed from verify to verifying
  • Tester set to Martyn Gigg

comment:21 Changed 8 years ago by Martyn Gigg

  • Status changed from verifying to closed

The code changes look sensible and I haven't spotted any strange test failures either. Hoorah!

comment:22 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 3585.
