Ticket #2738 (closed: fixed)
Random failures of binary operations on OS X
Reported by: | Russell Taylor | Owned by: | Russell Taylor |
---|---|---|---|
Priority: | major | Milestone: | Release 2.5 |
Component: | Mantid | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | #6193 | Tester: | Martyn Gigg |
Description
A follow-up to ticket #2669. Having fixed the problem (or at least stopped the test failures) on Windows, it's now showing up on the new Mac build (64-bit, Intel compiler). Here, building in debug does not remove the failures. Turning off OpenMP does, but that's highly undesirable as a permanent solution.
Change History
comment:2 Changed 10 years ago by Nick Draper
- Status changed from new to assigned
- Owner set to Russell Taylor
comment:3 Changed 9 years ago by Nick Draper
- Milestone changed from Iteration 28 to Iteration 29
Bulk move of tickets at the end of iteration 28
comment:4 Changed 9 years ago by Nick Draper
- Milestone changed from Iteration 29 to Iteration 30
Accepted and assigned tickets moved at iteration 29 code freeze
comment:6 Changed 9 years ago by Nick Draper
- Milestone changed from Iteration 30 to Iteration 31
Bulk move of tickets to iteration 31 at the iteration 30 code freeze
comment:7 Changed 9 years ago by Nick Draper
- Milestone changed from Iteration 32 to Iteration 33
Moved to iteration 33 at iteration 32 code freeze
comment:8 Changed 8 years ago by Nick Draper
- Milestone changed from Release 2.1 to Release 2.2
Moved at end of release 2.1
comment:9 Changed 8 years ago by Russell Taylor
Just checked, and re-enabling OpenMP still leads to unit test failures for Plus, Minus, Multiply & Divide.
comment:10 Changed 8 years ago by Nick Draper
- Milestone changed from Release 2.2 to Release 2.3
Moved at the end of release 2.2
comment:11 Changed 8 years ago by Nick Draper
- Milestone changed from Release 2.3 to Release 2.4
Moved to milestone 2.4
comment:12 Changed 8 years ago by Nick Draper
- Milestone changed from Release 2.4 to Release 2.5
Moved at the code freeze for release 2.4
comment:13 Changed 8 years ago by Russell Taylor
Just tried again with the latest version (13.0/2013) of the Intel compiler - the tests still fail :(
comment:14 Changed 8 years ago by Russell Taylor
- Status changed from assigned to accepted
OK, I think I've got it....
comment:15 Changed 8 years ago by Russell Taylor
Re #2738. Re-enable OpenMP in BinaryOperation on Mac.
The problem looked to be due to the fact that the Intel compiler evaluates function arguments L->R (our other compilers do it R->L). If the LHS & output workspaces were the same one, the references obtained by the readY/E calls could be invalidated by other threads, whereas with R->L evaluation the dataY/E calls ran first and so ensured that the readY/E calls gave the right one. In fact, the tests that showed this up are rather artificial, as it's extremely unlikely that data will be shared between Y & E vectors (unlike X vectors), but it's better to make sure it's correct.
Changeset: 6b4fcc9105a7cbff4096b5d282f3bfc1049712c4
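The evaluation-order dependence described in this comment can be illustrated with a minimal sketch. It is not the code from the changeset: `CowVec`, `aliasesWhenReadFirst`, `aliasesWhenWriteFirst` and `addInPlace` are hypothetical names, and the class is only a single-threaded stand-in for a copy-on-write data vector with a read-only accessor (like readY) and a mutable accessor (like dataY). C++ leaves the order in which function arguments are evaluated unspecified, so a call such as `f(v.read(), v.data())` may obtain the read-only reference either before or after the mutable access detaches the buffer. Taking the references in separate statements fixes the order on every compiler.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write vector (an assumption for illustration only).
class CowVec {
public:
  explicit CowVec(std::vector<double> v)
      : m_data(std::make_shared<std::vector<double>>(std::move(v))) {}

  // Read-only access: never copies, returns the current (possibly shared) buffer.
  const std::vector<double> &read() const { return *m_data; }

  // Mutable access: if the buffer is shared, switch to a private copy first.
  std::vector<double> &data() {
    if (m_data.use_count() > 1)
      m_data = std::make_shared<std::vector<double>>(*m_data);
    return *m_data;
  }

private:
  std::shared_ptr<std::vector<double>> m_data;
};

// Read-only reference taken first (what L->R argument evaluation would do).
// Returns whether both references point at the same buffer.
bool aliasesWhenReadFirst() {
  CowVec y(std::vector<double>{1.0, 2.0});
  CowVec e = y;                              // e shares y's buffer
  const std::vector<double> &in = y.read();  // refers to the shared buffer
  std::vector<double> &out = y.data();       // y detaches to a private copy
  return &in == &out;
}

// Mutable reference taken first (what R->L argument evaluation would do).
bool aliasesWhenWriteFirst() {
  CowVec y(std::vector<double>{1.0, 2.0});
  CowVec e = y;                              // e shares y's buffer
  std::vector<double> &out = y.data();       // y detaches to a private copy
  const std::vector<double> &in = y.read();  // refers to the private copy
  return &in == &out;
}

// In-place addition where the LHS is also the output. The mutable reference
// is obtained in its own statement, so the order no longer depends on how the
// compiler evaluates function arguments. Assumes both vectors have equal size.
void addInPlace(CowVec &lhsAndOut, const CowVec &rhs) {
  std::vector<double> &out = lhsAndOut.data();
  const std::vector<double> &lhs = lhsAndOut.read();
  const std::vector<double> &r = rhs.read();
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = lhs[i] + r[i];
}
```

In this sketch the read-first ordering leaves the input reference pointing at the old shared buffer, which only stays valid for as long as some other owner keeps it alive. That is the kind of reference that another thread could invalidate in the multi-threaded case.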
comment:17 Changed 8 years ago by Russell Taylor
Re #2738. See if the failures on Windows are related to the same thing
...as those on the Mac (see commit [6b4fcc91]).
Changeset: 0e77daf379f8000fe0706f474e6a9e492c58ee8a
comment:18 Changed 8 years ago by Russell Taylor
The plan now is to wait a couple of weeks to see if the unit tests ever fail. If they don't (and nothing 'funny' is seen elsewhere that could relate to this), then I'll close the ticket.
comment:19 Changed 8 years ago by Russell Taylor
- Status changed from accepted to verify
- Resolution set to fixed
I haven't noticed any failures of these tests (Plus, Minus, Multiply, Divide) on either Mac or Windows over the past 2 weeks. I therefore declare victory!
To test: Check that the tests have kept behaving in the jenkins jobs & inspect the code to satisfy yourself that what I've done should make a difference.
comment:20 Changed 8 years ago by Martyn Gigg
- Status changed from verify to verifying
- Tester set to Martyn Gigg
comment:21 Changed 8 years ago by Martyn Gigg
- Status changed from verifying to closed
The code changes look sensible and I haven't spotted any strange test failures either. Hoorah!
comment:22 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 3585
(In [10400]) Turn off openmp in binary operations ONLY for Intel compiler until we understand what's going wrong. Re #2738.