Ticket #4133 (closed: fixed)
Check/fix possible race conditions
Reported by: | Martyn Gigg | Owned by: | Martyn Gigg |
---|---|---|---|
Priority: | critical | Milestone: | Release 2.0 |
Component: | Mantid | Keywords: | |
Cc: | taylorrj@… | Blocked By: | |
Blocking: | Tester: | Peter Peterson |
Description
We have been getting random unit tests failures recently that would indicate we have some race conditions somewhere in the code.
Tools such as helgrind, drd can help with this.
Attachments
Change History
comment:1 Changed 9 years ago by Martyn Gigg
The attached file shows the output after running the convert units test through helgrind. It inidcates a number of problems most notably around the parameter map caching.
comment:4 Changed 9 years ago by Martyn Gigg
Refs #4133. Fixed some race conditions in various places.
The valgrind tool helgrind flagged up some race conditions in the code. This is not a complete list of those that need fixing but some major ones that would affect large portions of the code
Changeset: b04a3eaa36407e9caabf0ea45ae1e76a3eb1b44a
comment:5 Changed 9 years ago by Russell Taylor
Not surprisingly, this has had an impact on the performance tests. Looking at the RHEL6 tests run at ORNL, the following tests show noticeable slowdowns:
- CreateGroupingWorkspaceTestPerformance.test_TOPAZ_2010
- DiffractionFocussing2TestPerformance.test_SNAP_event_one_group_dontPreserveEvents
- DiffractionFocussing2TestPerformance.test_SNAP_event_six_groups
- DiffractionFocussing2TestPerformance.test_SNAP_event_six_groups_dontPreserveEvents
- Q1D2TestPerformance.test_slow_performance
- InstrumentRayTracerTestPerformance.test_TOPAZ
It's not all doom and gloom though. The following tests got faster at this point!
- LoadEventNexusTestPerformance.testDefaultLoad
- LoadEventPreNexusTestPerformance.testDefaultLoad
- MaskDetectorsInShapeTestPerformance.testMaskingLotsOfDetectors
- NearestNeighboursTestPerformance.testUsingNumberOfNeighbours
comment:6 Changed 9 years ago by Martyn Gigg
Refs #4133. Swap assign for a fill to avoid race condition warning.
Changeset: a99e0c533e418b8c0d4c738a3eb0aa523c84dab7
comment:7 Changed 9 years ago by Martyn Gigg
I'm guessing the ones that have slowed down all do a lot of getPos calls which means they now have to go through the critical block to get in to the cache. I suppose one way of avoiding that would be switching to storing the absolute positions rather relative positions. That way it wouldn't have to go into the cache. Although if you moved something it would still have to go into the parameter map so I don't really have a feel for how much gain we would get from that
comment:8 Changed 9 years ago by Martyn Gigg
- Status changed from accepted to verify
- Resolution set to fixed
I ran the WISH script through helgrind last night and it didn't flag anything up either.
comment:9 Changed 9 years ago by Martyn Gigg
- Status changed from verify to reopened
- Resolution fixed deleted
comment:10 Changed 9 years ago by Martyn Gigg
Refs #4133. More race conditions found.
The HFIR system test randomly failed on RHEL last night. Helgrind flagged up a problem with the ParameterMap access again. This will probably affect the performance but correctness is of higher importance.
Changeset: 9775b281b348f086b4d5b47a01ba9bbc782a330a
comment:12 Changed 9 years ago by Martyn Gigg
- Status changed from accepted to verify
- Resolution set to fixed
comment:13 Changed 9 years ago by Peter Peterson
- Status changed from verify to verifying
- Tester set to Peter Peterson
comment:14 Changed 9 years ago by Peter Peterson
- Status changed from verifying to closed
Looks good to me.
comment:15 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 4980