Ticket #10375 (closed: fixed)
Race condition in ParameterMap
Reported by: | Martyn Gigg | Owned by: | Martyn Gigg |
---|---|---|---|
Priority: | blocker | Milestone: | Release 3.3 |
Component: | Framework | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Tester: | Pete Peterson |
Description
A git bisect shows that this commit has caused some systemtests, particular ARCSReduction, to start segfaulting part way through.
gdb shows that the problem is around the ParameterMap access but the segfault actually occurs in boost code.
My suspicion is that switching to boost::unordered_map has uncovered a subtle race condition that was already present.
Change History
comment:2 Changed 6 years ago by Martyn Gigg
This is confirmed by the fact that if I take out the OpenMP look from BinaryOperation then the segfault goes away. Running the helgrind tool shows a race condition between retrieving and modifying the map:
==23510== Possible data race during read of size 8 at 0x578938E0 by thread #14 ==23510== Locks held: 1, at address 0x30CF0320 ==23510== at 0xEC38388: boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::empty() const (unordered_map.hpp:672) ==23510== by 0xEC35EBA: Mantid::Geometry::ParameterMap::positionOf(Mantid::Geometry::IComponent const*, char const*, char const*) const (ParameterMap.cpp:710) ==23510== by 0xEC35AF6: Mantid::Geometry::ParameterMap::get(Mantid::Geometry::IComponent const*, char const*, char const*) const (ParameterMap.cpp:659) ==23510== by 0xEBB3B50: Mantid::Geometry::Instrument::isDetectorMasked(int const&) const (Instrument.cpp:584) ==23510== by 0xEBB3C0D: Mantid::Geometry::Instrument::isDetectorMasked(std::set<int, std::less<int>, std::allocator<int> > const&) const (Instrument.cpp:608) ==23510== by 0x1E687BC0: Mantid::Algorithms::MedianDetectorTest::maskOutliers(std::vector<double, std::allocator<double> >, boost::shared_ptr<Mantid::API::MatrixWorkspace>, std::vector<std::vector<unsigned long, std::allocator<unsigned long> >, std::allocator<std::vector<unsigned long, std::allocator<unsigned long> > > >) [clone ._omp_fn.0] (MedianDetectorTest.cpp:257) ==23510== by 0x6A3204E: gomp_thread_start (team.c:115) ==23510== by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==23510== by 0x4E45181: start_thread (pthread_create.c:312) ==23510== by 0x5155FBC: clone (clone.S:111) ==23510== ==23510== This conflicts with a previous write of size 8 by thread #11 ==23510== Locks held: 1, at address 0x26E13A60 ==23510== at 0xEC3F8B9: boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::add_node(boost::unordered::detail::node_constructor<std::allocator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > >&, unsigned long, boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >) (equivalent.hpp:418) ==23510== by 0xEC3D4B6: boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::emplace_impl(boost::unordered::detail::node_constructor<std::allocator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > >&) (equivalent.hpp:431) ==23510== by 0xEC3BA54: boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::emplace<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (equivalent.hpp:465) ==23510== by 0xEC39F26: boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::emplace<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (unordered_map.hpp:720) ==23510== by 0xEC38A8F: boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::insert(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (unordered_map.hpp:869) ==23510== by 0xEC344AF: Mantid::Geometry::ParameterMap::add(Mantid::Geometry::IComponent const*, boost::shared_ptr<Mantid::Geometry::Parameter> const&) (ParameterMap.cpp:346) ==23510== by 0xEC38D77: void Mantid::Geometry::ParameterMap::add<bool>(std::string const&, Mantid::Geometry::IComponent const*, std::string const&, bool const&) (ParameterMap.h:139) ==23510== by 0xEC35547: Mantid::Geometry::ParameterMap::addBool(Mantid::Geometry::IComponent const*, std::string const&, bool) (ParameterMap.cpp:517)
comment:4 Changed 6 years ago by Martyn Gigg
Fix race condition in ParameterMap
If threads are competing to try and both check and update the map then they will hit the appropriate PARALLEL_CRITICAL sections within the relevant methods. The bug was that the critical sections all had different names and so OpenMP would not block on an ::add while a ::get is being performed. The critical secctions are renamed to reflect the what they are protecting. Refs #10375
Changeset: 63e43e0eb64b35cd3f1fcec7391bd7b4c1f84b7d
comment:5 Changed 6 years ago by Martyn Gigg
Fix up whitespace in ParameterMap.cpp
Refs #10375
Changeset: 7cfbfa666da4da0c5f373a6320bc4833857604da
comment:5 Changed 6 years ago by Martyn Gigg
- Status changed from assigned to verify
- Resolution set to fixed
This is being verified as pull request #44.
comment:6 Changed 6 years ago by Pete Peterson
- Status changed from verify to verifying
- Tester set to Pete Peterson
comment:7 Changed 6 years ago by Pete Peterson
- Status changed from verifying to closed
Merge pull request #44 from mantidproject/bugfix/10375_parameter_map_race_condition
Looks good and everything appears to work. I couldn't find where m_mapAccess was defined though.
Full changeset: 82f92943e5dd4d93dc1a1638e50641ec5391c127