Ticket #10375 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Race condition in ParameterMap

Reported by: Martyn Gigg Owned by: Martyn Gigg
Priority: blocker Milestone: Release 3.3
Component: Framework Keywords:
Cc: Blocked By:
Blocking: Tester: Pete Peterson

Description

A git bisect shows that this commit has caused some systemtests, particular ARCSReduction, to start segfaulting part way through.

gdb shows that the problem is around the ParameterMap access but the segfault actually occurs in boost code.

My suspicion is that switching to boost::unordered_map has uncovered a subtle race condition that was already present.

Change History

comment:1 Changed 6 years ago by Martyn Gigg

  • Status changed from new to assigned

comment:2 Changed 6 years ago by Martyn Gigg

This is confirmed by the fact that if I take out the OpenMP look from BinaryOperation then the segfault goes away. Running the helgrind tool shows a race condition between retrieving and modifying the map:

==23510== Possible data race during read of size 8 at 0x578938E0 by thread #14
==23510== Locks held: 1, at address 0x30CF0320
==23510==    at 0xEC38388: boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::empty() const (unordered_map.hpp:672)
==23510==    by 0xEC35EBA: Mantid::Geometry::ParameterMap::positionOf(Mantid::Geometry::IComponent const*, char const*, char const*) const (ParameterMap.cpp:710)
==23510==    by 0xEC35AF6: Mantid::Geometry::ParameterMap::get(Mantid::Geometry::IComponent const*, char const*, char const*) const (ParameterMap.cpp:659)
==23510==    by 0xEBB3B50: Mantid::Geometry::Instrument::isDetectorMasked(int const&) const (Instrument.cpp:584)
==23510==    by 0xEBB3C0D: Mantid::Geometry::Instrument::isDetectorMasked(std::set<int, std::less<int>, std::allocator<int> > const&) const (Instrument.cpp:608)
==23510==    by 0x1E687BC0: Mantid::Algorithms::MedianDetectorTest::maskOutliers(std::vector<double, std::allocator<double> >, boost::shared_ptr<Mantid::API::MatrixWorkspace>, std::vector<std::vector<unsigned long, std::allocator<unsigned long> >, std::allocator<std::vector<unsigned long, std::allocator<unsigned long> > > >) [clone ._omp_fn.0] (MedianDetectorTest.cpp:257)
==23510==    by 0x6A3204E: gomp_thread_start (team.c:115)
==23510==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==23510==    by 0x4E45181: start_thread (pthread_create.c:312)
==23510==    by 0x5155FBC: clone (clone.S:111)
==23510== 
==23510== This conflicts with a previous write of size 8 by thread #11
==23510== Locks held: 1, at address 0x26E13A60
==23510==    at 0xEC3F8B9: boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::add_node(boost::unordered::detail::node_constructor<std::allocator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > >&, unsigned long, boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >) (equivalent.hpp:418)
==23510==    by 0xEC3D4B6: boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::emplace_impl(boost::unordered::detail::node_constructor<std::allocator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > >&) (equivalent.hpp:431)
==23510==    by 0xEC3BA54: boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > boost::unordered::detail::grouped_table_impl<boost::unordered::detail::multimap<std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >, Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const> > >::emplace<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (equivalent.hpp:465)
==23510==    by 0xEC39F26: boost::unordered::iterator_detail::iterator<boost::unordered::detail::grouped_ptr_node<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > > boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::emplace<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > >(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (unordered_map.hpp:720)
==23510==    by 0xEC38A8F: boost::unordered::unordered_multimap<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter>, boost::hash<Mantid::Geometry::IComponent* const>, std::equal_to<Mantid::Geometry::IComponent* const>, std::allocator<std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> > > >::insert(std::pair<Mantid::Geometry::IComponent* const, boost::shared_ptr<Mantid::Geometry::Parameter> >&&) (unordered_map.hpp:869)
==23510==    by 0xEC344AF: Mantid::Geometry::ParameterMap::add(Mantid::Geometry::IComponent const*, boost::shared_ptr<Mantid::Geometry::Parameter> const&) (ParameterMap.cpp:346)
==23510==    by 0xEC38D77: void Mantid::Geometry::ParameterMap::add<bool>(std::string const&, Mantid::Geometry::IComponent const*, std::string const&, bool const&) (ParameterMap.h:139)
==23510==    by 0xEC35547: Mantid::Geometry::ParameterMap::addBool(Mantid::Geometry::IComponent const*, std::string const&, bool) (ParameterMap.cpp:517)

comment:4 Changed 6 years ago by Martyn Gigg

Fix race condition in ParameterMap

If threads are competing to try and both check and update the map then they will hit the appropriate PARALLEL_CRITICAL sections within the relevant methods. The bug was that the critical sections all had different names and so OpenMP would not block on an ::add while a ::get is being performed. The critical secctions are renamed to reflect the what they are protecting. Refs #10375

Changeset: 63e43e0eb64b35cd3f1fcec7391bd7b4c1f84b7d

comment:5 Changed 6 years ago by Martyn Gigg

Fix up whitespace in ParameterMap.cpp

Refs #10375

Changeset: 7cfbfa666da4da0c5f373a6320bc4833857604da

comment:5 Changed 6 years ago by Martyn Gigg

  • Status changed from assigned to verify
  • Resolution set to fixed

This is being verified as pull request #44.

comment:6 Changed 6 years ago by Pete Peterson

  • Status changed from verify to verifying
  • Tester set to Pete Peterson

comment:7 Changed 6 years ago by Pete Peterson

  • Status changed from verifying to closed

Merge pull request #44 from mantidproject/bugfix/10375_parameter_map_race_condition

Looks good and everything appears to work. I couldn't find where m_mapAccess was defined though.

Full changeset: 82f92943e5dd4d93dc1a1638e50641ec5391c127

comment:8 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 11217

Note: See TracTickets for help on using tickets.