Ticket #3834 (closed: fixed)
Crash when setting MultiThreaded.MaxCores to > number of machine cores
| Reported by: | Russell Taylor | Owned by: | Russell Taylor |
|---|---|---|---|
| Priority: | minor | Milestone: | Release 2.0 |
| Component: | Mantid | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Tester: | Vickie Lynch |
Description
Witnessed when working with WISH data on both my Mac (2 cores) and PC (12 cores). In the former case it crashes (100% of the time) when MultiThreaded.MaxCores is 3 or greater; in the latter case, when it's 14 or more.
MultiThreaded.MaxCores is used in Mantid to call the OpenMP omp_set_num_threads function. In principle, it should be possible to have more threads than cores, so I don't know why it suddenly crashes once this threshold is exceeded when we've never (afaik) seen a problem in this part of the code otherwise. One suggestion was that it's to do with the stack size, but a quick play with setting the OMP_STACKSIZE environment variable had no apparent effect.
The obvious and easy solution is to prevent the setting of the number of threads to greater than the number of processors.
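The clamp suggested above could look roughly like the following. This is a minimal sketch, not the actual Mantid fix: `clampThreadCount` and `applyMaxCores` are hypothetical names, treating a non-positive request as "use every core" is an assumption, and the core count comes from `std::thread::hardware_concurrency()`, which reports logical rather than physical cores and may return 0 when it cannot tell.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: limit a requested thread count to the cores available.
// A non-positive request is treated as "use every core" (an assumption made
// for this sketch).
int clampThreadCount(int requested, int available) {
  assert(available >= 1);
  if (requested <= 0) return available;
  return std::min(requested, available);
}

// Hypothetical wrapper: apply a MaxCores-style setting to OpenMP.
// Returns the thread count that was passed on to the runtime.
int applyMaxCores(int requested) {
  // hardware_concurrency() may return 0 if the value cannot be determined.
  const unsigned detected = std::thread::hardware_concurrency();
  const int available = detected == 0 ? 1 : static_cast<int>(detected);
  const int threads = clampThreadCount(requested, available);
#ifdef _OPENMP
  // Only compiled in when the build has OpenMP enabled.
  omp_set_num_threads(threads);
#endif
  return threads;
}
```

With this in place, a request for 3 threads on a 2-core machine is reduced to 2, and a request for 14 on a 12-core machine is reduced to 12.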
Linux stack trace:
    boost::detail::sp_counted_base::use_count() at sp_counted_base_gcc_x86.hpp:165 0x997c20
    boost::detail::shared_count::use_count() at shared_count.hpp:269 0x997c4d
    boost::detail::shared_count::unique() at shared_count.hpp:274 0x7ffff7c2d5fa
    boost::shared_ptr<Mantid::Geometry::Detector>::unique() at shared_ptr.hpp:432 0x7ffff755bbd4
    Mantid::Geometry::ComponentPool<Mantid::Geometry::Detector>::getIndexInCache() at ParComponentFactory.cpp:78 0x7ffff755b2ac
    Mantid::Geometry::ComponentPool<Mantid::Geometry::Detector>::create() at ParComponentFactory.cpp:46 0x7ffff755ab96
    Mantid::Geometry::ParComponentFactory::createDetector() at ParComponentFactory.cpp:118 0x7ffff755a1de
    Mantid::Geometry::Instrument::getDetector() at Instrument.cpp:412 0x7ffff74ef755
    Mantid::Geometry::Instrument::getDetectors() at Instrument.cpp:481 0x7ffff74efe7c
    Mantid::API::MatrixWorkspace::getDetector() at MatrixWorkspace.cpp:734 0x7ffff6f9ead5
    Mantid::Algorithms::ConvertUnits::convertViaTOF() at ConvertUnits.cpp:422 0x7fffdeaa4685
Mac stack trace:
    Thread 8 Crashed:
    0 libMantidGeometry.dylib 0x0000000100f6ec39 Mantid::Geometry::ParComponentFactory::createDetector(Mantid::Geometry::IDetector const*, Mantid::Geometry::ParameterMap const*) + 713
    1 libMantidGeometry.dylib 0x0000000100f20b52 Mantid::Geometry::Instrument::getDetectors(std::set<int, std::less<int>, std::allocator<int> > const&) const + 178
    2 libMantidAPI.dylib 0x0000000101294573 Mantid::API::MatrixWorkspace::getDetector(unsigned long) const + 803
    3 libMantidAlgorithms.dylib 0x00000001192cbedc Mantid::Algorithms::ConvertUnits::convertViaTOF(boost::shared_ptr<Mantid::Kernel::Unit const>, boost::shared_ptr<Mantid::API::MatrixWorkspace>) + 5052
    4 libiomp5.dylib 0x0000000104d45b53 __kmp_invoke_microtask + 147
    5 libiomp5.dylib 0x0000000104d26195 __kmpc_invoke_task_func + 181
    6 libiomp5.dylib 0x0000000104d2204a __kmp_launch_thread + 490
    7 libiomp5.dylib 0x0000000104d45f5d __kmp_launch_worker(void*) + 333
    8 libSystem.B.dylib 0x00007fff83bdcfd6 _pthread_start + 331
    9 libSystem.B.dylib 0x00007fff83bdce89 thread_start + 13
Change History
comment:2 Changed 9 years ago by Russell Taylor
- Status changed from new to accepted
- Owner set to Russell Taylor
comment:5 Changed 9 years ago by Russell Taylor
- Status changed from accepted to verify
- Resolution set to fixed
To test:
- Set the MultiThreaded.MaxCores property to something larger than your number of physical processors
- Load a WISH data file (or any one that will end up with parameterized detectors - LOQ's another, I think)
- Run ConvertUnits to, e.g., wavelength
- Before the fix, this was a guaranteed crash
comment:6 Changed 9 years ago by Vickie Lynch
- Status changed from verify to verifying
- Tester set to Vickie Lynch
I think I've figured out what's going on. ParComponentFactory initialises its g_detPool static variable based on the maximum number of threads. This happens before whatever value is in MultiThreaded.MaxCores is applied. When ComponentPool::getIndexInCache() then accesses the pool based on the (new) maximum number of threads, it is presumably working outside what was originally set up, and things go bad.
I spotted this when I noticed that if you set the number of threads via the OMP_NUM_THREADS environment variable, then things are fine.
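The failure mode described above can be sketched in isolation. The following is illustrative only and does not reproduce the Mantid sources: `FixedPool`, `poolCoversThreads`, and the thread counts are hypothetical. It shows a pool whose slot count is fixed at construction (standing in for static initialisation) and a lookup that is later driven by a larger thread count. The checked accessor turns what would otherwise be an out-of-range read into an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a per-thread cache that is sized once, at start-up.
class FixedPool {
public:
  explicit FixedPool(std::size_t threadsAtInit) : m_slots(threadsAtInit, 0) {
    assert(threadsAtInit > 0);
  }

  std::size_t size() const { return m_slots.size(); }

  // Checked lookup: throws std::out_of_range instead of reading past the end.
  int &slotFor(std::size_t threadIndex) { return m_slots.at(threadIndex); }

private:
  std::vector<int> m_slots;
};

// Returns true if every thread index below threadCount has a slot in the pool.
bool poolCoversThreads(FixedPool &pool, std::size_t threadCount) {
  try {
    for (std::size_t i = 0; i < threadCount; ++i) {
      pool.slotFor(i);
    }
  } catch (const std::out_of_range &) {
    return false;
  }
  return true;
}
```

For a pool built for 2 threads, `poolCoversThreads(pool, 2)` holds and `poolCoversThreads(pool, 3)` does not, which loosely mirrors the 2-core Mac crashing at a setting of 3. Sizing the pool after the thread count has been finalised, or capping the thread count at the value used for sizing, would avoid the mismatch in this sketch.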