Ticket #3834 (closed: fixed)

Opened 9 years ago

Last modified 5 years ago

Crash when setting MultiThreaded.MaxCores to > number of machine cores

Reported by: Russell Taylor
Owned by: Russell Taylor
Priority: minor
Milestone: Release 2.0
Component: Mantid
Keywords:
Cc:
Blocked By:
Blocking:
Tester: Vickie Lynch

Description

Witnessed when working with WISH data on both my Mac (2 cores) and PC (12 cores). On the Mac it crashes (100% of the time) when MultiThreaded.MaxCores is 3 or greater; on the PC, when it's 14 or more.

MultiThreaded.MaxCores is used in Mantid as the argument to the OpenMP omp_set_num_threads function. In principle, it should be possible to have more threads than cores, so I don't know why it crashes as soon as this threshold is exceeded, when we've never (afaik) seen a problem in this part of the code otherwise. One suggestion was that it's to do with the stack size, but a quick play with setting the OMP_STACKSIZE environment variable had no apparent effect.

The obvious and easy solution is to prevent the setting of the number of threads to greater than the number of processors.
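A minimal sketch of that workaround, assuming the requested value is simply capped at the processor count before it reaches OpenMP. The helper names clampThreadCount, availableProcessors and applyMaxCores are hypothetical and are not Mantid functions; this is not necessarily what was committed for this ticket.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: limit a requested thread count to the range [1, available].
int clampThreadCount(int requested, int available) {
  const int upper = std::max(1, available);
  return std::max(1, std::min(requested, upper));
}

// Hypothetical helper: processor count as reported by OpenMP,
// or 1 when the code is built without OpenMP support.
int availableProcessors() {
#ifdef _OPENMP
  return omp_get_num_procs();
#else
  return 1;
#endif
}

// Hypothetical helper: apply a MultiThreaded.MaxCores-style value,
// capped at the processor count, and return what was actually used.
int applyMaxCores(int maxCores) {
  const int chosen = clampThreadCount(maxCores, availableProcessors());
#ifdef _OPENMP
  omp_set_num_threads(chosen);
#endif
  return chosen;
}
```

With this in place, a request for 3 threads on the 2-core Mac would be reduced to 2, and 14 on the 12-core PC to 12. The cost is that deliberate oversubscription is no longer possible.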

Linux stack trace:

boost::detail::sp_counted_base::use_count() at sp_counted_base_gcc_x86.hpp:165 0x997c20	
boost::detail::shared_count::use_count() at shared_count.hpp:269 0x997c4d	
boost::detail::shared_count::unique() at shared_count.hpp:274 0x7ffff7c2d5fa	
boost::shared_ptr<Mantid::Geometry::Detector>::unique() at shared_ptr.hpp:432 0x7ffff755bbd4	
Mantid::Geometry::ComponentPool<Mantid::Geometry::Detector>::getIndexInCache() at ParComponentFactory.cpp:78 0x7ffff755b2ac	
Mantid::Geometry::ComponentPool<Mantid::Geometry::Detector>::create() at ParComponentFactory.cpp:46 0x7ffff755ab96	
Mantid::Geometry::ParComponentFactory::createDetector() at ParComponentFactory.cpp:118 0x7ffff755a1de	
Mantid::Geometry::Instrument::getDetector() at Instrument.cpp:412 0x7ffff74ef755	
Mantid::Geometry::Instrument::getDetectors() at Instrument.cpp:481 0x7ffff74efe7c	
Mantid::API::MatrixWorkspace::getDetector() at MatrixWorkspace.cpp:734 0x7ffff6f9ead5	
Mantid::Algorithms::ConvertUnits::convertViaTOF() at ConvertUnits.cpp:422 0x7fffdeaa4685	

Mac stack trace:

Thread 8 Crashed:
0   libMantidGeometry.dylib       	0x0000000100f6ec39 Mantid::Geometry::ParComponentFactory::createDetector(Mantid::Geometry::IDetector const*, Mantid::Geometry::ParameterMap const*) + 713
1   libMantidGeometry.dylib       	0x0000000100f20b52 Mantid::Geometry::Instrument::getDetectors(std::set<int, std::less<int>, std::allocator<int> > const&) const + 178
2   libMantidAPI.dylib            	0x0000000101294573 Mantid::API::MatrixWorkspace::getDetector(unsigned long) const + 803
3   libMantidAlgorithms.dylib     	0x00000001192cbedc Mantid::Algorithms::ConvertUnits::convertViaTOF(boost::shared_ptr<Mantid::Kernel::Unit const>, boost::shared_ptr<Mantid::API::MatrixWorkspace>) + 5052
4   libiomp5.dylib                	0x0000000104d45b53 __kmp_invoke_microtask + 147
5   libiomp5.dylib                	0x0000000104d26195 __kmpc_invoke_task_func + 181
6   libiomp5.dylib                	0x0000000104d2204a __kmp_launch_thread + 490
7   libiomp5.dylib                	0x0000000104d45f5d __kmp_launch_worker(void*) + 333
8   libSystem.B.dylib             	0x00007fff83bdcfd6 _pthread_start + 331
9   libSystem.B.dylib             	0x00007fff83bdce89 thread_start + 13

Change History

comment:1 Changed 9 years ago by Russell Taylor

I think I've figured out what's going on. ParComponentFactory initialises its g_detPool static variable based on the maximum number of threads at that point. This happens before the value in MultiThreaded.MaxCores is applied. When ComponentPool::getIndexInCache() then accesses things based on the (new) max number of threads, things go bad.

I spotted this when I noticed that if you set the number of threads via the OMP_NUM_THREADS environment variable, then things are fine.
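The pattern described in this comment can be sketched roughly as follows: a container sized once from the thread count at construction, then addressed by a thread index that may later exceed it. PerThreadPool and its members are hypothetical stand-ins, not Mantid's ComponentPool, and the ensureSlots guard is only one possible remedy, not a description of the change in [14897].

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-thread cache sized once at construction.
class PerThreadPool {
public:
  // threadsAtInit plays the role of the max thread count seen during
  // static initialisation, before any configured value is applied.
  explicit PerThreadPool(std::size_t threadsAtInit)
      : m_slots(threadsAtInit, 0) {}

  // True only if the slot for this thread index exists. Indexing without
  // this check is the failure mode once the thread count has been raised.
  bool hasSlot(std::size_t threadIndex) const {
    return threadIndex < m_slots.size();
  }

  // One possible guard: grow the pool to match the current max thread
  // count. The pool never shrinks.
  void ensureSlots(std::size_t currentMaxThreads) {
    if (currentMaxThreads > m_slots.size()) {
      m_slots.resize(currentMaxThreads, 0);
    }
  }

  std::size_t size() const { return m_slots.size(); }

private:
  std::vector<int> m_slots; // one entry per thread
};
```

In this sketch ensureSlots would have to be called from serial code before entering a parallel region, since resizing a std::vector while other threads read it is not safe.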

comment:2 Changed 9 years ago by Russell Taylor

  • Status changed from new to accepted
  • Owner set to Russell Taylor

comment:3 Changed 9 years ago by Russell Taylor

In [14890]:

Indentation fixes only - no code changed. Re #3834.

comment:4 Changed 9 years ago by Russell Taylor

In [14897]:

Fix potential crash if number of OpenMP threads is increased above default via MultiThreaded.MaxCores in properties file. No performance impact seen on the 'usual' case (threads=number of processors). Re #3834.

comment:5 Changed 9 years ago by Russell Taylor

  • Status changed from accepted to verify
  • Resolution set to fixed

To test:

  • Set the MultiThreaded.MaxCores property to something larger than your number of physical processors
  • Load a WISH data file (or any one that will end up with parameterized detectors - LOQ's another, I think)
  • Run ConvertUnits to, e.g., wavelength
  • Before the fix, this was a guaranteed crash

comment:6 Changed 9 years ago by Vickie Lynch

  • Status changed from verify to verifying
  • Tester set to Vickie Lynch

comment:7 Changed 9 years ago by Vickie Lynch

  • Status changed from verifying to closed

I set MultiThreaded.MaxCores = 100 and ConvertUnits worked for WISH00017986. I have only 8 cores.

comment:8 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 4683.
