Ticket #10092 (closed: fixed)
Provide method to allow custom interfaces to execute a set of algorithms sequentially parallel with the UI
Reported by: | Dan Nixon | Owned by: | Dan Nixon |
---|---|---|---|
Priority: | major | Milestone: | Release 3.3 |
Component: | Framework | Keywords: | |
Cc: | samuel.jackson@… | Blocked By: | |
Blocking: | #7860, #10057, #10065, #10083, #10215 | Tester: | Martyn Gigg |
Description
Better support for running a set of algorithms which must be run sequentially (one depends on the data of another) separate from the UI thread in a custom interface.
Change History
comment:2 Changed 6 years ago by Dan Nixon
- Summary changed from Provide method to allow custom interfaces to execute a set of algorithms sequntially parallel with the UI to Provide method to allow custom interfaces to execute a set of algorithms sequentially parallel with the UI
comment:3 Changed 6 years ago by Dan Nixon
- Status changed from assigned to inprogress
Work in progress commit, started to add framework for seq. algos
Refs #10092
Changeset: 4fd4b2561fe6fbbfa4120d30a08500fdde009f43
comment:4 Changed 6 years ago by Dan Nixon
Added batch algorithm runner
Essentially a deque feeding an AlgorithmRunner
Refs #10092
Changeset: 63923f7d48afb875ad6d3f6bbddeaca467cfc015
comment:7 Changed 6 years ago by Dan Nixon
Fix build errors and actually use the value of the save checkbox
Refs #10092
Changeset: d0e40835d8e7cee826d19d77b068459244b3990a
comment:8 Changed 6 years ago by Dan Nixon
To enable testing of this I added a separate SaveNexus algorithm to Indirect Moments.
To test:
- Ensure logging level is set to at least Information
- Attempt to run Indirect Moments with the Save option enabled
- Note that the log indicates that the SofqwMoments and SaveNexus algorithms were triggered by BatchAlgorithmRunner
- Ensure that Moments and SaveNexus were completed correctly
comment:9 Changed 6 years ago by Dan Nixon
- Status changed from inprogress to verify
- Cc samuel.jackson@… added
- Resolution set to fixed
comment:10 Changed 6 years ago by Dan Nixon
- Status changed from verify to reopened
- Resolution fixed deleted
comment:11 Changed 6 years ago by Dan Nixon
- Status changed from reopened to inprogress
Replaced old implementation with one which should be unit testable
Refs #10092
Changeset: 6692668f64947554b5d2863d6906b605a6671042
comment:12 Changed 6 years ago by Dan Nixon
Added unit test, fixed workspace property issue
Refs #10092
Changeset: 08cab69df0e42a09efbfdb99a7086b5e3cdf4f80
comment:13 Changed 6 years ago by Dan Nixon
Fixed build error and static analysis warnings
Refs #10092
Changeset: ac908977ecef3f44f5c62c8eb57dee5fc315c543
comment:14 Changed 6 years ago by Dan Nixon
comment:15 Changed 6 years ago by Dan Nixon
- Status changed from inprogress to verify
- Resolution set to fixed
Addition to testing, there is a new unit test BatchAlgorithmRunnerTest, which should be passing on all platforms now.
comment:16 Changed 6 years ago by Martyn Gigg
- Status changed from verify to reopened
- Resolution fixed deleted
The new test, BatchAlgorithmRunnerTest, is good to have but it seems to be hanging on Mac and Windows 8, but unfortunately not all of the time.
http://builds.mantidproject.org/job/develop_incremental/label=osx-10.8-build/1411/console
The log seems to suggest that it had run the first algorithm but then was stuck for some reason. One thing I would suggest is not having such generic workspace names, e.g. ws1, in the tests. As the names go in to a singleton object that exists for each process then it maybe the case that some lock on a workspace is being hit somehow. I would suggest prefixing the names with the test name.
Lets then leave it on develop for a day to see if it hangs again.
comment:17 Changed 6 years ago by Dan Nixon
- Status changed from reopened to inprogress
comment:18 Changed 6 years ago by Dan Nixon
Included a catch all in algorithm runner
Should ensure that unit tests cannot hang if starting an algorithm throws a non standard exception
Refs #10092
Changeset: 4be794b85296bc35fc584a1f1094985b710e810e
comment:19 Changed 6 years ago by Dan Nixon
- Status changed from inprogress to verify
- Resolution set to fixed
comment:20 Changed 6 years ago by Dan Nixon
I'm assuming this issue was caused by code that waits for the algorithms to finish entering an infinite loop as en exception thrown when starting the next algorithm in the handler for the Poco notification, this should no longer be an issue with the added additional catch statement.
comment:21 Changed 6 years ago by Martyn Gigg
- Status changed from verify to reopened
- Resolution fixed deleted
I've just seen Windows 8 and Mac hang again on the BatchAlgorithmRunnerTest, unfortunately.
It attached to the process with gdb and all it said was that it was stuck constantly checking isExecuting() in the while loop within in the test. I tried again manually on the machine but the test was successful so it's something that is only happening when the system is under additional load I think.
Is there a particular reason why the algorithms themselves are executed asynchronously rather than having the whole BatchAlgorithmRunner object in a separate thread and have that execute each algorithm serially? That would greatly simplify the algorithm scehduling.
comment:22 Changed 6 years ago by Martyn Gigg
The BatchAlgorithmRunnerTest has hung again on Win8 & OS X. Can we temporarily disable the unit test (just comment the body out) while it is looked at?
comment:23 Changed 6 years ago by Dan Nixon
- Status changed from reopened to inprogress
Work in progress - run all algorithms on an ActiveMethod
Refs #10092
Changeset: da71d31e0014d8e2ae2c4698ea5593457d931ded
comment:24 Changed 6 years ago by Dan Nixon
comment:25 Changed 6 years ago by Dan Nixon
comment:26 Changed 6 years ago by Dan Nixon
New batch algorithm runner implmentation
Added basic unit tests, removed changes made to indirect moments
Refs #10092
Changeset: f634034d3016a62e2b52e09d3692c7e5697aab36
comment:27 Changed 6 years ago by Dan Nixon
comment:28 Changed 6 years ago by Dan Nixon
comment:30 Changed 6 years ago by Dan Nixon
This is a stack trace from Ubuntu 12.04:
(gdb) r BatchAlgorithmRunnerTest Starting program: /home/dan/mantid-build/bin/MantidQtAPITest BatchAlgorithmRunnerTest warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Running 3 tests Program received signal SIGSEGV, Segmentation fault. 0x00007ffff4426eaf in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 (gdb) bt #0 0x00007ffff4426eaf in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x000000000054f157 in Poco::MutexImpl::lockImpl (this=0x7ffff7de453b) at /usr/include/Poco/Mutex_POSIX.h:81 #2 0x000000000054f2fc in Poco::Mutex::lock (this=0x7ffff7de453b) at /usr/include/Poco/Mutex.h:174 #3 0x0000000000553098 in Poco::ScopedLock<Poco::Mutex>::ScopedLock (this=0x7fffffffd770, mutex=...) at /usr/include/Poco/ScopedLock.h:59 #4 0x00007ffff53f6eca in Poco::NotificationCenter::addObserver (this=0x7ffff7de4523, observer=...) at src/NotificationCenter.cpp:59 #5 0x00007ffff7b47757 in MantidQt::API::BatchAlgorithmRunner::executeBatch (this=0x7fffffffd800) at /home/dan/mantid/Code/Mantid/MantidQt/API/src/BatchAlgorithmRunner.cpp:61 #6 0x00000000005504c7 in BatchAlgorithmRunnerTest::test_basicBatch (this=0xa75000) at /home/dan/mantid/Code/Mantid/MantidQt/API/test/BatchAlgorithmRunnerTest.h:70 #7 0x0000000000550d43 in TestDescription_suite_BatchAlgorithmRunnerTest_test_basicBatch::runTest (this=0x82ec40) at /home/dan/mantid-build/MantidQt/API/test/BatchAlgorithmRunnerTest.cpp:26 #8 0x00000000005565bd in CxxTest::RealTestDescription::run (this=0x82ec40) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/RealDescriptions.cpp:96 #9 0x0000000000559b13 in CxxTest::TestRunner::runTest (this=0x7fffffffdbff, td=...) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:94 #10 0x0000000000559a05 in CxxTest::TestRunner::runSuite (this=0x7fffffffdbff, sd=...) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:79 #11 0x00000000005598a0 in CxxTest::TestRunner::runWorld (this=0x7fffffffdbff) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:63 #12 0x0000000000559663 in CxxTest::TestRunner::runAllTests (listener=...) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:24 #13 0x000000000055f6a2 in CxxTest::XUnitPrinter::run (this=0x7fffffffdcc0) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/XUnitPrinter.h:30 #14 0x0000000000560572 in CxxTest::Main<CxxTest::XUnitPrinter> (tmp=..., argc=2, argv=0x7fffffffe268) at /home/dan/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestMain.h:106 #15 0x00000000005555f8 in main (argc=2, argv=0x7fffffffe268) at /home/dan/mantid-build/MantidQt/API/test/MantidQtAPITest_runner.cpp:27
comment:31 Changed 6 years ago by Dan Nixon
And RHEL6:
(gdb) r BatchAlgorithmRunnerTest Starting program: /home/ibn32760/mantid-build/bin/MantidQtAPITest BatchAlgorithmRunnerTest [Thread debugging using libthread_db enabled] Running 3 testswarning: "/usr/lib/debug/opt/rh/mantidlibs/root/usr/lib64/libQtWebKit.so.4.9.3.debug": separate debug info file has no debug info terminate called after throwing an instance of 'Poco::SystemException' what(): System exception Program received signal SIGABRT, Aborted. 0x0000003528632925 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install gstreamer-0.10.29-1.el6.x86_64 gstreamer-plugins-base-0.10.29-2.el6.x86_64 libxml2-2.7.6-14.el6_5.2.x86_64 nss-pam-ldapd-0.7.5-18.2.el6_4.x86_64 qscintilla-2.4.6-2.el6.x86_64 sqlite-3.6.20-1.el6.x86_64 (gdb) bt #0 0x0000003528632925 in raise () from /lib64/libc.so.6 #1 0x0000003528634105 in abort () from /lib64/libc.so.6 #2 0x0000003532ebea5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #3 0x0000003532ebcbe6 in ?? () from /usr/lib64/libstdc++.so.6 #4 0x0000003532ebcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6 #5 0x0000003532ebcd0e in __cxa_throw () from /usr/lib64/libstdc++.so.6 #6 0x00007ffff0edd9bf in Poco::NotificationCenter::addObserver(Poco::AbstractObserver const&) () from /usr/lib64/libPocoFoundationd.so.11 #7 0x00007ffff7d6348b in MantidQt::API::BatchAlgorithmRunner::executeBatch (this=0x7fffffffd7f0) at /home/ibn32760/mantid/Code/Mantid/MantidQt/API/src/BatchAlgorithmRunner.cpp:61 #8 0x000000000055071f in BatchAlgorithmRunnerTest::test_basicBatch (this=0x1298000) at /home/ibn32760/mantid/Code/Mantid/MantidQt/API/test/BatchAlgorithmRunnerTest.h:70 #9 0x000000000055103d in TestDescription_suite_BatchAlgorithmRunnerTest_test_basicBatch::runTest (this=0x830840) at /home/ibn32760/mantid-build/MantidQt/API/test/BatchAlgorithmRunnerTest.cpp:26 #10 0x00000000005568cb in CxxTest::RealTestDescription::run (this=0x830840) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/RealDescriptions.cpp:96 #11 0x0000000000559f5f in CxxTest::TestRunner::runTest (this=0x7fffffffdc0f, td=...) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:94 #12 0x0000000000559e45 in CxxTest::TestRunner::runSuite (this=0x7fffffffdc0f, sd=...) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:79 #13 0x0000000000559cfd in CxxTest::TestRunner::runWorld (this=0x7fffffffdc0f) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:63 #14 0x0000000000559a67 in CxxTest::TestRunner::runAllTests (listener=...) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestRunner.h:24 #15 0x000000000055ff66 in CxxTest::XUnitPrinter::run (this=0x7fffffffdcd0) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/XUnitPrinter.h:30 #16 0x0000000000560e0d in CxxTest::Main<CxxTest::XUnitPrinter> (tmp=..., argc=2, argv=0x7fffffffe268) at /home/ibn32760/mantid/Code/Mantid/TestingTools/cxxtest/cxxtest/TestMain.h:106 #17 0x0000000000555979 in main (argc=2, argv=0x7fffffffe268) at /home/ibn32760/mantid-build/MantidQt/API/test/MantidQtAPITest_runner.cpp:27
comment:32 Changed 6 years ago by Dan Nixon
comment:33 Changed 6 years ago by Dan Nixon
comment:34 Changed 6 years ago by Dan Nixon
- Status changed from inprogress to verify
- Resolution set to fixed
comment:35 Changed 6 years ago by Dan Nixon
New to test (replaces comment 8 and 15):
- Ensure unit test (BatchAlgorithmRunnerTest) is passing reliably
- Force an error on the Indirect Data Reduction > Moments interface (set EMin and EMax to the same value), this is a crude test to ensure the Qt signals are emitted property, you should get warnings in the Results Log from the runner and an alert dialog box
It may be worth someone testing this on Ubuntu 12.04 (since there is no longer a Jenkins job) as it was previously failing on this platform (in a similar way as OSX and RHEL6 so I assume it should be fixed tested and works as intended)
comment:36 Changed 6 years ago by Dan Nixon
- Status changed from verify to reopened
- Resolution fixed deleted
comment:37 Changed 6 years ago by Dan Nixon
- Blocking 7860 added
(In #7860) Batch runner ticket replaces the standard algorithm runner, need Qt signal form either in the UI code.
comment:38 Changed 6 years ago by Dan Nixon
- Status changed from reopened to inprogress
More unit tests, fixed Poco notification issue
Remove observer when algorithm fails, added unit tests for queue length.
Refs #10092
Changeset: 765982d8b85c156871aaef70827a5d4ea218c478
comment:39 Changed 6 years ago by Dan Nixon
Merge branch 'feature/10092_sequential_async_algorithm_runner' into develop
Conflicts:
Code/Mantid/MantidQt/CustomInterfaces/src/IndirectDataReductionTab.cpp
Refs #10092
Changeset: 21135669ebf4d8fe503dffa453c59ac7e858d27d
comment:40 Changed 6 years ago by Dan Nixon
comment:41 Changed 6 years ago by Dan Nixon
- Status changed from inprogress to verify
- Resolution set to fixed
comment:42 Changed 6 years ago by Martyn Gigg
- Status changed from verify to verifying
- Tester set to Martyn Gigg
comment:43 Changed 6 years ago by Martyn Gigg
I haven't seen the build servers hang any more while executing the unit test so it would appear that the threading issue has been resolved.
comment:44 Changed 6 years ago by Martyn Gigg
- Status changed from verifying to closed
Merge remote-tracking branch 'origin/feature/10092_sequential_async_algorithm_runner'
Conflicts:
Code/Mantid/Framework/PythonInterface/plugins/algorithms/WorkflowAlgorithms/SofQWMoments.py
Full changeset: bed7ab69fb911493abc55c5863cfe8f7987101d9
comment:45 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 10934