Ticket #9208 (closed: fixed)
Have GUI Timeout Instead of Hang When ICAT is Down
Reported by: | Peter Parker | Owned by: | Jay Rainey |
---|---|---|---|
Priority: | critical | Milestone: | Release 3.2 |
Component: | Framework | Keywords: | ICAT |
Cc: | Blocked By: | ||
Blocking: | #8538, #9186, #9223 | Tester: | Martyn Gigg |
Description
Given that ICAT is a remote service which we can't necessarily assume will be available at any given time, it would be good to wrap any calls to the Catalog* algorithms from the GUI in their own separate threads, which would then timeout after a certain period of time. We could specify a sensible default timeout value in Mantid's property file, and maybe even have a timeout property on the algorithms themselves.
Currently, if ICAT is not available (due to network problems, service outages or otherwise) then Mantid will hang, and users are forced to lose any unsaved changes since they have to kill the Mantid process and restart to keep on working.
Change History
comment:2 Changed 7 years ago by Jay Rainey
Having discussed this further with Martyn, we agreed on the following solution:
- Move catalog methods from MantidUI (e.g. isValidCatalogLogin & catalogPublishDialog) to CatalogHelper.
- Make use of CatalogHelper for the calls in ApplicationWindow of the moved methods.
- Update executeAsynchronously in CatalogHelper to have a timeOut parameter, and make use of it in the implementation.
- Update executeAsynchronouslys signature to return a string. This will be an error/success message for the user, e.g. Execution of algorithm N has timed out, ICAT appears to be down.
- Set a timeout property in Mantid.properties.template (named Catalog.timeoutValue) that:
- Has a default value of 30.
- If set to negative value then timeout is set to 0 (e.g. turned off)
- A comment explaining why the timeout value is needed.
comment:3 Changed 7 years ago by Jay Rainey
- Status changed from assigned to inprogress
Added implementation to new CatalogHelper methods. Refs #9208.
Changeset: 52e9382434ee181e33dc49c72e01ade3d9ea60d3
comment:4 Changed 7 years ago by Jay Rainey
Call catalogHelper methods and not MantidUI. Refs #9208.
Changeset: 5adda878e06a8549e1ad703bf24426fb5457a95a
comment:5 Changed 7 years ago by Jay Rainey
Remove unused methods. Refs #9208.
Changeset: 4cef12712a255f5a57f3ad07b74552ea217c9b4e
comment:6 Changed 7 years ago by Jay Rainey
Add catalog timeout value to mantid properties. Refs #9208.
Changeset: 1346fc28ddad8927b1d3fd0677ff1ef0b57df0af
comment:7 Changed 7 years ago by Jay Rainey
Add timout property to CatalogHelper. Refs #9208.
Changeset: 2dd4d5c1234e86fcc68405e8694a4fdd571ec387
comment:8 Changed 7 years ago by Jay Rainey
Display ICAT error on GUI. Refs #9208.
Changeset: ce2437c6f3b1d857cdd4a8115755290310a3af11
comment:9 Changed 7 years ago by Jay Rainey
Refactor CatalogPublish. Refs #9208.
Changeset: a8c82d7e06fbe32e0700d05366c55323210c0704
comment:10 Changed 7 years ago by Jay Rainey
Merge remote-tracking branch 'origin/master' into feature/9208_icat_gui_hang_fix. Refs #9208.
Conflicts:
- Code/Mantid/MantidQt/CustomDialogs/inc/MantidQtCustomDialogs/CatalogPublishDialog.h
- Code/Mantid/MantidQt/CustomDialogs/src/CatalogPublishDialog.cpp
Changeset: bac2d4cb25a72b6030a337df5b96bb80626af0a0
comment:11 Changed 7 years ago by Jay Rainey
Add check to prevent crash. Refs #9208.
Changeset: 6de07c2e5b5e0d50f4b87d043fed653a92e7c163
comment:12 Changed 7 years ago by Jay Rainey
Merge branch 'feature/9208_icat_gui_hang_fix' into develop. Refs #9208.
Conflicts:
- Code/Mantid/Framework/DataHandling/src/SaveANSTOAscii.cpp
- Code/Mantid/Framework/PythonInterface/plugins/algorithms/CalibrateRectangularDetectors.py
Changeset: 69d8fa8588e374393f898388c39aa34dbdb13c09
comment:13 Changed 7 years ago by Jay Rainey
Add windows DLL include. Refs #9208.
Changeset: 9cd147baa4b2490a9ea7976d7b34947171621302
comment:14 Changed 7 years ago by Jay Rainey
- Status changed from inprogress to verify
- Resolution set to fixed
Add back multiple facility support. Refs #9208.
Changeset: 3ff7c488f87ae2fb08699f17660613ebd96de31f
comment:15 Changed 7 years ago by Jay Rainey
To test
You need to verify that all the catalog algorithms are now in separate threads, and no longer hang Mantid when they are performed without a network connection. Below are some examples that should be tested before (to verify the hang occurred) and after (to verify I have fixed it):
Verify reachable end-point
- Turn off your Internet connection.
- Attempt to log into the catalog (Catalog -> Login)
- An error will be thrown: ICAT appears to be offline. Please check your connection or report this issue.
Execute algorithm without Internet connection
- Log into the catalog.
- Disable your Internet connection.
- Click Publish (it executes myData). The error message will be displayed on the GUI.
- Click Search. The same error message will be logged to Results log.
Verify unreachable end-point
The SNS end-point is behind a firewall, so is not accessible at ISIS. Previously, if you attempted to log into SNS then Mantid would die. You need to verify that this is no longer the case by:
- Attempt to log into SNS (Catalog -> Login and select SNS as facility)
- Mantid will now execute the algorithm asynchronously, and after a short period of time (30 seconds), the same error message as above will be thrown.
comment:17 Changed 7 years ago by Jay Rainey
- Blocking 8538 added
(In #8538) To add DOI generation support for publishing I need to:
- Generate gSoap files against the DOI generator end-point, which are:
- https://data4.isis.stfc.ac.uk/doi/DOIService?wsdl Develop
- https://sig-03.esc.rl.ac.uk:8181/doi/DOIService?wsdl Production
- Add the end-point (dev for now until changes below are made) to Facilities.xml.
- Update ICatalogInfoService to include a new method (generateDOIForDatafile) that makes use of the gSoap methods above (specifically: registerDatafileDOI) and returns the generated DOI (to output on LOG).
- Add and implement the functionality of the new method above in ICat4Catalog.
- This includes obtaining the end-point from Facilities.xml t oaccomdate other facilities using th
- Add two new properties to CatalogPublish:
- InvestigationID, "",`The database row ID of the investigation that you want to publish to. (Note: this is required to generate a DOI.)
- GenerateDOI, true, "Generates a DOI for the datafile being published. Once a DOI is generated the datafile will be made public."
- Add logic to exec in CatalogPublish to make use of the properties and call generateDOIForDatafile from the catalog. Note: if a file fails to upload (e.g. already exists etc...) a DOI must not be generated.
- Update CatalogPublishDialog to have a checkbox for GenerateDOI. InvestigationID will be hidden on the GUI and obtained from the table generated getPublishInvestigations (once #9223 is complete).
- The generateDOI (make public) checkbox on the GUI should be checked by default.
- To prevent users accidentally generating DOIs that they may not have wanted to, a QMessageBox should be generated IF the user has checked generateDOI to confirm this choice.
comment:18 Changed 7 years ago by Martyn Gigg
- Status changed from verify to verifying
- Tester set to Martyn Gigg
comment:19 Changed 7 years ago by Martyn Gigg
- Status changed from verifying to reopened
- Resolution fixed deleted
So everything seems okay as long as the connection goes down when your not in the middle of a catalog action, e.g search/publish. If you perform a search that takes a while then disconnect , you get a the algorithm stuck without being able to be cancelled.
The GUI remains responsive but the search never stops.
comment:20 Changed 7 years ago by Jay Rainey
- Status changed from reopened to inprogress
Remove use of atoi. Refs #9208.
Changeset: 6fb811d646ccb76c8d0de161a58aee1164947f1f
comment:21 Changed 7 years ago by Jay Rainey
Add timeout to catalog (gSoap) method calls. Refs #9208.
Changeset: 94a2a7aa26cbb799888194a2186e4d59d7f634d1
comment:22 Changed 7 years ago by Jay Rainey
Merge branch 'feature/9208_icat_gui_hang_fix' into develop
Changeset: ca3714aadab53f83165037c62c905943c6096283
comment:23 Changed 7 years ago by Jay Rainey
- Status changed from inprogress to verify
- Resolution set to fixed
comment:24 Changed 7 years ago by Jay Rainey
I've added a timeout call to the gSoap client which will address the problem noted by Martyn in comment:19.
To test
- Perform a long search (e.g. set dates to: 01/01/1990 and 01/01/2014)
- Disconnect from the Internet.
- Wait 30 seconds and you will get the same error message as produced in comment:15.
comment:26 Changed 7 years ago by Martyn Gigg
- Status changed from verifying to reopened
- Resolution fixed deleted
That seems much better now and times out as I would expect.
One minor point now is that I don't think we need to maxTime stuff in CatalogHelper::executeAsynchronously as it's all handled under the hood so that can be left just doing the processEvents stuff.
comment:27 Changed 7 years ago by Jay Rainey
- Status changed from reopened to inprogress
Remove unnecessary variable. Refs #9208.
Changeset: 73da39d33f2d1ace753e5efef74500125ea58de8
comment:28 Changed 7 years ago by Jay Rainey
Merge branch 'feature/9208_icat_gui_hang_fix' into develop
- feature/9208_icat_gui_hang_fix: Remove unnecessary variable. Refs #9208.
Changeset: 178945cc402d5cb1a047e67e3bd106c3377ce835
comment:29 Changed 7 years ago by Jay Rainey
Verify that the minor change made in this commit addresses the suggestion made by Martyn in comment:26.
comment:30 Changed 7 years ago by Jay Rainey
- Status changed from inprogress to verify
- Resolution set to fixed
comment:32 follow-up: ↓ 34 Changed 7 years ago by Martyn Gigg
- Status changed from verifying to reopened
- Resolution fixed deleted
The commit in comment 28 doesn't quite address what I was saying in comment 26. As the timeout is now handled further down then I don't think we need the timer at all in executeAsynchronously so I think that function can just return to being.
Poco::ActiveResult<bool> result(algorithm->executeAsync()); while (!result.available()) { QCoreApplication::processEvents(); }
comment:33 Changed 7 years ago by Jay Rainey
- Status changed from reopened to inprogress
Remove maxTime as it's now handled in catalog. Refs #9208.
Changeset: a6ca05d5ee1dff17e17a1589b5e64fd02237309d
comment:34 in reply to: ↑ 32 Changed 7 years ago by Jay Rainey
- Status changed from inprogress to verify
- Resolution set to fixed
Verify that maxTime has been removed from CatalogHelper as mentioned by Martyn in comment:32. This was removed as gSoap takes care of this for us.
comment:36 Changed 7 years ago by Jay Rainey
Remove unnecessary timer. Refs #9208.
Changeset: 0528095e0f06179ae3b6e09c1bcf5655b06cc023
comment:37 Changed 7 years ago by Martyn Gigg
- Status changed from verifying to closed
Merge remote-tracking branch 'origin/feature/9208_icat_gui_hang_fix'
Full changeset: 602d6bc241792007abba9c34e0490959c5b58bfd
comment:38 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 10051