Ticket #5464 (reopened)

Opened 8 years ago

Last modified 5 years ago

[FileFinder] - Make more efficient for large ranges of files in large directories

Reported by: Peter Parker Owned by: Peter Parker
Priority: major Milestone: Release 2.2
Component: Mantid Keywords:
Cc: Blocked By:
Blocking: Tester: Owen Arnold

Description

A reccuring problem for some users is being unable to type in a range of files into an MWRunFiles widget without suffering extreme slowdown on non-Windows machines.

This is a direct result of FileFinder's current globbing procedure, and is proportional to the number of files in the directory as well as the number of files being searched for.

Some users have been known to store 70,000+ (and rising) files in a single folder, from which they would like to access hundreds of (relatively small) runs using a single call. In this case, slowdown is extremely detrimental to the user experience, even when taking into account recent work done as part of #5036 and #5207.

This needs to be solved permanently.

Adding a performance test first will be necessary to allow measurement of any improvements made, as well as to stave off regression in the future.

Change History

comment:1 Changed 8 years ago by Peter Parker

Refs #5464 - Adding performance test.

Tried to keep test time down by using fopen and fclose instead of Poco, and keeping the number of files generated to 10,000.

Changeset: c741880909d587ac603e5ac29b3f3c385e4674dd

comment:2 Changed 8 years ago by Peter Parker

Refs #5464 - Fix non-win builds.

Changeset: 84c3e20229eb692c518130697749013004d393a7

comment:3 Changed 8 years ago by Peter Parker

Refs #5464. Reverts push of performance test.

There's some linux path quirk that I'm not accounting for here, and I dont have enough time to fix it today, so reverting.

This reverts commit 84c3e20229eb692c518130697749013004d393a7. This reverts commit c741880909d587ac603e5ac29b3f3c385e4674dd.

Changeset: 57c4dd26f2b658191c3699ec4c0be951ce27f415

comment:4 Changed 8 years ago by Peter Parker

Refs #5464 - Recommit performance test.

Hopefully fixed directory quirk. Also added alternating cases of instrument and extension when generating files.

Changeset: e5d90b8f181e6fa3932c896537a02dc9b1de2004

comment:5 Changed 8 years ago by Peter Parker

Refs #5464 - Adding performance test.

Tried to keep test time down by using fopen and fclose instead of Poco, and keeping the number of files generated to 10,000.

Changeset: c741880909d587ac603e5ac29b3f3c385e4674dd

comment:6 Changed 8 years ago by Peter Parker

Refs #5464 - Fix non-win builds.

Changeset: 84c3e20229eb692c518130697749013004d393a7

comment:7 Changed 8 years ago by Peter Parker

Refs #5464. Reverts push of performance test.

There's some linux path quirk that I'm not accounting for here, and I dont have enough time to fix it today, so reverting.

This reverts commit 84c3e20229eb692c518130697749013004d393a7. This reverts commit c741880909d587ac603e5ac29b3f3c385e4674dd.

Changeset: 57c4dd26f2b658191c3699ec4c0be951ce27f415

comment:8 Changed 8 years ago by Peter Parker

Refs #5464 - Recommit performance test.

Hopefully fixed directory quirk. Also added alternating cases of instrument and extension when generating files.

Changeset: e5d90b8f181e6fa3932c896537a02dc9b1de2004

comment:9 Changed 8 years ago by Peter Parker

Refs #5464 - Adding performance test.

Tried to keep test time down by using fopen and fclose instead of Poco, and keeping the number of files generated to 10,000.

Changeset: c741880909d587ac603e5ac29b3f3c385e4674dd

comment:10 Changed 8 years ago by Peter Parker

Refs #5464 - Fix non-win builds.

Changeset: 84c3e20229eb692c518130697749013004d393a7

comment:11 Changed 8 years ago by Peter Parker

Refs #5464. Reverts push of performance test.

There's some linux path quirk that I'm not accounting for here, and I dont have enough time to fix it today, so reverting.

This reverts commit 84c3e20229eb692c518130697749013004d393a7. This reverts commit c741880909d587ac603e5ac29b3f3c385e4674dd.

Changeset: 57c4dd26f2b658191c3699ec4c0be951ce27f415

comment:12 Changed 8 years ago by Peter Parker

Refs #5464 - Recommit performance test.

Hopefully fixed directory quirk. Also added alternating cases of instrument and extension when generating files.

Changeset: e5d90b8f181e6fa3932c896537a02dc9b1de2004

comment:13 Changed 8 years ago by Peter Parker

Refs #5464 - Change to FileFinder.

Change getPath to try the various combinations of extensions / filenames directly, before any globbing is done.

Added a default inst unit test so we can be sure that calling findRuns("100-200") works as expected.

Changeset: 441803c8f1459275144caeba22e8690e150cb4d2

comment:14 Changed 8 years ago by Peter Parker

Refs #5464 - Change to FileFinder.

Change getPath to try the various combinations of extensions / filenames directly, before any globbing is done.

Added a default inst unit test so we can be sure that calling findRuns("100-200") works as expected.

Changeset: 441803c8f1459275144caeba22e8690e150cb4d2

comment:15 Changed 8 years ago by Peter Parker

Refs #5464 - Remove stray call to std::cout.

Changeset: 6ca299c57e84b435d2ea58f833bb966c76006db8

comment:16 Changed 8 years ago by Peter Parker

Refs #5464 - FileFinder.findRuns() to throw on missing file.

Appropriate changes made to files that call the function, to ensure we still report or ignore missing files in the same manner as done previously.

Changeset: 66f3ee14c49a6ea40250db77ba797f2719d9babb

comment:17 Changed 8 years ago by Peter Parker

  • Status changed from new to accepted

comment:18 Changed 8 years ago by Peter Parker

  • Status changed from accepted to verify
  • Resolution set to fixed

Seeing a huge decrease in time spent looking for files using C2E, so closing this ticket.

To test manually, it might be easiest to generate a large number of dummy (empty) run files and see how long it takes to find a large range of these using FileFinder.findRuns(). We do now have performance tests to cover this though.

Further, I made some changes to the parts of Mantid that were calling findRuns(), as it now throws when a file is missing. These changes (which can be found here) should be looked at.

comment:19 Changed 8 years ago by Peter Parker

Refs #5464 - Catch previously unhandled Poco exception.

If it is thrown, the file basically doesn't exist and it is safe to carry on as normal.

While here, wrapping another previously unhandled exception within findRun().

Changeset: e5955f416512ad21d02fc70f7d89b351f71e03f6

comment:20 Changed 8 years ago by Peter Parker

Refs #5464 - Adding performance test.

Tried to keep test time down by using fopen and fclose instead of Poco, and keeping the number of files generated to 10,000.

Changeset: c741880909d587ac603e5ac29b3f3c385e4674dd

comment:21 Changed 8 years ago by Peter Parker

Refs #5464 - Fix non-win builds.

Changeset: 84c3e20229eb692c518130697749013004d393a7

comment:22 Changed 8 years ago by Peter Parker

Refs #5464. Reverts push of performance test.

There's some linux path quirk that I'm not accounting for here, and I dont have enough time to fix it today, so reverting.

This reverts commit 84c3e20229eb692c518130697749013004d393a7. This reverts commit c741880909d587ac603e5ac29b3f3c385e4674dd.

Changeset: 57c4dd26f2b658191c3699ec4c0be951ce27f415

comment:23 Changed 8 years ago by Peter Parker

Refs #5464 - Recommit performance test.

Hopefully fixed directory quirk. Also added alternating cases of instrument and extension when generating files.

Changeset: e5d90b8f181e6fa3932c896537a02dc9b1de2004

comment:24 Changed 8 years ago by Peter Parker

Refs #5464 - Change to FileFinder.

Change getPath to try the various combinations of extensions / filenames directly, before any globbing is done.

Added a default inst unit test so we can be sure that calling findRuns("100-200") works as expected.

Changeset: 441803c8f1459275144caeba22e8690e150cb4d2

comment:25 Changed 8 years ago by Peter Parker

Refs #5464 - Remove stray call to std::cout.

Changeset: 6ca299c57e84b435d2ea58f833bb966c76006db8

comment:26 Changed 8 years ago by Peter Parker

Refs #5464 - FileFinder.findRuns() to throw on missing file.

Appropriate changes made to files that call the function, to ensure we still report or ignore missing files in the same manner as done previously.

Changeset: 66f3ee14c49a6ea40250db77ba797f2719d9babb

comment:27 Changed 8 years ago by Owen Arnold

  • Status changed from verify to verifying
  • Tester set to Owen Arnold

comment:28 Changed 8 years ago by Owen Arnold

  • Status changed from verifying to closed

According to the performance test report, moving to fopen and fclose, yielded a massive reduction in processing time.

comment:29 Changed 8 years ago by Mathieu Doucet

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:31 Changed 8 years ago by Mathieu Doucet

Note: A try-except block should never just say "pass" as a way to deal with an exception.

comment:32 Changed 8 years ago by Mathieu Doucet

Re #5464 Revert back refm changes

Changeset: e617a2351742435a965f41d6943f82427cecb71e

comment:33 Changed 8 years ago by Mathieu Doucet

Re #5464 Revert back refm changes

Changeset: e617a2351742435a965f41d6943f82427cecb71e

comment:34 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 6310

Note: See TracTickets for help on using tickets.