Ticket #1538 (closed: fixed)

Opened 10 years ago

Last modified 5 years ago

Refactor Load algorithm

Reported by: Mathieu Doucet Owned by: Sofia Antony
Priority: major Milestone: Iteration 26
Component: Mantid Keywords:
Cc: Blocked By:
Blocking: Tester: Roman Tolchenov

Description

The current Load algorithm looks at the data file extension to decide which loader to use. Hard-coded try-except blocks are used when multiple loaders can read the same extension (for example .xml). It would be best to have a registration system where a given loader can be associated with various extensions. The Loader would pull out the list of loaders for the extension of the file to be read and try them in order. Here are a few possible requirements:

  • The associations need to be modifiable from outside Loader(), so that a given piece of code can decide which loader to try first. For instance, a general ASCII loader could read any 2-column ASCII file, but a specific loader may be able to parse meta-data. Since the best loader to use will depend on the analysis, the choice of which loader to use shouldn't be hard-coded in Loader().
  • A common problem is that users at different facilities give the same data format different extensions. An n-column ASCII file may be called .txt, .dat, or .blah. It would be nicer to be able to create the extension-to-loader associations through a config file.
  • Error messages should only appear when a file couldn't be read by any loader (the loaders that were tried but couldn't read the file should fail silently - you don't want to see an error for every single loader that couldn't load the file).
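
The three requirements above could be sketched roughly as follows. This is only an illustration in Python, not Mantid code: LoaderRegistry, register() and load() are hypothetical names, and loaders are assumed to be plain callables that raise an exception when they cannot read a file.

```python
import os
from collections import defaultdict


class LoaderRegistry:
    """Hypothetical registry: extension -> ordered list of loader callables."""

    def __init__(self):
        self._loaders = defaultdict(list)

    def register(self, extension, loader, first=False):
        """Associate a loader with an extension; first=True puts it at the front."""
        ext = extension.lower()
        if first:
            self._loaders[ext].insert(0, loader)
        else:
            self._loaders[ext].append(loader)

    def load(self, path):
        """Try each loader registered for the file's extension, in order."""
        ext = os.path.splitext(path)[1].lower()
        errors = []
        for loader in self._loaders.get(ext, []):
            try:
                return loader(path)
            except Exception as exc:
                # Individual failures stay silent; they are only collected.
                errors.append(f"{loader.__name__}: {exc}")
        # Report once, and only when no loader could read the file.
        details = "; ".join(errors) if errors else "no loader registered"
        raise RuntimeError(f"No loader could read '{path}' ({details})")
```

Calling code can then reorder or extend the associations without touching the registry itself, for example registry.register(".dat", my_metadata_loader, first=True). The same register() calls could be driven from entries in a config file.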

Change History

comment:1 Changed 10 years ago by Nick Draper

  • Status changed from new to assigned
  • Owner set to Roman Tolchenov

I agree mostly with this.

My plan is that all data-loading algorithms should inherit from a common base class (which itself inherits from Algorithm). This will have two abstract methods:

  • bool QuickFileCheck(filepath) - returns true/false depending on whether the algorithm wants to load the file, based on the checks it can do without opening it.
  • int FileCheck(filepath) - returns a value from 0 to 100 indicating how much the algorithm wants to load the file, based on the checks it can do including opening it.

Load will iterate over all data-loading algorithms (registered in an additional factory), calling QuickFileCheck on each. Those that return true will then have FileCheck called. Loading will then be attempted in order of the preference values.
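
A minimal sketch of this two-stage selection, written in Python for brevity rather than as the actual C++ implementation. The class name DataLoader and the functions rank_loaders() and load_file() are assumptions made for illustration; only the two check methods come from the plan above. Trying the highest preference value first, and falling back to the next candidate when a load fails, are also assumptions.

```python
from abc import ABC, abstractmethod


class DataLoader(ABC):
    """Hypothetical common base class for data-loading algorithms."""

    @abstractmethod
    def quick_file_check(self, filepath):
        """Return True/False using only checks that do not open the file."""

    @abstractmethod
    def file_check(self, filepath):
        """Return a preference from 0 to 100; may open the file."""

    @abstractmethod
    def load(self, filepath):
        """Load the file, raising an exception on failure."""


def rank_loaders(loaders, filepath):
    """Keep loaders that pass the quick check, ordered by preference value."""
    candidates = [ld for ld in loaders if ld.quick_file_check(filepath)]
    scored = [(ld.file_check(filepath), ld) for ld in candidates]
    # Highest preference first; ties keep their registration order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ld for _, ld in scored]


def load_file(loaders, filepath):
    """Attempt loading with each ranked candidate until one succeeds."""
    for loader in rank_loaders(loaders, filepath):
        try:
            return loader.load(filepath)
        except Exception:
            continue  # try the next candidate in preference order
    raise RuntimeError(f"No loader could read '{filepath}'")
```

The cheap check filters the candidate list before any file is opened, so the more expensive FileCheck only runs for loaders that have a plausible claim on the file.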

comment:2 Changed 10 years ago by Nick Draper

  • Owner changed from Roman Tolchenov to Sofia Antony
  • Component set to Mantid

After discussion with Freddie, he suggested passing a buffer of the first 100 bytes to each algorithm as part of the QuickFileCheck call. This would allow better checking without adversely affecting performance.
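
One way this could look, sketched in Python under stated assumptions: the file is opened once, its first 100 bytes are read, and the same buffer is handed to every quick check. The function names and the two example checks (an XML declaration test and an ASCII decoding test) are hypothetical and are not taken from Mantid.

```python
def read_header(filepath, nbytes=100):
    """Read at most the first nbytes of the file, once, in binary mode."""
    with open(filepath, "rb") as handle:
        return handle.read(nbytes)


def looks_like_xml(filepath, header):
    """Hypothetical quick check: the header starts with an XML declaration."""
    return header.lstrip().startswith(b"<?xml")


def looks_like_ascii(filepath, header):
    """Hypothetical quick check: the header decodes as plain ASCII."""
    try:
        header.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical table of quick checks, keyed by loader name.
CHECKERS = {"xml": looks_like_xml, "ascii": looks_like_ascii}


def quick_candidates(checkers, filepath):
    """Return the names of loaders whose quick check accepts the file."""
    header = read_header(filepath)  # single read shared by all checks
    return [name for name, check in checkers.items() if check(filepath, header)]
```

Because the buffer is read once and shared, the cost stays at one small read per file regardless of how many loaders are registered.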

comment:3 Changed 10 years ago by Sofia Antony

  • Status changed from assigned to accepted

comment:4 Changed 10 years ago by Sofia Antony

(In [7963]) re #1538 - Created two new classes, IDataFileChecker and LoadAlgorithmFactory. IDataFileChecker is an interface class which provides two abstract methods to check the file by opening it and reading a few lines. All the data file loading algorithms now inherit from this class. LoadAlgorithmFactory is responsible for creating the shared pointers to loading algorithms.

comment:5 Changed 10 years ago by Russell Taylor

(In [7967]) Fix linux build. Re #1538.

comment:6 Changed 10 years ago by Sofia Antony

(In [7968]) re #1538 - removed some redundant lines

comment:7 Changed 10 years ago by Sofia Antony

(In [8009]) re #1538 - Refactored generic load algorithm. Implemented quick file check and file check in all load algorithms.

comment:8 Changed 10 years ago by Sofia Antony

(In [8010]) re #1538 - fix for unit test failure. Forgot to check in this file.

comment:9 Changed 10 years ago by Sofia Antony

(In [8020]) re #1538 - fixed some issues with ascii file loading

comment:10 Changed 10 years ago by Sofia Antony

(In [8059]) re #1538 - fixed some issues with load algorithms

comment:11 Changed 10 years ago by Sofia Antony

(In [8068]) re #1538 - file check for load raw now looks at the first 100 bytes of the raw file.

comment:12 Changed 10 years ago by Nick Draper

  • Milestone changed from Iteration 26 to Iteration 27

Bulk move of tickets to iteration 27, if your ticket is essential for Iteration 26 then move it back.

comment:13 Changed 10 years ago by Nick Draper

  • Milestone changed from Iteration 27 to Iteration 26

comment:14 Changed 10 years ago by Sofia Antony

  • Status changed from accepted to verify
  • Resolution set to fixed

comment:15 Changed 10 years ago by Roman Tolchenov

  • Status changed from verify to verifying
  • Tester set to Roman Tolchenov

comment:16 Changed 10 years ago by Roman Tolchenov

  • Status changed from verifying to closed

Tried with different data types, it worked.

comment:17 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 2385
