Ticket #1538 (closed: fixed)
Refactor Load algorithm
Reported by: | Mathieu Doucet | Owned by: | Sofia Antony |
---|---|---|---|
Priority: | major | Milestone: | Iteration 26 |
Component: | Mantid | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Tester: | Roman Tolchenov |
Description
The current Load algorithm looks at the data file extension to decide which loader to use. Hard-coded try-except blocks are used when multiple loaders can read the same extension (for example .xml). It would be better to have a registration system in which a given loader can be associated with various extensions. The Loader would pull out the list of loaders registered for the extension of the file to be read and try them in order. Here are a few possible requirements:
- The associations need to be modifiable from outside Loader(), so that a given piece of code can decide which loader to try first. For instance, a general ASCII loader could read any 2-column ASCII file, but a specific loader may be able to parse meta-data. Since the best loader to use will depend on the analysis, the choice of which loader to use shouldn't be hard-coded in Loader().
- A common problem is that users at different facilities give the same data format different extensions. An n-column ASCII file may be called .txt, .dat, or .blah. It would be nicer to be able to create the extension-to-loader associations through a config file.
- Error messages should appear only when no loader could read the file. Loaders that were tried but couldn't read the file should fail silently: you don't want to see an error for every single loader that couldn't load the file.
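The three requirements above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mantid code: `LoaderRegistry`, `register`, `apply_config` and `load` are hypothetical names, and loaders are assumed to be plain callables that take a filename and raise an exception when they cannot read the file.

```python
import os
from collections import defaultdict


class LoaderRegistry:
    """Hypothetical registry: file extension -> ordered list of loader callables."""

    def __init__(self):
        self._loaders = defaultdict(list)

    def register(self, extension, loader, first=False):
        """Associate a loader with an extension; first=True puts it ahead of the others."""
        ext = extension.lower()
        if first:
            self._loaders[ext].insert(0, loader)
        else:
            self._loaders[ext].append(loader)

    def apply_config(self, associations, available):
        """Replace associations from a mapping such as one parsed from a config file.

        associations: {".dat": ["ascii"], ...}; available: {"ascii": callable, ...}
        """
        for extension, names in associations.items():
            self._loaders[extension.lower()] = [available[name] for name in names]

    def load(self, filename):
        """Try the loaders for this extension in order; report only if all of them fail."""
        ext = os.path.splitext(filename)[1].lower()
        failures = []
        for loader in self._loaders.get(ext, []):
            try:
                return loader(filename)
            except Exception as exc:
                # Fail silently for now, but remember the reason for the final message.
                name = getattr(loader, "__name__", repr(loader))
                failures.append(f"{name}: {exc}")
        if not failures:
            raise RuntimeError(f"No loader registered for {filename!r}")
        raise RuntimeError(f"No loader could read {filename!r}: " + "; ".join(failures))
```

Calling code that wants the metadata-aware loader tried first would call `register(".txt", specific_loader, first=True)`; nothing inside `load` needs to change.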
Change History
comment:1 Changed 10 years ago by Nick Draper
- Status changed from new to assigned
- Owner set to Roman Tolchenov
comment:2 Changed 10 years ago by Nick Draper
- Owner changed from Roman Tolchenov to Sofia Antony
- Component set to Mantid
After discussion, Freddie suggested passing a buffer of the first 100 bytes of the file to each algorithm as part of the QuickFileCheck call. This would allow better checking without adversely affecting performance.
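A rough Python sketch of that suggestion follows: the file is opened once, and the same header bytes are handed to every checker. The checker classes, the `quick_file_check(filename, header)` signature and the helper names are assumptions made for illustration; only the 100-byte figure and the QuickFileCheck name come from the comment above.

```python
HEADER_SIZE = 100  # bytes; the figure suggested in the comment above


def read_header(filename, size=HEADER_SIZE):
    """Read the leading bytes of the file once, in binary mode."""
    with open(filename, "rb") as handle:
        return handle.read(size)


class XmlChecker:
    """Hypothetical checker: accepts files that begin with an XML declaration."""

    def quick_file_check(self, filename, header):
        return header.lstrip().startswith(b"<?xml")


class Hdf5Checker:
    """Hypothetical checker: accepts files that begin with the HDF5 signature."""

    SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 signature, checked at offset 0 only

    def quick_file_check(self, filename, header):
        return header.startswith(self.SIGNATURE)


def quick_candidates(filename, checkers, header=None):
    """Return the checkers whose quick check accepts the shared header buffer."""
    if header is None:
        header = read_header(filename)  # one read, shared by all checkers
    return [c for c in checkers if c.quick_file_check(filename, header)]
```

Because no checker opens the file itself, the cost of the quick pass stays at a single short read however many loaders are registered.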
comment:4 Changed 10 years ago by Sofia Antony
(In [7963]) re #1538 - Created two new classes, IDataFileChecker and LoadAlgorithmFactory. IDataFileChecker is an interface class that provides two abstract methods for checking a file by opening it and reading a few lines. All the data file loading algorithms now inherit from this class. LoadAlgorithmFactory is responsible for creating the shared pointers to the loading algorithms.
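For readers unfamiliar with the pattern, here is a Python analogue of the two classes. It is not the code from the changeset: the method names `quick_file_check` and `file_check`, their signatures and return types, and the factory methods `subscribe`, `create` and `names` are all assumptions, and a plain object reference stands in for the shared pointer.

```python
from abc import ABC, abstractmethod


class IDataFileChecker(ABC):
    """Interface that every data file loading algorithm would inherit from."""

    @abstractmethod
    def quick_file_check(self, filename, header):
        """Cheap test on the filename and leading bytes; returns True or False (assumed)."""

    @abstractmethod
    def file_check(self, filename):
        """Closer inspection of the file; returns an integer preference (assumed)."""


class LoadAlgorithmFactory:
    """Hypothetical factory that creates registered loading algorithms by name."""

    def __init__(self):
        self._creators = {}

    def subscribe(self, name, cls):
        """Register a loader class; only IDataFileChecker subclasses are accepted."""
        if not (isinstance(cls, type) and issubclass(cls, IDataFileChecker)):
            raise TypeError(f"{cls!r} does not implement IDataFileChecker")
        self._creators[name] = cls

    def create(self, name):
        """Return a new instance of the named loading algorithm."""
        return self._creators[name]()

    def names(self):
        """Names of all registered loading algorithms, sorted."""
        return sorted(self._creators)
```

A loader that omits either abstract method cannot be instantiated, so a missing check is caught at creation time and not during a load.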
comment:10 Changed 10 years ago by Sofia Antony
comment:11 Changed 10 years ago by Sofia Antony
comment:12 Changed 10 years ago by Nick Draper
- Milestone changed from Iteration 26 to Iteration 27
Bulk move of tickets to Iteration 27. If your ticket is essential for Iteration 26, move it back.
comment:14 Changed 10 years ago by Sofia Antony
- Status changed from accepted to verify
- Resolution set to fixed
comment:15 Changed 10 years ago by Roman Tolchenov
- Status changed from verify to verifying
- Tester set to Roman Tolchenov
comment:16 Changed 10 years ago by Roman Tolchenov
- Status changed from verifying to closed
Tried with different data types; it worked.
comment:17 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to GitHub issue 2385.
I agree mostly with this.
My plan is that all data-loading algorithms should inherit from a common base class (which itself inherits from algorithm). This will have two abstract methods.
Load will iterate over all data-loading algorithms (registered in an additional factory), calling QuickFileCheck; those that return true will then have FileCheck called. Loading will then be attempted in order of the preference values.
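The selection step described here might look roughly like the following in Python. It is a sketch under stated assumptions: `file_check` is taken to return an integer preference where a higher value is tried first and 0 means "cannot load", and the function name `rank_loaders` is made up for the example.

```python
def rank_loaders(filename, header, loaders):
    """Return the loaders that pass both checks, most preferred first.

    Assumes each loader offers quick_file_check(filename, header) -> bool and
    file_check(filename) -> int, where 0 means the loader cannot read the file.
    """
    ranked = []
    for loader in loaders:
        # Stage 1: cheap test; most loaders are expected to drop out here.
        if not loader.quick_file_check(filename, header):
            continue
        # Stage 2: the more expensive check runs only for the survivors.
        preference = loader.file_check(filename)
        if preference > 0:
            ranked.append((preference, loader))
    # The sort is stable, so loaders with equal preference keep registration order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [loader for _, loader in ranked]
```

Load would then attempt each returned loader in turn and stop at the first one that succeeds.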