Ticket #7677 (closed: fixed)

Opened 7 years ago

Last modified 5 years ago

Reduction for several inelastic instruments hangs part way through reduction

Reported by: Martyn Gigg Owned by: Martyn Gigg
Priority: blocker Milestone: Release 2.6.1
Component: Framework Keywords: PatchCandidate
Cc: Blocked By:
Blocking: #7718 Tester: Wenduo Zhou

Description

This happens with release 2.6.

  • Start MantidPlot
  • Open Interfaces->Convert To Energy and select TOSCA
  • Browse for a file, e.g. TSC11453.raw in systemtests/Data
  • For "Rebin Steps", uncheck "do not rebin", select multiple and enter: -2.5,0.015,3,-0.005,1000
  • Click "Run Energy Transfer"

Change History

comment:1 Changed 7 years ago by Martyn Gigg

  • Status changed from new to inprogress

comment:2 Changed 7 years ago by Martyn Gigg

The problem appears to be with the fix for #7583.

At the point the tree is updated the createInfoNode virtual method on Workspace is called. This in turn calls addInfoNodeTo method, which currently holds a lock while the information is gathered. If the object is actually a WorkspaceGroup then a separate lock is held by WorkspaceGroup in createInfoNode and this then deadlocks with the one trying to be held by addInfoNodeTo.

comment:3 Changed 7 years ago by Martyn Gigg

Remove mutex lock from WorkspaceGroup::createInfoNode

This is called by the base Workspace::addInfoNodeTo method when the object is a WorkspaceGroup. The base method locks the workspace RWLock but then for a group the overloaded createInfoNode was then also locking a mutex that causes a dead-lock while both locks wait for each other to release. Refs #7677

Changeset: 59477d7a6745832606b84ef724b707ae3022de49

comment:4 Changed 7 years ago by Martyn Gigg

Fix a minor bug in naming of group members in ChopData.

The ADS tries to add the members of a group to the ADS if they don't have a name. It's naming scheme is different to what ChopData does and so you effectively end up with an object having two names. The dock then reports workspaces that don't exist and you can't delete them. Refs #7677

Changeset: cc0bf307301d176a2082637cd30fd08996681a00

comment:5 Changed 7 years ago by Nick Draper

  • Milestone changed from Release 3.0 to Release 2.6.1

comment:6 Changed 7 years ago by Nick Draper

  • Summary changed from Convert to Energy for TOSCA hangs part way through reduction to Reduction for several inelastic instruments hangs part way through reduction

comment:7 Changed 7 years ago by Martyn Gigg

Handle locking of run/sample in ExperimentInfo.

This way we can use a more general mutex rather than the specific critical section used in cow_ptr. Also move some calls that access the object directly to use the accessor methods that should prevent access to the object while it is being copied. Refs #7677

Changeset: de17341707935da61e5513f74e5dde9a7a93f9b7

comment:8 Changed 7 years ago by Martyn Gigg

  • Status changed from inprogress to verify
  • Resolution set to fixed

Branch: bugfix/7677_segfault_c2e

Tester: The fix here has been to protect the methods that use a cow_ptr with a mutex so that when they block when copying the cow_ptr.

All scripts MUST be run in MantidPlot as the error is with populating the dock.

At minimum the steps in the description should work. Also rerun the script from #7583 to confirm that it is still okay.

Also, check this LET reduction script (data is in the systemtests/Data directory)

dgreduce.setup('LET')
white_run = 'LET00005545.raw'
sample_run = 'LET00006278.nxs'
ei = 7.0
ebin = [-1,0.002,0.95]
map_file = 'rings_103'
white_ws = 'wb_wksp'
LoadRaw(Filename=white_run,OutputWorkspace=white_ws)
sample_ws = 'w1'
monitors_ws = sample_ws + '_monitors'
LoadEventNexus(Filename=sample_run,OutputWorkspace=sample_ws,
               SingleBankPixelsOnly='0',LoadMonitors='1',
               MonitorsAsEvents='1')
ConjoinWorkspaces(InputWorkspace1=sample_ws, InputWorkspace2=monitors_ws)

energy = ei
emin = 0.2*energy   #minimum energy is with 80% energy loss
lam = (81.81/energy)**0.5
lam_max = (81.81/emin)**0.5
tsam = 252.82*lam*25   #time at sample
tmon2 = 252.82*lam*23.5 #time to monitor 6 on LET
tmax = tsam+(252.82*lam_max*4.1) #maximum time to measure inelastic signal to
t_elastic = tsam+(252.82*lam*4.1)   #maximum time of elastic signal
tbin = [int(tmon2),1.6,int(tmax)]
Rebin(InputWorkspace=sample_ws,OutputWorkspace=sample_ws, Params=tbin, PreserveEvents='1')
energybin = [ebin[0]*energy,ebin[1]*energy,ebin[2]*energy]
energybin = [ '%.4f' % elem for elem in energybin ]  
ebinstring = str(energybin[0])+','+str(energybin[1])+','+str(energybin[2])
argi={}
argi['det_cal_file']='det_corrected7.dat'
argi['bleed'] = False
argi['norm_method']='current'
argi['detector_van_range']=[0.5,200]
argi['bkgd_range']=[int(t_elastic),int(tmax)]
argi['hard_mask_file']='LET_hard.msk'
reduced_ws = dgreduce.arb_units(white_ws, sample_ws, energy, ebinstring, map_file,**argi)

comment:9 Changed 7 years ago by Samuel Jackson

  • Blocking 7718 added

comment:10 Changed 7 years ago by Martyn Gigg

Remove mutex lock from WorkspaceGroup::createInfoNode

This is called by the base Workspace::addInfoNodeTo method when the object is a WorkspaceGroup. The base method locks the workspace RWLock but then for a group the overloaded createInfoNode was then also locking a mutex that causes a dead-lock while both locks wait for each other to release. Refs #7677 (cherry picked from commit 59477d7a6745832606b84ef724b707ae3022de49)

Changeset: 839a0fe6d2b3af69aa1e0c82bd985286f1d5e756

comment:11 Changed 7 years ago by Martyn Gigg

Fix a minor bug in naming of group members in ChopData.

The ADS tries to add the members of a group to the ADS if they don't have a name. It's naming scheme is different to what ChopData does and so you effectively end up with an object having two names. The dock then reports workspaces that don't exist and you can't delete them. Refs #7677

Changeset: 0465652f4023bd56e559ac48570a9345c1d7abf9

comment:12 Changed 7 years ago by Martyn Gigg

Handle locking of run/sample in ExperimentInfo.

This way we can use a more general mutex rather than the specific critical section used in cow_ptr. Also move some calls that access the object directly to use the accessor methods that should prevent access to the object while it is being copied. Refs #7677 (cherry picked from commit de17341707935da61e5513f74e5dde9a7a93f9b7)

Changeset: e77151b599b161cbea47749af3d3888f4a831bfb

comment:13 Changed 7 years ago by Wenduo Zhou

  • Status changed from verify to verifying
  • Tester set to Wenduo Zhou

comment:14 Changed 7 years ago by Wenduo Zhou

Both scripts worked. The test is thus passed.

comment:15 Changed 7 years ago by Wenduo Zhou

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/bugfix/7677_segfault_c2e'

comment:16 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 8522

Note: See TracTickets for help on using tickets.