Ticket #11614 (closed: fixed)

Opened 5 years ago

Last modified 5 years ago

SaveNXSPE: Write speeds very slow on ceph file system

Reported by: Martyn Gigg
Owned by: Martyn Gigg
Priority: critical
Milestone: Release 3.4
Component: Framework
Tester: Andrei Savici
Keywords:
Cc:
Blocked By:
Blocking:

Description

The Excitations group have noticed that a 300 MB file can take ~20 minutes to write to the Ceph file system used for autoreduction!

Change History

comment:1 Changed 5 years ago by Martyn Gigg

The following script uses h5py to verify the problem:

import os
import sys
import time

import h5py
import numpy as np


def write_data(data_group, name, data, slab_size):
    """
    Write a 2D array to a new dataset in blocks of whole rows.

    slab_size is the target size of each block, in bytes.
    """
    dset = data_group.create_dataset(name, shape=data.shape,
                                     dtype='f8')
    nrows, ncols = data.shape
    # Number of whole rows that fit in one slab (8 bytes per float64 value)
    nrows_block = slab_size // (dset.dtype.itemsize * ncols)
    if nrows_block == 0:
        # Slab is smaller than a single row: write everything in one call
        nrows_block = nrows

    # Each slice assignment is one slab write; the final block may be shorter
    for start in range(0, nrows, nrows_block):
        stop = min(start + nrows_block, nrows)
        dset[start:stop, :] = data[start:stop, :]


def write_file(filename, signal, errors, slab_size):
    # Start from a clean file on every run
    try:
        os.remove(filename)
    except OSError:
        pass

    with h5py.File(filename, 'w') as hdf_file:
        # Top level group
        root_group = hdf_file.create_group("workspace_1")
        root_group.attrs["NX_class"] = "NXentry"

        # data group
        data_group = root_group.create_group("data")
        data_group.attrs["NX_class"] = "NXdata"

        # write the data
        write_data(data_group, "data", signal, slab_size)
        write_data(data_group, "error", errors, slab_size)

#------------------------------------------------------

if len(sys.argv) < 2:
    print("Usage: %s FILEPATH" % sys.argv[0])
    sys.exit(1)

# Two float64 datasets of 286720 x 66 values give a ~300 MB file
num_rows = 286720
num_cols = 66
signal = np.arange(num_rows*num_cols, dtype=np.float64)
signal = signal.reshape(num_rows, num_cols)
errors = np.sqrt(signal)

# 4 MiB per slab
slab_size = 4*1024*1024
# 1 row per slab (bytes)
#slab_size = 8*num_cols

start_time = time.time()
write_file(sys.argv[1], signal, errors, slab_size)
end_time = time.time()

print("Time to write file: {0}s".format(end_time - start_time))

If you uncomment the slab_size = 8*num_cols line, so that every slab is a single row, the write becomes dramatically slower. This is because more calls to putslab mean more disk accesses, and some parts of Ceph must be waiting for previous chunks to be transferred to remote locations before allowing the next one to be written.
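To put numbers on this, the sketch below counts how many slab writes (one putslab-style call each) the script issues per dataset for the two slab sizes, using the same rows-per-block rule as write_data (8 bytes per value, falling back to a single call if a slab is smaller than one row). It is plain arithmetic under those assumptions, not a measurement of Ceph itself; slab_calls is a hypothetical helper name.

```python
import math


def slab_calls(nrows, ncols, slab_size, itemsize=8):
    """Number of slab writes needed for one (nrows x ncols) dataset."""
    rows_per_block = slab_size // (itemsize * ncols)
    if rows_per_block == 0:
        # Same fallback as write_data: the whole dataset in a single call
        rows_per_block = nrows
    return math.ceil(nrows / rows_per_block)


num_rows, num_cols = 286720, 66
print(slab_calls(num_rows, num_cols, 4 * 1024 * 1024))  # 37 (4 MiB slabs)
print(slab_calls(num_rows, num_cols, 8 * num_cols))      # 286720 (one row per slab)
```

With two datasets (data and error) that is 74 writes for 4 MiB slabs versus 573,440 writes for one-row slabs.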

A solution is to write the data in larger chunks.
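One way to apply this when data arrives one spectrum (row) at a time, as in a save loop, is to buffer rows in memory and flush them in large blocks. The sketch below is hypothetical: BufferedRowWriter and the write_slab(start_row, rows) callback are illustrative names, not Mantid or NeXus API. In practice the callback would wrap a real slab write, such as h5py dataset slice assignment.

```python
class BufferedRowWriter:
    """Collects rows and hands them to write_slab in blocks of rows_per_block.

    write_slab(start_row, rows) is a hypothetical callback that performs one
    slab write, e.g. dset[start_row:start_row + len(rows), :] = rows in h5py.
    """

    def __init__(self, write_slab, rows_per_block):
        if rows_per_block < 1:
            raise ValueError("rows_per_block must be >= 1")
        self._write_slab = write_slab
        self._rows_per_block = rows_per_block
        self._buffer = []
        self._next_row = 0  # dataset index of the first buffered row

    def append(self, row):
        self._buffer.append(row)
        if len(self._buffer) == self._rows_per_block:
            self.flush()

    def flush(self):
        # Write everything buffered as a single slab, then start a new buffer
        if self._buffer:
            self._write_slab(self._next_row, self._buffer)
            self._next_row += len(self._buffer)
            self._buffer = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Make sure the final, possibly partial, block is written
        self.flush()
        return False


# Usage: record (start_row, block_length) for each slab write
calls = []
with BufferedRowWriter(lambda start, rows: calls.append((start, len(rows))),
                       rows_per_block=4) as writer:
    for i in range(10):
        writer.append([i] * 3)
print(calls)  # [(0, 4), (4, 4), (8, 2)]
```

Ten rows become three slab writes instead of ten; the block size trades memory held in the buffer against the number of write calls.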

Last edited 5 years ago by Martyn Gigg

comment:2 Changed 5 years ago by Martyn Gigg

  • Status changed from new to inprogress

Improve tests around SaveNXSPE

The main test now actually verifies the data in the file using HDF. Refs #11614

Changeset: dd3a8634edd87669c069adb38af76c128ce28fb6

comment:3 Changed 5 years ago by Martyn Gigg

Improve SaveNXSPE test to check different sizes.

Refs #11614

Changeset: 4e9598dd01329c6a8406c69548c4a5650d751561
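Checking several sizes matters for a change like this because block-splitting code tends to fail at the edges: a row count that is an exact multiple of the block size, a dataset smaller than one block, or a single row. The sketch below is a hypothetical illustration of that idea, not the Mantid test itself: it splits rows into half-open [start, stop) blocks and checks that every row is covered exactly once for a range of sizes.

```python
def block_ranges(nrows, rows_per_block):
    """Yield (start, stop) half-open row ranges of at most rows_per_block rows."""
    for start in range(0, nrows, rows_per_block):
        yield start, min(start + rows_per_block, nrows)


def covers_every_row_once(nrows, rows_per_block):
    seen = []
    for start, stop in block_ranges(nrows, rows_per_block):
        seen.extend(range(start, stop))
    return seen == list(range(nrows))


# Edge cases: exact multiple, remainder, smaller than one block, single row
for nrows, block in [(9, 3), (10, 3), (2, 5), (1, 1), (286720, 7943)]:
    assert covers_every_row_once(nrows, block), (nrows, block)
```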

comment:4 Changed 5 years ago by Martyn Gigg

Write the data to the file in larger chunks

Avoids excessive disk writes that slow things down. Refs #11614

Changeset: b2b0434faa29d22aab6115f40e6e95be0686f0ef

comment:5 Changed 5 years ago by Martyn Gigg

Improve tests around SaveNXSPE

The main test now actually verifies the data in the file using HDF. Refs #11614

Changeset: 40fd11d02dc5352791a0c8d2bc527d713fac9d79

comment:6 Changed 5 years ago by Martyn Gigg

Improve SaveNXSPE test to check different sizes.

Refs #11614

Changeset: a91025ca80192cc68bc661dd7f00a1b8a518960f

comment:7 Changed 5 years ago by Martyn Gigg

Write the data to the file in larger chunks

Avoids excessive disk writes that slow things down. Refs #11614

Changeset: 813aee83845fe04e768682a8f96b87664872f978

comment:8 Changed 5 years ago by Martyn Gigg

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #632.

comment:9 Changed 5 years ago by Martyn Gigg

Fix test helper method call.

Refs #11614

Changeset: 751d7350b6c72ea4d2d4d71279546f2a86b92cdb

comment:10 Changed 5 years ago by Andrei Savici

  • Status changed from verify to verifying
  • Tester set to Andrei Savici

comment:11 Changed 5 years ago by Andrei Savici

Works with DAVE, Mslice and Horace.

comment:16 Changed 5 years ago by Andrei Savici

  • Status changed from verifying to closed

Merge pull request #632 from mantidproject/11614_improve_savenxspe_write_speed

Improve speed of SaveNXSPE

Full changeset: d31c0d839c8821ef8ac2b6061c4013fb0b6dc566

comment:17 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 12452.
