Ticket #2187 (closed: fixed)

Opened 10 years ago

Last modified 5 years ago

Investigate using alternative mallocs (e.g. tcmalloc) for performance

Reported by: Janik Zikovsky
Owned by: Janik Zikovsky
Priority: major
Milestone: Iteration 27
Component: Mantid
Keywords:
Cc:
Blocked By:
Blocking:
Tester: Russell Taylor

Description

There are a few alternative mallocs that improve multithreaded object allocation, e.g. tcmalloc and nedmalloc. Research how to use these and run some speed tests in Mantid.

Change History

comment:1 Changed 10 years ago by Janik Zikovsky

  • Status changed from new to accepted

comment:2 Changed 10 years ago by Janik Zikovsky

Results so far:

  • Downloaded and compiled libunwind (needed a build flag: export CFLAGS=-U_FORTIFY_SOURCE).
  • Downloaded and installed google-perf-tools (includes tcmalloc).

Then, using:

export LD_PRELOAD="/usr/local/lib/libtcmalloc.so"

I ran a test that loaded a large TOPAZ event file (run 1715). Time went down from 42 seconds to 24 seconds.
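For reference, the LD_PRELOAD approach above can be sketched from Python as well. This is a minimal, self-contained illustration, not the actual test harness: the tcmalloc path is the one quoted in this ticket (it may differ on other systems), and the child command is a stand-in for the real Mantid test script.

```python
import os
import subprocess
import sys

# Copy the environment and add the preload, as in the shell export above.
env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/local/lib/libtcmalloc.so"

# Launch a trivial child that reports whether the variable reached it.
# (If the library is missing, the loader warns on stderr but still runs.)
out = subprocess.run(
    [sys.executable, "-c", "import os; print('LD_PRELOAD' in os.environ)"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```

The point is simply that the preload only affects processes launched with that environment, which is why the later CMake link-time approach was also tried.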

Modified CMake to link to libtcmalloc.so.

Everything compiles and all tests pass. The same test now runs in about 30 seconds.

HOWEVER, Python crashes with a segfault when you import MantidFramework. This is where things stand now.

comment:3 Changed 10 years ago by Janik Zikovsky

More notes:

It looks like the Python API uses dlopen to load its libraries, and the tcmalloc docs warn about that. I got around the segfault by compiling tcmalloc with Thread Local Storage (TLS) turned off.

I ran a memory leak test that simply loads the same event file 25 times:

  • Without tcmalloc: memory usage reported went up to 7.1 GB virt / 6.5 GB res.
  • With tcmalloc: memory usage topped out at 4.3 GB virt / 4.0 GB res.

No performance results yet.
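The leak test itself is Mantid-specific, but the underlying pattern is generic: repeat an allocation-heavy step and watch the process's memory. A self-contained sketch (the bytearray allocation is a stand-in for loading an event file, and the helper name is illustrative):

```python
import resource

# Peak resident set size of this process; ru_maxrss is reported in
# kilobytes on Linux (bytes on macOS).
def peak_rss_kb():
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
blocks = [bytearray(1024) for _ in range(1000)]  # stand-in for LoadSNSEventNexus
after = peak_rss_kb()

# Peak RSS is monotonic, so it can only grow (or stay flat) across the run;
# a leak shows up as steady growth over many repetitions of the loop.
print(after >= before)
```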

comment:4 Changed 10 years ago by Janik Zikovsky

Performance testing results:

--- Standard allocator: ---

12.4525589943  seconds for TOPAZ instrument loading
14.2845599651  seconds for TOPAZ 1715 loading
32.47803092  seconds PG3_1370 data reduction
59.2151498795  seconds elapsed total


--- tcmalloc with Thread Local Storage turned off ---

9.26182699203  seconds for TOPAZ instrument loading
8.47131490707  seconds for TOPAZ 1715 loading
29.6662449837  seconds PG3_1370 data reduction
47.3993868828  seconds elapsed total

So the improvement is across the board, but it is more noticeable for TOPAZ (perhaps because it has many pixels with relatively few events in each, which means lots of small allocations).
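As a sanity check on the numbers above, the per-stage improvement can be computed directly (times copied, rounded, from the two runs in this comment):

```python
# Seconds per stage: standard allocator vs tcmalloc with TLS turned off.
standard = {"instrument": 12.45, "topaz_1715": 14.28, "pg3_reduction": 32.48}
tcmalloc = {"instrument": 9.26, "topaz_1715": 8.47, "pg3_reduction": 29.67}

# Percentage reduction in wall-clock time for each stage.
for stage in standard:
    pct = (standard[stage] - tcmalloc[stage]) / standard[stage] * 100
    print(stage, round(pct, 1))
```

TOPAZ event loading improves the most (roughly 40% faster), versus under 10% for the PG3 reduction, which is consistent with the small-allocations explanation above.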

comment:5 Changed 10 years ago by Janik Zikovsky

Follow-up: this is the Python code used in the tests above:

import time

import MantidFramework
MantidFramework.mtd.initialise()

t0 = time.time()

LoadEmptyInstrument("/home/8oz/Code/Mantid/Code/Mantid/Instrument/TOPAZ_Definition.xml", "topaz_instrument")

t1 = time.time()

LoadSNSEventNexus("/home/8oz/data/TOPAZ_1715_event.nxs", "topaz")

t2 = time.time()

if 1:
	calib = "../../../../Test/AutoTestData/pg3_mantid_det.cal"
	data_file = "/home/8oz/data/PG3_1370_event.nxs"
	wksp = "pg3"

	LoadSNSEventNexus(data_file, wksp)
	AlignDetectors(InputWorkspace=wksp, OutputWorkspace=wksp, CalibrationFile=calib)
	DiffractionFocussing(InputWorkspace=wksp, OutputWorkspace=wksp, GroupingFileName=calib)
	# Sort(InputWorkspace=wksp, SortBy="Time of Flight")
	ConvertUnits(InputWorkspace=wksp, OutputWorkspace=wksp, Target="TOF")
	NormaliseByCurrent(InputWorkspace=wksp, OutputWorkspace=wksp)

t3 = time.time()
print
print
print t1 - t0, " seconds for TOPAZ instrument loading"
print t2 - t1, " seconds for TOPAZ 1715 loading"
print t3 - t2, " seconds PG3_1370 data reduction"
print t3 - t0, " seconds elapsed total"
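The stage-timing pattern in the script above (a time.time() call between each step) generalizes to a small helper. This is a self-contained sketch, not part of the Mantid API; the helper name and stage workloads are illustrative:

```python
import time
from contextlib import contextmanager

# Record the elapsed wall-clock time of a block under a label.
@contextmanager
def timed(label, results):
    t0 = time.time()
    yield
    results[label] = time.time() - t0

results = {}
with timed("stage_a", results):
    total = sum(range(100000))          # stand-in for an algorithm call
with timed("stage_b", results):
    squares = [i * i for i in range(10000)]  # another stand-in workload

for label, seconds in sorted(results.items()):
    print(label, round(seconds, 3), "seconds")
```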

comment:6 Changed 10 years ago by Janik Zikovsky

Extra note: when we have a statically-linked, "supercomputing" Mantid (Vickie's ticket), we might enable TLS for TCMalloc.

comment:7 Changed 10 years ago by Janik Zikovsky

Another memory usage test, this time from within MantidPlot, loading PG3_1370 15 times in a script:

  • With tcmalloc, 4.7 GB virt / 4.0 GB resident memory; seemed stable here.
  • With standard allocator, 6.0 GB virt / 5.1 GB resident memory; looked like it would keep going up slowly.

comment:8 Changed 10 years ago by Janik Zikovsky

Continued: 4.7 GB virt / 4.0 GB resident memory still after 125 loads.

comment:9 Changed 10 years ago by Janik Zikovsky

(In [8960]) Refs #2187: Added some calls to release free memory if you linked against TCmalloc; commented out for now.

comment:10 Changed 10 years ago by Janik Zikovsky

System tests were run with and without TCMalloc. All tests pass (a couple that failed in both cases were removed):

Without TCMalloc:

  • 374.4 sec user
  • 2:55.45 time elapsed

With TCMalloc:

  • 351.4 sec user
  • 2:45.50 time elapsed

comment:11 Changed 10 years ago by Janik Zikovsky

(which is a 6% speedup using TCMalloc).

comment:12 Changed 10 years ago by Russell Taylor

(In [8988]) Support for optionally linking to tcmalloc library. Re #2187.

comment:13 Changed 10 years ago by Russell Taylor

(In [8989]) Small correction. Re #2187.

comment:14 Changed 10 years ago by Janik Zikovsky

(In [9006]) Refs #2187: Define and flags added around TCMalloc-specific code should allow it to compile whether or not you have TCMalloc.

comment:15 Changed 10 years ago by Janik Zikovsky

(In [9007]) Refs #2187: Commented out a debug statement.

comment:16 Changed 10 years ago by Janik Zikovsky

  • Status changed from accepted to verify
  • Resolution set to fixed

TCMalloc is now integrated into CMake, so it is up to individual developers whether to try it. Later it could be added to the Linux build, but I am closing this ticket now.

comment:17 Changed 10 years ago by Russell Taylor

  • Status changed from verify to verifying
  • Tester set to Russell Taylor

comment:18 Changed 10 years ago by Russell Taylor

  • Status changed from verifying to closed

The CMake build will link to tcmalloc if it finds it on the system. Unfortunately, RHEL only has a 32-bit google-perf-tools in the EPEL repo, so we'd have to build our own if we want to use it there.

comment:19 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 3034.
