Ticket #7798 (closed: wontfix)

Opened 7 years ago

Last modified 5 years ago

Linux: TCMalloc does not release free memory when requested

Reported by: Martyn Gigg
Owned by: Martyn Gigg
Priority: critical
Milestone: Release 3.3
Component: Framework
Keywords:
Cc:
Blocked By:
Blocking:
Tester: Anders Markvardsen

Description

There are calls to TCMalloc's ReleaseFreeMemory both before and after algorithm execution and also when calling FrameworkManager::clear.
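For reference, a minimal Python sketch of issuing the same kind of release request from the Python side. It assumes tcmalloc (gperftools) is already loaded into the process and exports its C wrapper MallocExtension_ReleaseFreeMemory. The helper name release_free_memory is hypothetical and not part of Mantid.

```python
import ctypes


def release_free_memory():
    """Ask tcmalloc to return free pages to the OS, if it is loaded.

    Returns True when the call was made, False when the symbol is absent
    (for example, when the process is using the glibc allocator).
    """
    try:
        # CDLL(None) looks up symbols already loaded into this process (POSIX).
        process = ctypes.CDLL(None)
        # Assumption: gperftools' C API exports this symbol.
        release = process.MallocExtension_ReleaseFreeMemory
    except (AttributeError, OSError, TypeError):
        return False
    release.restype = None
    release.argtypes = []
    release()
    return True
```

Because the lookup only searches symbols already present in the process, the sketch does nothing when tcmalloc has not replaced malloc.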

It would appear that these calls do not always do what is expected. For example, the system test performance reports suggest that a lot of memory is lost in many tests:

https://builds.sns.gov/view/All/job/ornl_test_rhel6_develop/System_tests_performance/?

whereas the Windows servers do not show this:

https://builds.sns.gov/view/All/job/ornl_test_windows7_develop/System_tests_performance/?

Investigate what is happening here. The script below (data in systemtests/Data) still holds memory after the call to clear, which should have freed it:

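import mantid
from mantid.kernel import MemoryStats
from mantid.simpleapi import *

# Start from an empty framework so earlier workspaces do not skew the baseline
mantid.api.FrameworkManager.clear()

## TEST ##
# Resident memory before the load
memory_before = MemoryStats().residentMem()/1024
wish_ws = Load(Filename='WISH00016748.raw',OutputWorkspace='wish_ws')
# Clearing should drop the workspace and release its memory
mantid.api.FrameworkManager.clear()

# Anything left over is memory that was not handed back
heldMemory = MemoryStats().residentMem()/1024 - memory_before

print("Memory held: %s" % heldMemory)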
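
The same before/after measurement can be sketched without Mantid, which may help separate allocator behaviour from framework behaviour. This is only an illustration: it assumes a Linux /proc filesystem, and the names resident_kib and held_after are hypothetical helpers, not Mantid API.

```python
import os


def resident_kib():
    """Current resident set size of this process in KiB (Linux /proc only)."""
    with open("/proc/self/statm") as statm:
        # Second field of statm is the resident size in pages.
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024


def held_after(action, cleanup):
    """Return the change in resident memory (KiB) across action() then cleanup()."""
    before = resident_kib()
    action()
    cleanup()
    return resident_kib() - before


if __name__ == "__main__":
    store = {}
    held = held_after(
        lambda: store.update(data=bytearray(50 * 1024 * 1024)),
        store.clear,
    )
    print("Memory held (KiB): %d" % held)
```

A value that stays large after cleanup suggests the allocator kept the pages rather than returning them to the OS.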

Change History

comment:1 Changed 7 years ago by Martyn Gigg

  • Status changed from new to inprogress

comment:3 Changed 7 years ago by Martyn Gigg

  • Milestone changed from Release 3.0 to Release 3.1

comment:4 Changed 7 years ago by Martyn Gigg

  • Milestone changed from Release 3.1 to Release 3.2

comment:5 Changed 7 years ago by Martyn Gigg

  • Status changed from inprogress to assigned

comment:6 Changed 6 years ago by Martyn Gigg

  • Milestone changed from Release 3.2 to Release 3.3

comment:6 Changed 6 years ago by Martyn Gigg

  • Status changed from assigned to inprogress

Fix link order of libraries so that tcmalloc is first.

This allows tcmalloc to replace malloc and report the correct memory usage when running through Python, with the exception of systems running gcc 4.4. On these systems, if stdc++ is not linked/loaded first then a segfault occurs when throwing an exception across a dll boundary. Refs #7798
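
One way to check whether tcmalloc actually ended up in the running Python process is to inspect its memory map. The sketch below assumes a Linux /proc filesystem; shared_objects and tcmalloc_loaded are hypothetical helper names. It reports only whether the library is mapped, not the order in which libraries were linked or loaded.

```python
import os


def shared_objects(maps_text):
    """Return shared-object paths found in /proc/<pid>/maps text, in address order."""
    seen = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if ".so" in os.path.basename(path) and path not in seen:
            seen.append(path)
    return seen


def tcmalloc_loaded(maps_text):
    """True if any mapped shared object looks like a tcmalloc library."""
    return any("tcmalloc" in os.path.basename(p) for p in shared_objects(maps_text))


if __name__ == "__main__":
    with open("/proc/self/maps") as maps:
        print("tcmalloc loaded: %s" % tcmalloc_loaded(maps.read()))
```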

Changeset: 3746a914ff5ab333718ada501c5c0b7fb740b05e

comment:7 Changed 6 years ago by Martyn Gigg

Simplify the process of loading the Python plugins on Linux.

Rearranging the library link order seems to have made obsolete the process of having to force certain libraries to load first. Refs #7798

Changeset: 3b72565c6b22dd965f9e68466efe88cef8573321

comment:8 Changed 6 years ago by Martyn Gigg

Fix segfault on RHEL6.

Each Python module was separately linked to each core library, which is unnecessary. Each module only needs to link to the Boost Python libraries. Most systems handled this multiple linking without a problem, but RHEL6 would segfault when accessing the NeXus C API. Refs #7798

Changeset: 362641f19a26aa2082c69df6a6caf67d8d1ebb77

comment:9 Changed 6 years ago by Martyn Gigg

Don't be as aggressive trying to force memory release.

It can be a time-consuming operation, and now that managed workspaces have gone we are not so reliant on being able to programmatically judge the amount of memory available. Refs #7798

Changeset: 97802eedeb8836ad3c71a42d72c2a8a1127164e3

comment:10 Changed 6 years ago by Martyn Gigg

Put back MPI libraries in the Python layers.

Refs #7798

Changeset: 8b5796b916554920d88e6aed1f5680d8a6412f0c

comment:11 Changed 6 years ago by Martyn Gigg

Link Python libraries for C++ tests.

Refs #7798

Changeset: a68ab6c8306d845935ccdd4aa23d15bb44fc097a

comment:12 Changed 6 years ago by Martyn Gigg

Link Python libraries to the _kernel module and nothing else.

Refs #7798

Changeset: f45fd5995802b83b150eb8efb0b0d5e24cdb8ad8

comment:13 Changed 6 years ago by Martyn Gigg

Fix gcc test in Kernel link line.

It needs to catch all of 4.4, so it restricts to > 4.5. Refs #7798

Changeset: 691e8a938f1cd973c55c803ba03b2217646a53ac

comment:14 Changed 6 years ago by Martyn Gigg

Also place NeXus first in linker list in Kernel for gcc 4.4

If it is not after stdc++ then we get a crash in Python when accessing the C API. Refs #7798

Changeset: 20230faf14bfc3c4cdabe2c16ad733492caa20f4

comment:16 Changed 6 years ago by Martyn Gigg

  • Status changed from inprogress to verify
  • Resolution set to wontfix

After discussions with the TSC it has been decided to go down a different path with tcmalloc, as described in #10271. As a result I am abandoning the changes here to avoid confusion with what will become the actual work.

The branch has been deleted so there is nothing to merge.

comment:17 Changed 6 years ago by Anders Markvardsen

  • Status changed from verify to verifying
  • Tester set to Anders Markvardsen

comment:18 Changed 6 years ago by Anders Markvardsen

  • Status changed from verifying to closed

Superseded by #10271

comment:19 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 8643.
