Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated memory management #543

Open
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

samschott
Copy link
Member

@samschott samschott commented Nov 19, 2024

This PR changes the memory management model of Rubicon. Previously, we would release objects on Python __del__ calls only if we created them ourselves and own them from alloc and similar calls.

In this PR, we always ensure that we own objects when we create a Python wrapper, by explicitly calling retain if we did not get them from alloc etc and always calling autorelease when the Python wrapper is garbage collected. This has a few advantages:

  1. Users will no longer need to manually retain objects when receiving them from non-alloc methods and to release them before the Python object goes out of scope to avoid memory leaks.
  2. Unless users manually release an object, Rubicon guarantees that a Python wrapper always points to an existing Objective-C object -- the one that it was created for.

This change should be backward compatible for most users because existing manual retain and release calls don't cause any issues if balanced and would have already caused segfaults if there are more releases than retains.

TODO:

  • Update docs.
  • Test with toga.
  • Figure out a decent transition plan for clients.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@samschott
Copy link
Member Author

@mhsmith, care to have first look? Please note the TODOs still listed above.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out to be a lot less invasive than I thought it would be.

I've done a check of the toga-chart issue that triggered this change; and it seems to resolve that problem.

I also did a quick check with Toga's testbed suite on macOS; that code has a bunch of manual retains and autorelease/releases. I thought they should all be balanced though - worst case, objects would be over-retained - so I was a little surprised that it the testbed segfaults almost immediately (and the stack trace doesn't give any obvious pointers what is causing the issue).

If I remove all the retains and releases, the testbed segfaults; but on inspection, some of the uses are for objects that are created in Python, then handed to ObjC to manage (e.g., Toolbar items created here), or the copyWithZone handler here). But I guess those uses of memory handling make sense - and they're a lot closer aligned to the "spirit" of ObjC memory handling, Plus, in at least the ToolbarItem case, it could be avoided by keeping the toolbar instance in the cache of items.

@freakboy3742
Copy link
Member

Related - if we land this, I suspect a version bump to 0.5 might be called for. This is just backwards incompatible enough that I think it's worth flagging the significance of the change.

@samschott
Copy link
Member Author

samschott commented Nov 20, 2024

Thanks for the thorough checks! I've updated the PR description to give a better summary of the change and also discuss why this should be non-breaking for most users.

I'll have a closer look at the segfaults that you encountered, later. They might be caused by the usage of release instead of autorelease in __del__ which does not give Objective-C a chance to take over ownership.

Maybe there are also ways to prevent users from shooting themselves in the foot, e.g., raise an exception on manual release calls if there is only a single reference left.

@mhsmith
Copy link
Member

mhsmith commented Nov 20, 2024

Thanks, this looks great. I'm busy today, but I'll take a look at this as soon as I can.

@samschott samschott marked this pull request as ready for review November 21, 2024 01:12
@samschott
Copy link
Member Author

samschott commented Nov 23, 2024

I've had a closer look now at the segfaults and could identify two cases where they happen:

  1. The object is released by Rubicon but still needed in ObjC, for example because it is being assigned to a property. Replacing release by autorelease in __del__ fixes this by giving ObjC a chance to take over ownership.
  2. Toga has a few "stray" release or autorelease calls sprinkled throughout the codebase to manually clean up memory, for example because of point (1). This relies on Rubicon internally setting _needs_release = False after those calls and disabling its own cleanup logic. This no longer works and results in one too many release calls now.

beeware/toga#2978 contains all the changes that I found to (1) prevent segfaults and (2) remove now unneeded manual memory management.

Copy link
Member

@mhsmith mhsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added #539 and #48 to the Fixes list in the top comment.

docs/how-to/memory-management.rst Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
changes/256.bugfix.rst Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
changes/256.removal.rst Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
changes/256.bugfix.rst Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
@freakboy3742
Copy link
Member

@samschott FYI - I'm on the last day of PyCon AU sprints; I might not get a chance to look at this until tomorrow when I'm back in my office for a return to semi-normal business.

@samschott
Copy link
Member Author

No worries. I've gotten a quite a thorough review from @mhsmith in the meantime, but will wait for both of your approvals before merging. Enjoy PyCon AU!

# it here to prevent leaking memory, Python already owns a refcount from
# when the item was put in the cache.
if _returned_from_method.startswith(_OWNERSHIP_METHOD_PREFIXES):
send_message(object_ptr, "release", restype=objc_id, argtypes=[])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a bit that the mental load of following when we retain and release is a bit much now since the reader needs to think though the __new__ method being invoked multiple times with the same pointer.

Alternatives such as explicitly tracking a "Python refcount" might be easier to understand but would also add complexity.

Suggestions are welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this approach: if it's possible to deal with the situation immediately after the copy, that's definitely easier to follow than maintaining an extra refcount.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. An extra refcount starts to get into re-implementing garbage collection territory, and I'm not sure that's a game that is worth it. I'm comfortable with this as a known edge case, and documenting it with some related usage patterns and advice.

@samschott
Copy link
Member Author

I've made one more change, to accommodate how copyWithZone: is often implemented on immutable objects such as NSString or NSDictionary where the original object is returned but with an additional retain count. When an already cached object is returned from copy and related methods, this retain count is now released immediately.

@samschott
Copy link
Member Author

samschott commented Nov 26, 2024

There is yet another hairy issue with the current implementation which will lead to memory leaks: alloc().init() chains where init changes the object's memory address, as documented at https://developer.apple.com/documentation/objectivec/nsobject/1418641-init?language=objc:

In some cases, a custom implementation of the init method might return a substitute object. You must therefore always use the object returned by init, and not the one returned by alloc or allocWithZone:, in subsequent code.

This is the case for example with NSDictionary.alloc().init() where the init returns a different object. The catch is that the object returned by init in this case seems to be owned by the caller as well (not only the object returned by alloc) and we start leaking memory. Example code with the implementation from this PR:

from rubicon.objc import *
from rubicon.objc.runtime import autoreleasepool


with autoreleasepool():
  adict = NSDictionary.alloc()
  idict = adict.initWithObjects(["some_object"], forKeys=["some_key"])

assert idict.retainCount() == 2  # Retained manually by Rubicon and implicitly.
print(adict)  # segaults because adict was deallocated

The above has two problems:

  1. It leads to segfaults if the user attempts to use the allocated object after the autorelease pool was drained.
  2. It leaks memory because the initialized object is retained twice but Rubicon is only aware of one of the retains.

Edit: Segfault seems to be for different reasons.

src/rubicon/objc/api.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
samschott and others added 3 commits November 26, 2024 21:29
Co-authored-by: Malcolm Smith <[email protected]>
Co-authored-by: Malcolm Smith <[email protected]>
Co-authored-by: Malcolm Smith <[email protected]>
@samschott
Copy link
Member Author

Looks like my assessment from #543 (comment) was not quite correct:

The adict object does not seem to be deallocated, the print statement seems to segfault for other reasons.

@mhsmith
Copy link
Member

mhsmith commented Nov 26, 2024

NSDictionary.alloc().init() where the init returns a different object

The above has two problems:

  1. It leads to segfaults if the user attempts to use the allocated object after the autorelease pool was drained.
  2. It leaks memory because the initialized object is retained twice but Rubicon is only aware of one of the retains.

I guess then we'll also have to special-case methods starting with init. If init returns a different object to the one it was called on, then instead of creating a new ObjCInstance, replace the existing instance's pointer with the new one and return it.

The adict object does not seem to be deallocated, the print statement seems to segfault for other reasons.

I wouldn't worry too much about that object. It might be deallocated, or it might be a placeholder that gets recycled by every call to NSDictionary.alloc, but either way, we have no right to access it after calling init. As the init documentation says, "You must therefore always use the object returned by init, and not the one returned by alloc or allocWithZone:, in subsequent code."

@freakboy3742
Copy link
Member

There is yet another hairy issue with the current implementation which will lead to memory leaks: alloc().init() chains where init changes the object's memory address, as documented at https://developer.apple.com/documentation/objectivec/nsobject/1418641-init?language=objc:

Interesting... I wonder if this was the thing triggering #539...

NSDictionary.alloc().init() where the init returns a different object
The above has two problems:

  1. It leads to segfaults if the user attempts to use the allocated object after the autorelease pool was drained.
  2. It leaks memory because the initialized object is retained twice but Rubicon is only aware of one of the retains.

I guess then we'll also have to special-case methods starting with init. If init returns a different object to the one it was called on, then instead of creating a new ObjCInstance, replace the existing instance's pointer with the new one and return it.

To make sure we're on the same page, AIUI, your proposal is extra handling in ObjCBountMethod.__call__ that will be invoked if self.method.name.startswith("init"); this logic will:

  1. check that self.receiver (the object on which the method is invoked) is the same value as the return value of the method; 2. If it isn't the same pointer, __call__ will modify the ObjCInstance instance itself, and the ObjCInstance cache to reflect the new value, releasing the "old" version from the alloc
  2. If it is the same pointer, it does nothing and just returns the pointer as it normally would.

That sounds broadly reasonable to me; I have 2 questions/concerns:

  1. Are there "init" methods that don't start with init? I can't think of any examples off the top of my head, but that might be my memory fading with age.

  2. Is there a particular reason you suggest patching the old instance, rather than creating a new instance? It seems like this would be less complex to implement (possibly even the default behavior?), with the only downsides being the cost of creating a second Python object, and the fact that the alloc object and the init object aren't the same - but that mirrors the underlying ObjC behavior, so it strikes me as the sort of thing that can be documented.

@freakboy3742
Copy link
Member

I guess then we'll also have to special-case methods starting with init. If init returns a different object to the one it was called on, then instead of creating a new ObjCInstance, replace the existing instance's pointer with the new one and return it.

Something that just occurred to me in reviewing beeware/toga#2978 in the context of this change - does this also resolve the NSImage "init fail" issue? The return value from init will be different to the alloc'd object... so the alloc'd object should be released; the only catch is that we don't need to create the ObjCInstance because it's a None object.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good to me. Toga has an existing <0.5 pin, so we're safe to do a major update here without breaking Toga; and beeware/toga#2978 is a pretty clear "Delete all your manual retain/release" PR, except for one copy edge case, and the NSImage init failure case (which, AFAICT, will be solved if we address the "init may change address" issue.

cached_obj = cls._cached_objects[object_ptr.value]

# If a cached instance was returned from a call such as `copy` or
# `mutableCopy`, we take ownership of an additional refcount. Release
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging so it isn't forgotten - a link here to the NSCopying docs would be helpful, plus highlighting that copy can return the same object as an optimisation if the object is immutable.

# it here to prevent leaking memory, Python already owns a refcount from
# when the item was put in the cache.
if _returned_from_method.startswith(_OWNERSHIP_METHOD_PREFIXES):
send_message(object_ptr, "release", restype=objc_id, argtypes=[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. An extra refcount starts to get into re-implementing garbage collection territory, and I'm not sure that's a game that is worth it. I'm comfortable with this as a known edge case, and documenting it with some related usage patterns and advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants