Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What are the expected semantics of changing a method in a class? #35

Open
ltratt opened this issue May 31, 2020 · 8 comments
Open

What are the expected semantics of changing a method in a class? #35

ltratt opened this issue May 31, 2020 · 8 comments
Labels
bug Fixes an issue, incorrect implementation help wanted Pull requests very welcome (usually easier than other issues) language design This issue requires design decisions spec Needs specification, or is specification related.

Comments

@ltratt
Copy link

ltratt commented May 31, 2020

Consider this program:

mutate_methods = (
    run = (
        mutate_methods methods at:2 put: #a.
        self f.
    )

    f = ( ^#b. )
)

[There is obviously a fragile assumption here that the f is the second method in methods, but let's not worry about that for the time being.]

On Java SOM it crashes on the self f call:

Exception in thread "main" java.lang.ClassCastException: class som.vmobjects.SSymbol cannot be cast to class som.vmobjects.SInvokable (som.vmobjects.SSymbol and som.vmobjects.SInvokable are in unnamed module of loader 'app')

which makes sense. However this seemingly inconsequential variation:

mutate_methods = (
    run = (
        self f.
        mutate_methods methods at:2 put: #a.
        self f.
    )

    f = ( ^#b. )
)

does not crash, presumably because the first self f call populates an inline cache which is not invalidated when the methods Array is mutated.

I assume that this is probably not the intended semantics for this second program (or perhaps it is?), but I'm not sure if e.g. a) SOM should notice the method change b) forbid chnages to the methods Array (e.g. via some sort of "this Array can't be altered?"). Any thoughts?

@smarr
Copy link
Member

smarr commented May 31, 2020

Thoughts: this is an unsatisfactory situation and reflection is considered unsafe in all SOM implementations.

I believe it is an example of "We know" how to make it fast and correct, and thus, noone ever bothered with it...

In Smalltalk, one would invalidate the caches. At least some Smalltalks assume they are used by "responsible programmers" and put the burden on the programmer by having an explicit API for invalidation: a flushCache primitive http://www.mirandabanda.org/bluebook/bluebook_chapter29.html
It's typically called by the "compiler" or method dictionary implementation.
Another reason why Smalltalk is an inherently unsafe language...

I am not entirely sure who added the lookup caching in SOM. Looking at the available code history it may have involved @mhaupt or @krono, but in the end, I am the one to blame for the current situation.

The "easy" thing to do would be to wrap the method array with something that invalidates all caches on write. In fast systems, one wants a more fine-grained approach, e.g. only removing relevant cache entries.

Currently, none of the SOMs do that. Reflection is simply not used in a mutating way.

@smarr smarr added language design This issue requires design decisions spec Needs specification, or is specification related. bug Fixes an issue, incorrect implementation help wanted Pull requests very welcome (usually easier than other issues) labels May 31, 2020
@ltratt
Copy link
Author

ltratt commented May 31, 2020

I guess there are two questions that we can tease apart here:

  1. What are the intended semantics. I think you're saying that SOMs should automatically crash on the second of the two examples I gave without the user having to do anything else? [i.e. I don't think you're saying that flushCache is a good idea?]

  2. How to make the intended semantics fast.

@smarr
Copy link
Member

smarr commented May 31, 2020

In an ideal world, reflective changes would be safe.

This may include an exception when putting a non-callable object into the method array.
It should likely automatically invalidate lookup caches, as class loading in Java does. Java being an example for a safe language here...

@ltratt
Copy link
Author

ltratt commented May 31, 2020

Can we choose one of those models so that I can implement it? :p

@smarr
Copy link
Member

smarr commented May 31, 2020

Sure!

Evil Laugh

@ltratt
Copy link
Author

ltratt commented May 31, 2020

I probably deserved that ;)

I will definitely invalidate caches. I'm not sure about the "raise an exception when something incorrect is put in there", because that makes Arrays appear to behave differently in ways that users can't reflect on (i.e. if I pass you an array, you can't tell if putting a symbol in it will bork until you put a symbol in it). But if you tell me that's the SOM way then I will do it!

@smarr
Copy link
Member

smarr commented May 31, 2020

Yes, invalidating caches is the best first step :)

@krono
Copy link
Member

krono commented Jun 2, 2020

For the record, CSOM did not have any lookup cache and both snippets crash at the second invocation of #f.

@ltratt ltratt changed the title What are the expected semantics of chaning a method in a class? What are the expected semantics of changing a method in a class? Jun 2, 2020
bors bot added a commit to softdevteam/yksom that referenced this issue Jun 3, 2020
129: Call the System initialize: method r=ptersilie a=ltratt

SOM runs programs by calling the `initialize:` method in `System.som` and having it load the user's class. This PR makes SOM do just this (f942f27). To get to that point, we have to do various bits of preparatory work, notably supporting the `methods` (c64f237) and `signature` (926cd5e) primitives.

Note that the `methods` primitive exposes a mutable array and we are thus currently subject to the same flaw as Java SOM in this regard (see SOM-st/SOM#35). Fixing that is for a later PR.

Co-authored-by: Laurence Tratt <[email protected]>
bors bot added a commit to softdevteam/yksom that referenced this issue Jun 5, 2020
130: Invalidate inline caches on method mutation r=ptersilie a=ltratt

This fixes the problem identified in SOM-st/SOM#35 and which, AFAIK, affects all SOM implementations: mutating a method in a class causes incorrect behaviour if an inline cache exists for that call (e.g. yksom (before this PR) and Java SOM could "ignore" the change if a cache had been created; and C SOM apparently crashes!). This PR fixes the problem for yksom.

It does so in a brute force way: when a method in any class is changed, every inline cache in the VM is reset. This makes correctness easy to reason about, though clearly it is less efficient than more nuanced approaches.

The way we do this is sort-of interesting from a Rust perspective. There are two obvious ways to model this:

* Have every SOM `Array` have a `bool` that says "if I'm mutated, flush all inline caches".
* Subclass the `Array` struct (in Rust) so that only instances of the `MethodArray` subclass flush inline caches on mutation.

The former is ugly, the latter seems impossible: Rust doesn't have subclasses, and traits only partially model them. In this case, the `bool` would probably be most efficient, but it's just really ugly. So I tried to make the latter case work, even though it doesn't appear to be possible. In particular, and perhaps surprisingly (though there are good reasons for this), a trait object `T1 + T2` cannot be coerced into a trait object of either `T1` or `T2`. This PR uses a simple way to solve this problem:

1. We define a trait `Array`
2. We define two structs `NormalArray` (which is the old `Array` struct renamed) and `MethodsArray` (which is only used for the arrays of methods in `Class`es).
3. We add a `to_array` function to `Obj` which returns (simplified) `&dyn Array`. Given a `&dyn Obj` we can then turn it into a `&dyn Array` by calling `to_array` *if and only if* an implementation of `to_array` is provided. Only `NormalArray` and `MethodsArray` implement this.

Using this, we can get from a `Val` to a `&dyn Array` by calling (simplified) `val.tobj().to_array()`. Yes, it means we have two dynamically dispatched function calls, but it means that we have a relatively nice way of separating out the behaviour of normal arrays from method arrays.

Most of the PR is thus shuffling code around (and duplicating some code): the real meat is in 0889b12.

Co-authored-by: Laurence Tratt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes an issue, incorrect implementation help wanted Pull requests very welcome (usually easier than other issues) language design This issue requires design decisions spec Needs specification, or is specification related.
Projects
None yet
Development

No branches or pull requests

3 participants