Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async GL shader compilation #576

Closed
wants to merge 28 commits into from

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Jul 8, 2022

@dranikpg
Copy link
Contributor Author

dranikpg commented Jul 8, 2022

This is still a WIP, so no worries about incomplete doc comments, undefined constants for GLES targets, formatting, etc. 😄

While implementing the proposed changes for Shader&Program is no more than splitting code into two functions, handling compile states requires some planning.

Compilation states, I suppose, should be only data bags. All logic should be in FooGL::compile or the final constructor. This way is is much cleaner and easier to manage. However AbstractShaderProgram constrainedness stands in the way. A shader program has to be created and submitted for linkage in FooGL::compile or the compilation state. Besides that, typical FooGL::compile requires a program to exist early on to call bindAttributeLocation/bindFragmentDataLocation, which have to be arranged strictly before linkage. So it'd make sense fully to reuse AbstractShaderProgram, which supports all operations.

I came up with the following solution and implemented it for FlatGL:

  • I assumed that AbstractShaderProgram (ASP) should remain abstract with protected members. After all, that was its intention. But to reuse its functionality, we will make ASP::CompileState extend it.
  • FlatGL::ComplieState will extend ASP::CompileState as planned.
  • FlatGL::ComplieState will declare FlatGL::compile and FlatGL(CompileState&&) as friend, to allow access to ASP functions.
  • FlatGL(CompileState&&) will simply move the ASP out of the compile state, to initialize its base class (which is another ASP)
  • All common public faced helper functions (like isLinkFinished) will be in ASP::CS
                          -- submit/check/... 
                          |
                          v
                       +++++++++++++   
                       +    ASP    +  <-------- moves out of
                       +++++++++++++          | to init its ASP
                            ||                |
                       +++++++++++++          |
FooGL::compile         +  ASP::CS  +       FooGL(FooGL::CS&&)
       |               +++++++++++++          |
       |                    ||                |
       |               +++++++++++++          |
       ------------->  + FooGL::CS +  <-------- friend to 
creates & returns      +++++++++++++            use ASP functions 
friend

This solution is compact and doesn't change the way ASP works, however it feels a bit tricky to me...

Things to consider when generalizing for more than FlatGL:

  • Instead of using friend, we can reimplement all required functions for ASP::CS. More boilerplate, but less tricky?
  • We can make ASP::CS store vertex&fragment shaders and call attach in constructor. However submitLink still has to be called manually after all bind calls.
  • Instead of calling bind & submitLink manually, we can pass a function/templatize by a functor, which will be called after creation, but before linkage. This way FooGL::compile wont contain any ASP specific logic

What are your thoughts on this?

@mosra
Copy link
Owner

mosra commented Jul 12, 2022

Hi, apologies for a delayed response here -- I'm stuck deep in a certain other feature and its taking way more time than expected.

Thanks for the code, and especially for the detailed description of your thought process. Very appreciated 👍

I'll post a more detailed reply once I manage ship that other feature (hopefully later this week), but after a brief look at your initial changes it seems my original propsal wasn't exactly great in terms of the amount of boilerplate inflicted on subclasses. (My fault, sorry.) I have to think more about that. The ideal state would be that subclasses wouldn't need to implement that much extra apart from what they did before (because all duplicated state would also mean twice as much testing effort), only splitting the initial setup somewhat, perhaps at the cost of some safety in case the async compilation is used. Which is fine I guess, it's an advanced feature after all.

I have a vague idea that the construction could get split via a chain of delegating constructors, which would either directly call into each other in the straightforward case, or an object gets "partially constructed", waited upon, and then moved to a fully-constructed state once the compilation finishes. I'll outline that better once I get my hands free.

Until then, in 74f1778 I added support for KHR_parallel_shader_compile at least, so you don't have to patch it in by hand. (It caused a conflict -- sorry.)

@mosra mosra added this to the 2022.0a milestone Jul 12, 2022
@dranikpg
Copy link
Contributor Author

dranikpg commented Jul 18, 2022

Just throwing this in as as suggestion before you get back to it.
Instead of having separate compile states like described in the issue, we might be just fine with one generic one that immediately constructs the ASP, but guards it unitl its not compiled.

template<typename FooGL>
struct CompileState {
private: // no access to shader until compiled
  FooGL asp; // moved in from FooGL::compile
  Shader vert, frag;
  optional<Shader> geom; // or parameterize by N - number of shaders?
}

then

FooGL(CompileState<FooGL>&& cs)

blocks and finishes consturction.
All common functions can be handled by the compile state directly. Further more, there will be no duplicated fields like in the previous case

@mosra
Copy link
Owner

mosra commented Jul 22, 2022

Okay, that actually seems like the best middle ground :)

The delegating constructors from my comment above wouldn't work on its own because there would be nowhere to store the temporary GL::Shader instances, while OTOH putting them into the shader program itself would be wasteful. But combined with your suggestion it could work:

class FlatGL::CompileState: public FlatGL {
    private:
        friend FlatGL;
        using FlatGL::FlatGL; // low-effort way to get access to the NoCreate constructor to populate this
        GL::Shader frag, vert;
        GL::Version version; // + whatever else the constructor needs to preserve to avoid duplicated code
};

It's publicly derived, so the getter APIs like isLinkFinished() or flags() are easy to access if user code needs to. (Which means the dangerous APIs like draw() are accessible too, but that's fine, the same danger would be with a NoCreate'd instance.) I'm not 100% sure if it should be an inner class or defined externally as FlatGLCompileState, but having it as an inner class avoids further polluting docs of the Shaders namespace at least. It's also not a template because I think it's simpler to just type out all extra state you'd need (for compute shaders there would need to be Optional<GL::Shader> compute, etc, getting very ugly very fast).

Then, there would be FlatGL::compile() returning FlatGL::CompileState, and FlatGL(CompileState&&) that finishes construction, like you said, and like you already mostly have in your code.

Additionally:

  • I would make isLinkFinished() public in AbstractShaderProgram, to avoid a need for an AbstractShaderProgram::CompileState -- for a finalized instance it would always return true which is useless but not dangerous, so it doesn't need to be hidden from the public API.
  • I would make the submitCompile() / submitLink(), checkCompile() / checkLink() APIs operate just on a single shader/program, instead of a list. The list was there originally as a non-ideal parallelization attempt, and since you have a much better alternative now, calling submitCompile({a, b}) doesn't really have any advantage over a.submitCompile(), b.submitCompile() (besides avoiding some temporary allocations, but that's my problem to solve). The original list-taking compile() / link() would stay (and sequentially call into the submit/check APIs) for backwards compatibility for 3rd party code.

Does this make sense? I hope I didn't forget something critical again :) And apologies for the delay.

@dranikpg
Copy link
Contributor Author

Makes sense and should work 😄 I'll try this out next week

@dranikpg
Copy link
Contributor Author

I sketched the proposed changes. If it looks acceptable, I'll move on to tidying and implementing the same for other shaders.

FlatGL needs an additional constructor that will be called from its compile state. I suggest

protected:
/** @brief Construct without running shader compilation and linking */
FlatGL(NoInitT, Flags flags, etc.)

that will create the ASP and set variables, but not trigger compilation (or else we're stuck in a loop 😆). NoInit seems to be the closest from the available tags.


besides avoiding some temporary allocations, but that's my problem to solve

I saw the containers and it made me not go for the split initially. There is a todo to implement stack based allocation, I can try this out in this PR.

Copy link
Owner

@mosra mosra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thank you. I think the scaffolding essentials are quite close to completion, so I commented more thoroughly -- hope that's okay :)

What's left (apart from eventually stamping this out for the other shader variants once we have the base finalized) is some testing. I imagine this could be sufficient:

  • In GL/Test/ShaderGLTest.cpp there's a compile() test case. I'd split it into compile(), compileFailure(), compileAsync() and compileAsyncFailure(), verifying that isCompileFinished() is always true after compile() or checkCompile(), and that the failure message is never printed after submitCompile() but only after checkCompile(). Given there would be the suggested fallback for drivers without KHR_parallel_shader_compile, the test wouldn't really need to behave differently depending on extension presence. On the other hand, drivers can expose the extension but not actually parallelize anything, so I don't see a way to verify if isCompileFinished() actually ever returns false. Though if you have any idea, feel free to try that out.
  • Similarly in AbstractShaderProgramGLTest.cpp for the linking case.
  • For the flat shader, there's compile() and compileUniformBuffers() in Shaders/Test/FlatGLTest.cpp. Those are instanced and test a ton of combinations. I don't think it's needed to test all that again, I'd only add compileAsync() and compileUniformBuffersAsync() that check a single case with non-trivial input arguments, use the CompileState workflow there, verify that the flags/material/draw count gets properly propagated right from the start and verify that isLinkFinished() returns true after.

Tests can be enabled with MAGNUM_BUILD_TESTS and MAGNUM_BUILD_GL_TESTS in CMake, then for easy execution they're all put into Debug/bin or Release/bin. I'm pretty sure you'll manage to get around, but just in case here's a small tutorial for the test suite and here's how debug/warning/error output can be captured and verified. You could use for example Compare::StringHasPrefix to check that the output is reasonable without having to worry about random difference in the output across platform.

When you push, the GLES2 and GLES3 CI jobs as well as the Android GLES2 job run the GL tests with SwiftShader. I'm not sure if SwiftShader implements this extension (probably not?), but in any case most of the code gets verified. The other jobs don't run any GL tests, at the very least they'll check that the code compiles.

src/Magnum/Shaders/FlatGL.h Outdated Show resolved Hide resolved
src/Magnum/Shaders/FlatGL.cpp Outdated Show resolved Hide resolved
src/Magnum/GL/Shader.cpp Outdated Show resolved Hide resolved
src/Magnum/GL/Shader.cpp Show resolved Hide resolved
src/Magnum/GL/Shader.cpp Outdated Show resolved Hide resolved
src/Magnum/GL/AbstractShaderProgram.h Show resolved Hide resolved
src/Magnum/GL/Shader.cpp Show resolved Hide resolved
Copy link
Owner

@mosra mosra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor things, and I'm tired so apologies if my sentences don't make sense :D

I'll give it another look when you get to the tests. Thank you!

src/Magnum/Shaders/FlatGL.h Outdated Show resolved Hide resolved
src/Magnum/Shaders/FlatGL.h Show resolved Hide resolved
src/Magnum/GL/AbstractShaderProgram.h Outdated Show resolved Hide resolved
@dranikpg
Copy link
Contributor Author

dranikpg commented Jul 31, 2022

As it turns out, querying GL_LINK_STATUS might return success before the completion query does so 😕 I've discovered this when I was updating the tests

This means that the following is possible:

FlatGL2D s{{}, 1,1}; // calls checkLink internally
s.isLinkFinished(); // returns false

I've made a small example that works in this odd way on my pc:

Renderer: AMD RENOIR (DRM 3.41.0, 5.13.0-52-generic, LLVM 12.0.1) by AMD
OpenGL version: 4.6 (Core Profile) Mesa 21.2.6

I'm not sure if this is the norm, because I couldn't find anything in the khronos docs on whether it can't be used after linking. In my case, it even returns false when the shader is being used - it seems like the driver just needs some time to notify its completion query mechanism 🤔


Update: I've made a raw OpenGL example to make sure the result is not influenced by magnum in any way. Same behaviour: LINK_STATUS returns GL_TRUE, but querying completion does not

@dranikpg
Copy link
Contributor Author

dranikpg commented Aug 3, 2022

I'll give it another look when you get to the tests

@mosra just a friendly ping in case you didn't see my message above. No hurries if you're busy 😅 😓 I implemented the proposed test changes. If they look ok, I'll move to the other shaders. However, the is a small issue with the query correctness. I've described it above

@mosra
Copy link
Owner

mosra commented Aug 4, 2022

I saw the notifications but didn't find time yet, sorry for the wait again. I'll try to check this later today or tomorrow.

Copy link
Owner

@mosra mosra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be away until Monday, but I think that apart from these few comments this is ready to be replicated for other shaders.

Great work, thank you 👍

src/Magnum/GL/Test/AbstractShaderProgramGLTest.cpp Outdated Show resolved Hide resolved
src/Magnum/Shaders/Test/FlatGLTest.cpp Outdated Show resolved Hide resolved
src/Magnum/Shaders/Test/FlatGLTest.cpp Show resolved Hide resolved
src/Magnum/GL/AbstractShaderProgram.h Show resolved Hide resolved
Copy link
Owner

@mosra mosra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice, so the sleep was enough to fix that strange issue :)

It's just DistanceFieldVector left, right? Awesome. You also managed to match the library coding style quite well, if there will be any minor issues left I'll happily do that post-merge. And I can also write the overview docs / example snippet myself after -- don't want to put all the burden on your first contribution ;)

The linux-nondeprecated build is failing because it seems you're accidentally using a deprecated alias (MeshVisualizer2D instead of MeshVisualizerGL2D, the fomer is a relic of the past and not present on the non-deprecated build). Easiest way to discover all such cases is if you disable the MAGNUM_BUILD_DEPRECATED CMake option in your local build, then you get the same errors as on the CI.

@dranikpg
Copy link
Contributor Author

Ah, nice, so the sleep was enough to fix that strange issue :)

Well, it does it for the tests 😅 The API still doesn't work "as expected", but i've left a warning in the doc comments

if there will be any minor issues left I'll happily do that post-merge

I'll have a pass at the end and try to fix as much as I can. It's just that every time I forget how something is supposed to be aligned, I open a random file and look how its done there 😄 I know there is Corrades styleguide, but it doesn't cover cases like line-breaking a function with 5 arguments and 3 ifdefs

The linux-nondeprecated build is failing

Just didn't get to it. Thanks for the build option, I have been fishing them out of the logs. I'm more worried about AppVeyour... It spits out like 2k lines of warning in other parts of magnum and then the linker fails. There seems to be something wrong with the templated shaders (because MeshVisualizerGL2D is not part of the error). Specifically, the FooGL(Flags) constructor seems to be unresolved (after being moved to the header).

I don't have a dev-ready windows pc right next to me to test on, so it'll be cool if you could tell me the cause if you now it right away. In case this requires a little investigation, you can leave it on me.

@mosra
Copy link
Owner

mosra commented Aug 15, 2022

I'll have a pass at the end [...] line-breaking a function with 5 arguments and 3 ifdefs

🙇 thank you! For line breaking, it's comments on 79 cols (mainly just to make them easy to read, unless there's an excessively long identifier or expression that can't really be broken). For other stuff, having the whole signature as a single long line is fine. Preprocessor #ifdefs in signatures I indent with 4 extra spaces compared to surrounding code.

If I may, the only other thing I noticed is a space before :, I have it always without, be it a class base or a range-for. But again, don't worry too much, I can fix it up after. I can't stand the mess clang-format makes so there's no automated way and thus the formatting burden is on me.

then the linker fails

It complains about a vftable, which is a vtable, which is due to AbstractShaderProgram having a virtual destructor. My theory is that, IIRC, the vtable usually gets attached to the first or last method declaration in the class, or something like that. So if the definition is inline and it's combined with the fact that it's a template, then it could happen that the vtable isn't stored in the DLL. Or something, heh.

TL;DR: try to deinline the explicit FlatGL(NoInitT) {} and others, since these are the last methods in the class (i.e., make a = defaulted definition of them in the .cpp files), maybe that makes it go away?

I don't have a Windows machine around either, so I usually just abuse AppVeyor, pushing random tries until it starts cooperating. Feel free to do the same, haha.

@dranikpg dranikpg marked this pull request as ready for review August 24, 2022 18:23
@dranikpg
Copy link
Contributor Author

dranikpg commented Aug 24, 2022

I guess we don't need full checks in constructors

FooGL(CompileState&&) {
    CORRADE_INTERNAL_ASSERT_OUTPUT(checkLink());
    CORRADE_INTERNAL_ASSERT_OUTPUT(cs._vert.checkCompile());
    CORRADE_INTERNAL_ASSERT_OUTPUT(cs._frag.checkCompile());

because in case the first check fails, the other two fill be skipped. In case the first is successful, the others are guaranteed to be the same

Copy link
Owner

@mosra mosra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Except for one 32-bit-specific issue, all tests are passing, so that's great.

I didn't have time to make a final review pass the past weekend, but on a quick look everything looks alright. Your comment about CORRADE_INTERNAL_ASSERT_OUTPUT(cs._vert.checkCompile()); etc. makes sense, keep just the checkLink() assert there.

I'll hopefully find time to merge everything the following weekend. No further review rounds necessary I think, I just want to go over everything one last time locally. Thanks a lot for a top-notch contribution! 👍

src/MagnumExternal/OpenGL/GL/flextGL.h Outdated Show resolved Hide resolved
src/Magnum/GL/Implementation/ShaderState.h Outdated Show resolved Hide resolved
@mosra mosra changed the title WIP Async GL shader compilation Async GL shader compilation Sep 5, 2022
@mosra mosra mentioned this pull request Sep 7, 2022
14 tasks
@mosra
Copy link
Owner

mosra commented Sep 7, 2022

Okay, finally -- merged this as 96f97d4 and eb17d77, and wrote some introductory docs in 4580c30 and 48326ac.

What I only realized when testing locally was that the linker error message alone doesn't contain anything useful if the compilation failed -- somehow I falsely remembered that it contains also the compilation errors. So to provide the full context in case of a failure, as of c9d7365 the checkLink() now accepts also a list of the input shader instances, and it calls checkCompile() on those first. That felt like the easiest way to achieve this while keeping it as async as possible, compared to forcing the shader code to wait on / check for individual shader compilations. Finally, in eede671 I made a change to not need to include GL/Shader.h in all shader headers (which pulls in <vector> and other nasty STL stuff). I have to fix the root cause properly (#499), but until I find time to do that I didn't want to increase compile times for everybody.

But those are all just minor additions on top. Thanks again for this work, and sorry for the extreme delays! 👍

@mosra mosra closed this Sep 7, 2022
@mosra mosra mentioned this pull request Sep 8, 2022
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants