Assertion quasi-derivations #11955

Ericson2314 · 2024-11-25T19:06:28Z

A fixed-output derivation is nothing but a regular "floating" content-addressing derivation, along with an assertion that the output is in fact a certain content-address.

Other "output checks" can likewise be turned into assertions that the input satisfies the checks.

Rather than making these checks part of a derivation (and thus adding more knobs to what a derivation is), it would be conceptually cleaner to decompose them into special assertion nodes in the derivation dependency graph or, equivalently, special quasi-derivations.
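As a rough illustration of the decomposition described above, here is a minimal Python sketch. It is not Nix's data model: `FloatingCADerivation`, `HashAssertion`, `fixed_output`, and `check` are hypothetical names made up for this example, and SHA-256 over raw bytes stands in for a real content address.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class FloatingCADerivation:
    """Hypothetical stand-in for a floating content-addressing derivation."""
    name: str
    builder: str  # opaque build recipe; never executed in this sketch


@dataclass(frozen=True)
class HashAssertion:
    """Hypothetical assertion node: its input must have the expected content address."""
    subject: FloatingCADerivation
    expected_sha256: str


def content_address(data: bytes) -> str:
    # Assumption: a plain SHA-256 hex digest plays the role of the content address.
    return hashlib.sha256(data).hexdigest()


def fixed_output(name: str, builder: str, expected_sha256: str) -> HashAssertion:
    # A "fixed-output derivation" is modelled as a floating CA derivation
    # wrapped in an assertion about what its output hashes to.
    return HashAssertion(FloatingCADerivation(name, builder), expected_sha256)


def check(assertion: HashAssertion, realised_output: bytes) -> bool:
    # The assertion passes only if the realised output matches the expected hash.
    return content_address(realised_output) == assertion.expected_sha256
```

In this model the derivation itself carries no hash knob at all; the expectation lives entirely in the assertion node that wraps it.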

I had long thought this would be conceptually more elegant, but hadn't yet found a tangible, concrete use case where it would make things better from the user's perspective. In #11954, I believe I finally found one. The short version is that for CA derivations, it is impossible to simultaneously satisfy all of these:

Keep our current notion of unconditional immediate dependencies that must always be downloaded
Only do derivation substitutions (of placeholders today, or generalized versions of this one might imagine)
Support allowed/disallowed dependencies of things we might not end up having in our input runtime closure at all
Don't unnecessarily download stuff we don't actually need at build-time

Assertion quasi-derivations however provide a way out of that:

Dependencies can still be considered regular and unconditional (we have more types of nodes in our drv graph, but the dependency edge structure of each node is the same for all node types)
Rewriting nodes is still done exclusively for incoming immediate edges --- no "non-local" rewriting for transitive deps is ever required
Support arbitrary allowed/disallowed dependencies based on inputs
Since we can (and must) special-case these new node types, we simply don't download anything at all based on them; we just need the rewrites to yield the concrete store paths we need to scan for.

This nicely gets us all 4 desiderata, plus the separation of concerns (building vs linting), with minimal extra complexity.
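The four points above can be sketched roughly as follows. This is a toy model, not Nix's implementation: `Node`, `Plan`, `plan`, and `references_none` are hypothetical names, the `kind` tags are invented, and the store paths are made-up placeholders. Every node has the same `deps` shape, rewriting only touches immediate edges, and assertion nodes never trigger downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str          # "derivation" or "assertion" (hypothetical tags)
    deps: tuple = ()   # immediate dependencies; same shape for every kind


@dataclass(frozen=True)
class Plan:
    download: tuple    # store paths that must actually be present
    scan_for: tuple    # store paths needed only as strings to search for


def plan(node: Node, resolved: dict) -> Plan:
    # Local rewrite: only this node's immediate edges become concrete paths.
    paths = tuple(resolved[d] for d in node.deps)
    if node.kind == "assertion":
        # Special case: nothing is downloaded on behalf of an assertion node.
        return Plan(download=(), scan_for=paths)
    return Plan(download=paths, scan_for=())


def references_none(output: bytes, forbidden: tuple) -> bool:
    # A disallowed-reference check is just a scan for the forbidden path strings.
    return not any(p.encode() in output for p in forbidden)


# Made-up example: the build needs libfoo; the lint forbids references to gcc.
resolved = {"libfoo": "/nix/store/aaa-libfoo", "gcc": "/nix/store/bbb-gcc"}
build = Node("app", "derivation", ("libfoo",))
lint = Node("app-no-gcc", "assertion", ("gcc",))
```

Here `plan(build, resolved)` asks for libfoo to be downloaded, while `plan(lint, resolved)` only yields the gcc path to scan for, so gcc never has to be fetched just to be ruled out.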

Ericson2314 · 2024-11-25T19:31:44Z

@emilazy tells me about https://ninja-build.org/manual.html#validations, great prior art!

roberth · 2024-11-26T11:58:07Z

This would be a complement or alternative to #7662, which proposes to track validation using the string context, i.e. something handled above the store+build layer.

It would only be a complete alternative if it provides speculative building for increased concurrency, as mentioned in the issue, but also echoed in the ninja docs:

Marking the static analysis rule as an implicit input of the main build rule of the source files or of the rules that depend on the main build rule would slow down the critical path of the build, but using a validation would allow the build to proceed in parallel with the static analysis rule once the main build rule is complete.

A small amount of extra concurrency could be extracted by allowing these validation nodes to be evaluated after the main derivation and its dependents have been evaluated.

A combination of validation nodes and dynamic derivations may be of interest as well, as previously raised by amjoseph-nixpkgs:

Opt-in for running installCheckPhase on cross builds nixpkgs#273110 (comment)

(This would be independent of the aforementioned evaluation optimization, unless we perform Nixpkgs evaluation as part of a set of (generated) dynamic derivations, which would require making Nixpkgs available to the builder (i.e. the store).)

roberth · 2024-11-26T12:05:35Z

How exactly do these quasi-derivations fit into a build graph, and does that allow for speculative execution of dependent builds, i.e. assuming that the validation succeeds?
If the validation nodes are not between the dependency and dependent, this could still be compensated for by complicating the execution rules (i.e. the scheduling, or the graph-based "evaluation").

Ericson2314 · 2024-11-26T16:08:05Z

@roberth I initially wasn't thinking of fancier checks, or of speculative execution either. But it would also work for that.

(Indeed, doing a speculative build of an unverified impure floating CA derivation is a bit scary, but we can simply say there is no speculation if the input is impure.)

I would still have the validation nodes between the dependency and dependent, because otherwise we are scoping validations globally (which I don't think makes sense store-wide). But yes, the scheduler can simply "rewrite" the graph so that the validations don't block anything, while not declaring things finished until they pass.
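One way to picture that scheduling rule is the following Python sketch. It is only an illustration under assumed names (`Node`, `blocking_deps`, `can_start`, `is_finished`, and the `"build"`/`"validation"` tags are all hypothetical), and it ignores the impure-input case mentioned above. Validation nodes stay between dependency and dependent in the graph; the scheduler looks through them when deciding what may start, but still requires them to pass before reporting a target as finished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str          # "build" or "validation" (hypothetical tags)
    deps: tuple = ()


def blocking_deps(graph: dict, name: str) -> list:
    """Builds that must complete before `name` may start.

    Validation nodes are looked through to whatever they validate,
    so they never sit on the critical path.
    """
    out = []
    for d in graph[name].deps:
        if graph[d].kind == "validation":
            out.extend(blocking_deps(graph, d))
        else:
            out.append(d)
    return out


def closure(graph: dict, name: str, seen=None) -> set:
    # All nodes reachable from `name`, including `name` itself.
    seen = set() if seen is None else seen
    if name not in seen:
        seen.add(name)
        for d in graph[name].deps:
            closure(graph, d, seen)
    return seen


def can_start(graph: dict, name: str, built: set) -> bool:
    return all(d in built for d in blocking_deps(graph, name))


def is_finished(graph: dict, name: str, built: set, passed: set) -> bool:
    # Finished only once every build in the closure is built
    # and every validation in the closure has passed.
    for n in closure(graph, name):
        done = passed if graph[n].kind == "validation" else built
        if n not in done:
            return False
    return True


# Made-up example: app depends on lib through the validation lib-check.
graph = {
    "lib": Node("lib", "build"),
    "lib-check": Node("lib-check", "validation", ("lib",)),
    "app": Node("app", "build", ("lib-check",)),
}
```

With this graph, `app` may start as soon as `lib` is built, but it is not reported as finished until `lib-check` has passed.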

Ericson2314 added the ca-derivations Derivations with content addressed outputs label Nov 25, 2024

This was referenced Nov 25, 2024

Better allowed/disallowed references, sound, and not causing spurious deps #11954

Open

stdenv: don't discard string context from ContentAddressed derivations NixOS/nixpkgs#214044

Closed
