Convergence test and validation experiment Buildkite pipeline #1223

ali-ramadhan · 2020-11-27T12:21:38Z

This PR finally sets up a CI pipeline to run convergence tests and validation experiments for CPU and GPU on Buildkite.

I don't think this should run on every push like the main pipeline does, and I couldn't figure out how to trigger it via a GitHub comment (see CliMA/slurm-buildkite#13).

We can trigger this pipeline manually from Buildkite and I've scheduled it to run every night at 3am EST (on the master branch).

The cool thing is that it uploads the convergence plots as artifacts, so we can view them from Buildkite!

Resolves #1216

ali-ramadhan · 2020-11-27T12:54:23Z

It's quite concerning that some of the convergence tests do not pass on the GPU...

This is probably due to #1170, since the simpler convergence tests that do not rely on a pressure solver seem to pass. It also fits that the forced free-slip flow test passes when the domain is doubly periodic in (x, z) but not when it has a wall-bounded dimension in (x, y). Hmmm, but the Taylor-Green one is doubly periodic...

glwagner

This is great. Some minor comments:

test/test_convergence.jl is rather boiler-plate-y. Is there any way to use a list of convergence test scripts? This will make it easier to add new tests in the future...

Could make sense to raise an issue about converting the convergence test plots to Plots.jl or CairoMakie.jl or whatever when this PR is merged, so we don't have to wrestle PyPlot in the test scripts.
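A minimal sketch of the first suggestion, assuming test/test_convergence.jl just `include`s each convergence script and that a failing script throws an error. Only the two script names mentioned in this thread are listed; the directory layout, the constants `CONVERGENCE_TEST_DIR` and `CONVERGENCE_SCRIPTS`, and the function `run_convergence_scripts` are all hypothetical, not part of Oceananigans.

```julia
using Test

# Hypothetical location: assumes test/ and validation/ sit side by side in the repo root.
const CONVERGENCE_TEST_DIR = joinpath(@__DIR__, "..", "validation", "convergence_tests")

# Hypothetical list; only these two script names appear in the PR thread.
const CONVERGENCE_SCRIPTS = [
    "one_dimensional_cosine_advection_diffusion.jl",
    "one_dimensional_advection_schemes.jl",
]

"""
    run_convergence_scripts(scripts; dir, runner=include)

Run each script inside its own nested `@testset`, so an error thrown by one
script is recorded against that script and the remaining scripts still run.
"""
function run_convergence_scripts(scripts; dir=CONVERGENCE_TEST_DIR, runner=include)
    @testset "Convergence tests" begin
        for script in scripts
            @testset "$script" begin
                runner(joinpath(dir, script))
            end
        end
    end
end
```

Adding a new convergence test would then mean appending one file name to the list. The `runner` keyword exists only so the loop can be exercised without the real scripts, e.g. by passing a function that records the paths instead of including them.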

ali-ramadhan · 2021-01-27T15:43:27Z

> test/test_convergence.jl is rather boiler-plate-y. Is there any way to use a list of convergence test scripts? This will make it easier to add new tests in the future...

Yeah, it could be more automated. I'll give it a try once the tests are all passing.

> Could make sense to raise an issue about converting the convergence test plots to Plots.jl or CairoMakie.jl or whatever when this PR is merged, so we don't have to wrestle PyPlot in the test scripts.

I don't think it's worth spending much effort on switching plotting libraries (especially since the plots work now), but I'll use CairoMakie.jl for future plots. CairoMakie.jl doesn't support log axes yet (gotta hack it in for now), but hopefully they'll get added soon.

navidcy · 2021-01-28T08:48:32Z

Some tests fail...

ali-ramadhan · 2021-01-28T15:08:17Z

All tests should pass (see https://buildkite.com/clima/oceananigans-validation-experiments/builds/96), but Tartarus went down, so a lot of builds died, which led to the failing tests.

glwagner · 2021-01-28T17:21:28Z

validation/convergence_tests/one_dimensional_advection_schemes.jl

```diff
        xlabel(L"N_x")
        ylabel("\$L\$-norms of \$ | c_\\mathrm{sim} - c_\\mathrm{analytical} |\$")
        removespines("top", "right")
-        legend = legend(loc="upper right", bbox_to_anchor=(1.4, 1.0), prop=Dict(:size=>6))
+        lgd = legend(loc="upper right", bbox_to_anchor=(1.4, 1.0), prop=Dict(:size=>6))
```
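The rename matters because of Julia's scoping rules. Assuming the plotting code runs inside a function, assigning to `legend` anywhere in that function makes `legend` a local variable for the whole body, so the call on the right-hand side no longer refers to PyPlot's `legend` function. The sketch below reproduces this with a hypothetical stand-in `legend` that returns a string, so it runs without PyPlot.

```julia
# Hypothetical stand-in for PyPlot's `legend`: returns a description instead of drawing.
legend(; loc="best") = "legend at $loc"

function plot_with_shadowing()
    # `legend` is assigned in this function, so Julia treats it as a local variable
    # throughout the body; the right-hand side reads that unassigned local and
    # throws an UndefVarError instead of calling the global function.
    legend = legend(loc="upper right")
    return legend
end

function plot_without_shadowing()
    # A different variable name leaves the global `legend` function visible.
    lgd = legend(loc="upper right")
    return lgd
end
```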


ali-ramadhan · 2021-01-29T15:28:41Z

44 successful checks 🚀

I think this PR is ready to be merged. All the existing convergence tests are now in CI and pass, and the plots are uploaded as Buildkite artifacts. This PR might conflict with @francispoulin's PR #1276 (I'm happy to fix conflicts no matter which PR is merged first).

A few tests take a long time. Their run time can be shortened in a future PR, since that might involve some trial and error and fiddling with the rate-of-convergence tolerances. Other validation experiments should also be added.
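The rate of convergence (ROC) that these tolerances apply to is commonly taken as the slope of log(error) against log(N): for a scheme of formal order p, the error should scale like N^(-p). A minimal sketch, assuming each test records one error norm per resolution N and compares the fitted slope with the expected order using an absolute tolerance; `convergence_rate`, `passes_convergence_test`, the resolutions and the tolerance are hypothetical and not the ConvergenceTests API.

```julia
"""
    convergence_rate(Ns, errors)

Least-squares slope of `log(errors)` against `log(Ns)`. For a scheme of order `p`
the errors scale like `N^(-p)`, so the slope should be close to `-p`.
"""
function convergence_rate(Ns, errors)
    x = log.(Ns)
    y = log.(errors)
    x̄ = sum(x) / length(x)
    ȳ = sum(y) / length(y)
    return sum((x .- x̄) .* (y .- ȳ)) / sum((x .- x̄) .^ 2)
end

# Hypothetical check: the fitted slope must lie within `atol` of `-order`.
passes_convergence_test(Ns, errors; order, atol) =
    isapprox(convergence_rate(Ns, errors), -order; atol=atol)

Ns = [16, 32, 64, 128]
errors = 3.0 ./ Ns .^ 2  # synthetic errors from an exactly second-order scheme
```

Running fewer or coarser resolutions shortens a test but can pull the fitted slope away from the formal order, which is presumably why shortening the slow tests may mean adjusting `atol`.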

Right now I have set the validation pipeline to only run on master (and every night at 3am ET).

francispoulin · 2021-01-29T15:32:39Z

Please go ahead and don't wait for me. I will work around whatever you have done here in my PR.

ali-ramadhan added 27 commits November 26, 2020 18:55

Start a validation pipeline

757ff2f

Fix typo in pipeline.yml

386544d

Turn ConvergenceTests into a true package

63f46a4

Split up convergence tests into separate groups

07c71d5

Fix convergence test

671cc50

Convergence plots as artifacts

ec7ae60

PyPlot plz work

d049444

Escape double quotes

ee8b232

Fix --project

164a8ac

Don't use system Python

ce9e61d

Run point decay convergence test on CPU and GPU

78dc2ef

Last resort

81d37a7

Last last resort

aa2f5d1

Fix artifact paths

7b8fc06

Fix artifact path again

a91dd68

Include a couple more convergence tests

b5583eb

Fixup and 2D diffusion

ecba2d4

Add the rest of the convergence tests

fcebff2

Fixes

51dd2e7

Clean new validation pipeline

89d7e64

More minute fixes

35c5f25

Even more fixes

23609cc

Final fixes for the day

75406c3

Overnight fixes

fe9551c

Try to get validation working on Tartarus

f2383be

Ensure each pipeline uses separate depots

a5ec05e

Remove bad call to CUDA.versioninfo()

ba1cc4c

ali-ramadhan requested a review from glwagner November 27, 2020 12:21

Never ending fixes on this pipeline

3669b05

ali-ramadhan added 6 commits November 29, 2020 17:50

Revive cosine convergence test and fix it for GPU

4452879

Cosine convergence test without scalar ops

cf6861c

Revive Gaussian advection-diffusion convergence test

d722abe

basex is being deprecated

adbe6cb

Build PyCall and add PyPlot in init GPU job

6ae43c8

Fix typo in validation-pipeline.yml

1eb1165

ali-ramadhan mentioned this pull request Dec 1, 2020

Generalize RungeKutta3 and QuasiAdamsBashforth2 for any model #1210

Merged

ali-ramadhan and others added 2 commits December 2, 2020 11:44

Try using tkagg matplotlib backend

ddc754b

merge master

4c3e8f9

glwagner approved these changes Dec 2, 2020

Merge branch 'master' into ali/validation-pipeline

33f5a66

ali-ramadhan added 7 commits January 27, 2021 11:29

Fix Taylor-Green ROC atol

152e053

New way of setting basex=2 and basey=10

87252f9

Update more ROC atol bounds

4f12b43

Fix and update 1D advection convergence test

434bd3e

Add advection schemes convergence test to pipeline

d63f62b

Fix plot and add title

0c47bc5

Update Manifest.toml

7dc7434

ali-ramadhan marked this pull request as draft January 28, 2021 16:16

ali-ramadhan marked this pull request as ready for review January 28, 2021 16:16

Trigger Buildkite

6e85232

glwagner reviewed Jan 28, 2021

ali-ramadhan merged commit d5171c3 into master Jan 29, 2021

ali-ramadhan deleted the ali/validation-pipeline branch January 29, 2021 15:37

ali-ramadhan mentioned this pull request Feb 3, 2021

one_dimensional_cosine_advection_diffusion.jl fails #1273

Closed
