Generalize RungeKutta3 and QuasiAdamsBashforth2 for any model #1210

Merged: 14 commits, Dec 1, 2020

francispoulin
Collaborator

This generalizes the runge_kutta3.jl script to work for any model. It also combines the time stepping for the fields (e.g. velocities) and the tracers into one loop. Thanks to @glwagner and @ali-ramadhan !

I do get the following warnings that I would like to eliminate. Any suggestions on how I can do that?

WARNING: ignoring conflicting import of Models.fields into IncompressibleModels
WARNING: using Models.fields in module Oceananigans conflicts with an existing identifier.

@francispoulin
Collaborator Author

Given that some checks were not successful, I now see that there are some problems with this PR. I glanced at the errors but I don't pretend to understand what's going on.

@ali-ramadhan
Member

Ah I think the method conflict warnings were because function fields end was being defined in src/Oceananigans.jl and in src/Models/Models.jl. I think you meant to only define it in src/Oceananigans.jl since it's needed in src/TimeSteppers.jl (which is defined before src/Models/Models.jl).

By the time I figured this out I already modified quite a bit so I just pushed my changes, but warnings should be gone now!

@ali-ramadhan
Member

I think the tests almost pass! It's just a few tests in test_abstract_operations.jl that don't: https://buildkite.com/clima/oceananigans/builds/670#3ec51acd-7496-449e-afb6-ace178715cf3/14-394

Maybe they're just missing a using Oceananigans: fields or maybe fields needs to be exported from the Oceananigans and/or Models modules?

@francispoulin francispoulin changed the title Fjp/generalize runge kutta 3 Generalize runge kutta 3 for any model Nov 25, 2020
@francispoulin
Collaborator Author

Thanks @ali-ramadhan for finding the problem and fixing it.

I will pull the updated code and try the test that you pointed out. I'm not sure I will be able to figure it out, but I will try.

@francispoulin francispoulin changed the title Generalize runge kutta 3 for any model Generalize unge kutta 3 and QuasiAdamsBashforth2 for any model Nov 25, 2020
@francispoulin francispoulin changed the title Generalize unge kutta 3 and QuasiAdamsBashforth2 for any model Generalize RungeKutta3 and QuasiAdamsBashforth2 for any model Nov 25, 2020
Collaborator Author

@francispoulin francispoulin left a comment

I think this is a pretty straightforward change. One benefit is that we don't have to double up the time stepping for tracers, as they are stepped at the same time as the other fields, and it allows time stepping for virtually any model.

I am happy that the tests all seem to pass and that there are no more warnings, thanks to @ali-ramadhan.
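
The idea of stepping every prognostic field in one loop can be sketched outside of Oceananigans. The following is a minimal Python illustration, not the Julia code from this PR: `rk3_step`, the `fields` dictionary, the `tendency` callback, and the toy `decay` tendency are all hypothetical names, and the coefficients are those of a low-storage third-order Runge-Kutta scheme commonly attributed to Le and Moin (1991), assumed here for illustration.

```python
# Hypothetical sketch: the names below are illustrative, not Oceananigans API.
# Assumed coefficients of a low-storage third-order Runge-Kutta scheme.
GAMMA = (8 / 15, 5 / 12, 3 / 4)
ZETA = (0.0, -17 / 60, -5 / 12)


def rk3_step(fields, tendency, dt):
    """Advance every entry of `fields` in place by one time step of size dt."""
    previous = {name: 0.0 for name in fields}
    for gamma, zeta in zip(GAMMA, ZETA):
        current = tendency(fields)  # tendencies evaluated at the current substep
        for name in fields:  # one loop covers velocities and tracers alike
            fields[name] += dt * (gamma * current[name] + zeta * previous[name])
        previous = current
    return fields


def decay(fields):
    """Toy tendency: every field decays at unit rate, d(field)/dt = -field."""
    return {name: -value for name, value in fields.items()}


# "u" and "v" stand in for velocities, "T" for a tracer.
state = {"u": 1.0, "v": 2.0, "T": 3.0}
for _ in range(10):
    rk3_step(state, decay, dt=0.1)
```

Because the update only iterates over whatever names are in `fields`, adding a tracer or swapping in a different set of prognostic variables needs no change to the stepping routine, which is the property the comment above describes.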

Member

@ali-ramadhan ali-ramadhan left a comment

Looks great! Less code, more features too.

I think there are some conflicts with the master branch so you might have to git merge master into this branch and resolve the conflicts locally and push the changes before the PR can be merged.

Sometimes the conflicts are pretty minor in which case you could resolve them directly on GitHub. But in this case GitHub says they're too complex.

@francispoulin
Collaborator Author

> Looks great! Less code, more features too.
>
> I think there are some conflicts with the master branch so you might have to git merge master into this branch and resolve the conflicts locally and push the changes before the PR can be merged.
>
> Sometimes the conflicts are pretty minor in which case you could resolve them directly on GitHub. But in this case GitHub says they're too complex.

Sounds good. Luckily those are files that I created. I was planning on fixing them up tomorrow anyhow; now there's just added incentive.

@glwagner
Member

Benchmark? The reason we combined the updates for velocities was a perceived performance gain. Probably we were wrong about that, but it'd be good to show it.

@francispoulin
Collaborator Author

> Benchmark? The reason we combined the updates for velocities was a perceived performance gain. Probably we were wrong about that, but it'd be good to show it.

Do I understand that to mean that you want to test the performance of the code before and after the PR? If there are tests that I can run to do this with both versions, I would be happy to try that on my desktop, only CPU, but that probably wouldn't be as nice as trying it on a better computer.

@ali-ramadhan
Member

So there's a script (https://github.com/CliMA/Oceananigans.jl/blob/master/benchmark/benchmark_regression.jl) that benchmarks the current branch against the master branch, but it doesn't currently print the results, so I'll fix it and run it on this branch on Tartarus.

@glwagner
Member

> Do I understand that to mean that you want to test the performance of the code before and after the PR? If there are tests that I can run to do this with both versions, I would be happy to try that on my desktop, only CPU, but that probably wouldn't be as nice as trying it on a better computer.

We would like to test the performance of the version of Oceananigans on this PR versus Oceananigans#master on the CPU and GPU and for a variety of problem sizes. Maintaining good performance is a top priority of ours. Generally speaking, we would like to avoid performance regressions, even small ones, which could accumulate over many PRs and become significant.

@francispoulin
Collaborator Author

francispoulin commented Nov 26, 2020

New branch fjp/generalize-runge-kutta-3:

I did 2 trials to try and get an idea of the variance we can expect. Sorry if this is too much information.

Trial 1:

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.770 ms │   3.925 ms │   3.975 ms │   4.535 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  24.751 ms │  24.945 ms │  25.124 ms │  26.909 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 218.012 ms │ 218.721 ms │ 219.037 ms │ 220.987 ms │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   4.253 ms │   4.437 ms │   4.509 ms │   5.229 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  29.137 ms │  29.446 ms │  29.689 ms │  31.794 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 257.251 ms │ 258.619 ms │ 259.852 ms │ 270.451 ms │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.489 ms │   2.591 ms │   2.755 ms │   3.150 ms │ 814.41 KiB │  11740 │
│           GPU │     Float32 │  64 │  10.374 ms │  13.950 ms │  13.590 ms │  14.010 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 128 │  88.020 ms │ 125.190 ms │ 122.408 ms │ 133.906 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   5.323 ms │   5.438 ms │   5.431 ms │   5.573 ms │ 892.33 KiB │  11574 │
│           GPU │     Float64 │  64 │  34.741 ms │  43.748 ms │  42.586 ms │  44.978 ms │ 892.30 KiB │  11580 │
│           GPU │     Float64 │ 128 │ 279.110 ms │ 333.392 ms │ 328.209 ms │ 335.085 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 16:03:50.829] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.51499 │ 3.28804 │ 6.12735 │
│     Float32 │  64 │  1.78816 │ 3.28791 │ 6.13048 │
│     Float32 │ 128 │   1.7471 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │  0.81598 │ 2.97644 │ 6.04071 │
│     Float64 │  64 │ 0.673082 │ 2.97634 │ 6.04384 │
│     Float64 │ 128 │ 0.775721 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴──────────┴─────────┴─────────┘

Trial 2:

                                       Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.658 ms │   3.795 ms │   3.878 ms │   4.654 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  25.066 ms │  25.346 ms │  25.454 ms │  26.454 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 212.990 ms │ 213.482 ms │ 215.127 ms │ 224.655 ms │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   4.111 ms │   4.206 ms │   4.303 ms │   5.146 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  31.135 ms │  32.233 ms │  32.367 ms │  33.781 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 252.694 ms │ 262.043 ms │ 263.575 ms │ 275.524 ms │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.458 ms │   2.506 ms │   2.572 ms │   3.091 ms │ 814.19 KiB │  11726 │
│           GPU │     Float32 │  64 │  12.548 ms │  15.949 ms │  15.599 ms │  16.169 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 128 │  79.348 ms │ 117.337 ms │ 114.294 ms │ 121.564 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   5.351 ms │   5.401 ms │   5.461 ms │   5.947 ms │ 893.42 KiB │  11644 │
│           GPU │     Float64 │  64 │  37.831 ms │  38.582 ms │  38.586 ms │  39.330 ms │ 892.30 KiB │  11580 │
│           GPU │     Float64 │ 128 │ 265.749 ms │ 335.525 ms │ 326.271 ms │ 353.112 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 18:37:36.115] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.51403 │ 3.28716 │ 6.12004 │
│     Float32 │  64 │  1.58924 │ 3.28791 │ 6.13048 │
│     Float32 │ 128 │  1.81939 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │ 0.778644 │ 2.98009 │ 6.07724 │
│     Float64 │  64 │ 0.835425 │ 2.97634 │ 6.04384 │
│     Float64 │ 128 │ 0.780995 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴──────────┴─────────┴─────────┘

Old branch master:

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.731 ms │   4.014 ms │   4.048 ms │   4.752 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │  64 │  25.071 ms │  25.897 ms │  26.004 ms │  27.032 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 128 │ 214.549 ms │ 216.681 ms │ 218.408 ms │ 227.438 ms │ 242.42 KiB │   1876 │
│           CPU │     Float64 │  32 │   4.230 ms │   4.334 ms │   4.430 ms │   5.244 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │  64 │  28.847 ms │  29.348 ms │  29.573 ms │  30.704 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 128 │ 254.216 ms │ 254.715 ms │ 255.230 ms │ 260.031 ms │ 293.44 KiB │   1876 │
│           GPU │     Float32 │  32 │   2.474 ms │   2.625 ms │   2.764 ms │   3.510 ms │ 802.67 KiB │  11417 │
│           GPU │     Float32 │  64 │  10.381 ms │  13.617 ms │  13.292 ms │  13.719 ms │ 802.48 KiB │  11413 │
│           GPU │     Float32 │ 128 │  76.589 ms │ 114.593 ms │ 113.372 ms │ 132.651 ms │ 802.48 KiB │  11413 │
│           GPU │     Float64 │  32 │   5.366 ms │   5.420 ms │   5.439 ms │   5.610 ms │ 877.02 KiB │  11251 │
│           GPU │     Float64 │  64 │  33.735 ms │  38.491 ms │  38.027 ms │  38.614 ms │ 876.83 KiB │  11247 │
│           GPU │     Float64 │ 128 │ 293.481 ms │ 316.512 ms │ 316.715 ms │ 343.279 ms │ 876.83 KiB │  11247 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 16:09:31.583] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.52907 │ 3.31105 │ 6.08582 │
│     Float32 │  64 │  1.90176 │ 3.31028 │ 6.08369 │
│     Float32 │ 128 │  1.89087 │ 3.31028 │ 6.08369 │
│     Float64 │  32 │ 0.799624 │ 2.98876 │ 5.99733 │
│     Float64 │  64 │  0.76246 │ 2.98813 │  5.9952 │
│     Float64 │ 128 │ 0.804754 │ 2.98813 │  5.9952 │
└─────────────┴─────┴──────────┴─────────┴─────────┘

For this single test (clearly more needs to be done), it seems that on average the CPU -> GPU speedup is slightly lower on the new branch than on master, and the memory ratio in the same table is also slightly lower.
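
One way to read the tables side by side is to compute the relative change in median time per case. The sketch below is a hypothetical Python helper, not part of the Oceananigans benchmark scripts; the numbers are the N = 128 medians copied from the master table and from Trial 1 of the new branch above, and `percent_change` is an illustrative name.

```python
# Median times in milliseconds, copied from the tables above.
# Keys are (architecture, float type, N); only the N = 128 rows are used here.
MASTER_MEDIAN_MS = {
    ("CPU", "Float32", 128): 216.681,
    ("CPU", "Float64", 128): 254.715,
    ("GPU", "Float32", 128): 114.593,
    ("GPU", "Float64", 128): 316.512,
}
BRANCH_MEDIAN_MS = {  # new branch, Trial 1
    ("CPU", "Float32", 128): 218.721,
    ("CPU", "Float64", 128): 258.619,
    ("GPU", "Float32", 128): 125.190,
    ("GPU", "Float64", 128): 333.392,
}


def percent_change(baseline, candidate):
    """Relative change of candidate with respect to baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


changes = {
    case: percent_change(MASTER_MEDIAN_MS[case], BRANCH_MEDIAN_MS[case])
    for case in MASTER_MEDIAN_MS
}

for (arch, float_type, n), change in changes.items():
    print(f"{arch} {float_type} N={n}: {change:+.2f}%")
```

A positive value means the new branch took longer than master for that case. Since the two trials of the new branch already differ from each other by a few percent in several rows, changes of that size from a single run should be read with caution.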

@francispoulin
Collaborator Author

francispoulin commented Nov 26, 2020

New branch fjp/generalize-runge-kutta-3:

                              Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬─────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │  allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼─────────┤
│  512 │       1 │ 23.857 s │ 23.857 s │ 23.857 s │ 23.857 s │ 300.64 KiB │    1970 │
│  512 │       2 │ 19.049 s │ 19.049 s │ 19.049 s │ 19.049 s │ 127.11 MiB │ 8291846 │
│  512 │       4 │  9.636 s │  9.636 s │  9.636 s │  9.636 s │  59.15 MiB │ 3839470 │
│  512 │       8 │  5.231 s │  5.231 s │  5.231 s │  5.231 s │  25.81 MiB │ 1644097 │
│  512 │      16 │  3.759 s │  3.786 s │  3.786 s │  3.814 s │ 119.15 MiB │ 1939597 │
│  512 │      32 │  3.517 s │  3.521 s │  3.521 s │  3.524 s │  11.72 MiB │  648129 │
│  512 │      36 │  3.532 s │  3.533 s │  3.533 s │  3.534 s │  11.09 MiB │  566273 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴─────────┘
[2020/11/26 18:22:08.583] INFO  Writing Multithreading_benchmarks.html...
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │  1.2524 │ 432.949 │ 4209.06 │
│  512 │       4 │ 2.47574 │ 201.484 │ 1948.97 │
│  512 │       8 │ 4.56069 │  87.897 │ 834.567 │
│  512 │      16 │ 6.30058 │ 405.837 │ 984.567 │
│  512 │      32 │ 6.77607 │ 39.9205 │ 328.999 │
│  512 │      36 │ 6.75252 │ 37.7587 │ 287.448 │
└──────┴─────────┴─────────┴─────────┴─────────┘

Old branch master:

                             Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬──────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │   allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼──────────┤
│  512 │       1 │ 24.654 s │ 24.654 s │ 24.654 s │ 24.654 s │ 294.28 KiB │     1930 │
│  512 │       2 │ 21.380 s │ 21.380 s │ 21.380 s │ 21.380 s │ 172.25 MiB │ 11250274 │
│  512 │       4 │  9.585 s │  9.585 s │  9.585 s │  9.585 s │  63.01 MiB │  4093201 │
│  512 │       8 │  5.417 s │  5.417 s │  5.417 s │  5.417 s │  34.91 MiB │  2242285 │
│  512 │      16 │  3.989 s │  3.991 s │  3.991 s │  3.993 s │ 123.02 MiB │  2196707 │
│  512 │      32 │  3.655 s │  3.676 s │  3.676 s │  3.698 s │  11.86 MiB │   663272 │
│  512 │      36 │  3.783 s │  3.794 s │  3.794 s │  3.804 s │ 115.48 MiB │  1646037 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴──────────┘
[2020/11/26 16:58:13.592] INFO  Writing Multithreading_benchmarks.html...
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │ 1.15317 │ 599.384 │ 5829.16 │
│  512 │       4 │ 2.57214 │ 219.267 │ 2120.83 │
│  512 │       8 │ 4.55103 │ 121.466 │ 1161.81 │
│  512 │      16 │ 6.17726 │ 428.071 │ 1138.19 │
│  512 │      32 │ 6.70601 │ 41.2683 │ 343.664 │
│  512 │      36 │ 6.49899 │ 401.838 │ 852.869 │
└──────┴─────────┴─────────┴─────────┴─────────┘

Seems comparable to me but two observations:

  1. New branch typically uses less memory
  2. Mean of the new branch tends to be faster.
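
For reference, the speedup column in the multithreading tables appears to be the single-thread median divided by the median at each thread count. The Python sketch below reproduces that ratio for the new branch and adds a parallel efficiency (speedup divided by thread count); the function names are hypothetical, the efficiency column is not part of the original benchmark output, and the data are the median times copied from the table above.

```python
# Median wall-clock times in seconds for the new branch (size 512),
# copied from the multithreading table above. Keys are thread counts.
MEDIAN_TIME_S = {
    1: 23.857,
    2: 19.049,
    4: 9.636,
    8: 5.231,
    16: 3.786,
    32: 3.521,
    36: 3.533,
}


def speedup(times, threads):
    """Single-thread time divided by the time measured with `threads` threads."""
    return times[1] / times[threads]


def efficiency(times, threads):
    """Speedup per thread; 1.0 would be ideal linear scaling."""
    return speedup(times, threads) / threads


for threads in MEDIAN_TIME_S:
    s = speedup(MEDIAN_TIME_S, threads)
    e = efficiency(MEDIAN_TIME_S, threads)
    print(f"{threads:2d} threads: speedup {s:.3f}, efficiency {e:.3f}")
```

The same two functions can be applied to the master table to compare how both branches scale; in both, the speedup levels off beyond 16 threads, so the efficiency is well below 1 at high thread counts.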

@ali-ramadhan
Member

Thanks for running the benchmarks!

benchmark_incompressible_model.jl only times 10 time steps so the statistics probably aren't super robust but it does seem that the incompressible model has slowed down a bit in all cases...

@ali-ramadhan
Member

I ran the benchmark_incompressible_model.jl script on the master branch (twice) and this branch (also twice), and actually see a tiny bit of a speedup, though maybe only significant for larger CPU models.

It's hard to say whether it's noise; it might be due more to other processes causing small variations in runtime.

I don't think this PR slows down or speeds up the code, but it simplifies and improves the time stepping code, so it should be merged.

There are a few more memory allocations now (due to extra kernel launches), but this shouldn't affect performance.

System info

Oceananigans v0.44.1
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V

Master branch

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   5.399 ms │   5.668 ms │   5.758 ms │   7.186 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │  64 │  36.710 ms │  37.583 ms │  37.974 ms │  41.678 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 128 │ 312.780 ms │ 313.477 ms │ 313.622 ms │ 314.726 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 256 │    2.802 s │    2.819 s │    2.819 s │    2.836 s │ 242.42 KiB │   1876 │
│           CPU │     Float64 │  32 │   5.828 ms │   6.049 ms │   6.157 ms │   7.044 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │  64 │  43.084 ms │  43.619 ms │  43.650 ms │  44.363 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 128 │ 365.051 ms │ 365.317 ms │ 365.475 ms │ 366.288 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 256 │    3.602 s │    3.653 s │    3.653 s │    3.703 s │ 293.44 KiB │   1876 │
│           GPU │     Float32 │  32 │   2.797 ms │   2.870 ms │   2.918 ms │   3.435 ms │ 802.70 KiB │  11419 │
│           GPU │     Float32 │  64 │   3.120 ms │   3.207 ms │   3.300 ms │   4.224 ms │ 802.52 KiB │  11415 │
│           GPU │     Float32 │ 128 │   4.019 ms │   4.066 ms │   4.192 ms │   5.244 ms │ 802.52 KiB │  11415 │
│           GPU │     Float32 │ 256 │  15.942 ms │  23.497 ms │  22.763 ms │  23.588 ms │ 802.48 KiB │  11413 │
│           GPU │     Float64 │  32 │   3.079 ms │   3.166 ms │   3.226 ms │   3.728 ms │ 877.05 KiB │  11253 │
│           GPU │     Float64 │  64 │   3.458 ms │   3.522 ms │   3.591 ms │   3.981 ms │ 876.86 KiB │  11249 │
│           GPU │     Float64 │ 128 │   4.536 ms │   4.572 ms │   4.723 ms │   6.000 ms │ 876.58 KiB │  11231 │
│           GPU │     Float64 │ 256 │  21.794 ms │  32.107 ms │  31.073 ms │  32.198 ms │ 876.83 KiB │  11247 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬─────────┬─────────┬─────────┐
│ Float_types │  Ns │ speedup │  memory │  allocs │
├─────────────┼─────┼─────────┼─────────┼─────────┤
│     Float32 │  32 │  1.9749 │ 3.31118 │ 6.08689 │
│     Float32 │  64 │   11.72 │ 3.31041 │ 6.08475 │
│     Float32 │ 128 │ 77.0954 │ 3.31041 │ 6.08475 │
│     Float32 │ 256 │  119.98 │ 3.31028 │ 6.08369 │
│     Float64 │  32 │ 1.91043 │ 2.98887 │  5.9984 │
│     Float64 │  64 │ 12.3861 │ 2.98823 │ 5.99627 │
│     Float64 │ 128 │ 79.9049 │ 2.98727 │ 5.98667 │
│     Float64 │ 256 │ 113.772 │ 2.98813 │  5.9952 │
└─────────────┴─────┴─────────┴─────────┴─────────┘

This branch

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   5.444 ms │   5.636 ms │   6.018 ms │   9.357 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  36.689 ms │  37.147 ms │  37.348 ms │  38.446 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 314.926 ms │ 316.673 ms │ 318.545 ms │ 338.621 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 256 │    2.778 s │    2.781 s │    2.781 s │    2.783 s │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   5.735 ms │   6.063 ms │   6.136 ms │   7.018 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  43.243 ms │  43.446 ms │  43.607 ms │  44.871 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 366.596 ms │ 367.479 ms │ 367.682 ms │ 369.125 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 256 │    3.281 s │    3.331 s │    3.331 s │    3.381 s │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.888 ms │   2.939 ms │   2.994 ms │   3.485 ms │ 814.47 KiB │  11744 │
│           GPU │     Float32 │  64 │   3.148 ms │   3.224 ms │   3.293 ms │   3.913 ms │ 814.28 KiB │  11740 │
│           GPU │     Float32 │ 128 │   4.002 ms │   4.089 ms │   4.210 ms │   5.287 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 256 │  16.015 ms │  23.712 ms │  22.928 ms │  23.994 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   3.159 ms │   3.190 ms │   3.249 ms │   3.757 ms │ 892.39 KiB │  11578 │
│           GPU │     Float64 │  64 │   3.472 ms │   3.534 ms │   3.640 ms │   4.632 ms │ 892.20 KiB │  11574 │
│           GPU │     Float64 │ 128 │   4.479 ms │   4.537 ms │   4.700 ms │   6.206 ms │ 891.98 KiB │  11560 │
│           GPU │     Float64 │ 256 │  21.481 ms │  31.610 ms │  30.599 ms │  31.686 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬─────────┬─────────┬─────────┐
│ Float_types │  Ns │ speedup │  memory │  allocs │
├─────────────┼─────┼─────────┼─────────┼─────────┤
│     Float32 │  32 │ 1.91728 │ 3.28829 │ 6.12944 │
│     Float32 │  64 │ 11.5212 │ 3.28753 │ 6.12735 │
│     Float32 │ 128 │ 77.4361 │ 3.28791 │ 6.13048 │
│     Float32 │ 256 │ 117.271 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │ 1.90073 │ 2.97665 │  6.0428 │
│     Float64 │  64 │ 12.2925 │ 2.97603 │ 6.04071 │
│     Float64 │ 128 │    81.0 │  2.9753 │  6.0334 │
│     Float64 │ 256 │ 105.386 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴─────────┴─────────┴─────────┘

@glwagner
Member

glwagner commented Dec 1, 2020

I think a slowdown for small models and a speedup for large models makes sense, given that this PR splits one relatively large kernel into three smaller ones (two times). Seems like an acceptable trade-off to me (and also nearly unnoticeable). Why are the validation experiments failing?

@ali-ramadhan
Member

ali-ramadhan commented Dec 1, 2020

Ah we can ignore the validation experiments pipeline failure.

It's failing because .buildkite/validation-pipeline.yml is not on this branch. I've since disabled GitHub triggers for the validation pipeline. It's now triggered every night at 3 am EST and can also be triggered manually, but it needs more work in PR #1223.

@francispoulin
Collaborator Author

@ali-ramadhan Any idea why there are two failures above? Should we fix these before we merge? If it's a bit tricky, I'm happy to leave the merging up to you, as you are a co-author on this PR. Happy to chat if that would help.

@ali-ramadhan
Member

@francispoulin Ah those two failures are unrelated to this PR (it's the validation experiments pipeline failure I mentioned above) so this should be good to merge if you're happy with the PR.

@francispoulin
Collaborator Author

Thank you @ali-ramadhan ! That is what I thought but wanted to make sure.

I will now go and press the big red button, at long last.

@francispoulin francispoulin merged commit 1b2efe5 into master Dec 1, 2020
@francispoulin francispoulin deleted the fjp/generalize-runge-kutta-3 branch December 1, 2020 19:49