V2 #1300
Open
yvettep321 wants to merge 338 commits into google:v2 from yvettep321:v2
Conversation
My knowledge of Python is not great, so this is somewhat rough. Two problems:

1. If there were repetitions, for the RHS (i.e. the new value) we were always using the first repetition, which naturally results in incorrect change reports for the second and following repetitions. And what is even worse, that completely broke the U test. :(
2. Better support for differing repetition counts in the U test was missing. It's important if we are to be able to report 'iteration as repetition', since it is rather likely that the iteration counts will mismatch.

Now, the rough idea of how this is implemented (I think this is the right solution; a minimal grouping sketch follows this message):

1. Get all benchmark names (in order) from the LHS benchmark.
2. While preserving the order, keep the unique names.
3. Get all benchmark names (in order) from the RHS benchmark.
4. While preserving the order, keep the unique names.
5. Intersect `2.` and `4.` to get the list of unique benchmark names that exist on both sides.
6. Now, group (partition) all the benchmarks with the same name:

```
BM_FOO:
  [lhs]: BM_FOO/repetition0 BM_FOO/repetition1
  [rhs]: BM_FOO/repetition0 BM_FOO/repetition1 BM_FOO/repetition2
...
```

   We also drop mismatches in `time_unit` here. _(Whose bright idea was it to store arbitrarily scaled timers in JSON **?!**)_
7. Iterate over each partition:
   7.1. Conditionally, diff the overlapping repetitions (the repetition counts may differ).
   7.2. Conditionally, do the U test:
      7.2.1. Get **all** the values of the `"real_time"` field from the LHS benchmark.
      7.2.2. Get **all** the values of the `"cpu_time"` field from the LHS benchmark.
      7.2.3. Get **all** the values of the `"real_time"` field from the RHS benchmark.
      7.2.4. Get **all** the values of the `"cpu_time"` field from the RHS benchmark.
      NOTE: the repetition counts may differ, but we want *all* the values!
      7.2.5. Do the rest of the U test computation.
      7.2.6. Print the U test result.
8. ???
9. **PROFIT**!

Fixes google#677
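The tool itself is Python (compare.py), but a small standalone C++ sketch can illustrate steps 1–6: keep the ordered unique names from each side, intersect them, and partition all entries by name, collecting every value regardless of repetition count. All identifiers below are hypothetical.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

// One parsed JSON entry: a benchmark name plus one measured value.
struct Entry {
  std::string name;
  double real_time;
};

// Keep the first occurrence of each name, preserving input order.
std::vector<std::string> OrderedUniqueNames(const std::vector<Entry>& entries) {
  std::vector<std::string> names;
  std::set<std::string> seen;
  for (const Entry& e : entries)
    if (seen.insert(e.name).second) names.push_back(e.name);
  return names;
}

int main() {
  std::vector<Entry> lhs = {{"BM_FOO", 1.0}, {"BM_FOO", 1.1}, {"BM_BAR", 2.0}};
  std::vector<Entry> rhs = {{"BM_FOO", 0.9}, {"BM_FOO", 1.0}, {"BM_FOO", 1.2}};

  // Names present on both sides, in LHS order.
  std::set<std::string> rhs_names;
  for (const std::string& n : OrderedUniqueNames(rhs)) rhs_names.insert(n);
  std::vector<std::string> common;
  for (const std::string& n : OrderedUniqueNames(lhs))
    if (rhs_names.count(n)) common.push_back(n);

  // Partition: for each common name, collect *all* values from each side;
  // the repetition counts may differ, and that is fine for the U test.
  for (const std::string& name : common) {
    std::vector<double> lhs_vals, rhs_vals;
    for (const Entry& e : lhs) if (e.name == name) lhs_vals.push_back(e.real_time);
    for (const Entry& e : rhs) if (e.name == name) rhs_vals.push_back(e.real_time);
    std::cout << name << ": " << lhs_vals.size() << " lhs repetitions vs "
              << rhs_vals.size() << " rhs repetitions\n";
  }
}
```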
…master branch. (google#691) See google/googletest#1815. Fixes google#689.
The State constructor should not be part of the public API. Adding a utility method to BenchmarkInstance allows us to avoid leaking the RunInThread method into the public API.
Ok, so, I'm still trying to get to the state where it will be a trivial change to report all the separate iterations. The old code (LHS of the diff) was rather convoluted, I'd say. I have tried to refactor it a bit into *small* logical chunks, with proper comments. As far as I can tell, I preserved the intent of the code, what it was doing before. The road forward still isn't clear, but I'm quite sure it's not with the old code :)
For several versions now, CMake by default refers to macOS’ Clang as AppleClang instead of just Clang, which would fail STREQUAL. Fixed by changing it to MATCHES.
As previously written, "--benchmark_color=auto" was treated as true, because IsTruthyFlagValue("auto") returned true. The fix is to rely on the IsColorTerminal test only if the flag value is "auto", and fall back to IsTruthyFlagValue otherwise. I also integrated the force_no_color check into the same block.
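A sketch of the corrected ordering, assuming helpers equivalent to IsColorTerminal() and IsTruthyFlagValue() (stubbed here so the sketch stands alone; this is illustrative, not the reporter's exact code):

```cpp
#include <cstdio>
#include <string>

// Stand-ins for the library helpers referenced above.
bool IsColorTerminal() { return true; }  // e.g. isatty() plus a TERM check
bool IsTruthyFlagValue(const std::string& v) {
  return v == "true" || v == "yes" || v == "1";
}

bool ShouldUseColor(const std::string& color_flag, bool force_no_color) {
  if (force_no_color) return false;
  // Only consult the terminal when the flag is "auto"; a literal "auto"
  // must never be treated as a truthy value.
  if (color_flag == "auto") return IsColorTerminal();
  return IsTruthyFlagValue(color_flag);
}

int main() {
  std::printf("auto  -> %d\n", ShouldUseColor("auto", /*force_no_color=*/false));
  std::printf("false -> %d\n", ShouldUseColor("false", /*force_no_color=*/false));
}
```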
* Fix SOURCE_DIR in HandleGTest.cmake If benchmark is added as a CMake subproject, HandleGTest throws an error because it uses the absolute source dir; change it so it refers to its own source dir.
This reverts commit 6097523.
If benchmark is added as a CMake subproject, HandleGTest throws an error because it uses the absolute source dir; change it so it refers to its own source dir. Also see PR google#703.
…le#707) That is the real purpose of that bool. A follow-up change will make it consider something other than just repetitions.
google#708) It is better to let the RunBenchmarks() / report() side decide whether to actually output *only* the aggregates, depending on whether there actually are aggregates. It's subtle indeed. Previously, `BenchmarkRunner()` always said "if there are no repetitions, then you should never output only the aggregates", and the `report()` lambda simply assumed that the `report_aggregates_only` bool it received made sense and used it. Now the logic is the same, but the blame has shifted: `BenchmarkRunner()` always propagates what those benchmarks wanted to happen w.r.t. the aggregates, and the `report()` lambda has to actually consider both the `report_aggregates_only` bool and its meaningfulness. To put it in the context of the patch series: if the repetition count was `1`, but `*_report_aggregates_only` was set to `true`, and we capture each iteration separately, then we will compute the aggregates, but then output everything, both the iterations and the aggregates, despite `*_report_aggregates_only` being set to `true`.
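A minimal sketch of that division of responsibility, under the assumption that "meaningful" means more than one repetition. The struct and function names here are illustrative, not the library's actual internals:

```cpp
#include <iostream>
#include <string>
#include <vector>

struct Run {
  std::string name;
};

struct RunResults {
  std::vector<Run> non_aggregates;      // per-repetition (or per-iteration) reports
  std::vector<Run> aggregates_only;     // mean/median/stddev, if any were computed
  bool report_aggregates_only = false;  // what the benchmark asked for
  int repetitions = 1;                  // propagated by the runner, unmodified
};

void PrintRuns(const std::vector<Run>& runs) {
  for (const Run& r : runs) std::cout << r.name << "\n";
}

void Report(const RunResults& results) {
  // Honoring the request only makes sense when there was more than one
  // repetition; otherwise report everything, aggregates included.
  const bool request_is_meaningful = results.repetitions > 1;
  if (!(results.report_aggregates_only && request_is_meaningful))
    PrintRuns(results.non_aggregates);
  PrintRuns(results.aggregates_only);
}

int main() {
  RunResults r;
  r.non_aggregates = {{"BM_FOO/repetition0"}, {"BM_FOO/repetition1"}};
  r.report_aggregates_only = true;  // requested, but only one repetition ran
  Report(r);                        // still prints the individual reports
}
```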
It is incorrect to say that an aggregate is computed over the run's iterations, because those iterations have already been averaged. Similarly, if there are N repetitions with 1 iteration each, an aggregate will be computed over N measurements, not 1. Thus it is best to simply use the count of separate reports. Fixes google#586.
s390 has a different line structure for the processor field; it needs to be parsed differently.
This is the copy of patch proposed to LLVM's copy of benchmark via https://reviews.llvm.org/D52998.
…to atdt-report_loadavg
Used that example as a snippet, and it took a moment to notice what needed to be changed to make it compile.
std::tmpnam is deprecated and its use is discouraged. For our purposes in the tests, we really just need a file name that is unlikely to exist. This patch converts the tests to use a dummy random file name generator, which should hopefully avoid name conflicts.
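A sketch of such a generator, assuming a random alphanumeric suffix is enough to avoid collisions; the function name and prefix here are hypothetical, not the actual test helper:

```cpp
#include <cstdio>
#include <random>
#include <string>

// Produce a file name that is very unlikely to exist already, as a
// replacement for the deprecated std::tmpnam.
std::string GetRandomFileName(const std::string& prefix = "benchmark_test_") {
  static const char kCharset[] = "abcdefghijklmnopqrstuvwxyz0123456789";
  std::mt19937_64 rng(std::random_device{}());
  std::uniform_int_distribution<size_t> pick(0, sizeof(kCharset) - 2);
  std::string name = prefix;
  for (int i = 0; i < 12; ++i) name += kCharset[pick(rng)];
  return name;  // e.g. "benchmark_test_k3f0q8z1mwa7"
}

int main() { std::printf("%s\n", GetRandomFileName().c_str()); }
```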
) Unit tests fail to build due to the following error: /home/cfx/Dev/google-benchmark/benchmark.git/test/string_util_gtest.cc:12:5: required from here /home/cfx/Applications/googletest-1.8.1/include/gtest/gtest.h:1444:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] if (lhs == rhs) { ~~~~^~~~~~ Fixes google#741
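The class of fix is the usual one for -Wsign-compare in gtest assertions: compare like-signed types. A hypothetical example of the pattern (not the actual test change):

```cpp
#include <gtest/gtest.h>
#include <string>

TEST(StringUtilTest, SignCompareExample) {
  std::string s = "abcde";
  // EXPECT_EQ(5, s.size()) compares a signed int against an unsigned size_t,
  // which triggers -Wsign-compare (an error under -Werror). Use an unsigned
  // literal (or a size_t cast) so both sides have the same signedness:
  EXPECT_EQ(5u, s.size());
}
```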
* Adding Host Name and test * Addressing Review Comments * Adding Test for JSON Reporter * Adding HOST_NAME_MAX for MacOS systems * Adding Explanation for MacOS HOST_NAME_MAX Addition * Addressing Peer Review Comments * Adding codecvt in windows header guard * Changing name to SystemInfo and adding empty message in case host name fetch fails * Adding Comment on Struct SystemInfo
It should now match reality.
As pointed out in IRC, these are not documented.
Some benchmarks are particularly sensitive and they run in less than a nanosecond. In order for the console reporter to provide meaningful output for such benchmarks it needs to be able to display the times using more resolution than a single nanosecond. This patch changes the console reporter to print at least three significant digits for all results. Unlike the initial attempt, this patch does not align the decimal point.
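A sketch of the "at least three significant digits" idea; the reporter's real formatting code differs, this just shows the arithmetic:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>

// Print a time with at least three significant digits, without aligning the
// decimal point. For values >= 100 the integer part already carries three
// significant digits, so no fractional digits are needed.
std::string FormatTime(double time) {
  int frac_digits = 0;
  if (time > 0.0 && time < 100.0)
    frac_digits = std::max(0, 2 - static_cast<int>(std::floor(std::log10(time))));
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*f", frac_digits, time);
  return buf;
}

int main() {
  // 0.00420, 1.50, 250
  std::printf("%s %s %s\n", FormatTime(0.0042).c_str(), FormatTime(1.5).c_str(),
              FormatTime(250.0).c_str());
}
```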
* Fix argument order in StrSplit * Update AUTHORS, CONTRIBUTORS
…flag `--benchmark_repetitions=`
…erve() It takes the total new capacity, not the increase.
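For context, std::vector::reserve() takes the desired total capacity, not a delta:

```cpp
#include <vector>

int main() {
  std::vector<int> v = {1, 2, 3};
  // reserve() takes the desired *total* capacity, not how many more elements
  // will be added. To make room for 10 additional elements:
  v.reserve(v.size() + 10);
  // v.reserve(10);  // would NOT guarantee room for 10 *more* elements
}
```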
It may be useful for those wishing to further post-process JSON results, but it is mainly geared towards better support for run interleaving, where results from the same family may not be close to one another in the JSON. While we won't be able to do much about that for the outputs themselves, the tools can and perhaps should reorder the results so that at least in their output they are in the proper order, not run order. Note that this only counts the families that were filtered in, so if, e.g., there were three families and we filtered out the second one, the two remaining families (which were first and third) will have family indexes 0 and 1.
Much like it makes sense to enumerate all the families, it makes sense to enumerate the instances within families. Alternatively, we could have a global instance index, but I'm not sure why that would be better. This will be useful when the benchmarks are not run in order, so the tools can sort the results properly.
While the current variant works, it assumes that all the instances of a single family will be run together, with nothing in between them. Naturally, that won't work once the runs may be interleaved.
Currently, the tooling just keeps whatever benchmark order was present, and this is fine nowadays, but once the benchmarks can optionally be run interleaved, that will be rather suboptimal. So, now that I have introduced a family index and a per-family instance index, we can define an order for the benchmarks and sort them accordingly (see the sketch below). There is a caveat with aggregates: we assume that they are in order, and hopefully we won't mess that order up.
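A sketch of the ordering the tools can apply, assuming each result carries the family_index and per_family_instance_index fields described above. The real reordering lives in the Python tooling; this is only illustrative:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>

struct Result {
  std::string name;
  int family_index;
  int per_family_instance_index;
};

int main() {
  std::vector<Result> results = {
      {"BM_B/2", 1, 1}, {"BM_A/1", 0, 0}, {"BM_B/1", 1, 0}, {"BM_A/2", 0, 1}};

  // stable_sort keeps the run order of any entries (e.g. aggregates) that
  // share the same key.
  std::stable_sort(results.begin(), results.end(),
                   [](const Result& a, const Result& b) {
                     return std::tie(a.family_index, a.per_family_instance_index) <
                            std::tie(b.family_index, b.per_family_instance_index);
                   });
  for (const Result& r : results) std::cout << r.name << "\n";
}
```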
Based on original implementation by Hai Huang @haih-g in google#1105
Currently the lifetime of a single BenchmarkRunner is constrained to a RunBenchmark() call, but that will have to change for interleaved benchmark execution, because we'll need to keep it around so we don't forget how many repetitions of an instance we've done.
…e#1169) * Fix leak in test, and provide path to remove leak from library * make doc change
…le#1051) (google#1163) Inspired by the original implementation by Hai Huang @haih-g from google#1105. The original implementation had design deficiencies that weren't really addressable without a redesign, so it was reverted. In essence, the original implementation consisted of two separable parts:
* reducing the amount of time each repetition is run for, and symmetrically increasing the repetition count
* running the repetitions in random order
While it worked fine for the usual case, it broke down when the user specified repetitions (it would completely ignore that request), or specified a per-repetition min time (while it would still adjust the repetition count, it would not adjust the per-repetition time, leading to much greater run times). Here, as I was originally suggesting in the original review, I'm separating the features and only dealing with a single one: running repetitions in random order (a minimal sketch follows). Now that the runs/repetitions are no longer in order, the tooling may wish to sort the output, and indeed `compare.py` has been updated to do that: google#1168.
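A minimal sketch of running repetitions in a random order; the names are hypothetical and only the shuffle idea is shown:

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
  const size_t num_repetitions = 5;

  // Build the list of repetition indices in their natural order...
  std::vector<size_t> order(num_repetitions);
  std::iota(order.begin(), order.end(), 0);

  // ...then run them in a random order. Results still carry their original
  // index, so the tooling can sort the output back into a stable order.
  std::shuffle(order.begin(), order.end(), std::mt19937(std::random_device{}()));
  for (size_t rep : order) std::cout << "running repetition " << rep << "\n";
}
```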
* Enable various sanitizer builds in github actions * try with off the shelf versions * nope * specific version? * rats * oops * remove msan for now * reorder so env is set before building libc++
* Use modern clang/libc++ for sanitizers * update ubuntu * new llvm builds differently * clang, not clang-3.8 * just build what we need
Some downstream projects (e.g. V8) treat warnings as errors and cannot roll the latest changes.
…#1179) This can be used together with ArgsProduct() to allow multiple ranges with different multipliers and mixing dense and sparse ranges. Example: BENCHMARK(MyTest)->ArgsProduct({ CreateRange(0, 1024, /*multi=*/32), CreateRange(0, 100, /*multi=*/4), CreateDenseRange(0, 4, /*step=*/1) }); Co-authored-by: Jen-yee Hong <[email protected]>
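For context, a self-contained version of that example; the benchmark body is a hypothetical placeholder, while CreateRange, CreateDenseRange, and ArgsProduct are the API being described:

```cpp
#include <benchmark/benchmark.h>

// Hypothetical benchmark body; the real point is the argument generation below.
static void MyTest(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(state.range(0) + state.range(1) + state.range(2));
  }
}

BENCHMARK(MyTest)->ArgsProduct({
    benchmark::CreateRange(0, 1024, /*multi=*/32),  // sparse, multiplicative range
    benchmark::CreateRange(0, 100, /*multi=*/4),    // another multiplier
    benchmark::CreateDenseRange(0, 4, /*step=*/1),  // dense range, step 1
});

BENCHMARK_MAIN();
```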
* Add missing trailing commas Fixes google#1181 * Better trailing commas
This avoids clashes with other libraries that might define the same flags.
is this bringing the |
Thank you