Let us start with a couple of old jokes:
And with that said we can start look at some of the problems with benchmarks.
The general problem with benchmarks is that the benchmark and the actual application may behave differently.
The actual application has some performance characterstics - it can be CPU bound, it can be IO bound or it can be memory bound. CPU usage is a mix of integer operations, floating point operations and string operations. IO can be read only, write only or read and write. IO can be sequential or random.
The chance that the actual application is really close to any benchmark in performance characteristics is small.
Also the performance characterstics are impacted by the involved hardware and software.
The relative speed of different code may be different among different CPU architectures: x86-64, ARM, Power, RISC-V.
The relative speed of different code may even be different between different implementations of the same CPU architecture and therefore change after a hardware upgrade.
The relative speed of different code may depend on the level of optimization done by the compiler and therefore change after a compiler upgrade.
The relative speed of different code may depend on the platform (runtime library, OS etc.) and therefore change after a platform upgrade.
The only reeally valid benchmark is the actual application on the actual system.
But for practical reasons syntetical benchmarks are often used anyway.
Lots of standard benchmarks exist. To list a few well-known:
CPU speed benchmarks has some unique problems.
Problem: the benchmark is so small and only use a few constructs and as a result the benchmark only measure those few constructs and that may not be valid for more general cases.
Mitigation: use a large benchmark (suite) that covers a lot.
Problem: sometimes the optimizers detect that there is no point in executing the core benchmark code because the result is not used for anything and as a result completely eliminate the core benchmark code showing way to good results.
Mitigation: prevent such optimization - often by calling external (non-optimizable) code with the result from the benchmark.
Problem: on a multi-user OS there may be a difference between wall time usage and CPU time usage if the process is swapped out of CPU during the benchmark for something else to run on the CPU.
Mitigation: try not to run anything else during benchmark *and* run the bencmark multiple times to avoid rare incidences of something else running.
Problem: there are special problems when it comes to JIT compiled languages.
Measuring AOT compiled (native) code is simple:
But if we look at .NET code then the JIT compilation is happening the first time the code is called (unless NGEN is used):
Java is even worse as it starts by interpreting the byte code and first do the JIT compilation when it detects that the code is being executed a lot:
Mitigation: use "warmup" iterations - before starting measuring call the code some times to "warmup" aka trigger the JIT compilation.
The article Language Speed has benchmark code in many languages (Java, C#, C, Pascal, Ada, Python, PHP).
But special benchmnarks frameworks has been introduced to make it easier to write good (micro-)benchmarks.
Let us show some Java/C# code to illustrate.
Library code:
package test;
public class Work {
public static double sqrtBuiltin(double a) {
return Math.sqrt(a);
}
public static double sqrtNewton(double a) {
double x = a / 2;
double xprev;
do {
xprev = x;
x = x + (a - x * x) / (2 * x);
} while(Math.abs(x - xprev) > 0.0000001);
return x;
}
}
using System;
namespace Lib
{
public class Work
{
public static double SqrtBuiltin(double a)
{
return Math.Sqrt(a);
}
public static double SqrtNewton(double a)
{
double x = a / 2;
double xprev;
do
{
xprev = x;
x = x + (a - x * x) / (2 * x);
}
while(Math.Abs(x - xprev) > 0.0000001);
return x;
}
}
}
Functional test to verify library is working correct:
We will test with JUnit. For more about JUnit see here.
package test;
import static org.junit.Assert.*;
import org.junit.Test;
public class FuncTestWork {
@Test
public void testSqrtBuiltin() {
assertEquals("sqrt(4)", 2.0, Work.sqrtBuiltin(4), 0.0000001);
assertEquals("sqrt(9)", 3.0, Work.sqrtBuiltin(9), 0.0000001);
assertEquals("sqrt(16)", 4.0, Work.sqrtBuiltin(16), 0.0000001);
assertEquals("sqrt(25)", 5.0, Work.sqrtBuiltin(25), 0.0000001);
assertEquals("sqrt(36)", 6.0, Work.sqrtBuiltin(36), 0.0000001);
}
@Test
public void testIsSqrtNewton() {
assertEquals("sqrt(4)", 2.0, Work.sqrtNewton(4), 0.0000001);
assertEquals("sqrt(9)", 3.0, Work.sqrtNewton(9), 0.0000001);
assertEquals("sqrt(16)", 4.0, Work.sqrtNewton(16), 0.0000001);
assertEquals("sqrt(25)", 5.0, Work.sqrtNewton(25), 0.0000001);
assertEquals("sqrt(36)", 6.0, Work.sqrtNewton(36), 0.0000001);
}
}
We will test with NUnit. For more about NUnit see here.
using System;
using NUnit.Framework;
using Lib;
namespace FuncTest
{
[TestFixture]
public class FuncTestWork
{
[Test]
public void TestSqrtBuiltin()
{
Assert.AreEqual(2.0, Work.SqrtBuiltin(4), 0.0000001, "Sqrt(4)");
Assert.AreEqual(3.0, Work.SqrtBuiltin(9), 0.0000001, "Sqrt(9)");
Assert.AreEqual(4.0, Work.SqrtBuiltin(16), 0.0000001, "Sqrt(16)");
Assert.AreEqual(5.0, Work.SqrtBuiltin(25), 0.0000001, "Sqrt(25)");
Assert.AreEqual(6.0, Work.SqrtBuiltin(36), 0.0000001, "Sqrt(936");
}
[Test]
public void TestIsSqrtNewton()
{
Assert.AreEqual(2.0, Work.SqrtBuiltin(4), 0.0000001, "Sqrt(4)");
Assert.AreEqual(3.0, Work.SqrtNewton(9), 0.0000001, "Sqrt(9)");
Assert.AreEqual(4.0, Work.SqrtNewton(16), 0.0000001, "Sqrt(16)");
Assert.AreEqual(5.0, Work.SqrtNewton(25), 0.0000001, "Sqrt(25)");
Assert.AreEqual(6.0, Work.SqrtNewton(36), 0.0000001, "Sqrt(936");
}
}
}
(strictly speaking the functional test is not necessary to do the benchmark, but there is really no point in benchmarking code that is not working correct, so it makes sense to verify correctness first)
Performance test without benchmark framework:
package test;
import java.util.Random;
public class PerfTestManual {
private static Random rng = new Random();
private static void testBuiltin(int rep) {
long t1 = System.currentTimeMillis();
for(int j = 0; j < rep; j++) {
double a = rng.nextDouble() * 1000.0;
double sqrta = Work.sqrtBuiltin(a);
if(Math.abs(a - sqrta * sqrta) > 0.0000001) throw new RuntimeException("Ooops");
}
long t2 = System.currentTimeMillis();
System.out.printf("Builtin : %.1f ns\n", (t2 - t1) * 1000000.0 / rep);
}
private static void testNewton(int rep) {
long t1 = System.currentTimeMillis();
for(int j = 0; j < rep; j++) {
double a = rng.nextDouble() * 1000.0;
double sqrta = Work.sqrtNewton(a);
if(Math.abs(a - sqrta * sqrta) > 0.0000001) throw new RuntimeException("Ooops");
}
long t2 = System.currentTimeMillis();
System.out.printf("Newton : %.1f ns\n", (t2 - t1) * 1000000.0 / rep);
}
public static void main(String[] args) {
for(int k = 0; k < 5; k++) {
testBuiltin(10000000);
}
for(int k = 0; k < 5; k++) {
testNewton(10000000);
}
}
}
Output:
java -cp . test.PerfTestManual Builtin : 32.0 ns Builtin : 31.0 ns Builtin : 31.0 ns Builtin : 31.0 ns Builtin : 32.0 ns Newton : 118.0 ns Newton : 118.0 ns Newton : 117.0 ns Newton : 118.0 ns Newton : 117.0 ns
using System;
using Lib;
namespace PerfTestManual
{
public class Program
{
private static Random rng = new Random();
private static void TestBuiltin(int rep)
{
DateTime t1 = DateTime.Now;
for(int j = 0; j < rep; j++)
{
double a = rng.NextDouble() * 1000.0;
double sqrta = Work.SqrtBuiltin(a);
if(Math.Abs(a - sqrta * sqrta) > 0.0000001) throw new Exception("Ooops");
}
DateTime t2 = DateTime.Now;
Console.WriteLine("Builtin : {0:F1} ns", (t2 - t1).TotalMilliseconds * 1000000.0 / rep);
}
private static void TestNewton(int rep)
{
DateTime t1 = DateTime.Now;
for(int j = 0; j < rep; j++)
{
double a = rng.NextDouble() * 1000.0;
double sqrta = Work.SqrtNewton(a);
if(Math.Abs(a - sqrta * sqrta) > 0.0000001) throw new Exception("Ooops");
}
DateTime t2 = DateTime.Now;
Console.WriteLine("Newton : {0:F1} ns", (t2 - t1).TotalMilliseconds * 1000000.0 / rep);
}
public static void Main(string[] args)
{
for(int k = 0; k < 5; k++)
{
TestBuiltin(10000000);
}
for(int k = 0; k < 5; k++)
{
TestNewton(10000000);
}
}
}
}
Output:
PerfTestManual Builtin : 29.0 ns Builtin : 27.0 ns Builtin : 27.0 ns Builtin : 26.0 ns Builtin : 27.0 ns Newton : 152.0 ns Newton : 152.0 ns Newton : 152.0 ns Newton : 152.0 ns Newton : 151.0 ns
Performance test with benchmark framework:
JMH (Java Microbenchmark Harness) is an open source benchmark framework for benchmarking Java code. It is created by the OpenJDK project. It is available from GitHub.
package test;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 1)
@Warmup(iterations = 5, timeUnit = TimeUnit.MILLISECONDS, time = 1000)
@Measurement(iterations = 5, timeUnit = TimeUnit.MILLISECONDS, time = 1000)
public class PerfTestJMH {
private static Random rng = new Random();
@Benchmark
public void testBuiltin() {
double a = rng.nextDouble() * 1000.0;
double sqrta = Work.sqrtBuiltin(a);
if(Math.abs(a - sqrta * sqrta) > 0.0000001) throw new RuntimeException("Ooops");
}
@Benchmark
public void testNewton() {
double a = rng.nextDouble() * 1000.0;
double sqrta = Work.sqrtNewton(a);
if(Math.abs(a - sqrta * sqrta) > 0.0000001) throw new RuntimeException("Ooops");
}
}
The only practical way to use JMH seems to be via Maven. Create the proper Maven project and build and one get an executable benchmark.jar file that are ready to run.
Output:
java -jar benchmarks.jar # JMH version: 1.36 # VM version: JDK 11.0.16.1, Java HotSpot(TM) 64-Bit Server VM, 11.0.16.1+1-LTS-1 # VM invoker: C:\SUN\64bit\jdk-11.0.16.1\bin\java.exe # VM options: <none> # Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 5 iterations, 1000 ms each # Measurement: 5 iterations, 1000 ms each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: test.PerfTestJMH.testBuiltin # Run progress: 0.00% complete, ETA 00:00:20 # Fork: 1 of 1 # Warmup Iteration 1: 38.110 ns/op # Warmup Iteration 2: 32.375 ns/op # Warmup Iteration 3: 31.702 ns/op # Warmup Iteration 4: 31.723 ns/op # Warmup Iteration 5: 31.687 ns/op Iteration 1: 31.710 ns/op Iteration 2: 31.707 ns/op Iteration 3: 31.710 ns/op Iteration 4: 31.694 ns/op Iteration 5: 31.706 ns/op Result "test.PerfTestJMH.testBuiltin": 31.705 ±(99.9%) 0.027 ns/op [Average] (min, avg, max) = (31.694, 31.705, 31.710), stdev = 0.007 CI (99.9%): [31.679, 31.732] (assumes normal distribution) # JMH version: 1.36 # VM version: JDK 11.0.16.1, Java HotSpot(TM) 64-Bit Server VM, 11.0.16.1+1-LTS-1 # VM invoker: C:\SUN\64bit\jdk-11.0.16.1\bin\java.exe # VM options: <none> # Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) # Warmup: 5 iterations, 1000 ms each # Measurement: 5 iterations, 1000 ms each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: test.PerfTestJMH.testNewton # Run progress: 50.00% complete, ETA 00:00:10 # Fork: 1 of 1 # Warmup Iteration 1: 118.710 ns/op # Warmup Iteration 2: 118.403 ns/op # Warmup Iteration 3: 119.326 ns/op # Warmup Iteration 4: 119.328 ns/op # Warmup Iteration 5: 119.340 ns/op Iteration 1: 119.344 ns/op Iteration 2: 119.415 ns/op Iteration 3: 119.330 ns/op Iteration 4: 119.334 ns/op Iteration 5: 119.335 ns/op Result "test.PerfTestJMH.testNewton": 119.351 ±(99.9%) 0.139 ns/op [Average] (min, avg, max) = (119.330, 119.351, 119.415), stdev = 0.036 CI (99.9%): [119.213, 119.490] (assumes normal distribution) # Run complete. Total time: 00:00:21 REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), designfactorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. Benchmark Mode Cnt Score Error Units PerfTestJMH.testBuiltin avgt 5 31.705 ± 0.027 ns/op PerfTestJMH.testNewton avgt 5 119.351 ± 0.139 ns/op
The last few lines are really all you need.
BenchmarkDotNet is an open source benchmark framework for benchmarking .NET code. It is available from GitHub.
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Lib;
namespace PerfTestBMDN
{
public class PerfTest
{
private static Random rng = new Random();
[Benchmark]
public void TestBuiltin() {
double a = rng.NextDouble() * 1000.0;
double sqrta = Work.SqrtBuiltin(a);
if(Math.Abs(a - sqrta * sqrta) > 0.0000001) throw new Exception("Ooops");
}
[Benchmark]
public void TestNewton() {
double a = rng.NextDouble() * 1000.0;
double sqrta = Work.SqrtNewton(a);
if(Math.Abs(a - sqrta * sqrta) > 0.0000001) throw new Exception("Ooops");
}
}
public class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run(typeof(PerfTest).Assembly);
Console.ReadKey();
}
}
}
BenchmarkDotNet is very easy to use in .NET Core / .NET 5+. It should also work with .NET 4.x, but I could not get it working.
Output:
dotnet run -c Release // Validating benchmarks: // ***** BenchmarkRunner: Start ***** // ***** Found 2 benchmark(s) in total ***** // ***** Building 1 exe(s) in Parallel: Start ***** ... // ***** Done, took 00:00:05 (5.63 sec) ***** // Found 2 benchmarks: // PerfTest.TestBuiltin: DefaultJob // PerfTest.TestNewton: DefaultJob Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance) // ************************** // Benchmark: PerfTest.TestBuiltin: DefaultJob // *** Execute *** // Launch: 1 / 1 ... // BeforeAnythingElse // Benchmark Process Environment Information: // Runtime=.NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2 // GC=Concurrent Workstation // HardwareIntrinsics=SSE4.2,POPCNT VectorSize=128 // Job: DefaultJob OverheadJitting 1: 1 op, 277124.43 ns, 277.1244 us/op WorkloadJitting 1: 1 op, 1198531.24 ns, 1.1985 ms/op OverheadJitting 2: 16 op, 239131.57 ns, 14.9457 us/op WorkloadJitting 2: 16 op, 236896.69 ns, 14.8060 us/op WorkloadPilot 1: 16 op, 1277.07 ns, 79.8169 ns/op WorkloadPilot 2: 32 op, 1596.34 ns, 49.8856 ns/op WorkloadPilot 3: 64 op, 1915.61 ns, 29.9314 ns/op WorkloadPilot 4: 128 op, 3192.68 ns, 24.9428 ns/op WorkloadPilot 5: 256 op, 5746.82 ns, 22.4485 ns/op WorkloadPilot 6: 512 op, 11174.37 ns, 21.8249 ns/op WorkloadPilot 7: 1024 op, 20752.41 ns, 20.2660 ns/op WorkloadPilot 8: 2048 op, 39908.47 ns, 19.4866 ns/op WorkloadPilot 9: 4096 op, 83648.16 ns, 20.4219 ns/op WorkloadPilot 10: 8192 op, 164422.91 ns, 20.0712 ns/op WorkloadPilot 11: 16384 op, 325972.40 ns, 19.8958 ns/op WorkloadPilot 12: 32768 op, 648752.13 ns, 19.7983 ns/op WorkloadPilot 13: 65536 op, 1294950.11 ns, 19.7594 ns/op WorkloadPilot 14: 131072 op, 2600436.06 ns, 19.8398 ns/op WorkloadPilot 15: 262144 op, 5195763.83 ns, 19.8203 ns/op WorkloadPilot 16: 524288 op, 10350980.65 ns, 19.7429 ns/op WorkloadPilot 17: 1048576 op, 20663968.43 ns, 19.7067 ns/op WorkloadPilot 18: 2097152 op, 41421163.05 ns, 19.7511 ns/op WorkloadPilot 19: 4194304 op, 60674606.43 ns, 14.4660 ns/op WorkloadPilot 20: 8388608 op, 93013558.98 ns, 11.0881 ns/op WorkloadPilot 21: 16777216 op, 186054894.26 ns, 11.0897 ns/op WorkloadPilot 22: 33554432 op, 371953985.85 ns, 11.0851 ns/op WorkloadPilot 23: 67108864 op, 743663093.32 ns, 11.0814 ns/op OverheadWarmup 1: 67108864 op, 190947672.97 ns, 2.8453 ns/op OverheadWarmup 2: 67108864 op, 130623941.83 ns, 1.9464 ns/op OverheadWarmup 3: 67108864 op, 130713017.54 ns, 1.9478 ns/op OverheadWarmup 4: 67108864 op, 130717806.55 ns, 1.9478 ns/op OverheadWarmup 5: 67108864 op, 130686199.04 ns, 1.9474 ns/op OverheadWarmup 6: 67108864 op, 130760588.44 ns, 1.9485 ns/op OverheadWarmup 7: 67108864 op, 130657464.94 ns, 1.9469 ns/op OverheadActual 1: 67108864 op, 130673109.06 ns, 1.9472 ns/op OverheadActual 2: 67108864 op, 130671512.73 ns, 1.9472 ns/op OverheadActual 3: 67108864 op, 133605902.88 ns, 1.9909 ns/op OverheadActual 4: 67108864 op, 130707589.98 ns, 1.9477 ns/op OverheadActual 5: 67108864 op, 130695457.81 ns, 1.9475 ns/op OverheadActual 6: 67108864 op, 130761226.97 ns, 1.9485 ns/op OverheadActual 7: 67108864 op, 130718125.82 ns, 1.9479 ns/op OverheadActual 8: 67108864 op, 130679813.69 ns, 1.9473 ns/op OverheadActual 9: 67108864 op, 130669597.12 ns, 1.9471 ns/op OverheadActual 10: 67108864 op, 130655549.34 ns, 1.9469 ns/op OverheadActual 11: 67108864 op, 130659061.28 ns, 1.9470 ns/op OverheadActual 12: 67108864 op, 130691307.33 ns, 1.9475 ns/op OverheadActual 13: 67108864 op, 130686837.58 ns, 1.9474 ns/op OverheadActual 14: 67108864 op, 130777190.36 ns, 1.9487 ns/op OverheadActual 15: 67108864 op, 130686837.58 ns, 1.9474 ns/op WorkloadWarmup 1: 67108864 op, 743604667.31 ns, 11.0806 ns/op WorkloadWarmup 2: 67108864 op, 743614883.88 ns, 11.0807 ns/op WorkloadWarmup 3: 67108864 op, 743785692.14 ns, 11.0833 ns/op WorkloadWarmup 4: 67108864 op, 743542090.83 ns, 11.0796 ns/op WorkloadWarmup 5: 67108864 op, 743537621.08 ns, 11.0796 ns/op WorkloadWarmup 6: 67108864 op, 746775634.89 ns, 11.1278 ns/op WorkloadWarmup 7: 67108864 op, 743703640.32 ns, 11.0820 ns/op // BeforeActualRun WorkloadActual 1: 67108864 op, 743610094.86 ns, 11.0807 ns/op WorkloadActual 2: 67108864 op, 743580402.96 ns, 11.0802 ns/op WorkloadActual 3: 67108864 op, 743660858.44 ns, 11.0814 ns/op WorkloadActual 4: 67108864 op, 743643617.98 ns, 11.0812 ns/op WorkloadActual 5: 67108864 op, 743647129.93 ns, 11.0812 ns/op WorkloadActual 6: 67108864 op, 743662454.78 ns, 11.0814 ns/op WorkloadActual 7: 67108864 op, 743865509.09 ns, 11.0845 ns/op WorkloadActual 8: 67108864 op, 743897435.86 ns, 11.0849 ns/op WorkloadActual 9: 67108864 op, 743752488.29 ns, 11.0828 ns/op WorkloadActual 10: 67108864 op, 743456846.33 ns, 11.0784 ns/op WorkloadActual 11: 67108864 op, 743575294.68 ns, 11.0801 ns/op WorkloadActual 12: 67108864 op, 743666924.53 ns, 11.0815 ns/op WorkloadActual 13: 67108864 op, 744301628.87 ns, 11.0910 ns/op WorkloadActual 14: 67108864 op, 744567259.66 ns, 11.0949 ns/op WorkloadActual 15: 67108864 op, 743596685.62 ns, 11.0805 ns/op // AfterActualRun WorkloadResult 1: 67108864 op, 612923257.28 ns, 9.1333 ns/op WorkloadResult 2: 67108864 op, 612893565.38 ns, 9.1328 ns/op WorkloadResult 3: 67108864 op, 612974020.86 ns, 9.1340 ns/op WorkloadResult 4: 67108864 op, 612956780.40 ns, 9.1338 ns/op WorkloadResult 5: 67108864 op, 612960292.35 ns, 9.1338 ns/op WorkloadResult 6: 67108864 op, 612975617.20 ns, 9.1340 ns/op WorkloadResult 7: 67108864 op, 613178671.51 ns, 9.1371 ns/op WorkloadResult 8: 67108864 op, 613210598.29 ns, 9.1375 ns/op WorkloadResult 9: 67108864 op, 613065650.71 ns, 9.1354 ns/op WorkloadResult 10: 67108864 op, 612770008.75 ns, 9.1310 ns/op WorkloadResult 11: 67108864 op, 612888457.10 ns, 9.1327 ns/op WorkloadResult 12: 67108864 op, 612980086.95 ns, 9.1341 ns/op WorkloadResult 13: 67108864 op, 612909848.04 ns, 9.1331 ns/op // AfterAll // Benchmark Process 9848 has exited with code 0. Mean = 9.134 ns, StdErr = 0.000 ns (0.01%), N = 13, StdDev = 0.002 ns Min = 9.131 ns, Q1 = 9.133 ns, Median = 9.134 ns, Q3 = 9.134 ns, Max = 9.138 ns IQR = 0.001 ns, LowerFence = 9.131 ns, UpperFence = 9.136 ns ConfidenceInterval = [9.132 ns; 9.136 ns] (CI 99.9%), Margin = 0.002 ns (0.02% of Mean) Skewness = 0.53, Kurtosis = 2.61, MValue = 2 // ** Remained 1 (50.0 %) benchmark(s) to run. Estimated finish 2023-01-08 19:13 (0h 0m from now) ** Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance) // ************************** // Benchmark: PerfTest.TestNewton: DefaultJob // *** Execute *** // Launch: 1 / 1 ... // BeforeAnythingElse // Benchmark Process Environment Information: // Runtime=.NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2 // GC=Concurrent Workstation // HardwareIntrinsics=SSE4.2,POPCNT VectorSize=128 // Job: DefaultJob OverheadJitting 1: 1 op, 257649.10 ns, 257.6491 us/op WorkloadJitting 1: 1 op, 1260469.19 ns, 1.2605 ms/op OverheadJitting 2: 16 op, 234023.28 ns, 14.6265 us/op WorkloadJitting 2: 16 op, 239131.57 ns, 14.9457 us/op WorkloadPilot 1: 16 op, 2554.14 ns, 159.6339 ns/op WorkloadPilot 2: 32 op, 4150.48 ns, 129.7025 ns/op WorkloadPilot 3: 64 op, 7662.43 ns, 119.7254 ns/op WorkloadPilot 4: 128 op, 14367.05 ns, 112.2426 ns/op WorkloadPilot 5: 256 op, 27776.30 ns, 108.5012 ns/op WorkloadPilot 6: 512 op, 57148.93 ns, 111.6190 ns/op WorkloadPilot 7: 1024 op, 111105.19 ns, 108.5012 ns/op WorkloadPilot 8: 2048 op, 217421.36 ns, 106.1628 ns/op WorkloadPilot 9: 4096 op, 434204.18 ns, 106.0069 ns/op WorkloadPilot 10: 8192 op, 862342.27 ns, 105.2664 ns/op WorkloadPilot 11: 16384 op, 1731069.90 ns, 105.6561 ns/op WorkloadPilot 12: 32768 op, 3451923.22 ns, 105.3443 ns/op WorkloadPilot 13: 65536 op, 6901292.30 ns, 105.3054 ns/op WorkloadPilot 14: 131072 op, 13801946.06 ns, 105.3005 ns/op WorkloadPilot 15: 262144 op, 27687540.29 ns, 105.6196 ns/op WorkloadPilot 16: 524288 op, 55175218.95 ns, 105.2384 ns/op WorkloadPilot 17: 1048576 op, 107179470.32 ns, 102.2143 ns/op WorkloadPilot 18: 2097152 op, 206276676.82 ns, 98.3604 ns/op WorkloadPilot 19: 4194304 op, 412525577.34 ns, 98.3538 ns/op WorkloadPilot 20: 8388608 op, 825079569.51 ns, 98.3571 ns/op OverheadWarmup 1: 8388608 op, 23877398.62 ns, 2.8464 ns/op OverheadWarmup 2: 8388608 op, 23884422.51 ns, 2.8472 ns/op OverheadWarmup 3: 8388608 op, 23871651.80 ns, 2.8457 ns/op OverheadWarmup 4: 8388608 op, 23880591.30 ns, 2.8468 ns/op OverheadWarmup 5: 8388608 op, 23874205.94 ns, 2.8460 ns/op OverheadActual 1: 8388608 op, 23873248.14 ns, 2.8459 ns/op OverheadActual 2: 8388608 op, 23873886.67 ns, 2.8460 ns/op OverheadActual 3: 8388608 op, 23872928.87 ns, 2.8459 ns/op OverheadActual 4: 8388608 op, 23873248.14 ns, 2.8459 ns/op OverheadActual 5: 8388608 op, 23873248.14 ns, 2.8459 ns/op OverheadActual 6: 8388608 op, 23871651.80 ns, 2.8457 ns/op OverheadActual 7: 8388608 op, 23876760.08 ns, 2.8463 ns/op OverheadActual 8: 8388608 op, 23874844.48 ns, 2.8461 ns/op OverheadActual 9: 8388608 op, 23875802.28 ns, 2.8462 ns/op OverheadActual 10: 8388608 op, 23871332.53 ns, 2.8457 ns/op OverheadActual 11: 8388608 op, 20873408.09 ns, 2.4883 ns/op OverheadActual 12: 8388608 op, 16329269.80 ns, 1.9466 ns/op OverheadActual 13: 8388608 op, 16329908.34 ns, 1.9467 ns/op OverheadActual 14: 8388608 op, 16327992.73 ns, 1.9464 ns/op OverheadActual 15: 8388608 op, 16327673.46 ns, 1.9464 ns/op OverheadActual 16: 8388608 op, 16329908.34 ns, 1.9467 ns/op OverheadActual 17: 8388608 op, 16328631.26 ns, 1.9465 ns/op OverheadActual 18: 8388608 op, 16323203.71 ns, 1.9459 ns/op OverheadActual 19: 8388608 op, 16341401.98 ns, 1.9480 ns/op OverheadActual 20: 8388608 op, 16328631.26 ns, 1.9465 ns/op WorkloadWarmup 1: 8388608 op, 824957289.95 ns, 98.3426 ns/op WorkloadWarmup 2: 8388608 op, 825080527.32 ns, 98.3573 ns/op WorkloadWarmup 3: 8388608 op, 824679207.72 ns, 98.3094 ns/op WorkloadWarmup 4: 8388608 op, 825740773.08 ns, 98.4360 ns/op WorkloadWarmup 5: 8388608 op, 824774030.25 ns, 98.3207 ns/op WorkloadWarmup 6: 8388608 op, 825357651.75 ns, 98.3903 ns/op // BeforeActualRun WorkloadActual 1: 8388608 op, 825830806.59 ns, 98.4467 ns/op WorkloadActual 2: 8388608 op, 826759556.56 ns, 98.5574 ns/op WorkloadActual 3: 8388608 op, 825054028.09 ns, 98.3541 ns/op WorkloadActual 4: 8388608 op, 825348712.25 ns, 98.3892 ns/op WorkloadActual 5: 8388608 op, 825339772.75 ns, 98.3882 ns/op WorkloadActual 6: 8388608 op, 825481527.65 ns, 98.4051 ns/op WorkloadActual 7: 8388608 op, 825647866.16 ns, 98.4249 ns/op WorkloadActual 8: 8388608 op, 825199933.46 ns, 98.3715 ns/op WorkloadActual 9: 8388608 op, 825344881.04 ns, 98.3888 ns/op WorkloadActual 10: 8388608 op, 825101279.72 ns, 98.3597 ns/op WorkloadActual 11: 8388608 op, 825100002.65 ns, 98.3596 ns/op WorkloadActual 12: 8388608 op, 825331791.06 ns, 98.3872 ns/op WorkloadActual 13: 8388608 op, 826023644.33 ns, 98.4697 ns/op WorkloadActual 14: 8388608 op, 826543093.01 ns, 98.5316 ns/op WorkloadActual 15: 8388608 op, 825075738.30 ns, 98.3567 ns/op // AfterActualRun WorkloadResult 1: 8388608 op, 803458436.28 ns, 95.7797 ns/op WorkloadResult 2: 8388608 op, 802681657.78 ns, 95.6871 ns/op WorkloadResult 3: 8388608 op, 802976341.94 ns, 95.7222 ns/op WorkloadResult 4: 8388608 op, 802967402.44 ns, 95.7212 ns/op WorkloadResult 5: 8388608 op, 803109157.33 ns, 95.7381 ns/op WorkloadResult 6: 8388608 op, 803275495.85 ns, 95.7579 ns/op WorkloadResult 7: 8388608 op, 802827563.15 ns, 95.7045 ns/op WorkloadResult 8: 8388608 op, 802972510.73 ns, 95.7218 ns/op WorkloadResult 9: 8388608 op, 802728909.41 ns, 95.6927 ns/op WorkloadResult 10: 8388608 op, 802727632.34 ns, 95.6926 ns/op WorkloadResult 11: 8388608 op, 802959420.75 ns, 95.7202 ns/op WorkloadResult 12: 8388608 op, 803651274.02 ns, 95.8027 ns/op WorkloadResult 13: 8388608 op, 804170722.70 ns, 95.8646 ns/op WorkloadResult 14: 8388608 op, 802703367.99 ns, 95.6897 ns/op // AfterAll // Benchmark Process 9388 has exited with code 0. Mean = 95.735 ns, StdErr = 0.014 ns (0.01%), N = 14, StdDev = 0.051 ns Min = 95.687 ns, Q1 = 95.696 ns, Median = 95.721 ns, Q3 = 95.753 ns, Max = 95.865 ns IQR = 0.057 ns, LowerFence = 95.610 ns, UpperFence = 95.839 ns ConfidenceInterval = [95.678 ns; 95.793 ns] (CI 99.9%), Margin = 0.057 ns (0.06% of Mean) Skewness = 1.18, Kurtosis = 3.45, MValue = 2 // ** Remained 0 (0.0 %) benchmark(s) to run. Estimated finish 2023-01-08 19:13 (0h 0m from now) ** Successfully reverted power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance) // ***** BenchmarkRunner: Finish ***** // * Export * BenchmarkDotNet.Artifacts\results\PerfTestBMDN.PerfTest-report.csv BenchmarkDotNet.Artifacts\results\PerfTestBMDN.PerfTest-report-github.md BenchmarkDotNet.Artifacts\results\PerfTestBMDN.PerfTest-report.html // * Detailed results * PerfTest.TestBuiltin: DefaultJob Runtime = .NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2; GC = Concurrent Workstation Mean = 9.134 ns, StdErr = 0.000 ns (0.01%), N = 13, StdDev = 0.002 ns Min = 9.131 ns, Q1 = 9.133 ns, Median = 9.134 ns, Q3 = 9.134 ns, Max = 9.138 ns IQR = 0.001 ns, LowerFence = 9.131 ns, UpperFence = 9.136 ns ConfidenceInterval = [9.132 ns; 9.136 ns] (CI 99.9%), Margin = 0.002 ns (0.02% of Mean) Skewness = 0.53, Kurtosis = 2.61, MValue = 2 -------------------- Histogram -------------------- [9.130 ns ; 9.139 ns) | @@@@@@@@@@@@@ --------------------------------------------------- PerfTest.TestNewton: DefaultJob Runtime = .NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2; GC = Concurrent Workstation Mean = 95.735 ns, StdErr = 0.014 ns (0.01%), N = 14, StdDev = 0.051 ns Min = 95.687 ns, Q1 = 95.696 ns, Median = 95.721 ns, Q3 = 95.753 ns, Max = 95.865 ns IQR = 0.057 ns, LowerFence = 95.610 ns, UpperFence = 95.839 ns ConfidenceInterval = [95.678 ns; 95.793 ns] (CI 99.9%), Margin = 0.057 ns (0.06% of Mean) Skewness = 1.18, Kurtosis = 3.45, MValue = 2 -------------------- Histogram -------------------- [95.659 ns ; 95.892 ns) | @@@@@@@@@@@@@@ --------------------------------------------------- // * Summary * BenchmarkDotNet=v0.13.3, OS=Windows 7 SP1 (6.1.7601.0) Intel Core i7 CPU 960 3.20GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores Frequency=3132167 Hz, Resolution=319.2678 ns, Timer=TSC .NET SDK=6.0.401 [Host] : .NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2 DefaultJob : .NET 6.0.12 (6.0.1222.56807), X64 RyuJIT SSE4.2 | Method | Mean | Error | StdDev | |------------ |----------:|----------:|----------:| | TestBuiltin | 9.134 ns | 0.0021 ns | 0.0018 ns | | TestNewton | 95.735 ns | 0.0573 ns | 0.0508 ns | // * Hints * Outliers PerfTest.TestBuiltin: Default -> 2 outliers were removed (11.09 ns, 11.09 ns) PerfTest.TestNewton: Default -> 1 outlier was removed (98.56 ns) // * Legends * Mean : Arithmetic mean of all measurements Error : Half of 99.9% confidence interval StdDev : Standard deviation of all measurements 1 ns : 1 Nanosecond (0.000000001 sec) // ***** BenchmarkRunner: End ***** Run time: 00:00:41 (41.11 sec), executed benchmarks: 2 Global total time: 00:00:47 (47.04 sec), executed benchmarks: 2 // * Artifacts cleanup *
The last few lines are really all you need.
It is not easy to write good benchmarks. But hopefully reading this has given some insigth into potential pitfalls and how to avoid them.
Version | Date | Description |
---|---|---|
1.0 | January 9th 2023 | Initial version |
See list of all articles here
Please send comments to Arne Vajhøj