(Naive) Let's modify a few n-body programs to perform 3,000 in-process warm-up iterations; and then perform once-more for measurement, recorded by os.path.getmtime('two') - os.path.getmtime('one') side-effects.
Let's reduce the workload 250x from N=50,000,000 to N=200,000, so runtimes are only one or two tenths-of-a-second.
The "process" measurements (the usual elapsed-seconds) were made without doing warm-up iterations and include startup costs.
The "in-process" measurements were made without doing warm-up iterations and exclude startup costs.
The "warm-up" measurements were made after doing 3,000 warm-up iterations and exclude startup costs.
The N=50,000,000 workload used for n-body programs takes seconds and tens-of-seconds. The Java #9 warm measurement is a few tenths-of-a-second faster.