Swap globals for local variables in a function and see the Python code speed up. Swap language implementations and see the C code speed up.
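A minimal sketch of the first swap, assuming CPython and the standard `timeit` module. The names `counter_limit`, `sum_with_global`, `sum_with_local` and `compare` are hypothetical, made up for illustration. The size of any speed-up depends on the interpreter version and the machine, so no timings are quoted here.

```python
import timeit

# Hypothetical module-level (global) name, used only for illustration.
counter_limit = 200_000


def sum_with_global():
    """Read the global name on every loop iteration."""
    total = 0
    for i in range(counter_limit):
        total += i * counter_limit  # global lookup each time round
    return total


def sum_with_local():
    """Copy the global into a local once, then use only the local."""
    limit = counter_limit  # single global lookup
    total = 0
    for i in range(limit):
        total += i * limit  # local lookup each time round
    return total


def compare(repeats=5, number=10):
    """Return the best (minimum) elapsed seconds for each variant."""
    t_global = min(timeit.repeat(sum_with_global, repeat=repeats, number=number))
    t_local = min(timeit.repeat(sum_with_local, repeat=repeats, number=number))
    return t_global, t_local


if __name__ == "__main__":
    t_global, t_local = compare()
    print(f"global: {t_global:.4f}s  local: {t_local:.4f}s")
```

Both functions compute the same value; only the kind of name lookup inside the loop differs, which is exactly the sort of small change that can move a ten-line measurement.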
Take a tiny ten-line snippet of code, loop through it 10 000 million times, and performance measurements that are usually just brittle may become micro-benchmark-broken.
Something to think about when you're tempted to draw broad conclusions from ten-line snippets of code.
The "process" measurements (the usual elapsed-seconds) were made without doing warm-up iterations and include startup costs.
The "in-process" measurements were made without doing warm-up iterations and exclude startup costs.
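The difference between the two kinds of measurement can be sketched roughly as follows. This is an illustration under stated assumptions, not the harness actually used for these measurements: `SNIPPET`, `measure_process` and `measure_in_process` are hypothetical names, and the workload is an arbitrary stand-in. Neither function does warm-up iterations.

```python
import subprocess
import sys
import time

# Hypothetical stand-in workload; any small program would do.
SNIPPET = "total = sum(i * i for i in range(100_000))"


def measure_process():
    """Elapsed seconds for a whole child process.

    Includes interpreter startup and shutdown as well as the workload.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", SNIPPET], check=True)
    return time.perf_counter() - start


def measure_in_process():
    """Elapsed seconds for the workload alone.

    Runs inside the already-started interpreter, so startup is excluded.
    """
    start = time.perf_counter()
    exec(SNIPPET, {})  # fresh namespace, no warm-up run beforehand
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"process:    {measure_process():.4f}s")
    print(f"in-process: {measure_in_process():.4f}s")
```

For a workload this small, startup can easily make up most of the "process" figure, which is why the two numbers should not be compared as if they measured the same thing.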