Geekbench warns that Intel’s BOT tool for the new Arrow Lake Plus CPUs generates results that ‘aren’t comparable with standard runs’

Primate Labs, the company behind Geekbench, has announced that it will attach a warning to benchmark results produced on Intel’s new Arrow Lake Plus CPUs. The benchmark specialist says that Intel’s new Binary Optimization Tool (BOT), which runs only on the freshly launched chips and Panther Lake parts, “can boost Geekbench 6 scores by up to 8%, but those results aren’t comparable with standard runs.”

Primate Labs actually says that individual workloads can be improved by as much as 40%. As for a justification for the move, the company says: “Since the tool modifies the benchmark, and it is unclear to both Primate Labs and the general public how these changes occur, results generated with the tool are not comparable to results generated without it.”
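To see how a 40% gain in a handful of workloads can sit alongside an overall uplift in the region of 8%, here is a rough sketch. It assumes the overall score behaves like a geometric mean of per-workload scores, and the workload count, baseline scores and speedups are invented for illustration. They are not Geekbench's actual workloads or weightings.

```python
import math

def geometric_mean(scores):
    """Geometric mean of a list of positive scores."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

def overall_uplift(baseline, speedups):
    """Fractional change in the geometric-mean score after per-workload speedups."""
    boosted = [score * factor for score, factor in zip(baseline, speedups)]
    return geometric_mean(boosted) / geometric_mean(baseline) - 1.0

# Hypothetical suite: 16 workloads, each scoring 2500 at baseline.
baseline = [2500.0] * 16
# Hypothetical effect: four workloads gain 40%, the other twelve are unchanged.
speedups = [1.40] * 4 + [1.00] * 12

uplift = overall_uplift(baseline, speedups)
print(f"Overall uplift: {uplift:.1%}")
```

Under those made-up numbers, the overall figure rises by a little under 9%, because a geometric mean dilutes large gains that are confined to a few workloads.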

The warning will be applied to all Geekbench 6 results for the new Intel chips because Primate Labs currently has “no way to detect if a Geekbench 6 result was run with or without the Binary Optimization Tool.” Hence, for now, all Arrow Lake Plus benchmark results will have the following warning attached:

“This benchmark result may be invalid due to binary modification tools that can run on this system.”

Primate Labs concludes that, “while the Binary Optimization Tool only supports a small number of Intel CPUs, this is an important step to ensure scores reported on the Geekbench Browser remain trustworthy. Intel lists the supported CPUs on the Binary Optimization Tool webpage. We expect this list to be dynamic and that it will change over time. Primate Labs’ warnings will be updated accordingly.”

The immediate question is whether this is fair. Some observers have rightly pointed out that Geekbench runs aren’t entirely comparable across different platforms in the first place. More to the point, what Intel’s BOT tool is doing is arguably no different to what various CPU architectures do internally.

The Geekbench binary is dynamically linked, which means the actual instructions being run differ depending on the CPU and OS combination in question. Intel’s BOT is designed to make sure that the instructions being run are better suited to the underlying microarchitecture. Moreover, when you dig right down, all modern CPUs effectively translate the instructions in software binaries into internal micro-operations specific to their architectures.

My understanding is that what Intel’s BOT is doing essentially amounts to re-ordering instructions so that they fully utilise the Arrow Lake Plus pipeline. All the actual calculations are the same. In other words, enabling BOT doesn’t mean skipping any work.
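As a toy illustration of that idea, the sketch below runs the same four instructions in two different orders on an imaginary single-issue, in-order pipeline. Everything here is an assumption made for the example: the `run` helper, the register names and the latencies are hypothetical, and it is not a model of how BOT or any real Intel core actually works.

```python
import operator

def run(program, initial_regs):
    """Execute a program on a toy single-issue, in-order pipeline.

    Each instruction is (dest, op, src_a, src_b, latency).
    Returns (final register values, total cycles taken).
    """
    regs = dict(initial_regs)
    ready_at = {}  # register -> cycle at which its result becomes available
    cycle = 0
    for dest, op, a, b, latency in program:
        # Stall until both source operands have been produced.
        issue = max(cycle, ready_at.get(a, 0), ready_at.get(b, 0))
        regs[dest] = op(regs[a], regs[b])
        ready_at[dest] = issue + latency
        cycle = issue + 1  # only one instruction can issue per cycle
    total_cycles = max([cycle] + list(ready_at.values()))
    return regs, total_cycles

start = {"r0": 2, "r1": 3, "r2": 4, "r3": 5}

# Hypothetical latencies: multiply takes 3 cycles, add takes 1.
mul_a = ("r4", operator.mul, "r0", "r1", 3)
add_a = ("r5", operator.add, "r4", "r2", 1)  # depends on mul_a
mul_b = ("r6", operator.mul, "r2", "r3", 3)
add_b = ("r7", operator.add, "r6", "r0", 1)  # depends on mul_b

original = [mul_a, add_a, mul_b, add_b]   # each add waits on the multiply before it
reordered = [mul_a, mul_b, add_a, add_b]  # independent multiplies overlap

regs_original, cycles_original = run(original, start)
regs_reordered, cycles_reordered = run(reordered, start)

print(regs_original == regs_reordered)  # same calculations, same answers
print(cycles_original, cycles_reordered)
```

Both orderings produce identical register values, but the reordered version finishes in fewer cycles in this toy model because the two independent multiplies overlap instead of each add stalling behind the multiply it depends on.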

If there is a valid reason to flag results that are BOT-enhanced, it’s that they are unrepresentative of how the Arrow Lake Plus architecture performs without BOT enabled. That matters because BOT doesn’t automatically work on all software.

Intel likens BOT to a game of Tetris where instructions are more optimally ordered. (Image credit: Intel)

Instead, Intel must first assess the software or application and create application-specific profiles for BOT. In an ideal world, you’d have results both with and without BOT enabled, giving a full picture of how the CPU performs both natively and with the tool switched on.

You could also argue that Intel has something of a track record when it comes to, well, tweaking benchmark results. To dredge up just two examples of many: in 2024, SPEC invalidated Intel benchmark results obtained with a certain Intel compiler due to “unfair” optimisations, and in 2009, Intel’s ICC compiler was found to be crippling performance on AMD CPUs by deliberately removing all optimisations for the competing chips.

The other concern is that Intel’s BOT could spark an arms race of vendor-specific optimisations that would effectively make all benchmark results unrepresentative of actual CPU performance. With all that in mind, flagging BOT-enhanced results probably makes sense.