Running multithreaded, CPU-intensive processes in parallel on ARM

Summary
I apologize up front: this is not 100% FreedomBox-specific. I just wonder if anybody has noticed something similar when running two “heavy” multithreaded processes in parallel on ARM CPUs. I assume some of you must have run into it.

Problem
When I run two multithreaded processes in parallel on a 4-core Intel or AMD CPU, each one runs at more or less half the speed it reaches when run alone (that is, when process A runs to completion and then process B runs, each able to occupy all 4 cores).

In other words: if each process finishes in exactly 10s when run alone, then both finish after roughly 20s when run in parallel, seemingly occupying 2 cores each.

When I run the same code in parallel on a Raspberry Pi, the slowdown is far worse than “roughly half”. It’s more like 10-fold. The RPi copes much better if the two processes are executed one after the other. It is as if something (caches?) is preventing the ARM CPU from properly “juggling” two CPU-intensive processes in parallel. Did anybody else notice this? Are the ARM CPUs inside an RPi really such bad multi-process executors? Or am I doing something wrong?

I am using multithreaded C and Go code. My executables appear to show the same behavior independent of programming language and compiler.
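
To make the question concrete, here is a minimal Go sketch of the kind of workload I mean. It is not my actual code: the names `burn` and `runWorkers` and the iteration count are placeholders made up for illustration, and the loop is pure integer arithmetic, so it will not show any memory or cache effects that a real program might have.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// burn performs a fixed amount of integer work and returns a checksum,
// so the compiler cannot optimize the loop away. (Hypothetical helper.)
func burn(iterations int) uint64 {
	var x uint64 = 1
	for i := 0; i < iterations; i++ {
		// Wrapping multiply-add on uint64; purely CPU-bound.
		x = x*6364136223846793005 + 1442695040888963407
	}
	return x
}

// runWorkers starts `workers` goroutines, each doing `iterations` steps,
// and returns the wall-clock time plus the sum of all checksums.
func runWorkers(workers, iterations int) (time.Duration, uint64) {
	start := time.Now()
	sums := make([]uint64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sums[id] = burn(iterations)
		}(w)
	}
	wg.Wait()
	var total uint64
	for _, s := range sums {
		total += s
	}
	return time.Since(start), total
}

func main() {
	// One worker per logical CPU; the iteration count is an arbitrary
	// placeholder and should be raised until a single run takes ~10s.
	workers := runtime.NumCPU()
	elapsed, total := runWorkers(workers, 50_000_000)
	fmt.Printf("workers=%d elapsed=%v checksum=%d\n", workers, elapsed, total)
}
```

Build it once with `go build -o burn main.go`, run `./burn` alone and note the elapsed time, then start two copies at once with `./burn & ./burn & wait` and compare.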