Bench SMP mode

Wed Aug 8 02:16:52 UTC 2018

> As for the network interrupts, YMMV, but they tend to migrate towards
> busy workers/CPUs in the tests I have seen, which is not necessarily a
> good thing when the worker is close to maxing out a CPU core. Confining
> interrupts to dedicated cores may improve overall performance.

In multi-socket architectures, each PCIe slot is attached to a particular
socket (aka NUMA node).  In this case, the dual-port Intel X710 card that I
am using is attached to the first socket.  In top, this shows up as softirq
(si) time on the cores associated with that socket.
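
For anyone who wants to check the same thing on their own box, here is a
rough sketch (not something from my test setup) that reads the NIC's NUMA
node and local CPU list from sysfs, plus the CPUs a given IRQ is allowed to
run on.  It assumes a Linux host with a PCI NIC; the interface name "eth0"
is only a placeholder, and the function names are my own.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-9,20-29" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. an empty file
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nic_locality(iface, sysfs="/sys/class/net"):
    """Return (numa_node, local_cpus) for a PCI network interface.

    numa_node is -1 when the kernel reports no NUMA affinity for the device.
    """
    dev = Path(sysfs) / iface / "device"
    node = int((dev / "numa_node").read_text())
    cpus = parse_cpulist((dev / "local_cpulist").read_text())
    return node, cpus


def irq_cpus(irq, procfs="/proc/irq"):
    """Return the CPUs an IRQ is currently allowed to run on."""
    path = Path(procfs) / str(irq) / "smp_affinity_list"
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    import sys

    # "eth0" is a placeholder; pass the real interface name as an argument.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    node, cpus = nic_locality(iface)
    print(f"{iface}: NUMA node {node}, local CPUs {cpus}")
```

Comparing the output of irq_cpus() for the NIC's IRQs (listed in
/proc/interrupts) against the local CPU list shows whether interrupts are
staying on the socket the card hangs off, and which cores to keep workers
away from if you want to confine interrupts to dedicated cores.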

> > See attached! (will email you direct if the list whinges).  Setup for 20
> > cores, 1 dump with 2 cores per worker, the other with 1 core per worker.
>
> Thank you for sharing these helpful backtraces.

Hope you find something useful!

> I am not sure I share your disappointment in terms of performance: A
> robot=thread model would only scale well for very busy robots, which is
> both unrealistic (in most cases) and already supported (by configuring
> one robot per worker).
>
> Polygraph was born before SMP became a thing on regular machines we used
> for drones. If we were to write it from scratch today, we would have
> used threads for ease of worker management/synchronization, but we would
> still not dedicate a thread to each robot because such rigid and
> expensive architecture would not scale in many realistic simulations
> that use thousands of robots.

All good; Polygraph does well for what it is.  Knowing more about the
internal architecture allows me to work around it (more cores!) and still
achieve the testing goals.

> Cheers,
>
> Alex.

Regards,
William