Bench SMP mode

Mon Jul 30 00:15:12 UTC 2018

Hi All,
I'm trying to move away from launching multiple polygraph-server/client
instances via a script (each pinned to a specific core, given a subset of
fake_hosts, and sharing the same config) and towards SMP mode, where I
assume you launch one instance and it uses as many cores as it needs.

However, going by posts on the users list and the only mention of cpu_cores
I can find (the change log for 4.9.0), I'm having some trouble getting it to
work; there's nothing in the documentation about it.

I have the following bench config:

Bench sslBench = {
  client_side = {
    max_host_load = 100/sec;
    max_agent_load = 1/sec;
    addr_space = [ '198.18.24-27.10-249/22' ];
    hosts =  [ '198.18.24.2' ] ** 750;
    cpu_cores = [ 65535 ];
  };
  server_side = {
    max_host_load = client_side.max_host_load;
    max_agent_load = client_side.max_agent_load;
    addr_space = [ '198.18.28-29.10-249:443/23' ];
    hosts = [ '198.18.28.2' ] ** 480;
    cpu_cores = [ 65535 ];
  };
};
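
As a sanity check on the address counts in that config, here is a small
Python sketch that expands the dotted-range notation and counts addresses.
It assumes each "a-b" octet expands to every value in that range and that
the ":port" and "/prefix" suffixes do not change the count; count_addresses
is a hypothetical helper, not part of Polygraph.

```python
def count_addresses(spec: str) -> int:
    """Count addresses in a dotted-range spec such as '198.18.24-27.10-249/22'."""
    # Drop the optional '/prefix' suffix, then the optional ':port' suffix.
    host_part = spec.split("/")[0].split(":")[0]
    total = 1
    for octet in host_part.split("."):
        if "-" in octet:
            low, high = (int(x) for x in octet.split("-"))
            total *= high - low + 1  # inclusive range
        # A single-value octet contributes a factor of 1.
    return total

client = count_addresses("198.18.24-27.10-249/22")
server = count_addresses("198.18.28-29.10-249:443/23")
print(client, server)
```

Under those assumptions the client range holds 960 addresses and the server
range 480, against hosts arrays repeated 750 and 480 times respectively.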

I have two hosts, one client and one server:

<client host>     -     <DUT>     -     <server host>
198.18.24.2/22         .1   .1          198.18.28.2/23

All hosts are connected via 10Gig Eth.  To get this going, the DUT is just
a router at this point, and it can handle 5Gbit/s before it becomes a
bottleneck (I'm not getting anywhere near that load).

However, while it launches large numbers of servers/robots, they all run in
a single thread, which maxes out one core (out of 80 logical) and causes a
corresponding rise in latency. My test only reaches ~600Mbit/s before it
maxes out the core it's running on.
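
For a rough idea of scale, this is a back-of-the-envelope estimate of how
many cores the 5Gbit/s figure would need. It assumes throughput scales
linearly with cores at the ~600Mbit/s per core seen here, which is
optimistic; cores_needed is a hypothetical helper.

```python
import math

def cores_needed(target_mbit: float, per_core_mbit: float) -> int:
    """Smallest whole number of cores covering the target, assuming linear scaling."""
    return math.ceil(target_mbit / per_core_mbit)

print(cores_needed(5000, 600))
```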

The problem I'm having is getting the CPU mask into a format that WPG will
understand.  I've tried arrays containing:
1,2,3,4,5,6,7,8
'1','2','3','4','5','6','7','8'
0xffffffff
65535

All of them are rejected with the error "incorrect format cpu_cores".
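
For reference, this is what the two numeric values would select if they
were read as CPU bitmasks (bit N set means core N). Whether Polygraph
treats cpu_cores entries as bitmasks at all is an assumption on my part,
and the error suggests it expects something else; cores_from_mask is a
hypothetical helper.

```python
def cores_from_mask(mask: int) -> list:
    """Return the indices of the set bits in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

for value in (65535, 0xFFFFFFFF):
    cores = cores_from_mask(value)
    print(f"{value:#x}: {len(cores)} cores, {cores[0]}..{cores[-1]}")
```

Read that way, 65535 would cover cores 0 to 15 and 0xffffffff cores 0 to 31,
both well short of the 80 logical cores on the box.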

This is only part of my problem, though. The other issue I was trying to
get to the bottom of (which I thought might be related to the 26 client and
server instances I'm running on each box) is that after about 7 minutes
latency suddenly spikes to >40 seconds.  In that situation, none of the
processes is using more than 26% of the single core it's running on, and
throughput is still well under what the DUT can handle.

Help?

Thanks.

Kind Regards,

William Law