From william.law at tesserent.com  Mon Jul 30 00:15:12 2018
From: william.law at tesserent.com (William Law)
Date: Mon, 30 Jul 2018 10:15:12 +1000
Subject: Bench SMP mode
Message-ID: <5885a856e68e7b7c9e30f00df4586299@mail.gmail.com>

Hi All,

I'm trying to move away from launching multiple polygraph-server/client
instances via a script and allocating a specific core, fake_hosts subset
and same config to utilising SMP mode (which I assume you launch one
instance and it uses as many cores as it needs).

However based on stuff in the user list and the only mention of cpu_cores
in the change logs for 4.9.0 I seem to be having a little trouble (there's
nothing in the documentation for this).

I have the following bench config:

Bench sslBench = {
    client_side = {
        max_host_load = 100/sec;
        max_agent_load = 1/sec;
        addr_space = [ '198.18.24-27.10-249/22' ];
        hosts = [ '198.18.24.2' ] ** 750;
        cpu_cores = [ 65535 ];
    };
    server_side = {
        max_host_load = client_side.max_host_load;
        max_agent_load = client_side.max_agent_load;
        addr_space = [ '198.18.28-29.10-249:443/23' ];
        hosts = [ '198.18.28.2' ] ** 480;
        cpu_cores = [ 65535 ];
    };
};

I have two hosts, one client one server:

    - -
    198.18.24.2/22 .1 .1 198.18.28.2/23

All hosts are connected via 10Gig Eth. For getting this going, the DUT is
just a router at this point, and can handle 5Gbit/s before it starts being
a slow point (but I'm not getting anywhere near that load).

But while it launches huge numbers of servers/robots, they are all in the
one thread, resulting in it maxing out a single core (out of 80 logical),
then showing an associated rise in latency. My test is only hitting
~600Mbit/s before it maxes out the core it's running on.

The issue that I'm getting is trying to get the cpu mask correct for what
WPG will understand. I've tried arrays with:

    1,2,3,4,5,6,7,8
    '1','2','3','4','5','6','7','8'
    0xffffffff
    65535

And all of them get rejected with the error "incorrect format cpu_cores".
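For reference, here is a small Python sketch of what the numeric values
tried above would mean if they were read as CPU affinity bitmasks. Whether
Polygraph's cpu_cores expects a bitmask, a list of core indices, or some
other form is not established anywhere in this message, so this is only an
illustration of the mask arithmetic. The helper names mask_to_cores and
cores_to_mask are made up for the example and are not part of Polygraph.

```python
def mask_to_cores(mask):
    """Return the zero-based core indices whose bits are set in mask.

    Hypothetical helper for illustration only; not a Polygraph API.
    """
    if mask < 0:
        raise ValueError("mask must be non-negative")
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def cores_to_mask(cores):
    """Return the bitmask with one bit set per zero-based core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


if __name__ == "__main__":
    # The two numeric values tried in the config, interpreted as masks.
    for value in (65535, 0xFFFFFFFF):
        cores = mask_to_cores(value)
        print(hex(value), "selects", len(cores), "cores:",
              cores[0], "to", cores[-1])

    # The list 1..8 from the first attempt, converted to a mask.
    print("cores 1..8 as a mask:", hex(cores_to_mask(range(1, 9))))
```

Read as masks, 65535 (0xffff) selects cores 0 to 15 and 0xffffffff selects
cores 0 to 31, so neither would cover all 80 logical cores on the box.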

This is only part of my problem though. The other issue that I was trying
to get to the bottom of (which I thought may be related to the 26 client
and server instances I'm running on each box) was that after about 7
minutes I was getting latency suddenly spiking to >40 seconds. In that
situation, none of the processes is running at more than 26% of the single
core it's running on, and throughput is still way under what the DUT can
handle.

Help?

Thanks.

Kind Regards,
William Law