Bench SMP mode

Alex Rousskov rousskov at measurement-factory.com
Wed Aug 1 22:33:43 UTC 2018


On 07/29/2018 06:15 PM, William Law wrote:

> I'm trying to move away from launching multiple polygraph-server/client
> instances via a script and allocating a specific core, fake_hosts subset
> and same config to utilising SMP mode (which I assume you launch one
> instance and it uses as many cores as it needs).

Your assumption is correct. You should not use fake hosts though. Let
Polygraph create aliases for you.


> I have the following bench config:
> 
> Bench sslBench = {
>   client_side = {
>     max_host_load = 100/sec;
>     max_agent_load = 1/sec;
>     addr_space = [ '198.18.24-27.10-249/22' ];
>     hosts =  [ '198.18.24.2' ] ** 750;
>     cpu_cores = [ 65535 ];
>   };
>   server_side = {
>     max_host_load = client_side.max_host_load;
>     max_agent_load = client_side.max_agent_load;
>     addr_space = [ '198.18.28-29.10-249:443/23' ];
>     hosts = [ '198.18.28.2' ] ** 480;
>     cpu_cores = [ 65535 ];
>   };
> };

Your "750" and "480" should be the number of cores that you want to use
on hosts 198.18.24.2 and 198.18.28.2, respectively. It is best to put
that multiplier inside the address array.

Your [ 65535 ] should be an array of arrays of core IDs, one inner array
per SMP worker, telling the corresponding worker which core(s) to use.

Here is a better (but untested) version of your Bench, using 4 cores per
drone, numbered 2 through 5.

  Bench sslBench = {
    client_side = {
      max_host_load = 100/sec;
      max_agent_load = 1/sec;
      addr_space = [ 'lo::198.18.24-27.10-249/32' ];
      hosts =  [ '198.18.24.2' ** 4 ];
      cpu_cores = [ [2], [3], [4], [5] ];
    };
    server_side = {
      max_host_load = client_side.max_host_load;
      max_agent_load = client_side.max_agent_load;
      addr_space = [ 'lo::198.18.28-29.10-249:443/32' ];
      hosts = [ '198.18.28.2' ** 4 ];
      cpu_cores = client_side.cpu_cores;
    };
  };

Avoid assigning workers to virtual cores that share a physical core: Two
busy virtual cores can do _less_ work than the one real core they share.

If you really have 40 physical cores, and you want to use most of them,
then you would probably want to generate the cpu_cores array by a
script. We should add PGL function(s) to generate typical CPU affinity
map(s) for a given number of cores. Quality patches or sponsorships welcome.
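
Until such PGL functions exist, a small script can print the arrays for
pasting into the workload. Below is a minimal, untested-against-Polygraph
sketch in Python. The function names (cpu_cores_pgl, hosts_pgl) and the
"step" parameter are made up for illustration; the sketch assumes one
core per worker and that usable core IDs are evenly spaced, which may not
match how your OS numbers virtual cores.

```python
def cpu_cores_pgl(first_core, workers, step=1):
    """Return PGL text for cpu_cores: one single-core inner array per worker.

    step > 1 skips core IDs (hypothetical knob, e.g. to avoid sibling
    virtual cores if your OS numbers them adjacently).
    """
    if first_core < 0 or workers < 1 or step < 1:
        raise ValueError("need first_core >= 0, workers >= 1, step >= 1")
    core_ids = range(first_core, first_core + workers * step, step)
    return "[ " + ", ".join("[%d]" % c for c in core_ids) + " ]"


def hosts_pgl(host, workers):
    """Return PGL text for hosts, with the multiplier inside the array."""
    return "[ '%s' ** %d ]" % (host, workers)


if __name__ == "__main__":
    # Example: 36 workers on cores 2 through 37 (numbers are illustrative).
    print("hosts = %s;" % hosts_pgl("198.18.24.2", 36))
    print("cpu_cores = %s;" % cpu_cores_pgl(2, 36))
```

Calling cpu_cores_pgl(2, 4) yields the same array as in the Bench above.
The number of workers passed to both functions should match.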

Note that I added loopback interfaces to addr_space and changed their
subnet to /32. You should let Polygraph create these aliases (and the
corresponding robots and servers) on the loopback interface and then
configure a couple of simple routes for all agents to be able to talk to
each other (or the proxy). No --fake_hosts!
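
What those routes look like depends on your network. As a rough sketch
only, the Python below prints Linux iproute2 commands that send each
side's agent address range to the other host. It assumes Linux, that the
other host is directly reachable (on-link), and that there is no proxy in
the path; with a proxy, the next hop would be the proxy instead. The
covering networks 198.18.24.0/22 and 198.18.28.0/23 are derived from the
addr_space ranges above, and route_command is a made-up helper name.

```python
import ipaddress


def route_command(agent_net, next_hop):
    """Build an 'ip route add' command for one agent network.

    ip_network() validates the prefix and raises ValueError if the
    address has host bits set, catching typos early.
    """
    net = ipaddress.ip_network(agent_net)
    hop = ipaddress.ip_address(next_hop)
    return "ip route add %s via %s" % (net, hop)


if __name__ == "__main__":
    # Run on the client host: reach server agents via the server host.
    print(route_command("198.18.28.0/23", "198.18.28.2"))
    # Run on the server host: reach robots via the client host.
    print(route_command("198.18.24.0/22", "198.18.24.2"))
```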


> after about 7
> minutes I was getting latency suddenly spiking to >40 seconds.  In that
> situation, none of the processes is running at more than 26% of the single
> core it's running on, and still way under the throughput that the DUT can
> handle.


I suspect your OS ran out of some resource like RAM for Polygraph
processes, conntrack buffer space, or ephemeral ports. Check system
logs. If that does not help, try monitoring with atop(1).
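
To make two of those suspects concrete, here is a minimal Python sketch
that reports the ephemeral port range and connection tracking table
usage. It assumes a Linux host; the /proc paths are the usual Linux
locations, and the conntrack files exist only when the nf_conntrack
module is loaded. The helper names are made up for illustration.

```python
def read_text(path):
    """Return file contents, or None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def ephemeral_port_count(text):
    """Count ports in an ip_local_port_range value such as '32768 60999'."""
    low, high = (int(x) for x in text.split())
    return high - low + 1


def conntrack_usage(count_text, max_text):
    """Return the fraction of the conntrack table currently in use."""
    return int(count_text) / int(max_text)


if __name__ == "__main__":
    ports = read_text("/proc/sys/net/ipv4/ip_local_port_range")
    if ports is not None:
        print("ephemeral ports available:", ephemeral_port_count(ports))

    count = read_text("/proc/sys/net/netfilter/nf_conntrack_count")
    limit = read_text("/proc/sys/net/netfilter/nf_conntrack_max")
    if count is not None and limit is not None:
        print("conntrack usage: %.0f%%" % (100 * conntrack_usage(count, limit)))
```

Running it periodically during a test may show whether either number
approaches its limit around the time latency spikes.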


HTH,

Alex.

