Bench SMP mode

William Law william.law at tesserent.com
Thu Aug 2 00:03:04 UTC 2018


Hi Alex,
> -----Original Message-----
> From: Alex Rousskov [mailto:rousskov at measurement-factory.com]
> Sent: Thursday, 2 August 2018 8:34 AM
> To: William Law; users at lists.web-polygraph.org
> Subject: Re: Bench SMP mode
>
> On 07/29/2018 06:15 PM, William Law wrote:
>
> > I'm trying to move away from launching multiple polygraph-server/client
> > instances via a script and allocating a specific core, fake_hosts subset
> > and same config to utilising SMP mode (which I assume you launch one
> > instance and it uses as many cores as it needs).
>
> Your assumption is correct. You should not use fake hosts though. Let
> Polygraph create aliases for you.
>

I'm using fake hosts to limit each instance to a subset of the IPs in the
greater pool.  This was helping to reduce the CPU load that each agent was
consuming.

> > I have the following bench config:
> >
> > Bench sslBench = {
> >   client_side = {
> >     max_host_load = 100/sec;
> >     max_agent_load = 1/sec;
> >     addr_space = [ '198.18.24-27.10-249/22' ];
> >     hosts =  [ '198.18.24.2' ] ** 750;
> >     cpu_cores = [ 65535 ];
> >   };
> >   server_side = {
> >     max_host_load = client_side.max_host_load;
> >     max_agent_load = client_side.max_agent_load;
> >     addr_space = [ '198.18.28-29.10-249:443/23' ];
> >     hosts = [ '198.18.28.2' ] ** 480;
> >     cpu_cores = [ 65535 ];
> >   };
> > };
>
> Your "750" and "480" should be the number of cores on hosts 198.18.24.2
> and 198.18.28.2 (cores that you want to use). It is best to put that
> multiplier inside the address array.

I did it this way because it produced exactly the number of servers and
robots needed to cover the IP space.
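
For reference, a minimal sketch of how that count can be checked. It assumes
that every dashed octet range in an addr_space entry expands to all values in
the range; count_addresses is a hypothetical helper, not part of Polygraph.

```python
import re


def count_addresses(spec: str) -> int:
    """Count IPv4 addresses in a range such as '198.18.28-29.10-249:443/23'.

    Assumption: each dashed octet range expands to every value in it.
    """
    # Drop an optional 'ifname::' prefix, then the ':port' or '/mask' suffix.
    host = spec.split("::")[-1]
    host = re.split(r"[:/]", host, maxsplit=1)[0]
    total = 1
    for octet in host.split("."):
        lo, _, hi = octet.partition("-")
        total *= int(hi or lo) - int(lo) + 1
    return total


print(count_addresses("198.18.28-29.10-249:443/23"))  # 480
```

For the server side this gives 2 x 240 = 480 addresses.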

>
> Your [ 65535 ] should be an array of arrays of core IDs, one inner array
> per SMP worker, telling the corresponding worker which core(s) to use.
>

This was just the last value I ended up with while trying to find something
for the field that the code wouldn't grumble about.

>
> Here is a better (but untested) version of your Bench, using 4 cores per
> drone, numbered 2 through 5.
>
>   Bench sslBench = {
>     client_side = {
>       max_host_load = 100/sec;
>       max_agent_load = 1/sec;
>       addr_space = [ 'lo::198.18.24-27.10-249/32' ];
>       hosts =  [ '198.18.24.2' ** 4 ];
>       cpu_cores = [ [2], [3], [4], [5] ];
>     };
>     server_side = {
>       max_host_load = client_side.max_host_load;
>       max_agent_load = client_side.max_agent_load;
>       addr_space = [ 'lo::198.18.28-29.10-249:443/32' ];
>       hosts = [ '198.18.28.2' ** 4 ];
>       cpu_cores = client_side.cpu_cores;
>     };
>   };
>
> Avoid sharing physical cores among virtual cores: Two busy virtual cores
> can do _less_ than one real core they share.

I might turn off hyperthreading then and give what you have provided a go.
I'll let you know.

>
> If you really have 40 physical cores, and you want to use most of them,
> then you would probably want to generate the cpu_cores array by a
> script. We should add PGL function(s) to generate typical CPU affinity
> map(s) for a given number of cores. Quality patches or sponsorships
> welcome.

I'm running 2 Dell R830s w/ 4x Xeon E5-4620 v4s and 256GB RAM each (also
8x 10G and 6x 1G eth ports).
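
A minimal sketch of the kind of generator script suggested above. It assumes
that core IDs are consecutive and that each ID maps to a distinct physical
core (which depends on how the OS numbers cores, and on hyperthreading being
off); cpu_cores_pgl is a hypothetical helper, not a PGL function.

```python
def cpu_cores_pgl(first_core: int, workers: int, cores_per_worker: int = 1) -> str:
    """Build a PGL cpu_cores line with one inner array of core IDs per SMP worker.

    Assumption: core IDs are consecutive, starting at first_core.
    """
    groups = []
    for w in range(workers):
        start = first_core + w * cores_per_worker
        ids = ", ".join(str(c) for c in range(start, start + cores_per_worker))
        groups.append(f"[{ids}]")
    return "cpu_cores = [ " + ", ".join(groups) + " ];"


print(cpu_cores_pgl(first_core=2, workers=4))
# cpu_cores = [ [2], [3], [4], [5] ];
```

The number of workers passed in would need to match the multiplier used in
the hosts array, e.g. workers=38 to use cores 2 through 39 of a 40-core box.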

> Note that I added loopback interfaces to addr_space and changed their
> subnet to /32. You should let Polygraph create these aliases (and the
> corresponding robots and servers) on the loopback interface and then
> configure a couple of simple routes for all agents to be able to talk to
> each other (or the proxy). No --fake_hosts!

I'm curious as to why you attach the IPs to lo and not the interface that
is connected to the DUT, though.

> > after about 7
> > minutes I was getting latency suddenly spiking to >40 seconds.  In that
> > situation, none of the processes is running at more than 26% of the single
> > core it's running on, and still way under the throughput that the DUT can
> > handle.
>
>
> I suspect your OS ran out of some resource like RAM for Polygraph
> processes, conntrack buffer space, or ephemeral ports. Check system
> logs. If that does not help, try monitoring with atop(1).

Conntrack isn't an issue here (not used on the client/server endpoints) nor
is RAM usage (see my note on the specs above).  It might be ephemeral ports,
but I don't think I'm establishing enough connections to exhaust the
1025-65535 port range that is configured.  Again, I'll check and let you
know.
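
A rough way to sanity-check the ephemeral port question, as a sketch only. It
assumes that closed connections sit in TIME_WAIT for about 60 seconds (the
usual Linux value, but an assumption about these hosts) and that port reuse
is limited per source IP and destination IP:port pair; max_conn_rate is a
hypothetical helper.

```python
def max_conn_rate(port_lo: int, port_hi: int, time_wait_sec: float) -> float:
    """Upper bound on new connections/sec from one source IP to one
    destination IP:port before the ephemeral port range is exhausted.

    Assumption: every closed connection holds its port for time_wait_sec.
    """
    ports = port_hi - port_lo + 1
    return ports / time_wait_sec


# Configured range 1025-65535, assumed 60-second TIME_WAIT.
print(int(max_conn_rate(1025, 65535, 60)))  # 1075
```

With max_agent_load = 1/sec per robot address, that bound is far above what a
single robot IP should need, which is why exhaustion seems unlikely to me.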

>
> HTH,
>
> Alex.

Kind Regards,

William Law

