From morakhad at cisco.com  Tue Aug 2 13:51:09 2011
From: morakhad at cisco.com (Mohammed Rakhada)
Date: Tue, 02 Aug 2011 14:51:09 +0100
Subject: Incorrect throughput being seen in report and differing PGL configs
Message-ID: <1312293069.2165.34.camel@localhost>

Hello,

I am using Web Polygraph version 4.4.0 and running against a proxy server
using 8 clients and 8 servers. After the report is generated I see the
following data:

label:             Tue Aug 2 13:52:51 BST 2011
throughput:        4196.00xact/sec or 608.43Mbits/sec
response time:     49msec mean
hit ratios:        18.86% DHR and 6.88% BHR
unique URLs:       820565xact (35.40% recurrence)
errors:            0.00% (8xact out of 1270224xact)
duration:          5.05min
start time:        Tue, 02 Aug 2011 12:53:18 GMT
workload:          available
Polygraph version: 4.4.0
reporter version:  4.4.0

However, when I look at the switch statistics, the number reported is much
lower (360Mbits/sec).

Could you clarify what the throughput value actually relates to?

I am also seeing the following message when trying to run
polygraph-reporter. Could you help me avoid this error?

PGL configuration in /opt/stress/scripts/tmp/sss-101-strs_1312291541.log
differs from the one in /opt/stress/scripts/tmp/sss-103-strs_1312291541.log

All the configs are identical apart from the "use" line at the bottom,
which I generate to pair up my servers and clients. The attached file
192.168.29.101.polygraph.pg is an example of this line; each server/client
pair has the same line, but the next pair has a different one.

Attached are my Polygraph files to help you understand my setup.

Please let me know if you require further information or clarification.

Thanks in advance.

Mohammed Rakhada
Systems Administrator
Cisco Ltd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 192.168.29.101.polygraph.pg
Type: text/x-csrc
Size: 5610 bytes
Desc: not available
-------------- next part --------------
Robot sss101strs = {
    kind = "sss101strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.202:8080' ];
    http_proxies = [ '192.168.30.50:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.201' ** 1000 ];
};

Robot sss103strs = {
    kind = "sss103strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.204:8080' ];
    http_proxies = [ '192.168.30.50:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.203' ** 1000 ];
};

Robot sss105strs = {
    kind = "sss105strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.206:8080' ];
    http_proxies = [ '192.168.30.60:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.205' ** 1000 ];
};

Robot sss107strs = {
    kind = "sss107strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.208:8080' ];
    http_proxies = [ '192.168.30.60:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.207' ** 1000 ];
};

Robot sss109strs = {
    kind = "sss109strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.210:8080' ];
    http_proxies = [ '192.168.30.70:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.209' ** 1000 ];
};

Robot sss111strs = {
    kind = "sss111strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.212:8080' ];
    http_proxies = [ '192.168.30.70:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.211' ** 1000 ];
};

Robot sss113strs = {
    kind = "sss113strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.214:8080' ];
    http_proxies = [ '192.168.30.80:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.213' ** 1000 ];
};

Robot sss115strs = {
    kind = "sss115strs";
    interests = [ "public": 30%, "foreign" ];
    //user traces
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;
    //origins = M.names;
    origins = [ '192.168.30.216:8080' ];
    http_proxies = [ '192.168.30.80:8080' ];
    //origins = S.addresses;
    addresses = [ '192.168.30.215' ** 1000 ];
};

Server sss102strs = {
    kind = "sss102strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.202:8080', '192.168.30.202:80' ];
};

Server sss104strs = {
    kind = "sss104strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.204:8080', '192.168.30.204:80' ];
};

Server sss106strs = {
    kind = "sss106strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.206:8080', '192.168.30.206:80' ];
};

Server sss108strs = {
    kind = "sss108strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.208:8080', '192.168.30.208:80' ];
};

Server sss110strs = {
    kind = "sss110strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.210:8080', '192.168.30.210:80' ];
};

Server sss112strs = {
    kind = "sss112strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.212:8080', '192.168.30.212:80' ];
};

Server sss114strs = {
    kind = "sss114strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.214:8080', '192.168.30.214:80' ];
};

Server sss116strs = {
    kind = "sss116strs";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;
    //addresses = M.addresses;
    addresses = [ '192.168.30.216:8080', '192.168.30.216:80' ];
};

From dmitry.kurochkin at measurement-factory.com  Tue Aug 2 17:15:29 2011
From: dmitry.kurochkin at measurement-factory.com (Dmitry Kurochkin)
Date: Tue, 02 Aug 2011 21:15:29 +0400
Subject: Incorrect throughput being seen in report and differing PGL configs
In-Reply-To: <1312293069.2165.34.camel@localhost>
References: <1312293069.2165.34.camel@localhost>
Message-ID: <87oc074tvy.fsf@gmail.com>

Hi.

On Tue, 02 Aug 2011 14:51:09 +0100, Mohammed Rakhada wrote:
> Hello,
>
> I am using Web Polygraph version 4.4.0 and running against a proxy server
> using 8 clients and 8 servers. After the report is generated I see the
> following data:
>
> label:             Tue Aug 2 13:52:51 BST 2011
> throughput:        4196.00xact/sec or 608.43Mbits/sec
> response time:     49msec mean
> hit ratios:        18.86% DHR and 6.88% BHR
> unique URLs:       820565xact (35.40% recurrence)
> errors:            0.00% (8xact out of 1270224xact)
> duration:          5.05min
> start time:        Tue, 02 Aug 2011 12:53:18 GMT
> workload:          available
> Polygraph version: 4.4.0
> reporter version:  4.4.0
>
> However, when I look at the switch statistics, the number reported is
> much lower (360Mbits/sec).
>
> Could you clarify what the throughput value actually relates to?
>

Throughput on the index page of the HTML report is client-side reply
throughput, i.e. (size of all replies the clients received) / (test
duration). It does not include requests or replies sent by the servers.

I am not sure why you see lower throughput stats on the switch. You may get
wrong stats in the reporter if you specify a single log multiple times on
the command line. I do not think it is likely, but this may be a bug in the
Polygraph reporter or client.

If you believe the Polygraph stats are wrong, I recommend you start by
checking that the throughput in the reporter is calculated correctly from
the binary logs.
Make sure you do not specify any log twice in the reporter parameters. Try
generating a report for a single log: the throughput for multiple logs
should equal the sum of the per-log throughputs. You may also send us the
Polygraph binary logs for investigation.

> I am also seeing the following message when trying to run
> polygraph-reporter. Could you help me avoid this error?
>
> PGL configuration in /opt/stress/scripts/tmp/sss-101-strs_1312291541.log
> differs from the one in
> /opt/stress/scripts/tmp/sss-103-strs_1312291541.log
>
> All the configs are identical apart from the "use" line at the bottom,
> which I generate to pair up my servers and clients. The attached file
> 192.168.29.101.polygraph.pg is an example of this line; each
> server/client pair has the same line, but the next pair has a different
> one.
>

To get rid of the warning you should use the same workload for all
Polygraph client and server processes. That should be simple for your
current workload: use a single PGL Robot and a single Server, with
addresses, origins, and http_proxies set to the lists of all addresses you
need. E.g.:

Robot R = {
    ...
    origins = [ all server addresses ];
    addresses = [ all client addresses ];
    http_proxies = [ all HTTP proxy addresses ];
};

When a Polygraph client (server) starts, it checks the network interfaces
configured on the host and starts only those Robots (Servers) that use a
locally configured address. So when you run such a workload on different
hosts, agents with different addresses are started, which is what you want,
I guess. This would also result in all Robots making requests to all
Servers (with your current workload, Robots running on a given host always
make requests to a single Server through a single HTTP proxy).

Note: you may copy PGL objects to avoid setting the same properties
multiple times, e.g.:

Robot Base = {
    // common settings
    ...
};

Robot R1 = Base;
R1.req_types = ...; // set R1-specific properties
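Putting those two ideas together, a single shared workload for your setup
might look roughly like the sketch below. It is untested: I have only
merged the address lists from your attached .pg file, and the object names
and "kind" labels are arbitrary, so treat it as a starting point rather
than a drop-in replacement.

Robot AllClients = {
    kind = "strs-clt";
    interests = [ "public": 30%, "foreign" ];
    foreign_trace = "/opt/home/mrakhada/t53.urls.httponly.ports";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 25% / cntImage.cachable;
    req_rate = 1/sec;

    // all origin server addresses from the per-host configs
    origins = [
        '192.168.30.202:8080', '192.168.30.204:8080',
        '192.168.30.206:8080', '192.168.30.208:8080',
        '192.168.30.210:8080', '192.168.30.212:8080',
        '192.168.30.214:8080', '192.168.30.216:8080'
    ];

    // all HTTP proxies
    http_proxies = [
        '192.168.30.50:8080', '192.168.30.60:8080',
        '192.168.30.70:8080', '192.168.30.80:8080'
    ];

    // all robot addresses; each host only starts the Robots whose
    // addresses are configured locally
    addresses = [
        '192.168.30.201' ** 1000, '192.168.30.203' ** 1000,
        '192.168.30.205' ** 1000, '192.168.30.207' ** 1000,
        '192.168.30.209' ** 1000, '192.168.30.211' ** 1000,
        '192.168.30.213' ** 1000, '192.168.30.215' ** 1000
    ];
};

Server AllServers = {
    kind = "strs-srv";
    contents = [ cntJPG: 40%, cntGIF: 45%, cntPNG: 14%, cntPDF: 1% ];
    direct_access = contents;

    // all server listening addresses; again, each host only starts the
    // Servers whose addresses are configured locally
    addresses = [
        '192.168.30.202:8080', '192.168.30.202:80',
        '192.168.30.204:8080', '192.168.30.204:80',
        '192.168.30.206:8080', '192.168.30.206:80',
        '192.168.30.208:8080', '192.168.30.208:80',
        '192.168.30.210:8080', '192.168.30.210:80',
        '192.168.30.212:8080', '192.168.30.212:80',
        '192.168.30.214:8080', '192.168.30.214:80',
        '192.168.30.216:8080', '192.168.30.216:80'
    ];
};

With a single Robot and a single Server like this, every host can share the
same workload file and the same use(...) line, so the "PGL configuration
differs" warning should go away.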
Regards,
  Dmitry

> Attached are my Polygraph files to help you understand my setup.
>
> Please let me know if you require further information or clarification.
>
> Thanks in advance.
>
> Mohammed Rakhada
> Systems Administrator
> Cisco Ltd

From rousskov at measurement-factory.com  Thu Aug 4 14:12:42 2011
From: rousskov at measurement-factory.com (Alex Rousskov)
Date: Thu, 04 Aug 2011 08:12:42 -0600
Subject: Incorrect throughput being seen in report and differing PGL configs
In-Reply-To: <87oc074tvy.fsf@gmail.com>
References: <1312293069.2165.34.camel@localhost> <87oc074tvy.fsf@gmail.com>
Message-ID: <4E3AA8DA.9090201@measurement-factory.com>

On 08/02/2011 11:15 AM, Dmitry Kurochkin wrote:
> On Tue, 02 Aug 2011 14:51:09 +0100, Mohammed Rakhada wrote:
> >> throughput: 4196.00xact/sec or 608.43Mbits/sec
> >>
> >> however when I look at the switch statistics the number reported is
> >> much lower (360Mbits/sec)
> >>
> >> Could you clarify what the throughput value actually relates to?
> >>
>
> Throughput on the index page of the HTML report is client-side reply
> throughput, i.e. (size of all replies the clients received) / (test
> duration). It does not include requests or replies sent by the servers.
>
> I am not sure why you see lower throughput stats on the switch. You may
> get wrong stats in the reporter if you specify a single log multiple
> times on the command line. I do not think it is likely, but this may be
> a bug in the Polygraph reporter or client.

Another possibility here is that the switch is counting traffic volumes
over a longer (or shorter!) period of time, while the Polygraph reporter
uses the response volume during the specified test phase(s).

If you want to investigate this further, running longer tests with fixed
response sizes while looking at the runtime switch stats may be useful.
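For the fixed-size part, a PGL Content definition along these lines might
do (an untested sketch; the cntFixed name and the 16KB value are arbitrary,
and const() is meant to be the constant-value distribution):

Content cntFixed = {
    kind = "fixed";
    size = const(16KB); // every response body would be exactly 16KB
    cachable = 80%;
};

// and in the Server definition:
// contents = [ cntFixed ];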
HTH.

Alex.

From morakhad at cisco.com  Thu Aug 4 14:22:42 2011
From: morakhad at cisco.com (Mohammed Rakhada)
Date: Thu, 04 Aug 2011 15:22:42 +0100
Subject: Incorrect throughput being seen in report and differing PGL configs
In-Reply-To: <4E3AA8DA.9090201@measurement-factory.com>
References: <1312293069.2165.34.camel@localhost> <87oc074tvy.fsf@gmail.com>
 <4E3AA8DA.9090201@measurement-factory.com>
Message-ID: <1312467762.2165.41.camel@localhost>

Hello Alex,

Thanks for your reply; there isn't a problem here. It was my
interpretation of the data. I didn't realise the throughput was averaged
across the length of the test, so in my test, as there was a slow ramp
period (and it was a short test), it was giving me the "middle" value.
All looks fine and as expected.

Thanks to Dmitry for his initial input; I meant to reply to him earlier
but was still verifying some of my tests and results.

What would be nice is a maximum and a 95th-percentile measurement so that
we can see what the actual measured peak was.

As for the hosts and differing configs, I've implemented the changes
Dmitry suggested and they work fine.

Thanks

Mohammed Rakhada

On Thu, 2011-08-04 at 08:12 -0600, Alex Rousskov wrote:
> On 08/02/2011 11:15 AM, Dmitry Kurochkin wrote:
> > On Tue, 02 Aug 2011 14:51:09 +0100, Mohammed Rakhada wrote:
> >> throughput: 4196.00xact/sec or 608.43Mbits/sec
> >>
> >> however when I look at the switch statistics the number reported is
> >> much lower (360Mbits/sec)
> >>
> >> Could you clarify what the throughput value actually relates to?
> >
> > Throughput on the index page of the HTML report is client-side reply
> > throughput, i.e. (size of all replies the clients received) / (test
> > duration). It does not include requests or replies sent by the
> > servers.
> >
> > I am not sure why you see lower throughput stats on the switch. You
> > may get wrong stats in the reporter if you specify a single log
> > multiple times on the command line. I do not think it is likely, but
> > this may be a bug in the Polygraph reporter or client.
>
> Another possibility here is that the switch is counting traffic volumes
> over a longer (or shorter!) period of time, while the Polygraph
> reporter uses the response volume during the specified test phase(s).
>
> If you want to investigate this further, running longer tests with
> fixed response sizes while looking at the runtime switch stats may be
> useful.
>
> HTH.
>
> Alex.
-------------- next part --------------
An HTML attachment was scrubbed...

From dmitry.kurochkin at measurement-factory.com  Thu Aug 4 18:13:14 2011
From: dmitry.kurochkin at measurement-factory.com (Dmitry Kurochkin)
Date: Thu, 04 Aug 2011 22:13:14 +0400
Subject: Incorrect throughput being seen in report and differing PGL configs
In-Reply-To: <1312467762.2165.41.camel@localhost>
References: <1312293069.2165.34.camel@localhost> <87oc074tvy.fsf@gmail.com>
 <4E3AA8DA.9090201@measurement-factory.com>
 <1312467762.2165.41.camel@localhost>
Message-ID: <87ei11gi4l.fsf@gmail.com>

Hi Mohammed.

On Thu, 04 Aug 2011 15:22:42 +0100, Mohammed Rakhada wrote:
> Hello Alex,
>
> Thanks for your reply; there isn't a problem here. It was my
> interpretation of the data. I didn't realise the throughput was averaged
> across the length of the test, so in my test, as there was a slow ramp
> period (and it was a short test), it was giving me the "middle" value.
> All looks fine and as expected.

Good to hear.

> Thanks to Dmitry for his initial input; I meant to reply to him earlier
> but was still verifying some of my tests and results.
>
> What would be nice is a maximum and a 95th-percentile measurement so
> that we can see what the actual measured peak was.

It may not be exactly what you want, but take a look at the "load trace"
plot on the "traffic rates, counts, and volumes" page. Also, the
"everything" page has stats for individual objects, e.g. "Object 'hits
and misses'".

If you are interested in "raw" stats, you can get them with the
polygraph-lx(1) and polygraph-ltrace(1) tools. E.g.:

$ ltrace --win_len 30sec --side clt --objects rep.rate LOG

would give you reply rate stats with a 30sec interval. The Polygraph stats
cycle length is 5sec by default and can be changed with the --stats_cycle
option.

Regards,
  Dmitry

> As for the hosts and differing configs, I've implemented the changes
> Dmitry suggested and they work fine.
>
> Thanks
>
> Mohammed Rakhada
>
> On Thu, 2011-08-04 at 08:12 -0600, Alex Rousskov wrote:
> > On 08/02/2011 11:15 AM, Dmitry Kurochkin wrote:
> > > On Tue, 02 Aug 2011 14:51:09 +0100, Mohammed Rakhada wrote:
> > >> throughput: 4196.00xact/sec or 608.43Mbits/sec
> > >>
> > >> however when I look at the switch statistics the number reported is
> > >> much lower (360Mbits/sec)
> > >>
> > >> Could you clarify what the throughput value actually relates to?
> > >
> > > Throughput on the index page of the HTML report is client-side reply
> > > throughput, i.e. (size of all replies the clients received) / (test
> > > duration). It does not include requests or replies sent by the
> > > servers.
> > >
> > > I am not sure why you see lower throughput stats on the switch. You
> > > may get wrong stats in the reporter if you specify a single log
> > > multiple times on the command line. I do not think it is likely, but
> > > this may be a bug in the Polygraph reporter or client.
> >
> > Another possibility here is that the switch is counting traffic
> > volumes over a longer (or shorter!) period of time, while the
> > Polygraph reporter uses the response volume during the specified test
> > phase(s).
> >
> > If you want to investigate this further, running longer tests with
> > fixed response sizes while looking at the runtime switch stats may be
> > useful.
> >
> > HTH.
> >
> > Alex.
>
> _______________________________________________
> Users mailing list
> Users at web-polygraph.org
> http://www.web-polygraph.org/mailman/listinfo/users