Client side stats SSL workload 4.3.2
Michael Hendrie
pmix at hendrie.id.au
Fri May 17 08:11:46 UTC 2013
On 17/05/2013, at 2:29 AM, Alex Rousskov <rousskov at measurement-factory.com> wrote:
> On 05/16/2013 03:22 AM, Dmitry Kurochkin wrote:
>> Hi Michael.
>>
>> Michael Hendrie <pmix at hendrie.id.au> writes:
>>
>>> Hi List,
>>>
>>> I recently updated from Web-Polygraph 4.0.10 running on FreeBSD to
>>> Web-Polygraph 4.3.2 running on Linux (RHEL 6.1 clone) and now have some
>>> inconsistency being reported in the client side runtime stats.
>>>
>>> The issue only seems to occur when polygraph-client is configured to send
>>> requests via a proxy. In the client side stats, I'm seeing a significantly
>>> higher request rate than what was configured, on the server side stats they
>>> report as expected. e.g. on a test configured for 250req/sec:
>>>
>>> Client Side Stats:
>>> 030.41| i-top1  786670  421.83  1357  0.00  0  1671
>>>         ssl     448598  421.83  1360  0.00  0  1671
>>> 030.50| i-top1  788934  452.78  1379  0.00  0  1668
>>>         ssl     449857  452.78  1379  0.00  0  1668
>>> 030.58| i-top1  791229  458.98  1465  0.00  0  1660
>>>         ssl     451193  458.98  1465  0.00  0  1660
>>> 030.66| i-top1  793401  434.40  1461  0.00  0  1687
>>>         ssl     452449  434.40  1463  0.00  0  1687
>>> 030.75| i-top1  795476  415.02  1401  0.00  0  1723
>>>         ssl     453636  415.02  1402  0.00  0  1721
>>>
>>> Server Side Stats:
>>> 030.45| i-top1  426331  227.03  2518  0.00  0  2200
>>>         ssl     426331  227.03  2518  0.00  0  1637
>>> 030.54| i-top1  427519  237.59  2539  0.00  0  2201
>>>         ssl     427519  237.59  2539  0.00  0  1638
>>> 030.62| i-top1  428710  238.19  2467  0.00  0  2171
>>>         ssl     428710  238.19  2468  0.00  0  1608
>>> 030.70| i-top1  429962  250.40  2503  0.00  0  2300
>>>         ssl     429962  250.40  2502  0.00  0  1737
>>>
>>>
>>> I didn't see this inconsistency in 4.0.10, and if I run a direct
>>> client->server test (no proxy) then the stats are also reported correctly.
>>>
>>> The proxy itself (squid) is reporting around 190 client req/sec:
>>> client_http.requests = 188.674032/sec
>>>
>>> Having just read the changelog (after typing the above), I'm now wondering
>>> if the fix in the non-public 4.6.0 would resolve this issue:
>>>
>>> - Make retried transactions obey request inter-arrival
>>>   distribution and, hence, Robot::req_rate setting.
>>>
>>> Any thoughts/ideas?
>>>
>>
>> Since v4.2.0 CONNECT tunnel establishment is treated as a separate
>> transaction:
>>
>> - Treat an HTTP CONNECT response as the end of the transaction.
>> Collect and report stats dedicated to these new CONNECT-only
>> transactions. These CONNECT transactions are often rather
>> different from the in-tunnel transactions that follow.
>> Isolating them also prevents CONNECT overheads from polluting
>> the first in-tunnel transaction stats.
>>
>> So an HTTPS GET transaction over proxy is counted as two transactions on
>> the client side: CONNECT and GET.
>>
>> Detailed stats are available in the HTML report or can be extracted
>> from the binary log with the lx tool.
>>
>> The behavior is the same on all newer Polygraph versions, so upgrading
>> would not help.
>
> .... but there are plans to make baseline reporting match configuration
> better when dealing with what we call "compound" transactions. The
> overall principle behind the planned changes is to have reported
> baseline request rate (i-lines in the console output) match the
> configured Bench::peak_req_rate (or equivalent), regardless of whether
> the workload involves CONNECT requests and/or authentication.
>
> We will also report CONNECT (and authentication) activity separately,
> just like we report ssl activity separately today.
>
>
> We hope the above changes will reduce surprises caused by compound
> transactions involving multiple HTTP requests.
>
>
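To make the counting model from the quoted replies concrete, here is a rough
sketch in plain Python (not Polygraph code; the 10-request sample and the
"new_tunnel" flag are made up for illustration) of how one HTTPS GET issued
through a proxy becomes two client-side transactions whenever a fresh CONNECT
tunnel has to be opened, with the CONNECT leg tallied separately from the
baseline request:

# Illustrative sketch only -- not Polygraph source. It mimics the counting
# rule described above: establishing a CONNECT tunnel is its own client-side
# transaction, and the in-tunnel GET is another one.
def count_client_transactions(requests):
    """requests: one dict per baseline HTTPS GET; 'new_tunnel' marks
    requests that had to open a fresh CONNECT tunnel first."""
    baseline = len(requests)                # what peak_req_rate configures
    connects = sum(1 for r in requests if r["new_tunnel"])
    total = baseline + connects             # what the i-line counts today
    return {"baseline": baseline, "connect": connects, "i_line_total": total}

# Hypothetical sample: 8 of 10 GETs needed a new tunnel, roughly matching
# the ratio of squid's ~190 CONNECT/sec to the 250 GET/sec configured.
sample = [{"new_tunnel": i % 5 != 0} for i in range(10)]
print(count_client_transactions(sample))
# -> {'baseline': 10, 'connect': 8, 'i_line_total': 18}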
OK, so the numbers stack up then: squid is reporting ~190 req/sec (it will only see the CONNECTs, not how many GETs passed through the tunnel) and peak_req_rate is configured for 250, so the ~430 being reported client side sounds OK.
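A quick back-of-the-envelope check with the figures from this thread (a
sketch; the CONNECT rate is simply squid's client_http.requests counter
quoted above) puts the expected client-side transaction rate right in the
observed 415-460/sec band:

# Rough arithmetic using the numbers quoted in this thread.
configured_req_rate = 250.0    # configured peak_req_rate (in-tunnel GETs/sec)
connect_rate = 188.674032      # squid client_http.requests (CONNECTs/sec)

expected_client_rate = configured_req_rate + connect_rate
print(f"expected client-side xact rate: ~{expected_client_rate:.0f}/sec")
# ~439/sec: consistent with the 415-459/sec i-line rates above, while the
# server side still sees only the ~227-250/sec in-tunnel GETs.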
Unfortunately I have another problem then; I was hoping the above could be related to what I'm seeing.
If I run a direct test (no proxy involved), it will go for hours and never miss a beat (the last one ran for 15 hours). If I use the same workload but introduce a proxy with the --proxy switch, the test always bombs out somewhere between the 2 and 3 hour mark (I have tried at least 5 times), even though none of the devices (including the proxy) are reporting high load or resource exhaustion. Here is what the client console says before it bombs:
136.74| i-top1  3588097  406.69  1583  0.00  0  1681
        ssl     2048966  406.69  1584  0.00  0  1679
136.83| i-top1  3590106  400.26  1600  0.00  0  1659
        ssl     2050132  400.26  1602  0.00  0  1633
136.91| i-top1  3592036  364.13  1695  0.00  0  1719
        ssl     2051152  364.13  1696  0.00  0  1653
137.04| PolyApp.cc:189: error: 1/1 (c58) internal timers may be getting behind
137.04| record level of timing drift: 2.48sec; last check was 6.45sec ago
137.27| i-top1  3592779   34.36  11075  0.00  0  1609
        ssl     2051667   34.36  11075  0.00  0  1102
137.30| PolyApp.cc:189: error: 2/2 (c58) internal timers may be getting behind
137.30| record level of timing drift: 15.40sec; last check was 15.37sec ago
137.40| Connection.cc:581: error: 1/3 (s32) Broken pipe
137.40| error: raw close on SSL connection failed
137.40| connection between 172.17.20.35:31263 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.40| Connection.cc:581: error: 2/4 (s32) Broken pipe
137.40| error: raw close on SSL connection failed
137.40| connection between 172.17.20.173:43974 and 172.16.0.13:8080 failed at 4 reads, 6 writes, and 3 xacts
137.41| Connection.cc:581: error: 4/6 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.20.5:42461 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.41| Connection.cc:581: error: 8/10 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.20.245:35371 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.41| Connection.cc:581: error: 16/18 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.21.32:13088 and 172.16.0.13:8080 failed at 4 reads, 6 writes, and 3 xacts
137.43| i-top1  3593243   53.18  25644   0.00  0  1533
        ssl     2051843   53.18  25644   0.00  0   769
137.44| i-top1  3593243    0.00     -1  -1.00  0  1895
        ssl     2051843    0.00     -1  -1.00  0   748
Any ideas on this one?
Michael