From pmix at hendrie.id.au Thu May 16 00:55:51 2013
From: pmix at hendrie.id.au (Michael Hendrie)
Date: Thu, 16 May 2013 10:25:51 +0930
Subject: Client side stats SSL workload 4.3.2
Message-ID: 

Hi List,

I recently updated from Web-Polygraph 4.0.10 running on FreeBSD to
Web-Polygraph 4.3.2 running on Linux (RHEL 6.1 clone) and now see some
inconsistency in the client-side runtime stats.

The issue only seems to occur when polygraph-client is configured to send
requests via a proxy. In the client-side stats, I'm seeing a significantly
higher request rate than what was configured; on the server side the stats
are reported as expected. E.g. on a test configured for 250 req/sec:

Client Side Stats:
030.41| i-top1  786670  421.83  1357  0.00  0  1671
           ssl  448598  421.83  1360  0.00  0  1671
030.50| i-top1  788934  452.78  1379  0.00  0  1668
           ssl  449857  452.78  1379  0.00  0  1668
030.58| i-top1  791229  458.98  1465  0.00  0  1660
           ssl  451193  458.98  1465  0.00  0  1660
030.66| i-top1  793401  434.40  1461  0.00  0  1687
           ssl  452449  434.40  1463  0.00  0  1687
030.75| i-top1  795476  415.02  1401  0.00  0  1723
           ssl  453636  415.02  1402  0.00  0  1721

Server Side Stats:
030.45| i-top1  426331  227.03  2518  0.00  0  2200
           ssl  426331  227.03  2518  0.00  0  1637
030.54| i-top1  427519  237.59  2539  0.00  0  2201
           ssl  427519  237.59  2539  0.00  0  1638
030.62| i-top1  428710  238.19  2467  0.00  0  2171
           ssl  428710  238.19  2468  0.00  0  1608
030.70| i-top1  429962  250.40  2503  0.00  0  2300
           ssl  429962  250.40  2502  0.00  0  1737

I didn't see this inconsistency in 4.0.10, and if I run a direct
client->server test (no proxy) then stats are also reported correctly.

The proxy itself (squid) is reporting around 190 client req/sec:
client_http.requests = 188.674032/sec

Having just read the changelog (after typing the above), I'm now wondering
whether this fix in the non-public 4.6.0 would resolve the issue:

    - Make retried transactions obey request inter-arrival
      distribution and, hence, Robot::req_rate setting.

Any thoughts/ideas?

Michael

From dmitry.kurochkin at measurement-factory.com Thu May 16 09:22:05 2013
From: dmitry.kurochkin at measurement-factory.com (Dmitry Kurochkin)
Date: Thu, 16 May 2013 13:22:05 +0400
Subject: Client side stats SSL workload 4.3.2
In-Reply-To: 
References: 
Message-ID: <87obcb8eki.fsf@gmail.com>

Hi Michael.

Michael Hendrie writes:

> The issue only seems to occur when polygraph-client is configured to send
> requests via a proxy. In the client-side stats, I'm seeing a significantly
> higher request rate than what was configured; on the server side the stats
> are reported as expected.
[...]
> Any thoughts/ideas?

Since v4.2.0, CONNECT tunnel establishment is treated as a separate
transaction:

    - Treat an HTTP CONNECT response as the end of the transaction.
      Collect and report stats dedicated to these new CONNECT-only
      transactions. These CONNECT transactions are often rather
      different from the in-tunnel transactions that follow.
      Isolating them also prevents CONNECT overheads from polluting
      the first in-tunnel transaction stats.

So an HTTPS GET transaction over a proxy is counted as two transactions on
the client side: CONNECT and GET.

Detailed stats are available in the HTML report or can be extracted from
the binary log with the lx tool.
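For example, something along these lines (the log file names are made up,
and exact option syntax varies between versions, so check each tool's
--help output):

    # build the HTML report from the binary logs
    polygraph-reporter --label "ssl-250rps" client.log server.log

    # extract raw stats objects from the client-side binary log
    polygraph-lx client.log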
The behavior is the same in all newer Polygraph versions, so upgrading
would not help.

Regards,
  Dmitry

From rousskov at measurement-factory.com Thu May 16 16:59:32 2013
From: rousskov at measurement-factory.com (Alex Rousskov)
Date: Thu, 16 May 2013 10:59:32 -0600
Subject: Client side stats SSL workload 4.3.2
In-Reply-To: <87obcb8eki.fsf@gmail.com>
References: <87obcb8eki.fsf@gmail.com>
Message-ID: <51951074.6080104@measurement-factory.com>

On 05/16/2013 03:22 AM, Dmitry Kurochkin wrote:
> So an HTTPS GET transaction over a proxy is counted as two transactions on
> the client side: CONNECT and GET.
[...]
> The behavior is the same in all newer Polygraph versions, so upgrading
> would not help.

... but there are plans to make baseline reporting match configuration
better when dealing with what we call "compound" transactions. The
overall principle behind the planned changes is to have the reported
baseline request rate (i-lines in the console output) match the
configured Bench::peak_req_rate (or equivalent), regardless of whether
the workload involves CONNECT requests and/or authentication.
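For reference, both of those knobs live in the PGL workload. A minimal
sketch (the object names and the per-robot value below are illustrative,
not taken from this thread; only the peak_req_rate and req_rate fields are
the settings being discussed):

    // aggregate target rate that the baseline i-lines should match
    Bench TheBench = {
        peak_req_rate = 250/sec;
        // client_side/server_side sub-objects omitted in this sketch
    };

    // per-robot rate; the aggregate is number_of_robots * req_rate
    Robot R = {
        req_rate = 0.25/sec;
        // other Robot fields (origins, etc.) omitted
    };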
We will also report CONNECT (and authentication) activity separately,
just like we report ssl activity separately today.

We hope the above changes will reduce surprises caused by compound
transactions involving multiple HTTP requests.

Cheers,

Alex.

From pmix at hendrie.id.au Fri May 17 08:11:46 2013
From: pmix at hendrie.id.au (Michael Hendrie)
Date: Fri, 17 May 2013 17:41:46 +0930
Subject: Client side stats SSL workload 4.3.2
In-Reply-To: <51951074.6080104@measurement-factory.com>
References: <87obcb8eki.fsf@gmail.com> <51951074.6080104@measurement-factory.com>
Message-ID: <8274538C-DCB9-4F3E-8A9C-8A33D127A5D7@hendrie.id.au>

On 17/05/2013, at 2:29 AM, Alex Rousskov wrote:

> ... but there are plans to make baseline reporting match configuration
> better when dealing with what we call "compound" transactions.
[...]
> We hope the above changes will reduce surprises caused by compound
> transactions involving multiple HTTP requests.

OK, so the numbers stack up then: squid is reporting ~190 req/sec (it will
only see the CONNECTs, not how many GETs passed through the tunnel), and
peak_req_rate is configured for 250, so the ~430 being reported client
side sounds right (~190 CONNECT/sec + ~250 GET/sec comes to ~440/sec).

Unfortunately I have another problem then; I was hoping the above could be
related to what I'm seeing.

If I run a direct test (no proxy involved), it will go for hours and never
miss a beat (the last one ran for 15 hours). If I use the same workload but
introduce a proxy with the --proxy switch, the test always bombs out
somewhere between the two- and three-hour mark (I have tried at least five
different times), even though none of the devices (including the proxy)
report any high load or resource exhaustion.
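For reference, the client is launched roughly like this (the workload file
name is a placeholder; the proxy address is the real one that appears in
the log below):

    polygraph-client --config workload.pg --proxy 172.16.0.13:8080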
Before it bombs, here's what the client console says:

136.74| i-top1  3588097  406.69   1583   0.00  0  1681
           ssl  2048966  406.69   1584   0.00  0  1679
136.83| i-top1  3590106  400.26   1600   0.00  0  1659
           ssl  2050132  400.26   1602   0.00  0  1633
136.91| i-top1  3592036  364.13   1695   0.00  0  1719
           ssl  2051152  364.13   1696   0.00  0  1653
137.04| PolyApp.cc:189: error: 1/1 (c58) internal timers may be getting behind
137.04| record level of timing drift: 2.48sec; last check was 6.45sec ago
137.27| i-top1  3592779   34.36  11075   0.00  0  1609
           ssl  2051667   34.36  11075   0.00  0  1102
137.30| PolyApp.cc:189: error: 2/2 (c58) internal timers may be getting behind
137.30| record level of timing drift: 15.40sec; last check was 15.37sec ago
137.40| Connection.cc:581: error: 1/3 (s32) Broken pipe
137.40| error: raw close on SSL connection failed
137.40| connection between 172.17.20.35:31263 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.40| Connection.cc:581: error: 2/4 (s32) Broken pipe
137.40| error: raw close on SSL connection failed
137.40| connection between 172.17.20.173:43974 and 172.16.0.13:8080 failed at 4 reads, 6 writes, and 3 xacts
137.41| Connection.cc:581: error: 4/6 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.20.5:42461 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.41| Connection.cc:581: error: 8/10 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.20.245:35371 and 172.16.0.13:8080 failed at 3 reads, 5 writes, and 2 xacts
137.41| Connection.cc:581: error: 16/18 (s32) Broken pipe
137.41| error: raw close on SSL connection failed
137.41| connection between 172.17.21.32:13088 and 172.16.0.13:8080 failed at 4 reads, 6 writes, and 3 xacts
137.43| i-top1  3593243   53.18  25644   0.00  0  1533
           ssl  2051843   53.18  25644   0.00  0   769
137.44| i-top1  3593243    0.00     -1  -1.00  0  1895
           ssl  2051843    0.00     -1  -1.00  0   748

Any ideas on this one?

Michael

From dmitry.kurochkin at measurement-factory.com Fri May 17 16:29:30 2013
From: dmitry.kurochkin at measurement-factory.com (Dmitry Kurochkin)
Date: Fri, 17 May 2013 20:29:30 +0400
Subject: Client side stats SSL workload 4.3.2
In-Reply-To: <8274538C-DCB9-4F3E-8A9C-8A33D127A5D7@hendrie.id.au>
References: <87obcb8eki.fsf@gmail.com> <51951074.6080104@measurement-factory.com> <8274538C-DCB9-4F3E-8A9C-8A33D127A5D7@hendrie.id.au>
Message-ID: <87ip2h8t91.fsf@gmail.com>

Hi Michael.

Michael Hendrie writes:

> If I run a direct test (no proxy involved), it will go for hours and never
> miss a beat (the last one ran for 15 hours). If I use the same workload but
> introduce a proxy with the --proxy switch, the test always bombs out
> somewhere between the two- and three-hour mark.
[...]
> Any ideas on this one?

Are there any errors on client or proxy?

Some resource must be exhausted on the client, proxy or server. I guess you
have checked CPU and memory usage. Did you look at the network? The number
of sockets in CLOSE_WAIT state, dropped packets, errors in the system log,
etc.?
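For example (a sketch using standard Linux tools; worth running on the
client, proxy and server boxes around the failure time):

    ss -tan state close-wait | wc -l       # sockets stuck in CLOSE_WAIT
    netstat -s | grep -iE 'drop|overflow'  # protocol counters for drops
    dmesg | tail                           # recent kernel messages
    ulimit -n                              # file descriptor limit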
Is the request rate stable in the phase where the test fails? In the log
you provided, response time is increasing before the failure (though not
by much). If response time grows during a stable phase (with a constant
request rate), that suggests proxy performance is degrading over time, and
sooner or later this would result in a test failure.

Since the failure does not happen in a direct test, the proxy would be the
primary suspect. A good way to triage the issue may be to simplify the
workload and/or proxy configuration (e.g., disable caching if it is
enabled).
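With Squid, for instance, caching can be turned off for the duration of a
test with a single squid.conf access rule (a sketch):

    # do not cache any responses while testing
    cache deny all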
I hope these ideas point you in the right direction.

Regards,
  Dmitry

From pmix at hendrie.id.au Sat May 18 23:20:19 2013
From: pmix at hendrie.id.au (Michael Hendrie)
Date: Sun, 19 May 2013 08:50:19 +0930
Subject: Client side stats SSL workload 4.3.2
In-Reply-To: <87ip2h8t91.fsf@gmail.com>
References: <87obcb8eki.fsf@gmail.com> <51951074.6080104@measurement-factory.com> <8274538C-DCB9-4F3E-8A9C-8A33D127A5D7@hendrie.id.au> <87ip2h8t91.fsf@gmail.com>
Message-ID: <2D22BE02-6BDE-4B0F-B26B-C4842A86DE65@hendrie.id.au>

On 18/05/2013, at 1:59 AM, Dmitry Kurochkin wrote:

> Are there any errors on client or proxy?

There are no errors on the proxy, and the problem begins on the client
straight after the "137.04| PolyApp.cc:189: error: 1/1 (c58) internal
timers may be getting behind" message.

> Some resource must be exhausted on the client, proxy or server. I guess you
> have checked CPU and memory usage. Did you look at the network? The number
> of sockets in CLOSE_WAIT state, dropped packets, errors in the system log,
> etc.?

I should have said no apparent resource exhaustion: I haven't caught a
snapshot at the exact failure time, but certainly in the two-plus hours
leading up to it everything looks fine.

> Is the request rate stable in the phase where the test fails?

Yes, the test is configured to get into the "top" phase very quickly, about
3 minutes after start, and it is fine for over two hours before failing.

> Since the failure does not happen in a direct test, the proxy would be the
> primary suspect. A good way to triage the issue may be to simplify the
> workload and/or proxy configuration (e.g., disable caching if it is
> enabled).

Logically the proxy sounds like the culprit, but I have moved it to an
environment where I still have Polygraph 4.0.10, and it has been under test
at the same request rate for 36 hours now with no issue. I haven't compared
the exact workload details between the environments yet, but I would expect
very minimal difference, if any at all.

> I hope these ideas point you in the right direction.

It will probably be a few days before I can continue investigating; if you
have any other suggestions of where to look based on the above info, I'd be
glad to hear them.

Michael

From suhaskekuda at gmail.com Wed May 22 10:09:02 2013
From: suhaskekuda at gmail.com (suhas kekuda)
Date: Wed, 22 May 2013 15:39:02 +0530
Subject: needed help
Message-ID: 

Hi,

I'm new to Polygraph and have done basic tests on different proxies. I'm
eager to test authentication (using a different phase), but I'm stuck on
creating the phase. Could anyone please guide me on how to set up and
check a separate phase for this?

Thanking you, with regards,
suhas