Huge Domain List

Alex Rousskov rousskov at measurement-factory.com
Thu Apr 19 23:52:41 UTC 2012


On 04/19/2012 03:31 PM, unjc email wrote:
> I have a long list of ~30,000 domains that I want Web Polygraph
> clients to use in the request URLs.

> I have tried using an AddrMap in the workload, as shown below:

> AddrMap M = {
>      names = [ 'google.com','facebook.com','youtube.com','yahoo.com','live.com','baidu.com','blogspot.com','wikipedia.org'........];
>      addresses = [ '192.168.1.1' ];
> };


> I believe the address list is too long; Web Polygraph throws the
> "SynSym.cc:61: cannot cast string to addr" exception after trying to
> start the test for more than 10 minutes.
> 
> Please advise if there is another way to input my custom domain list
> for Web Polygraph to generate URLs.


I just tried a simple.pg workload with an address map like yours that
has 30,000 domain names (formed from a local dictionary file). It does
take 25 minutes to parse and interpret 30K strings(*), but the test
starts. Even the memory consumption looks reasonable at 100MB per
process. The client then fails in my case because I do not have a name
server set up to resolve those names, but I hope you do.

I used the following #include trick to keep the workload readable:

> AddrMap M = {
>      names = [
>      'firstname',
> #include "/tmp/names"
>      'lastname'
>      ];
>      addresses = [ '192.168.1.1:80' ];
> };
>
> use(M);


Does your workload work fine with, say, 10 custom domains?

If yes, perhaps your input line is too long (for Polygraph or for your
text editor)? Try the #include trick above, with every domain on its own
line. It is more manageable that way.
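Generating such a file from a flat domain list is a one-liner. A sketch
(the file names "domains.txt" and "/tmp/names" are just placeholders;
note that every included line needs a trailing comma, since the
#include sits between two other array entries):

```shell
# Example input: a plain-text list, one bare domain per line.
printf '%s\n' google.com facebook.com wikipedia.org > domains.txt

# Wrap each line in single quotes and append a comma, producing
# lines suitable for #include inside a PGL names array.
sed "s/.*/'&',/" domains.txt > /tmp/names

cat /tmp/names
# 'google.com',
# 'facebook.com',
# 'wikipedia.org',
```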


HTH,

Alex.
P.S.(*) Polygraph is not optimized to quickly grok 30K random names. In
fact, the default algorithm may try to find a "range" pattern in those
names so that they can be merged into a more compact representation. It
is possible to optimize handling of a large number of random names as
well, of course; running more than a few tests with 25-minute startup
times would not be very productive!
