HTTPCollector duplicates listeners when multiple crawlers have set them #784

@dutsuwak

Hello!

I have been testing a setup where multiple crawlers are each configured with their own listener for crawl events. When the HttpCrawlerConfigs are added to the HttpCollector, the listeners are duplicated, so my program's logic is called multiple times for each event.

Simplified example:

HttpCollectorConfig config = new HttpCollectorConfig();
List<HttpCrawlerConfig> httpCrawlerConfigs = new ArrayList<>();

for(int i = 0; i < urlsList.length; i++){
    var httpCrawlerConfig = new HttpCrawlerConfig();
    httpCrawlerConfig.setEventListeners(new CrawlEventListener());

    httpCrawlerConfigs.add(httpCrawlerConfig);
}

HttpCrawlerConfig[] crawlerConfigs = httpCrawlerConfigs.toArray(new HttpCrawlerConfig[httpCrawlerConfigs.size()]);
config.setCrawlerConfigs(crawlerConfigs);


// From my debugging, the duplication seems to happen here, when the collector
// scans the crawler configs and registers their listeners in the event manager.
var collector = new HttpCollector(config);
collector.start();

As a workaround, I set the listeners only on the first HttpCrawlerConfig, but I think it should be possible to use separate listeners for each crawler.
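
A minimal sketch of the suspected behavior and of one possible guard, assuming that the listeners from every crawler config end up in a single shared event manager (this is what the debugging above suggests, not confirmed library behavior). Event, Listener, and CountingListener are hypothetical stand-ins written for this illustration; they are not Norconex classes, and the crawler ids are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerDemo {

    // Hypothetical stand-in for a crawl event (NOT a Norconex class).
    static final class Event {
        final String crawlerId;

        Event(String crawlerId) {
            this.crawlerId = crawlerId;
        }
    }

    // Hypothetical stand-in for an event listener (NOT a Norconex interface).
    interface Listener {
        void accept(Event event);
    }

    // Counts how often it is called; can optionally ignore events that
    // were fired by a crawler other than the one it belongs to.
    static final class CountingListener implements Listener {
        private final String ownCrawlerId;
        private final boolean filterBySource;
        int calls;

        CountingListener(String ownCrawlerId, boolean filterBySource) {
            this.ownCrawlerId = ownCrawlerId;
            this.filterBySource = filterBySource;
        }

        @Override
        public void accept(Event event) {
            if (filterBySource && !ownCrawlerId.equals(event.crawlerId)) {
                return; // event came from another crawler, skip it
            }
            calls++;
        }
    }

    // Registers one listener per crawler in a single shared list (the assumed
    // shared event manager), fires one event per crawler to every registered
    // listener, and returns the total number of counted listener calls.
    public static int totalCalls(int crawlerCount, boolean filterBySource) {
        List<CountingListener> shared = new ArrayList<>();
        for (int i = 0; i < crawlerCount; i++) {
            shared.add(new CountingListener("crawler-" + i, filterBySource));
        }
        for (int i = 0; i < crawlerCount; i++) {
            Event event = new Event("crawler-" + i);
            for (Listener listener : shared) {
                listener.accept(event);
            }
        }
        int total = 0;
        for (CountingListener listener : shared) {
            total += listener.calls;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalCalls(3, false)); // prints 9: every listener sees every event
        System.out.println(totalCalls(3, true));  // prints 3: each listener handles only its own crawler
    }
}
```

Under that assumption, three crawlers with one listener each produce nine listener calls for three events. Filtering on the event's originating crawler inside the listener would keep per-crawler listeners while avoiding the repeated logic, provided the real event exposes which crawler fired it.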

Regards,
Fabian
