feat: Enable usage of Camoufox in PlaywrightCrawler#782

Closed
Pijukatel wants to merge 13 commits into master from camoufox

Pijukatel (Collaborator) commented Dec 5, 2024

Description

PlaywrightCrawler accepts the argument browser_type="camoufox" to use Camoufox as the browser.
Camoufox is a stealthy build of Firefox.

Issues

Quick and dirty attempt to use camoufox.
It works out of the box!

TODO:
Properly handle __aexit__ cleanup of camoufox.
On the first run it downloads almost 700 MB of binaries from the Camoufox GitHub repository. That can be surprising to users. Discuss possibilities.
Properly get browser.
Add test.
Format and polish.
@Pijukatel Pijukatel added enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. labels Dec 5, 2024
@Pijukatel Pijukatel changed the title feat: Enable useage of Camoufox in PlaywrightCrawler feat: Enable usage of Camoufox in PlaywrightCrawler Dec 5, 2024
@github-actions github-actions Bot added this to the 104th sprint - Tooling team milestone Dec 5, 2024
@github-actions github-actions Bot added the tested Temporary label used only programatically for some analytics. label Dec 5, 2024
Mantisus (Collaborator) commented Dec 5, 2024

Hey!

This is going to be a great feature!

But I'm not sure we should use HeaderGenerator with Camoufox, since it implements its own logic for working with fingerprints.

I've tested it on several sites and it seems to work better without the HeaderGenerator.

Pijukatel (Collaborator, Author) commented

> Hey!
>
> This is going to be a great feature!
>
> But I'm not sure we should use HeaderGenerator with Camoufox, since it implements its own logic for working with fingerprints.
>
> I've tested it on several sites and it seems to work better without the HeaderGenerator.

Hi, I think you are right. This PR was really just an exploration. I will create another version that makes no changes to the Crawlee code and only adds example code showing how to integrate Camoufox into Crawlee using a custom BrowserPool. This is already possible with the current codebase, and it gives users more control (including over header generation). I like this option more because it does not add another dependency to our codebase.
We can then discuss the pros and cons of both solutions and decide which is better.
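
The custom BrowserPool idea can be illustrated without touching any library. The following is only a library-free sketch of the design, not Crawlee's or Camoufox's actual API: every class here (FakeBrowser, SketchBrowserPlugin, SketchCamoufoxPlugin, SketchBrowserPool) is a hypothetical stand-in. It assumes the pool delegates browser creation to plugins, so swapping the browser means overriding a single launch method and turning off header generation there.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a launched browser; records how it was created."""

    name: str
    uses_header_generator: bool


class SketchBrowserPlugin:
    """Hypothetical plugin base class: knows how to launch one kind of browser."""

    async def new_browser(self) -> FakeBrowser:
        # Default behaviour in this sketch: stock Firefox plus generated headers.
        return FakeBrowser(name="firefox", uses_header_generator=True)


class SketchCamoufoxPlugin(SketchBrowserPlugin):
    """Overrides only the launch step; the pool and the crawler stay unchanged."""

    async def new_browser(self) -> FakeBrowser:
        # A real plugin would launch Camoufox here (not shown in this sketch).
        # Header generation is off because Camoufox manages its own fingerprint.
        return FakeBrowser(name="camoufox", uses_header_generator=False)


@dataclass
class SketchBrowserPool:
    """Hypothetical pool: delegates browser creation to its plugins."""

    plugins: list = field(default_factory=lambda: [SketchBrowserPlugin()])

    async def launch_all(self) -> list:
        # Ask each plugin in turn for a browser.
        return [await plugin.new_browser() for plugin in self.plugins]


async def main() -> list:
    # The user opts in by passing a pool that contains the custom plugin.
    pool = SketchBrowserPool(plugins=[SketchCamoufoxPlugin()])
    return await pool.launch_all()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this shape the Camoufox dependency would live only in user code, which matches the stated goal of not adding a dependency to the codebase.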

Pijukatel (Collaborator, Author) commented

Closed in favor of: #789

@Pijukatel Pijukatel closed this Dec 10, 2024
Labels

enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Development

Successfully merging this pull request may close these issues.

Integrate Camoufox into PlaywrightCrawler

2 participants