fix: use crawl4ai `result.markdown` instead of removed `markdown_v2` #52
Open
obchain wants to merge 1 commit into
`crawl4ai` 0.5.x deletes the legacy `markdown_v2` attribute and raises an `AttributeError` from `__getattr__` whenever it is accessed. This breaks the `no_extraction` / `cosine` scraping path in `WebScraper.extract`: `content` stays `None`, and the subsequent `len(result.markdown_v2.raw_markdown)` re-raises. Read the markdown payload from `result.markdown` (the current `MarkdownGenerationResult`) and fall back to `markdown_v2` via `getattr` so installations on older crawl4ai builds keep working. Guard the length bookkeeping against missing attributes too. Fixes sentient-agi#34
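The fix described above can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the code in the patch: `extract_markdown_fields` is an invented name, the result objects are fakes built with `SimpleNamespace`, and reading the citations length from `markdown_with_citations` is an assumption about which field feeds `citations_markdown_length`.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def extract_markdown_fields(result: Any) -> Tuple[Optional[str], dict]:
    """Hypothetical sketch of the patched logic in WebScraper.extract.

    Returns (content, length bookkeeping) for one crawl result.
    """
    # Prefer the current attribute; getattr() with a default also swallows
    # the AttributeError that a 0.5.x-style `markdown_v2` raises.
    markdown_obj = getattr(result, "markdown", None) or getattr(result, "markdown_v2", None)
    if markdown_obj is None:
        return None, {}

    # Guarded field reads: a partially populated object yields "" instead of crashing.
    raw = getattr(markdown_obj, "raw_markdown", "") or ""
    # Assumption: the citations length is taken from `markdown_with_citations`.
    cited = getattr(markdown_obj, "markdown_with_citations", "") or ""
    lengths = {
        "raw_markdown_length": len(raw),
        "citations_markdown_length": len(cited),
    }
    return raw, lengths


class NewStyleResult:
    """Fake 0.5.x-style result (not the real crawl4ai class)."""

    markdown = SimpleNamespace(raw_markdown="# Title", markdown_with_citations="# Title [1]")

    @property
    def markdown_v2(self):
        # Mimics the removed attribute: any access raises.
        raise AttributeError("'markdown_v2' attribute is deprecated")


# Fake pre-0.5-style result: only `markdown_v2` carries the payload.
legacy = SimpleNamespace(markdown=None, markdown_v2=SimpleNamespace(raw_markdown="old"))

print(extract_markdown_fields(NewStyleResult()))
# ('# Title', {'raw_markdown_length': 7, 'citations_markdown_length': 11})
print(extract_markdown_fields(legacy))
# ('old', {'raw_markdown_length': 3, 'citations_markdown_length': 0})
```

Because the fallback relies on `or`, it only reaches `markdown_v2` when `result.markdown` is missing or falsy; the guarded `getattr(..., "raw_markdown", "")` is what keeps an object without that field (a plain string, for instance) from crashing the bookkeeping.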
This was referenced May 19, 2026
What
Stop reading the legacy `result.markdown_v2` attribute in `WebScraper.extract` and use `result.markdown` (the current `MarkdownGenerationResult`) instead, with a `getattr` fallback to `markdown_v2` for older `crawl4ai` builds.

Why
`crawl4ai` 0.5.x removed `markdown_v2` and replaced its `__getattr__` with a hard `AttributeError`. Two call sites in `context_scraping/crawl4ai_scraper.py` were hitting it:

- The `no_extraction` / `cosine` branch guards with `hasattr(result, 'markdown_v2')`. On 0.5.x that returns `False`, so `content` silently stays `None` and the scraper returns an empty payload, which is exactly what #34 ("About results of the WebScraper") reports (`Debug: Processed content: None`).
- `len(result.markdown_v2.raw_markdown)` is unconditional, so on success the scraper re-raises the same `AttributeError` and the extraction loop falls into the broad `except` block.

Closes #34
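The silent `None` at the first call site follows from how `hasattr` works: it calls `getattr` and returns `False` if that raises `AttributeError`. A small illustration, using a hypothetical stand-in class (not crawl4ai's real result type) whose `markdown_v2` property raises the way 0.5.x does:

```python
class FakeCrawlResult:
    """Hypothetical stand-in for a crawl4ai 0.5.x result."""

    @property
    def markdown_v2(self):
        # Mimics 0.5.x: every access raises AttributeError.
        raise AttributeError("'markdown_v2' attribute is deprecated")


result = FakeCrawlResult()

# Call site 1: hasattr() catches the AttributeError and returns False,
# so the branch that fills `content` never runs.
content = None
if hasattr(result, "markdown_v2"):
    content = result.markdown_v2.raw_markdown
print("content:", content)  # content: None

# Call site 2: the unconditional length lookup re-raises the same error.
error = None
try:
    raw_markdown_length = len(result.markdown_v2.raw_markdown)
except AttributeError as exc:
    error = str(exc)
print("error:", error)  # error: 'markdown_v2' attribute is deprecated
```

Both symptoms from #34 fall out of the same property: one is swallowed by `hasattr`, the other escapes to the caller.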
How
`src/opendeepsearch/context_scraping/crawl4ai_scraper.py`:

- Compute `markdown_obj = getattr(result, 'markdown', None) or getattr(result, 'markdown_v2', None)` once per result.
- Read `raw_markdown` from `markdown_obj` for the `no_extraction` / `cosine` strategies.
- Guard the `raw_markdown_length` / `citations_markdown_length` bookkeeping on `markdown_obj is not None` and use `getattr(..., '', '') or ''` for the individual fields, so a partially populated `MarkdownGenerationResult` does not crash the bookkeeping either.

The `getattr` fallback to `markdown_v2` keeps the path working for anyone still on a pre-0.5 crawl4ai build (the project's pinned `crawl4ai @ git+...salzubi401/crawl4ai.git@main` is left untouched on purpose).

Testing
- `python3 -m py_compile src/opendeepsearch/context_scraping/crawl4ai_scraper.py`: clean
- `grep -n markdown_v2 src/opendeepsearch/context_scraping/crawl4ai_scraper.py`: only the intentional fallback line remains
- With `crawl4ai` 0.5.x installed, calling `WebScraper(debug=True).scrape(<any URL>)` on the unpatched code logs `Debug: Processed content: None` and surfaces `AttributeError: 'markdown_v2' attribute is deprecated`. With this patch, `result.markdown.raw_markdown` is read instead and `content` is populated.

No new unit tests were added: `WebScraper` instantiates `AsyncWebCrawler` and reaches the network, so a meaningful test would need either VCR fixtures or a fake `result` double. Happy to add a small unit test for `_pick_markdown_object` if you'd prefer to extract a helper.
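If the selection logic is extracted into a helper, a fake `result` double is enough to test it offline. A minimal sketch, assuming a helper named `_pick_markdown_object` (the name floated above; it does not exist in the patch) that wraps the same `getattr` expression. In the real suite it would be imported from `crawl4ai_scraper` rather than redefined here:

```python
import unittest
from types import SimpleNamespace


def _pick_markdown_object(result):
    # Hypothetical helper extracted from WebScraper.extract; same expression as the patch.
    return getattr(result, "markdown", None) or getattr(result, "markdown_v2", None)


class _NewStyleResult:
    """Fake crawl4ai 0.5.x result: `markdown_v2` raises on access."""

    def __init__(self, raw):
        self.markdown = SimpleNamespace(raw_markdown=raw)

    @property
    def markdown_v2(self):
        raise AttributeError("'markdown_v2' attribute is deprecated")


class PickMarkdownObjectTest(unittest.TestCase):
    def test_prefers_markdown_on_new_builds(self):
        result = _NewStyleResult("# Hello")
        self.assertEqual(_pick_markdown_object(result).raw_markdown, "# Hello")

    def test_falls_back_to_markdown_v2_on_old_builds(self):
        legacy = SimpleNamespace(raw_markdown="legacy")
        result = SimpleNamespace(markdown=None, markdown_v2=legacy)
        self.assertIs(_pick_markdown_object(result), legacy)

    def test_returns_none_when_neither_attribute_exists(self):
        self.assertIsNone(_pick_markdown_object(SimpleNamespace()))
```

The doubles avoid `AsyncWebCrawler` entirely, so the cases run under the standard `unittest` runner without network access or VCR fixtures.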