feat: render Solr highlights as bold in instrument list#466
feat: render Solr highlights as bold in instrument list#466PouyaMohseni wants to merge 2 commits intodevelopfrom
Conversation
| ] | ||
|
|
||
| # Extract highlight info | ||
| highlights = getattr(solr_response, "highlighting", {}) |
There was a problem hiding this comment.
I personally prefer to rename highlights to highlight_info or highlight_data.
| "hl.fl": "text", | ||
| "hl.simple.pre": "<b>", | ||
| "hl.simple.post": "</b>", | ||
| "hl.snippets": 1000, |
There was a problem hiding this comment.
1000 is too much...50 should be sufficient for most instruments?
There was a problem hiding this comment.
Some instruments like guitar have labels in 200 languages, adding aliases it can increase even more (many of them share a common ar). Although for one language we might not have more than 10 labels, we are not sure if our highlight query hits all of the labels in the selected language.
As an example, 50 would not be enough for a query of ar in guitar as it hits more then 100 times.
There was a problem hiding this comment.
In this case, switching to token-level highlights makes more sense.
| highlight_map = { | ||
| snippet.replace("<b>", "").replace("</b>", ""): snippet | ||
| for snippet in hl_snippets | ||
| } |
There was a problem hiding this comment.
This maps the whole snippets, which can be unreliable. For example, instead of creating a map
hl_snippets = [
"Stradivarius <b>Violin</b>",
"<b>Violin</b> Concerto in D Major"
]
It's better to use
"Violin": "<b>Violin</b>"
There was a problem hiding this comment.
I do not understand why this would be unreliable. Aren't both highlight and instrumentname_set solr outputs? Also, if we want to reduce these hits ("Stradivarius <b>Violin</b>", "<b>Violin</b> Concerto in D Major" ) to (<b>Violin</b>), isn't it better to define a new field in solr storing tokenized?
There was a problem hiding this comment.
Yes, both values come from Solr. My concern is that full highlight snippets are context-dependent and may include surrounding text, which makes them brittle as mapping keys. Mapping at the matched-term level (e.g. "Violin" → "Violin") is more robust and less coupled to Solr’s snippet construction. And yes, tokenizers would be a better solution.
| highlight_map = { | ||
| snippet.replace("<b>", "").replace("</b>", ""): snippet | ||
| for snippet in hl_snippets | ||
| } |
There was a problem hiding this comment.
Yes, both values come from Solr. My concern is that full highlight snippets are context-dependent and may include surrounding text, which makes them brittle as mapping keys. Mapping at the matched-term level (e.g. "Violin" → "Violin") is more robust and less coupled to Solr’s snippet construction. And yes, tokenizers would be a better solution.
| "hl.fl": "text", | ||
| "hl.simple.pre": "<b>", | ||
| "hl.simple.post": "</b>", | ||
| "hl.snippets": 1000, |
There was a problem hiding this comment.
In this case, switching to token-level highlights makes more sense.
| for name in self._names: | ||
| for orig, highlighted in highlight_dict.items(): | ||
| if orig and orig.lower() in name.lower(): | ||
| name = re.sub( | ||
| re.escape(orig), highlighted, name, flags=re.IGNORECASE | ||
| ) | ||
| updated_names.append(name) |
There was a problem hiding this comment.
Also realize this is O(nm)...not great. And this doesn't highlight harp in harpsichord
There was a problem hiding this comment.
I noticed it would be a bad idea to create a different tokenization for highlighted field in Solr indexing as this field would need to replicate our query structure to able to highlight as intended i(e.g. containing Ngram). Maybe it is better to keep the highlighted field as our query field but apply deduplication/tokenization as post-processing for a more robust highlighting.
bbe332f to
f6db6fe
Compare
- Added `apply_highlights` method to `InstrumentNameSet` to apply Solr highlight snippets to each instrument name. - Replaced matching parts of instrument names with highlighted `<b>` text from Solr. - Updated `SolrInstrument` to store highlights and pass them to `InstrumentNameSet`. resolves #343
f6db6fe to
fde3219
Compare
apply_highlightsmethod toInstrumentNameSetto apply Solr highlight snippets to each instrument name.<b>text from Solr.SolrInstrumentto store highlights and pass them toInstrumentNameSet.resolves #343