feat(word): add comment range highlighting with hover tooltip in HTML preview #77

Open
xiaopenyoua wants to merge 704 commits into iOfficeAI:main from xiaopenyoua:feat/word-html-preview-comment-highlight

Summary

  • Adds highlight styling for comment ranges (CommentRangeStart/End) in the Word HTML preview
  • On hover, shows the comment content (author, date, initials, body text) in a tooltip

Verification

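Build

dotnet build src/officecli/officecli.csproj --no-restore

Generate a Word document containing a comment and preview the HTML

officecli blank test.docx

Add a comment in Word and save, then preview with the following command

officecli watch test.docx

Hover over the yellow-highlighted text to see the tooltip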

goworm and others added 30 commits April 16, 2026 00:11
…only keys

R12-2 (fuzzer, MEDIUM): sheet-level sort dispatch early-returned when
rows.Count == 0, so `sort=XFE asc` / `sort=AAAA asc` on an empty sheet
silently returned "Updated" instead of rejecting the invalid column.
Move the empty-sheet no-op inside SortRangeRows so column validation
runs first, and tighten the XFD-overflow check to fire on any length
(was >= 4), catching 3-letter overflows like XFE/ZZZ.

R12-3 (fuzzer, LOW): `sort=asc` (column letter forgotten) produced a
misleading "Sort column ASC is outside the range A:B". Reject ASC/DESC
as column tokens up-front with a targeted "direction keyword, not a
column letter" error.
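
The validation order described in R12-2 and R12-3 can be sketched roughly as below. This is an illustrative Python sketch, not the project's C# code: `column_index`, `parse_sort_column`, and the error wording are hypothetical; the only outside fact assumed is that XFD (16384) is the last valid spreadsheet column.

```python
MAX_COLUMN = 16384  # index of column XFD, the last valid column


def column_index(letters: str) -> int:
    """Convert column letters (A, Z, AA, XFD) to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_sort_column(token: str) -> int:
    """Validate a sort column token before any row data is looked at."""
    upper = token.upper()
    # Reject direction keywords first so the error is targeted.
    if upper in ("ASC", "DESC"):
        raise ValueError(f"'{token}' is a direction keyword, not a column letter")
    if not (upper.isascii() and upper.isalpha()):
        raise ValueError(f"'{token}' is not a column letter")
    index = column_index(upper)
    # The overflow check applies at any length, so XFE and ZZZ are caught too.
    if index > MAX_COLUMN:
        raise ValueError(f"column '{token}' is beyond XFD")
    return index
```

Because the check runs on the token alone, an empty sheet would still reject `XFE` or `AAAA` before the no-op path is reached.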
2-series structure (clusteredColumn + paretoLine overlay) matching MSO's
cx:chart format. PreparePareto pre-sorts descending; secondary percentage
axis (0-100%) for the cumulative line. DetectExtendedChartType handles
both OfficeCli- and MSO-authored forms.

Bump version to 1.0.48.
Add mergefield as a first-class field type. Usage:
  officecli add doc.docx "/body/p[1]" --type mergefield --prop fieldName=CustomerName

Placeholder text defaults to «fieldName» format (e.g. «CustomerName»).

https://claude.ai/code/session_013XdLypgxPSbNA428pzDXB3
- REF: cross-reference bookmark text (--prop bookmarkName, hyperlink)
- PAGEREF: cross-reference bookmark page number
- SEQ: auto-numbering sequences (--prop identifier=Figure/Table)
- IF: conditional field (--prop expression, trueText, falseText)

https://claude.ai/code/session_013XdLypgxPSbNA428pzDXB3
Zero-param: SECTIONPAGES, SECTION, CREATEDATE, SAVEDATE, PRINTDATE,
EDITTIME, LASTSAVEDBY, NUMWORDS, NUMCHARS, REVNUM, TEMPLATE,
COMMENTS, KEYWORDS

Parameterized: NOTEREF (bookmarkName), STYLEREF (styleName),
DOCPROPERTY (propertyName)

https://claude.ai/code/session_013XdLypgxPSbNA428pzDXB3
…ontextualSpacing

- FontMetricsReader: include hhea lineGap in ratio for accurate line height
- @font-face: add ascent-override/descent-override/line-gap-override
- Heading line-height uses font metrics ratio instead of "normal"
- Paragraph spacing collapse: subtract prev spaceAfter from spaceBefore
- contextualSpacing: suppress spacing between same-style adjacent paragraphs
- docGrid type=lines: snap line-height to linePitch multiples
- Support contextualSpacing property in set handler (paragraph + style)
…age width

Tables with no explicit <w:tblW> were rendered as width:100%, filling the
full page even when the <w:tblGrid> specified narrower column widths.
Native Word auto-fits such tables to content — compute width from
gridCol sum instead. Use max-width for auto layout (allows shrink),
width for fixed layout. Also handles tblW type=pct (percentage).
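
A minimal sketch of the width selection described above, in Python for illustration only. It assumes gridCol and dxa widths are in twips (1/20 pt) and that a pct width is stored in fiftieths of a percent; `table_width_css` and the 100% fallback for a table with no grid are hypothetical.

```python
def table_width_css(grid_cols_twips, tbl_w=None, tbl_w_type=None, fixed_layout=False):
    """Pick a CSS width declaration for a table."""
    if tbl_w_type == "pct" and tbl_w is not None:
        # Assumption: pct widths are fiftieths of a percent (5000 = 100%).
        return f"width:{tbl_w / 50:g}%"
    if tbl_w_type == "dxa" and tbl_w:
        return f"width:{tbl_w / 20:g}pt"
    if not grid_cols_twips:
        return "width:100%"  # hypothetical fallback when no grid is declared
    total_pt = sum(grid_cols_twips) / 20
    # max-width lets an auto-layout table shrink; fixed layout pins the width.
    prop = "width" if fixed_layout else "max-width"
    return f"{prop}:{total_pt:g}pt"
```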
The 'Created: ... (resident started)' message now suggests running
officecli close when done, so agents/users can release the file lock
immediately instead of waiting 60s idle timeout.
…etter-spacing, effect props

Render-comparison testing against native Word found several run-level
properties silently dropped or collapsed in HTML preview:

- Double strikethrough rendered identical to single (both as
  text-decoration:line-through). Now adds text-decoration-style:double.
- Underline style variants (double/wave/dotted/dash/thick/*Heavy) all
  collapsed to plain single underline. Mapped each to CSS
  text-decoration-style and text-decoration-thickness.
- w:spacing (character spacing) was ignored. Emit letter-spacing in pt.
- Paragraph-add shortcut silently dropped outline/shadow/emboss/imprint/
  vanish/rtl/noproof — only the run-add path honored them. Mirrored
  the 7 missing handlers in the paragraph branch.
- MergeRunProperties never merged Spacing or the 6 effect props, so
  even when written to XML they were dropped during effective-props
  resolution and never reached the HTML renderer.
…collapse

w:tab chars previously all rendered as a single em-space regardless of
paragraph tab stops, making 'Left\tCenter\tRight' visually collapse to
three adjacent words. Now:

- Track per-paragraph tab index in render context
- For each tab, look up the Nth declared tab stop and emit an
  inline-block span with width equal to the distance from the previous
  stop position
- Honor dot/hyphen/underscore leaders on positional stops via CSS
  border-bottom patterns
- Fallback to 36pt (0.5in) when no stops are defined

TOC-style right-aligned dot-leader tabs still flow through the existing
dot-leader class path.
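
The per-tab width rule above can be sketched as follows. This is a simplified Python illustration, assuming tab stop positions are given in twips and sorted ascending; `tab_span_width_pt` and `tab_span_style` are hypothetical names, and leaders are left out.

```python
def tab_span_width_pt(tab_index, stops_twips, default_pt=36.0):
    """Width of the Nth tab (0-based) within a paragraph, in points."""
    if tab_index >= len(stops_twips):
        return default_pt  # no declared stop for this tab: 0.5in fallback
    previous = stops_twips[tab_index - 1] if tab_index > 0 else 0
    # Distance from the previous stop position, converted from twips to pt.
    return (stops_twips[tab_index] - previous) / 20


def tab_span_style(tab_index, stops_twips):
    """Inline style for the span that stands in for one tab character."""
    width = tab_span_width_pt(tab_index, stops_twips)
    return f"display:inline-block;width:{width:g}pt"
```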
Section <w:cols w:num="N"/> was previously ignored in the HTML preview —
all content rendered single-column regardless of the declared column
count. Now emit CSS on .page-body:

- column-count:N for num > 1
- column-rule:1px solid for w:sep="true"
- column-gap:Xpt from w:space (twips → pt)

Line-numbering (w:lnNumType) still TODO — requires per-line markers.
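
As a rough illustration of the mapping above, the sketch below builds the CSS for a section's column settings. It is written in Python rather than the project's C#; `section_columns_css` is a hypothetical name, and emitting the gap and rule only when more than one column is declared is an assumption.

```python
def section_columns_css(num, space_twips=None, sep=False):
    """CSS declarations for a section with `num` columns."""
    if num <= 1:
        return ""  # single-column sections need no extra CSS
    parts = [f"column-count:{num}"]
    if space_twips is not None:
        parts.append(f"column-gap:{space_twips / 20:g}pt")  # twips to pt
    if sep:
        parts.append("column-rule:1px solid")
    return ";".join(parts)
```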
…oWrap

Render-comparison testing found several cell/revision rendering gaps:

- Tracked insertions (<w:ins>) previously rendered as plain text, losing
  the author annotation. Now wrap in a .track-ins span with underline +
  green color, with the author name in a tooltip.
- Tracked deletions (<w:del>) were dropped entirely, leaving the
  reviewer unable to see what was removed. Now render the deleted text
  inside a .track-del span with strikethrough + red color.
- Cell <w:textDirection> btLr/tbRl was ignored — text stayed horizontal
  where Word rotates 90°. Emit CSS writing-mode:vertical-rl; btLr adds
  a 180° rotation to flip the reading direction.
- Cell <w:noWrap/> was dropped — now emits white-space:nowrap so cell
  content doesn't wrap.
Two more render gaps caught by comparison testing:

- <w:fldChar><w:ffData><w:ffCheckBox> form field checkboxes were
  dropped entirely in the preview. Now emit ☑ (checked) or ☐ (unchecked)
  based on w:default or w:checked state, matching Word's native glyph
  in read-only previews.
- <w:w val="N"/> character horizontal scale (narrower/wider glyph
  rendering) was ignored. Emit CSS transform:scaleX(N/100) with
  display:inline-block so the scaled width is actually reserved.
- MergeRunProperties also merges CharacterScale now, matching the
  pattern already used for Spacing, so style-inherited scale reaches
  the renderer.

Deferred (complex, need dedicated work): numFmt variants beyond
decimal/lowerLetter/lowerRoman; header/footer titlePg+evenOdd;
right-aligned tab with non-dot leader; contextualSpacing boundary.
Round 12 comparison found four picture-level visual effects that were
silently dropped in the HTML preview:

- a:xfrm rot (rotation in 60000ths of a degree) — now emits CSS
  transform:rotate(Xdeg) on the <img>
- a:xfrm flipH/flipV — now emits transform:scaleX(-1) / scaleY(-1),
  combined with rotate when both present
- a:ln (picture outline) — now emits CSS border with width converted
  from EMU to px and srgbClr mapped to a hex color
- a:effectLst a:outerShdw — now emits box-shadow with offset/blur
  computed from dir (degrees) and dist/blurRad (EMU)

Existing crop (a:srcRect) handling is preserved and effects are
composed through both the cropped and uncropped image render paths.
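
The unit conversions behind these picture effects can be sketched as below. This is a Python illustration under stated assumptions: 9525 EMU per CSS pixel (914400 EMU per inch at 96 px per inch), angles in 60000ths of a degree, and a placeholder shadow color; the function names and the rotate-then-flip ordering are hypothetical and not taken from the renderer.

```python
import math

EMU_PER_PX = 9525  # 914400 EMU per inch / 96 px per inch


def picture_transform_css(rot=0, flip_h=False, flip_v=False):
    """Combine rotation and flips into one CSS transform declaration."""
    parts = []
    if rot:
        parts.append(f"rotate({rot / 60000:g}deg)")
    if flip_h:
        parts.append("scaleX(-1)")
    if flip_v:
        parts.append("scaleY(-1)")
    return "transform:" + " ".join(parts) if parts else ""


def outer_shadow_css(dir_60k, dist_emu, blur_emu, color="rgba(0,0,0,0.4)"):
    """box-shadow from a direction angle, a distance, and a blur radius."""
    angle = math.radians(dir_60k / 60000)
    dist_px = dist_emu / EMU_PER_PX
    x = round(dist_px * math.cos(angle), 2)
    y = round(dist_px * math.sin(angle), 2)
    blur = round(blur_emu / EMU_PER_PX, 2)
    return f"box-shadow:{x:g}px {y:g}px {blur:g}px {color}"
```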
…geometry, gradient fill

Round 13 comparison found five shape rendering gaps:

- a:xfrm rot on standalone shapes was only applied when the shape lived
  inside a wpg:wgp group; inline shapes rendered upright regardless.
  Rotation now applies in both code paths.
- wps:bodyPr anchor=ctr/b vertical text alignment only worked for group
  members; standalone shapes ignored it. Now applied in both paths.
- prstGeom prst=ellipse/oval rendered as a solid rectangle. Emit
  border-radius:50% so the shape reads as an oval; prst=roundRect gets
  a 12px radius approximation.
- a:gradFill (solid gradient) was dropped — shape appeared with no
  background. Now emit CSS linear-gradient from gsLst stops (pos in
  1/1000-percent) with angle converted from OOXML 60000ths to CSS deg.

Deferred: exotic prstGeom (line, arrow, callout) need SVG authoring,
documented in KNOWN_ISSUES.md as a future pass.
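
A minimal sketch of the gradient conversion, in Python for illustration. It assumes stop positions are in 1/1000 percent and the angle in 60000ths of a degree, as stated above. The 90 degree offset between the OOXML angle (0 pointing left to right) and the CSS angle (0 pointing up) is an assumption about the mapping, and `gradient_css` is a hypothetical name.

```python
def gradient_css(stops, ang_60k=0):
    """stops: list of (position in 1/1000 percent, hex color without '#')."""
    # Assumption: OOXML 0 degrees runs left to right, CSS 90deg does the same.
    css_angle = (ang_60k / 60000 + 90) % 360
    parts = [f"#{color} {pos / 1000:g}%" for pos, color in sorted(stops)]
    return f"background:linear-gradient({css_angle:g}deg,{','.join(parts)})"
```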
Round 15 comparison found that w:tab leader="middleDot" fell through
to no leader fill. Native Word renders middleDot as evenly-spaced
centered dots between tab stops; the closest CSS approximation is a
2px dotted border which browsers render as a coarser dot pattern
visually distinct from the 1px "dot" leader.

Drop cap float works in CSS (see XML output) but is blocked by
.page-body flex-column layout; logged in KNOWN_ISSUES #7c for a
follow-up refactor.
…bidi

Round 16 surfaced four i18n rendering gaps:

- w:em (dot/comma/circle/underDot) — now emits CSS text-emphasis-style
  with correct position (over for dot/comma/circle, under for underDot)
  and webkit prefix for broader browser support. Previously silently
  dropped.
- w:ruby (furigana) — now emits <ruby>base<rt>annotation</rt></ruby>.
  Previously the whole ruby run was dropped, leaving only surrounding
  labels.
- w:bidi at paragraph level — now emits direction:rtl. Previously the
  paragraph ignored the hint and relied on content-level detection.
- w:rtl at run level — changed unicode-bidi from bidi-override to
  embed. Override disables Unicode BiDi shaping; for Arabic, that
  reversed characters within a word and broke contextual ligatures.
  embed preserves algorithmic shaping while still flowing RTL.
- MergeRunProperties now merges Emphasis so style-inherited em isn't
  dropped during effective-property resolution.

Deferred (#5 in KNOWN_ISSUES): per-script font chain from rFonts
ascii/hAnsi/eastAsia/cs — needs per-run glyph range detection.
…e NUMPAGES

Round 17 comparison surfaced header/footer rendering gaps:

- HeaderPart/FooterPart content only iterated <w:p> children, silently
  dropping <w:tbl> — layout tables commonly used for 3-column
  headers/footers rendered empty.
- Paragraphs were filtered if they had no text, losing image-only
  paragraphs (logos, watermarks). Replaced the filter with a check that
  considers tables, drawings, and field characters as content.
- Footer NUMPAGES field was substituted with the cached "1" instead
  of the actual rendered page count. Added a second placeholder
  (<!--NUM_PAGES-->) that gets replaced with pageList.Count per page.

Deferred (logged in KNOWN_ISSUES #17+): VML watermark rendering
(v:pict/v:textpath), chart legends/data labels — chart SVG renderer
emits geometry but not metadata overlays.
Round 19 comparison found that paragraph alignment w:jc="distribute"
rendered as plain text-align:justify, leaving the last/only line
unstretched. Native Word spreads every line (including single-line
paragraphs) to full width with inter-character spacing.

Pair text-align:justify with text-align-last:justify +
text-justify:inter-character so the last line also stretches. w:jc=
"both" retains the plain-justify behavior (last line flows normally).
…(1/8 pt)

Round 20 comparison caught asymmetric cell borders rendering with
compressed widths. Root cause: OOXML border sz attribute is in 1/8 of
a point (8 = 1pt, 24 = 3pt, etc.), but the renderer was dividing by 8
and emitting the result as px. At default 96 DPI that under-rendered
3pt borders as 3px ≈ 2.25pt — visually thin and inconsistent with
Word's native rendering.

Switch the output unit to pt so declared 1pt / 2pt / 3pt / 4pt borders
render at their intended sizes. The double-border minimum threshold
was also updated to the pt-equivalent (2.25pt / ≈3px) so double-line
style still renders two visible strokes.
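
The arithmetic behind this fix can be sketched as follows, in Python for illustration. `border_width_css` and `px_to_pt` are hypothetical names; the 2.25pt minimum for double borders is the threshold named above.

```python
def px_to_pt(px):
    """CSS pixels to points at 96 DPI (1px = 0.75pt)."""
    return px * 72 / 96


def border_width_css(sz_eighths, style="single"):
    """Convert an OOXML border sz (eighths of a point) to a CSS width and style."""
    width_pt = sz_eighths / 8
    if style == "double":
        # Below this width the two strokes of a double border merge visually.
        width_pt = max(width_pt, 2.25)
    css_style = "double" if style == "double" else "solid"
    return f"{width_pt:g}pt {css_style}"
```

With sz=24 the old path emitted 3px, which `px_to_pt(3)` shows is only 2.25pt; emitting pt directly gives the declared 3pt.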
… theme part is missing

Round 21 comparison caught all w:themeColor references resolving to no
color (runs rendered black). Root cause: blank documents created via
BlankDocCreator have no <a:theme> part, and GetThemeColors returned an
empty dictionary. Word itself falls back to the built-in Office
palette for missing themes; the preview now does too.

- Added OfficeDefaultThemeColors dictionary with accent1-6, dark1/2,
  light1/2, hyperlink, followedHyperlink and their aliases
  (dk1/dk2/lt1/lt2/tx1/tx2/text1/text2/background1/background2).
- GetThemeColors fills in any missing standard names after the theme
  part is consulted, so explicit themes override but unset slots still
  resolve.
- Run color emit path refactored to call ResolveRunColor for
  consistency with conditional-format and border color paths — single
  source of truth for themeColor + themeTint/Shade resolution.

Fixes themeColor on text, table shading (via existing
ResolveShadingFill path), and borders (via existing RenderBorderCss)
in one shot since all three consult GetThemeColors.
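
The fill-in behavior can be sketched roughly as below. This Python illustration is not the project's code: `resolve_theme_colors` is a hypothetical name, the defaults are passed in by the caller rather than hardcoded (the real palette values are not reproduced here), and the alias table assumes the usual mapping of text/background names onto dark/light slots.

```python
# Assumed alias mapping; the commit lists these alias names but not their targets.
ALIASES = {
    "dk1": "dark1", "tx1": "dark1", "text1": "dark1",
    "dk2": "dark2", "tx2": "dark2", "text2": "dark2",
    "lt1": "light1", "background1": "light1",
    "lt2": "light2", "background2": "light2",
}


def resolve_theme_colors(theme_part_colors, defaults):
    """Merge theme-part colors over built-in defaults, then add aliases."""
    colors = dict(defaults)           # fallbacks first
    colors.update(theme_part_colors)  # explicit theme entries override
    for alias, canonical in ALIASES.items():
        if alias not in colors and canonical in colors:
            colors[alias] = colors[canonical]
    return colors
```

A document with no theme part passes an empty dictionary and still gets every default slot back.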
Round 22 comparison found runs with <w:vanish/> inherited from a
character style rendered as visible text (commonly gray) in the HTML
preview. Native Word omits vanished content from the default view.
Short-circuit RenderRunHtml when the effective run properties carry
vanish or specVanish so hidden text doesn't leak into the preview.
…tings

Round 24 comparison caught two settings.xml inputs the HTML preview
ignored:

- w:defaultTabStop set to e.g. 360 twips (0.25in) was overridden by a
  hardcoded 36pt (0.5in) fallback, so tab columns in documents tuned
  for tighter grids came out twice as wide as Word rendered them.
  Now read the setting when no paragraph/style tab stops apply.
- w:autoHyphenation was silently dropped. Documents with long words
  wrapped to the next line without hyphenation, producing a ragged
  right edge that diverged from Word's justified/hyphenated output.
  Emit CSS hyphens:auto + -webkit-hyphens:auto on .page-body so the
  browser uses its language-specific hyphenation dictionaries.
…ark strings aren't lost

Full VML geometry rendering is deferred (KNOWN_ISSUES #7e), but text
stored inside <v:pict> — WordArt via v:textpath@string and classic
watermarks via v:textbox/w:txbxContent — silently disappeared from the
preview, taking document information with them.

Emit any extracted text inside a <span class="vml-fallback">
italic-gray placeholder so the reader still sees "DRAFT",
"WordArt Sample", etc. Proper geometry rendering (rect/oval/line
fill/stroke, rotation, absolute positioning) remains deferred.
Round 27 comparison caught SVG images rendering as blank slots in the
HTML preview. Office 2019+ stores vector images as a PNG fallback in
<a:blip r:embed> plus the actual SVG in an a:extLst extension
(asvg:svgBlip r:embed). Many authoring tools emit a 1×1 transparent
PNG as the fallback, so the preview showed nothing even though the
document had a valid SVG.

When the blip contains an asvg:svgBlip child, use its rel id to locate
the SVG part instead of the fallback PNG. Embedded as image/svg+xml
data URI, the SVG renders in all modern browsers identical to Word.
Round 29 comparison found firstCol tblStylePr applied even when the
table had <w:tblLook w:firstColumn="0"/>. Root cause: ParseTableLook
used the legacy val hex bitmask whenever it was present, bypassing
individual attrs entirely. Per ECMA-376 §17.7.6.7 individual attrs
supersede val.

When any of firstRow/lastRow/firstColumn/lastColumn/noHBand/noVBand
attrs are authored on <w:tblLook>, use them exclusively (so an attr
with value="0" turns the bit OFF even if val would set it). Fall
back to val only when no individual attrs are present.
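
The precedence rule can be sketched as follows, in Python for illustration. The bit values are the conventional tblLook bitmask assignments (for example, 04A0 means firstRow, firstColumn, and noVBand); `parse_table_look` here is a hypothetical stand-in for the C# ParseTableLook, taking attributes as a plain dictionary.

```python
FLAG_BITS = {
    "firstRow": 0x0020,
    "lastRow": 0x0040,
    "firstColumn": 0x0080,
    "lastColumn": 0x0100,
    "noHBand": 0x0200,
    "noVBand": 0x0400,
}


def parse_table_look(attrs):
    """Return each tblLook flag as a bool, preferring individual attributes."""
    individual = {k: v for k, v in attrs.items() if k in FLAG_BITS}
    if individual:
        # Individual attributes are used exclusively; an absent one counts as off.
        return {name: individual.get(name) in ("1", "true", "on") for name in FLAG_BITS}
    mask = int(attrs.get("val", "0"), 16)
    return {name: bool(mask & bit) for name, bit in FLAG_BITS.items()}
```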
goworm and others added 26 commits April 23, 2026 17:36
Bugs fixed:
- AddRun / AddPicture / AddOle / AddHyperlink / AddComment / AddBookmark /
  AddBreak / AddField / AddSdt: insert-by-index was off by one on paragraphs
  with pPr (any paragraph with alignment/style/indent). All paragraph-child
  inserts now route through a new pPr-aware InsertIntoParagraph helper so
  new elements never precede pPr.
- AddBreak / AddField / AddSdt / AddFormField / AddChart: when parent was
  /body the handlers ignored --index / --after / --before entirely. Now
  honored via new InsertAtIndexOrAppend helper that also preserves the
  sectPr-last-child invariant.
- AddFormField: the wrapper paragraph was appended with raw AppendChild,
  landing after sectPr. Fixed.
- AddSection / AddToc / AddFootnote / AddEndnote: ignored --index. Now
  honored. AddToc: title is positioned adjacent to the TOC paragraph
  instead of being appended at end-of-body when --index is set.
- AddAtFindPosition: the inline variant passed a run-count index to the
  re-entered Add(), which downstream handlers consumed as ChildElements
  index, causing --before find:<text> to insert before pPr (schema
  invalid). Converted to ChildElements index before re-entry.
- CopyFrom (--from clone): silently ignored --after / --before and bypassed
  parent/child validation (allowing body-in-body, p-in-p, styles-with-
  non-style etc.). Now resolves anchors via ResolveAnchorPosition, rejects
  self/ancestor clones, routes through ValidateParentChild, handles find:
  anchors, and uses the pPr-aware helper when target is a paragraph.

Validation / error surfacing:
- New ValidateParentChild gate rejects schema-invalid parent/child combos:
  paragraph-in-paragraph, table-in-paragraph, anything under /body/sectPr,
  non-row children under w:tbl, non-cell children under w:tr, non-style
  children under /styles, and direct children (other than sdtPr/sdtContent)
  under w:sdt / w:sdtRun.
- ParsePath now rejects multi-predicate (p[1][2]), empty predicate (p[]),
  trailing junk after ], trailing slash, and empty path segments.
- --index is rejected early for negative values; leading/trailing
  whitespace on --index is rejected (was silently trimmed).
- Empty find:"" pattern now errors cleanly instead of leaking an
  Array-dimensions-exceeded .NET exception.
- ArgumentOutOfRangeException and other raw internal exceptions are
  wrapped into clean ArgumentException messages.

AddSdt result-path: when parent is under /header[N] or /footer[N], the
returned path now stays under that root instead of being rewritten as
/body/..., so downstream --after chains keep resolving.
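
The off-by-one fix for paragraphs that carry a pPr child can be illustrated with a small sketch. It uses Python's xml.etree.ElementTree rather than the Open XML SDK, and `insert_into_paragraph` is a hypothetical analogue of the InsertIntoParagraph helper named above, not its actual implementation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def insert_into_paragraph(paragraph, child, index=None):
    """Insert child at a content index that never lands before w:pPr."""
    if index is not None and index < 0:
        raise ValueError("index must not be negative")
    children = list(paragraph)
    # A leading pPr shifts every content index by one.
    offset = 1 if children and children[0].tag == f"{{{W}}}pPr" else 0
    content_count = len(children) - offset
    if index is None or index >= content_count:
        paragraph.append(child)
    else:
        paragraph.insert(index + offset, child)
```

Requesting index 0 on a paragraph with pPr therefore places the new element directly after pPr instead of ahead of it.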
- ValidateParentChild now rejects schema-invalid clones via --from:
  - run/hyperlink as direct body children (must live inside a paragraph)
  - style outside /styles
  - cell inside cell / run inside cell (TableCell accepts only block-level)
  - raw <w:sectPr> cloned into /body (singleton; kept distinct from
    `--type section` which creates a paragraph-level section break)
- AddBookmark: body-level inserts now go through InsertAtIndexOrAppend so
  --index / --after / --before are honored and the sectPr-last-child
  invariant is preserved. Return path emits /bookmark[@name=…] which is
  actually resolvable by NavigateToElement (bookmarkStart has no local
  name `bookmark`).
- Navigation: /footnote[@footnoteId=N] and /endnote[@endnoteId=N] are now
  accepted as add/get parents, routing to the corresponding footnote/endnote
  body element. Also added a generic @name= matcher for bookmarkStart so
  the new bookmark return path resolves.
- ParsePath predicate parsing tightened: only positive integer `[N]`,
  `last()`, or `@ident=bare-or-double-quoted-value` are accepted.
  Previously `/body/p[XYZ]`, `/body/p[@=X]`, `/body/p[@paraid]`,
  `/body/p[@w:paraId="X"]` silently resolved to the first element.
- CopyFrom: cloned paragraphs now regenerate bookmarkStart/End ids and
  names so duplicated bookmarks in a clone don't collide with the
  original or each other. MapLocalNameToAddType keeps "sectpr" distinct
  from "section" so the two ValidateParentChild rules don't conflate.
- SDT validation hint and query bookmark path updated to match the new
  selector grammar.
- ValidateParentChild: reject raw <w:sectPr> cloned directly into a
  paragraph via --from. (--type section still creates a legitimate pPr-
  wrapped break as before.)
- AddBookmark: reject duplicate bookmark names with a clear error so a
  second add doesn't silently clash with an existing name and leave the
  /bookmark[@name=X] lookup pointing at the wrong bookmark.
- AddBookmark: when parent is /body and --prop text=... is supplied, wrap
  bookmarkStart + new Run + bookmarkEnd inside a fresh <w:p> so no bare
  <w:r> lands as a direct body child (runs must live inside paragraphs).
- NavigateToElement: resolve /footnote[@footnoteId=N], /footnote[N],
  /endnote[@endnoteId=N], /endnote[N] at the root, mirroring the
  AddParentResolver. Paths returned by `add` inside a footnote/endnote
  now round-trip through `get` and work as --after/--before anchors.
- Query: canonicalize emitted paths by stripping the `/document[1]/body[1]`
  prefix to `/body` so tr/tc/sectPr/ins/del/drawing/commentRange* paths
  returned by `query` resolve via `get` and are usable as anchors.
- CopyFrom now rejects cloning <w:footnote>, <w:endnote>, <w:comment>
  as inline content. These elements live in dedicated parts
  (footnotes.xml, endnotes.xml, comments.xml); inserting them under
  <w:p> or <w:body> via --from produced schema-invalid OOXML. Users
  should add via `--type footnote/endnote/comment --prop text=...`
  which creates the reference + part entry correctly.
- CopyFrom now rejects cloning <w:bookmarkStart>/<w:bookmarkEnd> as a
  standalone element. Bookmarks span content via a start/end pair with
  matching @id; cloning just the start (as the virtual
  /bookmark[@name=X] selector resolves to) produced a never-closed
  bookmark in the target. Error redirects users to clone the
  containing paragraph or range.
- CopyFrom now regenerates w:id on <w:ins>/<w:del> revision elements in
  the clone (both self and descendants), mirroring the existing
  paraId and bookmark-id regeneration. Duplicate revision ids after
  clone previously failed semantic validation.
- `add --type ins|del|moveTo|moveFrom` used to fall through to AddDefault,
  which wrote the --prop key=value pairs as unnamespaced attributes, never
  emitted the required w:id/w:author/w:date, and silently destroyed the
  paragraph's existing runs when --index was omitted. Tracked-change
  authoring is outside the add command's scope; reject with a clear error
  pointing users at the normal inline add flow (mirrors the footnote /
  endnote / comment rejections already in place).
- `AddParagraph` now accepts `ilvl` as an alias for `numlevel` when
  building <w:numPr>, so `add --type paragraph --prop numId=1 --prop ilvl=2`
  produces the <w:ilvl> child the user expects. Previously ilvl was silently
  dropped even though `set --prop ilvl=N` on the same paragraph worked.
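
The tightened predicate grammar from the ParsePath items above can be approximated with a regular expression. This is a Python sketch under assumptions: segment names are plain identifiers, bare attribute values exclude whitespace, quotes, brackets, '/', and '@', and `parse_path` with its error text is a hypothetical stand-in for the C# ParsePath.

```python
import re

# name, optionally followed by exactly one predicate:
#   [N] with N >= 1, [last()], or [@ident=value] (bare or double-quoted)
SEGMENT = re.compile(
    r'^(?P<name>[A-Za-z_]\w*)'
    r'(?:\[(?P<pred>[1-9]\d*|last\(\)|@[A-Za-z_]\w*=(?:"[^"]*"|[^\s"\[\]/@]+))\])?$'
)


def parse_path(path):
    """Split a path into (name, predicate) pairs, rejecting malformed input."""
    if not path.startswith("/") or path.endswith("/"):
        raise ValueError(f"Malformed path: '{path}'")
    segments = []
    for raw in path[1:].split("/"):
        match = SEGMENT.match(raw)
        if match is None:  # empty segment, multi-predicate, trailing junk, ...
            raise ValueError(f"Malformed path segment: '{raw}'")
        segments.append((match.group("name"), match.group("pred")))
    return segments
```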
…after sectPr

- AddParagraph: hoisted <w:ilvl> handling out of the numId branch so
  --prop ilvl=N alone emits <w:numPr><w:ilvl/></w:numPr> consistently
  with `set --prop ilvl=N`. Added range checks: numId must be >= 0,
  ilvl must be in [0,8]; out-of-range values now throw instead of
  silently producing schema-invalid OOXML.
- NavigateToElement: top-level /section[N] is now a resolvable anchor;
  it maps to the Nth paragraph in /body whose <w:pPr> carries a
  <w:sectPr>. Previously `add --type section` returned a /section[N]
  path that subsequent --after/--before could not resolve.
- ResolveAnchorPosition: reject --after <body-level sectPr> with a
  clear error. Body-level <w:sectPr> must remain the last child of
  <w:body>, so "after sectPr" has no valid placement; silently
  substituting --before semantics was confusing. Paragraph-level
  sectPr (inside w:pPr) is unaffected.
- `--after find:<substring>` / `--before find:<substring>` for block types
  (paragraph, table, section, toc, ...) no longer splits the matched
  paragraph. The previous behavior inherited AddInlineAtSplitPoint's
  character-offset splitting and produced two paragraph fragments with
  the new block wedged between them — schema-valid but semantically
  destructive. For non-inline types we now resolve to the containing
  paragraph and insert the new block as a sibling. Inline types
  (run/pagebreak/bookmark/field/inline sdt) keep splitting, which is
  still the correct semantics.
- NavigateToElement now resolves top-level `/formfield[N]` to the
  paragraph containing the Nth form field's begin-run. Previously
  `add --type formfield` returned `/formfield[N]` but that path only
  worked for `get` (special-case) — `--after /formfield[N]` failed
  "Anchor element not found". Same class as the earlier `/section[N]`
  fix.
Extends the round 8/9 fix pattern: `add` emits these synthetic paths as
the new element's identity, but `--after`/`--before` previously rejected
them because NavigateToElement had no routing for these roots.

- /chart[N] resolves to the body paragraph containing the Nth w:drawing
  chart, mirroring GetAllWordCharts' document-order walk.
- /toc[N] resolves to the Nth body paragraph carrying a TOC field,
  mirroring AddToc's counting.
- /watermark is a positional no-op in ResolveAnchorPosition (watermarks
  live in header parts, no body sibling exists). --after /watermark
  appends, --before /watermark prepends, so the round-trip stays usable.
…watermark-absent anchor

- AddChart (both standard + extended-chart branches) now emits the return
  path using the same document-order traversal the resolver uses. The old
  insertion-counter approach produced a /chart[N] that the resolver could
  not map back after --before/--after inserted anywhere but the end.
- AddSection return path now mirrors the NavigateToElement /section[N]
  walker, computing the new section break's document-order position
  rather than counting insertion events. Fixes --before /section[1]
  reporting /section[3] when the new section is actually /section[1].
- AddToc rejects header/footer parents with a clear error (TOC field code
  references body-level headings and is not meaningful in a header/footer
  part). Prevents the previous /toc[0] return-path contract violation.
- ResolveAnchorPosition /watermark handler now errors "Anchor element not
  found: /watermark" when the document has no watermark. The round 10
  no-op behavior was meant for docs that do have one; silently appending
  when none exists was a contract violation.
…of-range

- AddToc return path now mirrors the round 11 AddChart/AddSection fix:
  computes the new tocPara's position in the doc-order TOC list via
  FindIndex(ReferenceEquals) instead of a total count. --before /toc[1]
  now correctly returns /toc[1] rather than /toc[last].
- ResolveAnchorPosition /watermark handler now captures the optional
  index from /watermark[N]. If the index is less than 1 or greater than
  the watermark count (0 or 1 in practice), throw "Anchor element not
  found" to mirror /chart[99] behavior. Bare /watermark keeps its
  positional-hint no-op when a watermark exists; errors cleanly when
  absent.
…l count

Extends the round 11/12 sweep that corrected AddChart/AddSection/AddToc
return-path numbering. Two handlers were missed:

- AddTable used parent.Elements<Table>().Count(), so --before /body/tbl[1]
  returned /body/tbl[3] even though the resolver places the new table at
  /body/tbl[1] in document order.
- AddHyperlink used the same pattern against hlPara.Elements<Hyperlink>().

Both now compute the new element's FindIndex(ReferenceEquals) in its
parent's child collection, matching what NavigateToElement reports.
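
The difference between the count-based and document-order return paths can be shown with a small sketch. It is Python for illustration only; elements are modeled as dictionaries, identity comparison with `is` plays the role of ReferenceEquals, and both function names are hypothetical.

```python
def count_based_path(children, kind, prefix):
    """Old behavior: index equals the total number of same-kind children."""
    total = sum(1 for c in children if c["kind"] == kind)
    return f"{prefix}[{total}]"


def document_order_path(children, new_element, kind, prefix):
    """New behavior: index is the element's own position among same-kind siblings."""
    same_kind = [c for c in children if c["kind"] == kind]
    position = next(i for i, c in enumerate(same_kind) if c is new_element)
    return f"{prefix}[{position + 1}]"
```

Inserting a table ahead of two existing ones gives `/body/tbl[3]` from the first function and `/body/tbl[1]` from the second, which is the position a resolver walking the children in order would report.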
Two linked fixes for display-mode equation adds:

- Add.Text.cs: AddEquation display-mode previously computed the return
  path as /body/oMathPara[total-count], which (a) didn't match the
  doc-order resolver used by NavigateToElement and (b) pointed at the
  wrong element after --before / --after inserts. Now counts the
  insertTarget's direct children in the same sequence the resolver
  walks (bare M.Paragraph + IsOMathParaWrapperParagraph wrappers),
  stopping at the newly-inserted wrapper paragraph.
- Navigation.cs ResolveAnchorIndex: /body/oMathPara[N] resolves to the
  inner M.Paragraph, which isn't a direct body child — its wrapper w:p
  is. Added retargeting so that when the resolved anchor's parent is a
  pure oMathPara-wrapper paragraph listed as a sibling under the
  resolution parent, the anchor is hoisted to the wrapper for IndexOf
  lookup. Restores round-trip: /body/oMathPara[N] is now usable as
  --after / --before anchor for follow-up adds.
Cloning a paragraph containing a chart or picture via --from produced a
duplicate <wp:docPr> id, which fails OOXML semantic validation ('id'
should have unique value). The existing id-fixup sweep in CopyFrom
already regenerated paraId, textId, bookmark ids, and revision ids;
extended it to walk Descendants<DW.DocProperties>() on the clone and
reassign Id via the doc's existing docPr sequence.
Bookmarks and legacy form fields accept a --prop name=... value that is
later referenced via selectors like bookmarkStart[@name=X]. When the
name contains '/', '[' or ']', the selector grammar cannot parse it as
a literal and the created element becomes unaddressable — users could
not get, set, or remove their own bookmark after creating it.

AddBookmark and AddFormField now validate the name at input, throwing a
clear error listing the offending characters so callers either escape
them upstream or pick a different name. OOXML itself doesn't mandate
such a restriction, but officecli's own selector grammar has no escape
syntax for these chars, so rejecting at the input boundary is the
pragmatic fix.
…ph listing

- AddBookmark now rejects additional bookmark-name characters
  (whitespace, leading '@', quotes) that the selector predicate parser
  can't handle as bare attribute values. The round-17 fix covered
  /, [, ] but left these other unaddressable-name footguns.
- AddParagraph and AddRun route --prop text=... through a new
  AppendTextWithBreaks helper that tokenizes on \n/\r\n/\r and \t,
  emitting alternating <w:t> + <w:br/> + <w:tab/> children. Literal
  newlines and tabs were previously embedded inside <w:t> verbatim;
  Word and LibreOffice collapse both to a single space on render, so
  the characters silently disappeared in the finished document.
  Also normalizes OpenXml SDK's ' /' self-closing form to '/' in the
  final document.xml for canonical output.
- Navigation's Body lister now enumerates children with the same
  p[N] vs oMathPara[M] bucketing the resolver uses. Previously the
  lister counted the equation wrapper paragraph under /body/p[N]
  while the resolver counted it under /body/oMathPara[M], leaving
  equation paragraphs reported but unaddressable via the emitted
  p[N] path.
- AddFormField name validation extended to match AddBookmark's
  post-R18 rules: rejects whitespace, leading '@' or '\'', embedded
  '"', and duplicate names within the document. Form fields embed a
  BookmarkStart/End pair with the same name, so the weaker earlier
  validation produced unaddressable or duplicate bookmarks.
- CopyFrom rejects <m:oMathPara> and <m:oMath> as clone sources.
  Previously --from /body/oMathPara[N] cloned the bare math element
  into the target, producing schema-invalid OOXML (body cannot
  contain oMathPara directly). Users should clone the containing
  paragraph (/body/p[N]) instead — same pattern as the R6 rejections
  for footnote/endnote/comment/bookmarkStart.
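
The tokenization performed by AppendTextWithBreaks can be sketched as below. This is a Python illustration of the splitting step only, returning tuples instead of building w:t, w:br, and w:tab elements; `split_text_with_breaks` is a hypothetical name.

```python
import re

# \r\n must come first so a Windows line ending yields one break, not two.
TOKEN = re.compile(r"\r\n|\n|\r|\t")


def split_text_with_breaks(text):
    """Return a list of ('t', str), ('br', None), and ('tab', None) tuples."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            parts.append(("t", text[pos:match.start()]))
        parts.append(("tab" if match.group() == "\t" else "br", None))
        pos = match.end()
    if pos < len(text):
        parts.append(("t", text[pos:]))
    return parts
```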
Two invocations of 'add /styles --type style --prop name=DupStyle'
previously both succeeded, leaving two <w:style w:styleId=DupStyle>
entries in styles.xml and triggering the OOXML semantic check
'styleId should have unique value'. Now:

- If --prop id=<explicit> collides with an existing styleId, throw
  an ArgumentException pointing the caller at a unique id/name.
- If only --prop name=<value> was given (id derived implicitly),
  auto-suffix the derived id (DupStyle, DupStyle2, DupStyle3, ...)
  so the styles part stays schema-valid without forcing the caller
  to pre-check.

Parallels the R18 bookmark dup-name rejection and R19 formfield
dup-name rejection; styles had been the remaining handler without
duplicate-id protection.
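
A minimal sketch of the two rules above, in Python rather than the project's C#. The function name `resolve_style_id` is hypothetical, the derived id is assumed to equal the style name, ids are compared case-sensitively as an assumption, and `ValueError` stands in for the ArgumentException.

```python
def resolve_style_id(existing_ids, name, explicit_id=None):
    """Pick a styleId that does not collide with existing_ids (hypothetical helper)."""
    if explicit_id is not None:
        # An explicit id is never rewritten; a collision is the caller's error.
        if explicit_id in existing_ids:
            raise ValueError(
                f"styleId '{explicit_id}' already exists; choose a unique id or name"
            )
        return explicit_id

    # Implicit id: try the bare name, then name2, name3, ...
    candidate = name
    suffix = 2
    while candidate in existing_ids:
        candidate = f"{name}{suffix}"
        suffix += 1
    return candidate
```

With `{"DupStyle"}` already present, a second add by name gets `DupStyle2`, and a third gets `DupStyle3`.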
…ction

Two successive 'add /body --type header --prop kind=default' calls used
to silently produce a sectPr with two <w:headerReference type=default>
entries pointing at different header parts. OOXML allows at most one
reference per type per section (default | first | even).

Before appending a new reference, check the section's existing references
for a matching type and throw ArgumentException pointing the caller at
'remove the existing one first or use --prop type=<first|even>'.
Mirrors the R18/R19/R20 dup-rejection pattern.
…ToString

The round-21 dup-reference rejection interpolated preHeaderType /
preFooterType directly into the error message. These are
HeaderFooterValues values whose default ToString emits
'HeaderFooterValues { }' — unhelpful for users. Added a small
HeaderFooterTypeName helper that maps the three possible values back
to 'default', 'first', or 'even' and pass that through the message
templates.
Generalize ApplySlideBackground to accept SlidePart, SlideLayoutPart, or
SlideMasterPart — all three share the same p:bg/p:bgPr schema. Query for
/slidemaster[N] and /slidelayout[N] now also reports Format["background"],
so Set/Get round-trips across all three container types.

Supports all existing background values (solid, gradient, image, none) on
masters and layouts without new syntax.
Get and Set both called ParsePath before reaching the /formfield[N|name]
regex dispatch. ParsePath's generic predicate validator only accepts
positive-integer / last() / [@attr=value], so the documented
/formfield[name] form (used by formfield-by-name lookup) was rejected
with 'Malformed path segment' before the special-case router could fire.

Move the /formfield[...] branch above ParsePath in both WordHandler.Query
and WordHandler.Set. Remove the now-dead duplicate blocks further down.
…Child

AddPicture and AddOle already have explicit TableCell-parent branches
(Add.Media.cs) that wrap the inline run in a Paragraph before
appending, satisfying the OOXML block-only rule for <w:tc>. But
ValidateParentChild rejected picture/ole under TableCell up front,
making that wrap code unreachable.

Whitelist picture/image/img/ole/oleobject/object/embed in the
TableCell branch so the wrap helpers can actually run.
`add --type table --parent /body/p[N] --after find:X` previously:
1. ValidateParentChild rejected 'block type under paragraph' up front.
2. Even if that was bypassed, AddAtFindPosition's block branch silently
   degraded to 'insert as sibling of the paragraph' (commit e846b16),
   so the table landed at the end of the whole paragraph — ignoring the
   caller's find: position when the anchor sat mid-paragraph.

Neither matched Word's native 'cursor mid-sentence → Insert → Table'
behavior nor the literal semantics of --after find:X.

- ValidateParentChild now takes the InsertPosition and lets block-type
  adds through under a paragraph parent when a find: anchor is present;
  error message points the non-find: case at /body.
- AddAtFindPosition's block branch now:
    * inserts as a sibling when the anchor lands on a paragraph boundary
      (splitPoint == 0 or == total length) — no destructive split;
    * calls the new SplitParagraphAtOffset helper when the anchor is
      mid-paragraph, producing head paragraph + new block + tail
      paragraph, with pPr cloned onto the tail so style/numbering/
      heading are preserved on both halves.

Also reverses the 'do NOT split' comment introduced in e846b16: the
destructive-split concern only applies when the system autonomously
decides to split; when the caller explicitly names a mid-paragraph
anchor, honoring their position is the correct behavior (matching
Word's native Enter-key split on paragraph properties).
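
The boundary-versus-split decision can be sketched as follows. This is a simplified Python model, not the C# implementation: a paragraph is reduced to its text plus a `props` dict standing in for pPr, the function name `insert_block_at_offset` is hypothetical, and placing the block before the paragraph for offset 0 and after it for offset == length is an assumption.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    text: str
    props: dict = field(default_factory=dict)  # stand-in for <w:pPr>


def insert_block_at_offset(body, index, split_point, block):
    """Insert block relative to body[index], splitting only when mid-paragraph."""
    para = body[index]
    if split_point == 0:
        body.insert(index, block)        # anchor at paragraph start: sibling before
    elif split_point == len(para.text):
        body.insert(index + 1, block)    # anchor at paragraph end: sibling after
    else:
        head = Paragraph(para.text[:split_point], para.props)
        # The tail gets its own copy of the properties so both halves keep
        # style/numbering/heading without sharing one object.
        tail = Paragraph(para.text[split_point:], copy.deepcopy(para.props))
        body[index:index + 1] = [head, block, tail]
    return body
```

Only the mid-paragraph branch rewrites the original paragraph; both boundary cases leave it untouched.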
Add background.mode (stretch/tile/center), background.alpha (0..100), and
background.scale (1..500) as canonical dot-keys paired with background=image:.
Stretch stays the default so bare background=image:/path behaves identically.

- tile:   <a:tile sx=sy=scale*1000 algn=tl flip=none>
- center: <a:tile sx=sy=100000 algn=ctr>  (LibreOffice NO_REPEAT convention)
- alpha:  <a:alphaModFix amt=alpha*1000> inside <a:blip>

Get round-trips only non-default values: no background.mode for stretch, no
background.alpha for opaque. Works on /slide[N], /slidemaster[N], and
/slidelayout[N]. background.mode/alpha/scale without a paired background key
throws; invalid mode/alpha/scale ranges throw.
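
The unit conversions above can be checked with a small sketch. It is written in Python for illustration only; the function names `tile_attributes` and `alpha_mod_fix_amount` are hypothetical, the default scale of 100 is an assumption, and `ValueError` stands in for whatever exception the handler actually throws.

```python
def tile_attributes(mode, scale=100):
    """Return <a:tile> attributes for a mode, or None when no tile is emitted."""
    if mode not in ("stretch", "tile", "center"):
        raise ValueError(f"invalid background.mode: {mode}")
    if not 1 <= scale <= 500:
        raise ValueError(f"background.scale out of range 1..500: {scale}")
    if mode == "tile":
        # scale is a percentage; OOXML stores it in 1/1000ths of a percent.
        return {"sx": scale * 1000, "sy": scale * 1000, "algn": "tl", "flip": "none"}
    if mode == "center":
        # A single 100% tile anchored at the centre.
        return {"sx": 100000, "sy": 100000, "algn": "ctr"}
    return None  # stretch: the default, no <a:tile> element


def alpha_mod_fix_amount(alpha):
    """Convert background.alpha (0..100) to the <a:alphaModFix amt> value."""
    if not 0 <= alpha <= 100:
        raise ValueError(f"background.alpha out of range 0..100: {alpha}")
    return alpha * 1000
```

For example, `background.scale=50` with `background.mode=tile` corresponds to `sx=sy=50000`, and `background.alpha=40` to `amt=40000`.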
PR3 lifts two known limitations of the background feature:

1. background.mode/alpha/scale can now be set without re-supplying
   background=image:<path>. The existing Blip.Embed rel is preserved so
   the image part is neither duplicated nor orphaned. Mutating alpha/mode
   against a solid/gradient background or no background throws a clear
   error directing the user to set background=image:<path> first.

2. Get now accepts /slidemaster[N]/slidelayout[M] in addition to bare
   /slidelayout[N], so Set and Get are symmetric. The nested path is
   returned verbatim in the node's Path field.
… preview

- Track CommentRangeStart/End to open/close highlight spans
- New GetCommentDisplayHtml renders comment author, date, initials as tooltip
- CSS hover rule shows tooltip overlay on highlighted text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@goworm

goworm commented Apr 24, 2026

Contributor

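Thanks for the contribution! The feature is headed in the right direction and the implementation is mostly there. Two things need to be addressed before merging:

1. Please add a screenshot (CONTRIBUTING Rule 2)

This is a purely visual feature, and CONTRIBUTING.md Rule 2 requires a feature PR to include at least one screenshot showing the result. In the HTML preview, hover over the highlighted text, capture the tooltip, and add the screenshot to the PR description.

2. Comments that span paragraphs break the HTML

OOXML allows `CommentRangeStart/End` to span paragraphs (for example, a comment that starts at a word in paragraph 1 and ends at a word in paragraph 3). In the current implementation, `commentDepth` is a local variable inside the `RenderParagraphContentHtml` method, so when a comment spans paragraphs:

  • Paragraph 1 hits `CommentRangeStart` and opens a `<span class="comment-highlight">`, but nothing closes it when the paragraph ends
  • Paragraph 3 hits `CommentRangeEnd` and tries to emit a `</span>` for a span that was never opened in that paragraph, so the DOM structure ends up broken

Suggested fix: promote `commentDepth` to a renderer-level field (member variable). When a paragraph finishes rendering with `commentDepth > 0`, emit a closing `</span>`; when the next paragraph starts rendering with `commentDepth > 0`, first emit a `<span class="comment-highlight">` (without the tooltip, since the tooltip only needs to appear once, in the start paragraph).

Repro: in Word, select text that spans two paragraphs, right-click to add a comment, save, then view the HTML with `officecli watch`; the DOM will be broken.

The rest (performance optimization via Dictionary caching, CSS layering, tests, and so on) is cleaned up by the maintainer after merge per the CONTRIBUTING conventions, so you don't need to handle it. Once the two points above are addressed, I'll merge.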
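
The suggested fix can be sketched roughly as follows. This is a Python illustration, not the project's C# renderer: the class `PreviewRenderer`, the `(kind, value)` item tuples, and the use of a `title` attribute for the tooltip are all assumptions made for the example. It also closes and reopens one span per open comment, which generalizes the single-span wording of the review to nested ranges.

```python
from html import escape

HIGHLIGHT_OPEN = '<span class="comment-highlight">'


class PreviewRenderer:
    def __init__(self):
        # Renderer-level state: survives from one paragraph to the next.
        self.comment_depth = 0

    def render_paragraph(self, items):
        """items: list of ("text", str), ("start", tooltip) or ("end", None)."""
        out = ["<p>"]
        # Reopen highlights that began in an earlier paragraph (no tooltip).
        out.extend(HIGHLIGHT_OPEN for _ in range(self.comment_depth))
        for kind, value in items:
            if kind == "start":
                self.comment_depth += 1
                tip = escape(value, quote=True)
                out.append(f'<span class="comment-highlight" title="{tip}">')
            elif kind == "end":
                if self.comment_depth > 0:  # ignore an unmatched range end
                    self.comment_depth -= 1
                    out.append("</span>")
            else:
                out.append(escape(value))
        # Close anything still open so this paragraph's HTML is balanced.
        out.extend("</span>" for _ in range(self.comment_depth))
        out.append("</p>")
        return "".join(out)
```

Rendering three paragraphs where the comment starts in the first and ends in the third yields balanced spans in each `<p>`, with the tooltip present only on the first.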

@tqjason

tqjason commented Apr 28, 2026


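@goworm You gave an example of a comment that spans paragraphs. I happen to need comments that span paragraphs or span runs, but I couldn't find how to do that with the CLI. From the code, it looks like it isn't supported.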

@tqjason

tqjason commented Apr 28, 2026


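Also, a comment's commentReference is wrapped in a run. When you get a paragraph, that run is ignored, but when you insert a new run into the paragraph after the comment, the returned run id does not skip the run that contains the commentReference.

Sigh. In short, adding comments to a docx is really complicated.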
