
[Phase B2] Generalize SourceFetcher beyond note.com (Zenn, Qiita, RSS, HTML) #22

@terisuke


Tracked under ADR 0002 (Phase B). Details: implementation plan §B2.

Problem

Style derivation is locked to note.com. The owner publishes under multiple identities:

Every persona must be able to derive its style guide from its own platforms.

Scope

New interface in internal/domain/source:

package source

import "context"

// Fetcher reads profiles and articles from one source platform.
type Fetcher interface {
    FetchProfile(ctx context.Context, ref Ref) (*ProfileSnapshot, error)
    FetchArticle(ctx context.Context, ref Ref) (*ArticleSnapshot, error)
    FetchList(ctx context.Context, ref Ref, limit int) ([]ArticleSnapshot, error)
}

Concrete implementations under internal/infrastructure/source/:

  • note/ — extracted from existing internal/infrastructure/note/fetcher.go. The note.com host check moves into this fetcher only.
  • zenn/ — public articles + /{user}/feed. Robots.txt + 1 req/sec rate limit.
  • qiita/ — public REST API (no auth needed for read-only public posts) + HTML fallback.
  • rss/ — generic RSS reader for Astro/Jekyll/Hugo blogs (cor-jp.com once its RSS is verified).
  • html/ — generic semantic-content extractor (last resort).

Each fetcher has its own User-Agent string and rate-limit policy.

Acceptance criteria

  • Scenario test fetches one article from each of {note, zenn, qiita, rss} and writes tmp/source_fetch/{name}.json.
  • application and handlers packages no longer reference note.com directly; routing decisions live in the persona registry or fetcher dispatcher.
  • Cloudia's Zenn and Qiita articles are fetched in a scenario test; each produces a non-empty ArticleSnapshot with code blocks preserved.
  • Each fetcher has unit tests using httptest.Server (no real network).

Out of scope

  • Authenticated fetching (private articles, drafts).
  • Pagination beyond the first page.

Dependencies

  • B1 should land first so that the persona registry can declare which fetcher to use per source.
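Once B1 lets each persona source declare its fetcher, the dispatcher can be a plain lookup keyed by that declaration. A minimal sketch; Source, Registry and the rule "empty kind falls back to the last-resort fetcher, unknown kind is an error" are assumptions. The registry is generic over the fetcher type only so the sketch stays independent of the Fetcher definition; in the codebase F would be source.Fetcher.

```go
package main

import (
	"fmt"
	"strings"
)

// Source is a hypothetical persona-registry entry (its real shape depends on B1).
type Source struct {
	Kind string // declared fetcher kind: "note", "zenn", "qiita", "rss", "html"
	URL  string
}

// Registry maps a declared kind to a fetcher.
type Registry[F any] struct {
	byKind   map[string]F
	fallback string // kind used when a source declares none, e.g. "html"
}

func NewRegistry[F any](fallback string) *Registry[F] {
	return &Registry[F]{byKind: make(map[string]F), fallback: fallback}
}

func (r *Registry[F]) Register(kind string, f F) {
	r.byKind[strings.ToLower(kind)] = f
}

// For returns the fetcher for src. An empty Kind falls back to the
// last-resort fetcher; an unknown non-empty Kind is a configuration error.
func (r *Registry[F]) For(src Source) (F, error) {
	kind := strings.ToLower(strings.TrimSpace(src.Kind))
	if kind == "" {
		kind = r.fallback
	}
	f, ok := r.byKind[kind]
	if !ok {
		var zero F
		return zero, fmt.Errorf("no fetcher registered for kind %q (source %s)", kind, src.URL)
	}
	return f, nil
}
```

Failing loudly on an unknown kind, rather than silently using html/, keeps a typo in the persona registry from degrading a persona's style guide unnoticed.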
