EPIC-025: Multi-Source Profile Enrichment (Website, Facebook, Instagram, YouTube, Linktree) #113

@MAGIKBIT

EPIC-025 — Multi-Source Profile Enrichment

Goal

Extend the existing EPIC-024 enrichment system to support profile enrichment from 5 additional sources: Website URL, Facebook Page, Instagram Profile, YouTube Channel, and Linktree — using the pluggable EnrichmentProviderInterface pattern already in place.

Background

EPIC-024 delivered the core enrichment framework with Apollo (company/person), Hunter/Clearbit (logo), and Demo providers. This epic adds source-based providers that extract profile data from public URLs rather than API lookups by email/domain.

New Source Providers

| # | Source | Provider Class | API / Method | Fields Extracted |
|---|--------|----------------|--------------|------------------|
| 1 | Website URL | WebsiteProvider | Crawler (meta tags, OG, JSON-LD, schema.org) | name, description, emails, phones, address, social links, logo, favicon |
| 2 | Facebook Page | FacebookProvider | Facebook Graph API (preferred) / OG fallback | page name, bio, website, phone, address, profile pic, cover photo, category |
| 3 | Instagram Profile | InstagramProvider | Instagram Basic Display API / OG fallback | username, full name, bio, profile pic, website link, follower count |
| 4 | YouTube Channel | YouTubeProvider | YouTube Data API v3 | channel name, description, custom URL, subscriber count, avatar, banner |
| 5 | Linktree | LinktreeProvider | Crawler (structured HTML/JSON) | display name, bio, avatar, all link entries (social, website, payment) |
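
To make the crawler row for WebsiteProvider concrete, here is a minimal sketch of pulling a few fields out of Open Graph tags and a JSON-LD block using PHP's built-in DOM extension. The function name extractProfileFields, the mapping of og:image to logo, and the rule that JSON-LD values override Open Graph values are all assumptions for illustration, not part of the epic. Fetching the page, charset handling, and confidence scoring are left out.

```php
<?php
/**
 * Hypothetical helper: extracts a few profile fields from raw HTML.
 * Handles only a single top-level JSON-LD object (no @graph, no arrays).
 *
 * @return array<string, string> field name => extracted value
 */
function extractProfileFields(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $fields = [];

    // Open Graph: first occurrence of each mapped property wins.
    // Mapping og:image to "logo" is an assumption of this sketch.
    $ogMap = ['og:title' => 'name', 'og:description' => 'description', 'og:image' => 'logo'];
    foreach ($xpath->query('//meta[@property and @content]') as $meta) {
        $property = strtolower($meta->getAttribute('property'));
        if (isset($ogMap[$property]) && !isset($fields[$ogMap[$property]])) {
            $fields[$ogMap[$property]] = trim($meta->getAttribute('content'));
        }
    }

    // JSON-LD: values found here override Open Graph values (assumed priority).
    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue; // skip malformed JSON
        }
        foreach (['name', 'description', 'telephone', 'email'] as $key) {
            if (isset($data[$key]) && is_string($data[$key])) {
                $fields[$key] = trim($data[$key]);
            }
        }
    }

    return $fields;
}
```

A real provider would also record where each value came from (tag or script block plus page URL) so it can be shown as evidence in the review step.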

Architecture

Extends the existing EnrichmentProviderInterface with two new methods:

public function enrichFromUrl(string $url): EnrichmentResult;
public function canHandleUrl(string $url): bool;

The EnrichmentService gains an enrichByUrl(string $url) method that auto-detects the URL type and dispatches to the correct provider.
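
The dispatch described above could look roughly like the sketch below (PHP 8.0+ assumed). Only the two method signatures and the enrichByUrl name come from this issue. The EnrichmentResult shape, HostMatchingProvider, FallbackWebsiteProvider, and the host lists are hypothetical stand-ins, and the existing EPIC-024 interface methods are omitted.

```php
<?php
// Hypothetical minimal result type; the real EnrichmentResult is not shown in this issue.
final class EnrichmentResult
{
    /** @param array<string, mixed> $fields */
    public function __construct(public string $provider, public array $fields = []) {}
}

// Only the two URL methods proposed in this epic; existing methods are omitted.
interface EnrichmentProviderInterface
{
    public function enrichFromUrl(string $url): EnrichmentResult;
    public function canHandleUrl(string $url): bool;
}

// Hypothetical provider that claims URLs by host name.
final class HostMatchingProvider implements EnrichmentProviderInterface
{
    /** @param string[] $hosts lowercase host names without "www." */
    public function __construct(private string $name, private array $hosts) {}

    public function canHandleUrl(string $url): bool
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            return false;
        }
        $host = strtolower($host);
        if (str_starts_with($host, 'www.')) {
            $host = substr($host, 4);
        }
        return in_array($host, $this->hosts, true);
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        // A real provider would call its API or crawl the page here.
        return new EnrichmentResult($this->name, ['source_url' => $url]);
    }
}

// Hypothetical catch-all standing in for WebsiteProvider: accepts any http(s) URL.
final class FallbackWebsiteProvider implements EnrichmentProviderInterface
{
    public function canHandleUrl(string $url): bool
    {
        $scheme = parse_url($url, PHP_URL_SCHEME);
        $host   = parse_url($url, PHP_URL_HOST);
        return is_string($host) && $host !== ''
            && in_array(strtolower((string) $scheme), ['http', 'https'], true);
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        return new EnrichmentResult('website', ['source_url' => $url]);
    }
}

final class EnrichmentService
{
    /** @param EnrichmentProviderInterface[] $providers ordered most specific first */
    public function __construct(private array $providers) {}

    public function enrichByUrl(string $url): EnrichmentResult
    {
        // First provider that claims the URL wins.
        foreach ($this->providers as $provider) {
            if ($provider->canHandleUrl($url)) {
                return $provider->enrichFromUrl($url);
            }
        }
        throw new InvalidArgumentException("No provider can handle URL: {$url}");
    }
}
```

Because the first matching provider wins, the generic website provider has to be registered last, otherwise it would claim every social URL before the specific providers are asked.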

Stories

| # | Story | GitHub Issue | MAGIK | Priority |
|---|-------|--------------|-------|----------|
| 1 | Provider Interface Extension — enrichFromUrl() + URL detection | #TBD | MAGIK-934 | P0 |
| 2 | Website URL Provider — Meta/OG/JSON-LD Extraction | #TBD | MAGIK-935 | P0 |
| 3 | Facebook Page Provider — Graph API + OG Fallback | #TBD | MAGIK-936 | P1 |
| 4 | Instagram Profile Provider — Basic Display API + OG Fallback | #TBD | MAGIK-937 | P1 |
| 5 | YouTube Channel Provider — Data API v3 | #TBD | MAGIK-938 | P1 |
| 6 | Linktree Provider — Structured HTML/JSON Crawler | #TBD | MAGIK-939 | P1 |
| 7 | Field Normalization & Deduplication Service | #TBD | MAGIK-940 | P0 |
| 8 | DB Schema — Enrichment Sources + Evidence Table | #TBD | MAGIK-941 | P0 |
| 9 | Review UI — Current vs Suggested, Accept/Reject per Field | #TBD | MAGIK-942 | P0 |
| 10 | Media Integration — S3-Ready Logo/Cover Image Storage | #TBD | MAGIK-943 | P1 |
| 11 | Rate Limiting & Consent Controls | #TBD | MAGIK-944 | P1 |
| 12 | Multi-Tenant Scoping — Parent/Child Feature Toggles | #TBD | MAGIK-945 | P1 |
| 13 | Demo Provider Extension — Fake URL-Based Data | #TBD | MAGIK-946 | P2 |
| 14 | Integration Testing — Multi-Source End-to-End | #TBD | MAGIK-947 | P2 |

Technical Approach

  1. Extend EnrichmentProviderInterface — add enrichFromUrl() + canHandleUrl() methods; update existing providers with no-op stubs
  2. Strategy pattern — URL → provider routing via canHandleUrl() chain in EnrichmentService
  3. Confidence + Evidence — each extracted field carries a confidence score (0.0–1.0) and source evidence (snippet + URL)
  4. Normalization — E.164 phones via libphonenumber, address component parsing, social link deduplication
  5. Draft-only storage — results stored in enrichment_drafts (extended schema); never auto-applied
  6. Review UI — side-by-side current vs suggested, per-field accept/reject, batch apply with audit log
  7. Rate limiting — per-provider, per-tenant rate limits via enrichment_rate_limits table
  8. Consent — checkbox required before enrichment; consent stored in audit log
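
Steps 3 and 4 above can be sketched together: a value object that carries confidence and evidence, plus a dedup pass that collapses equivalent social links and keeps the higher-confidence candidate. FieldCandidate, normalizeSocialLink, dedupeCandidates, and the social_link field name are hypothetical, and the canonical form chosen here (https, lowercase host without "www.", no query string or trailing slash) is one assumed convention. Phone normalization via libphonenumber is not shown because it needs an external package.

```php
<?php
// Hypothetical value object for one extracted field (PHP 8.0+ assumed).
final class FieldCandidate
{
    public function __construct(
        public string $field,
        public string $value,
        public float $confidence,      // 0.0 to 1.0
        public string $evidenceUrl,    // page or API endpoint the value came from
        public string $evidenceSnippet // short excerpt supporting the value
    ) {
        if ($confidence < 0.0 || $confidence > 1.0) {
            throw new InvalidArgumentException('confidence must be between 0.0 and 1.0');
        }
    }
}

/** Returns a canonical form for comparing social links, or null if no host is present. */
function normalizeSocialLink(string $url): ?string
{
    $parts = parse_url(trim($url));
    if ($parts === false || empty($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);
    if (str_starts_with($host, 'www.')) {
        $host = substr($host, 4);
    }
    // Query string and fragment are dropped; the path keeps its original case.
    $path = rtrim($parts['path'] ?? '', '/');
    return 'https://' . $host . $path;
}

/**
 * Keeps one candidate per (field, comparable value), preferring higher confidence.
 *
 * @param FieldCandidate[] $candidates
 * @return FieldCandidate[]
 */
function dedupeCandidates(array $candidates): array
{
    $best = [];
    foreach ($candidates as $candidate) {
        $comparable = $candidate->field === 'social_link'
            ? (normalizeSocialLink($candidate->value) ?? $candidate->value)
            : $candidate->value;
        $key = $candidate->field . '|' . $comparable;
        if (!isset($best[$key]) || $candidate->confidence > $best[$key]->confidence) {
            $best[$key] = $candidate;
        }
    }
    return array_values($best);
}
```

Keeping the original candidate object, rather than the normalized string, means the evidence shown to the reviewer still matches what was actually found on the source page.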

Scope

  • CI4 Portal (app.portalv2)
  • All user roles (Admin, Reseller, Org, Employee, Individual) with appropriate access gates
  • Extends existing EPIC-024 infrastructure (same tables, same controller, same service)

Acceptance Criteria (Epic-Level)

  • User can paste a Website URL and see extracted profile fields as a draft
  • User can paste a Facebook Page URL and see extracted profile fields
  • User can paste an Instagram profile URL and see extracted profile fields
  • User can paste a YouTube channel URL and see extracted profile fields
  • User can paste a Linktree URL and see all extracted links + profile data
  • All extracted fields show confidence scores and source evidence
  • Review UI shows current vs suggested side-by-side for each field
  • User can accept/reject individual fields before applying
  • Applied changes are audit-logged with source attribution
  • Approved images (logo/cover) auto-save to Media tab (S3-ready)
  • Rate limiting prevents API abuse per provider per tenant
  • Consent checkbox required before any enrichment action
  • Multi-tenant scoping: parent can enable/disable for children
  • Demo mode works for all 5 new sources without API keys
  • All providers are swappable (API vs crawler) without app logic changes
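
The per-field accept/reject and audit criteria above could be backed by something like the following sketch. The function name applyAcceptedFields and the array shapes for the draft and audit entries are assumptions; the real enrichment_drafts schema and audit log format are not specified in this issue.

```php
<?php
/**
 * Hypothetical helper: applies only reviewer-accepted draft fields to a profile.
 *
 * @param array<string, string> $current  current profile values
 * @param array<string, array{value: string, source: string}> $draft suggested values with source
 * @param string[] $accepted field names the reviewer accepted
 * @return array{profile: array<string, string>, audit: array<int, array<string, ?string>>}
 */
function applyAcceptedFields(array $current, array $draft, array $accepted): array
{
    $profile = $current;
    $audit   = [];

    foreach ($accepted as $field) {
        if (!isset($draft[$field])) {
            continue; // accepted a field that has no suggestion: nothing to apply
        }
        $old = $current[$field] ?? null;
        $new = $draft[$field]['value'];
        if ($old === $new) {
            continue; // no change, so no audit entry
        }
        $profile[$field] = $new;
        $audit[] = [
            'field'  => $field,
            'old'    => $old,
            'new'    => $new,
            'source' => $draft[$field]['source'], // source attribution for the audit log
        ];
    }

    return ['profile' => $profile, 'audit' => $audit];
}
```

Fields that are rejected, or simply not listed as accepted, never touch the profile, which matches the draft-only rule.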

Security & Compliance

  • No automatic profile overwrite — all results are DRAFT
  • Consent required before enrichment (GDPR/CCPA alignment)
  • Rate limiting prevents abuse and cost overrun
  • Crawler providers document robots.txt compliance
  • API keys stored in .env, never in code
  • PII fields encrypted at rest in draft table
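
As an illustration of the per-provider, per-tenant rate limiting mentioned above, here is a minimal fixed-window counter. It is an in-memory sketch only: the epic stores limits in the enrichment_rate_limits table, whose schema is not given here, and the class name, the fixed-window strategy, and the integer tenant ID are assumptions. The current time is passed in so the behaviour is easy to test.

```php
<?php
// Hypothetical in-memory fixed-window limiter (PHP 8.0+ assumed).
final class FixedWindowRateLimiter
{
    /** @var array<string, array{windowStart: int, count: int}> */
    private array $windows = [];

    public function __construct(private int $maxRequests, private int $windowSeconds) {}

    /** Returns true and counts the request if the provider/tenant pair is under its limit. */
    public function allow(string $provider, int $tenantId, int $now): bool
    {
        $key         = $provider . ':' . $tenantId;
        $windowStart = $now - ($now % $this->windowSeconds);

        // Start a fresh counter when the window has rolled over.
        if (!isset($this->windows[$key]) || $this->windows[$key]['windowStart'] !== $windowStart) {
            $this->windows[$key] = ['windowStart' => $windowStart, 'count' => 0];
        }

        if ($this->windows[$key]['count'] >= $this->maxRequests) {
            return false;
        }

        $this->windows[$key]['count']++;
        return true;
    }
}
```

A database-backed version would need the check and the increment to happen atomically, for example in a single transaction, so that concurrent requests cannot both slip under the limit.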
