Coverage Lens
Project Overview
Coverage Lens compares how UK news outlets frame the same story. It ingests RSS feeds from The Guardian, BBC News, and Sky News, clusters headlines that describe the same event, and presents a side-by-side dashboard showing how each outlet emphasises the story differently.
Key Features
- Semantic matching — Hugging Face embeddings (with a local fallback) group equivalent headlines across outlets
- Framing analysis — Shared terms, tone, and headline differences per outlet
- 10-story archive — FIFO backfill of recent triple-outlet matches
- Image proxy — Lambda proxy serves outlet photos with correct referers and BBC URL repair
- Serverless AWS — Scheduled ingest, DynamoDB snapshot, API Gateway, CloudFront static site
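The 10-story archive can be pictured as a bounded first-in, first-out list. The sketch below is only an illustration of that idea, not the project's actual code: the function name `merge_archive` and the `id` key on each story are hypothetical, and only the limit of 10 comes from the feature list above.

```python
MAX_ARCHIVE = 10  # archive size stated in the feature list


def merge_archive(archive, new_stories, limit=MAX_ARCHIVE):
    """Return a new archive with new_stories merged in, oldest dropped first.

    Each story is a dict with a hypothetical 'id' key. A story whose id is
    already archived is replaced in place instead of being duplicated.
    """
    merged = {story["id"]: story for story in archive}
    for story in new_stories:
        # Assigning to an existing key keeps its original position;
        # a new key is appended at the end (dicts preserve insertion order).
        merged[story["id"]] = story
    stories = list(merged.values())
    return stories[-limit:]  # keep only the newest `limit` entries
```

For example, merging two new stories into a full archive would push out the two oldest entries, so the list stays at ten items.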
Architecture
Ingest pipeline (every 4 hours by default):
- Fetch and dedupe RSS items
- Enrich top stories with excerpts and images
- Embed headlines and cluster by cosine similarity
- Curate triple-outlet matches with framing analysis
- Merge into a rolling archive (max 10 stories) and write a single LIVE item to DynamoDB
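To make the clustering and curation steps concrete, here is a minimal sketch of grouping headline vectors by cosine similarity and keeping only groups covered by all three outlets. It is an assumption-laden illustration: the greedy single-pass strategy, the 0.8 threshold, the outlet keys, and the `vec` / `outlet` field names are all hypothetical, and the real pipeline may cluster differently.

```python
import math

OUTLETS = {"guardian", "bbc", "sky"}  # hypothetical outlet keys


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(items, threshold=0.8):
    """Greedy clustering: each item joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for item in items:
        for group in clusters:
            if cosine(item["vec"], group[0]["vec"]) >= threshold:
                group.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def triple_outlet(clusters):
    """Keep only clusters that contain at least one headline from every outlet."""
    return [g for g in clusters if {i["outlet"] for i in g} >= OUTLETS]
```

Comparing against a cluster's seed keeps the sketch short; comparing against a running centroid would be a reasonable alternative if clusters drift.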
Dashboard API: GET /live returns the full comparison payload (archive, gaps, coverage matrix). GET /image?url=…&ref=… proxies outlet images for the browser.
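As a rough illustration of how a client might call these two endpoints, the sketch below builds an image-proxy URL and fetches the live payload. The base URL `https://api.example.com` is a placeholder, the helper names are hypothetical, and the shape of the JSON returned by /live is not assumed beyond it being JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.example.com"  # placeholder, not the real endpoint


def image_proxy_url(image_url, referer, base=API_BASE):
    """Build a GET /image URL with the `url` and `ref` query parameters,
    percent-encoding both values so nested query strings survive."""
    query = urlencode({"url": image_url, "ref": referer})
    return f"{base}/image?{query}"


def fetch_live(base=API_BASE, timeout=10):
    """Fetch GET /live and decode the JSON comparison payload."""
    with urlopen(f"{base}/live", timeout=timeout) as resp:
        return json.load(resp)
```

Encoding the image URL matters because outlet image URLs often carry their own query strings, which would otherwise be read as extra parameters of the proxy request.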
The static dashboard lives in web/ (no build step) and is deployed to S3/CloudFront. Backend logic is in src/lambdas/ and src/lib/, with infrastructure defined in template.yaml via AWS SAM.
Repository
Synced from NewsComparitor.