flaviosv/video-summarizer

Video Summarizer

A containerized, microservices-based video processing pipeline that extracts audio from a video, transcribes it with Whisper, and summarizes the transcript with a Groq-hosted chat model. Services are orchestrated via Docker Compose, with N8N handling workflow automation.

Architecture

Google Drive (video)
    → N8N: write to /tmp
    → Converter (FFmpeg) → .mp3 → /tmp
    → Transcript (Whisper) → JSON segments
    → N8N: AI Agent (Groq) → Markdown summary
    → Google Drive (Markdown file)

N8N Workflow

The pipeline runs as an N8N workflow with the following steps:

| # | Step | Description |
|---|------|-------------|
| 1 | Download file | Downloads the source video from Google Drive |
| 2 | Write to disk | Saves the binary to /tmp for shared volume access |
| 3 | Convert video to mp3 | Calls the Converter service to extract audio |
| 4 | Transcribe video | Calls the Transcript service to produce a text transcript |
| 5 | Edit Fields | Maps and reshapes the transcript response fields |
| 6 | AI Agent (Groq) | Sends the transcript to a Groq Chat Model to generate a structured summary |
| 7 | Create Markdown in Drive | Writes the AI-generated summary as a Markdown file to Google Drive |
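
Steps 3 and 4 are plain HTTP calls, so they can be exercised without N8N. The following is a minimal sketch, not part of the repository: it assumes the services are reachable on the host ports listed under Services, that the video is already in /tmp, and that a long timeout is acceptable for transcription. The function names, the `localhost` base URLs, and the timeout value are illustrative assumptions; inside the Compose network N8N would address the services by their own hostnames.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URLs (host-mapped ports from the Services table).
CONVERTER_URL = "http://localhost:5001"
TRANSCRIPT_URL = "http://localhost:5002"


def convert_url(filename: str) -> str:
    """Build the URL for the Converter endpoint GET /convert/:filename."""
    return f"{CONVERTER_URL}/convert/{quote(filename, safe='')}"


def transcribe_url(filename: str) -> str:
    """Build the URL for the Transcript endpoint GET /transcribe/{filename}."""
    return f"{TRANSCRIPT_URL}/transcribe/{quote(filename, safe='')}"


def get_json(url: str, timeout: float = 600.0) -> dict:
    """Issue a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def run_pipeline(video_filename: str) -> str:
    """Convert a video in /tmp to MP3, then return the transcript text."""
    converted = get_json(convert_url(video_filename))
    # The Converter response reports the MP3 name in its "output" field.
    transcription = get_json(transcribe_url(converted["output"]))
    return transcription["transcript"]
```

Calling `run_pipeline("video.mp4")` with the stack running would cover steps 3 and 4; the Google Drive and Groq steps remain in N8N and are not reproduced here.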

Services

| Service | Port | Technology | Role |
|---------|------|------------|------|
| N8N | 5678 | N8N | Workflow orchestration |
| Converter | 5001 | Go 1.25, Gin, FFmpeg | Extracts mono MP3 audio from video |
| Transcript | 5002 | Python 3.11, FastAPI, faster-whisper | Transcribes audio to text |

Prerequisites

  • Docker and Docker Compose
  • A HuggingFace account and access token (for Whisper model downloads)
  • A Groq account and API key (for the AI summarization step)
  • A Google account with Google Drive access (configured as N8N credentials)

Getting Started

  1. Clone the repository

     git clone <repository-url>
     cd video-summarizer

  2. Set up environment variables

     Create a .env file in the project root:

     HF_TOKEN=your_huggingface_token_here

  3. Start the services

     docker compose up --build

  4. Access N8N at http://localhost:5678 to configure your workflow.

API Reference

Converter Service — GET /convert/:filename

Reads a video file from /tmp, extracts audio, and writes a mono MP3 at 16kHz/32kbps back to /tmp.

Example:

GET http://localhost:5001/convert/video.mp4

Response:

{
  "output": "video.mp3",
  "path": "/tmp/video.mp3"
}
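
The audio settings above (mono, 16 kHz, 32 kbps MP3) map onto standard FFmpeg options. The sketch below shows one way to build such a command; it is written in Python for illustration and is not the Converter's actual Go code. The function names and the `-y` overwrite flag are assumptions.

```python
import subprocess
from pathlib import Path


def ffmpeg_args(video: Path) -> list[str]:
    """Build an FFmpeg command that writes a mono 16 kHz / 32 kbps MP3
    next to the input file, using the same base name."""
    output = video.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-y",            # overwrite an existing output file (assumption)
        "-i", str(video),
        "-vn",           # drop the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        "-b:a", "32k",   # 32 kbps audio bitrate
        str(output),
    ]


def extract_audio(video: Path) -> Path:
    """Run FFmpeg and return the path of the generated MP3."""
    subprocess.run(ffmpeg_args(video), check=True)
    return video.with_suffix(".mp3")
```

Keeping the argument list in its own function makes the settings easy to inspect or test without FFmpeg installed.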

Transcript Service — GET /transcribe/{filename}

Reads an MP3 from /tmp and returns a timestamped transcription with language detection.

Example:

GET http://localhost:5002/transcribe/video.mp3

Response:

{
  "filename": "video.mp3",
  "language": "Portuguese",
  "language_iso": "pt_BR",
  "language_probability": 0.9981,
  "transcript": "Segment one text.\nSegment two text."
}

Supported languages: Portuguese, Spanish, English, French.
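
A client may want to validate the response before passing it on to the summarization step. The following sketch parses the response shape shown above and splits the transcript into segments on newlines. The `Transcription` class, the `parse_transcription` function, and the `min_probability` threshold are hypothetical names and values introduced for illustration; the service itself does not define them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    filename: str
    language: str
    language_iso: str
    language_probability: float
    segments: list[str]


def parse_transcription(body: str, min_probability: float = 0.5) -> Transcription:
    """Parse a Transcript service response and split it into segments.

    Raises ValueError when language detection confidence falls below
    min_probability (an illustrative threshold, not a service setting).
    """
    data = json.loads(body)
    probability = float(data["language_probability"])
    if probability < min_probability:
        raise ValueError(f"low language confidence: {probability:.4f}")
    # The example response separates segments with newlines.
    segments = [line for line in data["transcript"].split("\n") if line.strip()]
    return Transcription(
        filename=data["filename"],
        language=data["language"],
        language_iso=data["language_iso"],
        language_probability=probability,
        segments=segments,
    )
```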

Configuration

| Variable | Service | Default | Description |
|----------|---------|---------|-------------|
| HF_TOKEN | Transcript | - | HuggingFace token for model download |
| WHISPER_MODEL | Transcript | medium | Whisper model size |
| API_PORT | Converter | 5001 | HTTP port override |
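
As an illustration of how these variables and defaults might be resolved, the sketch below reads them from the environment. It groups all three in one Python helper purely for demonstration: in the project, HF_TOKEN and WHISPER_MODEL belong to the Transcript service and API_PORT to the Go-based Converter. The `Settings` class and `load_settings` function are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]  # no default; None when unset or empty
    whisper_model: str       # defaults to "medium"
    api_port: int            # defaults to 5001


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve configuration values, falling back to the documented defaults."""
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        whisper_model=env.get("WHISPER_MODEL", "medium"),
        api_port=int(env.get("API_PORT", "5001")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation.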

Project Structure

video-summarizer/
├── converter/          # Go-based video-to-audio converter
│   ├── main.go
│   ├── go.mod
│   └── Dockerfile
├── transcript/         # Python-based audio transcription
│   ├── app.py
│   ├── requirements.txt
│   └── Dockerfile
├── videos/             # Local video file storage
├── docker-compose.yml
└── .env
