Initial commit: YouTube Shorts maker application

Features:
- Video download from TikTok/Douyin using yt-dlp
- Audio transcription with OpenAI Whisper
- GPT-4 translation (direct/summarize/rewrite modes)
- Subtitle generation with ASS format
- Video trimming with frame-accurate preview
- BGM integration with volume control
- Intro text overlay support
- Thumbnail generation with text overlay

Tech stack:
- Backend: FastAPI, Python 3.11+
- Frontend: React, Vite, TailwindCSS
- Video processing: FFmpeg
- AI: OpenAI Whisper, GPT-4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: kihong.kim
Date: 2026-01-03 21:38:34 +09:00
Commit: c3795138da
64 changed files with 13059 additions and 0 deletions

.env.example

@@ -0,0 +1,27 @@
# OpenAI API Key (required)
OPENAI_API_KEY=your_openai_api_key_here
# OpenAI model selection
OPENAI_MODEL=gpt-4o-mini # gpt-4o-mini (cheaper), gpt-4o (higher quality)
# Translation mode: direct (literal), summarize (condense), rewrite (restructure)
TRANSLATION_MODE=rewrite
# Max tokens per translation (cost control)
TRANSLATION_MAX_TOKENS=1000
# Pixabay API Key (optional - for BGM search)
# Free key at https://pixabay.com/api/docs/
PIXABAY_API_KEY=
# Freesound API Key (optional - automatic BGM search/download)
# Free key at https://freesound.org/apiv2/apply
# Searches 500,000+ CC-licensed sounds
FREESOUND_API_KEY=
# Whisper model (small, medium, large)
# medium is recommended without a GPU
WHISPER_MODEL=medium
# Web server port
PORT=3000

.gitignore

@@ -0,0 +1,46 @@
# Environment
.env
.env.local
# Data directories
data/downloads/
data/processed/
data/jobs.json
backend/data/downloads/
backend/data/processed/
backend/data/jobs.json
backend/data/cookies/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
.venv/
venv_*/
**/venv/
**/venv_*/
# Node
node_modules/
dist/
.pnpm-store/
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Logs
*.log
logs/
# Whisper model cache
~/.cache/whisper/

README.md

@@ -0,0 +1,138 @@
# Shorts Maker

A web application that downloads Chinese short-form videos (Douyin, Kuaishou, etc.) and automatically generates Korean subtitles.

## Features

- **Video download**: Supports Douyin, Kuaishou, TikTok, YouTube, and Bilibili
- **Speech recognition**: Automatic transcription with OpenAI Whisper
- **Translation**: Natural Korean translation with GPT-4
- **Subtitle compositing**: Subtitles burned into the video with FFmpeg
- **BGM**: Removes the original audio and inserts background music

## System requirements

- Docker & Docker Compose
- At least 8GB RAM (for the Whisper medium model)
- An OpenAI API key

## Quick start

### 1. Configure the environment

```bash
cp .env.example .env
```

Open `.env` and enter your OpenAI API key:

```
OPENAI_API_KEY=sk-your-api-key-here
```

### 2. Run with Docker

```bash
docker-compose up -d --build
```

### 3. Open the app

Visit `http://localhost:3000` in your browser.

## Usage

1. **Enter a video URL**: Paste a Douyin, Kuaishou, etc. video URL
2. **Download**: The video is downloaded automatically
3. **Choose BGM**: (optional) Select background music
4. **Start processing**: Subtitles are generated and the video is processed
5. **Download**: Download the finished video

## Tech stack

### Backend
- Python 3.11 + FastAPI
- yt-dlp (video download)
- OpenAI Whisper (speech recognition)
- OpenAI GPT-4 (translation)
- FFmpeg (video processing)

### Frontend
- React 18 + Vite
- Tailwind CSS
- Axios

### Infrastructure
- Docker & Docker Compose
- Redis (job queue)
- Nginx (reverse proxy)

## Directory layout

```
shorts-maker/
├── backend/
│   ├── app/
│   │   ├── main.py        # FastAPI app
│   │   ├── config.py      # Settings
│   │   ├── routers/       # API routers
│   │   ├── services/      # Business logic
│   │   └── models/        # Data models
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── pages/         # Page components
│   │   ├── components/    # UI components
│   │   └── api/           # API client
│   ├── Dockerfile
│   └── package.json
├── data/
│   ├── downloads/         # Downloaded videos
│   ├── processed/         # Processed videos
│   └── bgm/               # BGM files
├── docker-compose.yml
└── .env
```

## API endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/download/ | Start a video download |
| POST | /api/process/ | Start video processing |
| GET | /api/jobs/ | List jobs |
| GET | /api/jobs/{id} | Get job details |
| GET | /api/jobs/{id}/download | Download the processed video |
| GET | /api/bgm/ | List BGM |
| POST | /api/bgm/upload | Upload BGM |

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_API_KEY | - | OpenAI API key (required) |
| WHISPER_MODEL | medium | Whisper model (small/medium/large) |
| PORT | 3000 | Web server port |

## Troubleshooting

### Download failures
- Some videos may be geo-restricted
- A VPN or proxy configuration may be required

### Transcription quality
- Switching to `WHISPER_MODEL=large` improves accuracy (requires more memory)

### Out of memory
- Switch to `WHISPER_MODEL=small`

## License

MIT License

## Caveats

- Do not redistribute copyrighted videos without permission
- Check the license before using any BGM
- API usage incurs costs

backend/Dockerfile

@@ -0,0 +1,50 @@
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
fonts-nanum \
fonts-noto-cjk \
git \
curl \
unzip \
fontconfig \
&& rm -rf /var/lib/apt/lists/*
# Install additional Korean fonts for YouTube Shorts
RUN mkdir -p /usr/share/fonts/truetype/korean && \
# Pretendard (most readable)
curl -L -o /tmp/pretendard.zip "https://github.com/orioncactus/pretendard/releases/download/v1.3.9/Pretendard-1.3.9.zip" && \
unzip -j /tmp/pretendard.zip "*/Pretendard-Bold.otf" -d /usr/share/fonts/truetype/korean/ && \
unzip -j /tmp/pretendard.zip "*/Pretendard-Regular.otf" -d /usr/share/fonts/truetype/korean/ && \
# Black Han Sans (high impact)
curl -L -o /usr/share/fonts/truetype/korean/BlackHanSans-Regular.ttf "https://github.com/AcDevelopers/Black-Han-Sans/raw/master/fonts/ttf/BlackHanSans-Regular.ttf" && \
# DoHyeon (cute, playful)
curl -L -o /usr/share/fonts/truetype/korean/DoHyeon-Regular.ttf "https://github.com/nicholasrq/DoHyeon/raw/master/fonts/DoHyeon-Regular.ttf" && \
# Update font cache
fc-cache -fv && \
rm -rf /tmp/*.zip
# Install yt-dlp
RUN pip install --no-cache-dir yt-dlp
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Download Whisper model (medium for CPU)
RUN python -c "import whisper; whisper.load_model('medium')"
# Copy application code
COPY . .
# Create data directories
RUN mkdir -p data/downloads data/processed data/bgm
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

backend/app/__init__.py

@@ -0,0 +1 @@
# Shorts Maker Backend

backend/app/config.py

@@ -0,0 +1,53 @@
from pydantic_settings import BaseSettings
from functools import lru_cache


class Settings(BaseSettings):
    # API Keys
    OPENAI_API_KEY: str = ""
    PIXABAY_API_KEY: str = ""  # Optional: for Pixabay music search
    FREESOUND_API_KEY: str = ""  # Optional: for Freesound API (https://freesound.org/apiv2/apply)

    # Directories
    DOWNLOAD_DIR: str = "data/downloads"
    PROCESSED_DIR: str = "data/processed"
    BGM_DIR: str = "data/bgm"

    # Whisper settings
    WHISPER_MODEL: str = "medium"  # small, medium, large

    # Redis
    REDIS_URL: str = "redis://redis:6379/0"

    # OpenAI settings
    OPENAI_MODEL: str = "gpt-4o-mini"  # gpt-4o-mini, gpt-4o, gpt-4-turbo
    TRANSLATION_MAX_TOKENS: int = 1000  # Max tokens for translation (cost control)
    TRANSLATION_MODE: str = "rewrite"  # direct, summarize, rewrite

    # GPT Prompt Customization
    GPT_ROLE: str = "친근한 유튜브 쇼츠 자막 작가"  # GPT persona/role ("friendly YouTube Shorts subtitle writer")
    GPT_TONE: str = "존댓말"  # "존댓말" (polite), "반말" (casual), "격식체" (formal)
    GPT_STYLE: str = ""  # Additional style instructions (optional)

    # Processing
    DEFAULT_FONT_SIZE: int = 24
    DEFAULT_FONT_COLOR: str = "white"
    DEFAULT_BGM_VOLUME: float = 0.3

    # Server
    PORT: int = 3000  # Frontend port

    # Proxy (for geo-restricted platforms like Douyin)
    PROXY_URL: str = ""  # http://host:port or socks5://host:port

    class Config:
        env_file = "../.env"  # Project root .env file
        extra = "ignore"  # Ignore extra fields in .env


@lru_cache()
def get_settings():
    return Settings()


settings = get_settings()
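The `@lru_cache()` wrapper makes `get_settings()` return one shared `Settings` instance per process, so the `.env` file is read only once. A minimal stdlib-only sketch of the same singleton pattern (the `AppConfig` class and `get_config` function here are illustrative, not part of the codebase):

```python
from functools import lru_cache


class AppConfig:
    """Stand-in for a settings object that is expensive to build."""

    def __init__(self):
        self.port = 3000


@lru_cache()
def get_config() -> AppConfig:
    # Constructed once; every later call returns the cached instance.
    return AppConfig()


a, b = get_config(), get_config()
print(a is b)  # the exact same object both times
```

Importing `settings` from this module elsewhere relies on the same property: every importer sees one configuration object.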

backend/app/main.py

@@ -0,0 +1,64 @@
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from contextlib import asynccontextmanager
import os

from app.routers import download, process, bgm, jobs, fonts
from app.config import settings


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    os.makedirs(settings.DOWNLOAD_DIR, exist_ok=True)
    os.makedirs(settings.PROCESSED_DIR, exist_ok=True)
    os.makedirs(settings.BGM_DIR, exist_ok=True)

    # Check BGM status on startup
    bgm_files = []
    if os.path.exists(settings.BGM_DIR):
        bgm_files = [f for f in os.listdir(settings.BGM_DIR) if f.endswith(('.mp3', '.wav', '.m4a', '.ogg'))]
    if len(bgm_files) == 0:
        print("[Startup] No BGM files found. Upload BGM via /api/bgm/upload or download from Pixabay/Mixkit")
    else:
        names = ', '.join(bgm_files[:3])
        suffix = f'... (+{len(bgm_files) - 3} more)' if len(bgm_files) > 3 else ''
        print(f"[Startup] Found {len(bgm_files)} BGM files: {names}{suffix}")
    yield
    # Shutdown


app = FastAPI(
    title="Shorts Maker API",
    description="중국 쇼츠 영상을 한글 자막으로 변환하는 서비스",
    version="1.0.0",
    lifespan=lifespan,
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Static files for processed videos
app.mount("/static/downloads", StaticFiles(directory="data/downloads"), name="downloads")
app.mount("/static/processed", StaticFiles(directory="data/processed"), name="processed")
app.mount("/static/bgm", StaticFiles(directory="data/bgm"), name="bgm")

# Routers
app.include_router(download.router, prefix="/api/download", tags=["Download"])
app.include_router(process.router, prefix="/api/process", tags=["Process"])
app.include_router(bgm.router, prefix="/api/bgm", tags=["BGM"])
app.include_router(jobs.router, prefix="/api/jobs", tags=["Jobs"])
app.include_router(fonts.router, prefix="/api/fonts", tags=["Fonts"])


@app.get("/api/health")
async def health_check():
    return {"status": "healthy", "service": "shorts-maker"}

backend/app/models/__init__.py

@@ -0,0 +1,2 @@
from app.models.schemas import *
from app.models.job_store import job_store

backend/app/models/job_store.py

@@ -0,0 +1,91 @@
from typing import Dict, Optional
from datetime import datetime
import uuid
import json
import os

from app.models.schemas import JobInfo, JobStatus


class JobStore:
    """Simple in-memory job store with file persistence."""

    def __init__(self, persistence_file: str = "data/jobs.json"):
        self._jobs: Dict[str, JobInfo] = {}
        self._persistence_file = persistence_file
        self._load_jobs()

    def _load_jobs(self):
        """Load jobs from file on startup."""
        if os.path.exists(self._persistence_file):
            try:
                with open(self._persistence_file, "r") as f:
                    data = json.load(f)
                for job_id, job_data in data.items():
                    job_data["created_at"] = datetime.fromisoformat(job_data["created_at"])
                    job_data["updated_at"] = datetime.fromisoformat(job_data["updated_at"])
                    self._jobs[job_id] = JobInfo(**job_data)
            except Exception:
                pass

    def _save_jobs(self):
        """Persist jobs to file."""
        os.makedirs(os.path.dirname(self._persistence_file), exist_ok=True)
        data = {}
        for job_id, job in self._jobs.items():
            job_dict = job.model_dump()
            job_dict["created_at"] = job_dict["created_at"].isoformat()
            job_dict["updated_at"] = job_dict["updated_at"].isoformat()
            data[job_id] = job_dict
        with open(self._persistence_file, "w") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)

    def create_job(self, original_url: str) -> JobInfo:
        """Create a new job."""
        job_id = str(uuid.uuid4())[:8]
        now = datetime.now()
        job = JobInfo(
            job_id=job_id,
            status=JobStatus.PENDING,
            created_at=now,
            updated_at=now,
            original_url=original_url,
        )
        self._jobs[job_id] = job
        self._save_jobs()
        return job

    def get_job(self, job_id: str) -> Optional[JobInfo]:
        """Get a job by ID."""
        return self._jobs.get(job_id)

    def update_job(self, job_id: str, **kwargs) -> Optional[JobInfo]:
        """Update a job."""
        job = self._jobs.get(job_id)
        if job:
            for key, value in kwargs.items():
                if hasattr(job, key):
                    setattr(job, key, value)
            job.updated_at = datetime.now()
            self._save_jobs()
        return job

    def list_jobs(self, limit: int = 50) -> list[JobInfo]:
        """List recent jobs."""
        jobs = sorted(
            self._jobs.values(),
            key=lambda j: j.created_at,
            reverse=True
        )
        return jobs[:limit]

    def delete_job(self, job_id: str) -> bool:
        """Delete a job."""
        if job_id in self._jobs:
            del self._jobs[job_id]
            self._save_jobs()
            return True
        return False


# Global job store instance
job_store = JobStore()
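`_save_jobs` and `_load_jobs` depend on `datetime.isoformat()` round-tripping cleanly through JSON. A minimal sketch of that persistence step in isolation (the `job_id` value is illustrative):

```python
import json
from datetime import datetime

# Serialize: datetime -> ISO 8601 string, which is safe to store in JSON.
created = datetime(2026, 1, 3, 21, 38, 34)
record = {"job_id": "abc123", "created_at": created.isoformat()}
text = json.dumps(record, ensure_ascii=False)

# Deserialize: ISO string -> datetime, exactly as _load_jobs does.
loaded = json.loads(text)
restored = datetime.fromisoformat(loaded["created_at"])
assert restored == created  # lossless round-trip
```

Note that `datetime.now()` produces microsecond precision, which `isoformat()`/`fromisoformat()` also round-trip without loss.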

backend/app/models/schemas.py

@@ -0,0 +1,279 @@
from pydantic import BaseModel, HttpUrl
from typing import Optional, List
from enum import Enum
from datetime import datetime


class JobStatus(str, Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    READY_FOR_TRIM = "ready_for_trim"      # Download complete, ready for trimming
    TRIMMING = "trimming"                  # Video trimming in progress
    EXTRACTING_AUDIO = "extracting_audio"  # Step 2: FFmpeg audio extraction
    NOISE_REDUCTION = "noise_reduction"    # Step 3: Noise reduction
    TRANSCRIBING = "transcribing"          # Step 4: Whisper STT
    TRANSLATING = "translating"            # Step 5: GPT translation
    AWAITING_REVIEW = "awaiting_review"    # Script ready, waiting for user review before rendering
    PROCESSING = "processing"              # Step 6: Video composition + BGM
    COMPLETED = "completed"
    FAILED = "failed"
    AWAITING_SUBTITLE = "awaiting_subtitle"  # No audio - waiting for manual subtitle input


class DownloadRequest(BaseModel):
    url: str
    platform: Optional[str] = None  # auto-detect if not provided


class DownloadResponse(BaseModel):
    job_id: str
    status: JobStatus
    message: str


class SubtitleStyle(BaseModel):
    font_size: int = 28
    font_color: str = "white"
    outline_color: str = "black"
    outline_width: int = 2
    position: str = "bottom"  # top, center, bottom
    font_name: str = "Pretendard"
    # Enhanced styling options
    bold: bool = True  # Bold text (improves readability)
    shadow: int = 1  # Shadow depth (0=none, 1-4)
    background_box: bool = True  # Opaque background box that covers the original subtitles
    background_opacity: str = "E0"  # Background opacity (00=transparent, FF=fully opaque, E0=recommended)
    animation: str = "none"  # none, fade, pop (subtitle animation)


class TranslationModeEnum(str, Enum):
    DIRECT = "direct"        # Literal translation (keeps the original structure)
    SUMMARIZE = "summarize"  # Summarize, then translate
    REWRITE = "rewrite"      # Full rewrite (recommended)


class ProcessRequest(BaseModel):
    job_id: str
    bgm_id: Optional[str] = None
    bgm_volume: float = 0.3
    subtitle_style: Optional[SubtitleStyle] = None
    keep_original_audio: bool = False
    translation_mode: Optional[str] = None  # direct, summarize, rewrite (default from settings)
    use_vocal_separation: bool = False  # Separate vocals from BGM before transcription


class ProcessResponse(BaseModel):
    job_id: str
    status: JobStatus
    message: str


class TrimRequest(BaseModel):
    """Request to trim a video to a specific time range."""
    start_time: float  # Start time in seconds
    end_time: float  # End time in seconds
    reprocess: bool = False  # Whether to automatically reprocess after trimming (default: False for manual workflow)


class TranscribeRequest(BaseModel):
    """Request to start transcription (audio extraction + STT + translation)."""
    translation_mode: Optional[str] = "rewrite"  # direct, summarize, rewrite
    use_vocal_separation: bool = False  # Separate vocals from BGM before transcription


class RenderRequest(BaseModel):
    """Request to render final video with subtitles and BGM."""
    bgm_id: Optional[str] = None
    bgm_volume: float = 0.3
    subtitle_style: Optional[SubtitleStyle] = None
    keep_original_audio: bool = False
    # Intro text overlay (shown at beginning of video for YouTube Shorts thumbnail)
    intro_text: Optional[str] = None  # Max 10 characters recommended
    intro_duration: float = 0.7  # Duration of frozen frame with intro text (seconds)
    intro_font_size: int = 100  # Font size


class TrimResponse(BaseModel):
    """Response after trimming a video."""
    job_id: str
    success: bool
    message: str
    new_duration: Optional[float] = None


class VideoInfoResponse(BaseModel):
    """Video information for trimming UI."""
    duration: float
    width: Optional[int] = None
    height: Optional[int] = None
    thumbnail_url: Optional[str] = None


class TranscriptSegment(BaseModel):
    start: float
    end: float
    text: str
    translated: Optional[str] = None


class JobInfo(BaseModel):
    job_id: str
    status: JobStatus
    created_at: datetime
    updated_at: datetime
    original_url: Optional[str] = None
    video_path: Optional[str] = None
    output_path: Optional[str] = None
    transcript: Optional[List[TranscriptSegment]] = None
    error: Optional[str] = None
    progress: int = 0
    has_audio: Optional[bool] = None  # None = not checked, True = has audio, False = no audio
    audio_status: Optional[str] = None  # "ok", "no_audio_stream", "audio_silent"
    detected_language: Optional[str] = None  # Whisper detected language (e.g., "zh", "en", "ko")


class BGMInfo(BaseModel):
    id: str
    name: str
    duration: float
    path: str


class BGMUploadResponse(BaseModel):
    id: str
    name: str
    message: str


# Korean font definitions
class FontInfo(BaseModel):
    """Font information for subtitle styling."""
    id: str  # Font ID (system font name)
    name: str  # Display name
    style: str  # Style category
    recommended_for: List[str]  # Recommended content types
    download_url: Optional[str] = None  # Download link
    license: str = "Free for commercial use"


# Free commercial-use Korean fonts popular for Shorts
KOREAN_FONTS = {
    # Default system fonts (installed on most systems)
    "NanumGothic": FontInfo(
        id="NanumGothic",
        name="나눔고딕",
        style="깔끔, 기본",
        recommended_for=["tutorial", "news", "general"],
        download_url="https://hangeul.naver.com/font",
        license="OFL (Open Font License)",
    ),
    "NanumGothicBold": FontInfo(
        id="NanumGothicBold",
        name="나눔고딕 Bold",
        style="깔끔, 강조",
        recommended_for=["tutorial", "news", "general"],
        download_url="https://hangeul.naver.com/font",
        license="OFL (Open Font License)",
    ),
    "NanumSquareRound": FontInfo(
        id="NanumSquareRound",
        name="나눔스퀘어라운드",
        style="둥글, 친근",
        recommended_for=["travel", "lifestyle", "vlog"],
        download_url="https://hangeul.naver.com/font",
        license="OFL (Open Font License)",
    ),
    # Popular free fonts (require separate installation)
    "Pretendard": FontInfo(
        id="Pretendard",
        name="프리텐다드",
        style="현대적, 깔끔",
        recommended_for=["tutorial", "tech", "business"],
        download_url="https://github.com/orioncactus/pretendard",
        license="OFL (Open Font License)",
    ),
    "SpoqaHanSansNeo": FontInfo(
        id="SpoqaHanSansNeo",
        name="스포카 한 산스 Neo",
        style="깔끔, 가독성",
        recommended_for=["tutorial", "tech", "presentation"],
        download_url="https://github.com/spoqa/spoqa-han-sans",
        license="OFL (Open Font License)",
    ),
    "GmarketSans": FontInfo(
        id="GmarketSans",
        name="G마켓 산스",
        style="둥글, 친근",
        recommended_for=["shopping", "review", "lifestyle"],
        download_url="https://corp.gmarket.com/fonts",
        license="Free for commercial use",
    ),
    # Fonts with personality
    "BMDoHyeon": FontInfo(
        id="BMDoHyeon",
        name="배민 도현체",
        style="손글씨, 유머",
        recommended_for=["comedy", "mukbang", "cooking"],
        download_url="https://www.woowahan.com/fonts",
        license="OFL (Open Font License)",
    ),
    "BMJua": FontInfo(
        id="BMJua",
        name="배민 주아체",
        style="귀여움, 캐주얼",
        recommended_for=["cooking", "lifestyle", "kids"],
        download_url="https://www.woowahan.com/fonts",
        license="OFL (Open Font License)",
    ),
    "Cafe24Ssurround": FontInfo(
        id="Cafe24Ssurround",
        name="카페24 써라운드",
        style="강조, 임팩트",
        recommended_for=["gaming", "reaction", "highlight"],
        download_url="https://fonts.cafe24.com/",
        license="Free for commercial use",
    ),
    "Cafe24SsurroundAir": FontInfo(
        id="Cafe24SsurroundAir",
        name="카페24 써라운드 에어",
        style="가벼움, 깔끔",
        recommended_for=["vlog", "daily", "lifestyle"],
        download_url="https://fonts.cafe24.com/",
        license="Free for commercial use",
    ),
    # Title/emphasis fonts
    "BlackHanSans": FontInfo(
        id="BlackHanSans",
        name="검은고딕",
        style="굵음, 강렬",
        recommended_for=["gaming", "sports", "action"],
        download_url="https://fonts.google.com/specimen/Black+Han+Sans",
        license="OFL (Open Font License)",
    ),
    "DoHyeon": FontInfo(
        id="DoHyeon",
        name="도현",
        style="손글씨, 자연스러움",
        recommended_for=["vlog", "cooking", "asmr"],
        download_url="https://fonts.google.com/specimen/Do+Hyeon",
        license="OFL (Open Font License)",
    ),
}

# Recommended fonts by content type
FONT_RECOMMENDATIONS = {
    "tutorial": ["Pretendard", "SpoqaHanSansNeo", "NanumGothic"],
    "gaming": ["Cafe24Ssurround", "BlackHanSans", "GmarketSans"],
    "cooking": ["BMDoHyeon", "BMJua", "DoHyeon"],
    "comedy": ["BMDoHyeon", "Cafe24Ssurround", "GmarketSans"],
    "travel": ["NanumSquareRound", "Cafe24SsurroundAir", "GmarketSans"],
    "news": ["Pretendard", "NanumGothic", "SpoqaHanSansNeo"],
    "asmr": ["DoHyeon", "NanumSquareRound", "Cafe24SsurroundAir"],
    "fitness": ["BlackHanSans", "Cafe24Ssurround", "GmarketSans"],
    "tech": ["Pretendard", "SpoqaHanSansNeo", "NanumGothic"],
    "lifestyle": ["GmarketSans", "NanumSquareRound", "Cafe24SsurroundAir"],
}
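One subtlety in `SubtitleStyle`: `background_opacity` uses a CSS-like convention (`FF` = fully opaque), while the ASS subtitle format that FFmpeg burns in uses the inverted scale (`&H00&` = opaque, `&HFF&` = transparent). A hedged sketch of the conversion a renderer would need (the helper name is illustrative, not from the codebase):

```python
def opacity_to_ass_alpha(opacity_hex: str) -> str:
    """Convert an opacity byte (00=transparent, FF=opaque, as in
    SubtitleStyle.background_opacity) to an ASS alpha override tag,
    where the scale is inverted (00=opaque, FF=transparent)."""
    opacity = int(opacity_hex, 16)
    alpha = 255 - opacity
    return f"&H{alpha:02X}&"


print(opacity_to_ass_alpha("E0"))  # &H1F& (mostly opaque, the recommended default)
print(opacity_to_ass_alpha("FF"))  # &H00& (fully opaque)
```

Getting this inversion wrong makes the "opaque background box" render almost invisible, so it is worth asserting in tests.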

backend/app/routers/__init__.py

@@ -0,0 +1 @@
from app.routers import download, process, bgm, jobs

backend/app/routers/bgm.py

@@ -0,0 +1,578 @@
import os
import aiofiles
import httpx
from typing import List
from urllib.parse import urlparse

from fastapi import APIRouter, UploadFile, File, HTTPException
from pydantic import BaseModel

from app.models.schemas import BGMInfo, BGMUploadResponse, TranscriptSegment
from app.services.video_processor import get_audio_duration
from app.services.bgm_provider import (
    get_free_bgm_sources,
    search_freesound,
    download_freesound,
    search_and_download_bgm,
    BGMSearchResult,
)
from app.services.bgm_recommender import (
    recommend_bgm_for_script,
    get_preset_recommendation,
    BGMRecommendation,
    BGM_PRESETS,
)
from app.services.default_bgm import (
    initialize_default_bgm,
    get_default_bgm_list,
    check_default_bgm_status,
    DEFAULT_BGM_TRACKS,
)
from app.config import settings

router = APIRouter()


class BGMDownloadRequest(BaseModel):
    """Request to download BGM from URL."""
    url: str
    name: str


class BGMRecommendRequest(BaseModel):
    """Request for BGM recommendation based on script."""
    segments: List[dict]  # TranscriptSegment as dict
    use_translated: bool = True


class FreesoundSearchRequest(BaseModel):
    """Request to search Freesound."""
    query: str
    min_duration: int = 10
    max_duration: int = 180
    page: int = 1
    page_size: int = 15
    commercial_only: bool = True  # Only commercially usable licenses (CC0, CC-BY)


class FreesoundDownloadRequest(BaseModel):
    """Request to download from Freesound."""
    sound_id: str
    name: str  # Custom name for the downloaded file


class AutoBGMRequest(BaseModel):
    """Request for automatic BGM search and download."""
    keywords: List[str]  # Search keywords (from BGM recommendation)
    max_duration: int = 120
    commercial_only: bool = True  # Only commercially usable licenses


@router.get("/", response_model=list[BGMInfo])
async def list_bgm():
    """List all available BGM files."""
    bgm_list = []
    if not os.path.exists(settings.BGM_DIR):
        return bgm_list
    for filename in os.listdir(settings.BGM_DIR):
        if filename.endswith((".mp3", ".wav", ".m4a", ".ogg")):
            filepath = os.path.join(settings.BGM_DIR, filename)
            bgm_id = os.path.splitext(filename)[0]
            duration = await get_audio_duration(filepath)
            bgm_list.append(BGMInfo(
                id=bgm_id,
                name=bgm_id.replace("_", " ").replace("-", " ").title(),
                duration=duration or 0,
                path=f"/static/bgm/{filename}",
            ))
    return bgm_list


@router.post("/upload", response_model=BGMUploadResponse)
async def upload_bgm(
    file: UploadFile = File(...),
    name: str | None = None,
):
    """Upload a new BGM file."""
    if not file.filename:
        raise HTTPException(status_code=400, detail="No filename provided")

    # Validate file type
    allowed_extensions = (".mp3", ".wav", ".m4a", ".ogg")
    if not file.filename.lower().endswith(allowed_extensions):
        raise HTTPException(
            status_code=400,
            detail=f"Invalid file type. Allowed: {allowed_extensions}"
        )

    # Generate ID
    bgm_id = name or os.path.splitext(file.filename)[0]
    bgm_id = bgm_id.lower().replace(" ", "_")

    # Get extension
    ext = os.path.splitext(file.filename)[1].lower()
    filepath = os.path.join(settings.BGM_DIR, f"{bgm_id}{ext}")

    # Save file
    os.makedirs(settings.BGM_DIR, exist_ok=True)
    async with aiofiles.open(filepath, 'wb') as out_file:
        content = await file.read()
        await out_file.write(content)

    return BGMUploadResponse(
        id=bgm_id,
        name=name or file.filename,
        message="BGM uploaded successfully"
    )


@router.delete("/{bgm_id}")
async def delete_bgm(bgm_id: str):
    """Delete a BGM file."""
    for ext in (".mp3", ".wav", ".m4a", ".ogg"):
        filepath = os.path.join(settings.BGM_DIR, f"{bgm_id}{ext}")
        if os.path.exists(filepath):
            os.remove(filepath)
            return {"message": f"BGM '{bgm_id}' deleted"}
    raise HTTPException(status_code=404, detail="BGM not found")


@router.get("/sources/free", response_model=dict)
async def get_free_sources():
    """Get list of recommended free BGM sources for commercial use."""
    sources = get_free_bgm_sources()
    return {
        "sources": sources,
        "notice": "이 소스들은 상업적 사용이 가능한 무료 음악을 제공합니다. 각 사이트의 라이선스를 확인하세요.",
        "recommended": [
            {
                "name": "Pixabay Music",
                "url": "https://pixabay.com/music/search/",
                "why": "CC0 라이선스, 저작권 표기 불필요, 쇼츠용 짧은 트랙 많음",
                "search_tips": ["upbeat", "energetic", "chill", "cinematic", "funny"],
            },
            {
                "name": "Mixkit",
                "url": "https://mixkit.co/free-stock-music/",
                "why": "고품질, 카테고리별 정리, 상업적 무료 사용",
                "search_tips": ["short", "intro", "background"],
            },
        ],
    }


@router.post("/download-url", response_model=BGMUploadResponse)
async def download_bgm_from_url(request: BGMDownloadRequest):
    """
    Download BGM from an external URL (Pixabay, Mixkit, etc.).

    Use this to download free BGM files directly from their source URLs.
    """
    url = request.url
    name = request.name.lower().replace(" ", "_")

    # Validate URL - allow trusted free music sources
    allowed_domains = [
        "pixabay.com",
        "cdn.pixabay.com",
        "mixkit.co",
        "assets.mixkit.co",
        "uppbeat.io",
        "freemusicarchive.org",
    ]
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    if not any(allowed in domain for allowed in allowed_domains):
        raise HTTPException(
            status_code=400,
            detail=f"URL must be from allowed sources: {', '.join(allowed_domains)}"
        )

    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.get(url, timeout=60)
            if response.status_code != 200:
                raise HTTPException(
                    status_code=400,
                    detail=f"Failed to download: HTTP {response.status_code}"
                )

            # Determine file extension
            content_type = response.headers.get("content-type", "")
            if "mpeg" in content_type or url.endswith(".mp3"):
                ext = ".mp3"
            elif "wav" in content_type or url.endswith(".wav"):
                ext = ".wav"
            elif "ogg" in content_type or url.endswith(".ogg"):
                ext = ".ogg"
            else:
                ext = ".mp3"

            # Save file
            os.makedirs(settings.BGM_DIR, exist_ok=True)
            filepath = os.path.join(settings.BGM_DIR, f"{name}{ext}")
            async with aiofiles.open(filepath, 'wb') as f:
                await f.write(response.content)

            return BGMUploadResponse(
                id=name,
                name=request.name,
                message=f"BGM downloaded from {domain}"
            )
    except httpx.TimeoutException:
        raise HTTPException(status_code=408, detail="Download timed out")
    except HTTPException:
        raise  # Propagate the 400 raised above instead of wrapping it as a 500
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Download failed: {str(e)}")
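The substring check `any(allowed in domain ...)` above would also accept look-alike hosts such as `notpixabay.com`. A hedged sketch of a stricter suffix-based variant (the function name is illustrative; it is not part of the router):

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"pixabay.com", "cdn.pixabay.com", "mixkit.co", "assets.mixkit.co"}


def is_allowed_url(url: str) -> bool:
    """Accept a host only if it equals an allowed domain, or is a
    true subdomain of one (dot-separated, so 'notpixabay.com' fails)."""
    host = urlparse(url).netloc.lower().split(":")[0]  # strip any port
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


print(is_allowed_url("https://cdn.pixabay.com/audio/track.mp3"))  # True
print(is_allowed_url("https://notpixabay.com/track.mp3"))         # False
```

The key difference is matching on the full host label boundary (`"." + domain`) rather than an arbitrary substring.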
@router.post("/recommend")
async def recommend_bgm(request: BGMRecommendRequest):
    """
    AI-powered BGM recommendation based on script content.

    Analyzes the script mood and suggests appropriate background music.
    Returns matched BGM from library if available, otherwise provides search keywords.
    """
    # Convert dict to TranscriptSegment
    segments = [TranscriptSegment(**seg) for seg in request.segments]

    # Get available BGM list
    available_bgm = []
    if os.path.exists(settings.BGM_DIR):
        for filename in os.listdir(settings.BGM_DIR):
            if filename.endswith((".mp3", ".wav", ".m4a", ".ogg")):
                bgm_id = os.path.splitext(filename)[0]
                available_bgm.append({
                    "id": bgm_id,
                    "name": bgm_id.replace("_", " ").replace("-", " "),
                })

    # Get recommendation
    success, message, recommendation = await recommend_bgm_for_script(
        segments,
        available_bgm,
        use_translated=request.use_translated,
    )
    if not success:
        raise HTTPException(status_code=500, detail=message)

    # Build Pixabay search URL
    search_keywords = "+".join(recommendation.search_keywords[:3])
    pixabay_url = f"https://pixabay.com/music/search/{search_keywords}/"

    return {
        "recommendation": {
            "mood": recommendation.mood,
            "energy": recommendation.energy,
            "reasoning": recommendation.reasoning,
            "suggested_genres": recommendation.suggested_genres,
            "search_keywords": recommendation.search_keywords,
        },
        "matched_bgm": recommendation.matched_bgm_id,
        "search_urls": {
            "pixabay": pixabay_url,
            "mixkit": f"https://mixkit.co/free-stock-music/?q={search_keywords}",
        },
        "message": message,
    }


@router.get("/recommend/presets")
async def get_bgm_presets():
    """
    Get predefined BGM presets for common content types.

    Use these presets for quick BGM selection without AI analysis.
    """
    presets = {}
    for content_type, preset_info in BGM_PRESETS.items():
        presets[content_type] = {
            "mood": preset_info["mood"],
            "keywords": preset_info["keywords"],
            "description": f"Best for {content_type} content",
        }
    return {
        "presets": presets,
        "usage": "Use content_type parameter with /recommend/preset/{content_type}",
    }


@router.get("/recommend/preset/{content_type}")
async def get_preset_bgm(content_type: str):
    """
    Get BGM recommendation for a specific content type.

    Available types: cooking, fitness, tutorial, comedy, travel, asmr, news, gaming
    """
    recommendation = get_preset_recommendation(content_type)
    if not recommendation:
        available_types = list(BGM_PRESETS.keys())
        raise HTTPException(
            status_code=404,
            detail=f"Unknown content type. Available: {', '.join(available_types)}"
        )

    # Check for matching BGM in library
    if os.path.exists(settings.BGM_DIR):
        for filename in os.listdir(settings.BGM_DIR):
            if filename.endswith((".mp3", ".wav", ".m4a", ".ogg")):
                bgm_id = os.path.splitext(filename)[0]
                bgm_name = bgm_id.lower()
                # Check if any keyword matches
                for keyword in recommendation.search_keywords:
                    if keyword in bgm_name:
                        recommendation.matched_bgm_id = bgm_id
                        break
            if recommendation.matched_bgm_id:
                break

    search_keywords = "+".join(recommendation.search_keywords[:3])
    return {
        "content_type": content_type,
        "recommendation": {
            "mood": recommendation.mood,
            "energy": recommendation.energy,
            "suggested_genres": recommendation.suggested_genres,
            "search_keywords": recommendation.search_keywords,
        },
        "matched_bgm": recommendation.matched_bgm_id,
        "search_urls": {
            "pixabay": f"https://pixabay.com/music/search/{search_keywords}/",
            "mixkit": f"https://mixkit.co/free-stock-music/?q={search_keywords}",
        },
    }


@router.post("/freesound/search")
async def search_freesound_api(request: FreesoundSearchRequest):
    """
    Search for music on Freesound.

    Freesound API provides 500,000+ CC licensed sounds.
    Get your API key at: https://freesound.org/apiv2/apply

    Set commercial_only=true (default) to only return CC0 licensed sounds
    that can be used commercially without attribution.
    """
    success, message, results = await search_freesound(
        query=request.query,
        min_duration=request.min_duration,
        max_duration=request.max_duration,
        page=request.page,
        page_size=request.page_size,
        commercial_only=request.commercial_only,
    )
    if not success:
        raise HTTPException(status_code=400, detail=message)

    def is_commercial_ok(license_str: str) -> bool:
        return "CC0" in license_str or license_str == "CC BY (Attribution)"

    return {
        "message": message,
        "commercial_only": request.commercial_only,
        "results": [
            {
                "id": r.id,
                "title": r.title,
                "duration": r.duration,
                "tags": r.tags,
                "license": r.license,
                "commercial_use_ok": is_commercial_ok(r.license),
                "preview_url": r.preview_url,
                "source": r.source,
            }
            for r in results
        ],
        "search_url": f"https://freesound.org/search/?q={request.query}",
    }


@router.post("/freesound/download", response_model=BGMUploadResponse)
async def download_freesound_api(request: FreesoundDownloadRequest):
    """
    Download a sound from Freesound by ID.

    Downloads the high-quality preview (128kbps MP3).
    """
    name = request.name.lower().replace(" ", "_")
    name = "".join(c for c in name if c.isalnum() or c == "_")
    success, message, file_path = await download_freesound(
        sound_id=request.sound_id,
        output_dir=settings.BGM_DIR,
        filename=name,
    )
    if not success:
        raise HTTPException(status_code=400, detail=message)
    return BGMUploadResponse(
        id=name,
        name=request.name,
        message=message,
    )


@router.post("/auto-download")
async def auto_download_bgm(request: AutoBGMRequest):
    """
    Automatically search and download BGM based on keywords.

    Use this with keywords from /recommend endpoint to auto-download matching BGM.
    Requires FREESOUND_API_KEY to be configured.

    Set commercial_only=true (default) to only download CC0 licensed sounds
    that can be used commercially without attribution.
    """
    success, message, file_path, matched_result = await search_and_download_bgm(
        keywords=request.keywords,
        output_dir=settings.BGM_DIR,
        max_duration=request.max_duration,
        commercial_only=request.commercial_only,
    )
    if not success:
        return {
            "success": False,
            "message": message,
            "downloaded": None,
            "suggestion": "Configure FREESOUND_API_KEY or manually download from Pixabay/Mixkit",
        }

    # Get duration of downloaded file
    duration = 0
    if file_path:
        duration = await get_audio_duration(file_path) or 0

    # Check if license is commercially usable
    license_name = matched_result.license if matched_result else ""
    commercial_ok = "CC0" in license_name or license_name == "CC BY (Attribution)"

    return {
        "success": True,
        "message": message,
        "downloaded": {
            "id": os.path.splitext(os.path.basename(file_path))[0] if file_path else None,
            "name": matched_result.title if matched_result else None,
            "duration": duration,
            "license": license_name,
            "commercial_use_ok": commercial_ok,
            "source": "freesound",
            "path": f"/static/bgm/{os.path.basename(file_path)}" if file_path else None,
        },
        "original": {
            "freesound_id": matched_result.id if matched_result else None,
            "tags": matched_result.tags if matched_result else [],
        },
    }
@router.get("/defaults/status")
async def get_default_bgm_status():
"""
Check status of default BGM tracks.
Returns which default tracks are installed and which are missing.
"""
status = check_default_bgm_status(settings.BGM_DIR)
# Add track details
tracks = []
for track in DEFAULT_BGM_TRACKS:
installed = track.id in status["installed_ids"]
tracks.append({
"id": track.id,
"name": track.name,
"category": track.category,
"description": track.description,
"installed": installed,
})
return {
"total": status["total"],
"installed": status["installed"],
"missing": status["missing"],
"tracks": tracks,
}
@router.post("/defaults/initialize")
async def initialize_default_bgms(force: bool = False):
"""
Download default BGM tracks.
Downloads pre-selected royalty-free BGM tracks (Pixabay License).
Use force=true to re-download all tracks.
These tracks are free for commercial use without attribution.
"""
downloaded, skipped, errors = await initialize_default_bgm(
settings.BGM_DIR,
force=force,
)
return {
"success": len(errors) == 0,
"downloaded": downloaded,
"skipped": skipped,
"errors": errors,
"message": f"Downloaded {downloaded} tracks, skipped {skipped} existing" if downloaded > 0
else "All default tracks already installed" if skipped > 0
else "Failed to download tracks",
}
@router.get("/defaults/list")
async def list_default_bgms():
"""
Get list of available default BGM tracks with metadata.
Returns information about all pre-configured default tracks.
"""
tracks = await get_default_bgm_list()
status = check_default_bgm_status(settings.BGM_DIR)
for track in tracks:
track["installed"] = track["id"] in status["installed_ids"]
return {
"tracks": tracks,
"total": len(tracks),
"installed": status["installed"],
"license": "Pixabay License (Free for commercial use, no attribution required)",
}
@router.get("/{bgm_id}")
async def get_bgm(bgm_id: str):
"""Get BGM info by ID."""
for ext in (".mp3", ".wav", ".m4a", ".ogg"):
filepath = os.path.join(settings.BGM_DIR, f"{bgm_id}{ext}")
if os.path.exists(filepath):
duration = await get_audio_duration(filepath)
return BGMInfo(
id=bgm_id,
name=bgm_id.replace("_", " ").replace("-", " ").title(),
duration=duration or 0,
path=f"/static/bgm/{bgm_id}{ext}"
)
raise HTTPException(status_code=404, detail="BGM not found")


@@ -0,0 +1,62 @@
from fastapi import APIRouter, BackgroundTasks, HTTPException
from app.models.schemas import DownloadRequest, DownloadResponse, JobStatus
from app.models.job_store import job_store
from app.services.downloader import download_video, detect_platform
router = APIRouter()
async def download_task(job_id: str, url: str):
"""Background task for downloading video."""
job_store.update_job(job_id, status=JobStatus.DOWNLOADING, progress=10)
success, message, video_path = await download_video(url, job_id)
if success:
job_store.update_job(
job_id,
status=JobStatus.READY_FOR_TRIM, # Ready for trimming step
video_path=video_path,
progress=30,
)
else:
job_store.update_job(
job_id,
status=JobStatus.FAILED,
error=message,
)
@router.post("/", response_model=DownloadResponse)
async def start_download(
request: DownloadRequest,
background_tasks: BackgroundTasks
):
"""Start video download from URL."""
platform = request.platform or detect_platform(request.url)
# Create job
job = job_store.create_job(original_url=request.url)
# Start background download
background_tasks.add_task(download_task, job.job_id, request.url)
return DownloadResponse(
job_id=job.job_id,
status=JobStatus.PENDING,
message=f"Download started for {platform} video"
)
@router.get("/platforms")
async def get_supported_platforms():
"""Get list of supported platforms."""
return {
"platforms": [
{"id": "douyin", "name": "抖音 (Douyin)", "domains": ["douyin.com", "iesdouyin.com"]},
{"id": "kuaishou", "name": "快手 (Kuaishou)", "domains": ["kuaishou.com", "gifshow.com"]},
{"id": "bilibili", "name": "哔哩哔哩 (Bilibili)", "domains": ["bilibili.com"]},
{"id": "tiktok", "name": "TikTok", "domains": ["tiktok.com"]},
{"id": "youtube", "name": "YouTube", "domains": ["youtube.com", "youtu.be"]},
]
}
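`detect_platform` is imported from `app.services.downloader`, whose diff is not shown here. A plausible sketch over the same domain table as the `/platforms` endpoint above (the name `guess_platform` and the exact matching rules are assumptions, not the shipped implementation):

```python
from urllib.parse import urlparse

# Mirrors the domain table returned by GET /platforms.
PLATFORM_DOMAINS = {
    "douyin": ["douyin.com", "iesdouyin.com"],
    "kuaishou": ["kuaishou.com", "gifshow.com"],
    "bilibili": ["bilibili.com"],
    "tiktok": ["tiktok.com"],
    "youtube": ["youtube.com", "youtu.be"],
}

def guess_platform(url: str) -> str:
    """Return the platform id whose domain matches the URL host, else 'unknown'."""
    host = urlparse(url).netloc.lower()
    for platform, domains in PLATFORM_DOMAINS.items():
        if any(host == d or host.endswith("." + d) for d in domains):
            return platform
    return "unknown"
```

Matching on the host (exact or subdomain suffix) avoids false positives from URLs that merely mention a platform in the path or query string.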


@@ -0,0 +1,163 @@
"""
Fonts Router - Korean font management for subtitles.
Provides font listing and recommendations for YouTube Shorts subtitles.
"""
from fastapi import APIRouter, HTTPException
from app.models.schemas import FontInfo, KOREAN_FONTS, FONT_RECOMMENDATIONS
router = APIRouter()
@router.get("/")
async def list_fonts():
"""
List all available Korean fonts for subtitles.
Returns font information including:
- id: System font name to use in subtitle_style.font_name
- name: Display name in Korean
- style: Font style description
- recommended_for: Content types this font works well with
- download_url: Where to download the font
- license: Font license information
"""
fonts = []
for font_id, font_info in KOREAN_FONTS.items():
fonts.append({
"id": font_info.id,
"name": font_info.name,
"style": font_info.style,
"recommended_for": font_info.recommended_for,
"download_url": font_info.download_url,
"license": font_info.license,
})
return {
"fonts": fonts,
"total": len(fonts),
"default": "NanumGothic",
"usage": "Set subtitle_style.font_name to the font id",
}
@router.get("/recommend/{content_type}")
async def recommend_fonts(content_type: str):
"""
Get font recommendations for a specific content type.
Available content types:
- tutorial: tutorials, lectures
- gaming: games, reactions
- cooking: cooking, mukbang
- comedy: comedy, humor
- travel: travel, vlogs
- news: news, information
- asmr: ASMR, relaxation
- fitness: exercise, fitness
- tech: technology, IT
- lifestyle: lifestyle, daily life
"""
content_type_lower = content_type.lower()
if content_type_lower not in FONT_RECOMMENDATIONS:
available_types = list(FONT_RECOMMENDATIONS.keys())
raise HTTPException(
status_code=404,
detail=f"Unknown content type. Available: {', '.join(available_types)}"
)
recommended_ids = FONT_RECOMMENDATIONS[content_type_lower]
recommendations = []
for font_id in recommended_ids:
if font_id in KOREAN_FONTS:
font = KOREAN_FONTS[font_id]
recommendations.append({
"id": font.id,
"name": font.name,
"style": font.style,
"download_url": font.download_url,
})
return {
"content_type": content_type_lower,
"recommendations": recommendations,
"primary": recommended_ids[0] if recommended_ids else "NanumGothic",
}
@router.get("/categories")
async def list_font_categories():
"""
List fonts grouped by style category.
"""
categories = {
"clean": {
"name": "깔끔/모던",
"description": "정보성 콘텐츠, 튜토리얼에 적합",
"fonts": ["Pretendard", "SpoqaHanSansNeo", "NanumGothic"],
},
"friendly": {
"name": "친근/둥글",
"description": "일상, 라이프스타일 콘텐츠에 적합",
"fonts": ["GmarketSans", "NanumSquareRound", "Cafe24SsurroundAir"],
},
"handwriting": {
"name": "손글씨/캐주얼",
"description": "먹방, 요리, 유머 콘텐츠에 적합",
"fonts": ["BMDoHyeon", "BMJua", "DoHyeon"],
},
"impact": {
"name": "강조/임팩트",
"description": "게임, 하이라이트, 리액션에 적합",
"fonts": ["Cafe24Ssurround", "BlackHanSans"],
},
}
# Add font details to each category
for category_id, category_info in categories.items():
font_details = []
for font_id in category_info["fonts"]:
if font_id in KOREAN_FONTS:
font = KOREAN_FONTS[font_id]
font_details.append({
"id": font.id,
"name": font.name,
})
category_info["font_details"] = font_details
return {
"categories": categories,
}
@router.get("/{font_id}")
async def get_font(font_id: str):
"""
Get detailed information about a specific font.
"""
if font_id not in KOREAN_FONTS:
available_fonts = list(KOREAN_FONTS.keys())
raise HTTPException(
status_code=404,
detail=f"Font not found. Available fonts: {', '.join(available_fonts)}"
)
font = KOREAN_FONTS[font_id]
return {
"id": font.id,
"name": font.name,
"style": font.style,
"recommended_for": font.recommended_for,
"download_url": font.download_url,
"license": font.license,
"usage_example": {
"subtitle_style": {
"font_name": font.id,
"font_size": 36,
"position": "center",
}
},
}
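The `usage_example` above feeds `subtitle_style` into the ASS generator elsewhere in the codebase. As an illustration of how those fields could map onto an ASS Style entry (a sketch with a hypothetical `style_to_ass` helper; the app's real writer lives in `app.services.transcriber` and may differ, and a full Style line carries many more fields declared by the section's Format line):

```python
# ASS uses numpad-style Alignment values: 2 = bottom-center,
# 5 = middle-center, 8 = top-center.
POSITION_TO_ALIGNMENT = {"bottom": 2, "center": 5, "top": 8}

def style_to_ass(style: dict) -> str:
    """Render subtitle_style fields into a minimal ASS Style line.

    The Format line of the [V4+ Styles] section must declare the same
    field order (Name, Fontname, Fontsize, Alignment) for this to parse.
    """
    align = POSITION_TO_ALIGNMENT.get(style.get("position", "bottom"), 2)
    return (f"Style: Default,{style.get('font_name', 'NanumGothic')},"
            f"{style.get('font_size', 36)},{align}")
```
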

backend/app/routers/jobs.py (new file, 175 lines)

@@ -0,0 +1,175 @@
import os
import shutil
from fastapi import APIRouter, HTTPException
from fastapi.responses import FileResponse
from app.models.schemas import JobInfo
from app.models.job_store import job_store
from app.config import settings
router = APIRouter()
@router.get("/", response_model=list[JobInfo])
async def list_jobs(limit: int = 50):
"""List all jobs."""
return job_store.list_jobs(limit=limit)
@router.get("/{job_id}", response_model=JobInfo)
async def get_job(job_id: str):
"""Get job details."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
print(f"[API GET] Job {job_id}: status={job.status}, progress={job.progress}")
return job
@router.delete("/{job_id}")
async def delete_job(job_id: str):
"""Delete a job and its files."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
# Delete associated files
download_dir = os.path.join(settings.DOWNLOAD_DIR, job_id)
processed_dir = os.path.join(settings.PROCESSED_DIR, job_id)
if os.path.exists(download_dir):
shutil.rmtree(download_dir)
if os.path.exists(processed_dir):
shutil.rmtree(processed_dir)
job_store.delete_job(job_id)
return {"message": f"Job {job_id} deleted"}
@router.get("/{job_id}/download")
async def download_output(job_id: str):
"""Download the processed video."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if not job.output_path or not os.path.exists(job.output_path):
raise HTTPException(status_code=404, detail="Output file not found")
return FileResponse(
path=job.output_path,
media_type="video/mp4",
filename=f"shorts_{job_id}.mp4"
)
@router.get("/{job_id}/original")
async def download_original(job_id: str):
"""Download the original video."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if not job.video_path or not os.path.exists(job.video_path):
raise HTTPException(status_code=404, detail="Original video not found")
filename = os.path.basename(job.video_path)
# Disable caching to ensure trimmed video is always fetched fresh
return FileResponse(
path=job.video_path,
media_type="video/mp4",
filename=filename,
headers={
"Cache-Control": "no-cache, no-store, must-revalidate",
"Pragma": "no-cache",
"Expires": "0"
}
)
@router.get("/{job_id}/subtitle")
async def download_subtitle(job_id: str, format: str = "ass"):
"""Download the subtitle file."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if not job.video_path:
raise HTTPException(status_code=404, detail="Video not found")
job_dir = os.path.dirname(job.video_path)
subtitle_path = os.path.join(job_dir, f"subtitle.{format}")
if not os.path.exists(subtitle_path):
# Try to generate from transcript
if job.transcript:
from app.services.transcriber import segments_to_ass, segments_to_srt
if format == "srt":
content = segments_to_srt(job.transcript, use_translated=True)
else:
content = segments_to_ass(job.transcript, use_translated=True)
with open(subtitle_path, "w", encoding="utf-8") as f:
f.write(content)
else:
raise HTTPException(status_code=404, detail="Subtitle not found")
return FileResponse(
path=subtitle_path,
media_type="text/plain",
filename=f"subtitle_{job_id}.{format}"
)
@router.get("/{job_id}/thumbnail")
async def download_thumbnail(job_id: str):
"""Download the generated thumbnail image."""
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
# Check for thumbnail in processed directory
thumbnail_path = os.path.join(settings.PROCESSED_DIR, f"{job_id}_thumbnail.jpg")
if not os.path.exists(thumbnail_path):
raise HTTPException(status_code=404, detail="Thumbnail not found. Generate it first using /process/{job_id}/thumbnail")
return FileResponse(
path=thumbnail_path,
media_type="image/jpeg",
filename=f"thumbnail_{job_id}.jpg"
)
@router.post("/{job_id}/re-edit")
async def re_edit_job(job_id: str):
"""Reset job status to awaiting_review for re-editing."""
from app.models.schemas import JobStatus
job = job_store.get_job(job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job.status != JobStatus.COMPLETED:
raise HTTPException(
status_code=400,
detail="Only completed jobs can be re-edited"
)
# Check if transcript exists for re-editing
if not job.transcript:
raise HTTPException(
status_code=400,
detail="No transcript found. Cannot re-edit."
)
# Reset status to awaiting_review
job_store.update_job(
job_id,
status=JobStatus.AWAITING_REVIEW,
progress=70,
error=None
)
return {"message": "Job ready for re-editing", "job_id": job_id}

File diff suppressed because it is too large


@@ -0,0 +1,15 @@
from app.services.downloader import download_video, detect_platform, get_video_info
from app.services.transcriber import transcribe_video, segments_to_srt, segments_to_ass
from app.services.translator import (
translate_segments,
translate_single,
generate_shorts_script,
TranslationMode,
)
from app.services.video_processor import (
process_video,
get_video_duration,
extract_audio,
extract_audio_with_noise_reduction,
analyze_audio_noise_level,
)


@@ -0,0 +1,317 @@
"""
Audio separation service using Demucs for vocal/music separation.
Also includes speech vs singing detection.
"""
import subprocess
import os
import shutil
from typing import Optional, Tuple
from pathlib import Path
# Demucs runs in a separate Python 3.11 environment due to compatibility issues
DEMUCS_VENV_PATH = os.path.join(
os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
"venv_demucs"
)
DEMUCS_PYTHON = os.path.join(DEMUCS_VENV_PATH, "bin", "python")
async def separate_vocals(
input_path: str,
output_dir: str,
model: str = "htdemucs"
) -> Tuple[bool, str, Optional[str], Optional[str]]:
"""
Separate vocals from background music using Demucs.
Args:
input_path: Path to input audio/video file
output_dir: Directory to save separated tracks
model: Demucs model to use (htdemucs, htdemucs_ft, mdx_extra)
Returns:
Tuple of (success, message, vocals_path, no_vocals_path)
"""
if not os.path.exists(input_path):
return False, f"Input file not found: {input_path}", None, None
os.makedirs(output_dir, exist_ok=True)
# Check if Demucs venv exists
if not os.path.exists(DEMUCS_PYTHON):
return False, f"Demucs environment not found at {DEMUCS_VENV_PATH}. Run setup script.", None, None
# Run Demucs for two-stem separation (vocals vs accompaniment)
cmd = [
DEMUCS_PYTHON, "-m", "demucs",
"--two-stems=vocals",
"-n", model,
"-o", output_dir,
input_path
]
try:
print(f"Running Demucs separation: {' '.join(cmd)}")
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=600, # 10 minute timeout
)
if result.returncode != 0:
error_msg = result.stderr[-500:] if result.stderr else "Unknown error"
return False, f"Demucs error: {error_msg}", None, None
# Find output files
# Demucs outputs to: output_dir/model_name/track_name/vocals.wav, no_vocals.wav
input_name = Path(input_path).stem
demucs_output = os.path.join(output_dir, model, input_name)
vocals_path = os.path.join(demucs_output, "vocals.wav")
no_vocals_path = os.path.join(demucs_output, "no_vocals.wav")
if not os.path.exists(vocals_path):
return False, "Vocals file not created", None, None
# Move files to simpler location
final_vocals = os.path.join(output_dir, "vocals.wav")
final_no_vocals = os.path.join(output_dir, "no_vocals.wav")
shutil.move(vocals_path, final_vocals)
if os.path.exists(no_vocals_path):
shutil.move(no_vocals_path, final_no_vocals)
# Clean up Demucs output directory
shutil.rmtree(os.path.join(output_dir, model), ignore_errors=True)
return True, "Vocals separated successfully", final_vocals, final_no_vocals
except subprocess.TimeoutExpired:
return False, "Separation timed out", None, None
except FileNotFoundError:
return False, "Demucs not installed. Run: pip install demucs", None, None
except Exception as e:
return False, f"Separation error: {str(e)}", None, None
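As the comments in `separate_vocals` note, Demucs writes two-stem output to `output_dir/<model>/<track_stem>/{vocals,no_vocals}.wav` before the service flattens it into `output_dir`. The expected intermediate locations can be computed up front (a sketch assuming that layout):

```python
from pathlib import Path

def demucs_stem_paths(input_path: str, output_dir: str,
                      model: str = "htdemucs") -> tuple[str, str]:
    """Return the (vocals, no_vocals) paths Demucs produces
    for --two-stems=vocals under the documented layout."""
    track = Path(input_path).stem
    base = Path(output_dir) / model / track
    return str(base / "vocals.wav"), str(base / "no_vocals.wav")
```

Computing these ahead of time makes it easy to poll for partial output or to clean up after a timeout.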
async def analyze_vocal_type(
vocals_path: str,
speech_threshold: float = 0.7
) -> Tuple[str, float]:
"""
Analyze if vocal track contains speech or singing.
Uses multiple heuristics:
1. Speech has more silence gaps (pauses between words)
2. Speech has more varied pitch changes
3. Singing has more sustained notes
Args:
vocals_path: Path to vocals audio file
speech_threshold: Threshold for speech detection (0-1)
Returns:
Tuple of (vocal_type, confidence)
vocal_type: "speech", "singing", or "mixed"
"""
if not os.path.exists(vocals_path):
return "unknown", 0.0
# Analyze silence ratio using FFmpeg
# Speech typically has 30-50% silence, singing has less
silence_ratio = await _get_silence_ratio(vocals_path)
# Analyze zero-crossing rate (speech has higher ZCR variance)
zcr_variance = await _get_zcr_variance(vocals_path)
# Analyze spectral flatness (speech has higher flatness)
spectral_score = await _get_spectral_analysis(vocals_path)
# Combine scores
speech_score = 0.0
# High silence ratio indicates speech (pauses between sentences)
if silence_ratio > 0.25:
speech_score += 0.4
elif silence_ratio > 0.15:
speech_score += 0.2
# High spectral variance indicates speech
if spectral_score > 0.5:
speech_score += 0.3
elif spectral_score > 0.3:
speech_score += 0.15
# ZCR variance
if zcr_variance > 0.5:
speech_score += 0.3
elif zcr_variance > 0.3:
speech_score += 0.15
# Determine type
# speech_threshold=0.7: High confidence speech
# singing_threshold=0.4: Below this is likely singing (music)
# Between 0.4-0.7: Mixed or uncertain
if speech_score >= speech_threshold:
return "speech", speech_score
elif speech_score < 0.4:
return "singing", 1.0 - speech_score
else:
# For mixed, lean towards singing if score is closer to lower bound
# This helps avoid transcribing song lyrics as speech
return "mixed", speech_score
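The additive scoring in `analyze_vocal_type` can be replicated as a pure function to sanity-check the thresholds: 30% silence plus strong spectral and ZCR variance gives 0.4 + 0.3 + 0.3 = 1.0, well above the 0.7 speech threshold. A minimal mirror of the same table:

```python
def score_speech(silence_ratio: float, spectral: float,
                 zcr_variance: float) -> float:
    """Each feature contributes a full or half weight
    depending on which band it falls in."""
    score = 0.0
    if silence_ratio > 0.25:
        score += 0.4
    elif silence_ratio > 0.15:
        score += 0.2
    if spectral > 0.5:
        score += 0.3
    elif spectral > 0.3:
        score += 0.15
    if zcr_variance > 0.5:
        score += 0.3
    elif zcr_variance > 0.3:
        score += 0.15
    return score

def classify(score: float, speech_threshold: float = 0.7) -> str:
    """speech >= 0.7, singing < 0.4, mixed in between."""
    if score >= speech_threshold:
        return "speech"
    if score < 0.4:
        return "singing"
    return "mixed"
```

Because the weights sum to at most 1.0, the 0.7 threshold effectively requires at least two of the three features to land in their upper band.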
async def _get_silence_ratio(audio_path: str, threshold_db: float = -35) -> float:
"""Get ratio of silence in audio file."""
cmd = [
"ffmpeg", "-i", audio_path,
"-af", f"silencedetect=noise={threshold_db}dB:d=0.3",
"-f", "null", "-"
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
stderr = result.stderr
# Get total duration
duration = await _get_audio_duration(audio_path)
if not duration or duration == 0:
return 0.0
# Parse total silence duration
total_silence = 0.0
lines = stderr.split('\n')
for line in lines:
if 'silence_duration' in line:
try:
dur = float(line.split('silence_duration:')[1].strip().split()[0])
total_silence += dur
except (IndexError, ValueError):
pass
return min(total_silence / duration, 1.0)
except Exception:
return 0.0
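The parsing loop above can be exercised against a canned log; the sample lines below imitate FFmpeg's `silencedetect` stderr shape (`silence_end: ... | silence_duration: ...`):

```python
def parse_silence_durations(stderr: str) -> float:
    """Sum all silence_duration values reported by silencedetect."""
    total = 0.0
    for line in stderr.split("\n"):
        if "silence_duration" in line:
            try:
                total += float(
                    line.split("silence_duration:")[1].strip().split()[0]
                )
            except (IndexError, ValueError):
                pass
    return total
```
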
async def _get_zcr_variance(audio_path: str) -> float:
"""Get zero-crossing rate variance (simplified estimation)."""
# Use FFmpeg to analyze audio stats
cmd = [
"ffmpeg", "-i", audio_path,
"-af", "astats=metadata=1:reset=1",
"-f", "null", "-"
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
stderr = result.stderr
# Look for RMS level variations as proxy for ZCR variance
rms_values = []
for line in stderr.split('\n'):
if 'RMS_level' in line:
try:
val = float(line.split(':')[1].strip().split()[0])
if val != float('-inf'):
rms_values.append(val)
except (IndexError, ValueError):
pass
if len(rms_values) > 1:
mean_rms = sum(rms_values) / len(rms_values)
variance = sum((x - mean_rms) ** 2 for x in rms_values) / len(rms_values)
# Normalize to 0-1 range
return min(variance / 100, 1.0)
return 0.3 # Default moderate value
except Exception:
return 0.3
async def _get_spectral_analysis(audio_path: str) -> float:
"""Analyze spectral characteristics (speech has more flat spectrum)."""
# Use volume detect as proxy for spectral analysis
cmd = [
"ffmpeg", "-i", audio_path,
"-af", "volumedetect",
"-f", "null", "-"
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
stderr = result.stderr
mean_vol = None
max_vol = None
for line in stderr.split('\n'):
if 'mean_volume' in line:
try:
mean_vol = float(line.split(':')[1].strip().replace(' dB', ''))
except (IndexError, ValueError):
pass
elif 'max_volume' in line:
try:
max_vol = float(line.split(':')[1].strip().replace(' dB', ''))
except (IndexError, ValueError):
pass
if mean_vol is not None and max_vol is not None:
# Large difference between mean and max indicates speech dynamics
diff = abs(max_vol - mean_vol)
# Speech typically has 15-25dB dynamic range
if diff > 20:
return 0.7
elif diff > 12:
return 0.5
else:
return 0.2
return 0.3
except Exception:
return 0.3
async def _get_audio_duration(audio_path: str) -> Optional[float]:
"""Get audio duration in seconds."""
cmd = [
"ffprobe",
"-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
audio_path
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
if result.returncode == 0:
return float(result.stdout.strip())
except Exception:
pass
return None
async def check_demucs_available() -> bool:
"""Check if Demucs is installed in the dedicated environment."""
if not os.path.exists(DEMUCS_PYTHON):
return False
try:
result = subprocess.run(
[DEMUCS_PYTHON, "-m", "demucs", "--help"],
capture_output=True,
timeout=10
)
return result.returncode == 0
except Exception:
return False


@@ -0,0 +1,495 @@
"""
BGM Provider Service - Freesound & Pixabay Integration
Freesound API: https://freesound.org/docs/api/
- 500,000+ Creative Commons licensed sounds
- Free API with generous rate limits
- Various licenses (CC0, CC-BY, CC-BY-NC, etc.)
Pixabay: Manual download recommended (no public Music API)
"""
import os
import httpx
import aiofiles
from typing import Optional, List, Tuple
from pydantic import BaseModel
from app.config import settings
class FreesoundTrack(BaseModel):
"""Freesound track model."""
id: int
name: str
duration: float # seconds
tags: List[str]
license: str
username: str
preview_url: str # HQ preview (128kbps mp3)
download_url: str # Original file (requires auth)
description: str = ""
class BGMSearchResult(BaseModel):
"""BGM search result."""
id: str
title: str
duration: int
tags: List[str]
preview_url: str
download_url: str = ""
license: str = ""
source: str = "freesound"
# Freesound licenses that permit commercial use.
# CC0 and CC-BY are commercially usable; CC-BY-NC ("Attribution Noncommercial")
# is NOT, and is deliberately excluded from this list.
COMMERCIAL_LICENSES = [
"Creative Commons 0", # CC0 - Public Domain
"Attribution", # CC-BY - Attribution required
]
# License filter string for commercial-only search
COMMERCIAL_LICENSE_FILTER = 'license:"Creative Commons 0" OR license:"Attribution"'
async def search_freesound(
query: str,
min_duration: int = 10,
max_duration: int = 180, # Shorts typically < 60s, allow some buffer
page: int = 1,
page_size: int = 15,
filter_music: bool = True,
commercial_only: bool = True, # Default: only commercially usable
) -> Tuple[bool, str, List[BGMSearchResult]]:
"""
Search for sounds on Freesound API.
Args:
query: Search keywords (e.g., "upbeat music", "chill background")
min_duration: Minimum duration in seconds
max_duration: Maximum duration in seconds
page: Page number (1-indexed)
page_size: Results per page (max 150)
filter_music: Add "music" to query for better BGM results
commercial_only: Only return commercially usable licenses (CC0, CC-BY)
Returns:
Tuple of (success, message, results)
"""
api_key = settings.FREESOUND_API_KEY
if not api_key:
return False, "Freesound API key not configured. Get one at https://freesound.org/apiv2/apply", []
# Add "music" filter for better BGM results
search_query = f"{query} music" if filter_music and "music" not in query.lower() else query
# Build filter string for duration and license
filter_parts = [f"duration:[{min_duration} TO {max_duration}]"]
if commercial_only:
# Restrict to CC0 only, the strictest commercially safe option
# (no attribution needed). CC-BY would also allow commercial use but
# requires attribution; CC-BY-NC (Noncommercial) is excluded entirely.
filter_parts.append('license:"Creative Commons 0"')
filter_str = " ".join(filter_parts)
params = {
"token": api_key,
"query": search_query,
"filter": filter_str,
"page": page,
"page_size": min(page_size, 150),
"fields": "id,name,duration,tags,license,username,previews,description",
"sort": "score", # relevance
}
try:
async with httpx.AsyncClient() as client:
response = await client.get(
"https://freesound.org/apiv2/search/text/",
params=params,
timeout=30,
)
if response.status_code == 401:
return False, "Invalid Freesound API key", []
if response.status_code != 200:
return False, f"Freesound API error: HTTP {response.status_code}", []
data = response.json()
results = []
for sound in data.get("results", []):
# Get preview URLs (prefer high quality)
previews = sound.get("previews", {})
preview_url = (
previews.get("preview-hq-mp3") or
previews.get("preview-lq-mp3") or
""
)
# Parse license for display
license_url = sound.get("license", "")
license_name = _parse_freesound_license(license_url)
results.append(BGMSearchResult(
id=str(sound["id"]),
title=sound.get("name", "Unknown"),
duration=int(sound.get("duration", 0)),
tags=sound.get("tags", [])[:10], # Limit tags
preview_url=preview_url,
download_url=f"https://freesound.org/apiv2/sounds/{sound['id']}/download/",
license=license_name,
source="freesound",
))
total = data.get("count", 0)
license_info = " (commercial use OK)" if commercial_only else ""
message = f"Found {total} sounds on Freesound{license_info}"
return True, message, results
except httpx.TimeoutException:
return False, "Freesound API timeout", []
except Exception as e:
return False, f"Freesound search error: {str(e)}", []
def _parse_freesound_license(license_url: str) -> str:
"""Parse Freesound license URL to human-readable name."""
if "zero" in license_url or "cc0" in license_url.lower():
return "CC0 (Public Domain)"
elif "by-nc" in license_url:
return "CC BY-NC (Non-Commercial)"
elif "by-sa" in license_url:
return "CC BY-SA (Share Alike)"
elif "by/" in license_url:
return "CC BY (Attribution)"
elif "sampling+" in license_url:
return "Sampling+"
else:
return "See License"
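The substring matching in `_parse_freesound_license` can be spot-checked against typical creativecommons.org URLs (whether Freesound returns exactly these URL forms is an assumption; the `Sampling+` branch is omitted for brevity):

```python
def parse_license(url: str) -> str:
    """Same substring heuristic as _parse_freesound_license,
    applied to a lowercased URL."""
    u = url.lower()
    if "zero" in u or "cc0" in u:
        return "CC0 (Public Domain)"
    if "by-nc" in u:
        return "CC BY-NC (Non-Commercial)"
    if "by-sa" in u:
        return "CC BY-SA (Share Alike)"
    if "by/" in u:
        return "CC BY (Attribution)"
    return "See License"
```

Branch order matters: `by-nc` and `by-sa` must be tested before the bare `by/` check, or every CC variant would report as plain Attribution.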
async def download_freesound(
sound_id: str,
output_dir: str,
filename: str,
) -> Tuple[bool, str, Optional[str]]:
"""
Download a sound from Freesound.
Note: Freesound requires OAuth for original file downloads.
This function downloads the HQ preview (128kbps MP3) which is sufficient for BGM.
Args:
sound_id: Freesound sound ID
output_dir: Directory to save file
filename: Output filename (without extension)
Returns:
Tuple of (success, message, file_path)
"""
api_key = settings.FREESOUND_API_KEY
if not api_key:
return False, "Freesound API key not configured", None
try:
async with httpx.AsyncClient() as client:
# First, get sound info to get preview URL
info_response = await client.get(
f"https://freesound.org/apiv2/sounds/{sound_id}/",
params={
"token": api_key,
"fields": "id,name,previews,license,username",
},
timeout=30,
)
if info_response.status_code != 200:
return False, f"Failed to get sound info: HTTP {info_response.status_code}", None
sound_data = info_response.json()
previews = sound_data.get("previews", {})
# Get high quality preview URL
preview_url = previews.get("preview-hq-mp3")
if not preview_url:
preview_url = previews.get("preview-lq-mp3")
if not preview_url:
return False, "No preview URL available", None
# Download the preview
audio_response = await client.get(preview_url, timeout=60, follow_redirects=True)
if audio_response.status_code != 200:
return False, f"Download failed: HTTP {audio_response.status_code}", None
# Save file
os.makedirs(output_dir, exist_ok=True)
file_path = os.path.join(output_dir, f"{filename}.mp3")
async with aiofiles.open(file_path, 'wb') as f:
await f.write(audio_response.content)
# Get attribution info
username = sound_data.get("username", "Unknown")
license_name = _parse_freesound_license(sound_data.get("license", ""))
return True, f"Downloaded from Freesound (by {username}, {license_name})", file_path
except httpx.TimeoutException:
return False, "Download timeout", None
except Exception as e:
return False, f"Download error: {str(e)}", None
async def search_and_download_bgm(
keywords: List[str],
output_dir: str,
max_duration: int = 120,
commercial_only: bool = True,
) -> Tuple[bool, str, Optional[str], Optional[BGMSearchResult]]:
"""
Search for BGM and download the best match.
Args:
keywords: Search keywords from BGM recommendation
output_dir: Directory to save downloaded file
max_duration: Maximum duration in seconds
commercial_only: Only search commercially usable licenses (CC0)
Returns:
Tuple of (success, message, file_path, matched_result)
"""
if not settings.FREESOUND_API_KEY:
return False, "Freesound API key not configured", None, None
# Try searching with combined keywords
query = " ".join(keywords[:3])
success, message, results = await search_freesound(
query=query,
min_duration=15,
max_duration=max_duration,
page_size=10,
commercial_only=commercial_only,
)
if not success or not results:
# Try with individual keywords
for keyword in keywords[:3]:
success, message, results = await search_freesound(
query=keyword,
min_duration=15,
max_duration=max_duration,
page_size=5,
commercial_only=commercial_only,
)
if success and results:
break
if not results:
return False, "No matching BGM found on Freesound", None, None
# Select the best result (first one, sorted by relevance)
best_match = results[0]
# Download it
safe_filename = best_match.title.lower().replace(" ", "_")[:50]
safe_filename = "".join(c for c in safe_filename if c.isalnum() or c == "_")
success, download_msg, file_path = await download_freesound(
sound_id=best_match.id,
output_dir=output_dir,
filename=safe_filename,
)
if not success:
return False, download_msg, None, best_match
return True, download_msg, file_path, best_match
async def search_pixabay_music(
query: str = "",
category: str = "",
min_duration: int = 0,
max_duration: int = 120,
page: int = 1,
per_page: int = 20,
) -> Tuple[bool, str, List[BGMSearchResult]]:
"""
Search for royalty-free music on Pixabay.
Note: Pixabay doesn't have a public Music API, returns curated list instead.
"""
# Pixabay's music API is not publicly available
# Return curated recommendations instead
return await _get_curated_bgm_list(query)
async def _get_curated_bgm_list(query: str = "") -> Tuple[bool, str, List[BGMSearchResult]]:
"""
Return curated list of recommended free BGM sources.
Since Pixabay Music API requires special access, we provide curated recommendations.
"""
# Curated BGM recommendations (these are categories/suggestions, not actual files)
curated_bgm = [
{
"id": "upbeat_energetic",
"title": "Upbeat & Energetic",
"duration": 60,
"tags": ["upbeat", "energetic", "happy", "positive"],
"description": "활기찬 쇼츠에 적합",
},
{
"id": "chill_lofi",
"title": "Chill Lo-Fi",
"duration": 60,
"tags": ["chill", "lofi", "relaxing", "calm"],
"description": "편안한 분위기의 콘텐츠",
},
{
"id": "epic_cinematic",
"title": "Epic & Cinematic",
"duration": 60,
"tags": ["epic", "cinematic", "dramatic", "intense"],
"description": "드라마틱한 순간",
},
{
"id": "funny_quirky",
"title": "Funny & Quirky",
"duration": 30,
"tags": ["funny", "quirky", "comedy", "playful"],
"description": "유머러스한 콘텐츠",
},
{
"id": "corporate_tech",
"title": "Corporate & Tech",
"duration": 60,
"tags": ["corporate", "tech", "modern", "professional"],
"description": "정보성 콘텐츠",
},
]
# Filter by query if provided
if query:
query_lower = query.lower()
filtered = [
bgm for bgm in curated_bgm
if query_lower in bgm["title"].lower()
or any(query_lower in tag for tag in bgm["tags"])
]
curated_bgm = filtered if filtered else curated_bgm
results = [
BGMSearchResult(
id=bgm["id"],
title=bgm["title"],
duration=bgm["duration"],
tags=bgm["tags"],
preview_url="", # Would be filled with actual URL
source="curated",
)
for bgm in curated_bgm
]
return True, "Curated BGM list", results
async def download_from_url(
url: str,
output_path: str,
filename: str,
) -> Tuple[bool, str, Optional[str]]:
"""
Download audio file from URL.
Args:
url: Audio file URL
output_path: Directory to save file
filename: Output filename (without extension)
Returns:
Tuple of (success, message, file_path)
"""
try:
os.makedirs(output_path, exist_ok=True)
async with httpx.AsyncClient() as client:
response = await client.get(url, timeout=60, follow_redirects=True)
if response.status_code != 200:
return False, f"Download failed: HTTP {response.status_code}", None
# Determine file extension from content-type
content_type = response.headers.get("content-type", "")
if "mpeg" in content_type:
ext = ".mp3"
elif "wav" in content_type:
ext = ".wav"
elif "ogg" in content_type:
ext = ".ogg"
else:
ext = ".mp3" # Default to mp3
file_path = os.path.join(output_path, f"{filename}{ext}")
with open(file_path, "wb") as f:
f.write(response.content)
return True, "Download complete", file_path
except Exception as e:
return False, f"Download error: {str(e)}", None
# Popular free BGM download links
FREE_BGM_SOURCES = {
"freesound": {
"name": "Freesound",
"url": "https://freesound.org/",
"license": "CC0/CC-BY/CC-BY-NC (Various)",
"description": "500,000+ CC licensed sounds, API available",
"api_available": True,
"api_url": "https://freesound.org/apiv2/apply",
},
"pixabay": {
"name": "Pixabay Music",
"url": "https://pixabay.com/music/",
"license": "Pixabay License (Free for commercial use)",
"description": "Large collection of royalty-free music",
"api_available": False,
},
"mixkit": {
"name": "Mixkit",
"url": "https://mixkit.co/free-stock-music/",
"license": "Mixkit License (Free for commercial use)",
"description": "High-quality free music tracks",
"api_available": False,
},
"uppbeat": {
"name": "Uppbeat",
"url": "https://uppbeat.io/",
"license": "Free tier: 10 tracks/month",
"description": "YouTube-friendly music",
"api_available": False,
},
"youtube_audio_library": {
"name": "YouTube Audio Library",
"url": "https://studio.youtube.com/channel/UC/music",
"license": "Free for YouTube videos",
"description": "Google's free music library",
"api_available": False,
},
}
def get_free_bgm_sources() -> dict:
"""Get list of recommended free BGM sources."""
return FREE_BGM_SOURCES
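The Content-Type → extension branching inside `download_from_url` above is easy to check in isolation. A minimal sketch of the same logic (the helper name `guess_audio_ext` is ours, not part of the module):

```python
def guess_audio_ext(content_type: str) -> str:
    """Map an HTTP Content-Type header to an audio file extension.

    Mirrors the branching in download_from_url: unknown or missing
    types fall back to .mp3.
    """
    content_type = (content_type or "").lower()
    if "mpeg" in content_type:
        return ".mp3"
    if "wav" in content_type:
        return ".wav"
    if "ogg" in content_type:
        return ".ogg"
    return ".mp3"
```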

View File

@@ -0,0 +1,295 @@
"""
BGM Recommender Service
Analyzes script content and recommends appropriate BGM based on mood/tone.
Uses GPT to analyze the emotional tone and suggests matching music.
"""
import os
import json
from typing import List, Tuple, Optional
from openai import OpenAI
from pydantic import BaseModel
from app.config import settings
from app.models.schemas import TranscriptSegment
class BGMRecommendation(BaseModel):
"""BGM recommendation result."""
mood: str # detected mood
energy: str # low, medium, high
suggested_genres: List[str]
search_keywords: List[str]
reasoning: str
matched_bgm_id: Optional[str] = None # if found in local library
# Mood to BGM mapping
MOOD_BGM_MAPPING = {
"upbeat": {
"genres": ["pop", "electronic", "dance"],
"keywords": ["upbeat", "energetic", "happy", "positive"],
"energy": "high",
},
"chill": {
"genres": ["lofi", "ambient", "acoustic"],
"keywords": ["chill", "relaxing", "calm", "peaceful"],
"energy": "low",
},
"dramatic": {
"genres": ["cinematic", "orchestral", "epic"],
"keywords": ["dramatic", "epic", "intense", "cinematic"],
"energy": "high",
},
"funny": {
"genres": ["comedy", "quirky", "playful"],
"keywords": ["funny", "quirky", "comedy", "playful"],
"energy": "medium",
},
"emotional": {
"genres": ["piano", "strings", "ballad"],
"keywords": ["emotional", "sad", "touching", "heartfelt"],
"energy": "low",
},
"informative": {
"genres": ["corporate", "background", "minimal"],
"keywords": ["corporate", "background", "tech", "modern"],
"energy": "medium",
},
"exciting": {
"genres": ["rock", "action", "sports"],
"keywords": ["exciting", "action", "sports", "adventure"],
"energy": "high",
},
"mysterious": {
"genres": ["ambient", "dark", "suspense"],
"keywords": ["mysterious", "suspense", "dark", "tension"],
"energy": "medium",
},
}
async def analyze_script_mood(
segments: List[TranscriptSegment],
use_translated: bool = True,
) -> Tuple[bool, str, Optional[BGMRecommendation]]:
"""
Analyze script content to determine mood and recommend BGM.
Args:
segments: Transcript segments (original or translated)
use_translated: Whether to use translated text
Returns:
Tuple of (success, message, recommendation)
"""
if not settings.OPENAI_API_KEY:
return False, "OpenAI API key not configured", None
if not segments:
return False, "No transcript segments provided", None
# Combine script text
script_text = "\n".join([
seg.translated if use_translated and seg.translated else seg.text
for seg in segments
])
try:
client = OpenAI(api_key=settings.OPENAI_API_KEY)
response = client.chat.completions.create(
model=settings.OPENAI_MODEL,
messages=[
{
"role": "system",
"content": """You are a music supervisor for YouTube Shorts.
Analyze the script and determine the best background music mood.
Respond in JSON format ONLY:
{
"mood": "one of: upbeat, chill, dramatic, funny, emotional, informative, exciting, mysterious",
"energy": "low, medium, or high",
"reasoning": "brief explanation in Korean (1 sentence)"
}
Consider:
- Overall emotional tone of the content
- Pacing and energy level
- Target audience engagement
- What would make viewers watch till the end"""
},
{
"role": "user",
"content": f"Script:\n{script_text}"
}
],
temperature=0.3,
max_tokens=200,
)
# Parse response
import json
result_text = response.choices[0].message.content.strip()
# Clean up JSON if wrapped in markdown
if result_text.startswith("```"):
result_text = result_text.split("```")[1]
if result_text.startswith("json"):
result_text = result_text[4:]
result = json.loads(result_text)
mood = result.get("mood", "upbeat")
energy = result.get("energy", "medium")
reasoning = result.get("reasoning", "")
# Get BGM suggestions based on mood
mood_info = MOOD_BGM_MAPPING.get(mood, MOOD_BGM_MAPPING["upbeat"])
recommendation = BGMRecommendation(
mood=mood,
energy=energy,
suggested_genres=mood_info["genres"],
search_keywords=mood_info["keywords"],
reasoning=reasoning,
)
return True, f"Mood analysis complete: {mood}", recommendation
except json.JSONDecodeError as e:
return False, f"Failed to parse mood analysis: {str(e)}", None
except Exception as e:
return False, f"Mood analysis error: {str(e)}", None
async def find_matching_bgm(
recommendation: BGMRecommendation,
available_bgm: List[dict],
) -> Optional[str]:
"""
Find a matching BGM from available library based on recommendation.
Args:
recommendation: BGM recommendation from mood analysis
available_bgm: List of available BGM info dicts with 'id' and 'name'
Returns:
BGM ID if found, None otherwise
"""
if not available_bgm:
return None
keywords = recommendation.search_keywords + [recommendation.mood]
# Score each BGM based on keyword matching
best_match = None
best_score = 0
for bgm in available_bgm:
bgm_name = bgm.get("name", "").lower()
bgm_id = bgm.get("id", "").lower()
score = 0
for keyword in keywords:
if keyword.lower() in bgm_name or keyword.lower() in bgm_id:
score += 1
if score > best_score:
best_score = score
best_match = bgm.get("id")
return best_match if best_score > 0 else None
async def recommend_bgm_for_script(
segments: List[TranscriptSegment],
available_bgm: List[dict],
use_translated: bool = True,
) -> Tuple[bool, str, Optional[BGMRecommendation]]:
"""
Complete BGM recommendation workflow:
1. Analyze script mood
2. Find matching BGM from library
3. Return recommendation with search keywords for external sources
Args:
segments: Transcript segments
available_bgm: List of available BGM in library
use_translated: Whether to use translated text
Returns:
Tuple of (success, message, recommendation with matched_bgm_id if found)
"""
# Step 1: Analyze mood
success, message, recommendation = await analyze_script_mood(
segments, use_translated
)
if not success or not recommendation:
return success, message, recommendation
# Step 2: Find matching BGM in library
matched_id = await find_matching_bgm(recommendation, available_bgm)
if matched_id:
recommendation.matched_bgm_id = matched_id
message = f"Mood: {recommendation.mood} | Matched BGM: {matched_id}"
else:
message = f"Mood: {recommendation.mood} | No local BGM matched, search with: {', '.join(recommendation.search_keywords[:3])}"
return True, message, recommendation
# Predefined BGM presets for common content types
BGM_PRESETS = {
"cooking": {
"mood": "chill",
"keywords": ["cooking", "food", "kitchen", "cozy"],
},
"fitness": {
"mood": "upbeat",
"keywords": ["workout", "fitness", "energetic", "motivation"],
},
"tutorial": {
"mood": "informative",
"keywords": ["tutorial", "tech", "corporate", "background"],
},
"comedy": {
"mood": "funny",
"keywords": ["funny", "comedy", "quirky", "playful"],
},
"travel": {
"mood": "exciting",
"keywords": ["travel", "adventure", "upbeat", "inspiring"],
},
"asmr": {
"mood": "chill",
"keywords": ["asmr", "relaxing", "ambient", "soft"],
},
"news": {
"mood": "informative",
"keywords": ["news", "corporate", "serious", "background"],
},
"gaming": {
"mood": "exciting",
"keywords": ["gaming", "electronic", "action", "intense"],
},
}
def get_preset_recommendation(content_type: str) -> Optional[BGMRecommendation]:
"""Get BGM recommendation for common content types."""
preset = BGM_PRESETS.get(content_type.lower())
if not preset:
return None
mood = preset["mood"]
mood_info = MOOD_BGM_MAPPING.get(mood, MOOD_BGM_MAPPING["upbeat"])
return BGMRecommendation(
mood=mood,
energy=mood_info["energy"],
suggested_genres=mood_info["genres"],
search_keywords=preset["keywords"],
reasoning=f"Preset for {content_type} content",
)
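The keyword-scoring loop in `find_matching_bgm` can be exercised as a pure function. A sketch of the same matching behaviour (function names here are ours, written for standalone testing):

```python
def score_bgm(keywords: list, bgm: dict) -> int:
    """Count how many recommendation keywords appear in a BGM's
    name or id (case-insensitive), as in find_matching_bgm above."""
    name = bgm.get("name", "").lower()
    bgm_id = bgm.get("id", "").lower()
    return sum(1 for kw in keywords if kw.lower() in name or kw.lower() in bgm_id)

def pick_best_bgm(keywords: list, available_bgm: list):
    """Return the id of the highest-scoring BGM, or None when
    no track matches any keyword."""
    best_id, best_score = None, 0
    for bgm in available_bgm:
        score = score_bgm(keywords, bgm)
        if score > best_score:
            best_id, best_score = bgm.get("id"), score
    return best_id
```

With a library of `chill_lofi` and `funny_comedy` tracks, the keywords `["chill", "relaxing"]` select `chill_lofi`, while an unmatched keyword like `"epic"` yields `None`, prompting the external-search fallback message.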

View File

@@ -0,0 +1,297 @@
"""
Default BGM Initializer
Downloads pre-selected royalty-free BGM tracks on first startup.
Tracks are from Kevin MacLeod (incompetech.com) - CC-BY 4.0 License.
Free for commercial use with attribution: "Kevin MacLeod (incompetech.com)"
"""
import os
import httpx
import aiofiles
import asyncio
from typing import List, Tuple, Optional
from pydantic import BaseModel
class DefaultBGM(BaseModel):
"""Default BGM track info."""
id: str
name: str
url: str
category: str
description: str
# Curated list of royalty-free BGM from Kevin MacLeod (incompetech.com)
# CC-BY 4.0 License - Free for commercial use with attribution
# Attribution: "Kevin MacLeod (incompetech.com)"
DEFAULT_BGM_TRACKS: List[DefaultBGM] = [
# === 활기찬/에너지 (Upbeat/Energetic) ===
DefaultBGM(
id="upbeat_energetic",
name="Upbeat Energetic",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Vivacity.mp3",
category="upbeat",
description="활기차고 에너지 넘치는 BGM - 피트니스, 챌린지 영상",
),
DefaultBGM(
id="happy_pop",
name="Happy Pop",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Carefree.mp3",
category="upbeat",
description="밝고 경쾌한 팝 BGM - 제품 소개, 언박싱",
),
DefaultBGM(
id="upbeat_fun",
name="Upbeat Fun",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Happy%20Happy%20Game%20Show.mp3",
category="upbeat",
description="신나는 게임쇼 비트 - 트렌디한 쇼츠",
),
# === 차분한/편안한 (Chill/Relaxing) ===
DefaultBGM(
id="chill_lofi",
name="Chill Lo-Fi",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Gymnopedie%20No%201.mp3",
category="chill",
description="차분하고 편안한 피아노 BGM - 일상, 브이로그",
),
DefaultBGM(
id="calm_piano",
name="Calm Piano",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Prelude%20No.%201.mp3",
category="chill",
description="잔잔한 피아노 BGM - 감성적인 콘텐츠",
),
DefaultBGM(
id="soft_ambient",
name="Soft Ambient",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Dreamlike.mp3",
category="chill",
description="부드러운 앰비언트 - ASMR, 수면 콘텐츠",
),
# === 유머/코미디 (Funny/Comedy) ===
DefaultBGM(
id="funny_comedy",
name="Funny Comedy",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Sneaky%20Snitch.mp3",
category="funny",
description="유쾌한 코미디 BGM - 코미디, 밈 영상",
),
DefaultBGM(
id="quirky_playful",
name="Quirky Playful",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Monkeys%20Spinning%20Monkeys.mp3",
category="funny",
description="장난스럽고 귀여운 BGM - 펫, 키즈 콘텐츠",
),
# === 드라마틱/시네마틱 (Cinematic) ===
DefaultBGM(
id="cinematic_epic",
name="Cinematic Epic",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Epic%20Unease.mp3",
category="cinematic",
description="웅장한 시네마틱 BGM - 리뷰, 소개 영상",
),
DefaultBGM(
id="inspirational",
name="Inspirational",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Hero%20Theme.mp3",
category="cinematic",
description="영감을 주는 BGM - 동기부여, 성장 콘텐츠",
),
# === 생활용품/제품 리뷰 (Lifestyle/Product) ===
DefaultBGM(
id="lifestyle_modern",
name="Lifestyle Modern",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Acoustic%20Breeze.mp3",
category="lifestyle",
description="모던한 라이프스타일 BGM - 제품 리뷰",
),
DefaultBGM(
id="shopping_bright",
name="Shopping Bright",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Pleasant%20Porridge.mp3",
category="lifestyle",
description="밝은 쇼핑 BGM - 하울, 추천 영상",
),
DefaultBGM(
id="soft_corporate",
name="Soft Corporate",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Laid%20Back%20Guitars.mp3",
category="lifestyle",
description="부드러운 기업형 BGM - 정보성 콘텐츠",
),
# === 어쿠스틱/감성 (Acoustic/Emotional) ===
DefaultBGM(
id="soft_acoustic",
name="Soft Acoustic",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Peaceful.mp3",
category="acoustic",
description="따뜻한 어쿠스틱 BGM - 요리, 일상 브이로그",
),
DefaultBGM(
id="gentle_guitar",
name="Gentle Guitar",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Sunflower%20Slow%20Drag.mp3",
category="acoustic",
description="잔잔한 기타 BGM - 여행, 풍경 영상",
),
# === 트렌디/일렉트로닉 (Trendy/Electronic) ===
DefaultBGM(
id="electronic_chill",
name="Electronic Chill",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Digital%20Lemonade.mp3",
category="electronic",
description="일렉트로닉 칠아웃 - 테크, 게임 콘텐츠",
),
DefaultBGM(
id="driving_beat",
name="Driving Beat",
url="https://incompetech.com/music/royalty-free/mp3-royaltyfree/Cipher.mp3",
category="electronic",
description="드라이빙 비트 - 스포츠, 액션 영상",
),
]
async def download_bgm_file(
url: str,
output_path: str,
timeout: int = 60,
) -> Tuple[bool, str]:
"""
Download a single BGM file.
Args:
url: Download URL
output_path: Full path to save the file
timeout: Download timeout in seconds
Returns:
Tuple of (success, message)
"""
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept": "audio/mpeg,audio/*;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
}
try:
async with httpx.AsyncClient(follow_redirects=True, headers=headers) as client:
response = await client.get(url, timeout=timeout)
if response.status_code != 200:
return False, f"HTTP {response.status_code}"
# Ensure directory exists
os.makedirs(os.path.dirname(output_path), exist_ok=True)
# Save file
async with aiofiles.open(output_path, 'wb') as f:
await f.write(response.content)
return True, "Downloaded successfully"
except httpx.TimeoutException:
return False, "Download timeout"
except Exception as e:
return False, str(e)
async def initialize_default_bgm(
bgm_dir: str,
force: bool = False,
) -> Tuple[int, int, List[str]]:
"""
Initialize default BGM tracks.
Downloads default BGM tracks if not already present.
Args:
bgm_dir: Directory to save BGM files
force: Force re-download even if files exist
Returns:
Tuple of (downloaded_count, skipped_count, error_messages)
"""
os.makedirs(bgm_dir, exist_ok=True)
downloaded = 0
skipped = 0
errors = []
for track in DEFAULT_BGM_TRACKS:
output_path = os.path.join(bgm_dir, f"{track.id}.mp3")
# Skip if already exists (unless force=True)
if os.path.exists(output_path) and not force:
skipped += 1
print(f"[BGM] Skipping {track.name} (already exists)")
continue
print(f"[BGM] Downloading {track.name}...")
success, message = await download_bgm_file(track.url, output_path)
if success:
downloaded += 1
print(f"[BGM] Downloaded {track.name}")
else:
errors.append(f"{track.name}: {message}")
print(f"[BGM] Failed to download {track.name}: {message}")
return downloaded, skipped, errors
async def get_default_bgm_list() -> List[dict]:
"""
Get list of default BGM tracks with metadata.
Returns:
List of BGM info dictionaries
"""
return [
{
"id": track.id,
"name": track.name,
"category": track.category,
"description": track.description,
}
for track in DEFAULT_BGM_TRACKS
]
def check_default_bgm_status(bgm_dir: str) -> dict:
"""
Check which default BGM tracks are installed.
Args:
bgm_dir: BGM directory path
Returns:
Status dictionary with installed/missing tracks
"""
installed = []
missing = []
for track in DEFAULT_BGM_TRACKS:
file_path = os.path.join(bgm_dir, f"{track.id}.mp3")
if os.path.exists(file_path):
installed.append(track.id)
else:
missing.append(track.id)
return {
"total": len(DEFAULT_BGM_TRACKS),
"installed": len(installed),
"missing": len(missing),
"installed_ids": installed,
"missing_ids": missing,
}
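The installed/missing split performed by `check_default_bgm_status` reduces to a simple existence check per `<id>.mp3` file. A minimal standalone sketch (the helper name `bgm_status` is ours):

```python
import os

def bgm_status(bgm_dir: str, track_ids: list) -> dict:
    """Split expected track ids into installed vs missing based on
    whether <id>.mp3 exists in bgm_dir, as check_default_bgm_status
    does above."""
    installed = [t for t in track_ids
                 if os.path.exists(os.path.join(bgm_dir, f"{t}.mp3"))]
    missing = [t for t in track_ids if t not in installed]
    return {"installed": installed, "missing": missing}
```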

View File

@@ -0,0 +1,158 @@
import subprocess
import os
import re
from typing import Optional, Tuple
from app.config import settings
def detect_platform(url: str) -> str:
"""Detect video platform from URL."""
if "douyin" in url or "iesdouyin" in url:
return "douyin"
elif "kuaishou" in url or "gifshow" in url:
return "kuaishou"
elif "bilibili" in url:
return "bilibili"
elif "youtube" in url or "youtu.be" in url:
return "youtube"
elif "tiktok" in url:
return "tiktok"
else:
return "unknown"
def sanitize_filename(filename: str) -> str:
"""Sanitize filename to be safe for filesystem."""
# Remove or replace invalid characters
filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
# Limit length
if len(filename) > 100:
filename = filename[:100]
return filename
def get_cookies_path(platform: str) -> Optional[str]:
"""Get cookies file path for a platform."""
cookies_dir = os.path.join(os.path.dirname(settings.DOWNLOAD_DIR), "cookies")
# Check for platform-specific cookies first (e.g., douyin.txt)
platform_cookies = os.path.join(cookies_dir, f"{platform}.txt")
if os.path.exists(platform_cookies):
return platform_cookies
# Check for generic cookies.txt
generic_cookies = os.path.join(cookies_dir, "cookies.txt")
if os.path.exists(generic_cookies):
return generic_cookies
return None
async def download_video(url: str, job_id: str) -> Tuple[bool, str, Optional[str]]:
"""
Download video using yt-dlp.
Returns:
Tuple of (success, message, video_path)
"""
output_dir = os.path.join(settings.DOWNLOAD_DIR, job_id)
os.makedirs(output_dir, exist_ok=True)
output_template = os.path.join(output_dir, "%(title).50s.%(ext)s")
# yt-dlp command with options for Chinese platforms
cmd = [
"yt-dlp",
"--no-playlist",
"-f", "best[ext=mp4]/best",
"--merge-output-format", "mp4",
"-o", output_template,
"--no-check-certificate",
"--socket-timeout", "30",
"--retries", "3",
"--user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]
platform = detect_platform(url)
# Add cookies if available (required for Douyin, Kuaishou)
cookies_path = get_cookies_path(platform)
if cookies_path:
cmd.extend(["--cookies", cookies_path])
print(f"Using cookies from: {cookies_path}")
elif platform in ["douyin", "kuaishou", "bilibili"]:
# Try to use browser cookies if no cookies file
# Priority: Chrome > Firefox > Edge
cmd.extend(["--cookies-from-browser", "chrome"])
print(f"Using cookies from Chrome browser for {platform}")
# Platform-specific options
if platform in ["douyin", "kuaishou"]:
# Use browser impersonation for anti-bot bypass
cmd.extend([
"--impersonate", "chrome-123:macos-14",
"--extractor-args", "generic:impersonate",
])
# Add proxy if configured (for geo-restricted platforms)
if settings.PROXY_URL:
cmd.extend(["--proxy", settings.PROXY_URL])
print(f"Using proxy: {settings.PROXY_URL}")
cmd.append(url)
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=300, # 5 minute timeout
)
if result.returncode != 0:
error_msg = result.stderr or result.stdout or "Unknown error"
return False, f"Download failed: {error_msg}", None
# Find the downloaded file
for file in os.listdir(output_dir):
if file.endswith((".mp4", ".webm", ".mkv")):
video_path = os.path.join(output_dir, file)
return True, "Download successful", video_path
return False, "No video file found after download", None
except subprocess.TimeoutExpired:
return False, "Download timed out (5 minutes)", None
except Exception as e:
return False, f"Download error: {str(e)}", None
def get_video_info(url: str) -> Optional[dict]:
"""Get video metadata without downloading."""
cmd = [
"yt-dlp",
"-j", # JSON output
"--no-download",
]
# Add proxy if configured
if settings.PROXY_URL:
cmd.extend(["--proxy", settings.PROXY_URL])
cmd.append(url)
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=60,
)
if result.returncode == 0:
import json
return json.loads(result.stdout)
except Exception:
pass
return None
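The two pure helpers in this downloader are easy to verify in isolation. A self-contained sketch of the same logic (reproduced here for testing; a table-driven variant of `detect_platform` rather than the if/elif chain above):

```python
import re

def detect_platform(url: str) -> str:
    """Substring-based platform detection, matching the downloader."""
    checks = [
        (("douyin", "iesdouyin"), "douyin"),
        (("kuaishou", "gifshow"), "kuaishou"),
        (("bilibili",), "bilibili"),
        (("youtube", "youtu.be"), "youtube"),
        (("tiktok",), "tiktok"),
    ]
    for needles, name in checks:
        if any(n in url for n in needles):
            return name
    return "unknown"

def sanitize_filename(filename: str) -> str:
    """Replace characters invalid on common filesystems and cap
    the length at 100 chars, as in the downloader above."""
    filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
    return filename[:100]
```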

View File

@@ -0,0 +1,399 @@
"""
Thumbnail Generator Service
Generates YouTube Shorts thumbnails with:
1. Frame extraction from video
2. GPT-generated catchphrase
3. Text overlay with styling
"""
import os
import subprocess
import asyncio
from typing import Optional, Tuple, List
from openai import OpenAI
from PIL import Image, ImageDraw, ImageFont
from app.config import settings
from app.models.schemas import TranscriptSegment
def get_openai_client() -> OpenAI:
"""Get OpenAI client."""
return OpenAI(api_key=settings.OPENAI_API_KEY)
async def extract_frame(
video_path: str,
output_path: str,
timestamp: float = 2.0,
) -> Tuple[bool, str]:
"""
Extract a single frame from video.
Args:
video_path: Path to video file
output_path: Path to save thumbnail image
timestamp: Time in seconds to extract frame
Returns:
Tuple of (success, message)
"""
try:
cmd = [
"ffmpeg", "-y",
"-ss", str(timestamp),
"-i", video_path,
"-vframes", "1",
"-q:v", "2", # High quality JPEG
output_path
]
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
_, stderr = await process.communicate()
if process.returncode != 0:
return False, f"FFmpeg error: {stderr.decode()[:200]}"
if not os.path.exists(output_path):
return False, "Frame extraction failed - no output file"
return True, "Frame extracted successfully"
except Exception as e:
return False, f"Frame extraction error: {str(e)}"
async def generate_catchphrase(
transcript: List[TranscriptSegment],
style: str = "homeshopping",
) -> Tuple[bool, str, str]:
"""
Generate a catchy thumbnail text using GPT.
Args:
transcript: List of transcript segments (with translations)
style: Style of catchphrase (homeshopping, viral, informative)
Returns:
Tuple of (success, message, catchphrase)
"""
if not settings.OPENAI_API_KEY:
return False, "OpenAI API key not configured", ""
try:
client = get_openai_client()
# Combine translated text
if transcript and transcript[0].translated:
full_text = " ".join([seg.translated for seg in transcript if seg.translated])
else:
full_text = " ".join([seg.text for seg in transcript])
style_guides = {
"homeshopping": """홈쇼핑 스타일의 임팩트 있는 문구를 만드세요.
- "이거 하나면 끝!" 같은 강렬한 어필
- 혜택/효과 강조
- 숫자 활용 (예: "10초만에", "50% 절약")
- 질문형도 OK (예: "아직도 힘들게?")""",
"viral": """바이럴 쇼츠 스타일의 호기심 유발 문구를 만드세요.
- 궁금증 유발
- 반전/놀라움 암시
- 이모지 1-2개 사용 가능""",
"informative": """정보성 콘텐츠 스타일의 명확한 문구를 만드세요.
- 핵심 정보 전달
- 간결하고 명확하게""",
}
style_guide = style_guides.get(style, style_guides["homeshopping"])
system_prompt = f"""당신은 YouTube Shorts 썸네일 문구 전문가입니다.
{style_guide}
규칙:
- 반드시 15자 이내!
- 한 줄로 작성
- 한글만 사용 (영어/한자 금지)
- 출력은 문구만! (설명 없이)
예시 출력:
이거 하나면 끝!
10초면 완성!
아직도 힘들게?
진짜 이게 돼요?"""
response = client.chat.completions.create(
model=settings.OPENAI_MODEL,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"다음 영상 내용으로 썸네일 문구를 만들어주세요:\n\n{full_text[:500]}"}
],
temperature=0.8,
max_tokens=50,
)
catchphrase = response.choices[0].message.content.strip()
# Clean up
catchphrase = catchphrase.strip('"\'""''')
# Hard cap at 20 chars (the prompt asks for 15, leave some slack)
if len(catchphrase) > 20:
catchphrase = catchphrase[:20]
return True, "Catchphrase generated", catchphrase
except Exception as e:
return False, f"GPT error: {str(e)}", ""
def add_text_overlay(
image_path: str,
output_path: str,
text: str,
font_size: int = 80,
font_color: str = "#FFFFFF",
stroke_color: str = "#000000",
stroke_width: int = 4,
position: str = "center",
font_name: str = "NanumGothicBold",
) -> Tuple[bool, str]:
"""
Add text overlay to image using PIL.
Args:
image_path: Input image path
output_path: Output image path
text: Text to overlay
font_size: Font size in pixels
font_color: Text color (hex)
stroke_color: Outline color (hex)
stroke_width: Outline thickness
position: Text position (top, center, bottom)
font_name: Font family name
Returns:
Tuple of (success, message)
"""
try:
# Open image
img = Image.open(image_path)
draw = ImageDraw.Draw(img)
img_width, img_height = img.size
# Maximum text width (90% of image width)
max_text_width = int(img_width * 0.9)
# Try to load font
def load_font(size):
font_paths = [
f"/usr/share/fonts/truetype/nanum/{font_name}.ttf",
f"/usr/share/fonts/opentype/nanum/{font_name}.otf",
f"/System/Library/Fonts/{font_name}.ttf",
f"/Library/Fonts/{font_name}.ttf",
f"~/Library/Fonts/{font_name}.ttf",
f"/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
]
for path in font_paths:
expanded_path = os.path.expanduser(path)
if os.path.exists(expanded_path):
try:
return ImageFont.truetype(expanded_path, size)
except Exception:
continue
return None
font = load_font(font_size)
if font is None:
font = ImageFont.load_default()
font_size = 40
# Check text width and adjust if necessary
bbox = draw.textbbox((0, 0), text, font=font)
text_width = bbox[2] - bbox[0]
lines = [text]
if text_width > max_text_width:
# Try splitting into 2 lines first
mid = len(text) // 2
# Find best split point near middle (at space or comma if exists)
split_pos = mid
for i in range(mid, max(0, mid - 5), -1):
if text[i] in ' ,、,':
split_pos = i + 1
break
for i in range(mid, min(len(text), mid + 5)):
if text[i] in ' ,、,':
split_pos = i + 1
break
# Split text into 2 lines
line1 = text[:split_pos].strip()
line2 = text[split_pos:].strip()
lines = [line1, line2] if line2 else [line1]
# Check if 2-line version fits
max_line_width = max(
draw.textbbox((0, 0), line, font=font)[2] - draw.textbbox((0, 0), line, font=font)[0]
for line in lines
)
# If still too wide, reduce font size
while max_line_width > max_text_width and font_size > 40:
font_size -= 5
font = load_font(font_size)
if font is None:
font = ImageFont.load_default()
break
max_line_width = max(
draw.textbbox((0, 0), line, font=font)[2] - draw.textbbox((0, 0), line, font=font)[0]
for line in lines
)
# Calculate total text height for multi-line
line_height = font_size + 10
total_height = line_height * len(lines)
# Calculate starting y position
if position == "top":
start_y = img_height // 6
elif position == "bottom":
start_y = img_height - img_height // 4 - total_height
else: # center
start_y = (img_height - total_height) // 2
# Convert hex colors to RGB
def hex_to_rgb(hex_color):
hex_color = hex_color.lstrip('#')
return tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
text_rgb = hex_to_rgb(font_color)
stroke_rgb = hex_to_rgb(stroke_color)
# Draw each line
for i, line in enumerate(lines):
bbox = draw.textbbox((0, 0), line, font=font)
line_width = bbox[2] - bbox[0]
# Account for left bearing (bbox[0]) to prevent first character cut-off
# Some fonts/characters have non-zero left offset
x = (img_width - line_width) // 2 - bbox[0]
y = start_y + i * line_height
# Draw text with stroke (outline)
for dx in range(-stroke_width, stroke_width + 1):
for dy in range(-stroke_width, stroke_width + 1):
if dx != 0 or dy != 0:
draw.text((x + dx, y + dy), line, font=font, fill=stroke_rgb)
# Draw main text
draw.text((x, y), line, font=font, fill=text_rgb)
# Save
img.save(output_path, "JPEG", quality=95)
return True, "Text overlay added"
except Exception as e:
return False, f"Text overlay error: {str(e)}"
async def generate_thumbnail(
job_id: str,
video_path: str,
transcript: List[TranscriptSegment],
timestamp: float = 2.0,
style: str = "homeshopping",
custom_text: Optional[str] = None,
font_size: int = 80,
position: str = "center",
) -> Tuple[bool, str, Optional[str]]:
"""
Generate a complete thumbnail with text overlay.
Args:
job_id: Job ID for naming
video_path: Path to video file
transcript: Transcript segments
timestamp: Time to extract frame
style: Catchphrase style
custom_text: Custom text (skip GPT generation)
font_size: Font size
position: Text position
Returns:
Tuple of (success, message, thumbnail_path)
"""
# Paths
frame_path = os.path.join(settings.PROCESSED_DIR, f"{job_id}_frame.jpg")
thumbnail_path = os.path.join(settings.PROCESSED_DIR, f"{job_id}_thumbnail.jpg")
# Step 1: Extract frame
success, msg = await extract_frame(video_path, frame_path, timestamp)
if not success:
return False, msg, None
# Step 2: Generate or use custom text
if custom_text:
catchphrase = custom_text
else:
success, msg, catchphrase = await generate_catchphrase(transcript, style)
if not success:
# Fallback: use first translation
catchphrase = transcript[0].translated if transcript and transcript[0].translated else "확인해보세요!"
# Step 3: Add text overlay
success, msg = add_text_overlay(
frame_path,
thumbnail_path,
catchphrase,
font_size=font_size,
position=position,
)
if not success:
return False, msg, None
# Cleanup frame
if os.path.exists(frame_path):
os.remove(frame_path)
return True, f"Thumbnail generated: {catchphrase}", thumbnail_path
async def get_video_timestamps(video_path: str, count: int = 5) -> List[float]:
"""
Get evenly distributed timestamps from video for thumbnail selection.
Args:
video_path: Path to video
count: Number of timestamps to return
Returns:
List of timestamps in seconds
"""
try:
cmd = [
"ffprobe", "-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
video_path
]
result = subprocess.run(cmd, capture_output=True, text=True)
duration = float(result.stdout.strip())
# Generate evenly distributed timestamps (skip first and last 10%)
start = duration * 0.1
end = duration * 0.9
step = (end - start) / (count - 1) if count > 1 else 0
timestamps = [start + i * step for i in range(count)]
return timestamps
except Exception:
return [1.0, 3.0, 5.0, 7.0, 10.0] # Fallback
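The timestamp distribution used above (skip the first and last 10% of the video, then space candidates evenly) and the hex colour parsing from `add_text_overlay` can both be sketched as pure helpers (the function names are ours, separated from the ffprobe/PIL plumbing for testing):

```python
def evenly_spaced_timestamps(duration: float, count: int = 5) -> list:
    """Return `count` timestamps between 10% and 90% of the duration,
    as get_video_timestamps computes after probing the video."""
    start, end = duration * 0.1, duration * 0.9
    step = (end - start) / (count - 1) if count > 1 else 0
    return [start + i * step for i in range(count)]

def hex_to_rgb(hex_color: str) -> tuple:
    """Parse '#RRGGBB' (leading '#' optional) into an (r, g, b) tuple,
    same as the inline helper in add_text_overlay."""
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
```

For a 100-second clip this yields candidates at roughly 10s, 30s, 50s, 70s, and 90s, which the frontend can offer for thumbnail frame selection.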

View File

@@ -0,0 +1,421 @@
import whisper
import asyncio
import os
from typing import List, Optional, Tuple
from app.models.schemas import TranscriptSegment
from app.config import settings
# Global model cache
_model = None
def get_whisper_model():
"""Load Whisper model (cached)."""
global _model
if _model is None:
print(f"Loading Whisper model: {settings.WHISPER_MODEL}")
_model = whisper.load_model(settings.WHISPER_MODEL)
return _model
async def check_audio_availability(video_path: str) -> Tuple[bool, str]:
"""
Check if video has usable audio for transcription.
Returns:
Tuple of (has_audio, message)
"""
from app.services.video_processor import has_audio_stream, get_audio_volume_info, is_audio_silent
# Check if audio stream exists
if not await has_audio_stream(video_path):
return False, "no_audio_stream"
# Check if audio is silent
volume_info = await get_audio_volume_info(video_path)
if is_audio_silent(volume_info):
return False, "audio_silent"
return True, "audio_ok"
async def transcribe_video(
video_path: str,
use_noise_reduction: bool = True,
noise_reduction_level: str = "medium",
use_vocal_separation: bool = False,
progress_callback: Optional[callable] = None,
) -> Tuple[bool, str, Optional[List[TranscriptSegment]], Optional[str]]:
"""
Transcribe video audio using Whisper.
Args:
video_path: Path to video file
use_noise_reduction: Whether to apply noise reduction before transcription
noise_reduction_level: "light", "medium", or "heavy"
use_vocal_separation: Whether to separate vocals from background music first
progress_callback: Optional async callback function(step: str, progress: int) for progress updates
Returns:
Tuple of (success, message, segments, detected_language)
- success=False with message="NO_AUDIO" means video has no audio
- success=False with message="SILENT_AUDIO" means audio is too quiet
- success=False with message="SINGING_ONLY" means only singing detected (no speech)
"""
# Helper to call progress callback if provided
async def report_progress(step: str, progress: int):
print(f"[Transcriber] report_progress: {step} ({progress}%), has_callback: {progress_callback is not None}")
if progress_callback:
await progress_callback(step, progress)
if not os.path.exists(video_path):
return False, f"Video file not found: {video_path}", None, None
# Check audio availability
has_audio, audio_status = await check_audio_availability(video_path)
if not has_audio:
if audio_status == "no_audio_stream":
return False, "NO_AUDIO", None, None
elif audio_status == "audio_silent":
return False, "SILENT_AUDIO", None, None
audio_path = video_path # Default to video path (Whisper can handle it)
temp_files = [] # Track temp files for cleanup
try:
video_dir = os.path.dirname(video_path)
# Step 1: Vocal separation (if enabled)
if use_vocal_separation:
from app.services.audio_separator import separate_vocals, analyze_vocal_type
await report_progress("vocal_separation", 15)
print("Separating vocals from background music...")
separation_dir = os.path.join(video_dir, "separated")
success, message, vocals_path, _ = await separate_vocals(
video_path,
separation_dir
)
if success and vocals_path:
print(f"Vocal separation complete: {vocals_path}")
temp_files.append(separation_dir)
# Analyze if vocals are speech or singing
print("Analyzing vocal type (speech vs singing)...")
vocal_type, confidence = await analyze_vocal_type(vocals_path)
print(f"Vocal analysis: {vocal_type} (confidence: {confidence:.2f})")
# Treat as singing if:
# 1. Explicitly detected as singing
# 2. Mixed with low confidence (< 0.6) - likely music, not clear speech
if vocal_type == "singing" or (vocal_type == "mixed" and confidence < 0.6):
# Only singing/music detected - no clear speech to transcribe
_cleanup_temp_files(temp_files)
reason = "SINGING_ONLY" if vocal_type == "singing" else "MUSIC_DOMINANT"
print(f"No clear speech detected ({reason}), awaiting manual subtitle")
return False, "SINGING_ONLY", None, None
# Use vocals for transcription
audio_path = vocals_path
else:
print(f"Vocal separation failed: {message}, continuing with original audio")
# Step 2: Apply noise reduction (if enabled and not using separated vocals)
if use_noise_reduction and audio_path == video_path:
from app.services.video_processor import extract_audio_with_noise_reduction
await report_progress("extracting_audio", 20)
cleaned_path = os.path.join(video_dir, "audio_cleaned.wav")
await report_progress("noise_reduction", 25)
print(f"Applying {noise_reduction_level} noise reduction...")
success, message = await extract_audio_with_noise_reduction(
video_path,
cleaned_path,
noise_reduction_level
)
if success:
print(f"Noise reduction complete: {message}")
audio_path = cleaned_path
temp_files.append(cleaned_path)
else:
print(f"Noise reduction failed: {message}, falling back to original audio")
# Step 3: Transcribe with Whisper
await report_progress("transcribing", 35)
model = get_whisper_model()
print(f"Transcribing audio: {audio_path}")
# Run Whisper in thread pool to avoid blocking the event loop
result = await asyncio.to_thread(
model.transcribe,
audio_path,
task="transcribe",
language=None, # Auto-detect
verbose=False,
word_timestamps=True,
)
# Split long segments using word-level timestamps
segments = _split_segments_by_words(
result.get("segments", []),
max_duration=2.0, # Maximum segment duration in seconds (shorter for better sync)
min_words=1, # Minimum words per segment
)
# Clean up temp files
_cleanup_temp_files(temp_files)
detected_lang = result.get("language", "unknown")
print(f"Detected language: {detected_lang}")
extras = []
if use_vocal_separation:
extras.append("vocal separation")
if use_noise_reduction:
extras.append(f"noise reduction: {noise_reduction_level}")
extra_info = f" ({', '.join(extras)})" if extras else ""
# Return tuple with 4 elements: success, message, segments, detected_language
return True, f"Transcription complete (detected: {detected_lang}){extra_info}", segments, detected_lang
except Exception as e:
_cleanup_temp_files(temp_files)
return False, f"Transcription error: {str(e)}", None, None
def _split_segments_by_words(
raw_segments: list,
max_duration: float = 4.0,
min_words: int = 2,
) -> List[TranscriptSegment]:
"""
Split long Whisper segments into shorter ones using word-level timestamps.
Args:
raw_segments: Raw segments from Whisper output
max_duration: Maximum duration for each segment in seconds
min_words: Minimum words per segment (to avoid single-word segments)
Returns:
List of TranscriptSegment with shorter durations
"""
segments = []
for seg in raw_segments:
words = seg.get("words", [])
seg_text = seg.get("text", "").strip()
seg_start = seg.get("start", 0)
seg_end = seg.get("end", 0)
seg_duration = seg_end - seg_start
# If no word timestamps or segment is short enough, use as-is
if not words or seg_duration <= max_duration:
segments.append(TranscriptSegment(
start=seg_start,
end=seg_end,
text=seg_text,
))
continue
# Split segment using word timestamps
current_words = []
current_start = None
for i, word in enumerate(words):
word_start = word.get("start", seg_start)
word_end = word.get("end", seg_end)
word_text = word.get("word", "").strip()
if not word_text:
continue
# Start a new segment
if current_start is None:
current_start = word_start
current_words.append(word_text)
current_duration = word_end - current_start
# Check if we should split here
is_last_word = (i == len(words) - 1)
should_split = False
if is_last_word:
should_split = True
elif current_duration >= max_duration and len(current_words) >= min_words:
should_split = True
            elif current_duration >= max_duration * 0.5:
                # Split at natural break points (punctuation) more aggressively
                if word_text.endswith((',', '.', '!', '?', '，', '。', '！', '？', '、', '；', ';')):
                    should_split = True
            elif current_duration >= 1.0 and word_text.endswith(('。', '！', '？', '.', '!', '?')):
                # Always split at sentence endings if we have at least 1 second of content
                should_split = True
if should_split and current_words:
# Create segment
text = " ".join(current_words)
# For Chinese/Japanese, remove spaces between words
if any('\u4e00' <= c <= '\u9fff' for c in text):
text = text.replace(" ", "")
segments.append(TranscriptSegment(
start=current_start,
end=word_end,
text=text,
))
                # Reset for next segment
                current_words = []
                current_start = None
        # Flush any leftover words (e.g. when the final word token had empty text)
        if current_words and current_start is not None:
            text = " ".join(current_words)
            if any('\u4e00' <= c <= '\u9fff' for c in text):
                text = text.replace(" ", "")
            segments.append(TranscriptSegment(
                start=current_start,
                end=seg_end,
                text=text,
            ))
    return segments
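The CJK handling above (joining Whisper's space-separated word tokens without spaces when the text is Chinese) can be exercised on its own. A small sketch of the same check; `join_words` is a hypothetical helper name:

```python
def join_words(words: list[str]) -> str:
    text = " ".join(words)
    # Whisper emits space-separated tokens; CJK text reads correctly only without spaces.
    if any('\u4e00' <= c <= '\u9fff' for c in text):
        text = text.replace(" ", "")
    return text

print(join_words(["这个", "盒子", "很", "好"]))  # joined without spaces
print(join_words(["hello", "world"]))           # spaces preserved
```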
def _cleanup_temp_files(paths: list):
"""Clean up temporary files and directories."""
import shutil
for path in paths:
try:
if os.path.isdir(path):
shutil.rmtree(path, ignore_errors=True)
elif os.path.exists(path):
os.remove(path)
except Exception:
pass
def segments_to_srt(segments: List[TranscriptSegment], use_translated: bool = True) -> str:
"""Convert segments to SRT format."""
srt_lines = []
for i, seg in enumerate(segments, 1):
start_time = format_srt_time(seg.start)
end_time = format_srt_time(seg.end)
text = seg.translated if use_translated and seg.translated else seg.text
srt_lines.append(f"{i}")
srt_lines.append(f"{start_time} --> {end_time}")
srt_lines.append(text)
srt_lines.append("")
return "\n".join(srt_lines)
def format_srt_time(seconds: float) -> str:
"""Format seconds to SRT timestamp format (HH:MM:SS,mmm)."""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
millis = int((seconds % 1) * 1000)
return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
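A quick standalone check of the SRT timestamp math, using the same arithmetic as `format_srt_time`:

```python
def to_srt_time(seconds: float) -> str:
    # HH:MM:SS,mmm — note SRT uses a comma before the milliseconds.
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    millis = int((seconds % 1) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

print(to_srt_time(3661.25))  # -> 01:01:01,250
```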
def segments_to_ass(
segments: List[TranscriptSegment],
use_translated: bool = True,
font_size: int = 28,
font_color: str = "FFFFFF",
outline_color: str = "000000",
font_name: str = "NanumGothic",
position: str = "bottom", # top, center, bottom
outline_width: int = 3,
bold: bool = True,
shadow: int = 1,
background_box: bool = True,
background_opacity: str = "E0", # 00=transparent, FF=opaque
animation: str = "none", # none, fade, pop
time_offset: float = 0.0, # Delay all subtitles by this amount (for intro text)
) -> str:
"""
Convert segments to ASS format with styling.
Args:
segments: List of transcript segments
use_translated: Use translated text if available
font_size: Font size in pixels
font_color: Font color in hex (without #)
outline_color: Outline color in hex (without #)
font_name: Font family name
position: Subtitle position - "top", "center", or "bottom"
outline_width: Outline thickness
bold: Use bold text
shadow: Shadow depth (0-4)
background_box: Show semi-transparent background box
animation: Animation type - "none", "fade", or "pop"
time_offset: Delay all subtitle timings by this amount in seconds (useful when intro text is shown)
Returns:
ASS formatted subtitle string
"""
# ASS Alignment values:
# 1=Bottom-Left, 2=Bottom-Center, 3=Bottom-Right
# 4=Middle-Left, 5=Middle-Center, 6=Middle-Right
# 7=Top-Left, 8=Top-Center, 9=Top-Right
    alignment_map = {
        "top": 8,     # Top-Center
        "center": 5,  # Middle-Center (middle of the frame)
        "bottom": 2,  # Bottom-Center (default)
    }
    alignment = alignment_map.get(position, 2)
    # Adjust margin based on position (lower value = closer to the screen edge).
    # Keep the bottom margin small so the overlay covers the original subtitles.
    margin_v = 30 if position == "bottom" else (100 if position == "top" else 10)
# Bold: -1 = bold, 0 = normal
bold_value = -1 if bold else 0
# BorderStyle: 1 = outline + shadow, 3 = opaque box (background)
border_style = 3 if background_box else 1
# BackColour alpha: use provided opacity or default
back_alpha = background_opacity if background_box else "80"
# ASS header
ass_content = f"""[Script Info]
Title: Shorts Maker Subtitle
ScriptType: v4.00+
PlayDepth: 0
PlayResX: 1080
PlayResY: 1920
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,{font_name},{font_size},&H00{font_color},&H00FFFFFF,&H00{outline_color},&H{back_alpha}000000,{bold_value},0,0,0,100,100,0,0,{border_style},{outline_width},{shadow},{alignment},30,30,{margin_v},1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
for seg in segments:
# Apply time offset (for intro text overlay)
start_time = format_ass_time(seg.start + time_offset)
end_time = format_ass_time(seg.end + time_offset)
text = seg.translated if use_translated and seg.translated else seg.text
# Escape special characters
text = text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
# Add animation effects
if animation == "fade":
# Fade in/out effect (250ms)
text = f"{{\\fad(250,250)}}{text}"
elif animation == "pop":
# Pop-in effect with scale animation
text = f"{{\\t(0,150,\\fscx110\\fscy110)\\t(150,300,\\fscx100\\fscy100)}}{text}"
ass_content += f"Dialogue: 0,{start_time},{end_time},Default,,0,0,0,,{text}\n"
return ass_content
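The Dialogue-line assembly above boils down to escaping plus an optional override tag; a minimal sketch under the same conventions (`dialogue_line` is an illustrative name, and the `\fad` values mirror the fade branch):

```python
ALIGNMENT = {"top": 8, "center": 5, "bottom": 2}  # ASS \an alignment codes

def dialogue_line(start: str, end: str, text: str, animation: str = "none") -> str:
    # Escape characters ASS treats specially, then prepend the override tag.
    text = text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
    if animation == "fade":
        text = "{\\fad(250,250)}" + text  # 250 ms fade in/out
    return f"Dialogue: 0,{start},{end},Default,,0,0,0,,{text}"

print(dialogue_line("0:00:01.00", "0:00:03.00", "안녕하세요", animation="fade"))
```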
def format_ass_time(seconds: float) -> str:
"""Format seconds to ASS timestamp format (H:MM:SS.cc)."""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
centis = int((seconds % 1) * 100)
return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"

View File


@@ -0,0 +1,468 @@
import re
from typing import List, Tuple, Optional
from openai import OpenAI
from app.models.schemas import TranscriptSegment
from app.config import settings
def get_openai_client() -> OpenAI:
"""Get OpenAI client."""
return OpenAI(api_key=settings.OPENAI_API_KEY)
class TranslationMode:
"""Translation mode options."""
    DIRECT = "direct"        # direct translation (keep original structure)
    SUMMARIZE = "summarize"  # summarize, then translate
    REWRITE = "rewrite"      # summarize + rewrite as a natural Korean script
async def shorten_text(client: OpenAI, text: str, max_chars: int) -> str:
"""
Shorten a Korean text to fit within character limit.
Args:
client: OpenAI client
text: Text to shorten
max_chars: Maximum character count
Returns:
Shortened text
"""
try:
response = client.chat.completions.create(
model=settings.OPENAI_MODEL,
messages=[
{
"role": "system",
"content": f"""한국어 자막을 {max_chars}자 이내로 줄이세요.
규칙:
- 반드시 {max_chars}자 이하!
- 핵심 의미만 유지
- 자연스러운 한국어
- 존댓말 유지
- 출력은 줄인 문장만!
예시:
입력: "요리할 때마다 한 시간이 걸리셨죠?" (18자)
제한: 10자
출력: "시간 오래 걸리죠" (8자)
입력: "채소 다듬는 데만 30분 걸리셨죠" (16자)
제한: 10자
출력: "채소만 30분" (6자)"""
},
{
"role": "user",
"content": f"입력: \"{text}\" ({len(text)}자)\n제한: {max_chars}\n출력:"
}
],
temperature=0.3,
max_tokens=50,
)
shortened = response.choices[0].message.content.strip()
# Remove quotes, parentheses, and extra characters
shortened = shortened.strip('"\'""''')
# Remove any trailing parenthetical notes like "(10자)"
shortened = re.sub(r'\s*\([^)]*자\)\s*$', '', shortened)
shortened = re.sub(r'\s*\(\d+자\)\s*$', '', shortened)
# Remove any remaining quotes
shortened = shortened.replace('"', '').replace('"', '').replace('"', '')
shortened = shortened.replace("'", '').replace("'", '').replace("'", '')
shortened = shortened.strip()
# If still too long, truncate cleanly
if len(shortened) > max_chars:
shortened = shortened[:max_chars]
return shortened
    except Exception:
        # Fallback: simple truncation with an ellipsis
        if len(text) > max_chars:
            return text[:max_chars - 1] + "…"
        return text
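The post-processing after the API call (dropping quotes and trailing length notes like "(8자)") is worth a standalone check; `clean_shortened` is a hypothetical condensed version of the cleanup above, with the regex applied before the quote strip:

```python
import re

def clean_shortened(s: str) -> str:
    # Drop a trailing "(N자)" length note the model sometimes appends.
    s = re.sub(r'\s*\(\d+자\)\s*$', '', s)
    # Then strip surrounding quotes and whitespace.
    return s.strip().strip('"\'')

print(clean_shortened('"시간 오래 걸리죠" (8자)'))  # -> 시간 오래 걸리죠
```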
async def translate_segments(
segments: List[TranscriptSegment],
target_language: str = "Korean",
mode: str = TranslationMode.DIRECT,
max_tokens: Optional[int] = None,
) -> Tuple[bool, str, List[TranscriptSegment]]:
"""
Translate transcript segments to target language using OpenAI.
Args:
segments: List of transcript segments
target_language: Target language for translation
mode: Translation mode (direct, summarize, rewrite)
max_tokens: Maximum output tokens (for cost control)
Returns:
Tuple of (success, message, translated_segments)
"""
if not settings.OPENAI_API_KEY:
return False, "OpenAI API key not configured", segments
try:
client = get_openai_client()
# Batch translate for efficiency
texts = [seg.text for seg in segments]
combined_text = "\n---\n".join(texts)
# Calculate video duration for context
total_duration = segments[-1].end if segments else 0
# Calculate segment info for length guidance
segment_info = []
for i, seg in enumerate(segments):
duration = seg.end - seg.start
max_chars = int(duration * 5) # ~5 Korean chars per second (stricter for better sync)
segment_info.append(f"[{i+1}] {duration:.1f}초 = 최대 {max_chars}자 (엄수!)")
# Get custom prompt settings from config
gpt_role = settings.GPT_ROLE or "친근한 유튜브 쇼츠 자막 작가"
gpt_tone = settings.GPT_TONE or "존댓말"
gpt_style = settings.GPT_STYLE or ""
# Tone examples
tone_examples = {
"존댓말": '~해요, ~이에요, ~하죠',
"반말": '~해, ~야, ~지',
"격식체": '~합니다, ~입니다',
}
tone_example = tone_examples.get(gpt_tone, tone_examples["존댓말"])
# Additional style instruction
style_instruction = f"\n6. Style: {gpt_style}" if gpt_style else ""
# Select prompt based on mode
if mode == TranslationMode.REWRITE:
# Build indexed timeline input with Chinese text
# Use segment numbers to handle duplicate timestamps
timeline_input = []
for i, seg in enumerate(segments):
mins = int(seg.start // 60)
secs = int(seg.start % 60)
timeline_input.append(f"[{i+1}] {mins}:{secs:02d} {seg.text}")
system_prompt = f"""당신은 생활용품 유튜브 쇼츠 자막 작가입니다.
중국어 원문의 "의미"만 참고하여, 한국인이 직접 말하는 것처럼 자연스러운 자막을 작성하세요.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎯 핵심 원칙: 번역이 아니라 "재창작"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ 필수 규칙:
1. 한 문장 = 한 가지 정보 (두 개 이상 금지)
2. 중복 표현 절대 금지 ("편해요"가 이미 나왔으면 다시 안 씀)
3. {gpt_tone} 사용 ({tone_example})
4. 세그먼트 수 유지: 입력 {len(segments)}개 → 출력 {len(segments)}
5. 중국어 한자 금지, 순수 한글만
❌ 금지 표현 (번역투):
- "~할 수 있어요" → "~돼요", "~됩니다"
- "매우/아주/정말" 남용 → 꼭 필요할 때만
- "그것은/이것은" → "이거", "이건"
- "~하는 것이" → 직접 표현으로
- "편리해요/편해요" 반복 → 한 번만, 이후 다른 표현
- "좋아요/좋고요" 반복 → 구체적 장점으로 대체
🎵 쇼츠 리듬감:
- 짧게 끊어서
- 한 호흡에 하나씩
- 시청자가 따라 읽을 수 있게
📝 좋은 예시:
원문: "이 작은 박스 디자인이 참 좋네요. 평소에 씨앗 먹을 때 간편하게 먹을 수 있어요."
❌ 나쁜 번역: "이 작은 박스 디자인이 참 좋네요. 평소에 씨앗 먹을 때 간편하게 먹을 수 있어요."
✅ 좋은 재창작: "이 작은 박스, 생각보다 정말 잘 만들었어요."
원문: "테이블에 두거나 손에 들고 사용하기에도 좋고요. 침대에 누워서나 사무실에서도 간식이나 과일 먹기 정말 편해요."
❌ 나쁜 번역: "테이블에 두거나 손에 들고 사용하기에도 좋고요. 침대에 누워서나 사무실에서도 간식이나 과일 먹기 정말 편해요."
✅ 좋은 재창작 (2개로 분리):
- "테이블 위에서도, 침대에서도, 사무실에서도 사용하기 좋고"
- "과일 씻고 물기 빼는 데도 활용 가능합니다."
원문: "가정에서 필수 아이템이에요. 정말 유용하죠. 꼭 하나씩 가져야 할 제품이에요."
❌ 나쁜 번역: 그대로 3문장
✅ 좋은 재창작: "집에 하나 있으면 은근히 자주 쓰게 됩니다."{style_instruction}
출력 형식:
[번호] 시간 자막 내용
⚠️ 입력과 동일한 세그먼트 수({len(segments)}개)를 출력하세요!
⚠️ 각 [번호]는 입력과 1:1 대응해야 합니다!"""
# Use indexed timeline format for user content
combined_text = "[중국어 원문]\n\n" + "\n".join(timeline_input)
elif mode == TranslationMode.SUMMARIZE:
system_prompt = f"""You are: {gpt_role}
Task: Translate Chinese to SHORT Korean subtitles.
Length limits (자막 싱크!):
{chr(10).join(segment_info)}
Rules:
1. Use {gpt_tone} ({tone_example})
2. Summarize to core meaning - be BRIEF
3. Max one short sentence per segment
4. {len(segments)} segments separated by '---'{style_instruction}"""
else: # DIRECT mode
system_prompt = f"""You are: {gpt_role}
Task: Translate Chinese to Korean subtitles.
Length limits (자막 싱크!):
{chr(10).join(segment_info)}
Rules:
1. Use {gpt_tone} ({tone_example})
2. Keep translations SHORT and readable
3. {len(segments)} segments separated by '---'{style_instruction}"""
# Build API request
request_params = {
"model": settings.OPENAI_MODEL,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": combined_text}
],
"temperature": 0.65 if mode == TranslationMode.REWRITE else 0.3,
}
# Add max_tokens if specified (for cost control)
effective_max_tokens = max_tokens or settings.TRANSLATION_MAX_TOKENS
if effective_max_tokens:
# Use higher token limit for REWRITE mode
if mode == TranslationMode.REWRITE:
request_params["max_tokens"] = max(effective_max_tokens, 700)
else:
request_params["max_tokens"] = effective_max_tokens
response = client.chat.completions.create(**request_params)
translated_text = response.choices[0].message.content
# Parse based on mode
if mode == TranslationMode.REWRITE:
# Parse indexed timeline format: "[1] 0:00 자막\n[2] 0:02 자막\n..."
indexed_pattern = re.compile(r'^\[(\d+)\]\s*\d+:\d{2}\s+(.+)$', re.MULTILINE)
matches = indexed_pattern.findall(translated_text)
# Create mapping from segment index to translation
translations_by_index = {}
for idx, text in matches:
translations_by_index[int(idx)] = text.strip()
# Map translations back to segments by index (1-based)
for i, seg in enumerate(segments):
seg_num = i + 1 # 1-based index
if seg_num in translations_by_index:
seg.translated = translations_by_index[seg_num]
else:
# No matching translation found - try fallback to old timestamp-based parsing
seg.translated = ""
# Fallback: if no indexed matches, try old timestamp format
if not matches:
print("[Warning] No indexed format found, falling back to timestamp parsing")
timeline_pattern = re.compile(r'^(\d+):(\d{2})\s+(.+)$', re.MULTILINE)
timestamp_matches = timeline_pattern.findall(translated_text)
# Create mapping from timestamp to translation
translations_by_time = {}
for mins, secs, text in timestamp_matches:
time_sec = int(mins) * 60 + int(secs)
translations_by_time[time_sec] = text.strip()
# Track used translations to prevent duplicates
used_translations = set()
# Map translations back to segments by matching start times
for seg in segments:
start_sec = int(seg.start)
matched_time = None
# Try exact match first
if start_sec in translations_by_time and start_sec not in used_translations:
matched_time = start_sec
else:
# Try to find closest UNUSED match within 1 second
for t in range(start_sec - 1, start_sec + 2):
if t in translations_by_time and t not in used_translations:
matched_time = t
break
if matched_time is not None:
seg.translated = translations_by_time[matched_time]
used_translations.add(matched_time)
else:
seg.translated = ""
else:
# Original parsing for other modes
translated_parts = translated_text.split("---")
for i, seg in enumerate(segments):
if i < len(translated_parts):
seg.translated = translated_parts[i].strip()
else:
seg.translated = seg.text # Fallback to original
# Calculate token usage for logging
usage = response.usage
token_info = f"(tokens: {usage.prompt_tokens}+{usage.completion_tokens}={usage.total_tokens})"
# Post-processing: Shorten segments that exceed character limit
# Skip for REWRITE mode - the prompt handles length naturally
shortened_count = 0
if mode != TranslationMode.REWRITE:
chars_per_sec = 5
for i, seg in enumerate(segments):
if seg.translated:
duration = seg.end - seg.start
max_chars = int(duration * chars_per_sec)
current_len = len(seg.translated)
if current_len > max_chars * 1.3 and max_chars >= 5:
seg.translated = await shorten_text(client, seg.translated, max_chars)
shortened_count += 1
print(f"[Shorten] Seg {i+1}: {current_len}{len(seg.translated)}자 (제한:{max_chars}자)")
shorten_info = f" [축약:{shortened_count}개]" if shortened_count > 0 else ""
return True, f"Translation complete [{mode}] {token_info}{shorten_info}", segments
except Exception as e:
return False, f"Translation error: {str(e)}", segments
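The REWRITE parsing relies on an indexed-timeline round trip: the model emits `[n] m:ss text` lines, which are parsed back by 1-based index. That regex can be verified without an API call; a minimal sketch:

```python
import re

# Same pattern as the REWRITE parser: "[n] m:ss subtitle text"
INDEXED = re.compile(r'^\[(\d+)\]\s*\d+:\d{2}\s+(.+)$', re.MULTILINE)

def parse_indexed(text: str) -> dict[int, str]:
    # Map 1-based segment numbers to their rewritten lines.
    return {int(idx): line.strip() for idx, line in INDEXED.findall(text)}

reply = "[1] 0:00 이 작은 박스, 잘 만들었어요\n[2] 0:03 집에 하나 있으면 좋아요"
print(parse_indexed(reply))
```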
async def generate_shorts_script(
segments: List[TranscriptSegment],
style: str = "engaging",
max_tokens: int = 500,
) -> Tuple[bool, str, Optional[str]]:
"""
Generate a completely new Korean Shorts script from Chinese transcript.
Args:
segments: Original transcript segments
style: Script style (engaging, informative, funny, dramatic)
max_tokens: Maximum output tokens
Returns:
Tuple of (success, message, script)
"""
if not settings.OPENAI_API_KEY:
return False, "OpenAI API key not configured", None
try:
client = get_openai_client()
# Combine all text
full_text = " ".join([seg.text for seg in segments])
total_duration = segments[-1].end if segments else 0
style_guides = {
"engaging": "Use hooks, questions, and emotional expressions. Start with attention-grabbing line.",
"informative": "Focus on facts and clear explanations. Use simple, direct language.",
"funny": "Add humor, wordplay, and light-hearted tone. Include relatable jokes.",
"dramatic": "Build tension and suspense. Use impactful short sentences.",
}
style_guide = style_guides.get(style, style_guides["engaging"])
system_prompt = f"""You are a viral Korean YouTube Shorts script writer.
Create a COMPLETELY ORIGINAL Korean script inspired by the Chinese video content.
=== CRITICAL: ANTI-PLAGIARISM RULES ===
- This is NOT translation - it's ORIGINAL CONTENT CREATION
- NEVER copy sentence structures, word order, or phrasing from original
- Extract only the CORE IDEA, then write YOUR OWN script from scratch
- Imagine you're a Korean creator who just learned this interesting fact
- Add your own personality, reactions, and Korean cultural context
=======================================
Video duration: ~{int(total_duration)} seconds
Style: {style}
Guide: {style_guide}
Output format:
[0:00] 첫 번째 대사
[0:03] 두 번째 대사
...
Requirements:
- Write in POLITE FORMAL KOREAN (존댓말/경어) - friendly but respectful
- Each line: 2-3 seconds when spoken aloud
- Start with a HOOK that grabs attention
- Use polite Korean expressions: "이거 아세요?", "정말 신기하죠", "근데 여기서 중요한 건요"
- End with engagement: question, call-to-action, or surprise
- Make it feel like ORIGINAL Korean content, not a translation"""
response = client.chat.completions.create(
model=settings.OPENAI_MODEL,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"Chinese transcript:\n{full_text}"}
],
temperature=0.7,
max_tokens=max_tokens,
)
script = response.choices[0].message.content
usage = response.usage
token_info = f"(tokens: {usage.total_tokens})"
return True, f"Script generated [{style}] {token_info}", script
except Exception as e:
return False, f"Script generation error: {str(e)}", None
async def translate_single(
text: str,
target_language: str = "Korean",
max_tokens: Optional[int] = None,
) -> Tuple[bool, str]:
"""Translate a single text."""
if not settings.OPENAI_API_KEY:
return False, text
try:
client = get_openai_client()
request_params = {
"model": settings.OPENAI_MODEL,
"messages": [
{
"role": "system",
"content": f"Translate to {target_language}. Only output the translation, nothing else."
},
{
"role": "user",
"content": text
}
],
"temperature": 0.3,
}
if max_tokens:
request_params["max_tokens"] = max_tokens
response = client.chat.completions.create(**request_params)
translated = response.choices[0].message.content
return True, translated.strip()
    except Exception:
        return False, text

View File


@@ -0,0 +1,659 @@
import subprocess
import asyncio
import os
from typing import Optional, Tuple
from app.config import settings
async def process_video(
input_path: str,
output_path: str,
subtitle_path: Optional[str] = None,
bgm_path: Optional[str] = None,
bgm_volume: float = 0.3,
keep_original_audio: bool = False,
intro_text: Optional[str] = None,
intro_duration: float = 0.7,
intro_font_size: int = 100,
) -> Tuple[bool, str]:
"""
Process video: remove audio, add subtitles, add BGM, add intro text.
Args:
input_path: Path to input video
output_path: Path for output video
subtitle_path: Path to ASS/SRT subtitle file
bgm_path: Path to BGM audio file
bgm_volume: Volume level for BGM (0.0 - 1.0)
keep_original_audio: Whether to keep original audio
intro_text: Text to display at the beginning of video (YouTube Shorts thumbnail)
intro_duration: How long to display intro text (seconds)
intro_font_size: Font size for intro text (100-120 recommended)
Returns:
Tuple of (success, message)
"""
if not os.path.exists(input_path):
return False, f"Input video not found: {input_path}"
os.makedirs(os.path.dirname(output_path), exist_ok=True)
# Build FFmpeg command
cmd = ["ffmpeg", "-y"] # -y to overwrite
# Input video
cmd.extend(["-i", input_path])
# Input BGM if provided (stream_loop must come BEFORE -i)
if bgm_path and os.path.exists(bgm_path):
cmd.extend(["-stream_loop", "-1"]) # Loop BGM infinitely
cmd.extend(["-i", bgm_path])
# Build filter complex
filter_parts = []
audio_parts = []
# Audio handling
if keep_original_audio and bgm_path and os.path.exists(bgm_path):
# Mix original audio with BGM
filter_parts.append(f"[0:a]volume=1.0[original]")
filter_parts.append(f"[1:a]volume={bgm_volume}[bgm]")
filter_parts.append(f"[original][bgm]amix=inputs=2:duration=shortest[audio]")
audio_output = "[audio]"
elif bgm_path and os.path.exists(bgm_path):
# BGM only (no original audio)
filter_parts.append(f"[1:a]volume={bgm_volume}[audio]")
audio_output = "[audio]"
elif keep_original_audio:
# Original audio only
audio_output = "0:a"
else:
# No audio
audio_output = None
# Build video filter chain
video_filters = []
# Note: We no longer use tpad to add frozen frames, as it extends the video duration.
# Instead, intro text is simply overlaid on the existing video content.
# 2. Add subtitle overlay if provided
if subtitle_path and os.path.exists(subtitle_path):
escaped_path = subtitle_path.replace("\\", "/").replace(":", "\\:").replace("'", "\\'")
video_filters.append(f"ass='{escaped_path}'")
# 3. Add intro text overlay if provided (shown during frozen frame portion)
if intro_text:
# Find a suitable font - try common Korean fonts
font_options = [
"/System/Library/Fonts/Supplemental/AppleGothic.ttf", # macOS Korean
"/System/Library/Fonts/AppleSDGothicNeo.ttc", # macOS Korean
"/usr/share/fonts/truetype/nanum/NanumGothicBold.ttf", # Linux Korean
"/usr/share/fonts/opentype/noto/NotoSansCJK-Bold.ttc", # Linux CJK
]
font_file = None
for font in font_options:
if os.path.exists(font):
font_file = font.replace(":", "\\:")
break
# Adjust font size and split text if too long
# Shorts video is 1080 width, so ~10-12 chars fit comfortably at 100px
text_len = len(intro_text)
adjusted_font_size = intro_font_size
# Split into 2 lines if text is long (more than 10 chars)
lines = []
if text_len > 10:
# Find best split point near middle
mid = text_len // 2
split_pos = mid
for i in range(mid, max(0, mid - 5), -1):
if intro_text[i] in ' ,、,':
split_pos = i + 1
break
for i in range(mid, min(text_len, mid + 5)):
if intro_text[i] in ' ,、,':
split_pos = i + 1
break
line1 = intro_text[:split_pos].strip()
line2 = intro_text[split_pos:].strip()
if line2:
lines = [line1, line2]
else:
lines = [intro_text]
else:
lines = [intro_text]
# Adjust font size based on longest line length
max_line_len = max(len(line) for line in lines)
if max_line_len > 12:
adjusted_font_size = int(intro_font_size * 10 / max_line_len)
            adjusted_font_size = max(50, min(adjusted_font_size, intro_font_size))  # Clamp to [50, intro_font_size]
# Add fade effect timing
fade_out_start = max(0.1, intro_duration - 0.3)
alpha_expr = f"if(gt(t,{fade_out_start}),(({intro_duration}-t)/0.3),1)"
# Create drawtext filter(s) for each line
line_height = adjusted_font_size + 20
total_height = line_height * len(lines)
for i, line in enumerate(lines):
            escaped_text = line.replace("\\", "\\\\").replace("'", "\\'").replace(":", "\\:")
# Calculate y position for this line (centered overall)
if len(lines) == 1:
y_expr = "(h-text_h)/2"
else:
# Center the block of lines, then position each line
y_offset = int((i - (len(lines) - 1) / 2) * line_height)
y_expr = f"(h-text_h)/2+{y_offset}"
drawtext_parts = [
f"text='{escaped_text}'",
f"fontsize={adjusted_font_size}",
"fontcolor=white",
"x=(w-text_w)/2", # Center horizontally
f"y={y_expr}",
f"enable='lt(t,{intro_duration})'",
"borderw=3",
"bordercolor=black",
"box=1",
"boxcolor=black@0.6",
"boxborderw=15",
f"alpha='{alpha_expr}'",
]
if font_file:
drawtext_parts.insert(1, f"fontfile='{font_file}'")
video_filters.append(f"drawtext={':'.join(drawtext_parts)}")
# Combine video filters
video_filter_str = ",".join(video_filters) if video_filters else None
# Construct FFmpeg command
if filter_parts or video_filter_str:
if filter_parts and video_filter_str:
full_filter = ";".join(filter_parts) + f";[0:v]{video_filter_str}[vout]"
cmd.extend(["-filter_complex", full_filter])
cmd.extend(["-map", "[vout]"])
            if audio_output:
                cmd.extend(["-map", audio_output])
elif video_filter_str:
cmd.extend(["-vf", video_filter_str])
if bgm_path and os.path.exists(bgm_path):
cmd.extend(["-filter_complex", f"[1:a]volume={bgm_volume}[audio]"])
cmd.extend(["-map", "0:v", "-map", "[audio]"])
elif not keep_original_audio:
cmd.extend(["-an"]) # No audio
elif filter_parts:
cmd.extend(["-filter_complex", ";".join(filter_parts)])
cmd.extend(["-map", "0:v"])
if audio_output and audio_output.startswith("["):
cmd.extend(["-map", audio_output])
else:
if not keep_original_audio:
cmd.extend(["-an"])
# Output settings
cmd.extend([
"-c:v", "libx264",
"-preset", "medium",
"-crf", "23",
"-c:a", "aac",
"-b:a", "128k",
"-shortest",
output_path
])
try:
# Run FFmpeg in thread pool to avoid blocking the event loop
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=600, # 10 minute timeout
)
if result.returncode != 0:
error_msg = result.stderr[-500:] if result.stderr else "Unknown error"
return False, f"FFmpeg error: {error_msg}"
if os.path.exists(output_path):
return True, "Video processing complete"
else:
return False, "Output file not created"
except subprocess.TimeoutExpired:
return False, "Processing timed out"
except Exception as e:
return False, f"Processing error: {str(e)}"
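The audio branch of the filter graph above reduces to three cases; a sketch of the same string assembly (`audio_filters` is an illustrative helper, and `[original]`/`[bgm]`/`[audio]` are just filter-graph labels):

```python
def audio_filters(has_bgm: bool, keep_original: bool, bgm_volume: float = 0.3):
    if keep_original and has_bgm:
        # Duck the BGM, then mix it with the original track.
        return [
            "[0:a]volume=1.0[original]",
            f"[1:a]volume={bgm_volume}[bgm]",
            "[original][bgm]amix=inputs=2:duration=shortest[audio]",
        ], "[audio]"
    if has_bgm:
        # BGM only, at the requested level.
        return [f"[1:a]volume={bgm_volume}[audio]"], "[audio]"
    if keep_original:
        # Pass the original stream through unfiltered.
        return [], "0:a"
    return [], None  # no audio at all

print(audio_filters(True, False))
```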
async def get_video_duration(video_path: str) -> Optional[float]:
"""Get video duration in seconds."""
cmd = [
"ffprobe",
"-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
video_path
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
if result.returncode == 0:
return float(result.stdout.strip())
except Exception:
pass
return None
async def get_video_info(video_path: str) -> Optional[dict]:
"""Get video information (duration, resolution, etc.)."""
import json as json_module
cmd = [
"ffprobe",
"-v", "error",
"-select_streams", "v:0",
"-show_entries", "stream=width,height,duration:format=duration",
"-of", "json",
video_path
]
try:
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=30,
)
if result.returncode == 0:
data = json_module.loads(result.stdout)
info = {}
# Get duration from format (more reliable)
if "format" in data and "duration" in data["format"]:
info["duration"] = float(data["format"]["duration"])
# Get resolution from stream
if "streams" in data and len(data["streams"]) > 0:
stream = data["streams"][0]
info["width"] = stream.get("width")
info["height"] = stream.get("height")
return info if info else None
except Exception:
pass
return None
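The ffprobe JSON handling in `get_video_info` can be exercised without running ffprobe at all. A sketch with a canned payload (the sample values are illustrative):

```python
import json


def parse_probe_output(raw: str) -> dict:
    """Extract duration/width/height from `ffprobe -of json` output,
    preferring the container-level (format) duration, as get_video_info does."""
    data = json.loads(raw)
    info = {}
    if "format" in data and "duration" in data["format"]:
        info["duration"] = float(data["format"]["duration"])
    if data.get("streams"):
        stream = data["streams"][0]
        info["width"] = stream.get("width")
        info["height"] = stream.get("height")
    return info


sample = '{"streams": [{"width": 1080, "height": 1920}], "format": {"duration": "42.5"}}'
print(parse_probe_output(sample))  # {'duration': 42.5, 'width': 1080, 'height': 1920}
```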
async def trim_video(
input_path: str,
output_path: str,
start_time: float,
end_time: float,
) -> Tuple[bool, str]:
"""
Trim video to specified time range.
Args:
input_path: Path to input video
output_path: Path for output video
start_time: Start time in seconds
end_time: End time in seconds
Returns:
Tuple of (success, message)
"""
if not os.path.exists(input_path):
return False, f"Input video not found: {input_path}"
# Validate time range
duration = await get_video_duration(input_path)
if duration is None:
return False, "Could not get video duration"
if start_time < 0:
start_time = 0
if end_time > duration:
end_time = duration
if start_time >= end_time:
return False, f"Invalid time range: start ({start_time}) >= end ({end_time})"
os.makedirs(os.path.dirname(output_path), exist_ok=True)
trim_duration = end_time - start_time
# Log trim parameters for debugging
print(f"[Trim] Input: {input_path}")
print(f"[Trim] Original duration: {duration:.3f}s")
print(f"[Trim] Requested: start={start_time:.3f}s, end={end_time:.3f}s")
print(f"[Trim] Output duration should be: {trim_duration:.3f}s")
# Use -ss BEFORE -i for input seeking (faster and more reliable for end trimming)
# Combined with -t for accurate duration control
# -accurate_seek ensures frame-accurate seeking
cmd = [
"ffmpeg", "-y",
"-accurate_seek", # Enable accurate seeking
"-ss", str(start_time), # Input seeking (before -i)
"-i", input_path,
"-t", str(trim_duration), # Duration of output
"-c:v", "libx264", # Re-encode video for accurate cut
"-preset", "fast", # Fast encoding preset
"-crf", "18", # High quality (lower = better)
"-c:a", "aac", # Re-encode audio
"-b:a", "128k", # Audio bitrate
"-avoid_negative_ts", "make_zero", # Fix timestamp issues
output_path
]
print(f"[Trim] Command: {' '.join(cmd)}")
try:
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=120,
)
if result.returncode != 0:
error_msg = result.stderr[-300:] if result.stderr else "Unknown error"
print(f"[Trim] FFmpeg error: {error_msg}")
return False, f"Trim failed: {error_msg}"
if os.path.exists(output_path):
new_duration = await get_video_duration(output_path)
if new_duration is None:
return True, "Video trimmed successfully"
print(f"[Trim] Success! New duration: {new_duration:.3f}s (expected: {trim_duration:.3f}s)")
print(f"[Trim] Difference from expected: {abs(new_duration - trim_duration):.3f}s")
return True, f"Video trimmed successfully ({new_duration:.1f}s)"
else:
print("[Trim] Error: Output file not created")
return False, "Output file not created"
except subprocess.TimeoutExpired:
print("[Trim] Error: Timeout")
return False, "Trim operation timed out"
except Exception as e:
print(f"[Trim] Error: {str(e)}")
return False, f"Trim error: {str(e)}"
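The time-range validation at the top of `trim_video` (clamp to `[0, duration]`, reject empty ranges) reduces to a pure function that is easy to unit-test. A sketch, where the `clamp_trim_range` name is an assumption for illustration:

```python
from typing import Optional, Tuple


def clamp_trim_range(start: float, end: float,
                     duration: float) -> Optional[Tuple[float, float]]:
    """Clamp a requested trim range into [0, duration].
    Returns None when the clamped range is empty or inverted."""
    start = max(start, 0.0)
    end = min(end, duration)
    if start >= end:
        return None
    return start, end
```

Keeping this logic separate from the FFmpeg invocation makes the failure mode ("Invalid time range") trivially testable.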
async def extract_frame(
video_path: str,
output_path: str,
timestamp: float,
) -> Tuple[bool, str]:
"""
Extract a single frame from video at specified timestamp.
Args:
video_path: Path to input video
output_path: Path for output image (jpg/png)
timestamp: Time in seconds
Returns:
Tuple of (success, message)
"""
if not os.path.exists(video_path):
return False, f"Video not found: {video_path}"
os.makedirs(os.path.dirname(output_path), exist_ok=True)
cmd = [
"ffmpeg", "-y",
"-ss", str(timestamp),
"-i", video_path,
"-frames:v", "1",
"-q:v", "2",
output_path
]
try:
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=30,
)
if result.returncode == 0 and os.path.exists(output_path):
return True, "Frame extracted"
return False, result.stderr[-200:] if result.stderr else "Unknown error"
except Exception as e:
return False, str(e)
async def get_audio_duration(audio_path: str) -> Optional[float]:
"""Get audio duration in seconds."""
return await get_video_duration(audio_path) # Same command works
async def extract_audio(video_path: str, output_path: str) -> Tuple[bool, str]:
"""Extract audio from video."""
cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-vn",
"-acodec", "pcm_s16le",
"-ar", "16000",
"-ac", "1",
output_path
]
try:
result = await asyncio.to_thread(
subprocess.run, cmd, capture_output=True, text=True, timeout=120
)
if result.returncode == 0:
return True, "Audio extracted"
return False, result.stderr
except Exception as e:
return False, str(e)
async def extract_audio_with_noise_reduction(
video_path: str,
output_path: str,
noise_reduction_level: str = "medium"
) -> Tuple[bool, str]:
"""
Extract audio from video with noise reduction for better STT accuracy.
Args:
video_path: Path to input video
output_path: Path for output audio (WAV format recommended)
noise_reduction_level: "light", "medium", or "heavy"
Returns:
Tuple of (success, message)
"""
if not os.path.exists(video_path):
return False, f"Video file not found: {video_path}"
# Build audio filter chain based on noise reduction level
filters = []
# 1. High-pass filter: Remove low frequency rumble (< 80Hz)
filters.append("highpass=f=80")
# 2. Low-pass filter: Remove high frequency hiss (> 8000Hz for speech)
filters.append("lowpass=f=8000")
if noise_reduction_level == "light":
# Light: Just basic frequency filtering
pass
elif noise_reduction_level == "medium":
# Medium: Add FFT-based denoiser
# afftdn: nr=noise reduction amount (0-100), nf=noise floor
filters.append("afftdn=nf=-25:nr=10:nt=w")
elif noise_reduction_level == "heavy":
# Heavy: More aggressive noise reduction
filters.append("afftdn=nf=-20:nr=20:nt=w")
# Add dynamic range compression to normalize volume
filters.append("acompressor=threshold=-20dB:ratio=4:attack=5:release=50")
# 3. Normalize audio levels
filters.append("loudnorm=I=-16:TP=-1.5:LRA=11")
filter_chain = ",".join(filters)
cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-vn", # No video
"-af", filter_chain,
"-acodec", "pcm_s16le", # PCM format for Whisper
"-ar", "16000", # 16kHz sample rate (Whisper optimal)
"-ac", "1", # Mono
output_path
]
try:
# Run FFmpeg in thread pool to avoid blocking the event loop
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=120,
)
if result.returncode != 0:
error_msg = result.stderr[-300:] if result.stderr else "Unknown error"
return False, f"Audio extraction failed: {error_msg}"
if os.path.exists(output_path):
return True, f"Audio extracted with {noise_reduction_level} noise reduction"
else:
return False, "Output file not created"
except subprocess.TimeoutExpired:
return False, "Audio extraction timed out"
except Exception as e:
return False, f"Audio extraction error: {str(e)}"
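The filter chain assembled above varies only in the denoise stage: band-pass for the speech range, an optional `afftdn` denoiser, unconditional compression, then loudness normalization. A sketch of the same assembly as a standalone function (it mirrors the code above rather than defining an independent API):

```python
def build_denoise_chain(level: str = "medium") -> str:
    """Build the -af filter chain: high/low-pass band-pass for speech,
    optional FFT denoiser by level, dynamic range compression,
    then loudnorm targeting Whisper-friendly levels."""
    filters = ["highpass=f=80", "lowpass=f=8000"]
    if level == "medium":
        filters.append("afftdn=nf=-25:nr=10:nt=w")
    elif level == "heavy":
        filters.append("afftdn=nf=-20:nr=20:nt=w")
    # Compression and normalization are applied at every level
    filters.append("acompressor=threshold=-20dB:ratio=4:attack=5:release=50")
    filters.append("loudnorm=I=-16:TP=-1.5:LRA=11")
    return ",".join(filters)
```

Separating chain construction from execution lets the per-level output be asserted without an FFmpeg binary present.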
async def analyze_audio_noise_level(audio_path: str) -> Optional[dict]:
"""
Analyze audio to detect noise level.
Returns dict with mean_volume, max_volume, noise_floor estimates.
"""
cmd = [
"ffmpeg",
"-i", audio_path,
"-af", "volumedetect",
"-f", "null",
"-"
]
try:
result = await asyncio.to_thread(
subprocess.run, cmd, capture_output=True, text=True, timeout=60
)
stderr = result.stderr
# Parse volume detection output
info = {}
for line in stderr.split('\n'):
if 'mean_volume' in line:
info['mean_volume'] = float(line.split(':')[1].strip().replace(' dB', ''))
elif 'max_volume' in line:
info['max_volume'] = float(line.split(':')[1].strip().replace(' dB', ''))
return info if info else None
except Exception:
return None
async def has_audio_stream(video_path: str) -> bool:
"""
Check if video file has an audio stream.
Returns:
True if video has audio, False otherwise
"""
cmd = [
"ffprobe",
"-v", "error",
"-select_streams", "a", # Select only audio streams
"-show_entries", "stream=codec_type",
"-of", "csv=p=0",
video_path
]
try:
result = await asyncio.to_thread(
subprocess.run, cmd, capture_output=True, text=True, timeout=30
)
# If there's audio, ffprobe will output "audio"
return "audio" in result.stdout.lower()
except Exception:
return False
async def get_audio_volume_info(video_path: str) -> Optional[dict]:
"""
Get audio volume information to detect silent audio.
Returns:
dict with mean_volume, or None if no audio or error
"""
# First check if audio stream exists
if not await has_audio_stream(video_path):
return None
cmd = [
"ffmpeg",
"-i", video_path,
"-af", "volumedetect",
"-f", "null",
"-"
]
try:
result = await asyncio.to_thread(
subprocess.run, cmd, capture_output=True, text=True, timeout=60
)
stderr = result.stderr
info = {}
for line in stderr.split('\n'):
if 'mean_volume' in line:
info['mean_volume'] = float(line.split(':')[1].strip().replace(' dB', ''))
elif 'max_volume' in line:
info['max_volume'] = float(line.split(':')[1].strip().replace(' dB', ''))
return info if info else None
except Exception:
return None
def is_audio_silent(volume_info: Optional[dict], threshold_db: float = -50.0) -> bool:
"""
Check if audio is effectively silent (below threshold).
Args:
volume_info: dict from get_audio_volume_info
threshold_db: Volume below this is considered silent (default -50dB)
Returns:
True if silent or no audio, False otherwise
"""
if not volume_info:
return True
mean_volume = volume_info.get('mean_volume', -100)
return mean_volume < threshold_db
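The `volumedetect` stderr parsing and the `-50 dB` silence threshold above combine into logic that can be verified against canned FFmpeg log output (the sample log lines below are illustrative, not captured from a real run):

```python
from typing import Optional


def parse_volumedetect(stderr: str) -> Optional[dict]:
    """Pull mean/max volume (dB) out of ffmpeg volumedetect log output,
    matching the line parsing used by get_audio_volume_info."""
    info = {}
    for line in stderr.split("\n"):
        if "mean_volume" in line:
            info["mean_volume"] = float(line.split(":")[1].strip().replace(" dB", ""))
        elif "max_volume" in line:
            info["max_volume"] = float(line.split(":")[1].strip().replace(" dB", ""))
    return info or None


sample = (
    "[Parsed_volumedetect_0 @ 0x0] mean_volume: -63.2 dB\n"
    "[Parsed_volumedetect_0 @ 0x0] max_volume: -41.0 dB\n"
)
```

Fed to `is_audio_silent`, this sample (mean −63.2 dB, below the −50 dB threshold) would be classified as silent.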

Binary file not shown.


14
backend/requirements.txt Normal file
View File

@@ -0,0 +1,14 @@
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
python-multipart>=0.0.6
yt-dlp>=2024.10.7
openai-whisper>=20231117
openai>=1.12.0
pydantic>=2.5.3
pydantic-settings>=2.1.0
aiofiles>=23.2.1
python-jose[cryptography]>=3.3.0
celery>=5.3.6
redis>=5.0.1
httpx>=0.26.0
curl_cffi>=0.13.0,<0.14.0 # For yt-dlp browser impersonation (yt-dlp only supports <0.14)

51
docker-compose.yml Normal file
View File

@@ -0,0 +1,51 @@
version: '3.8'
services:
backend:
build:
context: ./backend
dockerfile: Dockerfile
container_name: shorts-maker-backend
restart: unless-stopped
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- WHISPER_MODEL=${WHISPER_MODEL:-medium}
- REDIS_URL=redis://redis:6379/0
volumes:
- ./data/downloads:/app/data/downloads
- ./data/processed:/app/data/processed
- ./data/bgm:/app/data/bgm
- ./data/jobs.json:/app/data/jobs.json
depends_on:
- redis
networks:
- shorts-network
frontend:
build:
context: ./frontend
dockerfile: Dockerfile
container_name: shorts-maker-frontend
restart: unless-stopped
ports:
- "${PORT:-3000}:80"
depends_on:
- backend
networks:
- shorts-network
redis:
image: redis:7-alpine
container_name: shorts-maker-redis
restart: unless-stopped
volumes:
- redis-data:/data
networks:
- shorts-network
volumes:
redis-data:
networks:
shorts-network:
driver: bridge

29
frontend/Dockerfile Normal file
View File

@@ -0,0 +1,29 @@
# Build stage
FROM node:20-alpine AS build
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci
# Copy source code
COPY . .
# Build the app
RUN npm run build
# Production stage
FROM nginx:alpine
# Copy built files
COPY --from=build /app/dist /usr/share/nginx/html
# Copy nginx config
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

15
frontend/index.html Normal file
View File

@@ -0,0 +1,15 @@
<!DOCTYPE html>
<html lang="ko">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Shorts Maker - 쇼츠 한글 자막 변환기</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+KR:wght@400;500;600;700&display=swap" rel="stylesheet">
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.jsx"></script>
</body>
</html>

43
frontend/nginx.conf Normal file
View File

@@ -0,0 +1,43 @@
server {
listen 80;
server_name localhost;
root /usr/share/nginx/html;
index index.html;
# Gzip compression
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript;
# API proxy
location /api/ {
proxy_pass http://backend:8000/api/;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cache_bypass $http_upgrade;
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
}
# Static files proxy
location /static/ {
proxy_pass http://backend:8000/static/;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
# SPA routing
location / {
try_files $uri $uri/ /index.html;
}
# Cache static assets
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2)$ {
expires 1y;
add_header Cache-Control "public, immutable";
}
}

2912
frontend/package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

27
frontend/package.json Normal file
View File

@@ -0,0 +1,27 @@
{
"name": "shorts-maker-frontend",
"private": true,
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"react": "^18.2.0",
"react-dom": "^18.2.0",
"axios": "^1.6.5",
"react-router-dom": "^6.21.2",
"lucide-react": "^0.312.0"
},
"devDependencies": {
"@types/react": "^18.2.48",
"@types/react-dom": "^18.2.18",
"@vitejs/plugin-react": "^4.2.1",
"autoprefixer": "^10.4.17",
"postcss": "^8.4.33",
"tailwindcss": "^3.4.1",
"vite": "^5.0.12"
}
}

View File

@@ -0,0 +1,6 @@
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}

74
frontend/src/App.jsx Normal file
View File

@@ -0,0 +1,74 @@
import React, { useState, useEffect } from 'react';
import { BrowserRouter as Router, Routes, Route, Link, useLocation } from 'react-router-dom';
import { Video, List, Music, Settings } from 'lucide-react';
import HomePage from './pages/HomePage';
import JobsPage from './pages/JobsPage';
import BGMPage from './pages/BGMPage';
function NavLink({ to, icon: Icon, children }) {
const location = useLocation();
const isActive = location.pathname === to;
return (
<Link
to={to}
className={`flex items-center gap-2 px-4 py-2 rounded-lg transition-colors ${
isActive
? 'bg-red-600 text-white'
: 'text-gray-400 hover:text-white hover:bg-gray-800'
}`}
>
<Icon size={20} />
<span>{children}</span>
</Link>
);
}
function Layout({ children }) {
return (
<div className="min-h-screen flex flex-col">
{/* Header */}
<header className="bg-gray-900 border-b border-gray-800 px-6 py-4">
<div className="max-w-7xl mx-auto flex items-center justify-between">
<Link to="/" className="flex items-center gap-3">
<Video className="text-red-500" size={32} />
<h1 className="text-xl font-bold">Shorts Maker</h1>
</Link>
<nav className="flex items-center gap-2">
<NavLink to="/" icon={Video}>새 작업</NavLink>
<NavLink to="/jobs" icon={List}>작업 목록</NavLink>
<NavLink to="/bgm" icon={Music}>BGM 관리</NavLink>
</nav>
</div>
</header>
{/* Main Content */}
<main className="flex-1 px-6 py-8">
<div className="max-w-7xl mx-auto">
{children}
</div>
</main>
{/* Footer */}
<footer className="bg-gray-900 border-t border-gray-800 px-6 py-4 text-center text-gray-500 text-sm">
Shorts Maker - 중국 쇼츠 영상 한글 자막 변환기
</footer>
</div>
);
}
function App() {
return (
<Router>
<Layout>
<Routes>
<Route path="/" element={<HomePage />} />
<Route path="/jobs" element={<JobsPage />} />
<Route path="/bgm" element={<BGMPage />} />
</Routes>
</Layout>
</Router>
);
}
export default App;

159
frontend/src/api/client.js Normal file
View File

@@ -0,0 +1,159 @@
import axios from 'axios';
const API_BASE = '/api';
const client = axios.create({
baseURL: API_BASE,
headers: {
'Content-Type': 'application/json',
},
});
// Download API
export const downloadApi = {
start: (url) => client.post('/download/', { url }),
getPlatforms: () => client.get('/download/platforms'),
};
// Process API
export const processApi = {
// Legacy: Full auto processing (not used in manual workflow)
start: (jobId, options = {}) =>
client.post('/process/', {
job_id: jobId,
bgm_id: options.bgmId || null,
bgm_volume: options.bgmVolume || 0.3,
subtitle_style: options.subtitleStyle || null,
keep_original_audio: options.keepOriginalAudio || false,
translation_mode: options.translationMode || null,
use_vocal_separation: options.useVocalSeparation || false,
}),
// === Step-by-step Processing API ===
// Skip trimming and proceed to transcription
skipTrim: (jobId) => client.post(`/process/${jobId}/skip-trim`),
// Start transcription step (audio extraction + STT + translation)
startTranscription: (jobId, options = {}) =>
client.post(`/process/${jobId}/start-transcription`, {
translation_mode: options.translationMode || 'rewrite',
use_vocal_separation: options.useVocalSeparation || false,
}),
// Render final video with subtitles and BGM
render: (jobId, options = {}) =>
client.post(`/process/${jobId}/render`, {
bgm_id: options.bgmId || null,
bgm_volume: options.bgmVolume || 0.3,
subtitle_style: options.subtitleStyle || null,
keep_original_audio: options.keepOriginalAudio || false,
// Intro text overlay (YouTube Shorts thumbnail)
intro_text: options.introText || null,
intro_duration: options.introDuration || 0.7,
intro_font_size: options.introFontSize || 100,
}),
// === Other APIs ===
// Re-run GPT translation
retranslate: (jobId) => client.post(`/process/${jobId}/retranslate`),
transcribe: (jobId) => client.post(`/process/${jobId}/transcribe`),
updateTranscript: (jobId, segments) =>
client.put(`/process/${jobId}/transcript`, segments),
// Continue processing for jobs with no audio
continue: (jobId, options = {}) =>
client.post(`/process/${jobId}/continue`, {
job_id: jobId,
bgm_id: options.bgmId || null,
bgm_volume: options.bgmVolume || 0.3,
subtitle_style: options.subtitleStyle || null,
keep_original_audio: options.keepOriginalAudio || false,
}),
// Add manual subtitles
addManualSubtitle: (jobId, segments) =>
client.post(`/process/${jobId}/manual-subtitle`, segments),
// Get video info for trimming
getVideoInfo: (jobId) => client.get(`/process/${jobId}/video-info`),
// Get frame at specific timestamp (for precise trimming preview)
getFrameUrl: (jobId, timestamp) => `${API_BASE}/process/${jobId}/frame?timestamp=${timestamp}`,
// Trim video (reprocess default: false for manual workflow)
trim: (jobId, startTime, endTime, reprocess = false) =>
client.post(`/process/${jobId}/trim`, {
start_time: startTime,
end_time: endTime,
reprocess,
}),
};
// Jobs API
export const jobsApi = {
list: (limit = 50) => client.get('/jobs/', { params: { limit } }),
get: (jobId) => client.get(`/jobs/${jobId}`),
delete: (jobId) => client.delete(`/jobs/${jobId}`),
downloadOutput: (jobId) => `${API_BASE}/jobs/${jobId}/download`,
downloadOriginal: (jobId) => `${API_BASE}/jobs/${jobId}/original`,
downloadSubtitle: (jobId, format = 'ass') =>
`${API_BASE}/jobs/${jobId}/subtitle?format=${format}`,
downloadThumbnail: (jobId) => `${API_BASE}/jobs/${jobId}/thumbnail`,
// Re-edit completed job (reset to awaiting_review)
reEdit: (jobId) => client.post(`/jobs/${jobId}/re-edit`),
};
// Thumbnail API
export const thumbnailApi = {
// Get suggested timestamps for frame selection
getTimestamps: (jobId, count = 5) =>
client.get(`/process/${jobId}/thumbnail-timestamps`, { params: { count } }),
// Generate catchphrase using GPT
generateCatchphrase: (jobId, style = 'homeshopping') =>
client.post(`/process/${jobId}/generate-catchphrase`, null, { params: { style } }),
// Generate thumbnail with text overlay
generate: (jobId, options = {}) =>
client.post(`/process/${jobId}/thumbnail`, null, {
params: {
timestamp: options.timestamp || 2.0,
style: options.style || 'homeshopping',
custom_text: options.customText || null,
font_size: options.fontSize || 80,
position: options.position || 'center',
},
}),
};
// Fonts API
export const fontsApi = {
list: () => client.get('/fonts/'),
recommend: (contentType) => client.get(`/fonts/recommend/${contentType}`),
categories: () => client.get('/fonts/categories'),
};
// BGM API
export const bgmApi = {
list: () => client.get('/bgm/'),
get: (bgmId) => client.get(`/bgm/${bgmId}`),
upload: (file, name) => {
const formData = new FormData();
formData.append('file', file);
if (name) formData.append('name', name);
return client.post('/bgm/upload', formData, {
headers: { 'Content-Type': 'multipart/form-data' },
});
},
delete: (bgmId) => client.delete(`/bgm/${bgmId}`),
// Auto-download BGM from Freesound
autoDownload: (keywords, maxDuration = 60) =>
client.post('/bgm/auto-download', {
keywords,
max_duration: maxDuration,
commercial_only: true,
}),
// Download default BGM tracks (force re-download if needed)
initializeDefaults: (force = false) =>
client.post(`/bgm/defaults/initialize?force=${force}`),
// Get default BGM list with status
getDefaultStatus: () => client.get('/bgm/defaults/status'),
};
export default client;

View File

@@ -0,0 +1,219 @@
import React, { useState, useEffect } from 'react';
import { Plus, Trash2, Save, Clock, PenLine, Volume2, XCircle } from 'lucide-react';
import { processApi } from '../api/client';
export default function ManualSubtitleInput({ job, onProcessWithSubtitle, onProcessWithoutSubtitle, onCancel }) {
const [segments, setSegments] = useState([]);
const [isSaving, setIsSaving] = useState(false);
const [videoDuration, setVideoDuration] = useState(60); // Default 60 seconds
// Initialize with one empty segment
useEffect(() => {
if (segments.length === 0) {
addSegment();
}
}, []);
const addSegment = () => {
const lastEnd = segments.length > 0 ? segments[segments.length - 1].end : 0;
setSegments([
...segments,
{
id: Date.now(),
start: lastEnd,
end: Math.min(lastEnd + 3, videoDuration),
text: '',
translated: '',
},
]);
};
const removeSegment = (id) => {
setSegments(segments.filter((seg) => seg.id !== id));
};
const updateSegment = (id, field, value) => {
setSegments(
segments.map((seg) =>
seg.id === id ? { ...seg, [field]: value } : seg
)
);
};
const formatTimeInput = (seconds) => {
const mins = Math.floor(seconds / 60);
const secs = (seconds % 60).toFixed(1);
return `${mins}:${secs.padStart(4, '0')}`;
};
const parseTimeInput = (timeStr) => {
const parts = timeStr.split(':');
if (parts.length === 2) {
const mins = parseInt(parts[0]) || 0;
const secs = parseFloat(parts[1]) || 0;
return mins * 60 + secs;
}
return parseFloat(timeStr) || 0;
};
const handleSaveSubtitles = async () => {
// Validate segments
const validSegments = segments.filter(
(seg) => seg.text.trim() || seg.translated.trim()
);
if (validSegments.length === 0) {
alert('최소 한 개의 자막을 입력해주세요.');
return;
}
setIsSaving(true);
try {
// Format segments for API
const formattedSegments = validSegments.map((seg) => ({
start: seg.start,
end: seg.end,
text: seg.text.trim() || seg.translated.trim(),
translated: seg.translated.trim() || seg.text.trim(),
}));
await processApi.addManualSubtitle(job.job_id, formattedSegments);
onProcessWithSubtitle?.(formattedSegments);
} catch (err) {
console.error('Failed to save subtitles:', err);
alert('자막 저장 실패: ' + (err.response?.data?.detail || err.message));
} finally {
setIsSaving(false);
}
};
const handleProcessWithoutSubtitle = () => {
onProcessWithoutSubtitle?.();
};
return (
<div className="bg-gray-900 rounded-xl border border-gray-800 p-6">
<div className="flex items-center justify-between mb-4">
<div className="flex items-center gap-2">
<PenLine size={20} className="text-amber-400" />
<h3 className="font-medium text-white">자막 직접 입력</h3>
</div>
<span className="text-xs text-gray-500">
{segments.length} 세그먼트
</span>
</div>
<p className="text-sm text-gray-400 mb-4">
영상에 표시할 자막을 직접 입력하세요. 시작/종료 시간과 텍스트를 설정합니다.
</p>
{/* Segments List */}
<div className="space-y-3 max-h-96 overflow-y-auto mb-4">
{segments.map((segment, index) => (
<div
key={segment.id}
className="p-4 bg-gray-800/50 rounded-lg border border-gray-700"
>
<div className="flex items-center justify-between mb-3">
<span className="text-xs text-gray-500 font-mono">#{index + 1}</span>
<button
onClick={() => removeSegment(segment.id)}
className="text-gray-500 hover:text-red-400 transition-colors"
disabled={segments.length === 1}
>
<Trash2 size={16} />
</button>
</div>
{/* Time Inputs */}
<div className="flex gap-4 mb-3">
<div className="flex-1">
<label className="text-xs text-gray-500 mb-1 block">
<Clock size={12} className="inline mr-1" />
시작 (초)
</label>
<input
type="number"
step="0.1"
min="0"
value={segment.start}
onChange={(e) =>
updateSegment(segment.id, 'start', parseFloat(e.target.value) || 0)
}
className="w-full bg-gray-900 border border-gray-700 rounded px-3 py-2 text-sm focus:outline-none focus:border-amber-500"
/>
</div>
<div className="flex-1">
<label className="text-xs text-gray-500 mb-1 block">
<Clock size={12} className="inline mr-1" />
종료 (초)
</label>
<input
type="number"
step="0.1"
min="0"
value={segment.end}
onChange={(e) =>
updateSegment(segment.id, 'end', parseFloat(e.target.value) || 0)
}
className="w-full bg-gray-900 border border-gray-700 rounded px-3 py-2 text-sm focus:outline-none focus:border-amber-500"
/>
</div>
</div>
{/* Text Input */}
<div>
<label className="text-xs text-gray-500 mb-1 block">자막 텍스트</label>
<textarea
value={segment.translated || segment.text}
onChange={(e) => updateSegment(segment.id, 'translated', e.target.value)}
placeholder="표시할 자막을 입력하세요..."
className="w-full bg-gray-900 border border-gray-700 rounded px-3 py-2 text-sm focus:outline-none focus:border-amber-500 resize-none"
rows={2}
/>
</div>
</div>
))}
</div>
{/* Add Segment Button */}
<button
onClick={addSegment}
className="w-full py-2 border border-dashed border-gray-700 rounded-lg text-gray-400 hover:text-white hover:border-gray-600 transition-colors flex items-center justify-center gap-2 mb-4"
>
<Plus size={16} />
자막 추가
</button>
{/* Action Buttons */}
<div className="space-y-3">
<div className="flex gap-3">
<button
onClick={handleSaveSubtitles}
disabled={isSaving || segments.every((s) => !s.text && !s.translated)}
className="flex-1 btn-primary py-3 flex items-center justify-center gap-2"
>
<PenLine size={18} />
{isSaving ? '저장 중...' : '자막으로 처리'}
</button>
<button
onClick={handleProcessWithoutSubtitle}
className="flex-1 btn-secondary py-3 flex items-center justify-center gap-2"
>
<Volume2 size={18} />
BGM만 추가
</button>
</div>
{onCancel && (
<button
onClick={onCancel}
className="w-full py-2 border border-gray-700 rounded-lg text-gray-400 hover:text-red-400 hover:border-red-800 transition-colors flex items-center justify-center gap-2"
>
<XCircle size={16} />
작업 취소
</button>
)}
</div>
</div>
);
}

View File

@@ -0,0 +1,474 @@
import { useState } from 'react';
import {
Download,
Languages,
Film,
CheckCircle,
XCircle,
Loader,
ChevronDown,
ChevronUp,
Volume2,
VolumeX,
PenLine,
Music2,
Mic,
FileAudio,
Wand2,
RefreshCw,
} from 'lucide-react';
// 6-step pipeline
const PIPELINE_STEPS = [
{
key: 'downloading',
label: '영상 다운로드',
icon: Download,
description: 'yt-dlp로 원본 영상 다운로드',
statusKey: 'downloading',
},
{
key: 'extracting_audio',
label: '오디오 추출',
icon: FileAudio,
description: 'FFmpeg로 오디오 스트림 추출',
statusKey: 'extracting_audio',
},
{
key: 'noise_reduction',
label: '노이즈 제거',
icon: Wand2,
description: '배경 잡음 제거 및 음성 강화',
statusKey: 'noise_reduction',
},
{
key: 'transcribing',
label: 'Whisper STT',
icon: Mic,
description: 'OpenAI Whisper로 음성 인식',
statusKey: 'transcribing',
},
{
key: 'translating',
label: '요약 + 한국어 변환',
icon: Languages,
description: 'GPT로 요약 및 자연스러운 한국어 작성 (중국어만)',
statusKey: 'translating',
},
{
key: 'processing',
label: '영상 합성 + BGM',
icon: Film,
description: 'FFmpeg로 자막 합성 및 BGM 추가',
statusKey: 'processing',
},
];
// Status progression order
const STATUS_ORDER = [
'pending',
'downloading',
'ready_for_trim', // Download complete, ready for trimming
'trimming', // Video trimming step
'extracting_audio',
'noise_reduction',
'transcribing',
'awaiting_subtitle',
'translating',
'awaiting_review', // Script ready, waiting for user review
'processing',
'completed',
];
function getStepStatus(step, job) {
const currentIndex = STATUS_ORDER.indexOf(job.status);
const stepIndex = STATUS_ORDER.indexOf(step.statusKey);
if (job.status === 'failed') {
// Check if this step was the one that failed
if (stepIndex <= currentIndex) {
return stepIndex === currentIndex ? 'failed' : 'completed';
}
return 'pending';
}
if (job.status === 'completed') {
// Check if translation was skipped (non-Chinese)
if (step.key === 'translating' && job.detected_language && !['zh', 'zh-cn', 'zh-tw', 'chinese', 'mandarin'].includes(job.detected_language.toLowerCase())) {
return 'skipped';
}
return 'completed';
}
if (job.status === 'awaiting_subtitle') {
// Special handling for no-audio case
if (step.key === 'transcribing') {
return 'warning'; // Show as warning (no audio)
}
if (stepIndex < STATUS_ORDER.indexOf('transcribing')) {
return 'completed';
}
return 'pending';
}
if (stepIndex < currentIndex) {
return 'completed';
}
if (stepIndex === currentIndex) {
return 'active';
}
return 'pending';
}
function StepIcon({ step, status }) {
const Icon = step.icon;
const baseClasses = 'w-10 h-10 rounded-xl flex items-center justify-center transition-all duration-300';
switch (status) {
case 'completed':
return (
<div className={`${baseClasses} bg-green-500 shadow-lg shadow-green-500/30`}>
<CheckCircle size={20} className="text-white" />
</div>
);
case 'active':
return (
<div className={`${baseClasses} bg-red-500 shadow-lg shadow-red-500/30 animate-pulse`}>
<Loader size={20} className="text-white animate-spin" />
</div>
);
case 'failed':
return (
<div className={`${baseClasses} bg-red-600 shadow-lg shadow-red-600/30`}>
<XCircle size={20} className="text-white" />
</div>
);
case 'warning':
return (
<div className={`${baseClasses} bg-amber-500 shadow-lg shadow-amber-500/30`}>
<VolumeX size={20} className="text-white" />
</div>
);
case 'skipped':
return (
<div className={`${baseClasses} bg-gray-600 shadow-lg shadow-gray-600/20`}>
<CheckCircle size={20} className="text-gray-300" />
</div>
);
default:
return (
<div className={`${baseClasses} bg-gray-700`}>
<Icon size={20} className="text-gray-400" />
</div>
);
}
}
function StepCard({ step, status, index, isLast, job, onRetranslate }) {
const [isExpanded, setIsExpanded] = useState(false);
const [isRetranslating, setIsRetranslating] = useState(false);
const getStatusBadge = () => {
switch (status) {
case 'completed':
return <span className="text-xs px-2 py-0.5 bg-green-500/20 text-green-400 rounded-full">완료</span>;
case 'active':
return <span className="text-xs px-2 py-0.5 bg-red-500/20 text-red-400 rounded-full animate-pulse">진행 중</span>;
case 'failed':
return <span className="text-xs px-2 py-0.5 bg-red-600/20 text-red-400 rounded-full">실패</span>;
case 'warning':
return (
<span className="text-xs px-2 py-0.5 bg-amber-500/20 text-amber-400 rounded-full">
{job.audio_status === 'singing_only' ? '노래만 감지' : '오디오 없음'}
</span>
);
case 'skipped':
return <span className="text-xs px-2 py-0.5 bg-gray-600/20 text-gray-400 rounded-full">스킵</span>;
default:
return <span className="text-xs px-2 py-0.5 bg-gray-700/50 text-gray-500 rounded-full">대기</span>;
}
};
const getTextColor = () => {
switch (status) {
case 'completed':
return 'text-green-400';
case 'active':
return 'text-white';
case 'failed':
return 'text-red-400';
case 'warning':
return 'text-amber-400';
case 'skipped':
return 'text-gray-400';
default:
return 'text-gray-500';
}
};
return (
<div className="flex items-start gap-4">
{/* Step Icon & Connector */}
<div className="flex flex-col items-center">
<StepIcon step={step} status={status} />
{!isLast && (
<div className={`w-0.5 h-16 mt-2 transition-colors duration-500 ${
status === 'completed' ? 'bg-green-500' : 'bg-gray-700'
}`} />
)}
</div>
{/* Step Content */}
<div className="flex-1 pb-6">
<div className="flex items-center gap-3 mb-1">
<span className="text-xs font-mono text-gray-500">{String(index + 1).padStart(2, '0')}</span>
<h4 className={`font-semibold ${getTextColor()}`}>{step.label}</h4>
{getStatusBadge()}
</div>
<p className={`text-sm ${status === 'pending' ? 'text-gray-600' : 'text-gray-400'}`}>
{status === 'warning'
? job.audio_status === 'singing_only'
? '노래/배경음악만 감지됨 - 음성 없음'
: '오디오가 없거나 무음입니다'
: status === 'skipped'
? `중국어가 아닌 콘텐츠 (${job.detected_language || '알 수 없음'}) - GPT 스킵`
: step.description}
</p>
{/* Expandable Details */}
{status === 'completed' && step.key === 'transcribing' && job.transcript?.length > 0 && (
<div className="mt-3">
<button
onClick={() => setIsExpanded(!isExpanded)}
className="flex items-center gap-2 text-xs text-gray-400 hover:text-white transition-colors"
>
{isExpanded ? <ChevronUp size={14} /> : <ChevronDown size={14} />}
인식 결과 ({job.transcript.length} 세그먼트)
</button>
{isExpanded && (
<div className="mt-2 p-3 bg-gray-800/50 rounded-lg border border-gray-700 max-h-64 overflow-y-auto">
{job.transcript.map((seg, i) => (
<div key={i} className="text-xs text-gray-300 mb-1">
<span className="text-gray-500 font-mono mr-2">
{formatTime(seg.start)}
</span>
{seg.text}
</div>
))}
</div>
)}
</div>
)}
{status === 'completed' && step.key === 'translating' && job.transcript?.some((s) => s.translated) && (
<div className="mt-3">
<div className="flex items-center gap-3">
<button
onClick={() => setIsExpanded(!isExpanded)}
className="flex items-center gap-2 text-xs text-gray-400 hover:text-white transition-colors"
>
{isExpanded ? <ChevronUp size={14} /> : <ChevronDown size={14} />}
번역 결과 보기
</button>
{onRetranslate && (
<button
onClick={async () => {
setIsRetranslating(true);
try {
await onRetranslate();
} finally {
setIsRetranslating(false);
}
}}
disabled={isRetranslating}
className="flex items-center gap-1.5 text-xs px-2 py-1 bg-blue-600/20 text-blue-400 hover:bg-blue-600/30 rounded-md transition-colors disabled:opacity-50"
>
<RefreshCw size={12} className={isRetranslating ? 'animate-spin' : ''} />
{isRetranslating ? '재번역 중...' : 'GPT 재번역'}
</button>
)}
</div>
{isExpanded && (
<div className="mt-2 p-3 bg-gray-800/50 rounded-lg border border-gray-700 max-h-64 overflow-y-auto space-y-2">
{job.transcript
.filter((s) => s.translated)
.map((seg, i) => (
<div key={i} className="text-xs">
<div className="text-gray-500 line-through">{seg.text}</div>
<div className="text-white mt-0.5">{seg.translated}</div>
</div>
))}
</div>
)}
</div>
)}
</div>
</div>
);
}
function formatTime(seconds) {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, '0')}`;
}
export default function PipelineView({ job, onRetranslate }) {
const getOverallProgress = () => {
if (job.status === 'completed') return 100;
if (job.status === 'failed') return job.progress || 0;
const currentIndex = STATUS_ORDER.indexOf(job.status);
const totalSteps = PIPELINE_STEPS.length;
const stepProgress = (currentIndex / (totalSteps + 2)) * 100;
return Math.min(Math.round(stepProgress), 99);
};
const getStatusText = () => {
switch (job.status) {
case 'completed':
return '처리 완료';
case 'failed':
return '처리 실패';
case 'awaiting_subtitle':
return '자막 입력 대기';
case 'awaiting_review':
return '스크립트 확인 대기';
case 'ready_for_trim':
return '다운로드 완료';
case 'pending':
return '시작 대기';
case 'trimming':
return '영상 자르기 중...';
default:
return '처리 중...';
}
};
const getStatusColor = () => {
switch (job.status) {
case 'completed':
return 'text-green-400';
case 'failed':
return 'text-red-400';
case 'awaiting_subtitle':
return 'text-amber-400';
case 'awaiting_review':
return 'text-blue-400';
case 'ready_for_trim':
return 'text-blue-400';
case 'trimming':
return 'text-amber-400';
default:
return 'text-white';
}
};
return (
<div className="space-y-6">
{/* Progress Header */}
<div className="flex items-center justify-between">
<div className="flex items-center gap-4">
<div className={`text-3xl font-bold ${getStatusColor()}`}>
{job.progress || getOverallProgress()}%
</div>
<div>
<div className={`font-medium ${getStatusColor()}`}>{getStatusText()}</div>
<div className="text-sm text-gray-500">
{job.original_url?.substring(0, 40)}...
</div>
</div>
</div>
</div>
{/* Progress Bar */}
<div className="relative h-2 bg-gray-800 rounded-full overflow-hidden">
<div
className={`absolute inset-y-0 left-0 transition-all duration-700 ease-out ${
job.status === 'failed'
? 'bg-red-500'
: job.status === 'completed'
? 'bg-green-500'
: job.status === 'awaiting_subtitle'
? 'bg-amber-500'
: job.status === 'awaiting_review' || job.status === 'ready_for_trim'
? 'bg-blue-500'
: 'bg-gradient-to-r from-red-500 to-red-400'
}`}
style={{ width: `${job.progress || getOverallProgress()}%` }}
/>
{!['completed', 'failed', 'awaiting_subtitle', 'awaiting_review', 'ready_for_trim', 'pending'].includes(job.status) && (
<div
className="absolute inset-y-0 bg-white/20 animate-pulse"
style={{
left: `${Math.max(0, (job.progress || getOverallProgress()) - 5)}%`,
width: '5%',
}}
/>
)}
</div>
{/* Pipeline Steps */}
<div className="mt-8">
{PIPELINE_STEPS.map((step, index) => (
<StepCard
key={step.key}
step={step}
status={getStepStatus(step, job)}
index={index}
isLast={index === PIPELINE_STEPS.length - 1}
job={job}
onRetranslate={step.key === 'translating' ? onRetranslate : undefined}
/>
))}
</div>
{/* No Audio Alert */}
{job.status === 'awaiting_subtitle' && (
<div className="p-4 bg-amber-900/20 border border-amber-700/50 rounded-xl">
<div className="flex items-center gap-2 text-amber-400 font-medium mb-3">
{job.audio_status === 'singing_only' ? <Music2 size={18} /> : <VolumeX size={18} />}
{job.audio_status === 'no_audio_stream'
? '오디오 스트림 없음'
: job.audio_status === 'singing_only'
? '노래/배경음악만 감지됨'
: '무음 오디오'}
</div>
<p className="text-amber-200/80 text-sm mb-4">
{job.audio_status === 'no_audio_stream'
? '이 영상에는 오디오 트랙이 없습니다.'
: job.audio_status === 'singing_only'
? '음성 없이 배경음악/노래만 감지되었습니다.'
: '오디오가 거의 무음입니다.'}
</p>
<div className="grid grid-cols-2 gap-3">
<div className="p-3 bg-gray-800/50 rounded-lg border border-gray-700/50">
<div className="flex items-center gap-2 text-white font-medium text-sm mb-1">
<PenLine size={14} />
자막 직접 입력
</div>
<p className="text-xs text-gray-400">원하는 자막을 수동으로 입력</p>
</div>
<div className="p-3 bg-gray-800/50 rounded-lg border border-gray-700/50">
<div className="flex items-center gap-2 text-white font-medium text-sm mb-1">
<Volume2 size={14} />
BGM만 추가
</div>
<p className="text-xs text-gray-400">자막 없이 음악만 추가</p>
</div>
</div>
</div>
)}
{/* Error Display */}
{job.error && (
<div className="p-4 bg-red-900/20 border border-red-800/50 rounded-xl">
<div className="flex items-center gap-2 text-red-400 font-medium mb-2">
<XCircle size={18} />
오류 발생
</div>
<p className="text-red-300/80 text-sm font-mono">{job.error}</p>
</div>
)}
</div>
);
}


@@ -0,0 +1,93 @@
import React from 'react';
import { CheckCircle, XCircle, Loader } from 'lucide-react';
const STEPS = [
{ key: 'downloading', label: '영상 다운로드' },
{ key: 'transcribing', label: '음성 인식 (Whisper)' },
{ key: 'translating', label: '한글 번역 (GPT)' },
{ key: 'processing', label: '영상 처리 (FFmpeg)' },
];
const STATUS_ORDER = ['pending', 'downloading', 'transcribing', 'translating', 'processing', 'completed'];
export default function ProcessingStatus({ status, progress, error }) {
const currentIndex = STATUS_ORDER.indexOf(status);
return (
<div className="space-y-4">
{/* Progress bar */}
<div className="relative">
<div className="h-2 bg-gray-800 rounded-full overflow-hidden">
<div
className={`h-full transition-all duration-500 ${
status === 'completed' ? 'bg-green-500' : 'bg-red-500'
}`}
style={{ width: `${progress}%` }}
/>
</div>
<div className="mt-2 flex justify-between text-sm text-gray-500">
<span>{progress}%</span>
<span>
{status === 'completed' && '완료!'}
{status === 'failed' && '실패'}
{!['completed', 'failed'].includes(status) && '처리 중...'}
</span>
</div>
</div>
{/* Steps */}
<div className="space-y-3">
{STEPS.map((step, index) => {
const stepIndex = STATUS_ORDER.indexOf(step.key);
const isComplete = currentIndex > stepIndex || status === 'completed';
const isCurrent = status === step.key;
const isFailed = status === 'failed' && isCurrent;
return (
<div
key={step.key}
className={`flex items-center gap-3 p-3 rounded-lg transition-colors ${
isCurrent ? 'bg-gray-800' : ''
}`}
>
<div className="flex-shrink-0">
{isComplete ? (
<CheckCircle className="text-green-500" size={20} />
) : isFailed ? (
<XCircle className="text-red-500" size={20} />
) : isCurrent ? (
<Loader className="text-red-500 animate-spin" size={20} />
) : (
<div className="w-5 h-5 rounded-full border-2 border-gray-600" />
)}
</div>
<span
className={`${
isComplete
? 'text-green-500'
: isFailed
? 'text-red-500'
: isCurrent
? 'text-white'
: 'text-gray-500'
}`}
>
{step.label}
</span>
{isCurrent && !isFailed && (
<span className="text-sm text-gray-500 ml-auto">진행 중...</span>
)}
</div>
);
})}
</div>
{/* Error message */}
{error && (
<div className="p-4 bg-red-900/30 border border-red-800 rounded-lg text-red-400">
<strong>오류:</strong> {error}
</div>
)}
</div>
);
}


@@ -0,0 +1,130 @@
import React, { useState } from 'react';
import { Edit2, Save, X } from 'lucide-react';
import { processApi } from '../api/client';
export default function SubtitleEditor({ segments, jobId, onUpdate, compact = false }) {
const [editingIndex, setEditingIndex] = useState(null);
const [editText, setEditText] = useState('');
const [isSaving, setIsSaving] = useState(false);
const formatTime = (seconds) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
const cs = Math.floor((seconds % 1) * 100); // centiseconds
return `${mins}:${String(secs).padStart(2, '0')}.${String(cs).padStart(2, '0')}`;
};
const handleEdit = (index) => {
setEditingIndex(index);
setEditText(segments[index].translated || segments[index].text);
};
const handleSave = async () => {
if (editingIndex === null) return;
const updatedSegments = [...segments];
updatedSegments[editingIndex] = {
...updatedSegments[editingIndex],
translated: editText,
};
setIsSaving(true);
try {
await processApi.updateTranscript(jobId, updatedSegments);
onUpdate(updatedSegments);
setEditingIndex(null);
} catch (err) {
console.error('Failed to save:', err);
alert('저장 실패');
} finally {
setIsSaving(false);
}
};
const handleCancel = () => {
setEditingIndex(null);
setEditText('');
};
// Show all segments (scrollable)
const displayedSegments = segments;
return (
<div>
{!compact && (
<>
<h3 className="font-medium mb-4">자막 편집</h3>
<p className="text-sm text-gray-500 mb-4">
번역된 자막을 수정할 수 있습니다. 클릭하여 편집하세요.
</p>
</>
)}
<div className={`space-y-2 overflow-y-auto ${compact ? 'max-h-64' : 'max-h-[500px]'}`}>
{displayedSegments.map((segment, index) => (
<div
key={index}
className={`p-3 rounded-lg border transition-colors ${
editingIndex === index
? 'border-red-500 bg-gray-800'
: 'border-gray-800 hover:border-gray-700'
}`}
>
<div className="flex items-start gap-3">
<span className="text-xs text-gray-500 font-mono whitespace-nowrap pt-1">
{formatTime(segment.start)}
</span>
{editingIndex === index ? (
<div className="flex-1 space-y-2">
<textarea
value={editText}
onChange={(e) => setEditText(e.target.value)}
className="w-full bg-gray-900 border border-gray-700 rounded px-3 py-2 text-sm focus:outline-none focus:border-red-500"
rows={2}
autoFocus
/>
<div className="flex gap-2">
<button
onClick={handleSave}
disabled={isSaving}
className="btn-primary py-1 px-3 text-sm flex items-center gap-1"
>
<Save size={14} />
{isSaving ? '저장 중...' : '저장'}
</button>
<button
onClick={handleCancel}
className="btn-secondary py-1 px-3 text-sm flex items-center gap-1"
>
<X size={14} />
취소
</button>
</div>
</div>
) : (
<div
className="flex-1 cursor-pointer group"
onClick={() => handleEdit(index)}
>
<div className="text-sm">
{segment.translated || segment.text}
</div>
{segment.translated && segment.text !== segment.translated && (
<div className="text-xs text-gray-500 mt-1">
원문: {segment.text}
</div>
)}
<Edit2
size={14}
className="text-gray-600 group-hover:text-gray-400 mt-1"
/>
</div>
)}
</div>
</div>
))}
</div>
</div>
);
}


@@ -0,0 +1,308 @@
import React, { useState, useEffect, useRef } from 'react';
import { Image, Download, Wand2, RefreshCw } from 'lucide-react';
import { thumbnailApi, jobsApi } from '../api/client';
const THUMBNAIL_STYLES = [
{ id: 'homeshopping', name: '홈쇼핑', desc: '강렬한 어필' },
{ id: 'viral', name: '바이럴', desc: '호기심 유발' },
{ id: 'informative', name: '정보성', desc: '명확한 전달' },
];
export default function ThumbnailGenerator({ jobId, onClose }) {
const videoRef = useRef(null);
const [duration, setDuration] = useState(0);
const [currentTime, setCurrentTime] = useState(2.0);
const [thumbnailStyle, setThumbnailStyle] = useState('homeshopping');
const [thumbnailText, setThumbnailText] = useState('');
const [fontSize, setFontSize] = useState(80);
const [position, setPosition] = useState('center');
const [isGenerating, setIsGenerating] = useState(false);
const [isGeneratingText, setIsGeneratingText] = useState(false);
const [generatedUrl, setGeneratedUrl] = useState(null);
const [error, setError] = useState(null);
// Load video metadata
useEffect(() => {
const video = videoRef.current;
if (!video) return;
const handleLoadedMetadata = () => {
setDuration(video.duration);
// Start at 10% of video
const startTime = video.duration * 0.1;
setCurrentTime(startTime);
video.currentTime = startTime;
};
video.addEventListener('loadedmetadata', handleLoadedMetadata);
return () => video.removeEventListener('loadedmetadata', handleLoadedMetadata);
}, []);
// Seek video when currentTime changes
useEffect(() => {
if (videoRef.current && videoRef.current.readyState >= 2) {
videoRef.current.currentTime = currentTime;
}
}, [currentTime]);
const handleSliderChange = (e) => {
const time = parseFloat(e.target.value);
setCurrentTime(time);
};
const handleGenerateText = async () => {
setIsGeneratingText(true);
setError(null);
try {
const res = await thumbnailApi.generateCatchphrase(jobId, thumbnailStyle);
setThumbnailText(res.data.catchphrase || '');
} catch (err) {
setError(err.response?.data?.detail || '문구 생성 실패');
} finally {
setIsGeneratingText(false);
}
};
const handleGenerate = async () => {
setIsGenerating(true);
setError(null);
try {
const res = await thumbnailApi.generate(jobId, {
timestamp: currentTime,
style: thumbnailStyle,
customText: thumbnailText || null,
fontSize: fontSize,
position: position,
});
setGeneratedUrl(`${jobsApi.downloadThumbnail(jobId)}?t=${Date.now()}`);
if (res.data.text && !thumbnailText) {
setThumbnailText(res.data.text);
}
} catch (err) {
setError(err.response?.data?.detail || '썸네일 생성 실패');
} finally {
setIsGenerating(false);
}
};
const formatTime = (seconds) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, '0')}`;
};
// Calculate text position style
const getTextPositionStyle = () => {
const base = {
position: 'absolute',
left: '50%',
transform: 'translateX(-50%)',
textAlign: 'center',
fontWeight: 'bold',
color: 'white',
textShadow: `
-2px -2px 0 #000,
2px -2px 0 #000,
-2px 2px 0 #000,
2px 2px 0 #000,
-3px 0 0 #000,
3px 0 0 #000,
0 -3px 0 #000,
0 3px 0 #000
`,
fontSize: `${fontSize * 0.5}px`, // Scale down for preview
maxWidth: '90%',
wordBreak: 'keep-all',
};
switch (position) {
case 'top':
return { ...base, top: '15%' };
case 'bottom':
return { ...base, bottom: '20%' };
default: // center
return { ...base, top: '50%', transform: 'translate(-50%, -50%)' };
}
};
return (
<div className="space-y-4">
<div className="flex items-center justify-between mb-2">
<h3 className="font-semibold text-pink-400 flex items-center gap-2">
<Image size={20} />
썸네일 생성
</h3>
{onClose && (
<button
onClick={onClose}
className="text-gray-400 hover:text-white text-sm"
>
닫기
</button>
)}
</div>
{/* Video Preview with Text Overlay */}
<div className="relative bg-black rounded-lg overflow-hidden">
<video
ref={videoRef}
src={jobsApi.downloadOriginal(jobId)}
className="w-full h-auto"
muted
playsInline
preload="metadata"
/>
{/* Text Overlay Preview */}
{thumbnailText && (
<div style={getTextPositionStyle()}>
{thumbnailText}
</div>
)}
{/* Time indicator */}
<div className="absolute bottom-2 right-2 bg-black/70 text-white text-xs px-2 py-1 rounded">
{formatTime(currentTime)}
</div>
</div>
{/* Timeline Slider */}
<div className="space-y-1">
<label className="text-sm text-gray-400">프레임 선택</label>
<input
type="range"
min={0}
max={duration || 30}
step={0.1}
value={currentTime}
onChange={handleSliderChange}
className="w-full h-2 bg-gray-700 rounded-lg appearance-none cursor-pointer accent-pink-500"
/>
<div className="flex justify-between text-xs text-gray-500">
<span>0:00</span>
<span className="text-pink-400 font-medium">{formatTime(currentTime)}</span>
<span>{formatTime(duration)}</span>
</div>
</div>
{/* Style Selection */}
<div>
<label className="block text-sm text-gray-400 mb-2">문구 스타일</label>
<div className="grid grid-cols-3 gap-2">
{THUMBNAIL_STYLES.map((style) => (
<button
key={style.id}
onClick={() => setThumbnailStyle(style.id)}
className={`p-2 rounded-lg border text-sm transition-colors ${
thumbnailStyle === style.id
? 'border-pink-500 bg-pink-500/10'
: 'border-gray-700 hover:border-gray-600'
}`}
>
<div>{style.name}</div>
<div className="text-xs text-gray-500">{style.desc}</div>
</button>
))}
</div>
</div>
{/* Text Input */}
<div>
<label className="block text-sm text-gray-400 mb-2">썸네일 문구</label>
<div className="flex gap-2">
<input
type="text"
value={thumbnailText}
onChange={(e) => setThumbnailText(e.target.value)}
placeholder="문구를 입력하거나 자동 생성..."
maxLength={20}
className="flex-1 p-2 bg-gray-800 border border-gray-700 rounded-lg text-white text-sm focus:outline-none focus:border-pink-500"
/>
<button
onClick={handleGenerateText}
disabled={isGeneratingText}
className="px-3 py-2 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 rounded-lg transition-colors disabled:opacity-50"
title="GPT로 문구 생성"
>
{isGeneratingText ? (
<RefreshCw size={18} className="animate-spin" />
) : (
<Wand2 size={18} />
)}
</button>
</div>
<div className="text-xs text-gray-500 mt-1">
{thumbnailText.length}/20 (최대 15자 권장)
</div>
</div>
{/* Font Size & Position */}
<div className="grid grid-cols-2 gap-4">
<div>
<label className="block text-sm text-gray-400 mb-2">
글자 크기: {fontSize}px
</label>
<input
type="range"
min="40"
max="120"
value={fontSize}
onChange={(e) => setFontSize(parseInt(e.target.value))}
className="w-full accent-pink-500"
/>
</div>
<div>
<label className="block text-sm text-gray-400 mb-2">위치</label>
<div className="flex gap-1">
{['top', 'center', 'bottom'].map((pos) => (
<button
key={pos}
onClick={() => setPosition(pos)}
className={`flex-1 p-2 rounded border text-xs transition-colors ${
position === pos
? 'border-pink-500 bg-pink-500/10'
: 'border-gray-700 hover:border-gray-600'
}`}
>
{pos === 'top' ? '상단' : pos === 'center' ? '중앙' : '하단'}
</button>
))}
</div>
</div>
</div>
{/* Error Message */}
{error && (
<div className="p-2 bg-red-900/30 border border-red-800 rounded-lg text-red-400 text-sm">
{error}
</div>
)}
{/* Generate Button */}
<button
onClick={handleGenerate}
disabled={isGenerating}
className="w-full py-3 bg-gradient-to-r from-pink-600 to-purple-600 hover:from-pink-500 hover:to-purple-500 rounded-lg font-medium flex items-center justify-center gap-2 disabled:opacity-50 transition-all"
>
<Image size={18} className={isGenerating ? 'animate-pulse' : ''} />
{isGenerating ? '생성 중...' : '썸네일 생성'}
</button>
{/* Generated Thumbnail Preview & Download */}
{generatedUrl && (
<div className="space-y-3 pt-4 border-t border-gray-700">
<h4 className="text-sm text-gray-400">생성된 썸네일</h4>
<div className="border border-gray-700 rounded-lg overflow-hidden">
<img
src={generatedUrl}
alt="Generated Thumbnail"
className="w-full h-auto"
/>
</div>
<a
href={generatedUrl}
download={`thumbnail_${jobId}.jpg`}
className="w-full py-2 border border-pink-700/50 bg-pink-900/20 hover:bg-pink-900/30 rounded-lg transition-colors flex items-center justify-center gap-2 text-pink-300"
>
<Download size={18} />
썸네일 다운로드
</a>
</div>
)}
</div>
);
}


@@ -0,0 +1,69 @@
import React from 'react';
import { Download, FileText, Film } from 'lucide-react';
import { jobsApi } from '../api/client';
export default function VideoPreview({ videoUrl, jobId }) {
return (
<div className="space-y-4">
<h3 className="font-medium flex items-center gap-2">
<Film size={20} className="text-green-500" />
처리 완료!
</h3>
{/* Video Player */}
<div className="relative bg-black rounded-lg overflow-hidden aspect-[9/16] max-w-sm mx-auto">
<video
src={videoUrl}
controls
className="w-full h-full object-contain"
poster=""
>
브라우저가 비디오를 지원하지 않습니다.
</video>
</div>
{/* Download Buttons */}
<div className="flex flex-wrap gap-3 justify-center">
<a
href={jobsApi.downloadOutput(jobId)}
download
className="btn-primary flex items-center gap-2"
>
<Download size={18} />
영상 다운로드
</a>
<a
href={jobsApi.downloadOriginal(jobId)}
download
className="btn-secondary flex items-center gap-2"
>
<Film size={18} />
원본 영상
</a>
<a
href={jobsApi.downloadSubtitle(jobId, 'srt')}
download
className="btn-secondary flex items-center gap-2"
>
<FileText size={18} />
자막 (SRT)
</a>
<a
href={jobsApi.downloadSubtitle(jobId, 'ass')}
download
className="btn-secondary flex items-center gap-2"
>
<FileText size={18} />
자막 (ASS)
</a>
</div>
<p className="text-sm text-gray-500 text-center">
영상을 YouTube Shorts에 업로드하세요!
</p>
</div>
);
}


@@ -0,0 +1,498 @@
import { useState, useEffect, useRef, useCallback } from 'react';
import { Scissors, Play, Pause, RotateCcw, Check, AlertCircle, Image, ChevronsLeft, ChevronLeft, ChevronRight, ChevronsRight } from 'lucide-react';
import { processApi, jobsApi } from '../api/client';
function formatTime(seconds) {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
const tenths = Math.floor((seconds % 1) * 10);
return `${mins}:${secs.toString().padStart(2, '0')}.${tenths}`;
}
function formatTimeDetailed(seconds) {
return seconds.toFixed(2);
}
export default function VideoTrimmer({ jobId, onTrimComplete, onCancel }) {
const videoRef = useRef(null);
const [videoInfo, setVideoInfo] = useState(null);
const [loading, setLoading] = useState(true);
const [error, setError] = useState(null);
const [trimming, setTrimming] = useState(false);
// Trim range state
const [startTime, setStartTime] = useState(0);
const [endTime, setEndTime] = useState(0);
const [currentTime, setCurrentTime] = useState(0);
const [isPlaying, setIsPlaying] = useState(false);
const [cacheBuster, setCacheBuster] = useState(Date.now());
// Frame preview state
const [startFrameUrl, setStartFrameUrl] = useState(null);
const [endFrameUrl, setEndFrameUrl] = useState(null);
const [loadingStartFrame, setLoadingStartFrame] = useState(false);
const [loadingEndFrame, setLoadingEndFrame] = useState(false);
// Load video info
useEffect(() => {
const loadVideoInfo = async () => {
try {
setLoading(true);
const response = await processApi.getVideoInfo(jobId);
setVideoInfo(response.data);
setEndTime(response.data.duration);
} catch (err) {
setError(err.response?.data?.detail || 'Failed to load video info');
} finally {
setLoading(false);
}
};
loadVideoInfo();
}, [jobId]);
// Update current time while playing
useEffect(() => {
const video = videoRef.current;
if (!video) return;
const handleTimeUpdate = () => {
setCurrentTime(video.currentTime);
// Stop at end time
if (video.currentTime >= endTime) {
video.pause();
video.currentTime = startTime;
setIsPlaying(false);
}
};
video.addEventListener('timeupdate', handleTimeUpdate);
return () => video.removeEventListener('timeupdate', handleTimeUpdate);
}, [startTime, endTime]);
// Load frame previews (each call requests a server-extracted frame for the given timestamp)
const loadStartFrame = useCallback(async (time) => {
setLoadingStartFrame(true);
try {
const url = processApi.getFrameUrl(jobId, time);
setStartFrameUrl(`${url}&t=${Date.now()}`);
} finally {
setLoadingStartFrame(false);
}
}, [jobId]);
const loadEndFrame = useCallback(async (time) => {
setLoadingEndFrame(true);
try {
const url = processApi.getFrameUrl(jobId, time);
setEndFrameUrl(`${url}&t=${Date.now()}`);
} finally {
setLoadingEndFrame(false);
}
}, [jobId]);
// Load initial frames when video info is loaded
useEffect(() => {
if (videoInfo) {
loadStartFrame(0);
loadEndFrame(videoInfo.duration - 0.1);
}
}, [videoInfo, loadStartFrame, loadEndFrame]);
const handlePlayPause = () => {
const video = videoRef.current;
if (!video) return;
if (isPlaying) {
video.pause();
} else {
if (video.currentTime < startTime || video.currentTime >= endTime) {
video.currentTime = startTime;
}
video.play();
}
setIsPlaying(!isPlaying);
};
const handleSeek = (time) => {
const video = videoRef.current;
if (!video) return;
video.currentTime = time;
setCurrentTime(time);
};
const handleStartChange = (value) => {
const newStart = Math.max(0, Math.min(parseFloat(value) || 0, endTime - 0.1));
setStartTime(newStart);
handleSeek(newStart);
loadStartFrame(newStart);
};
const handleEndChange = (value) => {
const maxDuration = videoInfo?.duration || 0;
const newEnd = Math.min(maxDuration, Math.max(parseFloat(value) || 0, startTime + 0.1));
setEndTime(newEnd);
handleSeek(Math.max(0, newEnd - 0.5));
loadEndFrame(newEnd - 0.1);
};
const adjustStart = (delta) => {
const newStart = Math.max(0, Math.min(startTime + delta, endTime - 0.1));
setStartTime(newStart);
handleSeek(newStart);
loadStartFrame(newStart);
};
const adjustEnd = (delta) => {
const maxDuration = videoInfo?.duration || 0;
const newEnd = Math.min(maxDuration, Math.max(endTime + delta, startTime + 0.1));
setEndTime(newEnd);
handleSeek(Math.max(0, newEnd - 0.5));
loadEndFrame(newEnd - 0.1);
};
const handleTrim = async () => {
if (trimming) return;
try {
setTrimming(true);
// reprocess=false for manual workflow - user will manually proceed to next step
const response = await processApi.trim(jobId, startTime, endTime, false);
if (response.data.success) {
// Update cache buster to reload video with new trimmed version
setCacheBuster(Date.now());
// Update video info with new duration
if (response.data.new_duration) {
setVideoInfo(prev => ({ ...prev, duration: response.data.new_duration }));
setEndTime(response.data.new_duration);
setStartTime(0);
// Reload frame previews
loadStartFrame(0);
loadEndFrame(response.data.new_duration - 0.1);
}
onTrimComplete?.(response.data);
} else {
setError(response.data.message);
}
} catch (err) {
setError(err.response?.data?.detail || 'Trim failed');
} finally {
setTrimming(false);
}
};
const handleReset = () => {
if (videoInfo) {
setStartTime(0);
setEndTime(videoInfo.duration);
handleSeek(0);
loadStartFrame(0);
loadEndFrame(videoInfo.duration - 0.1);
}
};
if (loading) {
return (
<div className="flex items-center justify-center p-8">
<div className="animate-spin rounded-full h-8 w-8 border-b-2 border-red-500"></div>
<span className="ml-3 text-gray-400">Loading video...</span>
</div>
);
}
if (error) {
return (
<div className="p-4 bg-red-900/20 border border-red-800/50 rounded-xl">
<div className="flex items-center gap-2 text-red-400">
<AlertCircle size={18} />
<span>{error}</span>
</div>
<button
onClick={onCancel}
className="mt-3 px-4 py-2 bg-gray-700 text-white rounded-lg hover:bg-gray-600"
>
Close
</button>
</div>
);
}
const trimmedDuration = endTime - startTime;
// Add cache buster to ensure video is reloaded after trimming
const videoUrl = `${jobsApi.downloadOriginal(jobId)}?t=${cacheBuster}`;
return (
<div className="space-y-4">
{/* Header */}
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<Scissors className="text-red-400" size={20} />
<h3 className="font-semibold text-white">Video Trimmer</h3>
</div>
<div className="text-sm text-gray-400">
Original: {formatTime(videoInfo?.duration || 0)}
</div>
</div>
{/* Video Preview */}
<div className="relative aspect-[9/16] max-h-[400px] bg-black rounded-lg overflow-hidden mx-auto">
<video
key={cacheBuster}
ref={videoRef}
src={videoUrl}
className="w-full h-full object-contain"
onPlay={() => setIsPlaying(true)}
onPause={() => setIsPlaying(false)}
/>
{/* Play button overlay */}
<button
onClick={handlePlayPause}
className="absolute inset-0 flex items-center justify-center bg-black/30 opacity-0 hover:opacity-100 transition-opacity"
>
{isPlaying ? (
<Pause className="text-white" size={48} />
) : (
<Play className="text-white" size={48} />
)}
</button>
</div>
{/* Frame Preview Section */}
<div className="flex justify-between gap-4">
{/* Start Frame Preview */}
<div className="space-y-2">
<div className="text-sm text-gray-400 flex items-center gap-1">
<Image size={14} />
시작 프레임
</div>
<div className="relative w-[68px] h-[120px] bg-gray-800 rounded-lg overflow-hidden border-2 border-green-500/50">
{loadingStartFrame ? (
<div className="absolute inset-0 flex items-center justify-center">
<div className="animate-spin rounded-full h-6 w-6 border-b-2 border-green-500"></div>
</div>
) : startFrameUrl ? (
<img src={startFrameUrl} alt="Start frame" className="w-full h-full object-cover" />
) : null}
</div>
</div>
{/* End Frame Preview */}
<div className="space-y-2">
<div className="text-sm text-gray-400 flex items-center gap-1 justify-end">
<Image size={14} />
종료 프레임
</div>
<div className="relative w-[68px] h-[120px] bg-gray-800 rounded-lg overflow-hidden border-2 border-red-500/50">
{loadingEndFrame ? (
<div className="absolute inset-0 flex items-center justify-center">
<div className="animate-spin rounded-full h-6 w-6 border-b-2 border-red-500"></div>
</div>
) : endFrameUrl ? (
<img src={endFrameUrl} alt="End frame" className="w-full h-full object-cover" />
) : null}
</div>
</div>
</div>
{/* Timeline */}
<div className="space-y-3">
{/* Progress bar */}
<div className="relative h-8 bg-gray-800 rounded-lg overflow-hidden">
{/* Selected range */}
<div
className="absolute h-full bg-red-500/30"
style={{
left: `${(startTime / (videoInfo?.duration || 1)) * 100}%`,
width: `${((endTime - startTime) / (videoInfo?.duration || 1)) * 100}%`,
}}
/>
{/* Current position indicator */}
<div
className="absolute top-0 bottom-0 w-1 bg-white"
style={{
left: `${(currentTime / (videoInfo?.duration || 1)) * 100}%`,
}}
/>
{/* Click to seek */}
<div
className="absolute inset-0 cursor-pointer"
onClick={(e) => {
const rect = e.currentTarget.getBoundingClientRect();
const x = e.clientX - rect.left;
const percent = x / rect.width;
const time = percent * (videoInfo?.duration || 0);
handleSeek(Math.max(startTime, Math.min(endTime, time)));
}}
/>
</div>
{/* Start/End controls with precise inputs */}
<div className="grid grid-cols-2 gap-6">
{/* Start Time Control */}
<div className="space-y-2">
<label className="block text-sm text-gray-400">
시작: <span className="text-green-400 font-mono">{formatTimeDetailed(startTime)}</span>
</label>
<div className="flex items-center gap-1">
<button
onClick={() => adjustStart(-0.5)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="-0.5초"
>
<ChevronsLeft size={16} />
</button>
<button
onClick={() => adjustStart(-0.1)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="-0.1초"
>
<ChevronLeft size={16} />
</button>
<input
type="number"
min="0"
max={endTime - 0.1}
step="0.1"
value={startTime.toFixed(1)}
onChange={(e) => handleStartChange(e.target.value)}
className="w-16 px-2 py-1 bg-gray-800 border border-gray-600 rounded text-white text-center font-mono text-sm"
/>
<button
onClick={() => adjustStart(0.1)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="+0.1초"
>
<ChevronRight size={16} />
</button>
<button
onClick={() => adjustStart(0.5)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="+0.5초"
>
<ChevronsRight size={16} />
</button>
</div>
<input
type="range"
min="0"
max={videoInfo?.duration || 0}
step="0.1"
value={startTime}
onChange={(e) => handleStartChange(e.target.value)}
className="w-full accent-green-500"
/>
</div>
{/* End Time Control */}
<div className="space-y-2">
<label className="block text-sm text-gray-400">
종료: <span className="text-red-400 font-mono">{formatTimeDetailed(endTime)}</span>
</label>
<div className="flex items-center gap-1">
<button
onClick={() => adjustEnd(-0.5)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="-0.5초"
>
<ChevronsLeft size={16} />
</button>
<button
onClick={() => adjustEnd(-0.1)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="-0.1초"
>
<ChevronLeft size={16} />
</button>
<input
type="number"
min={startTime + 0.1}
max={videoInfo?.duration || 0}
step="0.1"
value={endTime.toFixed(1)}
onChange={(e) => handleEndChange(e.target.value)}
className="w-16 px-2 py-1 bg-gray-800 border border-gray-600 rounded text-white text-center font-mono text-sm"
/>
<button
onClick={() => adjustEnd(0.1)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="+0.1초"
>
<ChevronRight size={16} />
</button>
<button
onClick={() => adjustEnd(0.5)}
className="p-1.5 bg-gray-700 hover:bg-gray-600 rounded text-gray-300"
title="+0.5초"
>
<ChevronsRight size={16} />
</button>
</div>
<input
type="range"
min="0"
max={videoInfo?.duration || 0}
step="0.1"
value={endTime}
onChange={(e) => handleEndChange(e.target.value)}
className="w-full accent-red-500"
/>
</div>
</div>
{/* Duration info */}
<div className="flex items-center justify-between text-sm">
<span className="text-gray-400">
자를 영상 길이: <span className="text-white font-medium">{formatTime(trimmedDuration)}</span>
</span>
<span className="text-gray-400">
제거할 길이: <span className="text-red-400">{formatTime((videoInfo?.duration || 0) - trimmedDuration)}</span>
</span>
</div>
</div>
{/* Actions */}
<div className="flex items-center justify-between pt-2">
<button
onClick={handleReset}
className="flex items-center gap-2 px-4 py-2 text-gray-400 hover:text-white transition-colors"
>
<RotateCcw size={16} />
초기화
</button>
<div className="flex items-center gap-3">
<button
onClick={onCancel}
className="px-4 py-2 bg-gray-700 text-white rounded-lg hover:bg-gray-600 transition-colors"
>
취소
</button>
<button
onClick={handleTrim}
disabled={trimming || trimmedDuration < 1}
className="flex items-center gap-2 px-6 py-2 bg-red-600 text-white rounded-lg hover:bg-red-500 transition-colors disabled:opacity-50 disabled:cursor-not-allowed"
>
{trimming ? (
<>
<div className="animate-spin rounded-full h-4 w-4 border-b-2 border-white"></div>
자르는 중...
</>
) : (
<>
<Check size={16} />
영상 자르기
</>
)}
</button>
</div>
</div>
{/* Help text */}
<p className="text-xs text-gray-500 text-center">
« »는 0.5초 이동 · ‹ ›는 0.1초 이동 · 프레임 미리보기로 정확한 지점 확인
</p>
</div>
);
}
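The click-to-seek overlay above reduces to a few lines of arithmetic: click offset → percent → time, clamped into the selected trim range. Extracted as a pure function for clarity (a sketch, not code from this commit):

```javascript
// Pure version of the click-to-seek math in the timeline overlay.
// Converts a click's x offset on the track into a time in seconds,
// clamped so the playhead stays inside the trimmed [start, end] range.
function clickToSeekTime(offsetX, trackWidth, duration, startTime, endTime) {
  const percent = offsetX / trackWidth;   // 0..1 position along the bar
  const time = percent * duration;        // raw time in seconds
  return Math.max(startTime, Math.min(endTime, time));
}
```

For example, clicking the middle of a 100px track over a 60s video with a 10–40s selection seeks to 30s, while clicking the far left clamps to the 10s start.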

10
frontend/src/main.jsx Normal file
View File

@@ -0,0 +1,10 @@
import React from 'react'
import ReactDOM from 'react-dom/client'
import App from './App'
import './styles/index.css'
ReactDOM.createRoot(document.getElementById('root')).render(
<React.StrictMode>
<App />
</React.StrictMode>,
)

View File

@@ -0,0 +1,172 @@
import React, { useState, useEffect, useRef } from 'react';
import { Upload, Trash2, Play, Pause, Music } from 'lucide-react';
import { bgmApi } from '../api/client';
export default function BGMPage() {
const [bgmList, setBgmList] = useState([]);
const [isLoading, setIsLoading] = useState(true);
const [uploading, setUploading] = useState(false);
const [playingId, setPlayingId] = useState(null);
const audioRef = useRef(null);
const fileInputRef = useRef(null);
const fetchBgmList = async () => {
setIsLoading(true);
try {
const res = await bgmApi.list();
setBgmList(res.data);
} catch (err) {
console.error('Failed to fetch BGM list:', err);
} finally {
setIsLoading(false);
}
};
useEffect(() => {
fetchBgmList();
}, []);
const handleUpload = async (e) => {
const file = e.target.files?.[0];
if (!file) return;
setUploading(true);
try {
await bgmApi.upload(file);
await fetchBgmList();
} catch (err) {
console.error('Failed to upload BGM:', err);
alert('업로드 실패: ' + (err.response?.data?.detail || err.message));
} finally {
setUploading(false);
if (fileInputRef.current) {
fileInputRef.current.value = '';
}
}
};
const handleDelete = async (bgmId) => {
if (!confirm('이 BGM을 삭제하시겠습니까?')) return;
try {
await bgmApi.delete(bgmId);
setBgmList(bgmList.filter((b) => b.id !== bgmId));
if (playingId === bgmId) {
setPlayingId(null);
if (audioRef.current) {
audioRef.current.pause();
}
}
} catch (err) {
console.error('Failed to delete BGM:', err);
}
};
const handlePlay = (bgm) => {
if (playingId === bgm.id) {
// Stop playing
if (audioRef.current) {
audioRef.current.pause();
}
setPlayingId(null);
} else {
// Start playing
if (audioRef.current) {
audioRef.current.src = bgm.path;
audioRef.current.play();
}
setPlayingId(bgm.id);
}
};
const formatDuration = (seconds) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${String(secs).padStart(2, '0')}`;
};
return (
<div className="space-y-6">
<div className="flex items-center justify-between">
<h2 className="text-2xl font-bold">BGM 관리</h2>
<label className="btn-primary flex items-center gap-2 cursor-pointer">
<Upload size={18} />
{uploading ? '업로드 중...' : 'BGM 업로드'}
<input
ref={fileInputRef}
type="file"
accept=".mp3,.wav,.m4a,.ogg"
onChange={handleUpload}
className="hidden"
disabled={uploading}
/>
</label>
</div>
<audio
ref={audioRef}
onEnded={() => setPlayingId(null)}
className="hidden"
/>
{isLoading ? (
<div className="card text-center py-12 text-gray-500">
로딩 중...
</div>
) : bgmList.length === 0 ? (
<div className="card text-center py-12 text-gray-500">
<Music size={48} className="mx-auto mb-4 opacity-50" />
<p>등록된 BGM이 없습니다.</p>
<p className="text-sm mt-2">MP3, WAV 파일을 업로드하세요.</p>
</div>
) : (
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{bgmList.map((bgm) => (
<div
key={bgm.id}
className="card flex items-center gap-4 hover:border-gray-700 transition-colors"
>
<button
onClick={() => handlePlay(bgm)}
className={`w-12 h-12 rounded-full flex items-center justify-center transition-colors ${
playingId === bgm.id
? 'bg-red-600 text-white'
: 'bg-gray-800 text-gray-400 hover:text-white'
}`}
>
{playingId === bgm.id ? (
<Pause size={20} />
) : (
<Play size={20} className="ml-1" />
)}
</button>
<div className="flex-1 min-w-0">
<h3 className="font-medium truncate">{bgm.name}</h3>
<p className="text-sm text-gray-500">
{formatDuration(bgm.duration)}
</p>
</div>
<button
onClick={() => handleDelete(bgm.id)}
className="p-2 text-gray-500 hover:text-red-500 transition-colors"
title="삭제"
>
<Trash2 size={18} />
</button>
</div>
))}
</div>
)}
<div className="card bg-gray-800/50">
<h3 className="font-medium mb-2">지원 형식</h3>
<p className="text-sm text-gray-400">
MP3, WAV, M4A, OGG 형식의 오디오 파일을 업로드할 수 있습니다.
저작권에 유의하여 사용하세요.
</p>
</div>
</div>
);
}
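The upload input above restricts files via `accept=".mp3,.wav,.m4a,.ogg"`, but browsers treat `accept` as a hint, not an enforcement. A small guard in `handleUpload` could reject unsupported files before the request is made — a sketch with a hypothetical helper not present in this commit:

```javascript
// Mirrors the <input accept="..."> list so unsupported files are
// rejected client-side before bgmApi.upload is called.
const SUPPORTED_AUDIO = ['.mp3', '.wav', '.m4a', '.ogg'];

function isSupportedAudio(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot === -1) return false; // no extension at all
  return SUPPORTED_AUDIO.includes(filename.slice(dot).toLowerCase());
}
```

The lowercase comparison also accepts uppercase extensions like `SONG.MP3`, which the raw `accept` string matching in some browsers does not.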

File diff suppressed because it is too large

View File

@@ -0,0 +1,163 @@
import React, { useState, useEffect } from 'react';
import { Trash2, Download, ExternalLink, RefreshCw } from 'lucide-react';
import { jobsApi } from '../api/client';
const STATUS_COLORS = {
pending: 'bg-yellow-500',
downloading: 'bg-blue-500',
transcribing: 'bg-purple-500',
translating: 'bg-indigo-500',
processing: 'bg-orange-500',
completed: 'bg-green-500',
failed: 'bg-red-500',
};
const STATUS_LABELS = {
pending: '대기',
downloading: '다운로드 중',
transcribing: '음성 인식',
translating: '번역',
processing: '처리 중',
completed: '완료',
failed: '실패',
};
export default function JobsPage() {
const [jobs, setJobs] = useState([]);
const [isLoading, setIsLoading] = useState(true);
const fetchJobs = async () => {
setIsLoading(true);
try {
const res = await jobsApi.list();
setJobs(res.data);
} catch (err) {
console.error('Failed to fetch jobs:', err);
} finally {
setIsLoading(false);
}
};
useEffect(() => {
fetchJobs();
// Auto refresh every 5 seconds
const interval = setInterval(fetchJobs, 5000);
return () => clearInterval(interval);
}, []);
const handleDelete = async (jobId) => {
if (!confirm('이 작업을 삭제하시겠습니까?')) return;
try {
await jobsApi.delete(jobId);
setJobs(jobs.filter((j) => j.job_id !== jobId));
} catch (err) {
console.error('Failed to delete job:', err);
}
};
const formatDate = (dateStr) => {
const date = new Date(dateStr);
return date.toLocaleString('ko-KR', {
month: 'short',
day: 'numeric',
hour: '2-digit',
minute: '2-digit',
});
};
return (
<div className="space-y-6">
<div className="flex items-center justify-between">
<h2 className="text-2xl font-bold">작업 목록</h2>
<button
onClick={fetchJobs}
className="btn-secondary flex items-center gap-2"
disabled={isLoading}
>
<RefreshCw size={18} className={isLoading ? 'animate-spin' : ''} />
새로고침
</button>
</div>
{jobs.length === 0 ? (
<div className="card text-center py-12 text-gray-500">
<p>아직 작업이 없습니다.</p>
<p className="text-sm mt-2">새 영상 URL을 입력하여 시작하세요.</p>
</div>
) : (
<div className="space-y-4">
{jobs.map((job) => (
<div key={job.job_id} className="card">
<div className="flex items-start justify-between gap-4">
<div className="flex-1 min-w-0">
<div className="flex items-center gap-3 mb-2">
<span className="font-mono text-sm bg-gray-800 px-2 py-1 rounded">
{job.job_id}
</span>
<span
className={`px-2 py-1 rounded text-xs font-medium ${STATUS_COLORS[job.status]} bg-opacity-20`}
>
<span
className={`inline-block w-2 h-2 rounded-full mr-1 ${STATUS_COLORS[job.status]}`}
/>
{STATUS_LABELS[job.status]}
</span>
<span className="text-sm text-gray-500">
{formatDate(job.created_at)}
</span>
</div>
{job.original_url && (
<p className="text-sm text-gray-400 truncate mb-2">
{job.original_url}
</p>
)}
{job.error && (
<p className="text-sm text-red-400 mt-2">
오류: {job.error}
</p>
)}
{/* Progress bar */}
{!['completed', 'failed'].includes(job.status) && (
<div className="mt-3">
<div className="h-1 bg-gray-800 rounded-full overflow-hidden">
<div
className="h-full bg-red-500 transition-all duration-500"
style={{ width: `${job.progress}%` }}
/>
</div>
</div>
)}
</div>
<div className="flex items-center gap-2">
{job.status === 'completed' && job.output_path && (
<a
href={jobsApi.downloadOutput(job.job_id)}
className="btn-primary flex items-center gap-2"
download
>
<Download size={18} />
다운로드
</a>
)}
<button
onClick={() => handleDelete(job.job_id)}
className="p-2 text-gray-500 hover:text-red-500 transition-colors"
title="삭제"
>
<Trash2 size={18} />
</button>
</div>
</div>
</div>
))}
</div>
)}
</div>
);
}
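`STATUS_LABELS` above covers the seven known pipeline states; if the backend ever reports a status outside that map, the badge would render `undefined`. A defensive lookup that falls back to the raw status string (hypothetical, not in this commit) would be:

```javascript
// Subset of the label map, for the sketch only.
const STATUS_LABELS = { completed: '완료', failed: '실패' };

// Returns the localized label when known, otherwise the raw status,
// so an unexpected backend state still renders something readable.
function statusLabel(status) {
  return STATUS_LABELS[status] ?? status;
}
```

The same fallback pattern applies to `STATUS_COLORS`, where an unknown key currently yields a broken `className`.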

View File

@@ -0,0 +1,91 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
body {
font-family: 'Noto Sans KR', sans-serif;
background-color: #0f0f0f;
color: #ffffff;
}
@layer components {
.btn-primary {
@apply bg-red-600 hover:bg-red-700 text-white font-medium py-2 px-4 rounded-lg transition-colors duration-200;
}
.btn-secondary {
@apply bg-gray-700 hover:bg-gray-600 text-white font-medium py-2 px-4 rounded-lg transition-colors duration-200;
}
.input-field {
@apply w-full bg-gray-800 border border-gray-700 rounded-lg px-4 py-3 text-white placeholder-gray-500 focus:outline-none focus:border-red-500 transition-colors;
}
.card {
@apply bg-gray-900 rounded-xl p-6 border border-gray-800;
}
}
/* Custom scrollbar */
::-webkit-scrollbar {
width: 8px;
}
::-webkit-scrollbar-track {
background: #1f1f1f;
}
::-webkit-scrollbar-thumb {
background: #444;
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: #555;
}
/* Progress bar animation */
@keyframes progress {
0% {
width: 0%;
}
100% {
width: var(--progress);
}
}
.progress-bar {
animation: progress 0.5s ease-out forwards;
}
/* Fade in animation */
@keyframes fadeIn {
from {
opacity: 0;
transform: translateY(-8px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.animate-fadeIn {
animation: fadeIn 0.3s ease-out forwards;
}
/* Pipeline step animation */
@keyframes slideIn {
from {
opacity: 0;
transform: translateX(-20px);
}
to {
opacity: 1;
transform: translateX(0);
}
}
.animate-slideIn {
animation: slideIn 0.4s ease-out forwards;
}
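The `progress` keyframe above animates `width` from 0% up to `var(--progress)`, so the target width must be supplied as an inline CSS custom property on the element using `.progress-bar`. A hypothetical helper (not in this commit) building that style object for JSX:

```javascript
// Builds the inline style consumed by the `.progress-bar` animation.
// Clamps to 0–100 so a bad progress value can't overflow the track.
function progressBarStyle(percent) {
  const clamped = Math.max(0, Math.min(100, percent));
  return { '--progress': `${clamped}%` };
}

// Usage in JSX:
// <div className="progress-bar" style={progressBarStyle(job.progress)} />
```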

View File

@@ -0,0 +1,11 @@
/** @type {import('tailwindcss').Config} */
export default {
content: [
"./index.html",
"./src/**/*.{js,ts,jsx,tsx}",
],
theme: {
extend: {},
},
plugins: [],
}

19
frontend/vite.config.js Normal file
View File

@@ -0,0 +1,19 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
export default defineConfig({
plugins: [react()],
server: {
port: 3000,
proxy: {
'/api': {
target: 'http://localhost:8000',
changeOrigin: true,
},
'/static': {
target: 'http://localhost:8000',
changeOrigin: true,
},
},
},
})
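The proxy block above lets the frontend use relative URLs in development: any request whose path starts with `/api` or `/static` is forwarded from the Vite dev server (port 3000) to the FastAPI backend (port 8000). A toy model of that prefix matching — a simplification of Vite's actual behavior, for illustration only:

```javascript
// Simplified prefix-match model of the dev-server proxy config above.
const proxy = {
  '/api': 'http://localhost:8000',
  '/static': 'http://localhost:8000',
};

function resolveDevRequest(path) {
  // First proxy key whose prefix matches wins; otherwise Vite serves it.
  const prefix = Object.keys(proxy).find((p) => path.startsWith(p));
  return prefix ? proxy[prefix] + path : `http://localhost:3000${path}`;
}
```

So `fetch('/api/jobs')` in the React code hits the FastAPI server in dev with no CORS setup, while everything else is served by Vite.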