Deep research memo · May 2026

YouTube transcript & audio extraction that actually works for Hermes

The failed video was not a normal “no captions” case. It was a cloud-IP / bot-gate case. The answer is not another freemium transcript API; it is a tiered local-first extraction chain with explicit blocked-state handling, browser-auth/local-worker escape hatches, and local Whisper transcription when captions fail.

Hermes · GBrain · youtube-memory · yt-dlp · captions · faster-whisper · PO tokens

Executive answer

Use yt-dlp as the primary substrate, not a transcript-only scraper. It is the only option that cleanly spans captions, auto-subs, cookies, proxies, client variants, audio download, retries, and diagnostics. Keep youtube-transcript-api as the fast Python captions path. Add PO-token/provider experiments and a trusted local browser/desktop relay for the cases where a VPS simply cannot pass YouTube’s bot checks.

Observed blocker: both yt-dlp and youtube-transcript-api failed on the target video from the VPS with YouTube bot/auth gating: “Sign in to confirm you’re not a bot” / cloud-provider IP block. Library-switching alone will not reliably fix that.

Best next patch

Add a yt-dlp --write-auto-subs subtitle layer plus --cookies / --cookies-from-browser args to /root/youtube-memory.

highest ROI

Ranked repos / tools

yt-dlp/yt-dlp

~162k stars · active · Unlicense · captions + audio + cookies + proxy + client variants + PO-token provider framework

Best production foundation. Handles both transcript retrieval and audio fallback. It still fails on bad cloud IPs, but it gives the right controls and diagnostics.

core

jdepoix/youtube-transcript-api

~7.5k stars · active · MIT · captions/transcripts only

Excellent fast path for structured transcript segments. Its own docs say cloud providers are frequently blocked; use with proxies/local egress when needed.

fast path

LuanRT/YouTube.js

~4.9k stars · active · MIT · JavaScript InnerTube client

Best Node/InnerTube basis for a custom gateway or local worker. More code than yt-dlp, but useful if you want a browser/session-aware service.

gateway

JuanBindez/pytubefix

~1.5k stars · active · MIT · Python downloader/caption support

Potential Python fallback, more current than pytube. Less robust operationally than yt-dlp, with a weaker cookie/proxy story.

fallback

Kakulukian/youtube-transcript

~560 stars · Node transcript helper

Simple Node transcript package. Useful for lightweight scripts; not enough to solve bot-gated VPS runs.

simple

mcp-server-youtube-transcript

~538 stars · MIT · MCP wrapper

Good interface inspiration for Hermes, but it wraps transcript fetching rather than solving cloud-IP/cookie/proxy control.

interface only

fent/node-ytdl-core and forks

~4.7k stars · older release cadence · downloader-oriented

Not recommended as the transcript/audio core. Node ecosystem is fragmented; YouTube.js is the better Node bet.

avoid core

algolia/youtube-captions-scraper

~330 stars · low maintenance · captions only

Simple but not a production answer for current YouTube gating.

low priority

pytube/pytube

~13k stars · historically popular · weaker current fit

Avoid for new production. Use pytubefix if you need this family.

avoid

The robust extraction chain

Native captions via youtube-transcript-api. Cheapest and easiest structured output.

yt-dlp subtitle-only fallback. Try --write-subs --write-auto-subs --skip-download before downloading audio.

Unauthenticated audio, slow mode. Conservative retries, low concurrency, cache artifacts by video ID.

Client variants. Target web_safari, android_vr, and web_embedded; do not blast player_client=all.

PO-token provider experiment. Try bgutil-ytdlp-pot-provider with mweb. PO tokens may help but are not magic.

Cookies / browser-auth. Add --cookies and --cookies-from-browser; prefer a dedicated/throwaway account and apply rate limits.

Clean egress / local worker. The durable answer for bot-gated cloud IPs is a trusted desktop/home worker with normal browser state.

Local transcription. If captions fail but audio succeeds, transcribe with faster-whisper or whisper.cpp and keep external-video attribution.
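
The chain above can be sketched as an ordered list of strategies that stops at the first non-empty transcript and records every attempt. This is a minimal illustration, not the existing youtube-memory code: `Blocked`, `ChainResult`, and `run_chain` are hypothetical names, and each strategy is assumed to be a callable that takes a video ID and returns text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class Blocked(Exception):
    """Hypothetical signal: a strategy hit a bot gate or IP block."""


@dataclass
class ChainResult:
    transcript: Optional[str]
    strategy: Optional[str]
    attempts: List[Tuple[str, str]] = field(default_factory=list)


def run_chain(
    video_id: str,
    strategies: List[Tuple[str, Callable[[str], str]]],
) -> ChainResult:
    """Try each (name, callable) in order; return on the first usable transcript."""
    attempts: List[Tuple[str, str]] = []
    for name, fn in strategies:
        try:
            text = fn(video_id)
        except Blocked as exc:
            # Blocked is recorded explicitly so it is never confused with "no captions".
            attempts.append((name, f"blocked: {exc}"))
            continue
        except Exception as exc:
            attempts.append((name, f"error: {exc}"))
            continue
        if text and text.strip():
            attempts.append((name, "ok"))
            return ChainResult(text, name, attempts)
        # An empty result is treated as unusable and the chain continues.
        attempts.append((name, "empty"))
    return ChainResult(None, None, attempts)
```

Keeping the attempt log separate from the result means a fully failed run still carries enough detail to write a blocked artifact.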

ASR fallbacks

faster-whisper is already aligned with your Python stack and installed for audio-memory. whisper.cpp is a good native/binary fallback for a low-dependency worker.

Current fit: keep faster-whisper as the local default; add whisper.cpp only if you want a standalone binary relay worker.
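
A rough sketch of the local ASR step, assuming a model object that behaves like `faster_whisper.WhisperModel`. The function name `transcribe_audio` and the segment dictionary shape are illustrative choices, not the current youtube-memory interface. The model is passed in so the function can be exercised without loading weights.

```python
from typing import Any, Dict, List


def transcribe_audio(model: Any, audio_path: str, language: str = "en") -> List[Dict[str, Any]]:
    """Return [{start, end, text}] segments from a 16 kHz mono audio file.

    `model` is assumed to behave like faster_whisper.WhisperModel, e.g.:
        from faster_whisper import WhisperModel
        model = WhisperModel("small", device="cpu", compute_type="int8")
    The model size and compute type above are placeholders, not recommendations.
    """
    # faster-whisper returns a lazy iterator of segments plus an info object.
    segments, _info = model.transcribe(audio_path, language=language, vad_filter=True)
    out: List[Dict[str, Any]] = []
    for seg in segments:
        text = seg.text.strip()
        if text:  # drop whitespace-only segments
            out.append({"start": round(seg.start, 2), "end": round(seg.end, 2), "text": text})
    return out
```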

Concrete commands to add/test

Subtitle-only yt-dlp layer

yt-dlp \
  --skip-download \
  --write-subs --write-auto-subs \
  --sub-langs "en.*,en" \
  --sub-format "vtt/json3/best" \
  -o "$CACHE/%(id)s/%(id)s.%(ext)s" \
  "$URL"

Cookies args

yt-dlp --cookies /secure/youtube.cookies.txt --skip-download --write-auto-subs "$URL"
yt-dlp --cookies-from-browser chrome -x --audio-format wav "$URL"
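
For the Python side, the same subtitle and cookie commands could be assembled as argument lists for `subprocess.run`. This is a sketch under assumptions: `build_auth_args` and `build_subtitle_cmd` are hypothetical helper names, and rejecting both cookie sources at once is a design choice made here, not a yt-dlp requirement.

```python
from typing import List, Optional, Sequence


def build_auth_args(
    cookies: Optional[str] = None,
    cookies_from_browser: Optional[str] = None,
) -> List[str]:
    """Translate CLI auth options into yt-dlp arguments (empty list if unauthenticated)."""
    if cookies and cookies_from_browser:
        # Assumption: one explicit auth source per run keeps failures easier to attribute.
        raise ValueError("pass either a cookies file or a browser name, not both")
    if cookies:
        return ["--cookies", cookies]
    if cookies_from_browser:
        return ["--cookies-from-browser", cookies_from_browser]
    return []


def build_subtitle_cmd(url: str, cache_dir: str, auth_args: Sequence[str] = ()) -> List[str]:
    """Subtitle-only yt-dlp invocation mirroring the shell command above."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-subs", "--write-auto-subs",
        "--sub-langs", "en.*,en",
        "--sub-format", "vtt/json3/best",
        "-o", f"{cache_dir}/%(id)s/%(id)s.%(ext)s",
        *auth_args,
        url,
    ]
```

A caller would typically run the list with `subprocess.run(cmd, capture_output=True, text=True, check=False)` and keep stderr for classification rather than raising on a non-zero exit.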

PO token provider experiment

python3 -m pip install -U bgutil-ytdlp-pot-provider

docker run --name bgutil-provider -d --restart unless-stopped --init \
  -p 127.0.0.1:4416:4416 brainicism/bgutil-ytdlp-pot-provider

yt-dlp \
  --extractor-args "youtube:player_client=mweb;fetch_pot=auto" \
  --extractor-args "youtubepot-bgutilhttp:base_url=http://127.0.0.1:4416" \
  -f "ba/b" -x --audio-format wav \
  --postprocessor-args "ffmpeg:-ac 1 -ar 16000" \
  -o "$CACHE/%(id)s.%(ext)s" "$URL"

What to patch in /root/youtube-memory

core.py: add download_subtitles_with_ytdlp(), VTT/JSON3 parsing, auth arg builder, blocked/error artifact writer, and failure classifier.
cli.py: add --cookies, --cookies-from-browser, --auth-profile, --prefer-subtitles, --no-audio, --keep-audio.
tests: cover fallback order: captions → yt-dlp subtitles → audio → blocked artifact.
manifest: store strategy attempted, stderr class, next recovery path, artifact paths, transcript hash.
GBrain: keep raw evidence under sources/youtube/... and candidate layer under inbox/youtube-candidates/...; never promote external video speech as Connor’s belief.
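
The manifest item could look roughly like the following. The field names, the `manifest.json` filename, and the `ok`/`failed` status values are assumptions for illustration; only the set of stored facts (strategies attempted, stderr class, next recovery path, artifact paths, transcript hash) comes from the list above.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


def write_manifest(
    cache_dir: str,
    video_id: str,
    strategies_attempted: List[str],
    stderr_class: Optional[str],
    next_recovery: Optional[str],
    artifact_paths: Optional[List[str]] = None,
    transcript: Optional[str] = None,
) -> Path:
    """Write a per-video manifest so failures leave an artifact, not just stdout."""
    out_dir = Path(cache_dir) / video_id
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "video_id": video_id,
        "status": "ok" if transcript else "failed",
        "strategies_attempted": strategies_attempted,
        "stderr_class": stderr_class,
        "next_recovery": next_recovery,
        "artifact_paths": artifact_paths or [],
        # Hash the transcript text so later runs can detect changes cheaply.
        "transcript_sha256": (
            hashlib.sha256(transcript.encode("utf-8")).hexdigest() if transcript else None
        ),
    }
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path
```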

Local gap verified

/root/youtube-memory/src/youtube_memory/core.py currently jumps from youtube-transcript-api captions directly to audio download + faster-whisper. It does not yet have the subtitle-only yt-dlp layer or cookie CLI args.

Decision matrix

Failure	Classify as	Next move
“Sign in to confirm you’re not a bot”	bot_check / bad egress	PO token, cookies, clean egress, local worker
`RequestBlocked` in youtube-transcript-api	cloud IP blocked	proxy/local worker; do not keep retrying same VPS
caption list exists but VTT is empty	mirror unusable	treat as blocked, continue strategy chain
403 on media formats	GVS/PO token issue	PO-token provider + client variant
age restricted/account required	auth_required	cookies from dedicated account
captions unavailable but audio works	no_captions	faster-whisper / whisper.cpp
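
The matrix maps naturally onto a small rule table. In this sketch the matched substrings and the class slugs (`cloud_ip_blocked`, `po_token_issue`, `unknown`) are assumptions derived from the rows above; real stderr text should be checked against captured failures before relying on them.

```python
from typing import List, Tuple

# (lowercase substring to look for, class, next move). Order matters: first match wins.
_RULES: List[Tuple[str, str, str]] = [
    ("not a bot", "bot_check", "PO token, cookies, clean egress, local worker"),
    ("requestblocked", "cloud_ip_blocked", "proxy/local worker; stop retrying from the same VPS"),
    ("confirm your age", "auth_required", "cookies from dedicated account"),
    ("age-restricted", "auth_required", "cookies from dedicated account"),
    ("http error 403", "po_token_issue", "PO-token provider + client variant"),
]


def classify_failure(stderr: str) -> Tuple[str, str]:
    """Return (class, next_move) for a captured error string."""
    haystack = stderr.lower()
    for needle, cls, next_move in _RULES:
        if needle in haystack:
            return cls, next_move
    # Unrecognized failures stay visible instead of being folded into another class.
    return "unknown", "inspect the stored stderr artifact"
```

Matching on "not a bot" rather than the full sentence avoids depending on the apostrophe style in the message.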

Recommended build order

Add subtitle-only yt-dlp fallback to youtube-memory.
Add --cookies / --cookies-from-browser CLI flags and pass them to subtitle/audio yt-dlp calls.
Add structured blocked artifacts for every failed video, not just stdout errors.
Add conservative yt-dlp audio options: low concurrency, sleeps, retries, 16 kHz mono conversion.
Add optional PO-token provider profile: bgutil + mweb.
Add a trusted local relay worker for residential/browser-auth extraction.
Add batch/playlist retry queue and GBrain candidate review for transcripts.
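
The conservative audio step in the build order could be expressed as a fixed argument list. The helper name `build_audio_cmd` and the specific retry and sleep values are placeholders chosen for illustration; the flags themselves are standard yt-dlp options.

```python
from typing import List, Sequence


def build_audio_cmd(url: str, cache_dir: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Slow, low-concurrency audio download converted to 16 kHz mono WAV."""
    return [
        "yt-dlp",
        "-f", "ba/b",
        "-x", "--audio-format", "wav",
        "--postprocessor-args", "ffmpeg:-ac 1 -ar 16000",
        # Placeholder values: tune against observed block rates.
        "--retries", "3",
        "--fragment-retries", "3",
        "--concurrent-fragments", "1",
        "--sleep-requests", "1",
        "--sleep-interval", "5",
        "--max-sleep-interval", "15",
        "-o", f"{cache_dir}/%(id)s.%(ext)s",
        *extra_args,  # e.g. cookie or extractor args from an auth profile
        url,
    ]
```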

Evidence notes

Repo metadata was checked via GitHub API on 2026-05-15. Local pipeline inspection verified /root/youtube-memory tests pass and confirmed current code paths. yt-dlp and youtube-transcript-api both produced bot/cloud-IP failures for the original target video from the VPS. The recommendation intentionally favors local-first/open-source tools over freemium transcript-credit APIs.