Deep research memo · May 2026

YouTube transcript & audio extraction that actually works for Hermes

The failed video was not a normal “no captions” case. It was a cloud-IP / bot-gate case. The answer is not another freemium transcript API; it is a tiered local-first extraction chain with explicit blocked-state handling, browser-auth/local-worker escape hatches, and local Whisper transcription when captions fail.

Hermes · GBrain · youtube-memory · yt-dlp · captions · faster-whisper · PO tokens

Executive answer

Use yt-dlp as the primary substrate, not a transcript-only scraper. It is the only option that cleanly spans captions, auto-subs, cookies, proxies, client variants, audio download, retries, and diagnostics. Keep youtube-transcript-api as the fast Python captions path. Add PO-token/provider experiments and a trusted local browser/desktop relay for the cases where a VPS simply cannot pass YouTube’s bot checks.

Observed blocker: both yt-dlp and youtube-transcript-api failed on the target video from the VPS with YouTube bot/auth gating: “Sign in to confirm you’re not a bot” / cloud-provider IP block. Library-switching alone will not reliably fix that.

Best next patch

Add a yt-dlp --write-auto-subs subtitle layer plus --cookies / --cookies-from-browser args to /root/youtube-memory.

highest ROI

Ranked repos / tools

  1. [core] ~162k stars · active · Unlicense · captions + audio + cookies + proxy + client variants + PO-token provider framework
     Best production foundation. Handles both transcript retrieval and audio fallback. It still fails on bad cloud IPs, but it gives the right controls and diagnostics.

  2. [fast path] ~7.5k stars · active · MIT · captions/transcripts only
     Excellent fast path for structured transcript segments. Its own docs say cloud providers are frequently blocked; use with proxies/local egress when needed.

  3. [gateway] ~4.9k stars · active · MIT · JavaScript InnerTube client
     Best Node/InnerTube basis for a custom gateway or local worker. More code than yt-dlp, but useful if you want a browser/session-aware service.

  4. [fallback] ~1.5k stars · active · MIT · Python downloader/caption support
     Potential Python fallback, more current than pytube. Not as robust operationally as yt-dlp and weaker cookie/proxy story.

  5. [simple] ~560 stars · Node transcript helper
     Simple Node transcript package. Useful for lightweight scripts; not enough to solve bot-gated VPS runs.

  6. [interface only] ~538 stars · MIT · MCP wrapper
     Good interface inspiration for Hermes, but it wraps transcript fetching rather than solving cloud-IP/cookie/proxy control.

  7. [avoid core] ~4.7k stars · older release cadence · downloader-oriented
     Not recommended as the transcript/audio core. Node ecosystem is fragmented; YouTube.js is the better Node bet.

  8. [low priority] ~330 stars · low maintenance · captions only
     Simple but not a production answer for current YouTube gating.

  9. [avoid] ~13k stars · historically popular · weaker current fit
     Avoid for new production. Use pytubefix if you need this family.

The robust extraction chain

  1. Native captions via youtube-transcript-api. Cheapest and easiest structured output.
  2. yt-dlp subtitle-only fallback. Try --write-subs --write-auto-subs --skip-download before downloading audio.
  3. Unauthenticated audio, slow mode. Conservative retries, low concurrency, cache artifacts by video ID.
  4. Client variants. Target web_safari, android_vr, and web_embedded; do not blast player_client=all.
  5. PO-token provider experiment. Try bgutil-ytdlp-pot-provider with mweb. PO tokens may help but are not magic.
  6. Cookies / browser-auth. Add --cookies and --cookies-from-browser; prefer a dedicated/throwaway account and rate-limit its requests.
  7. Clean egress / local worker. The durable answer for bot-gated cloud IPs is a trusted desktop/home worker with normal browser state.
  8. Local transcription. If captions fail but audio succeeds, transcribe with faster-whisper or whisper.cpp and keep external-video attribution.
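
The chain above could be driven by a small runner that tries each tier in order and records what happened. This is a minimal sketch, not the existing youtube-memory code: run_chain, Attempt, and ChainResult are hypothetical names, and each strategy is assumed to be a callable that takes a video ID and returns transcript text or raises.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    strategy: str
    ok: bool
    error: Optional[str] = None


@dataclass
class ChainResult:
    transcript: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)


def run_chain(video_id: str,
              strategies: List[Tuple[str, Callable[[str], str]]]) -> ChainResult:
    """Try each (name, fn) tier in order; stop at the first non-empty transcript."""
    result = ChainResult()
    for name, fn in strategies:
        try:
            text = fn(video_id)
        except Exception as exc:
            # Record the failure and fall through to the next tier.
            result.attempts.append(Attempt(name, False, f"{type(exc).__name__}: {exc}"))
            continue
        if text and text.strip():
            result.attempts.append(Attempt(name, True))
            result.transcript = text
            return result
        # An empty result counts as a failed tier, not as success.
        result.attempts.append(Attempt(name, False, "empty transcript"))
    return result
```

Keeping every attempt in the result, rather than only the last error, is what lets a later step write a blocked artifact that names each tier that was tried.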

ASR fallbacks

faster-whisper is already aligned with your Python stack and installed for audio-memory. whisper.cpp is a good native/binary fallback for a low-dependency worker.

Current fit: keep faster-whisper as the local default; add whisper.cpp only if you want a standalone binary relay worker.
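
As a rough illustration of the local default, the sketch below wraps faster-whisper behind one function. The model size, CPU device, and int8 compute type are assumptions chosen for a modest VPS, and transcribe_audio and format_segments are hypothetical helper names, not existing youtube-memory functions.

```python
from typing import Iterable, Tuple


def format_segments(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) triples into '[mm:ss] text' lines."""
    lines = []
    for start, _end, text in segments:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {text.strip()}")
    return "\n".join(lines)


def transcribe_audio(path: str, model_size: str = "small") -> str:
    """Transcribe a local audio file with faster-whisper (assumed installed)."""
    # Imported lazily so this module still loads where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy segment generator plus language/duration info.
    segments, _info = model.transcribe(path, beam_size=5)
    return format_segments((s.start, s.end, s.text) for s in segments)
```

Separating formatting from the model call keeps the timestamp logic testable without loading a model.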

Concrete commands to add/test

Subtitle-only yt-dlp layer

yt-dlp \
  --skip-download \
  --write-subs --write-auto-subs \
  --sub-langs "en.*,en" \
  --sub-format "vtt/json3/best" \
  -o "$CACHE/%(id)s/%(id)s.%(ext)s" \
  "$URL"

Cookies args

yt-dlp --cookies /secure/youtube.cookies.txt --skip-download --write-auto-subs "$URL"
yt-dlp --cookies-from-browser chrome -x --audio-format wav "$URL"
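
In Python, the two commands above might be assembled by an auth arg builder plus a subtitle command builder. This is a sketch under assumptions: build_auth_args and build_subtitle_cmd are hypothetical names, and rejecting both cookie sources at once is a design choice made here, not a documented yt-dlp restriction.

```python
from typing import List, Optional, Sequence


def build_auth_args(cookies_file: Optional[str] = None,
                    cookies_from_browser: Optional[str] = None) -> List[str]:
    """Return the yt-dlp cookie flags for the chosen auth source, if any."""
    if cookies_file and cookies_from_browser:
        # Assumption: one auth source per run keeps failures easy to attribute.
        raise ValueError("pass either a cookies file or a browser name, not both")
    if cookies_file:
        return ["--cookies", cookies_file]
    if cookies_from_browser:
        return ["--cookies-from-browser", cookies_from_browser]
    return []


def build_subtitle_cmd(url: str, cache_dir: str,
                       auth_args: Sequence[str] = ()) -> List[str]:
    """Build the subtitle-only yt-dlp command as an argv list (no shell quoting)."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-subs", "--write-auto-subs",
        "--sub-langs", "en.*,en",
        "--sub-format", "vtt/json3/best",
        "-o", f"{cache_dir}/%(id)s/%(id)s.%(ext)s",
        *auth_args,
        url,
    ]
```

Passing the list to subprocess.run without shell=True avoids quoting problems with the output template and the cookie path.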

PO token provider experiment

python3 -m pip install -U bgutil-ytdlp-pot-provider

docker run --name bgutil-provider -d --restart unless-stopped --init \
  -p 127.0.0.1:4416:4416 brainicism/bgutil-ytdlp-pot-provider

yt-dlp \
  --extractor-args "youtube:player_client=mweb;fetch_pot=auto" \
  --extractor-args "youtubepot-bgutilhttp:base_url=http://127.0.0.1:4416" \
  -f "ba/b" -x --audio-format wav \
  --postprocessor-args "ffmpeg:-ac 1 -ar 16000" \
  -o "$CACHE/%(id)s.%(ext)s" "$URL"

What to patch in /root/youtube-memory

  • core.py: add download_subtitles_with_ytdlp(), VTT/JSON3 parsing, auth arg builder, blocked/error artifact writer, and failure classifier.
  • cli.py: add --cookies, --cookies-from-browser, --auth-profile, --prefer-subtitles, --no-audio, --keep-audio.
  • tests: cover fallback order: captions → yt-dlp subtitles → audio → blocked artifact.
  • manifest: store strategy attempted, stderr class, next recovery path, artifact paths, transcript hash.
  • GBrain: keep raw evidence under sources/youtube/... and candidate layer under inbox/youtube-candidates/...; never promote external video speech as Connor’s belief.
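
The manifest bullet could look roughly like the sketch below. The field names, the manifest.json filename, and the write_manifest helper are assumptions for illustration; only the list of things to store comes from the patch plan above.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Optional


def write_manifest(out_dir: str, video_id: str, strategies_attempted: List[str],
                   stderr_class: Optional[str], next_recovery: Optional[str],
                   artifact_paths: Dict[str, str],
                   transcript: Optional[str] = None) -> Path:
    """Write a per-video manifest.json and return its path."""
    manifest = {
        "video_id": video_id,
        "strategies_attempted": strategies_attempted,
        "stderr_class": stderr_class,
        "next_recovery": next_recovery,
        "artifact_paths": artifact_paths,
        # Hash only when a transcript exists; blocked videos keep None.
        "transcript_sha256": (
            hashlib.sha256(transcript.encode("utf-8")).hexdigest()
            if transcript else None
        ),
    }
    path = Path(out_dir) / video_id / "manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Writing the same file for both successes and failures means a blocked video leaves a structured record instead of only a stdout error.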

Local gap verified

/root/youtube-memory/src/youtube_memory/core.py currently jumps from youtube-transcript-api captions directly to audio download + faster-whisper. It does not yet have the subtitle-only yt-dlp layer or cookie CLI args.
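
Closing that gap also needs a way to turn downloaded VTT into plain text. The following is a minimal sketch, assuming YouTube-style VTT where auto-captions repeat lines across cues and embed inline timing tags; vtt_to_text is a hypothetical helper and does not handle every WebVTT feature (cue identifiers and STYLE blocks are not treated specially).

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.000 align:start".
_TIMING = re.compile(r"^\d{2}:\d{2}(:\d{2})?\.\d{3}\s+-->\s+")
# Inline tags such as <c>, </c>, or <00:00:01.500>.
_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Extract caption text from VTT, dropping headers, timings, and repeats."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line == "WEBVTT":
            continue
        if line.startswith(("WEBVTT ", "Kind:", "Language:", "NOTE")):
            continue
        if _TIMING.match(line):
            continue
        text = _TAG.sub("", line).strip()
        # Auto-captions often repeat the previous line in the next cue.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)
```

An empty return value here is the signal for the "caption list exists but VTT is empty" case, so the caller can continue down the strategy chain rather than store an empty transcript.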

Decision matrix

Failure | Classify as | Next move
“Sign in to confirm you’re not a bot” | bot_check / bad egress | PO token, cookies, clean egress, local worker
RequestBlocked in youtube-transcript-api | cloud IP blocked | proxy/local worker; do not keep retrying same VPS
caption list exists but VTT is empty | mirror unusable | treat as blocked, continue strategy chain
403 on media formats | GVS/PO token issue | PO-token provider + client variant
age restricted/account required | auth_required | cookies from dedicated account
captions unavailable but audio works | no_captions | faster-whisper / whisper.cpp
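
The matrix above maps naturally onto a small rule table. In this sketch the regex patterns are illustrative guesses at message substrings rather than a verified list of yt-dlp or youtube-transcript-api outputs, and the class names cloud_ip_blocked and po_token_issue are hypothetical identifiers (bot_check, auth_required, and no_captions come from the matrix). The empty-VTT row is not covered because it is detected from file content, not stderr.

```python
import re
from typing import Tuple

# (pattern, class, next move); first match wins, so order matters.
_RULES = [
    (re.compile(r"sign in to confirm you.?re not a bot", re.I),
     "bot_check", "PO token, cookies, clean egress, local worker"),
    (re.compile(r"confirm your age|age.restricted|account required", re.I),
     "auth_required", "cookies from dedicated account"),
    (re.compile(r"RequestBlocked|IpBlocked"),
     "cloud_ip_blocked", "proxy/local worker; do not keep retrying same VPS"),
    (re.compile(r"HTTP Error 403", re.I),
     "po_token_issue", "PO-token provider + client variant"),
    (re.compile(r"TranscriptsDisabled|NoTranscriptFound"),
     "no_captions", "faster-whisper / whisper.cpp"),
]


def classify_failure(stderr: str) -> Tuple[str, str]:
    """Map raw error text to (failure class, suggested next move)."""
    for pattern, cls, next_move in _RULES:
        if pattern.search(stderr):
            return cls, next_move
    return "unknown", "inspect stderr manually"
```

Returning the next move alongside the class gives the manifest a recovery path without a second lookup.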

Recommended build order

  1. Add subtitle-only yt-dlp fallback to youtube-memory.
  2. Add --cookies / --cookies-from-browser CLI flags and pass them to subtitle/audio yt-dlp calls.
  3. Add structured blocked artifacts for every failed video, not just stdout errors.
  4. Add conservative yt-dlp audio options: low concurrency, sleeps, retries, 16 kHz mono conversion.
  5. Add optional PO-token provider profile: bgutil + mweb.
  6. Add a trusted local relay worker for residential/browser-auth extraction.
  7. Add batch/playlist retry queue and GBrain candidate review for transcripts.
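
Step 4 could start from an argv builder like the one below. The flags are standard yt-dlp options, but the specific values (three retries, one fragment at a time, 5 to 15 second sleeps) are untested assumptions to tune, and build_audio_cmd is a hypothetical helper name.

```python
from typing import List, Sequence


def build_audio_cmd(url: str, cache_dir: str,
                    auth_args: Sequence[str] = ()) -> List[str]:
    """Build a conservative yt-dlp audio command that outputs 16 kHz mono WAV."""
    return [
        "yt-dlp",
        "-f", "ba/b",                      # best audio, else best combined format
        "-x", "--audio-format", "wav",
        "--postprocessor-args", "ffmpeg:-ac 1 -ar 16000",
        "--concurrent-fragments", "1",     # low concurrency
        "--retries", "3",
        "--fragment-retries", "3",
        "--sleep-requests", "1",           # pause between extraction requests
        "--sleep-interval", "5",           # minimum pause before each download
        "--max-sleep-interval", "15",      # upper bound for the randomized pause
        "-o", f"{cache_dir}/%(id)s.%(ext)s",
        *auth_args,
        url,
    ]
```

Keeping the throttling values in one place makes it straightforward to expose them later as a named profile next to the PO-token profile in step 5.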

Evidence notes

Repo metadata was checked via GitHub API on 2026-05-15. Local pipeline inspection verified /root/youtube-memory tests pass and confirmed current code paths. yt-dlp and youtube-transcript-api both produced bot/cloud-IP failures for the original target video from the VPS. The recommendation intentionally favors local-first/open-source tools over freemium transcript-credit APIs.