YouTube transcript & audio extraction that actually works for Hermes
The failed video was not a normal “no captions” case. It was a cloud-IP / bot-gate case. The answer is not another freemium transcript API; it is a tiered local-first extraction chain with explicit blocked-state handling, browser-auth/local-worker escape hatches, and local Whisper transcription when captions fail.
Executive answer
Use yt-dlp as the primary substrate, not a transcript-only scraper. It is the only option that cleanly spans captions, auto-subs, cookies, proxies, client variants, audio download, retries, and diagnostics. Keep youtube-transcript-api as the fast Python captions path. Add PO-token/provider experiments and a trusted local browser/desktop relay for the cases where a VPS simply cannot pass YouTube’s bot checks.
Best next patch
Add a yt-dlp --write-auto-subs subtitle layer plus --cookies / --cookies-from-browser args to /root/youtube-memory.
Ranked repos / tools
Best production foundation. Handles both transcript retrieval and audio fallback. It still fails on bad cloud IPs, but it gives the right controls and diagnostics.
Excellent fast path for structured transcript segments. Its own docs say cloud providers are frequently blocked; use with proxies/local egress when needed.
Best Node/InnerTube basis for a custom gateway or local worker. More code than yt-dlp, but useful if you want a browser/session-aware service.
Potential Python fallback, more current than pytube. Not as robust operationally as yt-dlp and weaker cookie/proxy story.
Simple Node transcript package. Useful for lightweight scripts; not enough to solve bot-gated VPS runs.
Good interface inspiration for Hermes, but it wraps transcript fetching rather than solving cloud-IP/cookie/proxy control.
Not recommended as the transcript/audio core. Node ecosystem is fragmented; YouTube.js is the better Node bet.
Simple but not a production answer for current YouTube gating.
The robust extraction chain
- Run yt-dlp with --write-subs --write-auto-subs --skip-download before downloading audio.
- Try the web_safari, android_vr, and web_embedded clients; do not blast player_client=all.
- Try mweb with a PO token. PO tokens may help but are not magic.
- Pass --cookies or --cookies-from-browser; prefer a dedicated/throwaway account and rate limits.
ASR fallbacks
faster-whisper is already aligned with your Python stack and installed for audio-memory. whisper.cpp is a good native/binary fallback for a low-dependency worker.
Concrete commands to add/test
Subtitle-only yt-dlp layer
yt-dlp \
  --skip-download \
  --write-subs --write-auto-subs \
  --sub-langs "en.*,en" \
  --sub-format "vtt/json3/best" \
  -o "$CACHE/%(id)s/%(id)s.%(ext)s" \
  "$URL"
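For the Python side, a minimal sketch of how the same subtitle-only invocation could be assembled as an argument list for `subprocess.run`. The function name `build_subtitle_args` and its parameters are hypothetical and not taken from the existing youtube-memory code; the flags mirror the command above.

```python
from pathlib import Path


def build_subtitle_args(url: str, cache_dir: Path, langs: str = "en.*,en") -> list[str]:
    """Return a yt-dlp argv that fetches subtitles only (no media download).

    Hypothetical helper: name and signature are illustrative assumptions.
    """
    return [
        "yt-dlp",
        "--skip-download",                 # never fetch media in this layer
        "--write-subs",                    # uploader-provided captions
        "--write-auto-subs",               # auto-generated captions
        "--sub-langs", langs,
        "--sub-format", "vtt/json3/best",
        # One directory per video id, matching the shell example's template.
        "-o", str(cache_dir / "%(id)s" / "%(id)s.%(ext)s"),
        url,
    ]
```

Passing a list rather than a shell string avoids quoting problems with the `%(id)s` output template and with URLs that contain `&`.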
Cookies args
yt-dlp --cookies /secure/youtube.cookies.txt --skip-download --write-auto-subs "$URL"
yt-dlp --cookies-from-browser chrome -x --audio-format wav "$URL"
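One way the auth arg builder might look, so that subtitle and audio calls share the same cookie handling. This is a sketch: `build_auth_args` is a hypothetical name, and rejecting both options at once is a design assumption made here, not something yt-dlp requires.

```python
from typing import Optional


def build_auth_args(cookies: Optional[str] = None,
                    cookies_from_browser: Optional[str] = None) -> list[str]:
    """Translate CLI auth options into yt-dlp arguments.

    Hypothetical helper. Refusing both options together is an assumption:
    it keeps the recorded auth source for a run unambiguous.
    """
    if cookies and cookies_from_browser:
        raise ValueError("pass either a cookies file or a browser name, not both")
    if cookies:
        return ["--cookies", cookies]
    if cookies_from_browser:
        return ["--cookies-from-browser", cookies_from_browser]
    return []  # anonymous run
```

The returned list can be spliced into any yt-dlp argv before the URL, for example `argv[:-1] + build_auth_args(cookies=path) + argv[-1:]`.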
PO token provider experiment
python3 -m pip install -U bgutil-ytdlp-pot-provider
docker run --name bgutil-provider -d --restart unless-stopped --init \
  -p 127.0.0.1:4416:4416 brainicism/bgutil-ytdlp-pot-provider
yt-dlp \
  --extractor-args "youtube:player_client=mweb;fetch_pot=auto" \
  --extractor-args "youtubepot-bgutilhttp:base_url=http://127.0.0.1:4416" \
  -f "ba/b" -x --audio-format wav \
  --postprocessor-args "ffmpeg:-ac 1 -ar 16000" \
  -o "$CACHE/%(id)s.%(ext)s" "$URL"
What to patch in /root/youtube-memory
- core.py: add download_subtitles_with_ytdlp(), VTT/JSON3 parsing, auth arg builder, blocked/error artifact writer, and failure classifier.
- cli.py: add --cookies, --cookies-from-browser, --auth-profile, --prefer-subtitles, --no-audio, --keep-audio.
- tests: cover fallback order: captions → yt-dlp subtitles → audio → blocked artifact.
- manifest: store strategy attempted, stderr class, next recovery path, artifact paths, transcript hash.
- GBrain: keep raw evidence under sources/youtube/... and candidate layer under inbox/youtube-candidates/...; never promote external video speech as Connor’s belief.
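The fallback order and manifest fields listed above can be sketched as a small strategy runner. Everything here is illustrative: `run_chain`, `Manifest`, and the field names are assumptions rather than the existing youtube-memory API, and each strategy is modeled as a plain callable that returns transcript text or `None`.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional

# A strategy takes a video id and returns transcript text, or None on failure.
Strategy = Callable[[str], Optional[str]]


@dataclass
class Manifest:
    video_id: str
    attempts: list = field(default_factory=list)   # one record per strategy tried
    strategy_used: Optional[str] = None
    transcript_sha256: Optional[str] = None
    status: str = "blocked"                        # flipped to "ok" on success


def run_chain(video_id: str, strategies: list) -> tuple:
    """Try (name, strategy) pairs in order; stop at the first non-empty transcript."""
    manifest = Manifest(video_id=video_id)
    for name, strategy in strategies:
        try:
            text, error = strategy(video_id), None
        except Exception as exc:  # record the failure and keep walking the chain
            text, error = None, f"{type(exc).__name__}: {exc}"
        manifest.attempts.append({"strategy": name, "ok": bool(text), "error": error})
        if text:
            manifest.strategy_used = name
            manifest.transcript_sha256 = hashlib.sha256(text.encode("utf-8")).hexdigest()
            manifest.status = "ok"
            return text, manifest
    return None, manifest  # every strategy failed: caller writes a blocked artifact


def manifest_json(manifest: Manifest) -> str:
    """Serialize the manifest for storage next to the other artifacts."""
    return json.dumps(asdict(manifest), indent=2)
```

An empty transcript counts as a failure, so an empty subtitle file does not short-circuit the chain. A test for the fallback order can pass fake strategies in the order captions, yt-dlp subtitles, audio and assert on `manifest.attempts`.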
Local gap verified
/root/youtube-memory/src/youtube_memory/core.py currently jumps from youtube-transcript-api captions directly to audio download + faster-whisper. It does not yet have the subtitle-only yt-dlp layer or cookie CLI args.
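Since the missing subtitle layer needs VTT parsing, here is a minimal sketch of turning a downloaded `.vtt` file into transcript segments. It assumes `HH:MM:SS.mmm` timestamps (cues that omit the hours field are not handled), strips inline tags, and drops consecutive cues with identical text. The function name `parse_vtt` is hypothetical.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.500 align:start position:0%"
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)
_TAG = re.compile(r"<[^>]+>")  # inline tags such as <c> or <00:00:01.500>


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text: str) -> list:
    """Return a list of {"start", "end", "text"} dicts from WebVTT content."""
    segments = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = _TIMING.match(line)
        if match:
            g = match.groups()
            current = {"start": _seconds(*g[:4]), "end": _seconds(*g[4:]), "text": ""}
            segments.append(current)
        elif not line:
            current = None  # a blank line ends the current cue
        elif current is not None:
            cleaned = _TAG.sub("", line).strip()
            if cleaned:
                current["text"] = (current["text"] + " " + cleaned).strip()
    # Drop empty cues and consecutive cues that repeat the same text.
    result = []
    for seg in segments:
        if seg["text"] and (not result or result[-1]["text"] != seg["text"]):
            result.append(seg)
    return result
```

A file that contains only the `WEBVTT` header yields an empty list, which the caller can treat as a failed strategy rather than a successful transcript.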
Decision matrix
| Failure | Classify as | Next move |
|---|---|---|
| “Sign in to confirm you’re not a bot” | bot_check / bad egress | PO token, cookies, clean egress, local worker |
| RequestBlocked in youtube-transcript-api | cloud IP blocked | proxy/local worker; do not keep retrying same VPS |
| caption list exists but VTT is empty | mirror unusable | treat as blocked, continue strategy chain |
| 403 on media formats | GVS/PO token issue | PO-token provider + client variant |
| age restricted/account required | auth_required | cookies from dedicated account |
| captions unavailable but audio works | no_captions | faster-whisper / whisper.cpp |
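The matrix above could be encoded as a small failure classifier. This is a sketch only: the class identifiers are adapted from the table, while the matched substrings are assumptions about what the tools print and would need checking against real stderr captured from failed runs. The empty-VTT row is not message-based, so it is left to the subtitle parser.

```python
# (substring to look for, class label, suggested next move)
# The substrings are illustrative assumptions, not a verified list of tool output.
_RULES = [
    ("sign in to confirm you're not a bot", "bot_check",
     "PO token, cookies, clean egress, local worker"),
    ("requestblocked", "cloud_ip_blocked",
     "proxy/local worker; do not keep retrying same VPS"),
    ("http error 403", "gvs_po_token",
     "PO-token provider + client variant"),
    ("sign in to confirm your age", "auth_required",
     "cookies from dedicated account"),
]


def classify_failure(message: str) -> tuple:
    """Map an error message to (class label, next move)."""
    # Lowercase and normalize the curly apostrophe so both spellings match.
    text = message.lower().replace("\u2019", "'")
    for needle, label, next_move in _RULES:
        if needle in text:
            return label, next_move
    return "unknown", "write blocked artifact and review manually"
```

Keeping the rules in a flat list makes the order explicit and lets a new failure string be added as a single row once it has been seen in practice.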
Recommended build order
- Add subtitle-only yt-dlp fallback to youtube-memory.
- Add --cookies / --cookies-from-browser CLI flags and pass them to subtitle/audio yt-dlp calls.
- Add structured blocked artifacts for every failed video, not just stdout errors.
- Add conservative yt-dlp audio options: low concurrency, sleeps, retries, 16 kHz mono conversion.
- Add optional PO-token provider profile: bgutil+mweb.
- Add a trusted local relay worker for residential/browser-auth extraction.
- Add batch/playlist retry queue and GBrain candidate review for transcripts.
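As an illustration of the conservative audio options in the list above, the following sketch assembles a yt-dlp argv with low concurrency, sleeps, retries, and 16 kHz mono conversion. The helper name `build_audio_args` and the specific numeric values are assumptions chosen for the example, not tuned recommendations.

```python
from typing import Optional


def build_audio_args(url: str, out_template: str,
                     extra_args: Optional[list] = None) -> list:
    """Return a yt-dlp argv for a cautious audio-only download.

    Hypothetical helper; retry counts and sleep durations are placeholders.
    """
    args = [
        "yt-dlp",
        "-f", "ba/b",                          # best audio, else best combined
        "-x", "--audio-format", "wav",
        "--postprocessor-args", "ffmpeg:-ac 1 -ar 16000",  # mono, 16 kHz
        "--concurrent-fragments", "1",         # low concurrency
        "--retries", "5",
        "--fragment-retries", "5",
        "--sleep-requests", "1",               # pause between extraction requests
        "--sleep-interval", "5",               # minimum pause before each download
        "--max-sleep-interval", "15",          # upper bound for the random pause
        "-o", out_template,
    ]
    args += list(extra_args or [])             # e.g. cookie or extractor args
    args.append(url)
    return args
```

Auth or PO-token arguments can be passed through `extra_args`, so the same builder serves the anonymous, cookie, and provider profiles.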
Evidence notes
Repo metadata was checked via GitHub API on 2026-05-15. Local pipeline inspection verified /root/youtube-memory tests pass and confirmed current code paths. yt-dlp and youtube-transcript-api both produced bot/cloud-IP failures for the original target video from the VPS. The recommendation intentionally favors local-first/open-source tools over freemium transcript-credit APIs.