Compare commits

68 Commits

Author SHA1 Message Date
coletdjnz
26fbb09502
clarify class name requirements 2025-05-16 22:38:15 +12:00
coletdjnz
83edf67155
clean up docs 2025-05-16 21:01:07 +12:00
coletdjnz
2602a47685
Add never fetch pot policy 2025-05-16 20:54:54 +12:00
coletdjnz
f7b821f490
Merge remote-tracking branch 'upstream/master' into feat/youtube/pot-provider-framework 2025-05-16 20:43:44 +12:00
coletdjnz
5506357315
improve docs 2025-05-16 20:42:57 +12:00
coletdjnz
50d12bbc5e
cleanup some long lines 2025-05-16 20:30:06 +12:00
coletdjnz
6cdeec4332
flatten extractor arg docs 2025-05-16 20:16:02 +12:00
coletdjnz
fcdf2b732f
typing cleanup 2025-05-16 20:13:25 +12:00
coletdjnz
8045fed380
Revert back to pot_trace = true / false extractor arg 2025-05-16 19:17:23 +12:00
Subrat Lima
586b557b12 [ie/jiosaavn:artist] Add extractor (#12803)
Closes #10823

Authored by: subrat-lima
2025-05-11 03:01:13 -05:00
Subrat Lima
317f4b8006 [ie/jiosaavn:show:playlist] Add extractor (#12803)
Closes #12766

Authored by: subrat-lima
2025-05-11 03:01:13 -05:00
Subrat Lima
6839276496 [ie/jiosaavn:show] Add extractor (#12803)
Closes #12766

Authored by: subrat-lima
2025-05-11 03:01:13 -05:00
bashonly
cbcfe6378d
[ie/sprout] Remove extractor (#13149)
Authored by: bashonly
2025-05-10 23:22:53 +00:00
bashonly
7dbb47f84f
[ie/cartoonnetwork] Remove extractor (#13148)
Authored by: bashonly
2025-05-10 23:22:38 +00:00
bashonly
464c84fedf
[ie/amcnetworks] Fix extractor (#13147)
Authored by: bashonly
2025-05-10 23:15:12 +00:00
doe1080
7a7b85c901
[ie/niconico:live] Fix extractor (#13045)
Authored by: doe1080
2025-05-10 22:46:28 +00:00
v3DJG6GL
d880e06080
[ie/playsuisse] Improve metadata extraction (#12466)
Authored by: v3DJG6GL
2025-05-10 22:37:04 +00:00
bashonly
ded11ebc9a
[ie/youtube] Extract media_type for all videos (#13136)
Authored by: bashonly
2025-05-10 22:33:57 +00:00
diman8
ea8498ed53
[ie/SVTPage] Fix extractor (#12957)
Closes #13142
Authored by: diman8
2025-05-10 08:53:59 +00:00
bashonly
b26bc32579
[ie/nytimesarticle] Fix extraction (#13104)
Closes #13098
Authored by: bashonly
2025-05-06 20:32:41 +00:00
bashonly
f123cc83b3
[ie/wat.tv] Improve error handling (#13111)
Closes #8191
Authored by: bashonly
2025-05-05 15:03:07 +00:00
bashonly
0feec6dc13
[ie/youtube] Add web_embedded client for age-restricted videos (#13089)
Authored by: bashonly
2025-05-03 20:11:40 +00:00
bashonly
1d0f6539c4
[ie/bitchute] Fix extractor (#13081)
Closes #13080
Authored by: bashonly
2025-05-03 19:31:33 +00:00
bashonly
17cf9088d0
[build] Bump PyInstaller to v6.13.0 (#13082)
Ref: https://github.com/yt-dlp/yt-dlp/issues/10294

Authored by: bashonly
2025-05-03 17:10:31 +00:00
bashonly
9064d2482d
[build] Bump run-on-arch-action to v3 (#13088)
Authored by: bashonly
2025-05-03 17:08:24 +00:00
Abdulmohsen
8f303afb43
[ie/youtube] Fix --live-from-start support for premieres (#13079)
Closes #8543
Authored by: arabcoders
2025-05-03 15:23:28 +00:00
bashonly
5328eda882
[ie/weverse] Fix live extraction (#13084)
Closes #12883
Authored by: bashonly
2025-05-03 07:19:52 +00:00
github-actions[bot]
b77e5a553a Release 2025.04.30
Created by: bashonly

:ci skip all
2025-04-30 23:24:48 +00:00
sepro
505b400795
[cleanup] Misc (#12844)
Authored by: seproDev, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-04-30 23:01:25 +00:00
bashonly
74fc2ae12c
[ie/youtube] Do not strictly deprioritize missing_pot formats (#13061)
Deprioritization was redundant; they're already hidden behind an extractor-arg

Authored by: bashonly
2025-04-30 22:51:40 +00:00
InvalidUsernameException
7be14109a6
[ie/zdf] Fix extractors (#12779)
Closes #12647
Authored by: InvalidUsernameException, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-04-30 22:27:42 +00:00
bashonly
61c9a938b3
[ie/youtube] Cache signature timestamps (#13047)
Closes #12825
Authored by: bashonly
2025-04-30 01:15:17 +00:00
bashonly
fd8394bc50
[ie/youtube] Improve warning for SABR-only/SSAP player responses (#13049)
Ref: https://github.com/yt-dlp/yt-dlp/issues/12482

Authored by: bashonly
2025-04-30 01:13:35 +00:00
bashonly
22ac81a069
[ie/vimeo] Extract from mobile API (#13034)
Closes #12974
Authored by: bashonly
2025-04-29 16:45:54 +00:00
doe1080
25cd7c1ecb
[ie/niconico] Fix login support (#13008)
Authored by: doe1080
2025-04-28 22:42:01 +00:00
bashonly
28f04e8a5e
[ie/reddit] Support --ignore-no-formats-error (#12993)
Closes #12987
Authored by: bashonly
2025-04-28 22:31:34 +00:00
sepro
a3e91df30a
[ie/TV2DK] Fix extractor (#12945)
Closes #10334
Authored by: seproDev, bashonly

Co-authored-by: bashonly <bashonly@protonmail.com>
2025-04-29 00:21:54 +02:00
bashonly
80736b9c90
[ie/bpb] Fix formats extraction (#13015)
Closes #13011
Authored by: bashonly
2025-04-28 22:20:39 +00:00
Sergei Zharkov
1ae6bff564
[ie/twitch:clips] Fix uploader metadata extraction (#13022)
Fix 61046c31612b30c749cbdae934b7fe26abe659d7

Authored by: 1271
2025-04-28 22:19:14 +00:00
sepro
b37ff4de5b
[ie/linkedin:events] Add extractor (#12926)
Authored by: seproDev, bashonly

Co-authored-by: bashonly <bashonly@protonmail.com>
2025-04-28 22:58:30 +02:00
Simon Sawicki
3690e91265
[ci] Add file mode test to code check (#13036)
Authored by: Grub4K
2025-04-28 21:21:06 +02:00
coletdjnz
8cb08028f5
[ie/youtube] Detect and warn when account cookies are rotated (#13014)
Related: https://github.com/yt-dlp/yt-dlp/issues/8227

Authored by: coletdjnz
2025-04-27 12:16:34 +12:00
bashonly
1cf39ddf3d
[ie/twitter] Fix extraction when logged-in (#13024)
Closes #13010
Authored by: bashonly
2025-04-26 22:39:29 +00:00
bashonly
c2d6659d10
[ie/youtube] Detect player JS variants for any locale (#13003)
Authored by: bashonly
2025-04-26 22:08:34 +00:00
coletdjnz
26feac3dd1
[ie/youtube] Add context to video request rate limit error (#12958)
Related: https://github.com/yt-dlp/yt-dlp/issues/11426

Authored by: coletdjnz
2025-04-25 16:11:07 +12:00
doe1080
70599e53b7
[ie/twitter:spaces] Improve metadata extraction (#12911)
Authored by: doe1080
2025-04-25 03:42:17 +00:00
doe1080
8d127b18f8 [fd/NiconicoDmc] Remove downloader (#12916)
Authored by: doe1080
2025-04-24 15:20:25 -05:00
doe1080
7d05aa99c6 [ie/niconico] Remove DMC formats support (#12916)
Authored by: doe1080
2025-04-24 15:20:25 -05:00
bashonly
36da6360e1
[ie/mlbtv] Fix device ID caching (#12980)
Authored by: bashonly
2025-04-24 19:18:22 +00:00
bashonly
e7e3b7a55c
[ie/dacast] Support tokenized URLs (#12979)
Authored by: bashonly
2025-04-24 19:10:34 +00:00
D Trombett
dce8234624
[ie/RaiPlay] Fix DRM detection (#12971)
Closes #12969
Authored by: DTrombett
2025-04-24 18:26:35 +00:00
sepro
2381881fe5
[ie/vk] Fix uploader extraction (#12985)
Closes #12967
Authored by: seproDev
2025-04-23 14:31:20 +00:00
Sergey B (Troex Nevelin)
741fd809bc
[ie/GetCourseRu] Fix extractors (#12943)
Closes #12941
Authored by: troex
2025-04-23 00:14:42 +00:00
bashonly
34a061a295
[ie/generic] Fix MPD extraction for file:// URLs (#12978)
Fix 5086d4aed6aeb3908c62f49e2d8f74cc0cb05110
Authored by: bashonly
2025-04-23 00:06:35 +00:00
bashonly
9032f98136
[ie/cda] Fix formats extraction (#12975)
Closes #12962
Authored by: bashonly
2025-04-23 00:00:41 +00:00
bashonly
de271a06fd
[ie/twitcasting] Fix livestream extraction (#12977)
Closes #12966
Authored by: bashonly
2025-04-22 23:54:41 +00:00
bashonly
d596824c2f
[ie/vimeo] Fix API extraction (#12976)
Closes #12974
Authored by: bashonly
2025-04-22 23:47:38 +00:00
sepro
88eb1e7a9a
Add --preset-alias option (#12839)
Authored by: seproDev, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2025-04-19 22:08:34 +02:00
sepro
f5a37ea40e
[ie/loco] Fix extractor (#12934)
Closes #12930
Authored by: seproDev
2025-04-19 02:02:09 +02:00
Florentin Le Moal
f07ee91c71
[ie/rtve] Rework extractors (#10388)
Closes #1346, Closes #5756
Authored by: meGAmeS1, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-04-19 01:47:14 +02:00
fries1234
ed8ad1b4d6
[ie/tvw:tvchannels] Add extractor (#12721)
Authored by: fries1234
2025-04-19 01:35:47 +02:00
Florentin Le Moal
839d643253
[ie/AtresPlayer] Rework extractor (#11424)
Closes #996, Closes #1165
Authored by: meGAmeS1, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-04-18 22:12:31 +02:00
香芋奶茶
f5736bb35b
[ie/AbemaTV] Fix thumbnail extraction (#12859)
Closes #12858
Authored by: Kiritomo
2025-04-18 21:12:27 +02:00
sepro
9d26daa04a
[ie/panopto] Fix formats extraction (#12925)
Closes #11042
Authored by: seproDev
2025-04-18 21:09:41 +02:00
sepro
73a26f9ee6
[ie/linkedin] Support feed URLs (#12927)
Closes #6104
Authored by: seproDev
2025-04-18 21:08:13 +02:00
sepro
4e69a626cc
[ie/tvp:vod] Improve _VALID_URL (#12923)
Closes #12917
Authored by: seproDev
2025-04-18 21:05:01 +02:00
pj47x
77aa15e98f
[ie/manyvids] Fix extractor (#10907)
Closes #8268
Authored by: pj47x
2025-04-18 18:38:58 +00:00
Michał Walenciak
cb271d445b
[ie/CDAFolder] Extend _VALID_URL (#12919)
Closes #12918
Authored by: Kicer86, fireattack

Co-authored-by: fireattack <human.peng@gmail.com>
2025-04-18 18:32:38 +00:00
65 changed files with 2800 additions and 1952 deletions

View File

@@ -192,7 +192,7 @@ jobs:
 with:
 path: ./repo
 - name: Virtualized Install, Prepare & Build
-uses: yt-dlp/run-on-arch-action@v2
+uses: yt-dlp/run-on-arch-action@v3
 with:
 # Ref: https://github.com/uraimo/run-on-arch-action/issues/55
 env: |
@@ -411,7 +411,7 @@ jobs:
 run: | # Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
 python devscripts/install_deps.py -o --include build
 python devscripts/install_deps.py --include curl-cffi
-python -m pip install -U "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-6.11.1-py3-none-any.whl"
+python -m pip install -U "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-6.13.0-py3-none-any.whl"
 - name: Prepare
 run: |
@@ -460,7 +460,7 @@ jobs:
 run: |
 python devscripts/install_deps.py -o --include build
 python devscripts/install_deps.py
-python -m pip install -U "https://yt-dlp.github.io/Pyinstaller-Builds/i686/pyinstaller-6.11.1-py3-none-any.whl"
+python -m pip install -U "https://yt-dlp.github.io/Pyinstaller-Builds/i686/pyinstaller-6.13.0-py3-none-any.whl"
 - name: Prepare
 run: |

View File

@@ -6,7 +6,7 @@ on:
 - devscripts/**
 - test/**
 - yt_dlp/**.py
-- '!yt_dlp/extractor/*.py'
+- '!yt_dlp/extractor/**.py'
 - yt_dlp/extractor/__init__.py
 - yt_dlp/extractor/common.py
 - yt_dlp/extractor/extractors.py
@@ -16,7 +16,7 @@ on:
 - devscripts/**
 - test/**
 - yt_dlp/**.py
-- '!yt_dlp/extractor/*.py'
+- '!yt_dlp/extractor/**.py'
 - yt_dlp/extractor/__init__.py
 - yt_dlp/extractor/common.py
 - yt_dlp/extractor/extractors.py
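
The path-filter change in this file swaps `*.py` for `**.py`. In GitHub Actions filter patterns, `*` does not match `/` while `**` does, so the new exclusion also covers extractor modules inside subdirectories. The sketch below illustrates that difference only; `glob_to_regex` is a hypothetical helper that handles just `*` and `**` (not the full filter syntax), and the nested path is an illustrative example rather than a file named in this diff.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern containing only `*` and `**` into a regex.

    Hypothetical helper: `*` matches within one path segment,
    `**` matches across `/` as well.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            parts.append('.*')      # crosses directory separators
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')   # stays inside one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(parts))


old = glob_to_regex('yt_dlp/extractor/*.py')
new = glob_to_regex('yt_dlp/extractor/**.py')

flat = 'yt_dlp/extractor/vimeo.py'
nested = 'yt_dlp/extractor/youtube/_video.py'  # illustrative nested module

print(bool(old.fullmatch(flat)), bool(old.fullmatch(nested)))  # True False
print(bool(new.fullmatch(flat)), bool(new.fullmatch(nested)))  # True True
```

Under these assumptions, the old pattern would have let a change to a nested extractor module trigger the workflow, while the new one excludes it.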

View File

@@ -38,3 +38,5 @@ jobs:
 run: ruff check --output-format github .
 - name: Run autopep8
 run: autopep8 --diff .
+- name: Check file mode
+run: git ls-files --format="%(objectmode) %(path)" yt_dlp/ | ( ! grep -v "^100644" )
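
The new step lists each tracked file under `yt_dlp/` with its object mode, and `grep -v "^100644"` prints any line that does not start with the regular-file mode; the leading `!` inverts the exit status so the step fails when such a line exists. A rough Python equivalent of the filtering part, as a sketch only: `non_regular_files` is a hypothetical name, and the sample entries are made up for illustration.

```python
def non_regular_files(ls_files_output: str) -> list[str]:
    """Return paths whose git object mode is not 100644.

    Expects lines in the "<objectmode> <path>" layout produced by the
    `git ls-files --format` call in the workflow step.
    """
    offenders = []
    for line in ls_files_output.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        mode, _, path = line.partition(' ')
        if mode != '100644':
            offenders.append(path)
    return offenders


# Made-up sample: the second entry has the executable bit set (100755).
sample = '\n'.join([
    '100644 yt_dlp/__init__.py',
    '100755 yt_dlp/extractor/example.py',
])
print(non_regular_files(sample))  # ['yt_dlp/extractor/example.py']
```

A non-empty result corresponds to the case in which the CI step would fail.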

View File

@@ -760,3 +760,13 @@ vallovic
 arabcoders
 mireq
 mlabeeb03
+1271
+CasperMcFadden95
+Kicer86
+Kiritomo
+leeblackc
+meGAmeS1
+NeonMan
+pj47x
+troex
+WouterGordts

View File

@@ -4,6 +4,85 @@
 # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
 -->
### 2025.04.30
#### Important changes
- **New option `--preset-alias`/`-t` has been added**
This provides convenient predefined aliases for common use cases. Available presets include `mp4`, `mp3`, `mkv`, `aac`, and `sleep`. See [the README](https://github.com/yt-dlp/yt-dlp/blob/master/README.md#preset-aliases) for more details.
#### Core changes
- [Add `--preset-alias` option](https://github.com/yt-dlp/yt-dlp/commit/88eb1e7a9a2720ac89d653c0d0e40292388823bb) ([#12839](https://github.com/yt-dlp/yt-dlp/issues/12839)) by [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **utils**
- `_yield_json_ld`: [Make function less fatal](https://github.com/yt-dlp/yt-dlp/commit/45f01de00e1bc076b7f676a669736326178647b1) ([#12855](https://github.com/yt-dlp/yt-dlp/issues/12855)) by [seproDev](https://github.com/seproDev)
- `url_or_none`: [Support WebSocket URLs](https://github.com/yt-dlp/yt-dlp/commit/a473e592337edb8ca40cde52c1fcaee261c54df9) ([#12848](https://github.com/yt-dlp/yt-dlp/issues/12848)) by [doe1080](https://github.com/doe1080)
#### Extractor changes
- **abematv**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/f5736bb35bde62348caebf7b188668655e316deb) ([#12859](https://github.com/yt-dlp/yt-dlp/issues/12859)) by [Kiritomo](https://github.com/Kiritomo)
- **atresplayer**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/839d64325356310e6de6cd9cad28fb546619ca63) ([#11424](https://github.com/yt-dlp/yt-dlp/issues/11424)) by [meGAmeS1](https://github.com/meGAmeS1), [seproDev](https://github.com/seproDev)
- **bpb**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/80736b9c90818adee933a155079b8535bc06819f) ([#13015](https://github.com/yt-dlp/yt-dlp/issues/13015)) by [bashonly](https://github.com/bashonly)
- **cda**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/9032f981362ea0be90626fab51ec37934feded6d) ([#12975](https://github.com/yt-dlp/yt-dlp/issues/12975)) by [bashonly](https://github.com/bashonly)
- **cdafolder**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/cb271d445bc2d866c9a3404b1d8f59bcb77447df) ([#12919](https://github.com/yt-dlp/yt-dlp/issues/12919)) by [fireattack](https://github.com/fireattack), [Kicer86](https://github.com/Kicer86)
- **crowdbunker**: [Make format extraction non-fatal](https://github.com/yt-dlp/yt-dlp/commit/4ebf41309d04a6e196944f1c0f5f0154cff0055a) ([#12836](https://github.com/yt-dlp/yt-dlp/issues/12836)) by [seproDev](https://github.com/seproDev)
- **dacast**: [Support tokenized URLs](https://github.com/yt-dlp/yt-dlp/commit/e7e3b7a55c456da4a5a812b4fefce4dce8e6a616) ([#12979](https://github.com/yt-dlp/yt-dlp/issues/12979)) by [bashonly](https://github.com/bashonly)
- **dzen.ru**: [Rework extractors](https://github.com/yt-dlp/yt-dlp/commit/a3f2b54c2535d862de6efa9cfaa6ca9a2b2f7dd6) ([#12852](https://github.com/yt-dlp/yt-dlp/issues/12852)) by [seproDev](https://github.com/seproDev)
- **generic**: [Fix MPD extraction for `file://` URLs](https://github.com/yt-dlp/yt-dlp/commit/34a061a295d156934417c67ee98070b94943006b) ([#12978](https://github.com/yt-dlp/yt-dlp/issues/12978)) by [bashonly](https://github.com/bashonly)
- **getcourseru**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/741fd809bc4d301c19b53877692ae510334a6750) ([#12943](https://github.com/yt-dlp/yt-dlp/issues/12943)) by [troex](https://github.com/troex)
- **ivoox**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/7faa18b83dcfc74a1a1e2034e6b0369c495ca645) ([#12768](https://github.com/yt-dlp/yt-dlp/issues/12768)) by [NeonMan](https://github.com/NeonMan), [seproDev](https://github.com/seproDev)
- **kika**: [Add playlist extractor](https://github.com/yt-dlp/yt-dlp/commit/3c1c75ecb8ab352f422b59af46fff2be992e4115) ([#12832](https://github.com/yt-dlp/yt-dlp/issues/12832)) by [1100101](https://github.com/1100101)
- **linkedin**
- [Support feed URLs](https://github.com/yt-dlp/yt-dlp/commit/73a26f9ee68610e33c0b4407b77355f2ab7afd0e) ([#12927](https://github.com/yt-dlp/yt-dlp/issues/12927)) by [seproDev](https://github.com/seproDev)
- events: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/b37ff4de5baf4e4e70c6a0ec34e136a279ad20af) ([#12926](https://github.com/yt-dlp/yt-dlp/issues/12926)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- **loco**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f5a37ea40e20865b976ffeeff13eeae60292eb23) ([#12934](https://github.com/yt-dlp/yt-dlp/issues/12934)) by [seproDev](https://github.com/seproDev)
- **lrtradio**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/74e90dd9b8f9c1a5c48a2515126654f4d398d687) ([#12801](https://github.com/yt-dlp/yt-dlp/issues/12801)) by [subrat-lima](https://github.com/subrat-lima)
- **manyvids**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/77aa15e98f34c4ad425aabf39dd1ee37b48f772c) ([#10907](https://github.com/yt-dlp/yt-dlp/issues/10907)) by [pj47x](https://github.com/pj47x)
- **mixcloud**: [Refactor extractor](https://github.com/yt-dlp/yt-dlp/commit/db6d1f145ad583e0220637726029f8f2fa6200a0) ([#12830](https://github.com/yt-dlp/yt-dlp/issues/12830)) by [seproDev](https://github.com/seproDev), [WouterGordts](https://github.com/WouterGordts)
- **mlbtv**: [Fix device ID caching](https://github.com/yt-dlp/yt-dlp/commit/36da6360e130197df927ee93409519ce3f4075f5) ([#12980](https://github.com/yt-dlp/yt-dlp/issues/12980)) by [bashonly](https://github.com/bashonly)
- **niconico**
- [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/25cd7c1ecbb6cbf21dd3a6e59608e4af94715ecc) ([#13008](https://github.com/yt-dlp/yt-dlp/issues/13008)) by [doe1080](https://github.com/doe1080)
- [Remove DMC formats support](https://github.com/yt-dlp/yt-dlp/commit/7d05aa99c65352feae1cd9a3ff8784b64bfe382a) ([#12916](https://github.com/yt-dlp/yt-dlp/issues/12916)) by [doe1080](https://github.com/doe1080)
- live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1d45e30537bf83e069184a440703e4c43b2e0198) ([#12809](https://github.com/yt-dlp/yt-dlp/issues/12809)) by [Snack-X](https://github.com/Snack-X)
- **panopto**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/9d26daa04ad5108257bc5e30f7f040c7f1fe7a5a) ([#12925](https://github.com/yt-dlp/yt-dlp/issues/12925)) by [seproDev](https://github.com/seproDev)
- **parti**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/425017531fbc3369becb5a44013e26f26efabf45) ([#12769](https://github.com/yt-dlp/yt-dlp/issues/12769)) by [benfaerber](https://github.com/benfaerber)
- **raiplay**: [Fix DRM detection](https://github.com/yt-dlp/yt-dlp/commit/dce82346245e35a46fda836ca2089805d2347935) ([#12971](https://github.com/yt-dlp/yt-dlp/issues/12971)) by [DTrombett](https://github.com/DTrombett)
- **reddit**: [Support `--ignore-no-formats-error`](https://github.com/yt-dlp/yt-dlp/commit/28f04e8a5e383ff531db646190b4be45554610d6) ([#12993](https://github.com/yt-dlp/yt-dlp/issues/12993)) by [bashonly](https://github.com/bashonly)
- **royalive**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/e1847535e28788414a25546a45bebcada2f34558) ([#12817](https://github.com/yt-dlp/yt-dlp/issues/12817)) by [CasperMcFadden95](https://github.com/CasperMcFadden95)
- **rtve**: [Rework extractors](https://github.com/yt-dlp/yt-dlp/commit/f07ee91c71920ab1187a7ea756720e81aa406a9d) ([#10388](https://github.com/yt-dlp/yt-dlp/issues/10388)) by [meGAmeS1](https://github.com/meGAmeS1), [seproDev](https://github.com/seproDev)
- **rumble**: [Improve format extraction](https://github.com/yt-dlp/yt-dlp/commit/58d0c83457b93b3c9a81eb6bc5a4c65f25e949df) ([#12838](https://github.com/yt-dlp/yt-dlp/issues/12838)) by [seproDev](https://github.com/seproDev)
- **tokfmpodcast**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/91832111a12d87499294a0f430829b8c2254c339) ([#12842](https://github.com/yt-dlp/yt-dlp/issues/12842)) by [selfisekai](https://github.com/selfisekai)
- **tv2dk**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/a3e91df30a45943f40759d2c1e0b6c2ca4b2a263) ([#12945](https://github.com/yt-dlp/yt-dlp/issues/12945)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- **tvp**: vod: [Improve `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/4e69a626cce51428bc1d66dc606a56d9498b03a5) ([#12923](https://github.com/yt-dlp/yt-dlp/issues/12923)) by [seproDev](https://github.com/seproDev)
- **tvw**: tvchannels: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/ed8ad1b4d6b9d7a1426ff5192ff924f3371e4721) ([#12721](https://github.com/yt-dlp/yt-dlp/issues/12721)) by [fries1234](https://github.com/fries1234)
- **twitcasting**: [Fix livestream extraction](https://github.com/yt-dlp/yt-dlp/commit/de271a06fd6d20d4f55597ff7f90e4d913de0a52) ([#12977](https://github.com/yt-dlp/yt-dlp/issues/12977)) by [bashonly](https://github.com/bashonly)
- **twitch**: clips: [Fix uploader metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/1ae6bff564a65af41e94f1a4727892471ecdd05a) ([#13022](https://github.com/yt-dlp/yt-dlp/issues/13022)) by [1271](https://github.com/1271)
- **twitter**
- [Fix extraction when logged-in](https://github.com/yt-dlp/yt-dlp/commit/1cf39ddf3d10b6512daa7dd139e5f6c0dc548bbc) ([#13024](https://github.com/yt-dlp/yt-dlp/issues/13024)) by [bashonly](https://github.com/bashonly)
- spaces: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/70599e53b736bb75922b737e6e0d4f76e419bb20) ([#12911](https://github.com/yt-dlp/yt-dlp/issues/12911)) by [doe1080](https://github.com/doe1080)
- **vimeo**: [Extract from mobile API](https://github.com/yt-dlp/yt-dlp/commit/22ac81a0692019ac833cf282e4ef99718e9ef3fa) ([#13034](https://github.com/yt-dlp/yt-dlp/issues/13034)) by [bashonly](https://github.com/bashonly)
- **vk**
- [Fix chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/5361a7c6e2933c919716e0cb1e3116c28c40419f) ([#12821](https://github.com/yt-dlp/yt-dlp/issues/12821)) by [seproDev](https://github.com/seproDev)
- [Fix uploader extraction](https://github.com/yt-dlp/yt-dlp/commit/2381881fe58a723853350a6ab750a5efc9f10c85) ([#12985](https://github.com/yt-dlp/yt-dlp/issues/12985)) by [seproDev](https://github.com/seproDev)
- **youtube**
- [Add context to video request rate limit error](https://github.com/yt-dlp/yt-dlp/commit/26feac3dd142536ad08ad1ed731378cb88e63602) ([#12958](https://github.com/yt-dlp/yt-dlp/issues/12958)) by [coletdjnz](https://github.com/coletdjnz)
- [Add extractor arg to skip "initial_data" request](https://github.com/yt-dlp/yt-dlp/commit/ed6c6d7eefbc78fa72e4e60ad6edaa3ee2acc715) ([#12865](https://github.com/yt-dlp/yt-dlp/issues/12865)) by [leeblackc](https://github.com/leeblackc)
- [Add warning on video captcha challenge](https://github.com/yt-dlp/yt-dlp/commit/f484c51599a6cd01eb078ea7dc9bbba942967774) ([#12939](https://github.com/yt-dlp/yt-dlp/issues/12939)) by [coletdjnz](https://github.com/coletdjnz)
- [Cache signature timestamps](https://github.com/yt-dlp/yt-dlp/commit/61c9a938b390b8334ee3a879fe2d93f714e30138) ([#13047](https://github.com/yt-dlp/yt-dlp/issues/13047)) by [bashonly](https://github.com/bashonly)
- [Detect and warn when account cookies are rotated](https://github.com/yt-dlp/yt-dlp/commit/8cb08028f5be2acb9835ce1670b196b9b077052f) ([#13014](https://github.com/yt-dlp/yt-dlp/issues/13014)) by [coletdjnz](https://github.com/coletdjnz)
- [Detect player JS variants for any locale](https://github.com/yt-dlp/yt-dlp/commit/c2d6659d1069f8cff97e1fd61d1c59e949e1e63d) ([#13003](https://github.com/yt-dlp/yt-dlp/issues/13003)) by [bashonly](https://github.com/bashonly)
- [Do not strictly deprioritize `missing_pot` formats](https://github.com/yt-dlp/yt-dlp/commit/74fc2ae12c24eb6b4e02c6360c89bd05f3c8f740) ([#13061](https://github.com/yt-dlp/yt-dlp/issues/13061)) by [bashonly](https://github.com/bashonly)
- [Improve warning for SABR-only/SSAP player responses](https://github.com/yt-dlp/yt-dlp/commit/fd8394bc50301ac5e930aa65aa71ab1b8372b8ab) ([#13049](https://github.com/yt-dlp/yt-dlp/issues/13049)) by [bashonly](https://github.com/bashonly)
- tab: [Extract continuation from empty page](https://github.com/yt-dlp/yt-dlp/commit/72ba4879304c2082fecbb472e6cc05ee2d154a3b) ([#12938](https://github.com/yt-dlp/yt-dlp/issues/12938)) by [coletdjnz](https://github.com/coletdjnz)
- **zdf**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/7be14109a6bd493a2e881da4f9e30adaf3e7e5d5) ([#12779](https://github.com/yt-dlp/yt-dlp/issues/12779)) by [bashonly](https://github.com/bashonly), [InvalidUsernameException](https://github.com/InvalidUsernameException)
#### Downloader changes
- **niconicodmc**: [Remove downloader](https://github.com/yt-dlp/yt-dlp/commit/8d127b18f81131453eaba05d3bb810d9b73adb75) ([#12916](https://github.com/yt-dlp/yt-dlp/issues/12916)) by [doe1080](https://github.com/doe1080)
#### Networking changes
- [Add PATCH request shortcut](https://github.com/yt-dlp/yt-dlp/commit/ceab4d5ed63a1f135a1816fe967c9d9a1ec7e6e8) ([#12884](https://github.com/yt-dlp/yt-dlp/issues/12884)) by [doe1080](https://github.com/doe1080)
#### Misc. changes
- **ci**: [Add file mode test to code check](https://github.com/yt-dlp/yt-dlp/commit/3690e91265d1d0bbeffaf6a9b8cc9baded1367bd) ([#13036](https://github.com/yt-dlp/yt-dlp/issues/13036)) by [Grub4K](https://github.com/Grub4K)
- **cleanup**: Miscellaneous: [505b400](https://github.com/yt-dlp/yt-dlp/commit/505b400795af557bdcfd9d4fa7e9133b26ef431c) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
 ### 2025.03.31
 #### Core changes

View File

@@ -386,6 +386,12 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
 recursive options. As a safety measure, each
 alias may be triggered a maximum of 100
 times. This option can be used multiple times
+-t, --preset-alias PRESET Applies a predefined set of options. e.g.
+--preset-alias mp3. The following presets
+are available: mp3, aac, mp4, mkv, sleep.
+See the "Preset Aliases" section at the end
+for more info. This option can be used
+multiple times
 ## Network Options:
 --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. To
@@ -1098,6 +1104,23 @@ Make chapter entries for, or remove various segments (sponsor,
 can use this option multiple times to give
 arguments for different extractors
+## Preset Aliases:
+-t mp3 -f 'ba[acodec^=mp3]/ba/b' -x --audio-format
+mp3
+-t aac -f
+'ba[acodec^=aac]/ba[acodec^=mp4a.40.]/ba/b'
+-x --audio-format aac
+-t mp4 --merge-output-format mp4 --remux-video mp4
+-S vcodec:h264,lang,quality,res,fps,hdr:12,a
+codec:aac
+-t mkv --merge-output-format mkv --remux-video mkv
+-t sleep --sleep-subtitles 5 --sleep-requests 0.75
+--sleep-interval 10 --max-sleep-interval 20
 # CONFIGURATION
 You can configure yt-dlp by placing any supported command line option in a configuration file. The configuration is loaded from the following locations:
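
The "Preset Aliases" table added in this hunk maps each `-t` name to a fixed set of options. One way to picture that mapping, as a sketch and not yt-dlp's own implementation: the option strings are copied from the table (with the wrapped `acodec:aac` rejoined), while `PRESETS` and `expand_preset` are hypothetical names used only for illustration.

```python
import shlex

# Option strings copied from the "Preset Aliases" table in the README diff.
PRESETS = {
    'mp3': "-f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3",
    'aac': "-f 'ba[acodec^=aac]/ba[acodec^=mp4a.40.]/ba/b' -x --audio-format aac",
    'mp4': '--merge-output-format mp4 --remux-video mp4 '
           '-S vcodec:h264,lang,quality,res,fps,hdr:12,acodec:aac',
    'mkv': '--merge-output-format mkv --remux-video mkv',
    'sleep': '--sleep-subtitles 5 --sleep-requests 0.75 '
             '--sleep-interval 10 --max-sleep-interval 20',
}


def expand_preset(name: str) -> list[str]:
    """Return the argument list a preset stands for (hypothetical helper)."""
    return shlex.split(PRESETS[name])


print(expand_preset('mkv'))
# ['--merge-output-format', 'mkv', '--remux-video', 'mkv']
print(expand_preset('mp3'))
# ['-f', 'ba[acodec^=mp3]/ba/b', '-x', '--audio-format', 'mp3']
```

Read this way, `-t mkv` is shorthand for passing the two listed options explicitly.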
@@ -1769,7 +1792,7 @@ The following extractors use this feature:
 #### youtube
 * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
 * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
-* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
+* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
 * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
 * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
 * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1781,15 +1804,12 @@ The following extractors use this feature:
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage` * `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID) * `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual` * `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
##### PO Token settings
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request) * `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
* `pot_log_level`: PO Token provider log level. One of `TRACE`, `DEBUG`, `INFO`, `WARNING`, `ERROR`. Default is `DEBUG` if `-v` is used, otherwise `INFO` * `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default)
* `fetch_pot`: Policy to use for fetching a PO Token from providers. `always` to always try to fetch a PO Token regardless of whether the client requires one for the given context. `when_required` to only fetch a PO Token if the client requires one for the given context (default) * `fetch_pot`: Policy to use for fetching a PO Token from providers. `always` to always try to fetch a PO Token regardless of whether the client requires one for the given context, `never` to never fetch a PO Token, or `auto` to only fetch a PO Token if the client requires one for the given context (default)
###### youtubepot-webpo #### youtubepot-webpo
* `bind_to_visitor_id`: Whether to use the Visitor ID instead of Visitor Data for caching WebPO tokens. Either `true` or `false` (default `true`) * `bind_to_visitor_id`: Whether to use the Visitor ID instead of Visitor Data for caching WebPO tokens. Either `true` or `false` (default `true`)
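The PO Token settings above can be passed either on the command line or through the Python API. The following is a minimal sketch, not part of this change: the helper `to_cli_extractor_args` is hypothetical, and the keys `fetch_pot` and `pot_trace` are taken from the new side of this diff, so they may differ in a released version.

```python
# Hypothetical helper (not part of yt-dlp): render an extractor_args mapping
# in the `IE_KEY:key=value1,value2;key2=value` form used by --extractor-args.
def to_cli_extractor_args(extractor_args):
    rendered = []
    for ie_key, args in extractor_args.items():
        pairs = ';'.join(f'{key}={",".join(values)}' for key, values in args.items())
        rendered.append(f'{ie_key}:{pairs}')
    return rendered


# Shape of the `extractor_args` option of yt_dlp.YoutubeDL:
# extractor key -> argument name -> list of string values.
ydl_opts = {
    'extractor_args': {
        'youtube': {
            'player_client': ['default', '-ios'],
            'fetch_pot': ['always'],   # assumed value, per the new side of the diff
            'pot_trace': ['true'],     # assumed value, per the new side of the diff
        },
    },
}

cli_args = to_cli_extractor_args(ydl_opts['extractor_args'])
print(cli_args)
```

Each resulting string could be supplied as `yt-dlp --extractor-args "<string>" URL`, while `ydl_opts` itself could be handed to `yt_dlp.YoutubeDL(ydl_opts)`.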
#### youtubetab (YouTube playlists, channels, feeds, etc.) #### youtubetab (YouTube playlists, channels, feeds, etc.)
@ -1807,9 +1827,6 @@ The following extractors use this feature:
#### vikichannel #### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers` * `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
#### niconico
* `segment_duration`: Segment duration in milliseconds for HLS-DMC formats. Use it at your own risk since this feature **may result in your account termination.**
#### youtubewebarchive #### youtubewebarchive
* `check_all`: Try to check more at the cost of more requests. One or more of `thumbnails`, `captures` * `check_all`: Try to check more at the cost of more requests. One or more of `thumbnails`, `captures`
@ -2161,7 +2178,7 @@ with yt_dlp.YoutubeDL(ydl_opts) as ydl:
* **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will be now preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection than what is possible by simply using `--format` ([examples](#format-selection-examples)) * **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will be now preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection than what is possible by simply using `--format` ([examples](#format-selection-examples))
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. Note that NicoNico livestreams are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details. * **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
* **YouTube improvements**: * **YouTube improvements**:
* Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefixes (`ytsearch:`, `ytsearchdate:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`) * Supports Clips, Stories (`ytstories:<channel UCID>`), Search (including filters)**\***, YouTube Music Search, Channel-specific search, Search prefixes (`ytsearch:`, `ytsearchdate:`)**\***, Mixes, and Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`, `:ytnotif`)

View File

@ -245,5 +245,14 @@
"when": "76ac023ff02f06e8c003d104f02a03deeddebdcd", "when": "76ac023ff02f06e8c003d104f02a03deeddebdcd",
"short": "[ie/youtube:tab] Improve shorts title extraction (#11997)", "short": "[ie/youtube:tab] Improve shorts title extraction (#11997)",
"authors": ["bashonly", "d3d9"] "authors": ["bashonly", "d3d9"]
},
{
"action": "add",
"when": "88eb1e7a9a2720ac89d653c0d0e40292388823bb",
"short": "[priority] **New option `--preset-alias`/`-t` has been added**\nThis provides convenient predefined aliases for common use cases. Available presets include `mp4`, `mp3`, `mkv`, `aac`, and `sleep`. See [the README](https://github.com/yt-dlp/yt-dlp/blob/master/README.md#preset-aliases) for more details."
},
{
"action": "remove",
"when": "d596824c2f8428362c072518856065070616e348"
} }
] ]

View File

@ -82,7 +82,7 @@ test = [
"pytest-rerunfailures~=14.0", "pytest-rerunfailures~=14.0",
] ]
pyinstaller = [ pyinstaller = [
"pyinstaller>=6.11.1", # Windows temp cleanup fixed in 6.11.1 "pyinstaller>=6.13.0", # Windows temp cleanup fixed in 6.13.0
] ]
[project.urls] [project.urls]

View File

@ -394,6 +394,8 @@ The only reliable way to check if a site is supported is to try it.
- **dvtv**: http://video.aktualne.cz/ - **dvtv**: http://video.aktualne.cz/
- **dw**: (**Currently broken**) - **dw**: (**Currently broken**)
- **dw:article**: (**Currently broken**) - **dw:article**: (**Currently broken**)
- **dzen.ru**: Дзен (dzen) formerly Яндекс.Дзен (Yandex Zen)
- **dzen.ru:channel**
- **EaglePlatform** - **EaglePlatform**
- **EbaumsWorld** - **EbaumsWorld**
- **Ebay** - **Ebay**
@ -634,6 +636,7 @@ The only reliable way to check if a site is supported is to try it.
- **ivi**: ivi.ru - **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV - **ivideon**: Ivideon TV
- **Ivoox**
- **IVXPlayer** - **IVXPlayer**
- **iwara**: [*iwara*](## "netrc machine") - **iwara**: [*iwara*](## "netrc machine")
- **iwara:playlist**: [*iwara*](## "netrc machine") - **iwara:playlist**: [*iwara*](## "netrc machine")
@ -671,6 +674,7 @@ The only reliable way to check if a site is supported is to try it.
- **Kicker** - **Kicker**
- **KickStarter** - **KickStarter**
- **Kika**: KiKA.de - **Kika**: KiKA.de
- **KikaPlaylist**
- **kinja:embed** - **kinja:embed**
- **KinoPoisk** - **KinoPoisk**
- **Kommunetv** - **Kommunetv**
@ -723,6 +727,7 @@ The only reliable way to check if a site is supported is to try it.
- **limelight:channel** - **limelight:channel**
- **limelight:channel_list** - **limelight:channel_list**
- **LinkedIn**: [*linkedin*](## "netrc machine") - **LinkedIn**: [*linkedin*](## "netrc machine")
- **linkedin:events**: [*linkedin*](## "netrc machine")
- **linkedin:learning**: [*linkedin*](## "netrc machine") - **linkedin:learning**: [*linkedin*](## "netrc machine")
- **linkedin:learning:course**: [*linkedin*](## "netrc machine") - **linkedin:learning:course**: [*linkedin*](## "netrc machine")
- **Liputan6** - **Liputan6**
@ -738,6 +743,7 @@ The only reliable way to check if a site is supported is to try it.
- **loom** - **loom**
- **loom:folder** - **loom:folder**
- **LoveHomePorn** - **LoveHomePorn**
- **LRTRadio**
- **LRTStream** - **LRTStream**
- **LRTVOD** - **LRTVOD**
- **LSMLREmbed** - **LSMLREmbed**
@ -759,7 +765,7 @@ The only reliable way to check if a site is supported is to try it.
- **ManotoTV**: Manoto TV (Episode) - **ManotoTV**: Manoto TV (Episode)
- **ManotoTVLive**: Manoto TV (Live) - **ManotoTVLive**: Manoto TV (Live)
- **ManotoTVShow**: Manoto TV (Show) - **ManotoTVShow**: Manoto TV (Show)
- **ManyVids**: (**Currently broken**) - **ManyVids**
- **MaoriTV** - **MaoriTV**
- **Markiza**: (**Currently broken**) - **Markiza**: (**Currently broken**)
- **MarkizaPage**: (**Currently broken**) - **MarkizaPage**: (**Currently broken**)
@ -946,7 +952,7 @@ The only reliable way to check if a site is supported is to try it.
- **nickelodeonru** - **nickelodeonru**
- **niconico**: [*niconico*](## "netrc machine") ニコニコ動画 - **niconico**: [*niconico*](## "netrc machine") ニコニコ動画
- **niconico:history**: NicoNico user history or likes. Requires cookies. - **niconico:history**: NicoNico user history or likes. Requires cookies.
- **niconico:live**: ニコニコ生放送 - **niconico:live**: [*niconico*](## "netrc machine") ニコニコ生放送
- **niconico:playlist** - **niconico:playlist**
- **niconico:series** - **niconico:series**
- **niconico:tag**: NicoNico video tag URLs - **niconico:tag**: NicoNico video tag URLs
@ -1053,6 +1059,8 @@ The only reliable way to check if a site is supported is to try it.
- **Parler**: Posts on parler.com - **Parler**: Posts on parler.com
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
- **Parlview**: (**Currently broken**) - **Parlview**: (**Currently broken**)
- **parti:livestream**
- **parti:video**
- **patreon** - **patreon**
- **patreon:campaign** - **patreon:campaign**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC) - **pbs**: Public 
Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
@ -1227,6 +1235,7 @@ The only reliable way to check if a site is supported is to try it.
- **RoosterTeeth**: [*roosterteeth*](## "netrc machine") - **RoosterTeeth**: [*roosterteeth*](## "netrc machine")
- **RoosterTeethSeries**: [*roosterteeth*](## "netrc machine") - **RoosterTeethSeries**: [*roosterteeth*](## "netrc machine")
- **RottenTomatoes** - **RottenTomatoes**
- **RoyaLive**
- **Rozhlas** - **Rozhlas**
- **RozhlasVltava** - **RozhlasVltava**
- **RTBF**: [*rtbf*](## "netrc machine") (**Currently broken**) - **RTBF**: [*rtbf*](## "netrc machine") (**Currently broken**)
@ -1247,9 +1256,8 @@ The only reliable way to check if a site is supported is to try it.
- **RTVCKaltura** - **RTVCKaltura**
- **RTVCPlay** - **RTVCPlay**
- **RTVCPlayEmbed** - **RTVCPlayEmbed**
- **rtve.es:alacarta**: RTVE a la carta - **rtve.es:alacarta**: RTVE a la carta and Play
- **rtve.es:audio**: RTVE audio - **rtve.es:audio**: RTVE audio
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams - **rtve.es:live**: RTVE.es live streams
- **rtve.es:television** - **rtve.es:television**
- **rtvslo.si** - **rtvslo.si**
@ -1562,7 +1570,8 @@ The only reliable way to check if a site is supported is to try it.
- **tvp:vod:series** - **tvp:vod:series**
- **TVPlayer** - **TVPlayer**
- **TVPlayHome** - **TVPlayHome**
- **Tvw** - **tvw**
- **tvw:tvchannels**
- **Tweakers** - **Tweakers**
- **TwitCasting** - **TwitCasting**
- **TwitCastingLive** - **TwitCastingLive**
@ -1821,14 +1830,12 @@ The only reliable way to check if a site is supported is to try it.
- **ZattooLive**: [*zattoo*](## "netrc machine") - **ZattooLive**: [*zattoo*](## "netrc machine")
- **ZattooMovies**: [*zattoo*](## "netrc machine") - **ZattooMovies**: [*zattoo*](## "netrc machine")
- **ZattooRecordings**: [*zattoo*](## "netrc machine") - **ZattooRecordings**: [*zattoo*](## "netrc machine")
- **ZDF** - **zdf**
- **ZDFChannel** - **zdf:channel**
- **Zee5**: [*zee5*](## "netrc machine") - **Zee5**: [*zee5*](## "netrc machine")
- **zee5:series** - **zee5:series**
- **ZeeNews**: (**Currently broken**) - **ZeeNews**: (**Currently broken**)
- **ZenPorn** - **ZenPorn**
- **ZenYandex**
- **ZenYandexChannel**
- **ZetlandDKArticle** - **ZetlandDKArticle**
- **Zhihu** - **Zhihu**
- **zingmp3**: zingmp3.vn - **zingmp3**: zingmp3.vn

View File

@ -30,7 +30,7 @@ from .hls import HlsFD
from .http import HttpFD from .http import HttpFD
from .ism import IsmFD from .ism import IsmFD
from .mhtml import MhtmlFD from .mhtml import MhtmlFD
from .niconico import NiconicoDmcFD, NiconicoLiveFD from .niconico import NiconicoLiveFD
from .rtmp import RtmpFD from .rtmp import RtmpFD
from .rtsp import RtspFD from .rtsp import RtspFD
from .websocket import WebSocketFragmentFD from .websocket import WebSocketFragmentFD
@ -50,7 +50,6 @@ PROTOCOL_MAP = {
'http_dash_segments_generator': DashSegmentsFD, 'http_dash_segments_generator': DashSegmentsFD,
'ism': IsmFD, 'ism': IsmFD,
'mhtml': MhtmlFD, 'mhtml': MhtmlFD,
'niconico_dmc': NiconicoDmcFD,
'niconico_live': NiconicoLiveFD, 'niconico_live': NiconicoLiveFD,
'fc2_live': FC2LiveFD, 'fc2_live': FC2LiveFD,
'websocket_frag': WebSocketFragmentFD, 'websocket_frag': WebSocketFragmentFD,
@ -67,7 +66,6 @@ def shorten_protocol_name(proto, simplify=False):
'rtmp_ffmpeg': 'rtmpF', 'rtmp_ffmpeg': 'rtmpF',
'http_dash_segments': 'dash', 'http_dash_segments': 'dash',
'http_dash_segments_generator': 'dashG', 'http_dash_segments_generator': 'dashG',
'niconico_dmc': 'dmc',
'websocket_frag': 'WSfrag', 'websocket_frag': 'WSfrag',
} }
if simplify: if simplify:

View File

@ -2,60 +2,12 @@ import json
import threading import threading
import time import time
from . import get_suitable_downloader
from .common import FileDownloader from .common import FileDownloader
from .external import FFmpegFD from .external import FFmpegFD
from ..networking import Request from ..networking import Request
from ..utils import DownloadError, str_or_none, try_get from ..utils import DownloadError, str_or_none, try_get
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
def real_download(self, filename, info_dict):
from ..extractor.niconico import NiconicoIE
self.to_screen(f'[{self.FD_NAME}] Downloading from DMC')
ie = NiconicoIE(self.ydl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = get_suitable_downloader(info_dict, params=self.params)(self.ydl, self.params)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
request = Request(heartbeat_url, heartbeat_data)
def heartbeat():
try:
self.ydl.urlopen(request).read()
except Exception:
self.to_screen(f'[{self.FD_NAME}] Heartbeat failed')
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success
class NiconicoLiveFD(FileDownloader): class NiconicoLiveFD(FileDownloader):
""" Downloads niconico live without being stopped """ """ Downloads niconico live without being stopped """

View File

@ -338,7 +338,6 @@ from .canalc2 import Canalc2IE
from .canalplus import CanalplusIE from .canalplus import CanalplusIE
from .canalsurmas import CanalsurmasIE from .canalsurmas import CanalsurmasIE
from .caracoltv import CaracolTvPlayIE from .caracoltv import CaracolTvPlayIE
from .cartoonnetwork import CartoonNetworkIE
from .cbc import ( from .cbc import (
CBCIE, CBCIE,
CBCGemIE, CBCGemIE,
@ -929,7 +928,10 @@ from .jiocinema import (
) )
from .jiosaavn import ( from .jiosaavn import (
JioSaavnAlbumIE, JioSaavnAlbumIE,
JioSaavnArtistIE,
JioSaavnPlaylistIE, JioSaavnPlaylistIE,
JioSaavnShowIE,
JioSaavnShowPlaylistIE,
JioSaavnSongIE, JioSaavnSongIE,
) )
from .joj import JojIE from .joj import JojIE
@ -1042,6 +1044,7 @@ from .limelight import (
LimelightMediaIE, LimelightMediaIE,
) )
from .linkedin import ( from .linkedin import (
LinkedInEventsIE,
LinkedInIE, LinkedInIE,
LinkedInLearningCourseIE, LinkedInLearningCourseIE,
LinkedInLearningIE, LinkedInLearningIE,
@ -1783,7 +1786,6 @@ from .rtvcplay import (
from .rtve import ( from .rtve import (
RTVEALaCartaIE, RTVEALaCartaIE,
RTVEAudioIE, RTVEAudioIE,
RTVEInfantilIE,
RTVELiveIE, RTVELiveIE,
RTVETelevisionIE, RTVETelevisionIE,
) )
@ -1964,7 +1966,6 @@ from .spreaker import (
SpreakerShowIE, SpreakerShowIE,
) )
from .springboardplatform import SpringboardPlatformIE from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
from .sproutvideo import ( from .sproutvideo import (
SproutVideoIE, SproutVideoIE,
VidsIoIE, VidsIoIE,
@ -2237,7 +2238,10 @@ from .tvplay import (
TVPlayIE, TVPlayIE,
) )
from .tvplayer import TVPlayerIE from .tvplayer import TVPlayerIE
from .tvw import TvwIE from .tvw import (
TvwIE,
TvwTvChannelsIE,
)
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE from .twentythreevideo import TwentyThreeVideoIE

View File

@ -21,6 +21,7 @@ from ..utils import (
int_or_none, int_or_none,
time_seconds, time_seconds,
traverse_obj, traverse_obj,
update_url,
update_url_query, update_url_query,
) )
@ -417,6 +418,10 @@ class AbemaTVIE(AbemaTVBaseIE):
'is_live': is_live, 'is_live': is_live,
'availability': availability, 'availability': availability,
}) })
if thumbnail := update_url(self._og_search_thumbnail(webpage, default=''), query=None):
info['thumbnails'] = [{'url': thumbnail}]
return info return info
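The thumbnail hunk above appears to drop the query string from the `og:image` URL before storing it, and to skip the field entirely when no URL is found. A standard-library sketch of that idea follows. It assumes that `update_url(..., query=None)` removes the query component; `strip_query` and `build_thumbnails` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def strip_query(url):
    """Return url without its query string; an empty input stays empty."""
    return urlsplit(url)._replace(query='').geturl()


def build_thumbnails(og_thumbnail):
    # Mirrors the walrus pattern in the hunk: only set the field
    # when a non-empty URL remains after stripping the query.
    info = {}
    if thumbnail := strip_query(og_thumbnail):
        info['thumbnails'] = [{'url': thumbnail}]
    return info


print(build_thumbnails('https://example.com/img/thumb.jpg?w=100&h=50'))
print(build_thumbnails(''))
```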

View File

@ -1,32 +1,24 @@
import re from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from .theplatform import ThePlatformIE from ..utils.traversal import traverse_obj
from ..utils import (
int_or_none,
parse_age_limit,
try_get,
update_url_query,
)
class AMCNetworksIE(ThePlatformIE): # XXX: Do not subclass from concrete IE class AMCNetworksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site>amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/]+)+)/[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/?#]+)+)/[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.bbcamerica.com/shows/the-graham-norton-show/videos/tina-feys-adorable-airline-themed-family-dinner--51631', 'url': 'https://www.amc.com/shows/dark-winds/videos/dark-winds-a-look-at-season-3--1072027',
'info_dict': { 'info_dict': {
'id': '4Lq1dzOnZGt0', 'id': '6369261343112',
'ext': 'mp4', 'ext': 'mp4',
'title': "The Graham Norton Show - Season 28 - Tina Fey's Adorable Airline-Themed Family Dinner", 'title': 'Dark Winds: A Look at Season 3',
'description': "It turns out child stewardesses are very generous with the wine! All-new episodes of 'The Graham Norton Show' premiere Fridays at 11/10c on BBC America.", 'uploader_id': '6240731308001',
'upload_date': '20201120', 'duration': 176.427,
'timestamp': 1605904350, 'thumbnail': r're:https://[^/]+\.boltdns\.net/.+/image\.jpg',
'uploader': 'AMCN', 'tags': [],
'timestamp': 1740414792,
'upload_date': '20250224',
}, },
'params': { 'params': {'skip_download': 'm3u8'},
# m3u8 download
'skip_download': True,
},
'skip': '404 Not Found',
}, { }, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge', 'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True, 'only_matching': True,
@ -52,96 +44,18 @@ class AMCNetworksIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1', 'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True, 'only_matching': True,
}] }]
_REQUESTOR_ID_MAP = {
'amc': 'AMC',
'bbcamerica': 'BBCA',
'ifc': 'IFC',
'sundancetv': 'SUNDANCE',
'wetv': 'WETV',
}
def _real_extract(self, url): def _real_extract(self, url):
site, display_id = self._match_valid_url(url).groups() display_id = self._match_id(url)
requestor_id = self._REQUESTOR_ID_MAP[site] webpage = self._download_webpage(url, display_id)
page_data = self._download_json( initial_data = self._search_json(
f'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/{requestor_id.lower()}/url/{display_id}', r'window\.initialData\s*=\s*JSON\.parse\(String\.raw`', webpage, 'initial data', display_id)
display_id)['data'] video_id = traverse_obj(initial_data, ('initialData', 'properties', 'videoId', {str}))
properties = page_data.get('properties') or {} if not video_id: # All locked videos are now DRM-protected
query = { self.report_drm(display_id)
'mbr': 'true', account_id = initial_data['config']['brightcove']['accountId']
'manifest': 'm3u', player_id = initial_data['config']['brightcove']['playerId']
}
video_player_count = 0 return self.url_result(
try: f'https://players.brightcove.net/{account_id}/{player_id}_default/index.html?videoId={video_id}',
for v in page_data['children']: BrightcoveNewIE, video_id)
if v.get('type') == 'video-player':
release_pid = v['properties']['currentVideo']['meta']['releasePid']
tp_path = 'M_UwQC/' + release_pid
media_url = 'https://link.theplatform.com/s/' + tp_path
video_player_count += 1
except KeyError:
pass
if video_player_count > 1:
self.report_warning(
f'The JSON data has {video_player_count} video players. Only one will be extracted')
# Fall back to videoPid if releasePid not found.
# TODO: Fall back to videoPid if releasePid manifest uses DRM.
if not video_player_count:
tp_path = 'M_UwQC/media/' + properties['videoPid']
media_url = 'https://link.theplatform.com/s/' + tp_path
theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating'])
video_category = properties.get('videoCategory')
if video_category and video_category.endswith('-Auth'):
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
thumbnails = []
thumbnail_urls = [properties.get('imageDesktop')]
if 'thumbnail' in info:
thumbnail_urls.append(info.pop('thumbnail'))
for thumbnail_url in thumbnail_urls:
if not thumbnail_url:
continue
mobj = re.search(r'(\d+)x(\d+)', thumbnail_url)
thumbnails.append({
'url': thumbnail_url,
'width': int(mobj.group(1)) if mobj else None,
'height': int(mobj.group(2)) if mobj else None,
})
info.update({
'age_limit': parse_age_limit(rating),
'formats': formats,
'id': video_id,
'subtitles': subtitles,
'thumbnails': thumbnails,
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = next(iter(ns_keys))
episode = theplatform_metadata.get(ns + '$episodeTitle') or None
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
series = theplatform_metadata.get(ns + '$show') or None
info.update({
'episode': episode,
'episode_number': episode_number,
'season_number': season_number,
'series': series,
})
return info
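
Note on the thumbnail handling above: the width and height are read straight from a `WIDTHxHEIGHT` fragment in the thumbnail URL, and missing URLs are skipped. A minimal standalone sketch of that step follows. The helper name `build_thumbnails` and the example.com URLs are illustrative assumptions and are not part of the extractor.

```python
import re


def build_thumbnails(thumbnail_urls):
    """Build thumbnail dicts, reading WIDTHxHEIGHT from each URL when present.

    Hypothetical helper: it mirrors the loop in the diff but is not extractor code.
    """
    thumbnails = []
    for thumbnail_url in thumbnail_urls:
        if not thumbnail_url:
            continue  # skip missing entries, e.g. an absent imageDesktop value
        mobj = re.search(r'(\d+)x(\d+)', thumbnail_url)
        thumbnails.append({
            'url': thumbnail_url,
            'width': int(mobj.group(1)) if mobj else None,
            'height': int(mobj.group(2)) if mobj else None,
        })
    return thumbnails
```

URLs without a size fragment still produce an entry, with `width` and `height` left as `None`.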

@ -1,64 +1,105 @@
import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..networking.exceptions import HTTPError from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
parse_age_limit,
url_or_none,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import traverse_obj
class AtresPlayerIE(InfoExtractor): class AtresPlayerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?atresplayer\.com/[^/]+/[^/]+/[^/]+/[^/]+/(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})' _VALID_URL = r'https?://(?:www\.)?atresplayer\.com/(?:[^/?#]+/){4}(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})'
_NETRC_MACHINE = 'atresplayer' _NETRC_MACHINE = 'atresplayer'
_TESTS = [ _TESTS = [{
{ 'url': 'https://www.atresplayer.com/lasexta/programas/el-objetivo/clips/mbappe-describe-como-entrenador-a-carlo-ancelotti-sabe-cuando-tiene-que-ser-padre-jefe-amigo-entrenador_67f2dfb2fb6ab0e4c7203849/',
'url': 'https://www.atresplayer.com/antena3/series/pequenas-coincidencias/temporada-1/capitulo-7-asuntos-pendientes_5d4aa2c57ed1a88fc715a615/', 'info_dict': {
'info_dict': { 'ext': 'mp4',
'id': '5d4aa2c57ed1a88fc715a615', 'id': '67f2dfb2fb6ab0e4c7203849',
'ext': 'mp4', 'display_id': 'md5:c203f8d4e425ed115ba56a1c6e4b3e6c',
'title': 'Capítulo 7: Asuntos pendientes', 'title': 'Mbappé describe como entrenador a Carlo Ancelotti: "Sabe cuándo tiene que ser padre, jefe, amigo, entrenador..."',
'description': 'md5:7634cdcb4d50d5381bedf93efb537fbc', 'channel': 'laSexta',
'duration': 3413, 'duration': 31,
}, 'thumbnail': 'https://imagenes.atresplayer.com/atp/clipping/cmsimages02/2025/04/06/B02DBE1E-D59B-4683-8404-1A9595D15269/1920x1080.jpg',
'skip': 'This video is only available for registered users', 'tags': ['Entrevista informativa', 'Actualidad', 'Debate informativo', 'Política', 'Economía', 'Sociedad', 'Cara a cara', 'Análisis', 'Más periodismo'],
'series': 'El Objetivo',
'season': 'Temporada 12',
'timestamp': 1743970079,
'upload_date': '20250406',
}, },
{ }, {
'url': 'https://www.atresplayer.com/lasexta/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_5ad08edf986b2855ed47adc4/', 'url': 'https://www.atresplayer.com/antena3/programas/el-hormiguero/clips/revive-la-entrevista-completa-a-miguel-bose-en-el-hormiguero_67f836baa4a5b0e4147ca59a/',
'only_matching': True, 'info_dict': {
'ext': 'mp4',
'id': '67f836baa4a5b0e4147ca59a',
'display_id': 'revive-la-entrevista-completa-a-miguel-bose-en-el-hormiguero',
'title': 'Revive la entrevista completa a Miguel Bosé en El Hormiguero',
'description': 'md5:c6d2b591408d45a7bc2986dfb938eb72',
'channel': 'Antena 3',
'duration': 2556,
'thumbnail': 'https://imagenes.atresplayer.com/atp/clipping/cmsimages02/2025/04/10/9076395F-F1FD-48BE-9F18-540DBA10EBAD/1920x1080.jpg',
'tags': ['Entrevista', 'Variedades', 'Humor', 'Entretenimiento', 'Te sigo', 'Buen rollo', 'Cara a cara'],
'series': 'El Hormiguero ',
'season': 'Temporada 14',
'timestamp': 1744320111,
'upload_date': '20250410',
}, },
{ }, {
'url': 'https://www.atresplayer.com/antena3/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_5ad51046986b2886722ccdea/', 'url': 'https://www.atresplayer.com/flooxer/series/biara-proyecto-lazarus/temporada-1/capitulo-3-supervivientes_67a6038b64ceca00070f4f69/',
'only_matching': True, 'info_dict': {
'ext': 'mp4',
'id': '67a6038b64ceca00070f4f69',
'display_id': 'capitulo-3-supervivientes',
'title': 'Capítulo 3: Supervivientes',
'description': 'md5:65b231f20302f776c2b0dd24594599a1',
'channel': 'Flooxer',
'duration': 1196,
'thumbnail': 'https://imagenes.atresplayer.com/atp/clipping/cmsimages01/2025/02/14/17CF90D3-FE67-40C5-A941-7825B3E13992/1920x1080.jpg',
'tags': ['Juvenil', 'Terror', 'Piel de gallina', 'Te sigo', 'Un break', 'Del tirón'],
'series': 'BIARA: Proyecto Lázarus',
'season': 'Temporada 1',
'season_number': 1,
'episode': 'Episode 3',
'episode_number': 3,
'timestamp': 1743095191,
'upload_date': '20250327',
}, },
] }, {
'url': 'https://www.atresplayer.com/lasexta/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_5ad08edf986b2855ed47adc4/',
'only_matching': True,
}, {
'url': 'https://www.atresplayer.com/antena3/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_5ad51046986b2886722ccdea/',
'only_matching': True,
}]
_API_BASE = 'https://api.atresplayer.com/' _API_BASE = 'https://api.atresplayer.com/'
def _perform_login(self, username, password): def _perform_login(self, username, password):
self._request_webpage(
self._API_BASE + 'login', None, 'Downloading login page')
try: try:
target_url = self._download_json( self._download_webpage(
'https://account.atresmedia.com/api/login', None, 'https://account.atresplayer.com/auth/v1/login', None,
'Logging in', headers={ 'Logging in', 'Failed to log in', data=urlencode_postdata({
'Content-Type': 'application/x-www-form-urlencoded',
}, data=urlencode_postdata({
'username': username, 'username': username,
'password': password, 'password': password,
}))['targetUrl'] }))
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 400: if isinstance(e.cause, HTTPError) and e.cause.status == 400:
raise ExtractorError('Invalid username and/or password', expected=True) raise ExtractorError('Invalid username and/or password', expected=True)
raise raise
self._request_webpage(target_url, None, 'Following Target URL')
def _real_extract(self, url): def _real_extract(self, url):
display_id, video_id = self._match_valid_url(url).groups() display_id, video_id = self._match_valid_url(url).groups()
metadata_url = self._download_json(
self._API_BASE + 'client/v1/url', video_id, 'Downloading API endpoint data',
query={'href': urllib.parse.urlparse(url).path})['href']
metadata = self._download_json(metadata_url, video_id)
try: try:
episode = self._download_json( video_data = self._download_json(metadata['urlVideo'], video_id, 'Downloading video data')
self._API_BASE + 'client/v1/player/episode/' + video_id, video_id)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403: if isinstance(e.cause, HTTPError) and e.cause.status == 403:
error = self._parse_json(e.cause.response.read(), None) error = self._parse_json(e.cause.response.read(), None)
@ -67,37 +108,45 @@ class AtresPlayerIE(InfoExtractor):
raise ExtractorError(error['error_description'], expected=True) raise ExtractorError(error['error_description'], expected=True)
raise raise
title = episode['titulo']
formats = [] formats = []
subtitles = {} subtitles = {}
for source in episode.get('sources', []): for source in traverse_obj(video_data, ('sources', lambda _, v: url_or_none(v['src']))):
src = source.get('src') src_url = source['src']
if not src:
continue
src_type = source.get('type') src_type = source.get('type')
if src_type == 'application/vnd.apple.mpegurl': if src_type in ('application/vnd.apple.mpegurl', 'application/hls+legacy', 'application/hls+hevc'):
formats, subtitles = self._extract_m3u8_formats( fmts, subs = self._extract_m3u8_formats_and_subtitles(
src, video_id, 'mp4', 'm3u8_native', src_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
m3u8_id='hls', fatal=False) elif src_type in ('application/dash+xml', 'application/dash+hevc'):
elif src_type == 'application/dash+xml': fmts, subs = self._extract_mpd_formats_and_subtitles(
formats, subtitles = self._extract_mpd_formats( src_url, video_id, mpd_id='dash', fatal=False)
src, video_id, mpd_id='dash', fatal=False) else:
continue
heartbeat = episode.get('heartbeat') or {} formats.extend(fmts)
omniture = episode.get('omniture') or {} self._merge_subtitles(subs, target=subtitles)
get_meta = lambda x: heartbeat.get(x) or omniture.get(x)
return { return {
'display_id': display_id, 'display_id': display_id,
'id': video_id, 'id': video_id,
'title': title,
'description': episode.get('descripcion'),
'thumbnail': episode.get('imgPoster'),
'duration': int_or_none(episode.get('duration')),
'formats': formats, 'formats': formats,
'channel': get_meta('channel'),
'season': get_meta('season'),
'episode_number': int_or_none(get_meta('episodeNumber')),
'subtitles': subtitles, 'subtitles': subtitles,
**traverse_obj(video_data, {
'title': ('titulo', {str}),
'description': ('descripcion', {str}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('imgPoster', {url_or_none}, {lambda v: f'{v}1920x1080.jpg'}),
'age_limit': ('ageRating', {parse_age_limit}),
}),
**traverse_obj(metadata, {
'title': ('title', {str}),
'description': ('description', {str}),
'duration': ('duration', {int_or_none}),
'tags': ('tags', ..., 'title', {str}),
'age_limit': ('ageRating', {parse_age_limit}),
'series': ('format', 'title', {str}),
'season': ('currentSeason', 'title', {str}),
'season_number': ('currentSeason', 'seasonNumber', {int_or_none}),
'episode_number': ('numberOfEpisode', {int_or_none}),
'timestamp': ('publicationDate', {int_or_none(scale=1000)}),
'channel': ('channel', 'title', {str}),
}),
} }
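
Note on the AtresPlayer changes above: the new `_VALID_URL` requires exactly four path segments before the `display_id`, and `publicationDate` is scaled by 1000, which suggests the API reports milliseconds. The sketch below checks both against values from the new test cases using only the standard library. The function names `split_ids` and `upload_date_from_ms` are illustrative assumptions.

```python
import datetime
import re

# Pattern copied from the new side of the diff.
VALID_URL = (
    r'https?://(?:www\.)?atresplayer\.com/(?:[^/?#]+/){4}'
    r'(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})'
)


def split_ids(url):
    """Return (display_id, video_id), or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('display_id', 'id') if mobj else None


def upload_date_from_ms(publication_date_ms):
    """Convert a millisecond epoch value to a YYYYMMDD string in UTC."""
    seconds = publication_date_ms // 1000
    moment = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return moment.strftime('%Y%m%d')
```

For the Flooxer test URL this yields `capitulo-3-supervivientes` and `67a6038b64ceca00070f4f69`, and a `publicationDate` of `1743095191000` corresponds to the test's `upload_date` of `20250327`.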

@ -1,30 +1,32 @@
import functools import functools
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..networking import HEADRequest from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList, OnDemandPagedList,
clean_html, clean_html,
extract_attributes, determine_ext,
format_field,
get_element_by_class, get_element_by_class,
get_element_by_id,
get_element_html_by_class,
get_elements_html_by_class, get_elements_html_by_class,
int_or_none, int_or_none,
orderedSet, orderedSet,
parse_count, parse_count,
parse_duration, parse_duration,
traverse_obj, parse_iso8601,
unified_strdate, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import traverse_obj
class BitChuteIE(InfoExtractor): class BitChuteIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|old)\.)?bitchute\.com/(?:video|embed|torrent/[^/]+)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:(?:www|old)\.)?bitchute\.com/(?:video|embed|torrent/[^/?#]+)/(?P<id>[^/?#&]+)'
_EMBED_REGEX = [rf'<(?:script|iframe)[^>]+\bsrc=(["\'])(?P<url>{_VALID_URL})'] _EMBED_REGEX = [rf'<(?:script|iframe)[^>]+\bsrc=(["\'])(?P<url>{_VALID_URL})']
_TESTS = [{ _TESTS = [{
'url': 'https://www.bitchute.com/video/UGlrF9o9b-Q/', 'url': 'https://www.bitchute.com/video/UGlrF9o9b-Q/',
@ -34,12 +36,17 @@ class BitChuteIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'This is the first video on #BitChute !', 'title': 'This is the first video on #BitChute !',
'description': 'md5:a0337e7b1fe39e32336974af8173a034', 'description': 'md5:a0337e7b1fe39e32336974af8173a034',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg$',
'uploader': 'BitChute', 'uploader': 'BitChute',
'upload_date': '20170103', 'upload_date': '20170103',
'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/', 'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/',
'channel': 'BitChute', 'channel': 'BitChute',
'channel_url': 'https://www.bitchute.com/channel/bitchute/', 'channel_url': 'https://www.bitchute.com/channel/bitchute/',
'uploader_id': 'I5NgtHZn9vPj',
'channel_id': '1VBwRfyNcKdX',
'view_count': int,
'duration': 16.0,
'timestamp': 1483425443,
}, },
}, { }, {
# test case: video with different channel and uploader # test case: video with different channel and uploader
@ -49,13 +56,18 @@ class BitChuteIE(InfoExtractor):
'id': 'Yti_j9A-UZ4', 'id': 'Yti_j9A-UZ4',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Israel at War | Full Measure', 'title': 'Israel at War | Full Measure',
'description': 'md5:38cf7bc6f42da1a877835539111c69ef', 'description': 'md5:e60198b89971966d6030d22b3268f08f',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg$',
'uploader': 'sharylattkisson', 'uploader': 'sharylattkisson',
'upload_date': '20231106', 'upload_date': '20231106',
'uploader_url': 'https://www.bitchute.com/profile/9K0kUWA9zmd9/', 'uploader_url': 'https://www.bitchute.com/profile/9K0kUWA9zmd9/',
'channel': 'Full Measure with Sharyl Attkisson', 'channel': 'Full Measure with Sharyl Attkisson',
'channel_url': 'https://www.bitchute.com/channel/sharylattkisson/', 'channel_url': 'https://www.bitchute.com/channel/sharylattkisson/',
'uploader_id': '9K0kUWA9zmd9',
'channel_id': 'NpdxoCRv3ZLb',
'view_count': int,
'duration': 554.0,
'timestamp': 1699296106,
}, },
}, { }, {
# video not downloadable in browser, but we can recover it # video not downloadable in browser, but we can recover it
@ -66,25 +78,21 @@ class BitChuteIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'filesize': 71537926, 'filesize': 71537926,
'title': 'STYXHEXENHAMMER666 - Election Fraud, Clinton 2020, EU Armies, and Gun Control', 'title': 'STYXHEXENHAMMER666 - Election Fraud, Clinton 2020, EU Armies, and Gun Control',
'description': 'md5:228ee93bd840a24938f536aeac9cf749', 'description': 'md5:2029c7c212ccd4b040f52bb2d036ef4e',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg$',
'uploader': 'BitChute', 'uploader': 'BitChute',
'upload_date': '20181113', 'upload_date': '20181113',
'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/', 'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/',
'channel': 'BitChute', 'channel': 'BitChute',
'channel_url': 'https://www.bitchute.com/channel/bitchute/', 'channel_url': 'https://www.bitchute.com/channel/bitchute/',
'uploader_id': 'I5NgtHZn9vPj',
'channel_id': '1VBwRfyNcKdX',
'view_count': int,
'duration': 1701.0,
'tags': ['bitchute'],
'timestamp': 1542130287,
}, },
'params': {'check_formats': None}, 'params': {'check_formats': None},
}, {
# restricted video
'url': 'https://www.bitchute.com/video/WEnQU7XGcTdl/',
'info_dict': {
'id': 'WEnQU7XGcTdl',
'ext': 'mp4',
'title': 'Impartial Truth - Ein Letzter Appell an die Vernunft',
},
'params': {'skip_download': True},
'skip': 'Georestricted in DE',
}, { }, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/', 'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
'only_matching': True, 'only_matching': True,
@ -96,11 +104,8 @@ class BitChuteIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_GEO_BYPASS = False _GEO_BYPASS = False
_UPLOADER_URL_TMPL = 'https://www.bitchute.com/profile/%s/'
_HEADERS = { _CHANNEL_URL_TMPL = 'https://www.bitchute.com/channel/%s/'
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.57 Safari/537.36',
'Referer': 'https://www.bitchute.com/',
}
def _check_format(self, video_url, video_id): def _check_format(self, video_url, video_id):
urls = orderedSet( urls = orderedSet(
@ -112,7 +117,7 @@ class BitChuteIE(InfoExtractor):
for url in urls: for url in urls:
try: try:
response = self._request_webpage( response = self._request_webpage(
HEADRequest(url), video_id=video_id, note=f'Checking {url}', headers=self._HEADERS) HEADRequest(url), video_id=video_id, note=f'Checking {url}')
except ExtractorError as e: except ExtractorError as e:
self.to_screen(f'{video_id}: URL is invalid, skipping: {e.cause}') self.to_screen(f'{video_id}: URL is invalid, skipping: {e.cause}')
continue continue
@ -121,54 +126,79 @@ class BitChuteIE(InfoExtractor):
'filesize': int_or_none(response.headers.get('Content-Length')), 'filesize': int_or_none(response.headers.get('Content-Length')),
} }
def _raise_if_restricted(self, webpage): def _call_api(self, endpoint, data, display_id, fatal=True):
page_title = clean_html(get_element_by_class('page-title', webpage)) or '' note = endpoint.rpartition('/')[2]
if re.fullmatch(r'(?:Channel|Video) Restricted', page_title): try:
reason = clean_html(get_element_by_id('page-detail', webpage)) or page_title return self._download_json(
self.raise_geo_restricted(reason) f'https://api.bitchute.com/api/beta/{endpoint}', display_id,
f'Downloading {note} API JSON', f'Unable to download {note} API JSON',
@staticmethod data=json.dumps(data).encode(),
def _make_url(html): headers={
path = extract_attributes(get_element_html_by_class('spa', html) or '').get('href') 'Accept': 'application/json',
return urljoin('https://www.bitchute.com', path) 'Content-Type': 'application/json',
})
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403:
errors = '. '.join(traverse_obj(e.cause.response.read().decode(), (
{json.loads}, 'errors', lambda _, v: v['context'] == 'reason', 'message', {str})))
if errors and 'location' in errors:
# Can always be fatal since the video/media call will reach this code first
self.raise_geo_restricted(errors)
if fatal:
raise
self.report_warning(e.msg)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( data = {'video_id': video_id}
f'https://old.bitchute.com/video/{video_id}', video_id, headers=self._HEADERS) media_url = self._call_api('video/media', data, video_id)['media_url']
self._raise_if_restricted(webpage)
publish_date = clean_html(get_element_by_class('video-publish-date', webpage))
entries = self._parse_html5_media_entries(url, webpage, video_id)
formats = [] formats = []
for format_ in traverse_obj(entries, (0, 'formats', ...)): if determine_ext(media_url) == 'm3u8':
formats.extend(
self._extract_m3u8_formats(media_url, video_id, 'mp4', m3u8_id='hls', live=True))
else:
if self.get_param('check_formats') is not False: if self.get_param('check_formats') is not False:
format_.update(self._check_format(format_.pop('url'), video_id) or {}) if fmt := self._check_format(media_url, video_id):
if 'url' not in format_: formats.append(fmt)
continue else:
formats.append(format_) formats.append({'url': media_url})
if not formats: if not formats:
self.raise_no_formats( self.raise_no_formats(
'Video is unavailable. Please make sure this video is playable in the browser ' 'Video is unavailable. Please make sure this video is playable in the browser '
'before reporting this issue.', expected=True, video_id=video_id) 'before reporting this issue.', expected=True, video_id=video_id)
details = get_element_by_class('details', webpage) or '' video = self._call_api('video', data, video_id, fatal=False)
uploader_html = get_element_html_by_class('creator', details) or '' channel = None
channel_html = get_element_html_by_class('name', details) or '' if channel_id := traverse_obj(video, ('channel', 'channel_id', {str})):
channel = self._call_api('channel', {'channel_id': channel_id}, video_id, fatal=False)
return { return {
**traverse_obj(video, {
'title': ('video_name', {str}),
'description': ('description', {str}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'channel': ('channel', 'channel_name', {str}),
'channel_id': ('channel', 'channel_id', {str}),
'channel_url': ('channel', 'channel_url', {urljoin('https://www.bitchute.com/')}),
'uploader_id': ('profile_id', {str}),
'uploader_url': ('profile_id', {format_field(template=self._UPLOADER_URL_TMPL)}, filter),
'timestamp': ('date_published', {parse_iso8601}),
'duration': ('duration', {parse_duration}),
'tags': ('hashtags', ..., {str}, filter, all, filter),
'view_count': ('view_count', {int_or_none}),
'is_live': ('state_id', {lambda x: x == 'live'}),
}),
**traverse_obj(channel, {
'channel': ('channel_name', {str}),
'channel_id': ('channel_id', {str}),
'channel_url': ('url_slug', {format_field(template=self._CHANNEL_URL_TMPL)}, filter),
'uploader': ('profile_name', {str}),
'uploader_id': ('profile_id', {str}),
'uploader_url': ('profile_id', {format_field(template=self._UPLOADER_URL_TMPL)}, filter),
}),
'id': video_id, 'id': video_id,
'title': self._html_extract_title(webpage) or self._og_search_title(webpage),
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader': clean_html(uploader_html),
'uploader_url': self._make_url(uploader_html),
'channel': clean_html(channel_html),
'channel_url': self._make_url(channel_html),
'upload_date': unified_strdate(self._search_regex(
r'at \d+:\d+ UTC on (.+?)\.', publish_date, 'upload date', fatal=False)),
'formats': formats, 'formats': formats,
} }
@ -190,7 +220,7 @@ class BitChuteChannelIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'This is the first video on #BitChute !', 'title': 'This is the first video on #BitChute !',
'description': 'md5:a0337e7b1fe39e32336974af8173a034', 'description': 'md5:a0337e7b1fe39e32336974af8173a034',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg$',
'uploader': 'BitChute', 'uploader': 'BitChute',
'upload_date': '20170103', 'upload_date': '20170103',
'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/', 'uploader_url': 'https://www.bitchute.com/profile/I5NgtHZn9vPj/',
@ -198,6 +228,9 @@ class BitChuteChannelIE(InfoExtractor):
'channel_url': 'https://www.bitchute.com/channel/bitchute/', 'channel_url': 'https://www.bitchute.com/channel/bitchute/',
'duration': 16, 'duration': 16,
'view_count': int, 'view_count': int,
'uploader_id': 'I5NgtHZn9vPj',
'channel_id': '1VBwRfyNcKdX',
'timestamp': 1483425443,
}, },
}, },
], ],
@ -213,6 +246,7 @@ class BitChuteChannelIE(InfoExtractor):
'title': 'Bruce MacDonald and "The Light of Darkness"', 'title': 'Bruce MacDonald and "The Light of Darkness"',
'description': 'md5:747724ef404eebdfc04277714f81863e', 'description': 'md5:747724ef404eebdfc04277714f81863e',
}, },
'skip': '404 Not Found',
}, { }, {
'url': 'https://old.bitchute.com/playlist/wV9Imujxasw9/', 'url': 'https://old.bitchute.com/playlist/wV9Imujxasw9/',
'only_matching': True, 'only_matching': True,

@ -7,6 +7,7 @@ from ..utils import (
join_nonempty, join_nonempty,
js_to_json, js_to_json,
mimetype2ext, mimetype2ext,
parse_resolution,
unified_strdate, unified_strdate,
url_or_none, url_or_none,
urljoin, urljoin,
@ -110,24 +111,23 @@ class BpbIE(InfoExtractor):
return attributes return attributes
@staticmethod def _process_source(self, source):
def _process_source(source):
url = url_or_none(source['src']) url = url_or_none(source['src'])
if not url: if not url:
return None return None
source_type = source.get('type', '') source_type = source.get('type', '')
extension = mimetype2ext(source_type) extension = mimetype2ext(source_type)
is_video = source_type.startswith('video') note = self._search_regex(r'[_-]([a-z]+)\.[\da-z]+(?:$|\?)', url, 'note', default=None)
note = url.rpartition('.')[0].rpartition('_')[2] if is_video else None
return { return {
'url': url, 'url': url,
'ext': extension, 'ext': extension,
'vcodec': None if is_video else 'none', 'vcodec': None if source_type.startswith('video') else 'none',
'quality': 10 if note == 'high' else 0, 'quality': 10 if note == 'high' else 0,
'format_note': note, 'format_note': note,
'format_id': join_nonempty(extension, note), 'format_id': join_nonempty(extension, note),
**parse_resolution(source.get('label')),
} }
def _real_extract(self, url): def _real_extract(self, url):
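
Note on the `_process_source` change above: the format note is now taken from the last lowercase word before the file extension, preceded by `_` or `-`, instead of from string partitioning. A small sketch of the regex in isolation follows. The function names and the example.com URLs are illustrative assumptions.

```python
import re

# Pattern copied from the new side of the diff.
NOTE_RE = r'[_-]([a-z]+)\.[\da-z]+(?:$|\?)'


def format_note(url):
    """Return the quality note embedded in the file name, or None."""
    mobj = re.search(NOTE_RE, url)
    return mobj.group(1) if mobj else None


def quality(url):
    """Rank 'high' sources above everything else, as the diff does."""
    return 10 if format_note(url) == 'high' else 0
```

Because the pattern allows a trailing `?`, URLs with a query string still yield a note, which the old `rpartition` approach did not handle.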

@ -1,59 +0,0 @@
from .turner import TurnerBaseIE
from ..utils import int_or_none
class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = {
'url': 'https://www.cartoonnetwork.com/video/ben-10/how-to-draw-upgrade-episode.html',
'info_dict': {
'id': '6e3375097f63874ebccec7ef677c1c3845fa850e',
'ext': 'mp4',
'title': 'How to Draw Upgrade',
'description': 'md5:2061d83776db7e8be4879684eefe8c0f',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
def find_field(global_re, name, content_re=None, value_re='[^"]+', fatal=False):
metadata_re = ''
if content_re:
metadata_re = r'|video_metadata\.content_' + content_re
return self._search_regex(
rf'(?:_cnglobal\.currentVideo\.{global_re}{metadata_re})\s*=\s*"({value_re})";',
webpage, name, fatal=fatal)
media_id = find_field('mediaId', 'media id', 'id', '[0-9a-f]{40}', True)
title = find_field('episodeTitle', 'title', '(?:episodeName|name)', fatal=True)
info = self._extract_ngtv_info(
media_id, {'networkId': 'cartoonnetwork'}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': find_field('authType', 'auth type') != 'unauth',
})
series = find_field(
'propertyName', 'series', 'showName') or self._html_search_meta('partOfSeries', webpage)
info.update({
'id': media_id,
'display_id': display_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'series': series,
'episode': title,
})
for field in ('season', 'episode'):
field_name = field + 'Number'
info[field + '_number'] = int_or_none(find_field(
field_name, field + ' number', value_re=r'\d+') or self._html_search_meta(field_name, webpage))
return info

@ -13,16 +13,17 @@ from ..compat import compat_ord
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList, OnDemandPagedList,
determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
multipart_encode, multipart_encode,
parse_duration, parse_duration,
traverse_obj,
try_call, try_call,
try_get, url_or_none,
urljoin, urljoin,
) )
from ..utils.traversal import traverse_obj
class CDAIE(InfoExtractor): class CDAIE(InfoExtractor):
@ -290,34 +291,47 @@ class CDAIE(InfoExtractor):
if not video or 'file' not in video: if not video or 'file' not in video:
self.report_warning(f'Unable to extract {version} version information') self.report_warning(f'Unable to extract {version} version information')
return return
if video['file'].startswith('uggc'):
video['file'] = codecs.decode(video['file'], 'rot_13')
if video['file'].endswith('adc.mp4'):
video['file'] = video['file'].replace('adc.mp4', '.mp4')
elif not video['file'].startswith('http'):
video['file'] = decrypt_file(video['file'])
video_quality = video.get('quality') video_quality = video.get('quality')
qualities = video.get('qualities', {}) qualities = video.get('qualities', {})
video_quality = next((k for k, v in qualities.items() if v == video_quality), video_quality) video_quality = next((k for k, v in qualities.items() if v == video_quality), video_quality)
info_dict['formats'].append({ if video.get('file'):
'url': video['file'], if video['file'].startswith('uggc'):
'format_id': video_quality, video['file'] = codecs.decode(video['file'], 'rot_13')
'height': int_or_none(video_quality[:-1]), if video['file'].endswith('adc.mp4'):
}) video['file'] = video['file'].replace('adc.mp4', '.mp4')
elif not video['file'].startswith('http'):
video['file'] = decrypt_file(video['file'])
info_dict['formats'].append({
'url': video['file'],
'format_id': video_quality,
'height': int_or_none(video_quality[:-1]),
})
for quality, cda_quality in qualities.items(): for quality, cda_quality in qualities.items():
if quality == video_quality: if quality == video_quality:
continue continue
data = {'jsonrpc': '2.0', 'method': 'videoGetLink', 'id': 2, data = {'jsonrpc': '2.0', 'method': 'videoGetLink', 'id': 2,
'params': [video_id, cda_quality, video.get('ts'), video.get('hash2'), {}]} 'params': [video_id, cda_quality, video.get('ts'), video.get('hash2'), {}]}
data = json.dumps(data).encode() data = json.dumps(data).encode()
video_url = self._download_json( response = self._download_json(
f'https://www.cda.pl/video/{video_id}', video_id, headers={ f'https://www.cda.pl/video/{video_id}', video_id, headers={
'Content-Type': 'application/json', 'Content-Type': 'application/json',
'X-Requested-With': 'XMLHttpRequest', 'X-Requested-With': 'XMLHttpRequest',
}, data=data, note=f'Fetching {quality} url', }, data=data, note=f'Fetching {quality} url',
errnote=f'Failed to fetch {quality} url', fatal=False) errnote=f'Failed to fetch {quality} url', fatal=False)
if try_get(video_url, lambda x: x['result']['status']) == 'ok': if (
video_url = try_get(video_url, lambda x: x['result']['resp']) traverse_obj(response, ('result', 'status')) != 'ok'
or not traverse_obj(response, ('result', 'resp', {url_or_none}))
):
continue
video_url = response['result']['resp']
ext = determine_ext(video_url)
if ext == 'mpd':
info_dict['formats'].extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
elif ext == 'm3u8':
info_dict['formats'].extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
info_dict['formats'].append({ info_dict['formats'].append({
'url': video_url, 'url': video_url,
'format_id': quality, 'format_id': quality,
@ -353,7 +367,7 @@ class CDAIE(InfoExtractor):
class CDAFolderIE(InfoExtractor): class CDAFolderIE(InfoExtractor):
_MAX_PAGE_SIZE = 36 _MAX_PAGE_SIZE = 36
_VALID_URL = r'https?://(?:www\.)?cda\.pl/(?P<channel>\w+)/folder/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?cda\.pl/(?P<channel>[\w-]+)/folder/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'https://www.cda.pl/domino264/folder/31188385', 'url': 'https://www.cda.pl/domino264/folder/31188385',
@ -378,6 +392,9 @@ class CDAFolderIE(InfoExtractor):
'title': 'TESTY KOSMETYKÓW', 'title': 'TESTY KOSMETYKÓW',
}, },
'playlist_mincount': 139, 'playlist_mincount': 139,
}, {
'url': 'https://www.cda.pl/FILMY-SERIALE-ANIME-KRESKOWKI-BAJKI/folder/18493422',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
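
Note on the CDA `file` handling above: a value starting with `uggc` is `http` under ROT13, so it is decoded and a trailing `adc.mp4` is rewritten to `.mp4`. The sketch below shows only that branch. The nesting is my reading of the diff, since indentation was lost in this view, and the `decrypt_file` fallback for other non-HTTP values is left out. The function name `normalize_file_url` is hypothetical.

```python
import codecs


def normalize_file_url(file_url):
    """Decode a ROT13-obfuscated URL and strip the 'adc' suffix marker.

    Hypothetical helper covering only the 'uggc' branch from the diff.
    """
    if file_url.startswith('uggc'):  # 'http' encoded with ROT13
        file_url = codecs.decode(file_url, 'rot_13')
        if file_url.endswith('adc.mp4'):
            file_url = file_url.replace('adc.mp4', '.mp4')
    return file_url
```

ROT13 leaves digits and punctuation untouched, so only the letters of the URL change during decoding.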

@ -9,6 +9,7 @@ from ..utils import (
ExtractorError, ExtractorError,
classproperty, classproperty,
float_or_none, float_or_none,
parse_qs,
traverse_obj, traverse_obj,
url_or_none, url_or_none,
) )
@ -91,11 +92,15 @@ class DacastVODIE(DacastBaseIE):
# Rotates every so often, but hardcode a fallback in case of JS change/breakage before rotation # Rotates every so often, but hardcode a fallback in case of JS change/breakage before rotation
return self._search_regex( return self._search_regex(
r'\bUSP_SIGNING_SECRET\s*=\s*(["\'])(?P<secret>(?:(?!\1).)+)', player_js, r'\bUSP_SIGNING_SECRET\s*=\s*(["\'])(?P<secret>(?:(?!\1).)+)', player_js,
'usp signing secret', group='secret', fatal=False) or 'odnInCGqhvtyRTtIiddxtuRtawYYICZP' 'usp signing secret', group='secret', fatal=False) or 'hGDtqMKYVeFdofrAfFmBcrsakaZELajI'
def _real_extract(self, url): def _real_extract(self, url):
user_id, video_id = self._match_valid_url(url).group('user_id', 'id') user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
query = {'contentId': f'{user_id}-vod-{video_id}', 'provider': 'universe'} query = {
'contentId': f'{user_id}-vod-{video_id}',
'provider': 'universe',
**traverse_obj(url, ({parse_qs}, 'uss_token', {'signedKey': -1})),
}
info = self._download_json(self._API_INFO_URL, video_id, query=query, fatal=False) info = self._download_json(self._API_INFO_URL, video_id, query=query, fatal=False)
access = self._download_json( access = self._download_json(
'https://playback.dacast.com/content/access', video_id, 'https://playback.dacast.com/content/access', video_id,
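
Note on the Dacast query change above: when the page URL carries a `uss_token` parameter, its last value is forwarded as `signedKey`, and otherwise the key is omitted. A rough standard-library equivalent is sketched below. The function name `build_info_query` and the example URL are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def build_info_query(url, user_id, video_id):
    """Build the info query, adding signedKey only when uss_token is present."""
    query = {
        'contentId': f'{user_id}-vod-{video_id}',
        'provider': 'universe',
    }
    tokens = parse_qs(urlparse(url).query).get('uss_token')
    if tokens:
        query['signedKey'] = tokens[-1]  # last value wins, like the -1 index in the diff
    return query
```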

@ -1,9 +1,15 @@
from .zdf import ZDFBaseIE from .zdf import ZDFBaseIE
from ..utils import (
int_or_none,
merge_dicts,
parse_iso8601,
)
from ..utils.traversal import require, traverse_obj
class DreiSatIE(ZDFBaseIE): class DreiSatIE(ZDFBaseIE):
IE_NAME = '3sat' IE_NAME = '3sat'
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html' _VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/?#]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'https://www.3sat.de/dokumentation/reise/traumziele-suedostasiens-die-philippinen-und-vietnam-102.html', 'url': 'https://www.3sat.de/dokumentation/reise/traumziele-suedostasiens-die-philippinen-und-vietnam-102.html',
'info_dict': { 'info_dict': {
@ -12,40 +18,59 @@ class DreiSatIE(ZDFBaseIE):
'title': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam', 'title': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'description': 'md5:26329ce5197775b596773b939354079d', 'description': 'md5:26329ce5197775b596773b939354079d',
'duration': 2625.0, 'duration': 2625.0,
'thumbnail': 'https://www.3sat.de/assets/traumziele-suedostasiens-die-philippinen-und-vietnam-100~2400x1350?cb=1699870351148', 'thumbnail': 'https://www.3sat.de/assets/traumziele-suedostasiens-die-philippinen-und-vietnam-100~original?cb=1699870351148',
'episode': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam', 'episode': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'episode_id': 'POS_cc7ff51c-98cf-4d12-b99d-f7a551de1c95', 'episode_id': 'POS_cc7ff51c-98cf-4d12-b99d-f7a551de1c95',
'timestamp': 1738593000, 'timestamp': 1747920900,
'upload_date': '20250203', 'upload_date': '20250522',
}, },
}, { }, {
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html 'url': 'https://www.3sat.de/film/ab-18/ab-18---mein-fremdes-ich-100.html',
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html', 'md5': 'f92638413a11d759bdae95c9d8ec165c',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
'info_dict': { 'info_dict': {
'id': '141007_ab18_10wochensommer_film', 'id': '221128_mein_fremdes_ich2_ab18',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ab 18! - 10 Wochen Sommer', 'title': 'Ab 18! - Mein fremdes Ich',
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26', 'description': 'md5:cae0c0b27b7426d62ca0dda181738bf0',
'duration': 2660, 'duration': 2625.0,
'timestamp': 1608604200, 'thumbnail': 'https://www.3sat.de/assets/ab-18---mein-fremdes-ich-106~original?cb=1666081865812',
'upload_date': '20201222', 'episode': 'Ab 18! - Mein fremdes Ich',
'episode_id': 'POS_6225d1ca-a0d5-45e3-870b-e783ee6c8a3f',
'timestamp': 1695081600,
'upload_date': '20230919',
}, },
'skip': '410 Gone',
}, { }, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html', 'url': 'https://www.3sat.de/gesellschaft/37-grad-leben/aus-dem-leben-gerissen-102.html',
'md5': 'a903eaf8d1fd635bd3317cd2ad87ec84',
'info_dict': { 'info_dict': {
'id': '140913_sendung_schweizweit', 'id': '250323_0903_sendung_sgl',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Waidmannsheil', 'title': 'Plötzlich ohne dich',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817', 'description': 'md5:380cc10659289dd91510ad8fa717c66b',
'timestamp': 1410623100, 'duration': 1620.0,
'upload_date': '20140913', 'thumbnail': 'https://www.3sat.de/assets/37-grad-leben-106~original?cb=1645537156810',
'episode': 'Plötzlich ohne dich',
'episode_id': 'POS_faa7a93c-c0f2-4d51-823f-ce2ac3ee191b',
'timestamp': 1743162540,
'upload_date': '20250328',
}, },
'params': { }, {
'skip_download': True, # Video with chapters
'url': 'https://www.3sat.de/kultur/buchmesse/dein-buch-das-beste-von-der-leipziger-buchmesse-2025-teil-1-100.html',
'md5': '6b95790ce52e75f0d050adcdd2711ee6',
'info_dict': {
'id': '250330_dein_buch1_bum',
'ext': 'mp4',
'title': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1',
'description': 'md5:bae51bfc22f15563ce3acbf97d2e8844',
'duration': 5399.0,
'thumbnail': 'https://www.3sat.de/assets/buchmesse-kerkeling-100~original?cb=1743329640903',
'chapters': 'count:24',
'episode': 'dein buch - Das Beste von der Leipziger Buchmesse 2025 - Teil 1',
'episode_id': 'POS_1ef236cc-b390-401e-acd0-4fb4b04315fb',
'timestamp': 1743327000,
'upload_date': '20250330',
}, },
'skip': '404 Not Found',
}, { }, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html # Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html', 'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
@ -58,11 +83,42 @@ class DreiSatIE(ZDFBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player = self._search_json(
r'data-zdfplayer-jsb=(["\'])', webpage, 'player JSON', video_id)
player_url = player['content']
api_token = f'Bearer {player["apiToken"]}'
webpage = self._download_webpage(url, video_id, fatal=False) content = self._call_api(player_url, video_id, 'video metadata', api_token)
if webpage:
player = self._extract_player(webpage, url, fatal=False)
if player:
return self._extract_regular(url, player, video_id)
return self._extract_mobile(video_id) video_target = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = traverse_obj(video_target, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'),
{str}, any, {require('ptmd path')}))
ptmd_url = self._expand_ptmd_template(player_url, ptmd_path)
aspect_ratio = self._parse_aspect_ratio(video_target.get('aspectRatio'))
info = self._extract_ptmd(ptmd_url, video_id, api_token, aspect_ratio)
return merge_dicts(info, {
**traverse_obj(content, {
'title': (('title', 'teaserHeadline'), {str}, any),
'episode': (('title', 'teaserHeadline'), {str}, any),
'description': (('leadParagraph', 'teasertext'), {str}, any),
'timestamp': ('editorialDate', {parse_iso8601}),
}),
**traverse_obj(video_target, {
'duration': ('duration', {int_or_none}),
'chapters': ('streamAnchorTag', {self._extract_chapters}),
}),
'thumbnails': self._extract_thumbnails(traverse_obj(content, ('teaserImageRef', 'layouts', {dict}))),
**traverse_obj(content, ('programmeItem', 0, 'http://zdf.de/rels/target', {
'series_id': ('http://zdf.de/rels/cmdm/series', 'seriesUuid', {str}),
'series': ('http://zdf.de/rels/cmdm/series', 'seriesTitle', {str}),
'season': ('http://zdf.de/rels/cmdm/season', 'seasonTitle', {str}),
'season_number': ('http://zdf.de/rels/cmdm/season', 'seasonNumber', {int_or_none}),
'season_id': ('http://zdf.de/rels/cmdm/season', 'seasonUuid', {str}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode_id': ('contentId', {str}),
})),
})


@ -16,7 +16,6 @@ from ..utils import (
MEDIA_EXTENSIONS, MEDIA_EXTENSIONS,
ExtractorError, ExtractorError,
UnsupportedError, UnsupportedError,
base_url,
determine_ext, determine_ext,
determine_protocol, determine_protocol,
dict_get, dict_get,
@ -38,6 +37,7 @@ from ..utils import (
unescapeHTML, unescapeHTML,
unified_timestamp, unified_timestamp,
unsmuggle_url, unsmuggle_url,
update_url,
update_url_query, update_url_query,
url_or_none, url_or_none,
urlhandle_detect_ext, urlhandle_detect_ext,
@ -2538,12 +2538,13 @@ class GenericIE(InfoExtractor):
return self.playlist_result( return self.playlist_result(
self._parse_xspf( self._parse_xspf(
doc, video_id, xspf_url=url, doc, video_id, xspf_url=url,
xspf_base_url=full_response.url), xspf_base_url=new_url),
video_id) video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles( info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles(
doc, doc,
mpd_base_url=base_url(full_response.url), # Do not use yt_dlp.utils.base_url here since it will raise on file:// URLs
mpd_base_url=update_url(new_url, query=None, fragment=None).rpartition('/')[0],
mpd_url=url) mpd_url=url)
info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None
self._extra_manifest_info(info_dict, url) self._extra_manifest_info(info_dict, url)
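
Note on the `mpd_base_url` change in this hunk: the new expression drops the query and fragment from the final URL and then cuts off the last path segment. A minimal sketch of that idea using only the standard library follows. `manifest_base_url` is a hypothetical name, and `urllib.parse` stands in for `update_url`, so this is an approximation of the behaviour, not the code in the diff. Like the new expression, it is meant to work for `file://` URLs as well.

```python
from urllib.parse import urlsplit, urlunsplit


def manifest_base_url(url):
    """Hypothetical helper: strip query and fragment, then drop the last path segment."""
    parts = urlsplit(url)
    # Rebuild the URL without its query string and fragment
    stripped = urlunsplit(parts._replace(query='', fragment=''))
    # Everything before the final '/' is treated as the base for relative segment URLs
    return stripped.rpartition('/')[0]
```

Relative segment paths in a manifest would then be resolved against the returned string.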


@ -8,7 +8,7 @@ from ..utils.traversal import traverse_obj
class GetCourseRuPlayerIE(InfoExtractor): class GetCourseRuPlayerIE(InfoExtractor):
_VALID_URL = r'https?://player02\.getcourse\.ru/sign-player/?\?(?:[^#]+&)?json=[^#&]+' _VALID_URL = r'https?://(?:player02\.getcourse\.ru|cf-api-2\.vhcdn\.com)/sign-player/?\?(?:[^#]+&)?json=[^#&]+'
_EMBED_REGEX = [rf'<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL}[^\'"]*)'] _EMBED_REGEX = [rf'<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL}[^\'"]*)']
_TESTS = [{ _TESTS = [{
'url': 'http://player02.getcourse.ru/sign-player/?json=eyJ2aWRlb19oYXNoIjoiMTkwYmRmOTNmMWIyOTczNTMwOTg1M2E3YTE5ZTI0YjMiLCJ1c2VyX2lkIjozNTk1MjUxODMsInN1Yl9sb2dpbl91c2VyX2lkIjpudWxsLCJsZXNzb25faWQiOm51bGwsImlwIjoiNDYuMTQyLjE4Mi4yNDciLCJnY19ob3N0IjoiYWNhZGVteW1lbC5vbmxpbmUiLCJ0aW1lIjoxNzA1NDQ5NjQyLCJwYXlsb2FkIjoidV8zNTk1MjUxODMiLCJ1aV9sYW5ndWFnZSI6InJ1IiwiaXNfaGF2ZV9jdXN0b21fc3R5bGUiOnRydWV9&s=354ad2c993d95d5ac629e3133d6cefea&vh-static-feature=zigzag', 'url': 'http://player02.getcourse.ru/sign-player/?json=eyJ2aWRlb19oYXNoIjoiMTkwYmRmOTNmMWIyOTczNTMwOTg1M2E3YTE5ZTI0YjMiLCJ1c2VyX2lkIjozNTk1MjUxODMsInN1Yl9sb2dpbl91c2VyX2lkIjpudWxsLCJsZXNzb25faWQiOm51bGwsImlwIjoiNDYuMTQyLjE4Mi4yNDciLCJnY19ob3N0IjoiYWNhZGVteW1lbC5vbmxpbmUiLCJ0aW1lIjoxNzA1NDQ5NjQyLCJwYXlsb2FkIjoidV8zNTk1MjUxODMiLCJ1aV9sYW5ndWFnZSI6InJ1IiwiaXNfaGF2ZV9jdXN0b21fc3R5bGUiOnRydWV9&s=354ad2c993d95d5ac629e3133d6cefea&vh-static-feature=zigzag',
@ -20,6 +20,16 @@ class GetCourseRuPlayerIE(InfoExtractor):
'duration': 1693, 'duration': 1693,
}, },
'skip': 'JWT expired', 'skip': 'JWT expired',
}, {
'url': 'https://cf-api-2.vhcdn.com/sign-player/?json=example',
'info_dict': {
'id': '435735291',
'title': '8afd7c489952108e00f019590f3711f3',
'ext': 'mp4',
'thumbnail': 'https://preview-htz.vhcdn.com/preview/8afd7c489952108e00f019590f3711f3/preview.jpg?version=1682170973&host=vh-72',
'duration': 777,
},
'skip': 'JWT expired',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -168,7 +178,7 @@ class GetCourseRuIE(InfoExtractor):
playlist_id = self._search_regex( playlist_id = self._search_regex(
r'window\.(?:lessonId|gcsObjectId)\s*=\s*(\d+)', webpage, 'playlist id', default=display_id) r'window\.(?:lessonId|gcsObjectId)\s*=\s*(\d+)', webpage, 'playlist id', default=display_id)
title = self._og_search_title(webpage) or self._html_extract_title(webpage) title = self._og_search_title(webpage, default=None) or self._html_extract_title(webpage)
return self.playlist_from_matches( return self.playlist_from_matches(
re.findall(GetCourseRuPlayerIE._EMBED_REGEX[0], webpage), re.findall(GetCourseRuPlayerIE._EMBED_REGEX[0], webpage),


@ -1,23 +1,33 @@
import functools import functools
import itertools
import math import math
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
InAdvancePagedList, InAdvancePagedList,
ISO639Utils,
OnDemandPagedList,
clean_html, clean_html,
int_or_none, int_or_none,
js_to_json,
make_archive_id, make_archive_id,
orderedSet,
smuggle_url, smuggle_url,
unified_strdate,
unified_timestamp,
unsmuggle_url, unsmuggle_url,
url_basename, url_basename,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin,
variadic,
) )
from ..utils.traversal import traverse_obj from ..utils.traversal import traverse_obj
class JioSaavnBaseIE(InfoExtractor): class JioSaavnBaseIE(InfoExtractor):
_URL_BASE_RE = r'https?://(?:www\.)?(?:jio)?saavn\.com'
_API_URL = 'https://www.jiosaavn.com/api.php' _API_URL = 'https://www.jiosaavn.com/api.php'
_VALID_BITRATES = {'16', '32', '64', '128', '320'} _VALID_BITRATES = {'16', '32', '64', '128', '320'}
@ -30,16 +40,20 @@ class JioSaavnBaseIE(InfoExtractor):
f'Valid bitrates are: {", ".join(sorted(self._VALID_BITRATES, key=int))}') f'Valid bitrates are: {", ".join(sorted(self._VALID_BITRATES, key=int))}')
return requested_bitrates return requested_bitrates
def _extract_formats(self, song_data): def _extract_formats(self, item_data):
# Show/episode JSON data has a slightly different structure than song JSON data
if media_url := traverse_obj(item_data, ('more_info', 'encrypted_media_url', {str})):
item_data.setdefault('encrypted_media_url', media_url)
for bitrate in self.requested_bitrates: for bitrate in self.requested_bitrates:
media_data = self._download_json( media_data = self._download_json(
self._API_URL, song_data['id'], self._API_URL, item_data['id'],
f'Downloading format info for {bitrate}', f'Downloading format info for {bitrate}',
fatal=False, data=urlencode_postdata({ fatal=False, data=urlencode_postdata({
'__call': 'song.generateAuthToken', '__call': 'song.generateAuthToken',
'_format': 'json', '_format': 'json',
'bitrate': bitrate, 'bitrate': bitrate,
'url': song_data['encrypted_media_url'], 'url': item_data['encrypted_media_url'],
})) }))
if not traverse_obj(media_data, ('auth_url', {url_or_none})): if not traverse_obj(media_data, ('auth_url', {url_or_none})):
self.report_warning(f'Unable to extract format info for {bitrate}') self.report_warning(f'Unable to extract format info for {bitrate}')
@ -53,24 +67,6 @@ class JioSaavnBaseIE(InfoExtractor):
'vcodec': 'none', 'vcodec': 'none',
} }
def _extract_song(self, song_data, url=None):
info = traverse_obj(song_data, {
'id': ('id', {str}),
'title': ('song', {clean_html}),
'album': ('album', {clean_html}),
'thumbnail': ('image', {url_or_none}, {lambda x: re.sub(r'-\d+x\d+\.', '-500x500.', x)}),
'duration': ('duration', {int_or_none}),
'view_count': ('play_count', {int_or_none}),
'release_year': ('year', {int_or_none}),
'artists': ('primary_artists', {lambda x: x.split(', ') if x else None}),
'webpage_url': ('perma_url', {url_or_none}),
})
if webpage_url := info.get('webpage_url') or url:
info['display_id'] = url_basename(webpage_url)
info['_old_archive_ids'] = [make_archive_id(JioSaavnSongIE, info['display_id'])]
return info
def _call_api(self, type_, token, note='API', params={}): def _call_api(self, type_, token, note='API', params={}):
return self._download_json( return self._download_json(
self._API_URL, token, f'Downloading {note} JSON', f'Unable to download {note} JSON', self._API_URL, token, f'Downloading {note} JSON', f'Unable to download {note} JSON',
@ -84,19 +80,89 @@ class JioSaavnBaseIE(InfoExtractor):
**params, **params,
}) })
def _yield_songs(self, playlist_data): @staticmethod
for song_data in traverse_obj(playlist_data, ('songs', lambda _, v: v['id'] and v['perma_url'])): def _extract_song(song_data, url=None):
song_info = self._extract_song(song_data) info = traverse_obj(song_data, {
url = smuggle_url(song_info['webpage_url'], { 'id': ('id', {str}),
'id': song_data['id'], 'title': (('song', 'title'), {clean_html}, any),
'encrypted_media_url': song_data['encrypted_media_url'], 'album': ((None, 'more_info'), 'album', {clean_html}, any),
}) 'duration': ((None, 'more_info'), 'duration', {int_or_none}, any),
yield self.url_result(url, JioSaavnSongIE, url_transparent=True, **song_info) 'channel': ((None, 'more_info'), 'label', {str}, any),
'channel_id': ((None, 'more_info'), 'label_id', {str}, any),
'channel_url': ((None, 'more_info'), 'label_url', {urljoin('https://www.jiosaavn.com/')}, any),
'release_date': ((None, 'more_info'), 'release_date', {unified_strdate}, any),
'release_year': ('year', {int_or_none}),
'thumbnail': ('image', {url_or_none}, {lambda x: re.sub(r'-\d+x\d+\.', '-500x500.', x)}),
'view_count': ('play_count', {int_or_none}),
'language': ('language', {lambda x: ISO639Utils.short2long(x.casefold()) or 'und'}),
'webpage_url': ('perma_url', {url_or_none}),
'artists': ('more_info', 'artistMap', 'primary_artists', ..., 'name', {str}, filter, all),
})
if webpage_url := info.get('webpage_url') or url:
info['display_id'] = url_basename(webpage_url)
info['_old_archive_ids'] = [make_archive_id(JioSaavnSongIE, info['display_id'])]
if primary_artists := traverse_obj(song_data, ('primary_artists', {lambda x: x.split(', ') if x else None})):
info['artists'].extend(primary_artists)
if featured_artists := traverse_obj(song_data, ('featured_artists', {str}, filter)):
info['artists'].extend(featured_artists.split(', '))
info['artists'] = orderedSet(info['artists']) or None
return info
@staticmethod
def _extract_episode(episode_data, url=None):
info = JioSaavnBaseIE._extract_song(episode_data, url)
info.pop('_old_archive_ids', None)
info.update(traverse_obj(episode_data, {
'description': ('more_info', 'description', {str}),
'timestamp': ('more_info', 'release_time', {unified_timestamp}),
'series': ('more_info', 'show_title', {str}),
'series_id': ('more_info', 'show_id', {str}),
'season': ('more_info', 'season_title', {str}),
'season_number': ('more_info', 'season_no', {int_or_none}),
'season_id': ('more_info', 'season_id', {str}),
'episode_number': ('more_info', 'episode_number', {int_or_none}),
'cast': ('starring', {lambda x: x.split(', ') if x else None}),
}))
return info
def _extract_jiosaavn_result(self, url, endpoint, response_key, parse_func):
url, smuggled_data = unsmuggle_url(url)
data = traverse_obj(smuggled_data, ({
'id': ('id', {str}),
'encrypted_media_url': ('encrypted_media_url', {str}),
}))
if 'id' in data and 'encrypted_media_url' in data:
result = {'id': data['id']}
else:
# only extract metadata if this is not a url_transparent result
data = self._call_api(endpoint, self._match_id(url))[response_key][0]
result = parse_func(data, url)
result['formats'] = list(self._extract_formats(data))
return result
def _yield_items(self, playlist_data, keys=None, parse_func=None):
"""Subclasses using this method must set _ENTRY_IE"""
if parse_func is None:
parse_func = self._extract_song
for item_data in traverse_obj(playlist_data, (
*variadic(keys, (str, bytes, dict, set)), lambda _, v: v['id'] and v['perma_url'],
)):
info = parse_func(item_data)
url = smuggle_url(info['webpage_url'], traverse_obj(item_data, {
'id': ('id', {str}),
'encrypted_media_url': ((None, 'more_info'), 'encrypted_media_url', {str}, any),
}))
yield self.url_result(url, self._ENTRY_IE, url_transparent=True, **info)
class JioSaavnSongIE(JioSaavnBaseIE): class JioSaavnSongIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:song' IE_NAME = 'jiosaavn:song'
_VALID_URL = r'https?://(?:www\.)?(?:jiosaavn\.com/song/[^/?#]+/|saavn\.com/s/song/(?:[^/?#]+/){3})(?P<id>[^/?#]+)' _VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'(?:/song/[^/?#]+/|/s/song/(?:[^/?#]+/){3})(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.jiosaavn.com/song/leja-re/OQsEfQFVUXk', 'url': 'https://www.jiosaavn.com/song/leja-re/OQsEfQFVUXk',
'md5': '3b84396d15ed9e083c3106f1fa589c04', 'md5': '3b84396d15ed9e083c3106f1fa589c04',
@ -106,12 +172,38 @@ class JioSaavnSongIE(JioSaavnBaseIE):
'ext': 'm4a', 'ext': 'm4a',
'title': 'Leja Re', 'title': 'Leja Re',
'album': 'Leja Re', 'album': 'Leja Re',
'thumbnail': r're:https?://c.saavncdn.com/258/Leja-Re-Hindi-2018-20181124024539-500x500.jpg', 'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 205, 'duration': 205,
'view_count': int, 'view_count': int,
'release_year': 2018, 'release_year': 2018,
'artists': ['Sandesh Shandilya', 'Dhvani Bhanushali', 'Tanishk Bagchi'], 'artists': ['Sandesh Shandilya', 'Dhvani Bhanushali', 'Tanishk Bagchi'],
'_old_archive_ids': ['jiosaavnsong OQsEfQFVUXk'], '_old_archive_ids': ['jiosaavnsong OQsEfQFVUXk'],
'channel': 'T-Series',
'language': 'hin',
'channel_id': '34297',
'channel_url': 'https://www.jiosaavn.com/label/t-series-albums/6DLuXO3VoTo_',
'release_date': '20181124',
},
}, {
'url': 'https://www.jiosaavn.com/song/chuttamalle/P1FfWjZkQ0Q',
'md5': '96296c58d6ce488a417ef0728fd2d680',
'info_dict': {
'id': 'O94kBTtw',
'display_id': 'P1FfWjZkQ0Q',
'ext': 'm4a',
'title': 'Chuttamalle',
'album': 'Devara Part 1 - Telugu',
'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 222,
'view_count': int,
'release_year': 2024,
'artists': 'count:3',
'_old_archive_ids': ['jiosaavnsong P1FfWjZkQ0Q'],
'channel': 'T-Series',
'language': 'tel',
'channel_id': '34297',
'channel_url': 'https://www.jiosaavn.com/label/t-series-albums/6DLuXO3VoTo_',
'release_date': '20240926',
}, },
}, { }, {
'url': 'https://www.saavn.com/s/song/hindi/Saathiya/O-Humdum-Suniyo-Re/KAMiazoCblU', 'url': 'https://www.saavn.com/s/song/hindi/Saathiya/O-Humdum-Suniyo-Re/KAMiazoCblU',
@ -119,26 +211,51 @@ class JioSaavnSongIE(JioSaavnBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url) return self._extract_jiosaavn_result(url, 'song', 'songs', self._extract_song)
song_data = traverse_obj(smuggled_data, ({
'id': ('id', {str}),
'encrypted_media_url': ('encrypted_media_url', {str}),
}))
if 'id' in song_data and 'encrypted_media_url' in song_data:
result = {'id': song_data['id']}
else:
# only extract metadata if this is not a url_transparent result
song_data = self._call_api('song', self._match_id(url))['songs'][0]
result = self._extract_song(song_data, url)
result['formats'] = list(self._extract_formats(song_data)) class JioSaavnShowIE(JioSaavnBaseIE):
return result IE_NAME = 'jiosaavn:show'
_VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'/shows/[^/?#]+/(?P<id>[^/?#]{11,})/?(?:$|[?#])'
_TESTS = [{
'url': 'https://www.jiosaavn.com/shows/non-food-ways-to-boost-your-energy/XFMcKICOCgc_',
'md5': '0733cd254cfe74ef88bea1eaedcf1f4f',
'info_dict': {
'id': 'qqzh3RKZ',
'display_id': 'XFMcKICOCgc_',
'ext': 'mp3',
'title': 'Non-Food Ways To Boost Your Energy',
'description': 'md5:26e7129644b5c6aada32b8851c3997c8',
'episode': 'Episode 1',
'timestamp': 1640563200,
'series': 'Holistic Lifestyle With Neha Ranglani',
'series_id': '52397',
'season': 'Holistic Lifestyle With Neha Ranglani',
'season_number': 1,
'season_id': '61273',
'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 311,
'view_count': int,
'release_year': 2021,
'language': 'eng',
'channel': 'Saavn OG',
'channel_id': '1953876',
'episode_number': 1,
'upload_date': '20211227',
'release_date': '20211227',
},
}, {
'url': 'https://www.jiosaavn.com/shows/himesh-reshammiya/Kr8fmfSN4vo_',
'only_matching': True,
}]
def _real_extract(self, url):
return self._extract_jiosaavn_result(url, 'episode', 'episodes', self._extract_episode)
class JioSaavnAlbumIE(JioSaavnBaseIE): class JioSaavnAlbumIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:album' IE_NAME = 'jiosaavn:album'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/album/[^/?#]+/(?P<id>[^/?#]+)' _VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'/album/[^/?#]+/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.jiosaavn.com/album/96/buIOjYZDrNA_', 'url': 'https://www.jiosaavn.com/album/96/buIOjYZDrNA_',
'info_dict': { 'info_dict': {
@ -147,18 +264,19 @@ class JioSaavnAlbumIE(JioSaavnBaseIE):
}, },
'playlist_count': 10, 'playlist_count': 10,
}] }]
_ENTRY_IE = JioSaavnSongIE
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
album_data = self._call_api('album', display_id) album_data = self._call_api('album', display_id)
return self.playlist_result( return self.playlist_result(
self._yield_songs(album_data), display_id, traverse_obj(album_data, ('title', {str}))) self._yield_items(album_data, 'songs'), display_id, traverse_obj(album_data, ('title', {str})))
class JioSaavnPlaylistIE(JioSaavnBaseIE): class JioSaavnPlaylistIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:playlist' IE_NAME = 'jiosaavn:playlist'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/(?:s/playlist/(?:[^/?#]+/){2}|featured/[^/?#]+/)(?P<id>[^/?#]+)' _VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'/(?:s/playlist/(?:[^/?#]+/){2}|featured/[^/?#]+/)(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-english/LlJ8ZWT1ibN5084vKHRj2Q__', 'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-english/LlJ8ZWT1ibN5084vKHRj2Q__',
'info_dict': { 'info_dict': {
@ -172,15 +290,16 @@ class JioSaavnPlaylistIE(JioSaavnBaseIE):
'id': 'DVR,pFUOwyXqIp77B1JF,A__', 'id': 'DVR,pFUOwyXqIp77B1JF,A__',
'title': 'Mood Hindi', 'title': 'Mood Hindi',
}, },
'playlist_mincount': 801, 'playlist_mincount': 750,
}, { }, {
'url': 'https://www.jiosaavn.com/featured/taaza-tunes/Me5RridRfDk_', 'url': 'https://www.jiosaavn.com/featured/taaza-tunes/Me5RridRfDk_',
'info_dict': { 'info_dict': {
'id': 'Me5RridRfDk_', 'id': 'Me5RridRfDk_',
'title': 'Taaza Tunes', 'title': 'Taaza Tunes',
}, },
'playlist_mincount': 301, 'playlist_mincount': 50,
}] }]
_ENTRY_IE = JioSaavnSongIE
_PAGE_SIZE = 50 _PAGE_SIZE = 50
def _fetch_page(self, token, page): def _fetch_page(self, token, page):
@ -189,7 +308,7 @@ class JioSaavnPlaylistIE(JioSaavnBaseIE):
def _entries(self, token, first_page_data, page): def _entries(self, token, first_page_data, page):
page_data = first_page_data if not page else self._fetch_page(token, page + 1) page_data = first_page_data if not page else self._fetch_page(token, page + 1)
yield from self._yield_songs(page_data) yield from self._yield_items(page_data, 'songs')
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -199,3 +318,95 @@ class JioSaavnPlaylistIE(JioSaavnBaseIE):
return self.playlist_result(InAdvancePagedList( return self.playlist_result(InAdvancePagedList(
functools.partial(self._entries, display_id, playlist_data), functools.partial(self._entries, display_id, playlist_data),
total_pages, self._PAGE_SIZE), display_id, traverse_obj(playlist_data, ('listname', {str}))) total_pages, self._PAGE_SIZE), display_id, traverse_obj(playlist_data, ('listname', {str})))
class JioSaavnShowPlaylistIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:show:playlist'
_VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'/shows/(?P<show>[^#/?]+)/(?P<season>\d+)/[^/?#]+'
_TESTS = [{
'url': 'https://www.jiosaavn.com/shows/talking-music/1/PjReFP-Sguk_',
'info_dict': {
'id': 'talking-music-1',
'title': 'Talking Music',
},
'playlist_mincount': 11,
}]
_ENTRY_IE = JioSaavnShowIE
_PAGE_SIZE = 10
def _fetch_page(self, show_id, season_id, page):
return self._call_api('show', show_id, f'show page {page}', {
'p': page,
'__call': 'show.getAllEpisodes',
'show_id': show_id,
'season_number': season_id,
'api_version': '4',
'sort_order': 'desc',
})
def _entries(self, show_id, season_id, page):
page_data = self._fetch_page(show_id, season_id, page + 1)
yield from self._yield_items(page_data, keys=None, parse_func=self._extract_episode)
def _real_extract(self, url):
show_slug, season_id = self._match_valid_url(url).group('show', 'season')
playlist_id = f'{show_slug}-{season_id}'
webpage = self._download_webpage(url, playlist_id)
show_info = self._search_json(
r'window\.__INITIAL_DATA__\s*=', webpage, 'initial data',
playlist_id, transform_source=js_to_json)['showView']
show_id = show_info['current_id']
entries = OnDemandPagedList(functools.partial(self._entries, show_id, season_id), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id, traverse_obj(show_info, ('show', 'title', 'text', {str})))
class JioSaavnArtistIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:artist'
_VALID_URL = JioSaavnBaseIE._URL_BASE_RE + r'/artist/[^/?#]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/artist/krsna-songs/rYLBEve2z3U_',
'info_dict': {
'id': 'rYLBEve2z3U_',
'title': 'KR$NA',
},
'playlist_mincount': 38,
}, {
'url': 'https://www.jiosaavn.com/artist/sanam-puri-songs/SkNEv3qRhDE_',
'info_dict': {
'id': 'SkNEv3qRhDE_',
'title': 'Sanam Puri',
},
'playlist_mincount': 51,
}]
_ENTRY_IE = JioSaavnSongIE
_PAGE_SIZE = 50
def _fetch_page(self, artist_id, page):
return self._call_api('artist', artist_id, f'artist page {page + 1}', {
'p': page,
'n_song': self._PAGE_SIZE,
'n_album': self._PAGE_SIZE,
'sub_type': '',
'includeMetaTags': '',
'api_version': '4',
'category': 'alphabetical',
'sort_order': 'asc',
})
def _entries(self, artist_id, first_page):
for page in itertools.count():
playlist_data = first_page if not page else self._fetch_page(artist_id, page)
if not traverse_obj(playlist_data, ('topSongs', ..., {dict})):
break
yield from self._yield_items(playlist_data, 'topSongs')
def _real_extract(self, url):
artist_id = self._match_id(url)
first_page = self._fetch_page(artist_id, 0)
return self.playlist_result(
self._entries(artist_id, first_page), artist_id,
traverse_obj(first_page, ('name', {str})))
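
Note on the artist handling in the new `_extract_song`: names from `artistMap`, the comma-separated `primary_artists` string and the comma-separated `featured_artists` string are concatenated and then de-duplicated in first-seen order. The sketch below illustrates that merge under stated assumptions: `merge_artists` is a hypothetical helper, and `dict.fromkeys` is swapped in for `orderedSet`, which should behave the same way for plain strings.

```python
def merge_artists(mapped_names, primary_artists=None, featured_artists=None):
    """Hypothetical helper: combine three artist sources, keeping first-seen order."""
    names = list(mapped_names)
    for field in (primary_artists, featured_artists):
        # Each field is a ', '-separated string, or empty/None when absent
        if field:
            names.extend(field.split(', '))
    # dict.fromkeys keeps the first occurrence of each name and preserves order;
    # an empty result becomes None, mirroring the `or None` in the diff
    return list(dict.fromkeys(names)) or None
```

Returning `None` rather than an empty list keeps the field out of the final metadata when no artist is known.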


@ -1,4 +1,5 @@
import itertools import itertools
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -9,12 +10,12 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
srt_subtitles_timecode, srt_subtitles_timecode,
traverse_obj,
try_get, try_get,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import find_elements, require, traverse_obj
class LinkedInBaseIE(InfoExtractor): class LinkedInBaseIE(InfoExtractor):
@ -82,7 +83,10 @@ class LinkedInLearningBaseIE(LinkedInBaseIE):
class LinkedInIE(LinkedInBaseIE): class LinkedInIE(LinkedInBaseIE):
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/posts/[^/?#]+-(?P<id>\d+)-\w{4}/?(?:[?#]|$)' _VALID_URL = [
r'https?://(?:www\.)?linkedin\.com/posts/[^/?#]+-(?P<id>\d+)-\w{4}/?(?:[?#]|$)',
r'https?://(?:www\.)?linkedin\.com/feed/update/urn:li:activity:(?P<id>\d+)',
]
_TESTS = [{ _TESTS = [{
'url': 'https://www.linkedin.com/posts/mishalkhawaja_sendinblueviews-toronto-digitalmarketing-ugcPost-6850898786781339649-mM20', 'url': 'https://www.linkedin.com/posts/mishalkhawaja_sendinblueviews-toronto-digitalmarketing-ugcPost-6850898786781339649-mM20',
'info_dict': { 'info_dict': {
@ -106,6 +110,9 @@ class LinkedInIE(LinkedInBaseIE):
'like_count': int, 'like_count': int,
'subtitles': 'mincount:1', 'subtitles': 'mincount:1',
}, },
}, {
'url': 'https://www.linkedin.com/feed/update/urn:li:activity:7016901149999955968/?utm_source=share&utm_medium=member_desktop',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -271,3 +278,110 @@ class LinkedInLearningCourseIE(LinkedInLearningBaseIE):
entries, course_slug, entries, course_slug,
course_data.get('title'), course_data.get('title'),
course_data.get('description')) course_data.get('description'))
class LinkedInEventsIE(LinkedInBaseIE):
IE_NAME = 'linkedin:events'
_VALID_URL = r'https?://(?:www\.)?linkedin\.com/events/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.linkedin.com/events/7084656651378536448/comments/',
'info_dict': {
'id': '7084656651378536448',
'ext': 'mp4',
'title': '#37 Aprende a hacer una entrevista en inglés para tu próximo trabajo remoto',
'description': '¡Agarra para anotar que se viene tremendo evento!',
'duration': 1765,
'timestamp': 1689113772,
'upload_date': '20230711',
'release_timestamp': 1689174012,
'release_date': '20230712',
'live_status': 'was_live',
},
}, {
'url': 'https://www.linkedin.com/events/27-02energyfreedombyenergyclub7295762520814874625/comments/',
'info_dict': {
'id': '27-02energyfreedombyenergyclub7295762520814874625',
'ext': 'mp4',
'title': '27.02 Energy Freedom by Energy Club',
'description': 'md5:1292e6f31df998914c293787a02c3b91',
'duration': 6420,
'timestamp': 1739445333,
'upload_date': '20250213',
'release_timestamp': 1740657620,
'release_date': '20250227',
'live_status': 'was_live',
},
}]
def _real_initialize(self):
if not self._get_cookies('https://www.linkedin.com/').get('li_at'):
self.raise_login_required()
def _real_extract(self, url):
event_id = self._match_id(url)
webpage = self._download_webpage(url, event_id)
base_data = traverse_obj(webpage, (
{find_elements(tag='code', attr='style', value='display: none')}, ..., {json.loads}, 'included', ...))
meta_data = traverse_obj(base_data, (
lambda _, v: v['$type'] == 'com.linkedin.voyager.dash.events.ProfessionalEvent', any)) or {}
live_status = {
'PAST': 'was_live',
'ONGOING': 'is_live',
'FUTURE': 'is_upcoming',
}.get(meta_data.get('lifecycleState'))
if live_status == 'is_upcoming':
player_data = {}
if event_time := traverse_obj(meta_data, ('displayEventTime', {str})):
message = f'This live event is scheduled for {event_time}'
else:
message = 'This live event has not yet started'
self.raise_no_formats(message, expected=True, video_id=event_id)
else:
# TODO: Add support for audio-only live events
player_data = traverse_obj(base_data, (
lambda _, v: v['$type'] == 'com.linkedin.videocontent.VideoPlayMetadata',
any, {require('video player data')}))
formats, subtitles = [], {}
for prog_fmts in traverse_obj(player_data, ('progressiveStreams', ..., {dict})):
for fmt_url in traverse_obj(prog_fmts, ('streamingLocations', ..., 'url', {url_or_none})):
formats.append({
'url': fmt_url,
**traverse_obj(prog_fmts, {
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
'tbr': ('bitRate', {int_or_none(scale=1000)}),
'filesize': ('size', {int_or_none}),
'ext': ('mediaType', {mimetype2ext}),
}),
})
for m3u8_url in traverse_obj(player_data, (
'adaptiveStreams', lambda _, v: v['protocol'] == 'HLS', 'masterPlaylists', ..., 'url', {url_or_none},
)):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url, event_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
return {
'id': event_id,
'formats': formats,
'subtitles': subtitles,
'live_status': live_status,
**traverse_obj(meta_data, {
'title': ('name', {str}),
'description': ('description', 'text', {str}),
'timestamp': ('createdAt', {int_or_none(scale=1000)}),
# timeRange.start is available when the stream is_upcoming
'release_timestamp': ('timeRange', 'start', {int_or_none(scale=1000)}),
}),
**traverse_obj(player_data, {
'duration': ('duration', {int_or_none(scale=1000)}),
# liveStreamCreatedAt is only available when the stream is_live or was_live
'release_timestamp': ('liveStreamCreatedAt', {int_or_none(scale=1000)}),
}),
}


@@ -1,5 +1,9 @@
+import json
+import random
+import time
+
 from .common import InfoExtractor
-from ..utils import int_or_none, url_or_none
+from ..utils import int_or_none, jwt_decode_hs256, try_call, url_or_none
 from ..utils.traversal import require, traverse_obj
@@ -55,13 +59,81 @@ class LocoIE(InfoExtractor):
             'upload_date': '20250226',
             'modified_date': '20250226',
         },
+    }, {
+        # Requires video authorization
+        'url': 'https://loco.com/stream/ac854641-ae0f-497c-a8ea-4195f6d8cc53',
+        'md5': '0513edf85c1e65c9521f555f665387d5',
+        'info_dict': {
+            'id': 'ac854641-ae0f-497c-a8ea-4195f6d8cc53',
+            'ext': 'mp4',
+            'title': 'DUAS CONTAS DESAFIANTE, RUSH TOP 1 NO BRASIL!',
+            'description': 'md5:aa77818edd6fe00dd4b6be75cba5f826',
+            'uploader_id': '7Y9JNAZC3Q',
+            'channel': 'ayellol',
+            'channel_follower_count': int,
+            'comment_count': int,
+            'view_count': int,
+            'concurrent_view_count': int,
+            'like_count': int,
+            'duration': 1229,
+            'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/f5aa678b-6d04-45d9-a89a-859af0a8028f.jpg',
+            'tags': ['Gameplay', 'Carry'],
+            'series': 'League of Legends',
+            'timestamp': 1741182253,
+            'upload_date': '20250305',
+            'modified_timestamp': 1741182419,
+            'modified_date': '20250305',
+        },
     }]

+    # From _app.js
+    _CLIENT_ID = 'TlwKp1zmF6eKFpcisn3FyR18WkhcPkZtzwPVEEC3'
+    _CLIENT_SECRET = 'Kp7tYlUN7LXvtcSpwYvIitgYcLparbtsQSe5AdyyCdiEJBP53Vt9J8eB4AsLdChIpcO2BM19RA3HsGtqDJFjWmwoonvMSG3ZQmnS8x1YIM8yl82xMXZGbE3NKiqmgBVU'
+
+    def _is_jwt_expired(self, token):
+        return jwt_decode_hs256(token)['exp'] - time.time() < 300
+
+    def _get_access_token(self, video_id):
+        access_token = try_call(lambda: self._get_cookies('https://loco.com')['access_token'].value)
+        if access_token and not self._is_jwt_expired(access_token):
+            return access_token
+        access_token = traverse_obj(self._download_json(
+            'https://api.getloconow.com/v3/user/device_profile/', video_id,
+            'Downloading access token', fatal=False, data=json.dumps({
+                'platform': 7,
+                'client_id': self._CLIENT_ID,
+                'client_secret': self._CLIENT_SECRET,
+                'model': 'Mozilla',
+                'os_name': 'Win32',
+                'os_ver': '5.0 (Windows)',
+                'app_ver': '5.0 (Windows)',
+            }).encode(), headers={
+                'Content-Type': 'application/json;charset=utf-8',
+                'DEVICE-ID': ''.join(random.choices('0123456789abcdef', k=32)) + 'live',
+                'X-APP-LANG': 'en',
+                'X-APP-LOCALE': 'en-US',
+                'X-CLIENT-ID': self._CLIENT_ID,
+                'X-CLIENT-SECRET': self._CLIENT_SECRET,
+                'X-PLATFORM': '7',
+            }), 'access_token')
+        if access_token and not self._is_jwt_expired(access_token):
+            self._set_cookie('.loco.com', 'access_token', access_token)
+            return access_token
+
     def _real_extract(self, url):
         video_type, video_id = self._match_valid_url(url).group('type', 'id')
         webpage = self._download_webpage(url, video_id)
         stream = traverse_obj(self._search_nextjs_data(webpage, video_id), (
-            'props', 'pageProps', ('liveStreamData', 'stream'), {dict}, any, {require('stream info')}))
+            'props', 'pageProps', ('liveStreamData', 'stream', 'liveStream'), {dict}, any, {require('stream info')}))

+        if access_token := self._get_access_token(video_id):
+            self._request_webpage(
+                'https://drm.loco.com/v1/streams/playback/', video_id,
+                'Downloading video authorization', fatal=False, headers={
+                    'authorization': access_token,
+                }, query={
+                    'stream_uid': stream['uid'],
+                })
+
         return {
             'formats': self._extract_m3u8_formats(stream['conf']['hls'], video_id),
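The `_is_jwt_expired` helper in the Loco change treats a token as expired once fewer than 300 seconds remain before its `exp` claim. A minimal standalone sketch of that check follows, assuming a standard three-part JWT and decoding the payload with the standard library only, without verifying the signature. The names `decode_jwt_payload`, `is_jwt_expired`, and `make_unsigned_jwt` are hypothetical and are not part of yt-dlp.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the claims dict of a JWT without verifying its signature."""
    payload_b64 = token.split('.')[1]
    # base64url payloads omit padding; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_jwt_expired(token, margin=300, now=None):
    """True if fewer than `margin` seconds remain before the `exp` claim."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)['exp'] - now < margin


def make_unsigned_jwt(claims):
    """Build a dummy, unsigned token for demonstration only."""
    def b64(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}."
```

The safety margin means a token is refreshed slightly early rather than risking expiry in the middle of a request.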


@ -1,31 +1,38 @@
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
determine_ext, determine_ext,
extract_attributes,
int_or_none, int_or_none,
str_to_int, join_nonempty,
parse_count,
parse_duration,
parse_iso8601,
url_or_none, url_or_none,
urlencode_postdata,
) )
from ..utils.traversal import traverse_obj
class ManyVidsIE(InfoExtractor): class ManyVidsIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)' _VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# preview video # preview video
'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/', 'url': 'https://www.manyvids.com/Video/530341/mv-tips-tricks',
'md5': '03f11bb21c52dd12a05be21a5c7dcc97', 'md5': '738dc723f7735ee9602f7ea352a6d058',
'info_dict': { 'info_dict': {
'id': '133957', 'id': '530341-preview',
'ext': 'mp4', 'ext': 'mp4',
'title': 'everthing about me (Preview)', 'title': 'MV Tips & Tricks (Preview)',
'uploader': 'ellyxxix', 'description': r're:I will take you on a tour around .{1313}$',
'thumbnail': r're:https://cdn5\.manyvids\.com/php_uploads/video_images/DestinyDiaz/.+\.jpg',
'uploader': 'DestinyDiaz',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'release_timestamp': 1508419904,
'tags': ['AdultSchool', 'BBW', 'SFW', 'TeacherFetish'],
'release_date': '20171019',
'duration': 3167.0,
}, },
'expected_warnings': ['Only extracting preview'],
}, { }, {
# full video # full video
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/', 'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
@ -34,129 +41,68 @@ class ManyVidsIE(InfoExtractor):
'id': '935718', 'id': '935718',
'ext': 'mp4', 'ext': 'mp4',
'title': 'MY FACE REVEAL', 'title': 'MY FACE REVEAL',
'description': 'md5:ec5901d41808b3746fed90face161612', 'description': r're:Today is the day!! I am finally taking off my mask .{445}$',
'thumbnail': r're:https://ods\.manyvids\.com/1001061960/3aa5397f2a723ec4597e344df66ab845/screenshots/.+\.jpg',
'uploader': 'Sarah Calanthe', 'uploader': 'Sarah Calanthe',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'release_date': '20181110',
'tags': ['EyeContact', 'Interviews', 'MaskFetish', 'MouthFetish', 'Redhead'],
'release_timestamp': 1541851200,
'duration': 224.0,
}, },
}] }]
_API_BASE = 'https://www.manyvids.com/bff/store/video'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json(f'{self._API_BASE}/{video_id}/private', video_id)['data']
formats, preview_only = [], True
real_url = f'https://www.manyvids.com/video/{video_id}/gtm.js' for format_id, path in [
try: ('preview', ['teaser', 'filepath']),
webpage = self._download_webpage(real_url, video_id) ('transcoded', ['transcodedFilepath']),
except Exception: ('filepath', ['filepath']),
# probably useless fallback ]:
webpage = self._download_webpage(url, video_id) format_url = traverse_obj(video_data, (*path, {url_or_none}))
if not format_url:
info = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])pageMetaDetails\2[^>]*>)''',
webpage, 'meta details', default='')
info = extract_attributes(info)
player = self._search_regex(
r'''(<div\b[^>]*\bid\s*=\s*(['"])rmpPlayerStream\2[^>]*>)''',
webpage, 'player details', default='')
player = extract_attributes(player)
video_urls_and_ids = (
(info.get('data-meta-video'), 'video'),
(player.get('data-video-transcoded'), 'transcoded'),
(player.get('data-video-filepath'), 'filepath'),
(self._og_search_video_url(webpage, secure=False, default=None), 'og_video'),
)
def txt_or_none(s, default=None):
return (s.strip() or default) if isinstance(s, str) else default
uploader = txt_or_none(info.get('data-meta-author'))
def mung_title(s):
if uploader:
s = re.sub(rf'^\s*{re.escape(uploader)}\s+[|-]', '', s)
return txt_or_none(s)
title = (
mung_title(info.get('data-meta-title'))
or self._html_search_regex(
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None)
or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True))
title = re.sub(r'\s*[|-]\s+ManyVids\s*$', '', title) or title
if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
title += ' (Preview)'
mv_token = self._search_regex(
r'data-mvtoken=(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'mv token', default=None, group='value')
if mv_token:
# Sets some cookies
self._download_webpage(
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
video_id, note='Setting format cookies', fatal=False,
data=urlencode_postdata({
'mvtoken': mv_token,
'vid': video_id,
}), headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
})
formats = []
for v_url, fmt in video_urls_and_ids:
v_url = url_or_none(v_url)
if not v_url:
continue continue
if determine_ext(v_url) == 'm3u8': if determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id=format_id))
v_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls'))
else: else:
formats.append({ formats.append({
'url': v_url, 'url': format_url,
'format_id': fmt, 'format_id': format_id,
'preference': -10 if format_id == 'preview' else None,
'quality': 10 if format_id == 'filepath' else None,
'height': int_or_none(
self._search_regex(r'_(\d{2,3}[02468])_', format_url, 'height', default=None)),
}) })
if format_id != 'preview':
preview_only = False
self._remove_duplicate_formats(formats) metadata = traverse_obj(
self._download_json(f'{self._API_BASE}/{video_id}', video_id, fatal=False), 'data')
title = traverse_obj(metadata, ('title', {clean_html}))
for f in formats: if preview_only:
if f.get('height') is None: title = join_nonempty(title, '(Preview)', delim=' ')
f['height'] = int_or_none( video_id += '-preview'
self._search_regex(r'_(\d{2,3}[02468])_', f['url'], 'video height', default=None)) self.report_warning(
if '/preview/' in f['url']: f'Only extracting preview. Video may be paid or subscription only. {self._login_hint()}')
f['format_id'] = '_'.join(filter(None, (f.get('format_id'), 'preview')))
f['preference'] = -10
if 'transcoded' in f['format_id']:
f['preference'] = f.get('preference', -1) - 1
def get_likes():
likes = self._search_regex(
rf'''(<a\b[^>]*\bdata-id\s*=\s*(['"]){video_id}\2[^>]*>)''',
webpage, 'likes', default='')
likes = extract_attributes(likes)
return int_or_none(likes.get('data-likes'))
def get_views():
return str_to_int(self._html_search_regex(
r'''(?s)<span\b[^>]*\bclass\s*=["']views-wrapper\b[^>]+>.+?<span\b[^>]+>\s*(\d[\d,.]*)\s*</span>''',
webpage, 'view count', default=None))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'description': txt_or_none(info.get('data-meta-description')), **traverse_obj(metadata, {
'uploader': txt_or_none(info.get('data-meta-author')), 'description': ('description', {clean_html}),
'thumbnail': ( 'uploader': ('model', 'displayName', {clean_html}),
url_or_none(info.get('data-meta-image')) 'thumbnail': (('screenshot', 'thumbnail'), {url_or_none}, any),
or url_or_none(player.get('data-video-screenshot'))), 'view_count': ('views', {parse_count}),
'view_count': get_views(), 'like_count': ('likes', {parse_count}),
'like_count': get_likes(), 'release_timestamp': ('launchDate', {parse_iso8601}),
'duration': ('videoDuration', {parse_duration}),
'tags': ('tagList', ..., 'label', {str}, filter, all, filter),
}),
} }
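The ManyVids change derives a format's height from its URL with the pattern `_(\d{2,3}[02468])_`, that is, a three- or four-digit even number between underscores. A small sketch of that lookup in isolation, using made-up example URLs; `height_from_url` is a hypothetical helper name, not a yt-dlp function.

```python
import re

# Same pattern as in the diff: 3-4 digits ending in an even digit, between underscores
HEIGHT_RE = re.compile(r'_(\d{2,3}[02468])_')


def height_from_url(url):
    """Return the height embedded in a URL such as '..._720_...', else None."""
    match = HEIGHT_RE.search(url)
    return int(match.group(1)) if match else None
```

URLs without such a marker simply yield `None`, which matches the `default=None` fallback used in the extractor.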


@@ -365,13 +365,15 @@ mutation initPlaybackSession(
             'All videos are only available to registered users', method='password')

     def _set_device_id(self, username):
-        if not self._device_id:
-            self._device_id = self.cache.load(
-                self._NETRC_MACHINE, 'device_ids', default={}).get(username)
+        if self._device_id:
+            return
+        device_id_cache = self.cache.load(self._NETRC_MACHINE, 'device_ids', default={})
+        self._device_id = device_id_cache.get(username)
         if self._device_id:
             return
         self._device_id = str(uuid.uuid4())
-        self.cache.store(self._NETRC_MACHINE, 'device_ids', {username: self._device_id})
+        device_id_cache[username] = self._device_id
+        self.cache.store(self._NETRC_MACHINE, 'device_ids', device_id_cache)

     def _perform_login(self, username, password):
         try:
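The `_set_device_id` change replaces a store of `{username: device_id}` with a read-modify-write of the whole cached mapping. The sketch below shows why that matters, using an in-memory stand-in for the cache. `DictCache` and both helper functions are hypothetical, written only to illustrate the difference; they are not yt-dlp's cache API.

```python
import uuid


class DictCache:
    """Hypothetical in-memory stand-in for a persistent key/value cache."""

    def __init__(self):
        self._data = {}

    def load(self, section, key, default=None):
        value = self._data.get((section, key), default)
        # Return a copy so callers cannot mutate the stored mapping in place
        return dict(value) if isinstance(value, dict) else value

    def store(self, section, key, value):
        self._data[(section, key)] = dict(value)


def get_device_id_overwriting(cache, username):
    """Old behaviour: stores a single-entry dict, dropping other users' IDs."""
    device_id = cache.load('site', 'device_ids', default={}).get(username)
    if device_id:
        return device_id
    device_id = str(uuid.uuid4())
    cache.store('site', 'device_ids', {username: device_id})
    return device_id


def get_device_id_merging(cache, username):
    """New behaviour: updates the loaded mapping and stores it back whole."""
    device_ids = cache.load('site', 'device_ids', default={})
    device_id = device_ids.get(username)
    if device_id:
        return device_id
    device_id = str(uuid.uuid4())
    device_ids[username] = device_id
    cache.store('site', 'device_ids', device_ids)
    return device_id
```

With the overwriting variant, logging in as a second user discards the first user's device ID, so a new one is generated the next time the first user logs in; the merging variant keeps both.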


@@ -16,7 +16,7 @@ from ..utils import (
     determine_ext,
     float_or_none,
     int_or_none,
-    join_nonempty,
+    parse_bitrate,
     parse_duration,
     parse_iso8601,
     parse_qs,
@@ -24,8 +24,6 @@ from ..utils import (
     qualities,
     remove_start,
     str_or_none,
-    traverse_obj,
-    try_get,
     unescapeHTML,
     unified_timestamp,
     update_url_query,
@ -34,13 +32,70 @@ from ..utils import (
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import find_element, traverse_obj
class NiconicoIE(InfoExtractor): class NiconicoBaseIE(InfoExtractor):
_GEO_BYPASS = False
_GEO_COUNTRIES = ['JP']
_LOGIN_BASE = 'https://account.nicovideo.jp'
_NETRC_MACHINE = 'niconico'
@property
def is_logged_in(self):
return bool(self._get_cookies('https://www.nicovideo.jp').get('user_session'))
def _raise_login_error(self, message, expected=True):
raise ExtractorError(f'Unable to login: {message}', expected=expected)
def _perform_login(self, username, password):
if self.is_logged_in:
return
self._request_webpage(
f'{self._LOGIN_BASE}/login', None, 'Requesting session cookies')
webpage = self._download_webpage(
f'{self._LOGIN_BASE}/login/redirector', None,
'Logging in', 'Unable to log in', headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': f'{self._LOGIN_BASE}/login',
}, data=urlencode_postdata({
'mail_tel': username,
'password': password,
}))
if self.is_logged_in:
return
elif err_msg := traverse_obj(webpage, (
{find_element(cls='notice error')}, {find_element(cls='notice__text')}, {clean_html},
)):
self._raise_login_error(err_msg or 'Invalid username or password')
elif 'oneTimePw' in webpage:
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage, 'post url', group='url')
mfa, urlh = self._download_webpage_handle(
urljoin(self._LOGIN_BASE, post_url), None,
'Performing MFA', 'Unable to complete MFA', headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, data=urlencode_postdata({
'otp': self._get_tfa_info('6 digit number shown on app'),
}))
if self.is_logged_in:
return
elif 'error-code' in parse_qs(urlh.url):
err_msg = traverse_obj(mfa, ({find_element(cls='pageMainMsg')}, {clean_html}))
self._raise_login_error(err_msg or 'MFA session expired')
elif 'formError' in mfa:
err_msg = traverse_obj(mfa, (
{find_element(cls='formError')}, {find_element(tag='div')}, {clean_html}))
self._raise_login_error(err_msg or 'MFA challenge failed')
self._raise_login_error('Unexpected login error', expected=False)
class NiconicoIE(NiconicoBaseIE):
IE_NAME = 'niconico' IE_NAME = 'niconico'
IE_DESC = 'ニコニコ動画' IE_DESC = 'ニコニコ動画'
_GEO_COUNTRIES = ['JP']
_GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
'url': 'http://www.nicovideo.jp/watch/sm22312215', 'url': 'http://www.nicovideo.jp/watch/sm22312215',
@ -180,229 +235,6 @@ class NiconicoIE(InfoExtractor):
}] }]
_VALID_URL = r'https?://(?:(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch|nico\.ms)/(?P<id>(?:[a-z]{2})?[0-9]+)' _VALID_URL = r'https?://(?:(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch|nico\.ms)/(?P<id>(?:[a-z]{2})?[0-9]+)'
_NETRC_MACHINE = 'niconico'
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0',
'X-Niconico-Language': 'en-us',
'Referer': 'https://www.nicovideo.jp/',
'Origin': 'https://www.nicovideo.jp',
}
def _perform_login(self, username, password):
login_ok = True
login_form_strs = {
'mail_tel': username,
'password': password,
}
self._request_webpage(
'https://account.nicovideo.jp/login', None,
note='Acquiring Login session')
page = self._download_webpage(
'https://account.nicovideo.jp/login/redirector?show_button_twitter=1&site=niconico&show_button_facebook=1', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(login_form_strs),
headers={
'Referer': 'https://account.nicovideo.jp/login',
'Content-Type': 'application/x-www-form-urlencoded',
})
if 'oneTimePw' in page:
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', page, 'post url', group='url')
page = self._download_webpage(
urljoin('https://account.nicovideo.jp', post_url), None,
note='Performing MFA', errnote='Unable to complete MFA',
data=urlencode_postdata({
'otp': self._get_tfa_info('6 digits code'),
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
if 'oneTimePw' in page or 'formError' in page:
err_msg = self._html_search_regex(
r'formError["\']+>(.*?)</div>', page, 'form_error',
default='There\'s an error but the message can\'t be parsed.',
flags=re.DOTALL)
self.report_warning(f'Unable to log in: MFA challenge failed, "{err_msg}"')
return False
login_ok = 'class="notice error"' not in page
if not login_ok:
self.report_warning('Unable to log in: bad username or password')
return login_ok
def _get_heartbeat_info(self, info_dict):
video_id, video_src_id, audio_src_id = info_dict['url'].split(':')[1].split('/')
dmc_protocol = info_dict['expected_protocol']
api_data = (
info_dict.get('_api_data')
or self._parse_json(
self._html_search_regex(
'data-api-data="([^"]+)"',
self._download_webpage('https://www.nicovideo.jp/watch/' + video_id, video_id),
'API data', default='{}'),
video_id))
session_api_data = try_get(api_data, lambda x: x['media']['delivery']['movie']['session'])
session_api_endpoint = try_get(session_api_data, lambda x: x['urls'][0])
def ping():
tracking_id = traverse_obj(api_data, ('media', 'delivery', 'trackingId'))
if tracking_id:
tracking_url = update_url_query('https://nvapi.nicovideo.jp/v1/2ab0cbaa/watch', {'t': tracking_id})
watch_request_response = self._download_json(
tracking_url, video_id,
note='Acquiring permission for downloading video', fatal=False,
headers=self._API_HEADERS)
if traverse_obj(watch_request_response, ('meta', 'status')) != 200:
self.report_warning('Failed to acquire permission for playing video. Video download may fail.')
yesno = lambda x: 'yes' if x else 'no'
if dmc_protocol == 'http':
protocol = 'http'
protocol_parameters = {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_data['urls'][0]['isSsl']),
'use_well_known_port': yesno(session_api_data['urls'][0]['isWellKnownPort']),
},
}
elif dmc_protocol == 'hls':
protocol = 'm3u8'
segment_duration = try_get(self._configuration_arg('segment_duration'), lambda x: int(x[0])) or 6000
parsed_token = self._parse_json(session_api_data['token'], video_id)
encryption = traverse_obj(api_data, ('media', 'delivery', 'encryption'))
protocol_parameters = {
'hls_parameters': {
'segment_duration': segment_duration,
'transfer_preset': '',
'use_ssl': yesno(session_api_data['urls'][0]['isSsl']),
'use_well_known_port': yesno(session_api_data['urls'][0]['isWellKnownPort']),
},
}
if 'hls_encryption' in parsed_token and encryption:
protocol_parameters['hls_parameters']['encryption'] = {
parsed_token['hls_encryption']: {
'encrypted_key': encryption['encryptedKey'],
'key_uri': encryption['keyUri'],
},
}
else:
protocol = 'm3u8_native'
else:
raise ExtractorError(f'Unsupported DMC protocol: {dmc_protocol}')
session_response = self._download_json(
session_api_endpoint['url'], video_id,
query={'_format': 'json'},
headers={'Content-Type': 'application/json'},
note='Downloading JSON metadata for {}'.format(info_dict['format_id']),
data=json.dumps({
'session': {
'client_info': {
'player_id': session_api_data.get('playerId'),
},
'content_auth': {
'auth_type': try_get(session_api_data, lambda x: x['authTypes'][session_api_data['protocols'][0]]),
'content_key_timeout': session_api_data.get('contentKeyTimeout'),
'service_id': 'nicovideo',
'service_user_id': session_api_data.get('serviceUserId'),
},
'content_id': session_api_data.get('contentId'),
'content_src_id_sets': [{
'content_src_ids': [{
'src_id_to_mux': {
'audio_src_ids': [audio_src_id],
'video_src_ids': [video_src_id],
},
}],
}],
'content_type': 'movie',
'content_uri': '',
'keep_method': {
'heartbeat': {
'lifetime': session_api_data.get('heartbeatLifetime'),
},
},
'priority': session_api_data['priority'],
'protocol': {
'name': 'http',
'parameters': {
'http_parameters': {
'parameters': protocol_parameters,
},
},
},
'recipe_id': session_api_data.get('recipeId'),
'session_operation_auth': {
'session_operation_auth_by_signature': {
'signature': session_api_data.get('signature'),
'token': session_api_data.get('token'),
},
},
'timing_constraint': 'unlimited',
},
}).encode())
info_dict['url'] = session_response['data']['session']['content_uri']
info_dict['protocol'] = protocol
# get heartbeat info
heartbeat_info_dict = {
'url': session_api_endpoint['url'] + '/' + session_response['data']['session']['id'] + '?_format=json&_method=PUT',
'data': json.dumps(session_response['data']),
# interval, convert milliseconds to seconds, then halve to make a buffer.
'interval': float_or_none(session_api_data.get('heartbeatLifetime'), scale=3000),
'ping': ping,
}
return info_dict, heartbeat_info_dict
def _extract_format_for_quality(self, video_id, audio_quality, video_quality, dmc_protocol):
if not audio_quality.get('isAvailable') or not video_quality.get('isAvailable'):
return None
format_id = '-'.join(
[remove_start(s['id'], 'archive_') for s in (video_quality, audio_quality)] + [dmc_protocol])
vid_qual_label = traverse_obj(video_quality, ('metadata', 'label'))
return {
'url': 'niconico_dmc:{}/{}/{}'.format(video_id, video_quality['id'], audio_quality['id']),
'format_id': format_id,
'format_note': join_nonempty('DMC', vid_qual_label, dmc_protocol.upper(), delim=' '),
'ext': 'mp4', # Session API are used in HTML5, which always serves mp4
'acodec': 'aac',
'vcodec': 'h264',
**traverse_obj(audio_quality, ('metadata', {
'abr': ('bitrate', {float_or_none(scale=1000)}),
'asr': ('samplingRate', {int_or_none}),
})),
**traverse_obj(video_quality, ('metadata', {
'vbr': ('bitrate', {float_or_none(scale=1000)}),
'height': ('resolution', 'height', {int_or_none}),
'width': ('resolution', 'width', {int_or_none}),
})),
'quality': -2 if 'low' in video_quality['id'] else None,
'protocol': 'niconico_dmc',
'expected_protocol': dmc_protocol, # XXX: This is not a documented field
'http_headers': {
'Origin': 'https://www.nicovideo.jp',
'Referer': 'https://www.nicovideo.jp/watch/' + video_id,
},
}
def _yield_dmc_formats(self, api_data, video_id):
dmc_data = traverse_obj(api_data, ('media', 'delivery', 'movie'))
audios = traverse_obj(dmc_data, ('audios', ..., {dict}))
videos = traverse_obj(dmc_data, ('videos', ..., {dict}))
protocols = traverse_obj(dmc_data, ('session', 'protocols', ..., {str}))
if not all((audios, videos, protocols)):
return
for audio_quality, video_quality, protocol in itertools.product(audios, videos, protocols):
if fmt := self._extract_format_for_quality(video_id, audio_quality, video_quality, protocol):
yield fmt
def _yield_dms_formats(self, api_data, video_id): def _yield_dms_formats(self, api_data, video_id):
fmt_filter = lambda _, v: v['isAvailable'] and v['id'] fmt_filter = lambda _, v: v['isAvailable'] and v['id']
@ -485,8 +317,8 @@ class NiconicoIE(InfoExtractor):
'needs_premium': ('isPremium', {bool}), 'needs_premium': ('isPremium', {bool}),
'needs_subscription': ('isAdmission', {bool}), 'needs_subscription': ('isAdmission', {bool}),
})) or {'needs_auth': True})) })) or {'needs_auth': True}))
formats = [*self._yield_dmc_formats(api_data, video_id),
*self._yield_dms_formats(api_data, video_id)] formats = list(self._yield_dms_formats(api_data, video_id))
if not formats: if not formats:
fail_msg = clean_html(self._html_search_regex( fail_msg = clean_html(self._html_search_regex(
r'<p[^>]+\bclass="fail-message"[^>]*>(?P<msg>.+?)</p>', r'<p[^>]+\bclass="fail-message"[^>]*>(?P<msg>.+?)</p>',
@ -921,7 +753,7 @@ class NiconicoUserIE(InfoExtractor):
return self.playlist_result(self._entries(list_id), list_id) return self.playlist_result(self._entries(list_id), list_id)
class NiconicoLiveIE(InfoExtractor): class NiconicoLiveIE(NiconicoBaseIE):
IE_NAME = 'niconico:live' IE_NAME = 'niconico:live'
IE_DESC = 'ニコニコ生放送' IE_DESC = 'ニコニコ生放送'
_VALID_URL = r'https?://(?:sp\.)?live2?\.nicovideo\.jp/(?:watch|gate)/(?P<id>lv\d+)' _VALID_URL = r'https?://(?:sp\.)?live2?\.nicovideo\.jp/(?:watch|gate)/(?P<id>lv\d+)'
@ -953,8 +785,6 @@ class NiconicoLiveIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_KNOWN_LATENCY = ('high', 'low')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(f'https://live.nicovideo.jp/watch/{video_id}', video_id) webpage, urlh = self._download_webpage_handle(f'https://live.nicovideo.jp/watch/{video_id}', video_id)
@ -970,22 +800,19 @@ class NiconicoLiveIE(InfoExtractor):
}) })
hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.') hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.')
latency = try_get(self._configuration_arg('latency'), lambda x: x[0])
if latency not in self._KNOWN_LATENCY:
latency = 'high'
ws = self._request_webpage( ws = self._request_webpage(
Request(ws_url, headers={'Origin': f'https://{hostname}'}), Request(ws_url, headers={'Origin': f'https://{hostname}'}),
video_id=video_id, note='Connecting to WebSocket server') video_id=video_id, note='Connecting to WebSocket server')
self.write_debug('[debug] Sending HLS server request') self.write_debug('Sending HLS server request')
ws.send(json.dumps({ ws.send(json.dumps({
'type': 'startWatching', 'type': 'startWatching',
'data': { 'data': {
'stream': { 'stream': {
'quality': 'abr', 'quality': 'abr',
'protocol': 'hls+fmp4', 'protocol': 'hls',
'latency': latency, 'latency': 'high',
'accessRightMethod': 'single_cookie', 'accessRightMethod': 'single_cookie',
'chasePlay': False, 'chasePlay': False,
}, },
@ -1049,18 +876,29 @@ class NiconicoLiveIE(InfoExtractor):
for cookie in cookies: for cookie in cookies:
self._set_cookie( self._set_cookie(
cookie['domain'], cookie['name'], cookie['value'], cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie['expires']), path=cookie['path'], secure=cookie['secure']) expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure'])
fmt_common = {
'live_latency': 'high',
'origin': hostname,
'protocol': 'niconico_live',
'video_id': video_id,
'ws': ws,
}
q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr'
a_map = {96: 'audio_low', 192: 'audio_high'}
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True) formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt, q in zip(formats, reversed(qualities[1:])): for fmt in formats:
fmt.update({ if fmt.get('acodec') == 'none':
'format_id': q, fmt['format_id'] = next(q_iter, fmt['format_id'])
'protocol': 'niconico_live', elif fmt.get('vcodec') == 'none':
'ws': ws, abr = parse_bitrate(fmt['url'].lower())
'video_id': video_id, fmt.update({
'live_latency': latency, 'abr': abr,
'origin': hostname, 'format_id': a_map.get(abr, fmt['format_id']),
}) })
fmt.update(fmt_common)
return { return {
'id': video_id, 'id': video_id,


@@ -181,6 +181,7 @@ class NYTimesArticleIE(NYTimesBaseIE):
             'thumbnail': r're:https?://\w+\.nyt.com/images/.*\.jpg',
             'duration': 119.0,
         },
+        'skip': 'HTTP Error 500: Internal Server Error',
     }, {
         # article with audio and no video
         'url': 'https://www.nytimes.com/2023/09/29/health/mosquitoes-genetic-engineering.html',
@@ -190,13 +191,14 @@ class NYTimesArticleIE(NYTimesBaseIE):
             'ext': 'mp3',
             'title': 'The Gamble: Can Genetically Modified Mosquitoes End Disease?',
             'description': 'md5:9ff8b47acbaf7f3ca8c732f5c815be2e',
-            'timestamp': 1695960700,
+            'timestamp': 1696008129,
             'upload_date': '20230929',
-            'creator': 'Stephanie Nolen, Natalija Gormalova',
+            'creators': ['Stephanie Nolen', 'Natalija Gormalova'],
             'thumbnail': r're:https?://\w+\.nyt.com/images/.*\.jpg',
             'duration': 1322,
         },
     }, {
+        # lede_media_block already has sourceId
         'url': 'https://www.nytimes.com/2023/11/29/business/dealbook/kamala-harris-biden-voters.html',
         'md5': '3eb5ddb1d6f86254fe4f233826778737',
         'info_dict': {
@@ -207,7 +209,7 @@ class NYTimesArticleIE(NYTimesBaseIE):
             'timestamp': 1701290997,
             'upload_date': '20231129',
             'uploader': 'By The New York Times',
-            'creator': 'Katie Rogers',
+            'creators': ['Katie Rogers'],
             'thumbnail': r're:https?://\w+\.nyt.com/images/.*\.jpg',
             'duration': 97.631,
         },
@@ -222,10 +224,22 @@ class NYTimesArticleIE(NYTimesBaseIE):
             'title': 'Drunk and Asleep on the Job: Air Traffic Controllers Pushed to the Brink',
             'description': 'md5:549e5a5e935bf7d048be53ba3d2c863d',
             'upload_date': '20231202',
-            'creator': 'Emily Steel, Sydney Ember',
+            'creators': ['Emily Steel', 'Sydney Ember'],
             'timestamp': 1701511264,
         },
         'playlist_count': 3,
+    }, {
+        # lede_media_block does not have sourceId
+        'url': 'https://www.nytimes.com/2025/04/30/well/move/hip-mobility-routine.html',
+        'info_dict': {
+            'id': 'hip-mobility-routine',
+            'title': 'Tight Hips? These Moves Can Help.',
+            'description': 'Sitting all day is hard on your hips. Try this simple routine for better mobility.',
+            'creators': ['Alyssa Ages', 'Theodore Tae'],
+            'timestamp': 1746003629,
+            'upload_date': '20250430',
+        },
+        'playlist_count': 7,
     }, {
         'url': 'https://www.nytimes.com/2023/12/02/business/media/netflix-squid-game-challenge.html',
         'only_matching': True,
def _real_extract(self, url): def _real_extract(self, url):
page_id = self._match_id(url) page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id, impersonate=True)
art_json = self._search_json( art_json = self._search_json(
r'window\.__preloadedData\s*=', webpage, 'media details', page_id, r'window\.__preloadedData\s*=', webpage, 'media details', page_id,
transform_source=lambda x: x.replace('undefined', 'null'))['initialData']['data']['article'] transform_source=lambda x: x.replace('undefined', 'null'))['initialData']['data']['article']
content = art_json['sprinkledBody']['content']
blocks = traverse_obj(art_json, ( blocks = []
'sprinkledBody', 'content', ..., ('ledeMedia', None), block_filter = lambda k, v: k == 'media' and v['__typename'] in ('Video', 'Audio')
lambda _, v: v['__typename'] in ('Video', 'Audio'))) if lede_media_block := traverse_obj(content, (..., 'ledeMedia', block_filter, any)):
lede_media_block.setdefault('sourceId', art_json.get('sourceId'))
blocks.append(lede_media_block)
blocks.extend(traverse_obj(content, (..., block_filter)))
if not blocks: if not blocks:
raise ExtractorError('Unable to extract any media blocks from webpage') raise ExtractorError('Unable to extract any media blocks from webpage')
@ -273,8 +291,7 @@ class NYTimesArticleIE(NYTimesBaseIE):
'sprinkledBody', 'content', ..., 'summary', 'content', ..., 'text', {str}), 'sprinkledBody', 'content', ..., 'summary', 'content', ..., 'text', {str}),
get_all=False) or self._html_search_meta(['og:description', 'twitter:description'], webpage), get_all=False) or self._html_search_meta(['og:description', 'twitter:description'], webpage),
'timestamp': traverse_obj(art_json, ('firstPublished', {parse_iso8601})), 'timestamp': traverse_obj(art_json, ('firstPublished', {parse_iso8601})),
'creator': ', '.join( 'creators': traverse_obj(art_json, ('bylines', ..., 'creators', ..., 'displayName', {str})),
traverse_obj(art_json, ('bylines', ..., 'creators', ..., 'displayName'))), # TODO: change to 'creators' (list)
'thumbnails': self._extract_thumbnails(traverse_obj( 'thumbnails': self._extract_thumbnails(traverse_obj(
art_json, ('promotionalMedia', 'assetCrops', ..., 'renditions', ...))), art_json, ('promotionalMedia', 'assetCrops', ..., 'renditions', ...))),
} }
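
Note on the lede-media hunk above: the new logic puts the lede media block first, gives it the article-level `sourceId` when it has none of its own, and then appends every other matching `media` block. A minimal sketch of that ordering, assuming plain dict access in place of `traverse_obj`; `collect_media_blocks` and the sample structure are hypothetical and are not part of the extractor:

```python
MEDIA_TYPES = ('Video', 'Audio')


def collect_media_blocks(article):
    """Hypothetical stand-in for the traversal in the hunk above."""
    content = article['sprinkledBody']['content']
    blocks = []

    # 1. The first lede media block of a matching type goes first. When it
    #    has no 'sourceId' of its own, fall back to the article-level one.
    for item in content:
        media = (item.get('ledeMedia') or {}).get('media') or {}
        if media.get('__typename') in MEDIA_TYPES:
            media.setdefault('sourceId', article.get('sourceId'))
            blocks.append(media)
            break

    # 2. Then every top-level 'media' entry of a matching type, in order.
    for item in content:
        media = item.get('media') or {}
        if media.get('__typename') in MEDIA_TYPES:
            blocks.append(media)
    return blocks
```

`setdefault` only fills the key when it is absent, so a lede block that already carries a `sourceId` (the first test case in this file) keeps its own value.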


@ -14,8 +14,9 @@ from ..utils import (
int_or_none, int_or_none,
parse_qs, parse_qs,
srt_subtitles_timecode, srt_subtitles_timecode,
traverse_obj, url_or_none,
) )
from ..utils.traversal import traverse_obj
class PanoptoBaseIE(InfoExtractor): class PanoptoBaseIE(InfoExtractor):
@ -345,21 +346,16 @@ class PanoptoIE(PanoptoBaseIE):
subtitles = {} subtitles = {}
for stream in streams or []: for stream in streams or []:
stream_formats = [] stream_formats = []
http_stream_url = stream.get('StreamHttpUrl') for stream_url in set(traverse_obj(stream, (('StreamHttpUrl', 'StreamUrl'), {url_or_none}))):
stream_url = stream.get('StreamUrl')
if http_stream_url:
stream_formats.append({'url': http_stream_url})
if stream_url:
media_type = stream.get('ViewerMediaFileTypeName') media_type = stream.get('ViewerMediaFileTypeName')
if media_type in ('hls', ): if media_type in ('hls', ):
m3u8_formats, stream_subtitles = self._extract_m3u8_formats_and_subtitles(stream_url, video_id) fmts, subs = self._extract_m3u8_formats_and_subtitles(stream_url, video_id, m3u8_id='hls', fatal=False)
stream_formats.extend(m3u8_formats) stream_formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, stream_subtitles) self._merge_subtitles(subs, target=subtitles)
else: else:
stream_formats.append({ stream_formats.append({
'url': stream_url, 'url': stream_url,
'ext': media_type,
}) })
for fmt in stream_formats: for fmt in stream_formats:
fmt.update({ fmt.update({


@ -1,5 +1,3 @@
import re
from .youtube import YoutubeIE from .youtube import YoutubeIE
from .zdf import ZDFBaseIE from .zdf import ZDFBaseIE
from ..utils import ( from ..utils import (
@ -7,44 +5,27 @@ from ..utils import (
merge_dicts, merge_dicts,
try_get, try_get,
unified_timestamp, unified_timestamp,
urljoin,
) )
class PhoenixIE(ZDFBaseIE): class PhoenixIE(ZDFBaseIE):
IE_NAME = 'phoenix.de' IE_NAME = 'phoenix.de'
_VALID_URL = r'https?://(?:www\.)?phoenix\.de/(?:[^/]+/)*[^/?#&]*-a-(?P<id>\d+)\.html' _VALID_URL = r'https?://(?:www\.)?phoenix\.de/(?:[^/?#]+/)*[^/?#&]*-a-(?P<id>\d+)\.html'
_TESTS = [{ _TESTS = [{
# Same as https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html 'url': 'https://www.phoenix.de/sendungen/dokumentationen/spitzbergen-a-893349.html',
'url': 'https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html', 'md5': 'a79e86d9774d0b3f2102aff988a0bd32',
'md5': '34ec321e7eb34231fd88616c65c92db0',
'info_dict': { 'info_dict': {
'id': '210222_phx_nachgehakt_corona_protest', 'id': '221215_phx_spitzbergen',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Wohin führt der Protest in der Pandemie?', 'title': 'Spitzbergen',
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd', 'description': 'Film von Tilmann Bünz',
'duration': 1691, 'duration': 728.0,
'timestamp': 1613902500, 'timestamp': 1555600500,
'upload_date': '20210221', 'upload_date': '20190418',
'uploader': 'Phoenix', 'uploader': 'Phoenix',
'series': 'corona nachgehakt', 'thumbnail': 'https://www.phoenix.de/sixcms/media.php/21/Bergspitzen1.png',
'episode': 'Wohin führt der Protest in der Pandemie?', 'series': 'Dokumentationen',
}, 'episode': 'Spitzbergen',
}, {
# Youtube embed
'url': 'https://www.phoenix.de/sendungen/gespraeche/phoenix-streitgut-brennglas-corona-a-1965505.html',
'info_dict': {
'id': 'hMQtqFYjomk',
'ext': 'mp4',
'title': 'phoenix streitgut: Brennglas Corona - Wie gerecht ist unsere Gesellschaft?',
'description': 'md5:ac7a02e2eb3cb17600bc372e4ab28fdd',
'duration': 3509,
'upload_date': '20201219',
'uploader': 'phoenix',
'uploader_id': 'phoenix',
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'https://www.phoenix.de/entwicklungen-in-russland-a-2044720.html', 'url': 'https://www.phoenix.de/entwicklungen-in-russland-a-2044720.html',
@ -90,8 +71,8 @@ class PhoenixIE(ZDFBaseIE):
content_id = details['tracking']['nielsen']['content']['assetid'] content_id = details['tracking']['nielsen']['content']['assetid']
info = self._extract_ptmd( info = self._extract_ptmd(
f'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/{content_id}', f'https://tmd.phoenix.de/tmd/2/android_native_6/vod/ptmd/phoenix/{content_id}',
content_id, None, url) content_id)
duration = int_or_none(try_get( duration = int_or_none(try_get(
details, lambda x: x['tracking']['nielsen']['content']['length'])) details, lambda x: x['tracking']['nielsen']['content']['length']))
@ -101,20 +82,8 @@ class PhoenixIE(ZDFBaseIE):
str) str)
episode = title if details.get('contentType') == 'episode' else None episode = title if details.get('contentType') == 'episode' else None
thumbnails = []
teaser_images = try_get(details, lambda x: x['teaserImageRef']['layouts'], dict) or {} teaser_images = try_get(details, lambda x: x['teaserImageRef']['layouts'], dict) or {}
for thumbnail_key, thumbnail_url in teaser_images.items(): thumbnails = self._extract_thumbnails(teaser_images)
thumbnail_url = urljoin(url, thumbnail_url)
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
return merge_dicts(info, { return merge_dicts(info, {
'id': content_id, 'id': content_id,
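
Note on the removed thumbnail loop: the old code resolved each teaser image against the page URL and read width and height from keys shaped like `1280x720`. A standalone sketch of that behavior, using only the standard library; `thumbnails_from_layouts` is a hypothetical name, and what `_extract_thumbnails` does internally is not shown in this diff:

```python
import re
from urllib.parse import urljoin


def thumbnails_from_layouts(base_url, layouts):
    """Hypothetical re-creation of the loop removed in the hunk above."""
    thumbnails = []
    for key, path in layouts.items():
        if not path:
            continue  # skip empty entries
        thumbnail = {'url': urljoin(base_url, path)}
        # Keys such as '1280x720' carry the dimensions; other keys do not.
        m = re.fullmatch(r'(\d+)x(\d+)', key)
        if m:
            thumbnail['width'] = int(m.group(1))
            thumbnail['height'] = int(m.group(2))
        thumbnails.append(thumbnail)
    return thumbnails
```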


@ -7,11 +7,13 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
join_nonempty,
parse_qs, parse_qs,
traverse_obj, traverse_obj,
update_url_query, update_url_query,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import unpack
class PlaySuisseIE(InfoExtractor): class PlaySuisseIE(InfoExtractor):
@ -26,12 +28,12 @@ class PlaySuisseIE(InfoExtractor):
{ {
# episode in a series # episode in a series
'url': 'https://www.playsuisse.ch/watch/763182?episodeId=763211', 'url': 'https://www.playsuisse.ch/watch/763182?episodeId=763211',
'md5': '82df2a470b2dfa60c2d33772a8a60cf8', 'md5': 'e20d1ede6872a03b41905ca1060a1ef2',
'info_dict': { 'info_dict': {
'id': '763211', 'id': '763211',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Knochen', 'title': 'Knochen',
'description': 'md5:8ea7a8076ba000cd9e8bc132fd0afdd8', 'description': 'md5:3bdd80e2ce20227c47aab1df2a79a519',
'duration': 3344, 'duration': 3344,
'series': 'Wilder', 'series': 'Wilder',
'season': 'Season 1', 'season': 'Season 1',
@ -42,24 +44,33 @@ class PlaySuisseIE(InfoExtractor):
}, },
}, { }, {
# film # film
'url': 'https://www.playsuisse.ch/watch/808675', 'url': 'https://www.playsuisse.ch/detail/2573198',
'md5': '818b94c1d2d7c4beef953f12cb8f3e75', 'md5': '1f115bb0a5191477b1a5771643a4283d',
'info_dict': { 'info_dict': {
'id': '808675', 'id': '2573198',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Der Läufer', 'title': 'Azor',
'description': 'md5:9f61265c7e6dcc3e046137a792b275fd', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 5280, 'genres': ['Fiction'],
'creators': ['Andreas Fontana'],
'cast': ['Fabrizio Rongione', 'Stéphanie Cléau', 'Gilles Privat', 'Alexandre Trocki'],
'location': 'France; Argentine',
'release_year': 2021,
'duration': 5981,
'thumbnail': 're:https://playsuisse-img.akamaized.net/', 'thumbnail': 're:https://playsuisse-img.akamaized.net/',
}, },
}, { }, {
# series (treated as a playlist) # series (treated as a playlist)
'url': 'https://www.playsuisse.ch/detail/1115687', 'url': 'https://www.playsuisse.ch/detail/1115687',
'info_dict': { 'info_dict': {
'description': 'md5:e4a2ae29a8895823045b5c3145a02aa3',
'id': '1115687', 'id': '1115687',
'series': 'They all came out to Montreux', 'series': 'They all came out to Montreux',
'title': 'They all came out to Montreux', 'title': 'They all came out to Montreux',
'description': 'md5:0fefd8c5b4468a0bb35e916887681520',
'genres': ['Documentary'],
'creators': ['Oliver Murray'],
'location': 'Switzerland',
'release_year': 2021,
}, },
'playlist': [{ 'playlist': [{
'info_dict': { 'info_dict': {
@ -120,6 +131,12 @@ class PlaySuisseIE(InfoExtractor):
id id
name name
description description
descriptionLong
year
contentTypes
directors
mainCast
productionCountries
duration duration
episodeNumber episodeNumber
seasonNumber seasonNumber
@ -215,9 +232,7 @@ class PlaySuisseIE(InfoExtractor):
if not self._ID_TOKEN: if not self._ID_TOKEN:
raise ExtractorError('Login failed') raise ExtractorError('Login failed')
def _get_media_data(self, media_id): def _get_media_data(self, media_id, locale=None):
# NOTE In the web app, the "locale" header is used to switch between languages,
# However this doesn't seem to take effect when passing the header here.
response = self._download_json( response = self._download_json(
'https://www.playsuisse.ch/api/graphql', 'https://www.playsuisse.ch/api/graphql',
media_id, data=json.dumps({ media_id, data=json.dumps({
@ -225,7 +240,7 @@ class PlaySuisseIE(InfoExtractor):
'query': self._GRAPHQL_QUERY, 'query': self._GRAPHQL_QUERY,
'variables': {'assetId': media_id}, 'variables': {'assetId': media_id},
}).encode(), }).encode(),
headers={'Content-Type': 'application/json', 'locale': 'de'}) headers={'Content-Type': 'application/json', 'locale': locale or 'de'})
return response['data']['assetV2'] return response['data']['assetV2']
@ -234,7 +249,7 @@ class PlaySuisseIE(InfoExtractor):
self.raise_login_required(method='password') self.raise_login_required(method='password')
media_id = self._match_id(url) media_id = self._match_id(url)
media_data = self._get_media_data(media_id) media_data = self._get_media_data(media_id, traverse_obj(parse_qs(url), ('locale', 0)))
info = self._extract_single(media_data) info = self._extract_single(media_data)
if media_data.get('episodes'): if media_data.get('episodes'):
info.update({ info.update({
@ -257,15 +272,22 @@ class PlaySuisseIE(InfoExtractor):
self._merge_subtitles(subs, target=subtitles) self._merge_subtitles(subs, target=subtitles)
return { return {
'id': media_data['id'],
'title': media_data.get('name'),
'description': media_data.get('description'),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'duration': int_or_none(media_data.get('duration')),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'series': media_data.get('seriesName'), **traverse_obj(media_data, {
'season_number': int_or_none(media_data.get('seasonNumber')), 'id': ('id', {str}),
'episode': media_data.get('name') if media_data.get('episodeNumber') else None, 'title': ('name', {str}),
'episode_number': int_or_none(media_data.get('episodeNumber')), 'description': (('descriptionLong', 'description'), {str}, any),
'genres': ('contentTypes', ..., {str}),
'creators': ('directors', ..., {str}),
'cast': ('mainCast', ..., {str}),
'location': ('productionCountries', ..., {str}, all, {unpack(join_nonempty, delim='; ')}, filter),
'release_year': ('year', {str}, {lambda x: x[:4]}, {int_or_none}),
'duration': ('duration', {int_or_none}),
'series': ('seriesName', {str}),
'season_number': ('seasonNumber', {int_or_none}),
'episode': ('name', {str}, {lambda x: x if media_data['episodeNumber'] is not None else None}),
'episode_number': ('episodeNumber', {int_or_none}),
}),
} }
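
Note on the new `location` and `release_year` fields above: the traversal joins the production countries with `'; '` and keeps only the first four characters of `year` before converting to an integer. A plain-Python sketch of those two steps; `join_countries` and `release_year` are hypothetical helper names, and the idea that `year` may be longer than four characters is an assumption read from the `x[:4]` slice:

```python
def join_countries(countries, delim='; '):
    """Join non-empty country names; return None when nothing is left."""
    joined = delim.join(
        c for c in countries or [] if isinstance(c, str) and c)
    return joined or None


def release_year(year):
    """Read the leading four characters of a year-like string as an int."""
    if not isinstance(year, str):
        return None  # mirrors the {str} filter: non-strings are dropped
    try:
        return int(year[:4])
    except ValueError:
        return None
```

Returning `None` rather than an empty string or raising keeps the field out of the result when the API gives nothing usable, which matches how the rest of the metadata mapping behaves.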


@ -321,6 +321,27 @@ class RaiPlayIE(RaiBaseIE):
'timestamp': 1348495020, 'timestamp': 1348495020,
'upload_date': '20120924', 'upload_date': '20120924',
}, },
}, {
# checking program_info gives false positive for DRM
'url': 'https://www.raiplay.it/video/2022/10/Ad-ogni-costo---Un-giorno-in-Pretura---Puntata-del-15102022-1dfd1295-ea38-4bac-b51e-f87e2881693b.html',
'md5': '572c6f711b7c5f2d670ba419b4ae3b08',
'info_dict': {
'id': '1dfd1295-ea38-4bac-b51e-f87e2881693b',
'ext': 'mp4',
'title': 'Ad ogni costo - Un giorno in Pretura - Puntata del 15/10/2022',
'alt_title': 'St 2022/23 - Un giorno in pretura - Ad ogni costo',
'description': 'md5:4046d97b2687f74f06a8b8270ba5599f',
'uploader': 'Rai 3',
'duration': 3773.0,
'thumbnail': 'https://www.raiplay.it/dl/img/2022/10/12/1665586539957_2048x2048.png',
'creators': ['Rai 3'],
'series': 'Un giorno in pretura',
'season': '2022/23',
'episode': 'Ad ogni costo',
'timestamp': 1665507240,
'upload_date': '20221011',
'release_year': 2025,
},
}, { }, {
'url': 'http://www.raiplay.it/video/2016/11/gazebotraindesi-efebe701-969c-4593-92f3-285f0d1ce750.html?', 'url': 'http://www.raiplay.it/video/2016/11/gazebotraindesi-efebe701-969c-4593-92f3-285f0d1ce750.html?',
'only_matching': True, 'only_matching': True,
@ -340,9 +361,8 @@ class RaiPlayIE(RaiBaseIE):
media = self._download_json( media = self._download_json(
f'{base}.json', video_id, 'Downloading video JSON') f'{base}.json', video_id, 'Downloading video JSON')
if not self.get_param('allow_unplayable_formats'): if traverse_obj(media, ('rights_management', 'rights', 'drm')):
if traverse_obj(media, (('program_info', None), 'rights_management', 'rights', 'drm')): self.report_drm(video_id)
self.report_drm(video_id)
video = media['video'] video = media['video']
relinker_info = self._extract_relinker_info(video['content_url'], video_id) relinker_info = self._extract_relinker_info(video['content_url'], video_id)


@ -388,7 +388,8 @@ class RedditIE(InfoExtractor):
}) })
if entries: if entries:
return self.playlist_result(entries, video_id, **info) return self.playlist_result(entries, video_id, **info)
raise ExtractorError('No media found', expected=True) self.raise_no_formats('No media found', expected=True, video_id=video_id)
return {**info, 'id': video_id}
# Check if media is hosted on reddit: # Check if media is hosted on reddit:
reddit_video = traverse_obj(data, ( reddit_video = traverse_obj(data, (


@ -1,35 +1,142 @@
import base64 import base64
import io import io
import struct import struct
import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
clean_html,
determine_ext, determine_ext,
float_or_none, float_or_none,
make_archive_id,
parse_iso8601,
qualities, qualities,
remove_end, url_or_none,
remove_start,
try_get,
) )
from ..utils.traversal import subs_list_to_dict, traverse_obj
class RTVEALaCartaIE(InfoExtractor): class RTVEBaseIE(InfoExtractor):
# Reimplementation of https://js2.rtve.es/pages/app-player/3.5.1/js/pf_video.js
@staticmethod
def _decrypt_url(png):
encrypted_data = io.BytesIO(base64.b64decode(png)[8:])
while True:
length_data = encrypted_data.read(4)
length = struct.unpack('!I', length_data)[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
data = bytes(filter(None, data))
alphabet_data, _, url_data = data.partition(b'#')
quality_str, _, url_data = url_data.rpartition(b'%%')
quality_str = quality_str.decode() or ''
alphabet = RTVEBaseIE._get_alphabet(alphabet_data)
url = RTVEBaseIE._get_url(alphabet, url_data)
yield quality_str, url
encrypted_data.read(4) # CRC
@staticmethod
def _get_url(alphabet, url_data):
url = ''
f = 0
e = 3
b = 1
for char in url_data.decode('iso-8859-1'):
if f == 0:
l = int(char) * 10
f = 1
else:
if e == 0:
l += int(char)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
@staticmethod
def _get_alphabet(alphabet_data):
alphabet = []
e = 0
d = 0
for char in alphabet_data.decode('iso-8859-1'):
if d == 0:
alphabet.append(char)
d = e = (e + 1) % 4
else:
d -= 1
return alphabet
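
Note on the three static helpers above: `_decrypt_url` walks the PNG chunk by chunk (4-byte big-endian length, 4-byte type, data, 4-byte CRC) and `_get_alphabet` keeps characters at gaps that cycle through 0, 1, 2, 3. A self-contained sketch of both ideas follows. `iter_png_chunks` and `sparse_alphabet` are hypothetical names, and stopping quietly on truncated input is a choice made for this sketch, not something the extractor does:

```python
import io
import struct

PNG_SIGNATURE_LENGTH = 8


def iter_png_chunks(raw):
    """Yield (chunk_type, data) pairs until IEND or truncated input."""
    stream = io.BytesIO(raw[PNG_SIGNATURE_LENGTH:])
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # truncated input: stop instead of raising struct.error
        length, chunk_type = struct.unpack('!I4s', header)
        if chunk_type == b'IEND':
            return
        data = stream.read(length)
        stream.read(4)  # skip the CRC that follows every chunk
        yield chunk_type, data


def sparse_alphabet(alphabet_data):
    """Keep one character, then skip a gap that cycles 1, 2, 3, 0."""
    alphabet, gap, skip = [], 0, 0
    for char in alphabet_data:
        if skip == 0:
            alphabet.append(char)
            gap = (gap + 1) % 4
            skip = gap
        else:
            skip -= 1
    return alphabet
```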
def _extract_png_formats_and_subtitles(self, video_id, media_type='videos'):
formats, subtitles = [], {}
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
for manager in ('rtveplayw', 'default'):
png = self._download_webpage(
f'http://www.rtve.es/ztnr/movil/thumbnail/{manager}/{media_type}/{video_id}.png',
video_id, 'Downloading url information', query={'q': 'v2'}, fatal=False)
if not png:
continue
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
video_url, video_id, 'dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
return formats, subtitles
def _parse_metadata(self, metadata):
return traverse_obj(metadata, {
'title': ('title', {str.strip}),
'alt_title': ('alt', {str.strip}),
'description': ('description', {clean_html}),
'timestamp': ('dateOfEmission', {parse_iso8601(delimiter=' ')}),
'release_timestamp': ('publicationDate', {parse_iso8601(delimiter=' ')}),
'modified_timestamp': ('modificationDate', {parse_iso8601(delimiter=' ')}),
'thumbnail': (('thumbnail', 'image', 'imageSEO'), {url_or_none}, any),
'duration': ('duration', {float_or_none(scale=1000)}),
'is_live': ('live', {bool}),
'series': (('programTitle', ('programInfo', 'title')), {clean_html}, any),
})
class RTVEALaCartaIE(RTVEBaseIE):
IE_NAME = 'rtve.es:alacarta' IE_NAME = 'rtve.es:alacarta'
IE_DESC = 'RTVE a la carta' IE_DESC = 'RTVE a la carta and Play'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)' _VALID_URL = [
r'https?://(?:www\.)?rtve\.es/(?:m/)?(?:(?:alacarta|play)/videos|filmoteca)/(?!directo)(?:[^/?#]+/){2}(?P<id>\d+)',
r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/?#]+/video/[^/?#]+/(?P<id>\d+)',
]
_TESTS = [{ _TESTS = [{
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/', 'url': 'http://www.rtve.es/alacarta/videos/la-aventura-del-saber/aventuraentornosilla/3088905/',
'md5': '1d49b7e1ca7a7502c56a4bf1b60f1b43', 'md5': 'a964547824359a5753aef09d79fe984b',
'info_dict': { 'info_dict': {
'id': '2491869', 'id': '3088905',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia', 'title': 'En torno a la silla',
'duration': 5024.566, 'duration': 1216.981,
'series': 'Balonmano', 'series': 'La aventura del Saber',
'thumbnail': 'https://img2.rtve.es/v/aventuraentornosilla_3088905.png',
}, },
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, { }, {
'note': 'Live stream', 'note': 'Live stream',
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/', 'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
@ -38,140 +145,88 @@ class RTVEALaCartaIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True, 'is_live': True,
'live_status': 'is_live',
'thumbnail': r're:https://img2\.rtve\.es/v/.*\.png',
}, },
'params': { 'params': {
'skip_download': 'live stream', 'skip_download': 'live stream',
}, },
}, { }, {
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/', 'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
'md5': 'd850f3c8731ea53952ebab489cf81cbf', 'md5': 'f3cf0d1902d008c48c793e736706c174',
'info_dict': { 'info_dict': {
'id': '4236788', 'id': '4236788',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Servir y proteger - Capítulo 104', 'title': 'Episodio 104',
'duration': 3222.0, 'duration': 3222.8,
'thumbnail': r're:https://img2\.rtve\.es/v/.*\.png',
'series': 'Servir y proteger',
}, },
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, { }, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve', 'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.rtve.es/filmoteca/no-do/not-1-introduccion-primer-noticiario-espanol/1465256/', 'url': 'http://www.rtve.es/filmoteca/no-do/not-1-introduccion-primer-noticiario-espanol/1465256/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.rtve.es/play/videos/saber-vivir/07-07-24/16177116/',
'md5': 'a5b24fcdfa3ff5cb7908aba53d22d4b6',
'info_dict': {
'id': '16177116',
'ext': 'mp4',
'title': 'Saber vivir - 07/07/24',
'thumbnail': r're:https://img2\.rtve\.es/v/.*\.png',
'duration': 2162.68,
'series': 'Saber vivir',
},
}, {
'url': 'https://www.rtve.es/infantil/serie/agus-lui-churros-crafts/video/gusano/7048976/',
'info_dict': {
'id': '7048976',
'ext': 'mp4',
'title': 'Gusano',
'thumbnail': r're:https://img2\.rtve\.es/v/.*\.png',
'duration': 292.86,
'series': 'Agus & Lui: Churros y Crafts',
'_old_archive_ids': ['rtveinfantil 7048976'],
},
}] }]
def _real_initialize(self): def _get_subtitles(self, video_id):
user_agent_b64 = base64.b64encode(self.get_param('http_headers')['User-Agent'].encode()).decode('utf-8') subtitle_data = self._download_json(
self._manager = self._download_json( f'https://api2.rtve.es/api/videos/{video_id}/subtitulos.json', video_id,
'http://www.rtve.es/odin/loki/' + user_agent_b64, 'Downloading subtitles info')
None, 'Fetching manager info')['manager'] return traverse_obj(subtitle_data, ('page', 'items', ..., {
'id': ('lang', {str}),
@staticmethod 'url': ('src', {url_or_none}),
def _decrypt_url(png): }, all, {subs_list_to_dict(lang='es')}))
encrypted_data = io.BytesIO(base64.b64decode(png)[8:])
while True:
length = struct.unpack('!I', encrypted_data.read(4))[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
alphabet_data, text = data.split(b'\0')
quality, url_data = text.split(b'%%')
alphabet = []
e = 0
d = 0
for l in alphabet_data.decode('iso-8859-1'):
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data.decode('iso-8859-1'):
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
yield quality.decode(), url
encrypted_data.read(4) # CRC
def _extract_png_formats(self, video_id):
png = self._download_webpage(
f'http://www.rtve.es/ztnr/movil/thumbnail/{self._manager}/videos/{video_id}.png',
video_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
return formats
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
info = self._download_json( metadata = self._download_json(
f'http://www.rtve.es/api/videos/{video_id}/config/alacarta_videos.json', f'http://www.rtve.es/api/videos/{video_id}/config/alacarta_videos.json',
video_id)['page']['items'][0] video_id)['page']['items'][0]
if info['state'] == 'DESPU': if metadata['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True) raise ExtractorError('The video is no longer available', expected=True)
title = info['title'].strip() formats, subtitles = self._extract_png_formats_and_subtitles(video_id)
formats = self._extract_png_formats(video_id)
subtitles = None self._merge_subtitles(self.extract_subtitles(video_id), target=subtitles)
sbt_file = info.get('sbtFile')
if sbt_file:
subtitles = self.extract_subtitles(video_id, sbt_file)
is_live = info.get('live') is True is_infantil = urllib.parse.urlparse(url).path.startswith('/infantil/')
return { return {
'id': video_id, 'id': video_id,
'title': title,
'formats': formats, 'formats': formats,
'thumbnail': info.get('image'),
'subtitles': subtitles, 'subtitles': subtitles,
'duration': float_or_none(info.get('duration'), 1000), **self._parse_metadata(metadata),
'is_live': is_live, '_old_archive_ids': [make_archive_id('rtveinfantil', video_id)] if is_infantil else None,
'series': info.get('programTitle'),
} }
def _get_subtitles(self, video_id, sub_file):
subs = self._download_json(
sub_file + '.json', video_id,
'Downloading subtitles info')['page']['items']
return dict(
(s['lang'], [{'ext': 'vtt', 'url': s['src']}])
for s in subs)
class RTVEAudioIE(RTVEBaseIE):
class RTVEAudioIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE
IE_NAME = 'rtve.es:audio' IE_NAME = 'rtve.es:audio'
IE_DESC = 'RTVE audio' IE_DESC = 'RTVE audio'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/(alacarta|play)/audios/[^/]+/[^/]+/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?rtve\.es/(alacarta|play)/audios/(?:[^/?#]+/){2}(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.rtve.es/alacarta/audios/a-hombros-de-gigantes/palabra-ingeniero-codigos-informaticos-27-04-21/5889192/', 'url': 'https://www.rtve.es/alacarta/audios/a-hombros-de-gigantes/palabra-ingeniero-codigos-informaticos-27-04-21/5889192/',
@ -180,9 +235,11 @@ class RTVEAudioIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE
'id': '5889192', 'id': '5889192',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Códigos informáticos', 'title': 'Códigos informáticos',
'thumbnail': r're:https?://.+/1598856591583.jpg', 'alt_title': 'Códigos informáticos - Escuchar ahora',
'duration': 349.440, 'duration': 349.440,
'series': 'A hombros de gigantes', 'series': 'A hombros de gigantes',
'description': 'md5:72b0d7c1ca20fd327bdfff7ac0171afb',
'thumbnail': 'https://img2.rtve.es/a/palabra-ingeniero-codigos-informaticos-270421_5889192.png',
}, },
}, { }, {
'url': 'https://www.rtve.es/play/audios/en-radio-3/ignatius-farray/5791165/', 'url': 'https://www.rtve.es/play/audios/en-radio-3/ignatius-farray/5791165/',
@ -191,9 +248,11 @@ class RTVEAudioIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE
'id': '5791165', 'id': '5791165',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Ignatius Farray', 'title': 'Ignatius Farray',
'alt_title': 'En Radio 3 - Ignatius Farray - 13/02/21 - escuchar ahora',
'thumbnail': r're:https?://.+/1613243011863.jpg', 'thumbnail': r're:https?://.+/1613243011863.jpg',
'duration': 3559.559, 'duration': 3559.559,
'series': 'En Radio 3', 'series': 'En Radio 3',
'description': 'md5:124aa60b461e0b1724a380bad3bc4040',
}, },
}, { }, {
'url': 'https://www.rtve.es/play/audios/frankenstein-o-el-moderno-prometeo/capitulo-26-ultimo-muerte-victor-juan-jose-plans-mary-shelley/6082623/', 'url': 'https://www.rtve.es/play/audios/frankenstein-o-el-moderno-prometeo/capitulo-26-ultimo-muerte-victor-juan-jose-plans-mary-shelley/6082623/',
@ -202,126 +261,101 @@ class RTVEAudioIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE
'id': '6082623', 'id': '6082623',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Capítulo 26 y último: La muerte de Victor', 'title': 'Capítulo 26 y último: La muerte de Victor',
'alt_title': 'Frankenstein o el moderno Prometeo - Capítulo 26 y último: La muerte de Victor',
'thumbnail': r're:https?://.+/1632147445707.jpg', 'thumbnail': r're:https?://.+/1632147445707.jpg',
'duration': 3174.086, 'duration': 3174.086,
'series': 'Frankenstein o el moderno Prometeo', 'series': 'Frankenstein o el moderno Prometeo',
'description': 'md5:4ee6fcb82ebe2e46d267e1d1c1a8f7b5',
}, },
}] }]
def _extract_png_formats(self, audio_id):
"""
This function retrieves media related png thumbnail which obfuscate
valuable information about the media. This information is decrypted
via base class _decrypt_url function providing media quality and
media url
"""
png = self._download_webpage(
f'http://www.rtve.es/ztnr/movil/thumbnail/{self._manager}/audios/{audio_id}.png',
audio_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, audio_url in self._decrypt_url(png):
ext = determine_ext(audio_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
audio_url, audio_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
audio_url, audio_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': audio_url,
})
return formats
def _real_extract(self, url): def _real_extract(self, url):
audio_id = self._match_id(url) audio_id = self._match_id(url)
info = self._download_json( metadata = self._download_json(
f'https://www.rtve.es/api/audios/{audio_id}.json', f'https://www.rtve.es/api/audios/{audio_id}.json', audio_id)['page']['items'][0]
audio_id)['page']['items'][0]
formats, subtitles = self._extract_png_formats_and_subtitles(audio_id, media_type='audios')
return { return {
'id': audio_id, 'id': audio_id,
'title': info['title'].strip(), 'formats': formats,
'thumbnail': info.get('thumbnail'), 'subtitles': subtitles,
'duration': float_or_none(info.get('duration'), 1000), **self._parse_metadata(metadata),
'series': try_get(info, lambda x: x['programInfo']['title']),
'formats': self._extract_png_formats(audio_id),
} }
class RTVEInfantilIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE class RTVELiveIE(RTVEBaseIE):
IE_NAME = 'rtve.es:infantil'
IE_DESC = 'RTVE infantil'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'
_TESTS = [{
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
'md5': '5747454717aedf9f9fdf212d1bcfc48d',
'info_dict': {
'id': '3040283',
'ext': 'mp4',
'title': 'Maneras de vivir',
'thumbnail': r're:https?://.+/1426182947956\.JPG',
'duration': 357.958,
},
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}]
class RTVELiveIE(RTVEALaCartaIE): # XXX: Do not subclass from concrete IE
IE_NAME = 'rtve.es:live' IE_NAME = 'rtve.es:live'
IE_DESC = 'RTVE.es live streams' IE_DESC = 'RTVE.es live streams'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)' _VALID_URL = [
r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)',
r'https?://(?:www\.)?rtve\.es/play/videos/directo/[^/?#]+/(?P<id>[a-zA-Z0-9-]+)',
]
_TESTS = [{ _TESTS = [{
'url': 'http://www.rtve.es/directo/la-1/', 'url': 'http://www.rtve.es/directo/la-1/',
'info_dict': { 'info_dict': {
'id': 'la-1', 'id': 'la-1',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'live_status': 'is_live',
'title': str,
'description': str,
'thumbnail': r're:https://img\d\.rtve\.es/resources/thumbslive/\d+\.jpg',
'timestamp': int,
'upload_date': str,
}, },
'params': { 'params': {'skip_download': 'live stream'},
'skip_download': 'live stream', }, {
'url': 'https://www.rtve.es/play/videos/directo/deportes/tdp/',
'info_dict': {
'id': 'tdp',
'ext': 'mp4',
'live_status': 'is_live',
'title': str,
'description': str,
'thumbnail': r're:https://img2\d\.rtve\.es/resources/thumbslive/\d+\.jpg',
'timestamp': int,
'upload_date': str,
}, },
'params': {'skip_download': 'live stream'},
}, {
'url': 'http://www.rtve.es/play/videos/directo/canales-lineales/la-1/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
title = remove_start(title, 'Estoy viendo ')
vidplayer_id = self._search_regex( data_setup = self._search_json(
(r'playerId=player([0-9]+)', r'<div[^>]+class="[^"]*videoPlayer[^"]*"[^>]*data-setup=\'',
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)', webpage, 'data_setup', video_id)
r'data-id=["\'](\d+)'),
webpage, 'internal video ID') formats, subtitles = self._extract_png_formats_and_subtitles(data_setup['idAsset'])
return { return {
'id': video_id, 'id': video_id,
'title': title, **self._search_json_ld(webpage, video_id, fatal=False),
'formats': self._extract_png_formats(vidplayer_id), 'title': self._html_extract_title(webpage),
'formats': formats,
'subtitles': subtitles,
'is_live': True, 'is_live': True,
} }
class RTVETelevisionIE(InfoExtractor): class RTVETelevisionIE(InfoExtractor):
IE_NAME = 'rtve.es:television' IE_NAME = 'rtve.es:television'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+).shtml' _VALID_URL = r'https?://(?:www\.)?rtve\.es/television/[^/?#]+/[^/?#]+/(?P<id>\d+).shtml'
_TEST = { _TEST = {
'url': 'http://www.rtve.es/television/20160628/revolucion-del-movil/1364141.shtml', 'url': 'https://www.rtve.es/television/20091103/video-inedito-del-8o-programa/299020.shtml',
'info_dict': { 'info_dict': {
'id': '3069778', 'id': '572515',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Documentos TV - La revolución del móvil', 'title': 'Clase inédita',
'duration': 3496.948, 'duration': 335.817,
'thumbnail': r're:https://img2\.rtve\.es/v/.*\.png',
'series': 'El coro de la cárcel',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -332,11 +366,8 @@ class RTVETelevisionIE(InfoExtractor):
page_id = self._match_id(url) page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
alacarta_url = self._search_regex( play_url = self._html_search_meta('contentUrl', webpage)
r'data-location="alacarta_videos"[^<]+url&quot;:&quot;(http://www\.rtve\.es/alacarta.+?)&', if play_url is None:
webpage, 'alacarta url', default=None) raise ExtractorError('The webpage doesn\'t contain any video', expected=True)
if alacarta_url is None:
raise ExtractorError(
'The webpage doesn\'t contain any video', expected=True)
return self.url_result(alacarta_url, ie=RTVEALaCartaIE.ie_key()) return self.url_result(play_url, ie=RTVEALaCartaIE.ie_key())


@ -1,61 +0,0 @@
from .adobepass import AdobePassIE
from ..utils import (
int_or_none,
smuggle_url,
update_url_query,
)
class SproutIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?:sproutonline|universalkids)\.com/(?:watch|(?:[^/]+/)*videos)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.universalkids.com/shows/remy-and-boo/season/1/videos/robot-bike-race',
'info_dict': {
'id': 'bm0foJFaTKqb',
'ext': 'mp4',
'title': 'Robot Bike Race',
'description': 'md5:436b1d97117cc437f54c383f4debc66d',
'timestamp': 1606148940,
'upload_date': '20201123',
'uploader': 'NBCU-MPAT',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.sproutonline.com/watch/cowboy-adventure',
'only_matching': True,
}, {
'url': 'https://www.universalkids.com/watch/robot-bike-race',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
def _real_extract(self, url):
display_id = self._match_id(url)
mpx_metadata = self._download_json(
# http://nbcuunikidsprod.apps.nbcuni.com/networks/universalkids/content/videos/
'https://www.universalkids.com/_api/videos/' + display_id,
display_id)['mpxMetadata']
media_pid = mpx_metadata['mediaPid']
theplatform_url = 'https://link.theplatform.com/s/HNK2IC/' + media_pid
query = {
'mbr': 'true',
'manifest': 'm3u',
}
if mpx_metadata.get('entitlement') == 'auth':
query['auth'] = self._extract_mvpd_auth(url, media_pid, 'sprout', 'sprout')
theplatform_url = smuggle_url(
update_url_query(theplatform_url, query), {
'force_smil_url': True,
'geo_countries': self._GEO_COUNTRIES,
})
return {
'_type': 'url_transparent',
'id': media_pid,
'url': theplatform_url,
'series': mpx_metadata.get('seriesName'),
'season_number': int_or_none(mpx_metadata.get('seasonNumber')),
'episode_number': int_or_none(mpx_metadata.get('episodeNumber')),
'ie_key': 'ThePlatform',
}


@ -471,8 +471,7 @@ class SVTPageIE(SVTBaseIE):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
urql_state = self._search_json( urql_state = self._search_json(r'urqlState\s*[=:]', webpage, 'json data', display_id)
r'window\.svt\.(?:nyh\.)?urqlState\s*=', webpage, 'json data', display_id)
data = traverse_obj(urql_state, (..., 'data', {str}, {json.loads}), get_all=False) or {} data = traverse_obj(urql_state, (..., 'data', {str}, {json.loads}), get_all=False) or {}
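Review note: the hunk above loosens the prefix pattern handed to `_search_json`. The sketch below compares the old and new prefixes using only the standard `re` module. The two sample strings are hypothetical page fragments invented for illustration, and `prefix_matches` is a helper name introduced here, not part of the extractor.

```python
import re

# Prefix patterns copied from the hunk above (old vs. new).
OLD_PREFIX = r'window\.svt\.(?:nyh\.)?urqlState\s*='
NEW_PREFIX = r'urqlState\s*[=:]'

# Hypothetical page fragments (assumptions, not taken from the real site).
assignment_style = 'window.svt.nyh.urqlState = {"1": {"data": "{}"}}'
property_style = 'svt: {urqlState: {"1": {"data": "{}"}}}'


def prefix_matches(pattern, text):
    """Return True if the prefix pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


for name, sample in (('assignment', assignment_style), ('property', property_style)):
    print(name, prefix_matches(OLD_PREFIX, sample), prefix_matches(NEW_PREFIX, sample))
```

Under these assumptions, the new prefix accepts both the `=` assignment form and a `:` property form, while the old prefix only accepts the `window.svt` assignment.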


@ -2,12 +2,13 @@ import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .jwplatform import JWPlatformIE
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
extract_attributes,
js_to_json, js_to_json,
url_or_none, url_or_none,
) )
from ..utils.traversal import find_element, traverse_obj
class TV2DKIE(InfoExtractor): class TV2DKIE(InfoExtractor):
@ -21,35 +22,46 @@ class TV2DKIE(InfoExtractor):
tv2fyn| tv2fyn|
tv2east| tv2east|
tv2lorry| tv2lorry|
tv2nord tv2nord|
tv2kosmopol
)\.dk/ )\.dk/
(:[^/]+/)* (?:[^/?#]+/)*
(?P<id>[^/?\#&]+) (?P<id>[^/?\#&]+)
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://www.tvsyd.dk/nyheder/28-10-2019/1930/1930-28-okt-2019?autoplay=1#player', 'url': 'https://www.tvsyd.dk/nyheder/28-10-2019/1930/1930-28-okt-2019?autoplay=1#player',
'info_dict': { 'info_dict': {
'id': '0_52jmwa0p', 'id': 'sPp5z21q',
'ext': 'mp4', 'ext': 'mp4',
'title': '19:30 - 28. okt. 2019', 'title': '19:30 - 28. okt. 2019',
'timestamp': 1572290248, 'description': '',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/sPp5z21q/poster.jpg?width=720',
'timestamp': 1572287400,
'upload_date': '20191028', 'upload_date': '20191028',
'uploader_id': 'tvsyd',
'duration': 1347,
'view_count': int,
}, },
'add_ie': ['Kaltura'],
}, { }, {
'url': 'https://www.tv2lorry.dk/gadekamp/gadekamp-6-hoejhuse-i-koebenhavn', 'url': 'https://www.tv2lorry.dk/gadekamp/gadekamp-6-hoejhuse-i-koebenhavn',
'info_dict': { 'info_dict': {
'id': '1_7iwll9n0', 'id': 'oD9cyq0m',
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20211027',
'title': 'Gadekamp #6 - Højhuse i København', 'title': 'Gadekamp #6 - Højhuse i København',
'uploader_id': 'tv2lorry', 'description': '',
'timestamp': 1635345229, 'thumbnail': 'https://cdn.jwplayer.com/v2/media/oD9cyq0m/poster.jpg?width=720',
'timestamp': 1635348600,
'upload_date': '20211027',
}, },
'add_ie': ['Kaltura'], }, {
'url': 'https://www.tvsyd.dk/haderslev/x-factor-brodre-fulde-af-selvtillid-er-igen-hjemme-hos-mor-vores-diagnoser-har-vaeret-en-fordel',
'info_dict': {
'id': 'x-factor-brodre-fulde-af-selvtillid-er-igen-hjemme-hos-mor-vores-diagnoser-har-vaeret-en-fordel',
},
'playlist_count': 2,
}, {
'url': 'https://www.tv2ostjylland.dk/aarhus/dom-kan-fa-alvorlige-konsekvenser',
'info_dict': {
'id': 'dom-kan-fa-alvorlige-konsekvenser',
},
'playlist_count': 3,
}, { }, {
'url': 'https://www.tv2ostjylland.dk/artikel/minister-gaar-ind-i-sag-om-diabetes-teknologi', 'url': 'https://www.tv2ostjylland.dk/artikel/minister-gaar-ind-i-sag-om-diabetes-teknologi',
'only_matching': True, 'only_matching': True,
@ -71,40 +83,22 @@ class TV2DKIE(InfoExtractor):
}, { }, {
'url': 'https://www.tv2nord.dk/artikel/dybt-uacceptabelt', 'url': 'https://www.tv2nord.dk/artikel/dybt-uacceptabelt',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.tv2kosmopol.dk/metropolen/chaufforer-beordres-til-at-kore-videre-i-ulovlige-busser-med-rode-advarselslamper',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
search_space = traverse_obj(webpage, {find_element(tag='article')}) or webpage
entries = [] player_ids = traverse_obj(
re.findall(r'x-data="(?:video_player|simple_player)\(({[^"]+})', search_space),
(..., {js_to_json}, {json.loads}, ('jwpMediaId', 'videoId'), {str}))
def add_entry(partner_id, kaltura_id): return self.playlist_from_matches(
entries.append(self.url_result( player_ids, video_id, getter=lambda x: f'jwplatform:{x}', ie=JWPlatformIE)
f'kaltura:{partner_id}:{kaltura_id}', 'Kaltura',
video_id=kaltura_id))
for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
video = extract_attributes(video_el)
kaltura_id = video.get('data-entryid')
if not kaltura_id:
continue
partner_id = video.get('data-partnerid')
if not partner_id:
continue
add_entry(partner_id, kaltura_id)
if not entries:
kaltura_id = self._search_regex(
(r'entry_id\s*:\s*["\']([0-9a-z_]+)',
r'\\u002FentryId\\u002F(\w+)\\u002F'), webpage, 'kaltura id')
partner_id = self._search_regex(
(r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'), webpage,
'partner id')
add_entry(partner_id, kaltura_id)
if len(entries) == 1:
return entries[0]
return self.playlist_result(entries)
class TV2DKBornholmPlayIE(InfoExtractor): class TV2DKBornholmPlayIE(InfoExtractor):


@ -513,7 +513,7 @@ class TVPVODBaseIE(InfoExtractor):
class TVPVODVideoIE(TVPVODBaseIE): class TVPVODVideoIE(TVPVODBaseIE):
IE_NAME = 'tvp:vod' IE_NAME = 'tvp:vod'
_VALID_URL = r'https?://vod\.tvp\.pl/(?P<category>[a-z\d-]+,\d+)/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek-\d+,S\d+E\d+)?,(?P<id>\d+)/?(?:[?#]|$)' _VALID_URL = r'https?://vod\.tvp\.pl/(?P<category>[a-z\d-]+,\d+)/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek--?\d+,S-?\d+E-?\d+)?,(?P<id>\d+)/?(?:[?#]|$)'
_TESTS = [{ _TESTS = [{
'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357', 'url': 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357',
@ -568,6 +568,9 @@ class TVPVODVideoIE(TVPVODBaseIE):
'live_status': 'is_live', 'live_status': 'is_live',
'thumbnail': 're:https?://.+', 'thumbnail': 're:https?://.+',
}, },
}, {
'url': 'https://vod.tvp.pl/informacje-i-publicystyka,205/konskie-2025-debata-przedwyborcza-odcinki,2028435/odcinek--1,S01E-1,2028419',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
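Review note: the `_VALID_URL` change above allows a minus sign before the episode and season numbers. The sketch below checks both patterns against the two test URLs that appear in this diff, using plain `re.match`. `match_id` is a hypothetical stand-in written for this note, not the extractor's own `_match_id`.

```python
import re

# Patterns copied from the hunk above.
OLD_VALID_URL = r'https?://vod\.tvp\.pl/(?P<category>[a-z\d-]+,\d+)/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek-\d+,S\d+E\d+)?,(?P<id>\d+)/?(?:[?#]|$)'
NEW_VALID_URL = r'https?://vod\.tvp\.pl/(?P<category>[a-z\d-]+,\d+)/[a-z\d-]+(?<!-odcinki)(?:-odcinki,\d+/odcinek--?\d+,S-?\d+E-?\d+)?,(?P<id>\d+)/?(?:[?#]|$)'

# Test URLs copied from the _TESTS entries in this diff.
regular_url = 'https://vod.tvp.pl/dla-dzieci,24/laboratorium-alchemika-odcinki,309338/odcinek-24,S01E24,311357'
negative_url = 'https://vod.tvp.pl/informacje-i-publicystyka,205/konskie-2025-debata-przedwyborcza-odcinki,2028435/odcinek--1,S01E-1,2028419'


def match_id(pattern, url):
    """Return the 'id' group if the pattern matches at the start of url, else None."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


print(match_id(OLD_VALID_URL, regular_url), match_id(NEW_VALID_URL, regular_url))
print(match_id(OLD_VALID_URL, negative_url), match_id(NEW_VALID_URL, negative_url))
```

The regular episode URL matches both patterns; the `odcinek--1,S01E-1` URL only matches the new one.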


@ -1,13 +1,21 @@
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import clean_html, remove_end, unified_timestamp, url_or_none from ..utils import (
from ..utils.traversal import traverse_obj clean_html,
extract_attributes,
parse_qs,
remove_end,
require,
unified_timestamp,
url_or_none,
)
from ..utils.traversal import find_element, traverse_obj
class TvwIE(InfoExtractor): class TvwIE(InfoExtractor):
IE_NAME = 'tvw'
_VALID_URL = r'https?://(?:www\.)?tvw\.org/video/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?tvw\.org/video/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://tvw.org/video/billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211/', 'url': 'https://tvw.org/video/billy-frank-jr-statue-maquette-unveiling-ceremony-2024011211/',
'md5': '9ceb94fe2bb7fd726f74f16356825703', 'md5': '9ceb94fe2bb7fd726f74f16356825703',
@ -115,3 +123,43 @@ class TvwIE(InfoExtractor):
'is_live': ('eventStatus', {lambda x: x == 'live'}), 'is_live': ('eventStatus', {lambda x: x == 'live'}),
}), }),
} }
class TvwTvChannelsIE(InfoExtractor):
IE_NAME = 'tvw:tvchannels'
_VALID_URL = r'https?://(?:www\.)?tvw\.org/tvchannels/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://tvw.org/tvchannels/air/',
'info_dict': {
'id': 'air',
'ext': 'mp4',
'title': r're:TVW Cable Channel Live Stream',
'thumbnail': r're:https?://.+/.+\.(?:jpe?g|png)$',
'live_status': 'is_live',
},
}, {
'url': 'https://tvw.org/tvchannels/tvw2/',
'info_dict': {
'id': 'tvw2',
'ext': 'mp4',
'title': r're:TVW-2 Broadcast Channel',
'thumbnail': r're:https?://.+/.+\.(?:jpe?g|png)$',
'live_status': 'is_live',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m3u8_url = traverse_obj(webpage, (
{find_element(id='invintus-persistent-stream-frame', html=True)}, {extract_attributes},
'src', {parse_qs}, 'encoder', 0, {json.loads}, 'live247URI', {url_or_none}, {require('stream url')}))
return {
'id': video_id,
'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', live=True),
'title': remove_end(self._og_search_title(webpage, default=None), ' - TVW'),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'is_live': True,
}


@ -14,12 +14,13 @@ from ..utils import (
parse_duration, parse_duration,
qualities, qualities,
str_to_int, str_to_int,
traverse_obj,
try_get, try_get,
unified_timestamp, unified_timestamp,
url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import traverse_obj
class TwitCastingIE(InfoExtractor): class TwitCastingIE(InfoExtractor):
@ -138,13 +139,7 @@ class TwitCastingIE(InfoExtractor):
r'data-toggle="true"[^>]+datetime="([^"]+)"', r'data-toggle="true"[^>]+datetime="([^"]+)"',
webpage, 'datetime', None)) webpage, 'datetime', None))
stream_server_data = self._download_json(
f'https://twitcasting.tv/streamserver.php?target={uploader_id}&mode=client', video_id,
'Downloading live info', fatal=False)
is_live = any(f'data-{x}' in webpage for x in ['is-onlive="true"', 'live-type="live"', 'status="online"']) is_live = any(f'data-{x}' in webpage for x in ['is-onlive="true"', 'live-type="live"', 'status="online"'])
if not traverse_obj(stream_server_data, 'llfmp4') and is_live:
self.raise_login_required(method='cookies')
base_dict = { base_dict = {
'title': title, 'title': title,
@ -165,28 +160,37 @@ class TwitCastingIE(InfoExtractor):
return [data_movie_url] return [data_movie_url]
m3u8_urls = (try_get(webpage, find_dmu, list) m3u8_urls = (try_get(webpage, find_dmu, list)
or traverse_obj(video_js_data, (..., 'source', 'url')) or traverse_obj(video_js_data, (..., 'source', 'url')))
or ([f'https://twitcasting.tv/{uploader_id}/metastream.m3u8'] if is_live else None))
if not m3u8_urls:
raise ExtractorError('Failed to get m3u8 playlist')
if is_live: if is_live:
m3u8_url = m3u8_urls[0] stream_data = self._download_json(
formats = self._extract_m3u8_formats( 'https://twitcasting.tv/streamserver.php',
m3u8_url, video_id, ext='mp4', m3u8_id='hls', video_id, 'Downloading live info', query={
live=True, headers=self._M3U8_HEADERS) 'target': uploader_id,
'mode': 'client',
'player': 'pc_web',
})
if traverse_obj(stream_server_data, ('hls', 'source')): formats = []
formats.extend(self._extract_m3u8_formats( # low: 640x360, medium: 1280x720, high: 1920x1080
m3u8_url, video_id, ext='mp4', m3u8_id='source', qq = qualities(['low', 'medium', 'high'])
live=True, query={'mode': 'source'}, for quality, m3u8_url in traverse_obj(stream_data, (
note='Downloading source quality m3u8', 'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
headers=self._M3U8_HEADERS, fatal=False)) )):
formats.append({
'url': m3u8_url,
'format_id': f'hls-{quality}',
'ext': 'mp4',
'quality': qq(quality),
'protocol': 'm3u8',
'http_headers': self._M3U8_HEADERS,
})
if websockets: if websockets:
qq = qualities(['base', 'mobilesource', 'main']) qq = qualities(['base', 'mobilesource', 'main'])
streams = traverse_obj(stream_server_data, ('llfmp4', 'streams')) or {} for mode, ws_url in traverse_obj(stream_data, (
for mode, ws_url in streams.items(): 'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
)):
formats.append({ formats.append({
'url': ws_url, 'url': ws_url,
'format_id': f'ws-{mode}', 'format_id': f'ws-{mode}',
@ -197,10 +201,15 @@ class TwitCastingIE(InfoExtractor):
'protocol': 'websocket_frag', 'protocol': 'websocket_frag',
}) })
if not formats:
self.raise_login_required()
infodict = { infodict = {
'formats': formats, 'formats': formats,
'_format_sort_fields': ('source', ), '_format_sort_fields': ('source', ),
} }
elif not m3u8_urls:
raise ExtractorError('Failed to get m3u8 playlist')
elif len(m3u8_urls) == 1: elif len(m3u8_urls) == 1:
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
m3u8_urls[0], video_id, 'mp4', headers=self._M3U8_HEADERS) m3u8_urls[0], video_id, 'mp4', headers=self._M3U8_HEADERS)
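Review note: the new live branch above builds one HLS format per entry in the `tc-hls` streams mapping and ranks them with `qualities(['low', 'medium', 'high'])`. The sketch below mirrors that logic in standalone Python. The local `qualities` is a simplified stand-in for the yt-dlp helper of the same name, the URL check is a rough approximation of `url_or_none`, `build_hls_formats` is a hypothetical name, and the sample response is invented for illustration.

```python
def qualities(quality_ids):
    """Stand-in ranking helper: position in the list, or -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


def build_hls_formats(stream_data):
    """Build format dicts from a streamserver-style response (assumed shape)."""
    rank = qualities(['low', 'medium', 'high'])
    streams = ((stream_data or {}).get('tc-hls') or {}).get('streams') or {}
    formats = []
    for quality, m3u8_url in streams.items():
        # Simplified URL validation; the real code filters with url_or_none.
        if not isinstance(m3u8_url, str) or not m3u8_url.startswith(('http://', 'https://')):
            continue
        formats.append({
            'url': m3u8_url,
            'format_id': f'hls-{quality}',
            'ext': 'mp4',
            'quality': rank(quality),
            'protocol': 'm3u8',
        })
    return formats


# Hypothetical response, for illustration only.
sample = {'tc-hls': {'streams': {
    'high': 'https://example.com/high.m3u8',
    'low': 'https://example.com/low.m3u8',
    'medium': None,
}}}
for fmt in build_hls_formats(sample):
    print(fmt['format_id'], fmt['quality'])
```

Entries without a usable URL are skipped, so an empty result corresponds to the `if not formats: self.raise_login_required()` path in the diff.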


@ -1225,8 +1225,8 @@ class TwitchClipsIE(TwitchBaseIE):
'channel_id': ('broadcaster', 'id', {str}), 'channel_id': ('broadcaster', 'id', {str}),
'channel_follower_count': ('broadcaster', 'followers', 'totalCount', {int_or_none}), 'channel_follower_count': ('broadcaster', 'followers', 'totalCount', {int_or_none}),
'channel_is_verified': ('broadcaster', 'isPartner', {bool}), 'channel_is_verified': ('broadcaster', 'isPartner', {bool}),
'uploader': ('broadcaster', 'displayName', {str}), 'uploader': ('curator', 'displayName', {str}),
'uploader_id': ('broadcaster', 'id', {str}), 'uploader_id': ('curator', 'id', {str}),
'categories': ('game', 'displayName', {str}, filter, all, filter), 'categories': ('game', 'displayName', {str}, filter, all, filter),
}), }),
} }


@ -1221,20 +1221,10 @@ class TwitterIE(TwitterBaseIE):
}] }]
_MEDIA_ID_RE = re.compile(r'_video/(\d+)/') _MEDIA_ID_RE = re.compile(r'_video/(\d+)/')
_GRAPHQL_ENDPOINT = '2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId'
@property
def _GRAPHQL_ENDPOINT(self):
if self.is_logged_in:
return 'zZXycP0V6H7m-2r0mOnFcA/TweetDetail'
return '2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId'
def _graphql_to_legacy(self, data, twid): def _graphql_to_legacy(self, data, twid):
result = traverse_obj(data, ( result = traverse_obj(data, ('tweetResult', 'result', {dict})) or {}
'threaded_conversation_with_injections_v2', 'instructions', 0, 'entries',
lambda _, v: v['entryId'] == f'tweet-{twid}', 'content', 'itemContent',
'tweet_results', 'result', ('tweet', None), {dict},
), default={}, get_all=False) if self.is_logged_in else traverse_obj(
data, ('tweetResult', 'result', {dict}), default={})
typename = result.get('__typename') typename = result.get('__typename')
if typename not in ('Tweet', 'TweetWithVisibilityResults', 'TweetTombstone', 'TweetUnavailable', None): if typename not in ('Tweet', 'TweetWithVisibilityResults', 'TweetTombstone', 'TweetUnavailable', None):
@ -1278,37 +1268,6 @@ class TwitterIE(TwitterBaseIE):
def _build_graphql_query(self, media_id): def _build_graphql_query(self, media_id):
return { return {
'variables': {
'focalTweetId': media_id,
'includePromotedContent': True,
'with_rux_injections': False,
'withBirdwatchNotes': True,
'withCommunity': True,
'withDownvotePerspective': False,
'withQuickPromoteEligibilityTweetFields': True,
'withReactionsMetadata': False,
'withReactionsPerspective': False,
'withSuperFollowsTweetFields': True,
'withSuperFollowsUserFields': True,
'withV2Timeline': True,
'withVoice': True,
},
'features': {
'graphql_is_translatable_rweb_tweet_is_translatable_enabled': False,
'interactive_text_enabled': True,
'responsive_web_edit_tweet_api_enabled': True,
'responsive_web_enhance_cards_enabled': True,
'responsive_web_graphql_timeline_navigation_enabled': False,
'responsive_web_text_conversations_enabled': False,
'responsive_web_uc_gql_enabled': True,
'standardized_nudges_misinfo': True,
'tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled': False,
'tweetypie_unmention_optimization_enabled': True,
'unified_cards_ad_metadata_container_dynamic_card_content_query_enabled': True,
'verified_phone_label_enabled': False,
'vibe_api_enabled': True,
},
} if self.is_logged_in else {
'variables': { 'variables': {
'tweetId': media_id, 'tweetId': media_id,
'withCommunity': False, 'withCommunity': False,
@ -1717,21 +1676,22 @@ class TwitterSpacesIE(TwitterBaseIE):
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/spaces/(?P<id>[0-9a-zA-Z]{13})' _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/spaces/(?P<id>[0-9a-zA-Z]{13})'
_TESTS = [{ _TESTS = [{
'url': 'https://twitter.com/i/spaces/1RDxlgyvNXzJL', 'url': 'https://twitter.com/i/spaces/1OwxWwQOPlNxQ',
'info_dict': { 'info_dict': {
'id': '1RDxlgyvNXzJL', 'id': '1OwxWwQOPlNxQ',
'ext': 'm4a', 'ext': 'm4a',
'title': 'King Carlo e la mossa Kansas City per fare il Grande Centro', 'title': 'Everybody in: @mtbarra & @elonmusk discuss the future of EV charging',
'description': 'Twitter Space participated by annarita digiorgio, Signor Ernesto, Raffaello Colosimo, Simone M. Sepe', 'description': 'Twitter Space participated by Elon Musk',
'uploader': r're:Lucio Di Gaetano.*?',
'uploader_id': 'luciodigaetano',
'live_status': 'was_live', 'live_status': 'was_live',
'timestamp': 1659877956, 'release_date': '20230608',
'upload_date': '20220807', 'release_timestamp': 1686256230,
'release_timestamp': 1659904215, 'thumbnail': r're:https?://pbs\.twimg\.com/profile_images/.+',
'release_date': '20220807', 'timestamp': 1686254250,
'upload_date': '20230608',
'uploader': 'Mary Barra',
'uploader_id': 'mtbarra',
}, },
'skip': 'No longer available', 'params': {'skip_download': 'm3u8'},
}, { }, {
# post_live/TimedOut but downloadable # post_live/TimedOut but downloadable
'url': 'https://twitter.com/i/spaces/1vAxRAVQWONJl', 'url': 'https://twitter.com/i/spaces/1vAxRAVQWONJl',
@ -1743,9 +1703,10 @@ class TwitterSpacesIE(TwitterBaseIE):
'uploader': 'Google Cloud', 'uploader': 'Google Cloud',
'uploader_id': 'googlecloud', 'uploader_id': 'googlecloud',
'live_status': 'post_live', 'live_status': 'post_live',
'thumbnail': r're:https?://pbs\.twimg\.com/profile_images/.+',
'timestamp': 1681409554, 'timestamp': 1681409554,
'upload_date': '20230413', 'upload_date': '20230413',
'release_timestamp': 1681839000, 'release_timestamp': 1681839082,
'release_date': '20230418', 'release_date': '20230418',
'protocol': 'm3u8', # ffmpeg is forced 'protocol': 'm3u8', # ffmpeg is forced
'container': 'm4a_dash', # audio-only format fixup is applied 'container': 'm4a_dash', # audio-only format fixup is applied
@ -1762,6 +1723,9 @@ class TwitterSpacesIE(TwitterBaseIE):
'uploader': '息根とめる', 'uploader': '息根とめる',
'uploader_id': 'tomeru_ikinone', 'uploader_id': 'tomeru_ikinone',
'live_status': 'was_live', 'live_status': 'was_live',
'release_date': '20230601',
'release_timestamp': 1685617200,
'thumbnail': r're:https?://pbs\.twimg\.com/profile_images/.+',
'timestamp': 1685617198, 'timestamp': 1685617198,
'upload_date': '20230601', 'upload_date': '20230601',
'protocol': 'm3u8', # ffmpeg is forced 'protocol': 'm3u8', # ffmpeg is forced
@ -1779,9 +1743,10 @@ class TwitterSpacesIE(TwitterBaseIE):
'uploader': 'Candace Owens', 'uploader': 'Candace Owens',
'uploader_id': 'RealCandaceO', 'uploader_id': 'RealCandaceO',
'live_status': 'was_live', 'live_status': 'was_live',
'thumbnail': r're:https?://pbs\.twimg\.com/profile_images/.+',
'timestamp': 1723931351, 'timestamp': 1723931351,
'upload_date': '20240817', 'upload_date': '20240817',
'release_timestamp': 1723932000, 'release_timestamp': 1723932056,
'release_date': '20240817', 'release_date': '20240817',
'protocol': 'm3u8_native', # not ffmpeg, detected as video space 'protocol': 'm3u8_native', # not ffmpeg, detected as video space
}, },
@ -1861,18 +1826,21 @@ class TwitterSpacesIE(TwitterBaseIE):
return { return {
'id': space_id, 'id': space_id,
'title': metadata.get('title'),
'description': f'Twitter Space participated by {participants}', 'description': f'Twitter Space participated by {participants}',
'uploader': traverse_obj(
metadata, ('creator_results', 'result', 'legacy', 'name')),
'uploader_id': traverse_obj(
metadata, ('creator_results', 'result', 'legacy', 'screen_name')),
'live_status': live_status,
'release_timestamp': try_call(
lambda: int_or_none(metadata['scheduled_start'], scale=1000)),
'timestamp': int_or_none(metadata.get('created_at'), scale=1000),
'formats': formats, 'formats': formats,
'http_headers': headers, 'http_headers': headers,
'live_status': live_status,
**traverse_obj(metadata, {
'title': ('title', {str}),
# started_at is None when stream is_upcoming so fallback to scheduled_start for --wait-for-video
'release_timestamp': (('started_at', 'scheduled_start'), {int_or_none(scale=1000)}, any),
'timestamp': ('created_at', {int_or_none(scale=1000)}),
}),
**traverse_obj(metadata, ('creator_results', 'result', 'legacy', {
'uploader': ('name', {str}),
'uploader_id': ('screen_name', {str_or_none}),
'thumbnail': ('profile_image_url_https', {lambda x: x.replace('_normal', '_400x400')}, {url_or_none}),
})),
} }
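Review note: the metadata mapping above prefers `started_at` over `scheduled_start` for `release_timestamp`, converts millisecond values to seconds, and swaps the `_normal` avatar suffix for `_400x400`. The sketch below restates those three rules in plain Python without `traverse_obj`. All function names and sample values are hypothetical and were written for this note.

```python
def ms_to_seconds(value):
    """Convert a millisecond value to whole seconds; None if not int-like."""
    try:
        return int(value) // 1000
    except (TypeError, ValueError):
        return None


def release_timestamp(metadata):
    """Use started_at when present, else fall back to scheduled_start."""
    for key in ('started_at', 'scheduled_start'):
        seconds = ms_to_seconds(metadata.get(key))
        if seconds is not None:
            return seconds
    return None


def larger_avatar(url):
    """Request the 400x400 profile image instead of the small default."""
    if not isinstance(url, str):
        return None
    return url.replace('_normal', '_400x400')


# Hypothetical metadata for a space that has not started yet.
upcoming = {'started_at': None, 'scheduled_start': 1686254400000}
print(release_timestamp(upcoming))
print(larger_avatar('https://pbs.twimg.com/profile_images/1/abc_normal.jpg'))
```

The fallback matters for upcoming spaces, where `started_at` is still empty and only the scheduled time is known.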


@ -39,6 +39,14 @@ class VimeoBaseInfoExtractor(InfoExtractor):
_NETRC_MACHINE = 'vimeo' _NETRC_MACHINE = 'vimeo'
_LOGIN_REQUIRED = False _LOGIN_REQUIRED = False
_LOGIN_URL = 'https://vimeo.com/log_in' _LOGIN_URL = 'https://vimeo.com/log_in'
_IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw=='
_IOS_CLIENT_HEADERS = {
'Accept': 'application/vnd.vimeo.*+json; version=3.4.10',
'Accept-Language': 'en',
'User-Agent': 'Vimeo/11.10.0 (com.vimeo; build:250424.164813.0; iOS 18.4.1) Alamofire/5.9.0 VimeoNetworking/5.0.0',
}
_IOS_OAUTH_CACHE_KEY = 'oauth-token-ios'
_ios_oauth_token = None
@staticmethod @staticmethod
def _smuggle_referrer(url, referrer_url): def _smuggle_referrer(url, referrer_url):
@ -88,13 +96,16 @@ class VimeoBaseInfoExtractor(InfoExtractor):
expected=True) expected=True)
return password return password
def _verify_video_password(self, video_id, password, token): def _verify_video_password(self, video_id):
video_password = self._get_video_password()
token = self._download_json(
'https://vimeo.com/_next/viewer', video_id, 'Downloading viewer info')['xsrft']
url = f'https://vimeo.com/{video_id}' url = f'https://vimeo.com/{video_id}'
try: try:
return self._download_webpage( self._request_webpage(
f'{url}/password', video_id, f'{url}/password', video_id,
'Submitting video password', data=json.dumps({ 'Submitting video password', data=json.dumps({
'password': password, 'password': video_password,
'token': token, 'token': token,
}, separators=(',', ':')).encode(), headers={ }, separators=(',', ':')).encode(), headers={
'Accept': '*/*', 'Accept': '*/*',
@ -239,20 +250,39 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'), '_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'),
} }
def _call_videos_api(self, video_id, jwt_token, unlisted_hash=None, **kwargs): def _fetch_oauth_token(self):
if not self._ios_oauth_token:
self._ios_oauth_token = self.cache.load(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY)
if not self._ios_oauth_token:
self._ios_oauth_token = self._download_json(
'https://api.vimeo.com/oauth/authorize/client', None,
'Fetching OAuth token', 'Failed to fetch OAuth token',
headers={
'Authorization': f'Basic {self._IOS_CLIENT_AUTH}',
**self._IOS_CLIENT_HEADERS,
}, data=urlencode_postdata({
'grant_type': 'client_credentials',
'scope': 'private public create edit delete interact upload purchased stats',
}, quote_via=urllib.parse.quote))['access_token']
self.cache.store(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY, self._ios_oauth_token)
return self._ios_oauth_token
def _call_videos_api(self, video_id, unlisted_hash=None, **kwargs):
return self._download_json( return self._download_json(
join_nonempty(f'https://api.vimeo.com/videos/{video_id}', unlisted_hash, delim=':'), join_nonempty(f'https://api.vimeo.com/videos/{video_id}', unlisted_hash, delim=':'),
video_id, 'Downloading API JSON', headers={ video_id, 'Downloading API JSON', headers={
'Authorization': f'jwt {jwt_token}', 'Authorization': f'Bearer {self._fetch_oauth_token()}',
'Accept': 'application/json', **self._IOS_CLIENT_HEADERS,
}, query={ }, query={
'fields': ','.join(( 'fields': ','.join((
'config_url', 'created_time', 'description', 'download', 'license', 'config_url', 'embed_player_config_url', 'player_embed_url', 'download', 'play',
'metadata.connections.comments.total', 'metadata.connections.likes.total', 'files', 'description', 'license', 'release_time', 'created_time', 'stats.plays',
'release_time', 'stats.plays')), 'metadata.connections.comments.total', 'metadata.connections.likes.total')),
}, **kwargs) }, **kwargs)
def _extract_original_format(self, url, video_id, unlisted_hash=None, jwt=None, api_data=None): def _extract_original_format(self, url, video_id, unlisted_hash=None, api_data=None):
# Original/source formats are only available when logged in # Original/source formats are only available when logged in
if not self._get_cookies('https://vimeo.com/').get('vimeo'): if not self._get_cookies('https://vimeo.com/').get('vimeo'):
return return
@ -283,12 +313,8 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'quality': 1, 'quality': 1,
} }
jwt = jwt or traverse_obj(self._download_json(
'https://vimeo.com/_rv/viewer', video_id, 'Downloading jwt token', fatal=False), ('jwt', {str}))
if not jwt:
return
original_response = api_data or self._call_videos_api( original_response = api_data or self._call_videos_api(
video_id, jwt, unlisted_hash, fatal=False, expected_status=(403, 404)) video_id, unlisted_hash, fatal=False, expected_status=(403, 404))
for download_data in traverse_obj(original_response, ('download', ..., {dict})): for download_data in traverse_obj(original_response, ('download', ..., {dict})):
download_url = download_data.get('link') download_url = download_data.get('link')
if not download_url or download_data.get('quality') != 'source': if not download_url or download_data.get('quality') != 'source':
@ -410,6 +436,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 10, 'duration': 10,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d', 'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
}, },
'params': { 'params': {
@ -500,15 +527,16 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'The DMCI', 'uploader': 'The DMCI',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/dmci', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/dmci',
'uploader_id': 'dmci', 'uploader_id': 'dmci',
'timestamp': 1324343742, 'timestamp': 1324361742,
'upload_date': '20111220', 'upload_date': '20111220',
'description': 'md5:ae23671e82d05415868f7ad1aec21147', 'description': 'md5:f37b4ad0f3ded6fa16f38ecde16c3c44',
'duration': 60, 'duration': 60,
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d', 'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d',
'like_count': int, 'like_count': int,
'tags': 'count:11', 'release_timestamp': 1324361742,
'release_date': '20111220',
}, },
# 'params': {'format': 'Original'}, # 'params': {'format': 'Original'},
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -521,15 +549,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
'id': '393756517', 'id': '393756517',
# 'ext': 'mov', # 'ext': 'mov',
'ext': 'mp4', 'ext': 'mp4',
'timestamp': 1582642091, 'timestamp': 1582660091,
'uploader_id': 'frameworkla', 'uploader_id': 'frameworkla',
'title': 'Straight To Hell - Sabrina: Netflix', 'title': 'Straight To Hell - Sabrina: Netflix',
'uploader': 'Framework Studio', 'uploader': 'Framework Studio',
'description': 'md5:f2edc61af3ea7a5592681ddbb683db73',
'upload_date': '20200225', 'upload_date': '20200225',
'duration': 176, 'duration': 176,
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d', 'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d',
'uploader_url': 'https://vimeo.com/frameworkla', 'uploader_url': 'https://vimeo.com/frameworkla',
'comment_count': int,
'like_count': int,
'release_timestamp': 1582660091,
'release_date': '20200225',
}, },
# 'params': {'format': 'source'}, # 'params': {'format': 'source'},
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -630,7 +661,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': str, # FIXME: Dynamic SEO spam description 'description': str, # FIXME: Dynamic SEO spam description
'upload_date': '20150209', 'upload_date': '20150209',
'timestamp': 1423518307, 'timestamp': 1423518307,
'thumbnail': 'https://i.vimeocdn.com/video/default', 'thumbnail': r're:https://i\.vimeocdn\.com/video/default',
'duration': 10, 'duration': 10,
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/user20132939', 'uploader_url': 'https://vimeo.com/user20132939',
@ -667,6 +698,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/aliniamedia', 'uploader_url': 'https://vimeo.com/aliniamedia',
'release_date': '20160329', 'release_date': '20160329',
'view_count': int,
}, },
'params': {'skip_download': True}, 'params': {'skip_download': True},
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -678,18 +710,19 @@ class VimeoIE(VimeoBaseInfoExtractor):
# 'ext': 'm4v', # 'ext': 'm4v',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Eastnor Castle 2015 Firework Champions - The Promo!', 'title': 'Eastnor Castle 2015 Firework Champions - The Promo!',
'description': 'md5:5967e090768a831488f6e74b7821b3c1', 'description': 'md5:9441e6829ae94f380cc6417d982f63ac',
'uploader_id': 'fireworkchampions', 'uploader_id': 'fireworkchampions',
'uploader': 'Firework Champions', 'uploader': 'Firework Champions',
'upload_date': '20150910', 'upload_date': '20150910',
'timestamp': 1441901895, 'timestamp': 1441916295,
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d', 'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d',
'uploader_url': 'https://vimeo.com/fireworkchampions', 'uploader_url': 'https://vimeo.com/fireworkchampions',
'tags': 'count:6',
'duration': 229, 'duration': 229,
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
'comment_count': int, 'comment_count': int,
'release_timestamp': 1441916295,
'release_date': '20150910',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -820,7 +853,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Raja Virdi', 'uploader': 'Raja Virdi',
'uploader_id': 'rajavirdi', 'uploader_id': 'rajavirdi',
'uploader_url': 'https://vimeo.com/rajavirdi', 'uploader_url': 'https://vimeo.com/rajavirdi',
'duration': 309, 'duration': 300,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d', 'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d',
}, },
# 'params': {'format': 'source'}, # 'params': {'format': 'source'},
@ -860,12 +893,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
return checked return checked
def _extract_from_api(self, video_id, unlisted_hash=None): def _extract_from_api(self, video_id, unlisted_hash=None):
viewer = self._download_json(
'https://vimeo.com/_next/viewer', video_id, 'Downloading viewer info')
for retry in (False, True): for retry in (False, True):
try: try:
video = self._call_videos_api(video_id, viewer['jwt'], unlisted_hash) video = self._call_videos_api(video_id, unlisted_hash)
break break
except ExtractorError as e: except ExtractorError as e:
if (not retry and isinstance(e.cause, HTTPError) and e.cause.status == 400 if (not retry and isinstance(e.cause, HTTPError) and e.cause.status == 400
@ -873,15 +903,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False), self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False),
({json.loads}, 'invalid_parameters', ..., 'field'), ({json.loads}, 'invalid_parameters', ..., 'field'),
)): )):
self._verify_video_password( self._verify_video_password(video_id)
video_id, self._get_video_password(), viewer['xsrft'])
continue continue
raise raise
info = self._parse_config(self._download_json( info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id) video['config_url'], video_id), video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(
f'https://vimeo.com/{video_id}', video_id, unlisted_hash, jwt=viewer['jwt'], api_data=video) f'https://vimeo.com/{video_id}', video_id, unlisted_hash, api_data=video)
if source_format: if source_format:
info['formats'].append(source_format) info['formats'].append(source_format)
@ -1122,7 +1151,7 @@ class VimeoOndemandIE(VimeoIE): # XXX: Do not subclass from concrete IE
'description': 'md5:aeeba3dbd4d04b0fa98a4fdc9c639998', 'description': 'md5:aeeba3dbd4d04b0fa98a4fdc9c639998',
'upload_date': '20140906', 'upload_date': '20140906',
'timestamp': 1410032453, 'timestamp': 1410032453,
'thumbnail': 'https://i.vimeocdn.com/video/488238335-d7bf151c364cff8d467f1b73784668fe60aae28a54573a35d53a1210ae283bd8-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'comment_count': int, 'comment_count': int,
'license': 'https://creativecommons.org/licenses/by-nc-nd/3.0/', 'license': 'https://creativecommons.org/licenses/by-nc-nd/3.0/',
'duration': 53, 'duration': 53,
@ -1132,7 +1161,7 @@ class VimeoOndemandIE(VimeoIE): # XXX: Do not subclass from concrete IE
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
}, },
'expected_warnings': ['Unable to download JSON metadata'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
}, { }, {
# requires Referer to be passed along with og:video:url # requires Referer to be passed along with og:video:url
'url': 'https://vimeo.com/ondemand/36938/126682985', 'url': 'https://vimeo.com/ondemand/36938/126682985',
@ -1149,13 +1178,14 @@ class VimeoOndemandIE(VimeoIE): # XXX: Do not subclass from concrete IE
'duration': 121, 'duration': 121,
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/517077723-7066ae1d9a79d3eb361334fb5d58ec13c8f04b52f8dd5eadfbd6fb0bcf11f613-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'like_count': int, 'like_count': int,
'tags': 'count:5',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Unable to download JSON metadata'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
}, { }, {
'url': 'https://vimeo.com/ondemand/nazmaalik', 'url': 'https://vimeo.com/ondemand/nazmaalik',
'only_matching': True, 'only_matching': True,
@ -1237,7 +1267,7 @@ class VimeoUserIE(VimeoChannelIE): # XXX: Do not subclass from concrete IE
_TESTS = [{ _TESTS = [{
'url': 'https://vimeo.com/nkistudio/videos', 'url': 'https://vimeo.com/nkistudio/videos',
'info_dict': { 'info_dict': {
'title': 'Nki', 'title': 'AKAMA',
'id': 'nkistudio', 'id': 'nkistudio',
}, },
'playlist_mincount': 66, 'playlist_mincount': 66,
@ -1370,10 +1400,10 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
'uploader_id': 'user170863801', 'uploader_id': 'user170863801',
'uploader_url': 'https://vimeo.com/user170863801', 'uploader_url': 'https://vimeo.com/user170863801',
'duration': 30, 'duration': 30,
'thumbnail': 'https://i.vimeocdn.com/video/1912612821-09a43bd2e75c203d503aed89de7534f28fc4474a48f59c51999716931a246af5-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
}, { }, {
'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d', 'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d',
'md5': 'c507a72f780cacc12b2248bb4006d253', 'md5': 'c507a72f780cacc12b2248bb4006d253',
@ -1423,12 +1453,8 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
user, video_id, review_hash = self._match_valid_url(url).group('user', 'id', 'hash') user, video_id, review_hash = self._match_valid_url(url).group('user', 'id', 'hash')
data_url = f'https://vimeo.com/{user}/review/data/{video_id}/{review_hash}' data_url = f'https://vimeo.com/{user}/review/data/{video_id}/{review_hash}'
data = self._download_json(data_url, video_id) data = self._download_json(data_url, video_id)
viewer = {}
if data.get('isLocked') is True: if data.get('isLocked') is True:
video_password = self._get_video_password() self._verify_video_password(video_id)
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id)
self._verify_video_password(video_id, video_password, viewer['xsrft'])
data = self._download_json(data_url, video_id) data = self._download_json(data_url, video_id)
clip_data = data['clipData'] clip_data = data['clipData']
config_url = clip_data['configUrl'] config_url = clip_data['configUrl']
@ -1436,7 +1462,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
info_dict = self._parse_config(config, video_id) info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(
f'https://vimeo.com/{user}/review/{video_id}/{review_hash}/action', f'https://vimeo.com/{user}/review/{video_id}/{review_hash}/action',
video_id, unlisted_hash=clip_data.get('unlistedHash'), jwt=viewer.get('jwt')) video_id, unlisted_hash=clip_data.get('unlistedHash'))
if source_format: if source_format:
info_dict['formats'].append(source_format) info_dict['formats'].append(source_format)
info_dict['description'] = clean_html(clip_data.get('description')) info_dict['description'] = clean_html(clip_data.get('description'))
@ -1528,20 +1554,22 @@ class VimeoProIE(VimeoBaseInfoExtractor):
'uploader_id': 'openstreetmapus', 'uploader_id': 'openstreetmapus',
'uploader': 'OpenStreetMap US', 'uploader': 'OpenStreetMap US',
'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography', 'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
'description': 'md5:2c362968038d4499f4d79f88458590c1', 'description': 'md5:8cf69a1a435f2d763f4adf601e9c3125',
'duration': 1595, 'duration': 1595,
'upload_date': '20130610', 'upload_date': '20130610',
'timestamp': 1370893156, 'timestamp': 1370907556,
'license': 'by', 'license': 'by',
'thumbnail': 'https://i.vimeocdn.com/video/440260469-19b0d92fca3bd84066623b53f1eb8aaa3980c6c809e2d67b6b39ab7b4a77a344-d_960', 'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'tags': 'count:1', 'release_timestamp': 1370907556,
'release_date': '20130610',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
}, },
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, { }, {
# password-protected VimeoPro page with Vimeo player embed # password-protected VimeoPro page with Vimeo player embed
'url': 'https://vimeopro.com/cadfem/simulation-conference-mechanische-systeme-in-perfektion', 'url': 'https://vimeopro.com/cadfem/simulation-conference-mechanische-systeme-in-perfektion',
@ -1549,7 +1577,7 @@ class VimeoProIE(VimeoBaseInfoExtractor):
'id': '764543723', 'id': '764543723',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Mechanische Systeme in Perfektion: Realität erfassen, Innovation treiben', 'title': 'Mechanische Systeme in Perfektion: Realität erfassen, Innovation treiben',
'thumbnail': 'https://i.vimeocdn.com/video/1543784598-a1a750494a485e601110136b9fe11e28c2131942452b3a5d30391cb3800ca8fd-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'description': 'md5:2a9d195cd1b0f6f79827107dc88c2420', 'description': 'md5:2a9d195cd1b0f6f79827107dc88c2420',
'uploader': 'CADFEM', 'uploader': 'CADFEM',
'uploader_id': 'cadfem', 'uploader_id': 'cadfem',
@ -1561,6 +1589,7 @@ class VimeoProIE(VimeoBaseInfoExtractor):
'videopassword': 'Conference2022', 'videopassword': 'Conference2022',
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Failed to parse XML: not well-formed'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
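
Note on the thumbnail expectations changed above: several literal thumbnail URLs ending in `-d_1280` or `-d_960` were replaced with `re:`-prefixed patterns. The sketch below shows why one pattern covers both size suffixes. It assumes the test helper treats a `re:` prefix as a regular expression matched from the start of the string; `matches_expected` is a hypothetical stand-in written for illustration, not the project's helper.

```python
import re

# Pattern copied from the updated test expectations (without the 're:' prefix).
THUMBNAIL_PATTERN = r'https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d'


def matches_expected(expected, got):
    """Hypothetical comparison: 're:' means regex prefix match, otherwise equality."""
    if expected.startswith('re:'):
        # re.match anchors only at the start, so a trailing size suffix is tolerated.
        return re.match(expected[len('re:'):], got) is not None
    return expected == got


# The two literal values that the diff removed from the tests.
OLD_1280 = ('https://i.vimeocdn.com/video/488238335-'
            'd7bf151c364cff8d467f1b73784668fe60aae28a54573a35d53a1210ae283bd8-d_1280')
OLD_960 = ('https://i.vimeocdn.com/video/440260469-'
           '19b0d92fca3bd84066623b53f1eb8aaa3980c6c809e2d67b6b39ab7b4a77a344-d_960')

both_match = all(
    matches_expected('re:' + THUMBNAIL_PATTERN, url) for url in (OLD_1280, OLD_960))
```

Because the pattern stops at `-d`, the expectation no longer depends on which thumbnail size the API happens to return.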


@ -300,6 +300,24 @@ class VKIE(VKBaseIE):
'upload_date': '20250130', 'upload_date': '20250130',
}, },
}, },
{
'url': 'https://vkvideo.ru/video-50883936_456244102',
'info_dict': {
'id': '-50883936_456244102',
'ext': 'mp4',
'title': 'Добивание Украины // Техник в коме // МОЯ ЗЛОСТЬ №140',
'description': 'md5:a9bc46181e9ebd0fdd82cef6c0191140',
'uploader': 'Стас Ай, Как Просто!',
'uploader_id': '-50883936',
'comment_count': int,
'like_count': int,
'duration': 4651,
'thumbnail': r're:https?://.+\.jpg',
'chapters': 'count:59',
'timestamp': 1743333869,
'upload_date': '20250330',
},
},
{ {
# live stream, hls and rtmp links, most likely already finished live # live stream, hls and rtmp links, most likely already finished live
# stream by the time you are reading this comment # stream by the time you are reading this comment
@ -540,7 +558,7 @@ class VKIE(VKBaseIE):
'title': ('md_title', {unescapeHTML}), 'title': ('md_title', {unescapeHTML}),
'description': ('description', {clean_html}, filter), 'description': ('description', {clean_html}, filter),
'thumbnail': ('jpg', {url_or_none}), 'thumbnail': ('jpg', {url_or_none}),
'uploader': ('md_author', {str}), 'uploader': ('md_author', {unescapeHTML}),
'uploader_id': (('author_id', 'authorId'), {str_or_none}, any), 'uploader_id': (('author_id', 'authorId'), {str_or_none}, any),
'duration': ('duration', {int_or_none}), 'duration': ('duration', {int_or_none}),
'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), { 'chapters': ('time_codes', lambda _, v: isinstance(v['time'], int), {


@ -2,9 +2,11 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
join_nonempty,
try_get, try_get,
unified_strdate, unified_strdate,
) )
from ..utils.traversal import traverse_obj
class WatIE(InfoExtractor): class WatIE(InfoExtractor):
@ -70,8 +72,14 @@ class WatIE(InfoExtractor):
error_desc = video_info.get('error_desc') error_desc = video_info.get('error_desc')
if error_desc: if error_desc:
if video_info.get('error_code') == 'GEOBLOCKED': error_code = video_info.get('error_code')
if error_code == 'GEOBLOCKED':
self.raise_geo_restricted(error_desc, video_info.get('geoList')) self.raise_geo_restricted(error_desc, video_info.get('geoList'))
elif error_code == 'DELIVERY_ERROR':
if traverse_obj(video_data, ('delivery', 'code')) == 500:
self.report_drm(video_id)
error_desc = join_nonempty(
error_desc, traverse_obj(video_data, ('delivery', 'error', {str})), delim=': ')
raise ExtractorError(error_desc, expected=True) raise ExtractorError(error_desc, expected=True)
title = video_info['title'] title = video_info['title']
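
Note on the error handling added above: the new branch distinguishes a `DELIVERY_ERROR` from a geo-block and appends the delivery error text to the message. The following is a minimal, self-contained sketch of that decision flow. It is a simplification under stated assumptions: it returns strings instead of raising, `describe_error` is a hypothetical name, and `join_nonempty` here is a small local stand-in rather than the project's utility.

```python
def join_nonempty(*values, delim='-'):
    """Local stand-in: join the truthy values with the given delimiter."""
    return delim.join(str(v) for v in values if v)


def describe_error(video_info, video_data):
    """Hypothetical helper mirroring the branch order shown in the diff."""
    error_desc = video_info.get('error_desc')
    if not error_desc:
        return None  # no error reported
    error_code = video_info.get('error_code')
    if error_code == 'GEOBLOCKED':
        return f'geo-restricted: {error_desc}'
    if error_code == 'DELIVERY_ERROR':
        delivery = video_data.get('delivery') or {}
        if delivery.get('code') == 500:
            # The diff reports DRM at this point; the sketch just labels it.
            return 'DRM protected'
        detail = delivery.get('error')
        # Only append the detail when it is a string, as the diff's {str} filter does.
        error_desc = join_nonempty(
            error_desc, detail if isinstance(detail, str) else None, delim=': ')
    return error_desc


message = describe_error(
    {'error_desc': 'Delivery failed', 'error_code': 'DELIVERY_ERROR'},
    {'delivery': {'code': 403, 'error': 'forbidden'}})
```

The sample values (`'Delivery failed'`, `403`, `'forbidden'`) are made up for the sketch and are not taken from a real API response.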


@ -290,12 +290,14 @@ class WeverseIE(WeverseBaseIE):
elif live_status == 'is_live': elif live_status == 'is_live':
video_info = self._call_api( video_info = self._call_api(
f'/video/v1.2/lives/{api_video_id}/playInfo?preview.format=json&preview.version=v2', f'/video/v1.3/lives/{api_video_id}/playInfo?preview.format=json&preview.version=v2',
video_id, note='Downloading live JSON') video_id, note='Downloading live JSON')
playback = self._parse_json(video_info['lipPlayback'], video_id) playback = self._parse_json(video_info['lipPlayback'], video_id)
m3u8_url = traverse_obj(playback, ( m3u8_url = traverse_obj(playback, (
'media', lambda _, v: v['protocol'] == 'HLS', 'path', {url_or_none}), get_all=False) 'media', lambda _, v: v['protocol'] == 'HLS', 'path', {url_or_none}), get_all=False)
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', live=True) # Live subtitles are not downloadable, but extract to silence "ignoring subs" warning
formats, _ = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id='hls', live=True)
elif live_status == 'post_live': elif live_status == 'post_live':
if availability in ('premium_only', 'subscriber_only'): if availability in ('premium_only', 'subscriber_only'):


@ -417,6 +417,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_NETRC_MACHINE = 'youtube' _NETRC_MACHINE = 'youtube'
_COOKIE_HOWTO_WIKI_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies'
def ucid_or_none(self, ucid): def ucid_or_none(self, ucid):
return self._search_regex(rf'^({self._YT_CHANNEL_UCID_RE})$', ucid, 'UC-id', default=None) return self._search_regex(rf'^({self._YT_CHANNEL_UCID_RE})$', ucid, 'UC-id', default=None)
@ -451,17 +453,15 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return preferred_lang return preferred_lang
def _initialize_consent(self): def _initialize_consent(self):
cookies = self._get_cookies('https://www.youtube.com/') if self._has_auth_cookies:
if cookies.get('__Secure-3PSID'):
return return
socs = cookies.get('SOCS') socs = self._youtube_cookies.get('SOCS')
if socs and not socs.value.startswith('CAA'): # not consented if socs and not socs.value.startswith('CAA'): # not consented
return return
self._set_cookie('.youtube.com', 'SOCS', 'CAI', secure=True) # accept all (required for mixes) self._set_cookie('.youtube.com', 'SOCS', 'CAI', secure=True) # accept all (required for mixes)
def _initialize_pref(self): def _initialize_pref(self):
cookies = self._get_cookies('https://www.youtube.com/') pref_cookie = self._youtube_cookies.get('PREF')
pref_cookie = cookies.get('PREF')
pref = {} pref = {}
if pref_cookie: if pref_cookie:
try: try:
@ -472,8 +472,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
self._set_cookie('.youtube.com', name='PREF', value=urllib.parse.urlencode(pref)) self._set_cookie('.youtube.com', name='PREF', value=urllib.parse.urlencode(pref))
def _initialize_cookie_auth(self): def _initialize_cookie_auth(self):
yt_sapisid, yt_1psapisid, yt_3psapisid = self._get_sid_cookies() self._passed_auth_cookies = False
if yt_sapisid or yt_1psapisid or yt_3psapisid: if self._has_auth_cookies:
self._passed_auth_cookies = True
self.write_debug('Found YouTube account cookies') self.write_debug('Found YouTube account cookies')
def _real_initialize(self): def _real_initialize(self):
@ -492,8 +493,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
@property @property
def _youtube_login_hint(self): def _youtube_login_hint(self):
return (f'{self._login_hint(method="cookies")}. Also see ' return (f'{self._login_hint(method="cookies")}. Also see {self._COOKIE_HOWTO_WIKI_URL} '
'https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies '
'for tips on effectively exporting YouTube cookies') 'for tips on effectively exporting YouTube cookies')
def _check_login_required(self): def _check_login_required(self):
@ -553,12 +553,16 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return f'{scheme} {"_".join(parts)}' return f'{scheme} {"_".join(parts)}'
@property
def _youtube_cookies(self):
return self._get_cookies('https://www.youtube.com')
def _get_sid_cookies(self): def _get_sid_cookies(self):
""" """
Get SAPISID, 1PSAPISID, 3PSAPISID cookie values Get SAPISID, 1PSAPISID, 3PSAPISID cookie values
@returns sapisid, 1psapisid, 3psapisid @returns sapisid, 1psapisid, 3psapisid
""" """
yt_cookies = self._get_cookies('https://www.youtube.com') yt_cookies = self._youtube_cookies
yt_sapisid = try_call(lambda: yt_cookies['SAPISID'].value) yt_sapisid = try_call(lambda: yt_cookies['SAPISID'].value)
yt_3papisid = try_call(lambda: yt_cookies['__Secure-3PAPISID'].value) yt_3papisid = try_call(lambda: yt_cookies['__Secure-3PAPISID'].value)
yt_1papisid = try_call(lambda: yt_cookies['__Secure-1PAPISID'].value) yt_1papisid = try_call(lambda: yt_cookies['__Secure-1PAPISID'].value)
@ -595,6 +599,31 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return ' '.join(authorizations) return ' '.join(authorizations)
@property
def is_authenticated(self):
return self._has_auth_cookies
@property
def _has_auth_cookies(self):
yt_sapisid, yt_1psapisid, yt_3psapisid = self._get_sid_cookies()
# YouTube doesn't appear to clear 3PSAPISID when rotating cookies (as of 2025-04-26)
# But LOGIN_INFO is cleared and should exist if logged in
has_login_info = 'LOGIN_INFO' in self._youtube_cookies
return bool(has_login_info and (yt_sapisid or yt_1psapisid or yt_3psapisid))
def _request_webpage(self, *args, **kwargs):
response = super()._request_webpage(*args, **kwargs)
# Check that we are still logged-in and cookies have not rotated after every request
if getattr(self, '_passed_auth_cookies', None) and not self._has_auth_cookies:
self.report_warning(
'The provided YouTube account cookies are no longer valid. '
'They have likely been rotated in the browser as a security measure. '
f'For tips on how to effectively export YouTube cookies, refer to {self._COOKIE_HOWTO_WIKI_URL} .',
only_once=False)
return response
def _call_api(self, ep, query, video_id, fatal=True, headers=None, def _call_api(self, ep, query, video_id, fatal=True, headers=None,
note='Downloading API JSON', errnote='Unable to download API page', note='Downloading API JSON', errnote='Unable to download API page',
context=None, api_key=None, api_hostname=None, default_client='web'): context=None, api_key=None, api_hostname=None, default_client='web'):
@ -695,10 +724,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
args, [('VISITOR_DATA', ('INNERTUBE_CONTEXT', 'client', 'visitorData'), ('responseContext', 'visitorData'))], args, [('VISITOR_DATA', ('INNERTUBE_CONTEXT', 'client', 'visitorData'), ('responseContext', 'visitorData'))],
expected_type=str) expected_type=str)
@functools.cached_property
def is_authenticated(self):
return bool(self._get_sid_authorization_header())
def extract_ytcfg(self, video_id, webpage): def extract_ytcfg(self, video_id, webpage):
if not webpage: if not webpage:
return {} return {}
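
Note on the cookie changes above: the new `_has_auth_cookies` property requires `LOGIN_INFO` plus at least one SAPISID-style cookie, and `_request_webpage` re-checks this after each request to detect rotated cookies. The sketch below reproduces that logic outside the extractor. It assumes a plain `dict` in place of the real cookie jar, and `RotationWatcher` is a hypothetical class used only for illustration.

```python
# Cookie names as they appear in _get_sid_cookies in the diff.
SID_COOKIE_NAMES = ('SAPISID', '__Secure-1PAPISID', '__Secure-3PAPISID')


def has_auth_cookies(cookies):
    """True only if LOGIN_INFO exists and at least one SAPISID-style cookie is set."""
    has_login_info = 'LOGIN_INFO' in cookies
    has_sid = any(cookies.get(name) for name in SID_COOKIE_NAMES)
    return bool(has_login_info and has_sid)


class RotationWatcher:
    """Hypothetical stand-in for the per-request check added to _request_webpage."""

    def __init__(self, cookies):
        self.cookies = cookies
        # Remember whether valid account cookies were supplied at start-up.
        self.passed_auth_cookies = has_auth_cookies(cookies)
        self.warnings = []

    def after_request(self):
        # Warn whenever cookies that were valid at start-up have since gone missing.
        if self.passed_auth_cookies and not has_auth_cookies(self.cookies):
            self.warnings.append('The provided account cookies are no longer valid')


jar = {'LOGIN_INFO': 'abc', 'SAPISID': 'def'}
watcher = RotationWatcher(jar)
watcher.after_request()      # cookies still present, no warning
del jar['LOGIN_INFO']        # simulate a rotation clearing LOGIN_INFO
watcher.after_request()      # now a warning is recorded
```

Checking `LOGIN_INFO` in addition to the SAPISID cookies follows the comment in the diff, which observes that `3PSAPISID` does not appear to be cleared on rotation while `LOGIN_INFO` is.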


@ -37,6 +37,7 @@ class YoutubeClipIE(YoutubeTabBaseInfoExtractor):
'chapters': 'count:20', 'chapters': 'count:20',
'comment_count': int, 'comment_count': int,
'heatmap': 'count:100', 'heatmap': 'count:100',
'media_type': 'clip',
}, },
}] }]
@ -59,6 +60,7 @@ class YoutubeClipIE(YoutubeTabBaseInfoExtractor):
'url': f'https://www.youtube.com/watch?v={video_id}', 'url': f'https://www.youtube.com/watch?v={video_id}',
'ie_key': YoutubeIE.ie_key(), 'ie_key': YoutubeIE.ie_key(),
'id': clip_id, 'id': clip_id,
'media_type': 'clip',
'section_start': int(clip_data['startTimeMs']) / 1000, 'section_start': int(clip_data['startTimeMs']) / 1000,
'section_end': int(clip_data['endTimeMs']) / 1000, 'section_end': int(clip_data['endTimeMs']) / 1000,
'_format_sort_fields': ( # https protocol is prioritized for ffmpeg compatibility '_format_sort_fields': ( # https protocol is prioritized for ffmpeg compatibility


@ -35,6 +35,7 @@ class YoutubeYtBeIE(YoutubeBaseInfoExtractor):
'duration': 59, 'duration': 59,
'comment_count': int, 'comment_count': int,
'channel_follower_count': int, 'channel_follower_count': int,
'media_type': 'short',
}, },
'params': { 'params': {
'noplaylist': True, 'noplaylist': True,


@ -379,6 +379,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Afrojack', 'uploader': 'Afrojack',
'uploader_url': 'https://www.youtube.com/@Afrojack', 'uploader_url': 'https://www.youtube.com/@Afrojack',
'uploader_id': '@Afrojack', 'uploader_id': '@Afrojack',
'media_type': 'video',
}, },
'params': { 'params': {
'youtube_include_dash_manifest': True, 'youtube_include_dash_manifest': True,
@ -416,10 +417,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1401991663, 'timestamp': 1401991663,
'media_type': 'video',
}, },
}, },
{ {
'note': 'Age-gate video with embed allowed in public site', 'note': 'Formerly an age-gate video with embed allowed in public site',
'url': 'https://youtube.com/watch?v=HsUATh_Nc2U', 'url': 'https://youtube.com/watch?v=HsUATh_Nc2U',
'info_dict': { 'info_dict': {
'id': 'HsUATh_Nc2U', 'id': 'HsUATh_Nc2U',
@ -427,8 +429,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'Godzilla 2 (Official Video)', 'title': 'Godzilla 2 (Official Video)',
'description': 'md5:bf77e03fcae5529475e500129b05668a', 'description': 'md5:bf77e03fcae5529475e500129b05668a',
'upload_date': '20200408', 'upload_date': '20200408',
'age_limit': 18, 'age_limit': 0,
'availability': 'needs_auth', 'availability': 'public',
'channel_id': 'UCYQT13AtrJC0gsM1far_zJg', 'channel_id': 'UCYQT13AtrJC0gsM1far_zJg',
'channel': 'FlyingKitty', 'channel': 'FlyingKitty',
'channel_url': 'https://www.youtube.com/channel/UCYQT13AtrJC0gsM1far_zJg', 'channel_url': 'https://www.youtube.com/channel/UCYQT13AtrJC0gsM1far_zJg',
@ -446,8 +448,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@FlyingKitty900', 'uploader_id': '@FlyingKitty900',
'comment_count': int, 'comment_count': int,
'channel_is_verified': True, 'channel_is_verified': True,
'media_type': 'video',
}, },
'skip': 'Age-restricted; requires authentication',
}, },
{ {
'note': 'Age-gate video embedable only with clientScreen=EMBED', 'note': 'Age-gate video embedable only with clientScreen=EMBED',
@ -510,6 +512,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Herr Lurik', 'uploader': 'Herr Lurik',
'uploader_url': 'https://www.youtube.com/@HerrLurik', 'uploader_url': 'https://www.youtube.com/@HerrLurik',
'uploader_id': '@HerrLurik', 'uploader_id': '@HerrLurik',
'media_type': 'video',
}, },
}, },
{ {
@ -549,6 +552,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'deadmau5', 'uploader': 'deadmau5',
'uploader_url': 'https://www.youtube.com/@deadmau5', 'uploader_url': 'https://www.youtube.com/@deadmau5',
'uploader_id': '@deadmau5', 'uploader_id': '@deadmau5',
'media_type': 'video',
}, },
'expected_warnings': [ 'expected_warnings': [
'DASH manifest missing', 'DASH manifest missing',
@ -584,6 +588,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@Olympics', 'uploader_id': '@Olympics',
'channel_is_verified': True, 'channel_is_verified': True,
'timestamp': 1440707674, 'timestamp': 1440707674,
'media_type': 'livestream',
}, },
'params': { 'params': {
'skip_download': 'requires avconv', 'skip_download': 'requires avconv',
@ -618,6 +623,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': 'https://www.youtube.com/@AllenMeow', 'uploader_url': 'https://www.youtube.com/@AllenMeow',
'uploader_id': '@AllenMeow', 'uploader_id': '@AllenMeow',
'timestamp': 1299776999, 'timestamp': 1299776999,
'media_type': 'video',
}, },
}, },
# url_encoded_fmt_stream_map is empty string # url_encoded_fmt_stream_map is empty string
@ -812,6 +818,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'like_count': int, 'like_count': int,
'age_limit': 0, 'age_limit': 0,
'channel_follower_count': int, 'channel_follower_count': int,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -871,6 +878,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@BKCHarvard', 'uploader_id': '@BKCHarvard',
'uploader_url': 'https://www.youtube.com/@BKCHarvard', 'uploader_url': 'https://www.youtube.com/@BKCHarvard',
'timestamp': 1422422076, 'timestamp': 1422422076,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -907,6 +915,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1447987198, 'timestamp': 1447987198,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -971,6 +980,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'comment_count': int, 'comment_count': int,
'channel_is_verified': True, 'channel_is_verified': True,
'timestamp': 1484761047, 'timestamp': 1484761047,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1073,6 +1083,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'tags': 'count:11', 'tags': 'count:11',
'live_status': 'not_live', 'live_status': 'not_live',
'channel_follower_count': int, 'channel_follower_count': int,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1127,6 +1138,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': 'https://www.youtube.com/@ElevageOrVert', 'uploader_url': 'https://www.youtube.com/@ElevageOrVert',
'uploader_id': '@ElevageOrVert', 'uploader_id': '@ElevageOrVert',
'timestamp': 1497343210, 'timestamp': 1497343210,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1166,6 +1178,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1377976349, 'timestamp': 1377976349,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1210,6 +1223,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_follower_count': int, 'channel_follower_count': int,
'uploader': 'The Cinematic Orchestra', 'uploader': 'The Cinematic Orchestra',
'comment_count': int, 'comment_count': int,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1278,6 +1292,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': 'https://www.youtube.com/@walkaroundjapan7124', 'uploader_url': 'https://www.youtube.com/@walkaroundjapan7124',
'uploader_id': '@walkaroundjapan7124', 'uploader_id': '@walkaroundjapan7124',
'timestamp': 1605884416, 'timestamp': 1605884416,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1374,6 +1389,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1395685455, 'timestamp': 1395685455,
'media_type': 'video',
}, 'params': {'format': 'mhtml', 'skip_download': True}, }, 'params': {'format': 'mhtml', 'skip_download': True},
}, { }, {
# Ensure video upload_date is in UTC timezone (video was uploaded 1641170939) # Ensure video upload_date is in UTC timezone (video was uploaded 1641170939)
@ -1404,6 +1420,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@LeonNguyen', 'uploader_id': '@LeonNguyen',
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1641170939, 'timestamp': 1641170939,
'media_type': 'video',
}, },
}, { }, {
# date text is premiered video, ensure upload date in UTC (published 1641172509) # date text is premiered video, ensure upload date in UTC (published 1641172509)
@ -1437,6 +1454,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1641172509, 'timestamp': 1641172509,
'media_type': 'video',
}, },
}, },
{ # continuous livestream. { # continuous livestream.
@ -1498,6 +1516,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Lesmiscore', 'uploader': 'Lesmiscore',
'uploader_url': 'https://www.youtube.com/@lesmiscore', 'uploader_url': 'https://www.youtube.com/@lesmiscore',
'timestamp': 1648005313, 'timestamp': 1648005313,
'media_type': 'short',
}, },
}, { }, {
# Prefer primary title+description language metadata by default # Prefer primary title+description language metadata by default
@ -1526,6 +1545,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@coletdjnz', 'uploader_id': '@coletdjnz',
'uploader': 'cole-dlp-test-acc', 'uploader': 'cole-dlp-test-acc',
'timestamp': 1662677394, 'timestamp': 1662677394,
'media_type': 'video',
}, },
'params': {'skip_download': True}, 'params': {'skip_download': True},
}, { }, {
@ -1554,6 +1574,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'cole-dlp-test-acc', 'uploader': 'cole-dlp-test-acc',
'timestamp': 1659073275, 'timestamp': 1659073275,
'like_count': int, 'like_count': int,
'media_type': 'video',
}, },
'params': {'skip_download': True, 'extractor_args': {'youtube': {'lang': ['fr']}}}, 'params': {'skip_download': True, 'extractor_args': {'youtube': {'lang': ['fr']}}},
'expected_warnings': [r'Preferring "fr" translated fields'], 'expected_warnings': [r'Preferring "fr" translated fields'],
@ -1590,6 +1611,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'comment_count': int, 'comment_count': int,
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'media_type': 'video',
}, },
'params': {'extractor_args': {'youtube': {'player_client': ['ios']}}, 'format': '233-1'}, 'params': {'extractor_args': {'youtube': {'player_client': ['ios']}}, 'format': '233-1'},
}, { }, {
@ -1690,6 +1712,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'comment_count': int, 'comment_count': int,
'channel_is_verified': True, 'channel_is_verified': True,
'heatmap': 'count:100', 'heatmap': 'count:100',
'media_type': 'video',
}, },
'params': { 'params': {
'extractor_args': {'youtube': {'player_client': ['ios'], 'player_skip': ['webpage']}}, 'extractor_args': {'youtube': {'player_client': ['ios'], 'player_skip': ['webpage']}},
@ -1722,6 +1745,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_follower_count': int, 'channel_follower_count': int,
'categories': ['People & Blogs'], 'categories': ['People & Blogs'],
'tags': [], 'tags': [],
'media_type': 'short',
}, },
}, },
] ]
@ -1757,6 +1781,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@ChristopherSykesDocumentaries', 'uploader_id': '@ChristopherSykesDocumentaries',
'heatmap': 'count:100', 'heatmap': 'count:100',
'timestamp': 1211825920, 'timestamp': 1211825920,
'media_type': 'video',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -1827,6 +1852,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else: else:
retry.error = f'Cannot find refreshed manifest for format {format_id}{bug_reports_message()}' retry.error = f'Cannot find refreshed manifest for format {format_id}{bug_reports_message()}'
continue continue
# Formats from ended premieres will be missing a manifest_url
# See https://github.com/yt-dlp/yt-dlp/issues/8543
if not f.get('manifest_url'):
break
return f['manifest_url'], f['manifest_stream_number'], is_live return f['manifest_url'], f['manifest_stream_number'], is_live
return None return None
@ -1990,7 +2021,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _player_js_cache_key(self, player_url): def _player_js_cache_key(self, player_url):
player_id = self._extract_player_info(player_url) player_id = self._extract_player_info(player_url)
player_path = remove_start(urllib.parse.urlparse(player_url).path, f'/s/player/{player_id}/') player_path = remove_start(urllib.parse.urlparse(player_url).path, f'/s/player/{player_id}/')
variant = self._INVERSE_PLAYER_JS_VARIANT_MAP.get(player_path) variant = self._INVERSE_PLAYER_JS_VARIANT_MAP.get(player_path) or next((
v for k, v in self._INVERSE_PLAYER_JS_VARIANT_MAP.items()
if re.fullmatch(re.escape(k).replace('en_US', r'[a-zA-Z0-9_]+'), player_path)), None)
if not variant: if not variant:
self.write_debug( self.write_debug(
f'Unable to determine player JS variant\n' f'Unable to determine player JS variant\n'
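Review note: the locale fallback added in this hunk can be tried in isolation. The sketch below is a standalone approximation, not the extractor's code. `INVERSE_VARIANT_MAP` holds two illustrative entries and `resolve_variant` is a hypothetical helper name; only the lookup-then-regex logic is taken from the diff.

```python
import re

# Illustrative entries only; the real mapping lives on the extractor class.
INVERSE_VARIANT_MAP = {
    'player_ias.vflset/en_US/base.js': 'main',
    'player_ias_tce.vflset/en_US/base.js': 'tce',
}


def resolve_variant(player_path):
    """Return the variant for player_path, tolerating a non-en_US locale segment."""
    # Exact lookup first, as in the pre-existing code path.
    if variant := INVERSE_VARIANT_MAP.get(player_path):
        return variant
    # Fallback: treat the 'en_US' segment of each known path as a locale wildcard.
    return next((
        v for k, v in INVERSE_VARIANT_MAP.items()
        if re.fullmatch(re.escape(k).replace('en_US', r'[a-zA-Z0-9_]+'), player_path)), None)
```

Because `re.escape` leaves letters and underscores untouched on current Python versions, the literal `en_US` survives escaping and can be swapped for a character class with a plain string replace.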
@ -2128,23 +2161,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return ret return ret
return inner return inner
def _load_nsig_code_from_cache(self, player_url): def _load_player_data_from_cache(self, name, player_url):
cache_id = ('youtube-nsig', self._player_js_cache_key(player_url)) cache_id = (f'youtube-{name}', self._player_js_cache_key(player_url))
if func_code := self._player_cache.get(cache_id): if data := self._player_cache.get(cache_id):
return func_code return data
func_code = self.cache.load(*cache_id, min_ver='2025.03.31') data = self.cache.load(*cache_id, min_ver='2025.03.31')
if func_code: if data:
self._player_cache[cache_id] = func_code self._player_cache[cache_id] = data
return func_code return data
def _store_nsig_code_to_cache(self, player_url, func_code): def _store_player_data_to_cache(self, name, player_url, data):
cache_id = ('youtube-nsig', self._player_js_cache_key(player_url)) cache_id = (f'youtube-{name}', self._player_js_cache_key(player_url))
if cache_id not in self._player_cache: if cache_id not in self._player_cache:
self.cache.store(*cache_id, func_code) self.cache.store(*cache_id, data)
self._player_cache[cache_id] = func_code self._player_cache[cache_id] = data
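Review note: the renamed helpers generalize the nsig cache into a two-tier cache keyed by a data name plus the player key. A minimal sketch of that pattern follows, assuming a dict-like object stands in for the on-disk cache; `PlayerDataCache` and its method names are hypothetical and the `min_ver` check is omitted.

```python
class PlayerDataCache:
    """Hypothetical stand-in: an in-memory dict in front of a slower persistent store."""

    def __init__(self, disk_store):
        self._memory = {}
        self._disk = disk_store  # any dict-like object; stands in for the on-disk cache

    def load(self, name, player_key):
        cache_id = (f'youtube-{name}', player_key)
        # Fast path: value already seen in this process.
        if data := self._memory.get(cache_id):
            return data
        # Slow path: consult the persistent store and remember the result.
        data = self._disk.get(cache_id)
        if data:
            self._memory[cache_id] = data
        return data

    def store(self, name, player_key, data):
        cache_id = (f'youtube-{name}', player_key)
        # Write to the persistent store at most once per key per process.
        if cache_id not in self._memory:
            self._disk[cache_id] = data
            self._memory[cache_id] = data
```

Parameterizing on `name` is what lets the same pair of helpers serve both `'nsig'` and `'sts'` entries later in the diff.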
def _decrypt_signature(self, s, video_id, player_url): def _decrypt_signature(self, s, video_id, player_url):
"""Turn the encrypted s field into a working signature""" """Turn the encrypted s field into a working signature"""
@ -2187,7 +2220,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.write_debug(f'Decrypted nsig {s} => {ret}') self.write_debug(f'Decrypted nsig {s} => {ret}')
# Only cache nsig func JS code to disk if successful, and only once # Only cache nsig func JS code to disk if successful, and only once
self._store_nsig_code_to_cache(player_url, func_code) self._store_player_data_to_cache('nsig', player_url, func_code)
return ret return ret
def _extract_n_function_name(self, jscode, player_url=None): def _extract_n_function_name(self, jscode, player_url=None):
@ -2306,7 +2339,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _extract_n_function_code(self, video_id, player_url): def _extract_n_function_code(self, video_id, player_url):
player_id = self._extract_player_info(player_url) player_id = self._extract_player_info(player_url)
func_code = self._load_nsig_code_from_cache(player_url) func_code = self._load_player_data_from_cache('nsig', player_url)
jscode = func_code or self._load_player(video_id, player_url) jscode = func_code or self._load_player(video_id, player_url)
jsi = JSInterpreter(jscode) jsi = JSInterpreter(jscode)
@ -2342,23 +2375,27 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
Extract signatureTimestamp (sts) Extract signatureTimestamp (sts)
Required to tell API what sig/player version is in use. Required to tell API what sig/player version is in use.
""" """
sts = None if sts := traverse_obj(ytcfg, ('STS', {int_or_none})):
if isinstance(ytcfg, dict): return sts
sts = int_or_none(ytcfg.get('STS'))
if not player_url:
error_msg = 'Cannot extract signature timestamp without player url'
if fatal:
raise ExtractorError(error_msg)
self.report_warning(error_msg)
return None
sts = self._load_player_data_from_cache('sts', player_url)
if sts:
return sts
if code := self._load_player(video_id, player_url, fatal=fatal):
sts = int_or_none(self._search_regex(
r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})', code,
'JS player signature timestamp', group='sts', fatal=fatal))
if sts:
self._store_player_data_to_cache('sts', player_url, sts)
if not sts:
# Attempt to extract from player
if player_url is None:
error_msg = 'Cannot extract signature timestamp without player_url.'
if fatal:
raise ExtractorError(error_msg)
self.report_warning(error_msg)
return
code = self._load_player(video_id, player_url, fatal=fatal)
if code:
sts = int_or_none(self._search_regex(
r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})', code,
'JS player signature timestamp', group='sts', fatal=fatal))
return sts return sts
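Review note: the regex used for the signature timestamp can be checked against a small snippet. The sketch below reuses the pattern from the hunk; `extract_sts` and the sample JavaScript strings are illustrative assumptions, not real player code.

```python
import re

# Same pattern as in the hunk above: a five-digit value after either key name.
STS_RE = re.compile(r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})')


def extract_sts(player_js):
    """Return the five-digit signature timestamp as an int, or None if absent."""
    mobj = STS_RE.search(player_js)
    return int(mobj.group('sts')) if mobj else None
```

For example, `extract_sts('var cfg={signatureTimestamp:20222,foo:1};')` yields `20222`, and a string without either key yields `None`.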
def _mark_watched(self, video_id, player_responses): def _mark_watched(self, video_id, player_responses):
@ -2897,13 +2934,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return po_token return po_token
def _fetch_po_token(self, client, **kwargs): def _fetch_po_token(self, client, **kwargs):
context = kwargs.get('context') context = kwargs.get('context')
# Avoid fetching PO Tokens when not required # Avoid fetching PO Tokens when not required
if not ( fetch_pot_policy = self._configuration_arg('fetch_pot', [''], ie_key=YoutubeIE)[0]
_PoTokenContext(context) in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS'] if fetch_pot_policy not in ('never', 'auto', 'always'):
or self._configuration_arg('fetch_pot', ['when_required'], ie_key=YoutubeIE)[0] == 'always' fetch_pot_policy = 'auto'
if (
fetch_pot_policy == 'never'
or (
fetch_pot_policy == 'auto'
and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
)
): ):
return None return None
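Review note: the new `fetch_pot` policy check reduces to a small truth table. The sketch below restates it as a pure function so the three policies can be compared side by side; `should_skip_pot_fetch` and `context_is_required` are hypothetical names standing in for the extractor's configuration lookup and the `PO_TOKEN_REQUIRED_CONTEXTS` membership test.

```python
VALID_POLICIES = ('never', 'auto', 'always')


def should_skip_pot_fetch(policy, context_is_required):
    """Return True when no PO Token fetch should be attempted."""
    # Unknown or empty values fall back to 'auto', as in the hunk above.
    if policy not in VALID_POLICIES:
        policy = 'auto'
    return policy == 'never' or (policy == 'auto' and not context_is_required)
```

Under this reading, `'always'` never skips, `'never'` always skips, and `'auto'` skips only when the client does not require a token for the requested context.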
@ -3167,9 +3209,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else: else:
prs.append(pr) prs.append(pr)
# web_embedded can work around age-gate and age-verification for some embeddable videos
if self._is_agegated(pr) and variant != 'web_embedded':
append_client(f'web_embedded.{base_client}')
# Unauthenticated users will only get web_embedded client formats if age-gated
if self._is_agegated(pr) and not self.is_authenticated:
self.to_screen(
f'{video_id}: This video is age-restricted; some formats may be missing '
f'without authentication. {self._youtube_login_hint}', only_once=True)
# EU countries require age-verification for accounts to access age-restricted videos # EU countries require age-verification for accounts to access age-restricted videos
# If account is not age-verified, _is_agegated() will be truthy for non-embedded clients # If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
if self.is_authenticated and self._is_agegated(pr): embedding_is_disabled = variant == 'web_embedded' and self._is_unplayable(pr)
if self.is_authenticated and (self._is_agegated(pr) or embedding_is_disabled):
self.to_screen( self.to_screen(
f'{video_id}: This video is age-restricted and YouTube is requiring ' f'{video_id}: This video is age-restricted and YouTube is requiring '
'account age-verification; some formats may be missing', only_once=True) 'account age-verification; some formats may be missing', only_once=True)
@ -3296,12 +3348,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
fmt_url = url_or_none(try_get(sc, lambda x: x['url'][0])) fmt_url = url_or_none(try_get(sc, lambda x: x['url'][0]))
encrypted_sig = try_get(sc, lambda x: x['s'][0]) encrypted_sig = try_get(sc, lambda x: x['s'][0])
if not all((sc, fmt_url, player_url, encrypted_sig)): if not all((sc, fmt_url, player_url, encrypted_sig)):
self.report_warning( msg = f'Some {client_name} client https formats have been skipped as they are missing a url. '
f'Some {client_name} client https formats have been skipped as they are missing a url. ' if client_name == 'web':
f'{"Your account" if self.is_authenticated else "The current session"} may have ' msg += 'YouTube is forcing SABR streaming for this client. '
f'the SSAP (server-side ads) experiment which interferes with yt-dlp. ' else:
f'Please see https://github.com/yt-dlp/yt-dlp/issues/12482 for more details.', msg += (
video_id, only_once=True) f'YouTube may have enabled the SABR-only or Server-Side Ad Placement experiment for '
f'{"your account" if self.is_authenticated else "the current session"}. '
)
msg += 'See https://github.com/yt-dlp/yt-dlp/issues/12482 for more details'
self.report_warning(msg, video_id, only_once=True)
continue continue
try: try:
fmt_url += '&{}={}'.format( fmt_url += '&{}={}'.format(
@ -3388,8 +3444,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'width': int_or_none(fmt.get('width')), 'width': int_or_none(fmt.get('width')),
'language': join_nonempty(language_code, 'desc' if is_descriptive else '') or None, 'language': join_nonempty(language_code, 'desc' if is_descriptive else '') or None,
'language_preference': PREFERRED_LANG_VALUE if is_original else 5 if is_default else -10 if is_descriptive else -1, 'language_preference': PREFERRED_LANG_VALUE if is_original else 5 if is_default else -10 if is_descriptive else -1,
# Strictly de-prioritize broken, damaged and 3gp formats # Strictly de-prioritize damaged and 3gp formats
'preference': -20 if require_po_token else -10 if is_damaged else -2 if itag == '17' else None, 'preference': -10 if is_damaged else -2 if itag == '17' else None,
} }
mime_mobj = re.match( mime_mobj = re.match(
r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', fmt.get('mimeType') or '') r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', fmt.get('mimeType') or '')
@ -3712,6 +3768,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
reason = f'{remove_end(reason.strip(), ".")}. {self._youtube_login_hint}' reason = f'{remove_end(reason.strip(), ".")}. {self._youtube_login_hint}'
elif get_first(playability_statuses, ('errorScreen', 'playerCaptchaViewModel', {dict})): elif get_first(playability_statuses, ('errorScreen', 'playerCaptchaViewModel', {dict})):
reason += '. YouTube is requiring a captcha challenge before playback' reason += '. YouTube is requiring a captcha challenge before playback'
elif "This content isn't available, try again later" in reason:
reason = (
f'{remove_end(reason.strip(), ".")}. {"Your account" if self.is_authenticated else "The current session"} '
f'has been rate-limited by YouTube for up to an hour. It is recommended to use `-t sleep` to add a delay '
f'between video requests to avoid exceeding the rate limit. For more information, refer to '
f'https://github.com/yt-dlp/yt-dlp/wiki/Extractors#this-content-isnt-available-try-again-later'
)
self.raise_no_formats(reason, expected=True) self.raise_no_formats(reason, expected=True)
keywords = get_first(video_details, 'keywords', expected_type=list) or [] keywords = get_first(video_details, 'keywords', expected_type=list) or []
@ -3818,7 +3881,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'tags': keywords, 'tags': keywords,
'playable_in_embed': get_first(playability_statuses, 'playableInEmbed'), 'playable_in_embed': get_first(playability_statuses, 'playableInEmbed'),
'live_status': live_status, 'live_status': live_status,
'media_type': 'livestream' if get_first(video_details, 'isLiveContent') else None, 'media_type': (
'livestream' if get_first(video_details, 'isLiveContent')
else 'short' if get_first(microformats, 'isShortsEligible')
else 'video'),
'release_timestamp': live_start_time, 'release_timestamp': live_start_time,
'_format_sort_fields': ( # source_preference is lower for potentially damaged formats '_format_sort_fields': ( # source_preference is lower for potentially damaged formats
'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'), 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'),
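Review note: with this change `media_type` is always populated, which is why every test case above gains a `'media_type'` expectation. The precedence can be sketched as a plain function; `classify_media_type` is a hypothetical name, and its two arguments stand in for the `isLiveContent` and `isShortsEligible` values read from the player response.

```python
def classify_media_type(is_live_content, is_shorts_eligible):
    """Mirror the precedence in the hunk: livestream, then short, then video."""
    if is_live_content:
        return 'livestream'
    if is_shorts_eligible:
        return 'short'
    return 'video'
```

Live content takes priority, so an item flagged as both live and Shorts-eligible is reported as `'livestream'`.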


@ -41,7 +41,7 @@ import json
@register_provider @register_provider
class MyPoTokenProviderPTP(PoTokenProvider): # Provider name must end with "PTP" class MyPoTokenProviderPTP(PoTokenProvider): # Provider class name must end with "PTP"
PROVIDER_VERSION = '0.2.1' PROVIDER_VERSION = '0.2.1'
# Define a unique display name for the provider # Define a unique display name for the provider
PROVIDER_NAME = 'my-provider' PROVIDER_NAME = 'my-provider'
@ -51,15 +51,18 @@ class MyPoTokenProviderPTP(PoTokenProvider): # Provider name must end with "PTP
# Innertube Client Name. # Innertube Client Name.
# For example, "WEB", "ANDROID", "TVHTML5". # For example, "WEB", "ANDROID", "TVHTML5".
# For a list of WebPO client names, see yt_dlp.extractor.youtube.pot.utils.WEBPO_CLIENTS. # For a list of WebPO client names,
# Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS for a list of client names currently supported by the YouTube extractor. # see yt_dlp.extractor.youtube.pot.utils.WEBPO_CLIENTS.
# Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS
# for a list of client names currently supported by the YouTube extractor.
_SUPPORTED_CLIENTS = ('WEB', 'TVHTML5') _SUPPORTED_CLIENTS = ('WEB', 'TVHTML5')
_SUPPORTED_CONTEXTS = ( _SUPPORTED_CONTEXTS = (
PoTokenContext.GVS, PoTokenContext.GVS,
) )
# If your provider makes external requests to websites (i.e. to youtube.com) using another library or service (i.e., not _request_webpage), # If your provider makes external requests to websites (i.e. to youtube.com)
# using another library or service (i.e., not _request_webpage),
# set the request features that are supported here. # set the request features that are supported here.
# If only using _request_webpage to make external requests, set this to None. # If only using _request_webpage to make external requests, set this to None.
_SUPPORTED_EXTERNAL_REQUEST_FEATURES = ( _SUPPORTED_EXTERNAL_REQUEST_FEATURES = (
@ -84,16 +87,18 @@ class MyPoTokenProviderPTP(PoTokenProvider): # Provider name must end with "PTP
pass pass
def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse: def _real_request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
# If you need to validate the request before making the request to the external source # If you need to validate the request before making the request to the external source.
# Raise yt_dlp.extractor.youtube.pot.provider.PoTokenProviderRejectedRequest if the request is not supported # Raise yt_dlp.extractor.youtube.pot.provider.PoTokenProviderRejectedRequest if the request is not supported.
if request.is_authenticated: if request.is_authenticated:
raise PoTokenProviderRejectedRequest( raise PoTokenProviderRejectedRequest(
'This provider does not support authenticated requests' 'This provider does not support authenticated requests'
) )
# Settings are pulled from extractor args passed to yt-dlp with the key `youtubepot-<PROVIDER_KEY>`. # Settings are pulled from extractor args passed to yt-dlp with the key `youtubepot-<PROVIDER_KEY>`.
# For this example, the extractor arg would be `--extractor-args "youtubepot-mypotokenprovider:url=https://custom.example.com/get_pot"` # For this example, the extractor arg would be:
external_provider_url = self._configuration_arg('url', default=['https://provider.example.com/get_pot'])[0] # `--extractor-args "youtubepot-mypotokenprovider:url=https://custom.example.com/get_pot"`
external_provider_url = self._configuration_arg(
'url', default=['https://provider.example.com/get_pot'])[0]
# See below for logging guidelines # See below for logging guidelines
self.logger.trace(f'Using external provider URL: {external_provider_url}') self.logger.trace(f'Using external provider URL: {external_provider_url}')
@ -114,12 +119,15 @@ class MyPoTokenProviderPTP(PoTokenProvider): # Provider name must end with "PTP
'do_not_cache': request.bypass_cache, 'do_not_cache': request.bypass_cache,
}).encode(), proxies={'all': None}), }).encode(), proxies={'all': None}),
pot_request=request, pot_request=request,
note=f'Requesting {request.context.value} PO Token for {request.internal_client_name} client from external provider', note=(
f'Requesting {request.context.value} PO Token '
f'for {request.internal_client_name} client from external provider'),
) )
except RequestError as e: except RequestError as e:
# If there is an error, raise PoTokenProviderError. # If there is an error, raise PoTokenProviderError.
# You can specify whether it is expected or not. If it is unexpected, the log will include a link to the bug report location (BUG_REPORT_LOCATION). # You can specify whether it is expected or not. If it is unexpected,
# the log will include a link to the bug report location (BUG_REPORT_LOCATION).
raise PoTokenProviderError( raise PoTokenProviderError(
'Networking error while fetching PO Token from external provider', 'Networking error while fetching PO Token from external provider',
expected=True expected=True
@ -146,25 +154,33 @@ class MyPoTokenProviderPTP(PoTokenProvider): # Provider name must end with "PTP
# you can define a preference function to increase/decrease the priority of providers. # you can define a preference function to increase/decrease the priority of providers.
@register_preference(MyPoTokenProviderPTP) @register_preference(MyPoTokenProviderPTP)
def my_provider_preference(provider: PoTokenProvider, request: PoTokenRequest, *_, **__) -> int: def my_provider_preference(provider: PoTokenProvider, request: PoTokenRequest) -> int:
return 50 return 50
``` ```
## Logging Guidelines ## Logging Guidelines
- Use the `self.logger` object to log messages. - Use the `self.logger` object to log messages.
- When making HTTP requests, use `self.logger.info` to log a message to standard non-verbose output. This lets users know what is happening when a time-expensive operation is taking place. - When making HTTP requests or any other expensive operation, use `self.logger.info` to log a message to standard non-verbose output.
- For example, `self.logger.info(f'Requesting {request.context.value} PO Token for {request.internal_client_name} client from external provider')` - This lets users know what is happening when a time-expensive operation is taking place.
- Use `self.logger.debug` to log a message to the verbose output (`--verbose`). Try to keep this to a minimum. - It is recommended to include the PO Token context and internal client name in the message if possible.
- Use `self.logger.trace` to log a message to the PO Token debug output (`--extractor-args "youtube:pot_debug=true"`). Log as much as you like here as needed for debugging your provider. - For example, `self.logger.info(f'Requesting {request.context.value} PO Token for {request.internal_client_name} client from external provider')`.
- Use `self.logger.debug` to log a message to the verbose output (`--verbose`).
- For debugging information visible to users posting verbose logs.
- Try not to log too much; prefer trace logging for detailed debug messages.
- Use `self.logger.trace` to log a message to the PO Token debug output (`--extractor-args "youtube:pot_trace=true"`).
- Log as much as you like here as needed for debugging your provider.
- Avoid logging PO Tokens or any sensitive information to debug or info output. - Avoid logging PO Tokens or any sensitive information to debug or info output.
## Debugging ## Debugging
- Use `-v --extractor-args "youtube:pot_debug=true"` to enable PO Token debug output. - Use `-v --extractor-args "youtube:pot_trace=true"` to enable PO Token debug output.
## Caching ## Caching
> [!WARNING]
> The following describes more advanced features that most users/developers will not need to use.
> [!IMPORTANT] > [!IMPORTANT]
> yt-dlp currently has a built-in LRU Memory Cache Provider and a cache spec provider for WebPO Tokens. > yt-dlp currently has a built-in LRU Memory Cache Provider and a cache spec provider for WebPO Tokens.
> You should only need to implement cache providers if you want an external cache, or a cache spec if you are handling non-WebPO Tokens. > You should only need to implement cache providers if you want an external cache, or a cache spec if you are handling non-WebPO Tokens.
@ -184,7 +200,7 @@ from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest
@register_provider @register_provider
class MyCacheProviderPCP(PoTokenCacheProvider): # Provider name must end with "PCP" class MyCacheProviderPCP(PoTokenCacheProvider): # Provider class name must end with "PCP"
PROVIDER_VERSION = '0.1.0' PROVIDER_VERSION = '0.1.0'
# Define a unique display name for the provider # Define a unique display name for the provider
PROVIDER_NAME = 'my-cache-provider' PROVIDER_NAME = 'my-cache-provider'
@ -201,34 +217,36 @@ class MyCacheProviderPCP(PoTokenCacheProvider): # Provider name must end with "
""" """
return True return True
"""
Implement the below cache operations.
- expires_at is a timestamp in UTC. It MUST be respected - cache entries should not be returned if they have expired.
"""
def get(self, key: str): def get(self, key: str):
# Similar to PO Token Providers, Cache Providers and Cache Spec Providers are passed down extractor args matching key youtubepot-<PROVIDER_KEY>. # Similar to PO Token Providers, Cache Providers and Cache Spec Providers
# are passed down extractor args matching key youtubepot-<PROVIDER_KEY>.
some_setting = self._configuration_arg('some_setting', default=['default_value'])[0] some_setting = self._configuration_arg('some_setting', default=['default_value'])[0]
return self.my_cache.get(key) return self.my_cache.get(key)
def store(self, key: str, value: str, expires_at: int): def store(self, key: str, value: str, expires_at: int):
# ⚠ expires_at MUST be respected.
# Cache entries should not be returned if they have expired.
self.my_cache.store(key, value, expires_at) self.my_cache.store(key, value, expires_at)
def delete(self, key: str): def delete(self, key: str):
self.my_cache.delete(key) self.my_cache.delete(key)
def close(self): def close(self):
# Optional close hook, called when YoutubeDL is closed. # Optional close hook, called when the YoutubeDL instance is closed.
pass pass
# If there are multiple PO Token Cache Providers available, you can define a preference function to increase/decrease the priority of providers. # If there are multiple PO Token Cache Providers available, you can
# IMPORTANT: Providers should be in preference of cache lookup time. For example, a memory cache should have a higher preference than a disk cache. # define a preference function to increase/decrease the priority of providers.
# VERY IMPORTANT: yt-dlp has a built-in memory cache with a priority of 10000. Your cache provider should be lower than this.
# IMPORTANT: Providers should be in preference of cache lookup time.
# For example, a memory cache should have a higher preference than a disk cache.
# VERY IMPORTANT: yt-dlp has a built-in memory cache with a priority of 10000.
# Your cache provider should be lower than this.
@register_preference(MyCacheProviderPCP) @register_preference(MyCacheProviderPCP)
def my_cache_preference(provider: PoTokenCacheProvider, request: PoTokenRequest, *_, **__) -> int: def my_cache_preference(provider: PoTokenCacheProvider, request: PoTokenRequest) -> int:
return 50 return 50
``` ```
@ -237,7 +255,7 @@ def my_cache_preference(provider: PoTokenCacheProvider, request: PoTokenRequest,
`yt_dlp.extractor.youtube.pot.cache` `yt_dlp.extractor.youtube.pot.cache`
These are used to provide information on how to cache a particular PO Token Request. These are used to provide information on how to cache a particular PO Token Request.
You might have a different cache spec for different kinds of PO Tokens (e.g. WebPO vs iOSGuard) You might have a different cache spec for different kinds of PO Tokens.
```python ```python
from yt_dlp.extractor.youtube.pot.cache import ( from yt_dlp.extractor.youtube.pot.cache import (
@ -251,7 +269,7 @@ from yt_dlp.extractor.youtube.pot.provider import PoTokenRequest
@register_spec @register_spec
class MyCacheSpecProviderPCSP(PoTokenCacheSpecProvider): # Provider name must end with "PCSP" class MyCacheSpecProviderPCSP(PoTokenCacheSpecProvider): # Provider class name must end with "PCSP"
PROVIDER_VERSION = '0.1.0' PROVIDER_VERSION = '0.1.0'
# Define a unique display name for the provider # Define a unique display name for the provider
PROVIDER_NAME = 'mycachespec' PROVIDER_NAME = 'mycachespec'
@ -278,8 +296,10 @@ class MyCacheSpecProviderPCSP(PoTokenCacheSpecProvider): # Provider name must e
default_ttl=21600, default_ttl=21600,
# Optional: Specify a write policy. # Optional: Specify a write policy.
# WRITE_FIRST will write to the highest priority provider only, whereas WRITE_ALL will write to all providers. # WRITE_FIRST will write to the highest priority provider only,
# WRITE_FIRST may be useful if the PO Token is short-lived and there is no use writing to all providers. # whereas WRITE_ALL will write to all providers.
# WRITE_FIRST may be useful if the PO Token is short-lived
# and there is no use writing to all providers.
write_policy=CacheProviderWritePolicy.WRITE_ALL, write_policy=CacheProviderWritePolicy.WRITE_ALL,
) )
``` ```


@ -23,7 +23,11 @@ def initialize_global_cache(max_size: int):
if _pot_memory_cache.value['max_size'] != max_size: if _pot_memory_cache.value['max_size'] != max_size:
raise ValueError('Cannot change max_size of initialized global memory cache') raise ValueError('Cannot change max_size of initialized global memory cache')
return _pot_memory_cache.value['cache'], _pot_memory_cache.value['lock'], _pot_memory_cache.value['max_size'] return (
_pot_memory_cache.value['cache'],
_pot_memory_cache.value['lock'],
_pot_memory_cache.value['max_size'],
)
@register_provider @register_provider
@@ -46,31 +50,25 @@ class MemoryLRUPCP(PoTokenCacheProvider, BuiltInIEContentProvider):
     def get(self, key: str) -> str | None:
         with self.lock:
             if key not in self.cache:
-                self.logger.trace('cache miss')
                 return None
             value, expires_at = self.cache.pop(key)
             if expires_at < int(dt.datetime.now(dt.timezone.utc).timestamp()):
-                self.logger.trace(f'cache expired key={key}')
                 return None
             self.cache[key] = (value, expires_at)
-            self.logger.trace(f'cache hit key={key}')
             return value

     def store(self, key: str, value: str, expires_at: int):
         with self.lock:
             if expires_at < int(dt.datetime.now(dt.timezone.utc).timestamp()):
-                self.logger.trace(f'ignoring expired key={key}')
                 return
             if key in self.cache:
                 self.cache.pop(key)
             self.cache[key] = (value, expires_at)
             if len(self.cache) > self.max_size:
                 self.cache.popitem(last=False)
-            self.logger.trace(f'storing key={key}')

     def delete(self, key: str):
         with self.lock:
-            self.logger.trace(f'deleting key={key}')
             self.cache.pop(key, None)
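
Note on the hunk above: the provider combines LRU eviction with per-entry expiry on top of an ordered mapping. A minimal standalone sketch of that behaviour follows, assuming the underlying cache is a `collections.OrderedDict` guarded by a lock. The class name `ExpiringLRUCache` and the `_now()` helper are hypothetical and are not part of yt-dlp.

```python
import datetime as dt
import threading
from collections import OrderedDict
from typing import Optional


class ExpiringLRUCache:
    """Hypothetical stand-in that mirrors the get/store/delete logic in the diff."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.lock = threading.Lock()
        self.cache = OrderedDict()  # key -> (value, expires_at)

    @staticmethod
    def _now() -> int:
        # Current UTC time as a Unix timestamp (assumed helper)
        return int(dt.datetime.now(dt.timezone.utc).timestamp())

    def get(self, key: str) -> Optional[str]:
        with self.lock:
            if key not in self.cache:
                return None
            value, expires_at = self.cache.pop(key)
            if expires_at < self._now():
                # Expired entries are dropped and not re-inserted
                return None
            # Re-inserting moves the entry to the most recently used end
            self.cache[key] = (value, expires_at)
            return value

    def store(self, key: str, value: str, expires_at: int) -> None:
        with self.lock:
            if expires_at < self._now():
                return
            self.cache.pop(key, None)
            self.cache[key] = (value, expires_at)
            if len(self.cache) > self.max_size:
                # Evict the least recently used entry
                self.cache.popitem(last=False)

    def delete(self, key: str) -> None:
        with self.lock:
            self.cache.pop(key, None)
```

Popping and re-inserting on every hit keeps the oldest entry at the front, so `popitem(last=False)` always evicts the least recently used key.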


@@ -19,11 +19,13 @@ class WebPoPCSP(PoTokenCacheSpecProvider, BuiltInIEContentProvider):
     PROVIDER_NAME = 'webpo'

     def generate_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
-        bind_to_visitor_id = self._configuration_arg('bind_to_visitor_id', default=['true'])[0] == 'true'
+        bind_to_visitor_id = self._configuration_arg(
+            'bind_to_visitor_id', default=['true'])[0] == 'true'
         content_binding, content_binding_type = get_webpo_content_binding(
             request, bind_to_visitor_id=bind_to_visitor_id)

-        if not content_binding:
+        if not content_binding or not content_binding_type:
             return None

         write_policy = CacheProviderWritePolicy.WRITE_ALL


@@ -40,14 +40,15 @@ from yt_dlp.extractor.youtube.pot.provider import (
 from yt_dlp.utils import ExtractorError, bug_reports_message, format_field, join_nonempty, traverse_obj

 if typing.TYPE_CHECKING:
-    from yt_dlp.extractor.youtube.pot.cache import PCPPreference
+    from yt_dlp.extractor.youtube.pot.cache import CacheProviderPreference
+    from yt_dlp.extractor.youtube.pot.provider import Preference


 class YoutubeIEContentProviderLogger(IEContentProviderLogger):
-    def __init__(self, ie, prefix, log_level=IEContentProviderLogger.LogLevel.INFO):
+    def __init__(self, ie, prefix, log_level: IEContentProviderLogger.LogLevel | None = None):
         self.__ie = ie
         self.prefix = prefix
-        self.log_level = log_level
+        self.log_level = log_level if log_level is not None else self.LogLevel.INFO

     def _format_msg(self, message: str):
         prefixstr = format_field(self.prefix, None, '[%s] ')
@@ -81,17 +82,13 @@ class PoTokenCache:
         logger: IEContentProviderLogger,
         cache_providers: list[PoTokenCacheProvider],
         cache_spec_providers: list[PoTokenCacheSpecProvider],
-        cache_provider_preferences: list[PCPPreference] | None = None,
+        cache_provider_preferences: list[CacheProviderPreference] | None = None,
     ):
         self.cache_providers: dict[str, PoTokenCacheProvider] = {
-            provider.PROVIDER_KEY: provider for provider in (cache_providers or [])
-        }
-        self.cache_provider_preferences: list[PCPPreference] = cache_provider_preferences or []
+            provider.PROVIDER_KEY: provider for provider in (cache_providers or [])}
+        self.cache_provider_preferences: list[CacheProviderPreference] = cache_provider_preferences or []
         self.cache_spec_providers: dict[str, PoTokenCacheSpecProvider] = {
-            provider.PROVIDER_KEY: provider for provider in (cache_spec_providers or [])
-        }
+            provider.PROVIDER_KEY: provider for provider in (cache_spec_providers or [])}
         self.logger = logger

     def _get_cache_providers(self, request: PoTokenRequest) -> Iterable[PoTokenCacheProvider]:
@@ -107,8 +104,8 @@ class PoTokenCache:
                 f'{provider.PROVIDER_KEY}={pref}' for provider, pref in preferences.items())))

         return (
-            provider for provider in sorted(self.cache_providers.values(), key=preferences.get, reverse=True) if provider.is_available()
-        )
+            provider for provider in sorted(
+                self.cache_providers.values(), key=preferences.get, reverse=True) if provider.is_available())

     def _get_cache_spec(self, request: PoTokenRequest) -> PoTokenCacheSpec | None:
         for provider in self.cache_spec_providers.values():
@@ -119,25 +116,31 @@ class PoTokenCache:
                 if not spec:
                     continue
                 if not validate_cache_spec(spec):
-                    self.logger.error(f'PoTokenCacheSpecProvider "{provider.PROVIDER_KEY}" generate_cache_spec() returned invalid spec {spec}{provider_bug_report_message(provider)}')
+                    self.logger.error(
+                        f'PoTokenCacheSpecProvider "{provider.PROVIDER_KEY}" generate_cache_spec() '
+                        f'returned invalid spec {spec}{provider_bug_report_message(provider)}')
                     continue
                 spec = dataclasses.replace(spec, _provider=provider)
-                self.logger.trace(f'Retrieved cache spec {spec} from cache spec provider "{provider.PROVIDER_NAME}"')
+                self.logger.trace(
+                    f'Retrieved cache spec {spec} from cache spec provider "{provider.PROVIDER_NAME}"')
                 return spec
             except Exception as e:
                 self.logger.error(
-                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache spec provider: {e!r}{provider_bug_report_message(provider)}',
-                )
+                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache spec provider: '
+                    f'{e!r}{provider_bug_report_message(provider)}')
                 continue
+        return None

     def _generate_key_bindings(self, spec: PoTokenCacheSpec) -> dict[str, str]:
         bindings_cleaned = {
             **{k: v for k, v in spec.key_bindings.items() if v is not None},
             # Allow us to invalidate caches if such need arises
             '_yt': 'v1',
-            '_p': spec._provider.PROVIDER_KEY,
         }
-        self.logger.trace('Generate cache key bindings: {}'.format(', '.join(f'{k}={v}' for k, v in bindings_cleaned.items())))
+        if spec._provider:
+            bindings_cleaned['_p'] = spec._provider.PROVIDER_KEY
+        self.logger.trace(
+            'Generate cache key bindings: {}'.format(', '.join(f'{k}={v}' for k, v in bindings_cleaned.items())))
         return bindings_cleaned

     def _generate_key(self, bindings: dict) -> str:
@@ -155,7 +158,8 @@ class PoTokenCache:

         for idx, provider in enumerate(self._get_cache_providers(request)):
             try:
-                self.logger.trace(f'Attempting to fetch PO Token response from "{provider.PROVIDER_NAME}" cache provider')
+                self.logger.trace(
+                    f'Attempting to fetch PO Token response from "{provider.PROVIDER_NAME}" cache provider')
                 cache_response = provider.get(cache_key)
                 if not cache_response:
                     continue
@@ -164,10 +168,14 @@ class PoTokenCache:
                 except (TypeError, ValueError, json.JSONDecodeError):
                     po_token_response = None
                 if not validate_response(po_token_response):
-                    self.logger.error(f'Invalid PO Token response retrieved from cache provider "{provider.PROVIDER_NAME}": {cache_response}{provider_bug_report_message(provider)}')
+                    self.logger.error(
+                        f'Invalid PO Token response retrieved from cache provider "{provider.PROVIDER_NAME}": '
+                        f'{cache_response}{provider_bug_report_message(provider)}')
                     provider.delete(cache_key)
                     continue
-                self.logger.trace(f'PO Token response retrieved from cache using "{provider.PROVIDER_NAME}" provider: {po_token_response}')
+                self.logger.trace(
+                    f'PO Token response retrieved from cache using "{provider.PROVIDER_NAME}" provider: '
+                    f'{po_token_response}')
                 if idx > 0:
                     # Write back to the highest priority cache provider,
                     # so we stop trying to fetch from lower priority providers
@@ -177,22 +185,32 @@ class PoTokenCache:
                 return po_token_response
             except PoTokenCacheProviderError as e:
                 self.logger.warning(
-                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: {e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
+                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: '
+                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
                 continue
             except Exception as e:
                 self.logger.error(
-                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: {e!r}{provider_bug_report_message(provider)}',
+                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: '
+                    f'{e!r}{provider_bug_report_message(provider)}',
                 )
                 continue
+        return None

-    def store(self, request: PoTokenRequest, response: PoTokenResponse, write_policy: CacheProviderWritePolicy | None = None):
+    def store(
+        self,
+        request: PoTokenRequest,
+        response: PoTokenResponse,
+        write_policy: CacheProviderWritePolicy | None = None,
+    ):
         spec = self._get_cache_spec(request)
         if not spec:
             self.logger.trace('No cache spec available for this request. Not caching.')
             return

         if not validate_response(response):
-            self.logger.error(f'Invalid PO Token response provided to PoTokenCache.store(): {response}{bug_reports_message()}')
+            self.logger.error(
+                f'Invalid PO Token response provided to PoTokenCache.store(): '
+                f'{response}{bug_reports_message()}')
             return

         cache_key = self._generate_key(self._generate_key_bindings(spec))
@@ -207,16 +225,20 @@ class PoTokenCache:
         for idx, provider in enumerate(self._get_cache_providers(request)):
             try:
                 self.logger.trace(
-                    f'Caching PO Token response in "{provider.PROVIDER_NAME}" cache provider (key={cache_key}, expires_at={cache_response.expires_at})',
-                )
-                provider.store(key=cache_key, value=json.dumps(dataclasses.asdict(cache_response)), expires_at=cache_response.expires_at)
+                    f'Caching PO Token response in "{provider.PROVIDER_NAME}" cache provider '
+                    f'(key={cache_key}, expires_at={cache_response.expires_at})')
+                provider.store(
+                    key=cache_key,
+                    value=json.dumps(dataclasses.asdict(cache_response)),
+                    expires_at=cache_response.expires_at)
             except PoTokenCacheProviderError as e:
                 self.logger.warning(
-                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: {e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
+                    f'Error from "{provider.PROVIDER_NAME}" PO Token cache provider: '
+                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
             except Exception as e:
                 self.logger.error(
-                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: {e!r}{provider_bug_report_message(provider)}',
-                )
+                    f'Error occurred with "{provider.PROVIDER_NAME}" PO Token cache provider: '
+                    f'{e!r}{provider_bug_report_message(provider)}')

             # WRITE_FIRST should not write to lower priority providers in the case the highest priority provider fails
             if idx == 0 and write_policy == CacheProviderWritePolicy.WRITE_FIRST:
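
Note on the hunk above: the body of the `WRITE_FIRST` check is cut off by the hunk boundary. The sketch below illustrates the behaviour the comment describes, assuming the loop simply stops after the first (highest priority) provider whether or not that write succeeded. `WritePolicy` and `store_with_policy` are hypothetical names, and providers are modelled as `(name, store_callable)` pairs for brevity.

```python
import enum


class WritePolicy(enum.Enum):
    """Hypothetical stand-in for CacheProviderWritePolicy."""
    WRITE_ALL = enum.auto()
    WRITE_FIRST = enum.auto()


def store_with_policy(providers, key, value, policy):
    """Write to providers in priority order; return the names written to.

    `providers` is a list of (name, store_callable) pairs, highest priority first.
    """
    written = []
    for idx, (name, store) in enumerate(providers):
        try:
            store(key, value)
            written.append(name)
        except Exception:
            # The real code logs the failure; this sketch just moves on
            pass
        # Assumed behaviour: WRITE_FIRST stops after the first provider,
        # even if that provider failed
        if idx == 0 and policy is WritePolicy.WRITE_FIRST:
            break
    return written
```

Under this reading, a failing top-priority provider with `WRITE_FIRST` results in nothing being cached at all, which matches the comment's intent of never falling through to lower priority providers.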
@@ -232,15 +254,15 @@ class PoTokenCache:
 class PoTokenRequestDirector:

     def __init__(self, logger: IEContentProviderLogger, cache: PoTokenCache):
-        self.providers = {}
-        self.preferences = []
+        self.providers: dict[str, PoTokenProvider] = {}
+        self.preferences: list[Preference] = []
         self.cache = cache
         self.logger = logger

     def register_provider(self, provider: PoTokenProvider):
         self.providers[provider.PROVIDER_KEY] = provider

-    def register_preference(self, preference):
+    def register_preference(self, preference: Preference):
         self.preferences.append(preference)

     def _get_providers(self, request: PoTokenRequest) -> Iterable[PoTokenProvider]:
@@ -256,32 +278,39 @@ class PoTokenRequestDirector:
                 f'{provider.PROVIDER_NAME}={pref}' for provider, pref in preferences.items())))

         return (
-            provider for provider in sorted(self.providers.values(), key=preferences.get, reverse=True) if provider.is_available()
+            provider for provider in sorted(
+                self.providers.values(), key=preferences.get, reverse=True)
+            if provider.is_available()
         )

     def _get_po_token(self, request) -> PoTokenResponse | None:
         for provider in self._get_providers(request):
             try:
-                self.logger.trace(f'Attempting to fetch a PO Token from "{provider.PROVIDER_NAME}" provider')
+                self.logger.trace(
+                    f'Attempting to fetch a PO Token from "{provider.PROVIDER_NAME}" provider')
                 response = provider.request_pot(request.copy())
             except PoTokenProviderRejectedRequest as e:
                 self.logger.trace(
-                    f'PO Token Provider "{provider.PROVIDER_NAME}" rejected this request, trying next available provider. Reason: {e}')
+                    f'PO Token Provider "{provider.PROVIDER_NAME}" rejected this request, '
+                    f'trying next available provider. Reason: {e}')
                 continue
             except PoTokenProviderError as e:
                 self.logger.warning(
-                    f'Error fetching PO Token from "{provider.PROVIDER_NAME}" provider: {e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
+                    f'Error fetching PO Token from "{provider.PROVIDER_NAME}" provider: '
+                    f'{e!r}{provider_bug_report_message(provider) if not e.expected else ""}')
                 continue
             except Exception as e:
                 self.logger.error(
-                    f'Unexpected error when fetching PO Token from "{provider.PROVIDER_NAME}" provider: {e!r}{provider_bug_report_message(provider)}')
+                    f'Unexpected error when fetching PO Token from "{provider.PROVIDER_NAME}" provider: '
+                    f'{e!r}{provider_bug_report_message(provider)}')
                 continue

             self.logger.trace(f'PO Token response from "{provider.PROVIDER_NAME}" provider: {response}')

             if not validate_response(response):
                 self.logger.error(
-                    f'Invalid PO Token response received from "{provider.PROVIDER_NAME}" provider: {response}{provider_bug_report_message(provider)}')
+                    f'Invalid PO Token response received from "{provider.PROVIDER_NAME}" provider: '
+                    f'{response}{provider_bug_report_message(provider)}')
                 continue

             return response
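
Note on the hunk above: providers are tried in descending preference order, unavailable ones are skipped, and any rejection, error, or invalid response falls through to the next provider. A minimal sketch of that control flow follows. `Rejected`, `FakeProvider`, and `first_valid_token` are hypothetical names, `preferences` is assumed to be a dict mapping each provider to an integer score, and a simple truthiness check stands in for `validate_response()`.

```python
class Rejected(Exception):
    """Hypothetical stand-in for PoTokenProviderRejectedRequest."""


class FakeProvider:
    """Hypothetical provider used only to exercise the fallback loop."""

    def __init__(self, name, available=True, result=None, error=None):
        self.name = name
        self.available = available
        self.result = result
        self.error = error

    def is_available(self):
        return self.available

    def request_pot(self, request):
        if self.error is not None:
            raise self.error
        return self.result


def first_valid_token(providers, preferences, request):
    # Highest preference first; unavailable providers are filtered out lazily
    ordered = (
        provider for provider in sorted(providers, key=preferences.get, reverse=True)
        if provider.is_available()
    )
    for provider in ordered:
        try:
            response = provider.request_pot(dict(request))
        except Rejected:
            continue  # provider declined this request, try the next one
        except Exception:
            continue  # unexpected failure, also try the next one
        if not response:  # stands in for validate_response()
            continue
        return response
    return None
```

Because the filter is a generator, `is_available()` is only called for a provider when the loop actually reaches it.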
@@ -307,14 +336,14 @@ class PoTokenRequestDirector:
         if pot_response.expires_at is None or pot_response.expires_at > 0:
             self.cache.store(request, pot_response)
         else:
-            self.logger.trace(f'PO Token response will not be cached (expires_at={pot_response.expires_at})')
+            self.logger.trace(
+                f'PO Token response will not be cached (expires_at={pot_response.expires_at})')

         return pot_response.po_token

     def close(self):
         for provider in self.providers.values():
             provider.close()
         self.cache.close()
@@ -325,21 +354,32 @@ def initialize_pot_director(ie):
     if not ie._downloader:
         raise ExtractorError('Downloader not set', expected=False)

-    log_level = min(
-        IEContentProviderLogger.LogLevel(ie._configuration_arg('pot_log_level', ['INFO'], ie_key='youtube', casesense=False)[0].upper()),
-        IEContentProviderLogger.LogLevel.DEBUG if ie._downloader.params.get('verbose', False) else IEContentProviderLogger.LogLevel.INFO,
-    )
+    enable_trace = ie._configuration_arg(
+        'pot_trace', ['false'], ie_key='youtube', casesense=False)[0] == 'true'
+
+    if enable_trace:
+        log_level = IEContentProviderLogger.LogLevel.TRACE
+    elif ie._downloader.params.get('verbose', False):
+        log_level = IEContentProviderLogger.LogLevel.DEBUG
+    else:
+        log_level = IEContentProviderLogger.LogLevel.INFO

     cache_providers = []
     for cache_provider in _pot_cache_providers.value.values():
-        settings = traverse_obj(ie._downloader.params, ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_provider.PROVIDER_KEY.lower()}'))
-        cache_provider_logger = YoutubeIEContentProviderLogger(ie, f'pot:cache:{cache_provider.PROVIDER_NAME}', log_level=log_level)
+        settings = traverse_obj(
+            ie._downloader.params,
+            ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_provider.PROVIDER_KEY.lower()}'))
+        cache_provider_logger = YoutubeIEContentProviderLogger(
+            ie, f'pot:cache:{cache_provider.PROVIDER_NAME}', log_level=log_level)
         cache_providers.append(cache_provider(ie, cache_provider_logger, settings or {}))

     cache_spec_providers = []
     for cache_spec_provider in _pot_pcs_providers.value.values():
-        settings = traverse_obj(ie._downloader.params, ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_spec_provider.PROVIDER_KEY.lower()}'))
-        cache_spec_provider_logger = YoutubeIEContentProviderLogger(ie, f'pot:cache:spec:{cache_spec_provider.PROVIDER_NAME}', log_level=log_level)
+        settings = traverse_obj(
+            ie._downloader.params,
+            ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_spec_provider.PROVIDER_KEY.lower()}'))
+        cache_spec_provider_logger = YoutubeIEContentProviderLogger(
+            ie, f'pot:cache:spec:{cache_spec_provider.PROVIDER_NAME}', log_level=log_level)
         cache_spec_providers.append(cache_spec_provider(ie, cache_spec_provider_logger, settings or {}))

     cache = PoTokenCache(
@@ -357,14 +397,18 @@ def initialize_pot_director(ie):
     ie._downloader.add_close_hook(director.close)

     for provider in _pot_providers.value.values():
-        settings = traverse_obj(ie._downloader.params, ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{provider.PROVIDER_KEY.lower()}'))
-        logger = YoutubeIEContentProviderLogger(ie, f'pot:{provider.PROVIDER_NAME}', log_level=log_level)
+        settings = traverse_obj(
+            ie._downloader.params,
+            ('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{provider.PROVIDER_KEY.lower()}'))
+        logger = YoutubeIEContentProviderLogger(
+            ie, f'pot:{provider.PROVIDER_NAME}', log_level=log_level)
         director.register_provider(provider(ie, logger, settings or {}))

     for preference in _ptp_preferences.value:
         director.register_preference(preference)

     if director.logger.log_level <= director.logger.LogLevel.DEBUG:
+        # calling is_available() for every PO Token provider upfront may have some overhead
         director.logger.debug(f'PO Token Providers: {provider_display_list(director.providers.values())}')
         director.logger.debug(f'PO Token Cache Providers: {provider_display_list(cache.cache_providers.values())}')
         director.logger.debug(f'PO Token Cache Spec Providers: {provider_display_list(cache.cache_spec_providers.values())}')
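
Note on `initialize_pot_director`: the change replaces the `pot_log_level` argument with a boolean `pot_trace` extractor argument, and the resulting level is picked by a three-way branch. A small sketch of that selection is below. The `LogLevel` enum here is a hypothetical stand-in, and its numeric values are an assumption; the diff only implies that `TRACE` and `DEBUG` compare less than or equal to `DEBUG` (see the `log_level <= ...LogLevel.DEBUG` check).

```python
import enum


class LogLevel(enum.IntEnum):
    """Hypothetical stand-in for IEContentProviderLogger.LogLevel (values assumed)."""
    TRACE = 0
    DEBUG = 10
    INFO = 20


def select_log_level(pot_trace_values, verbose):
    """Mirror the branch in the diff.

    `pot_trace_values` is the already lower-cased list returned for the
    'pot_trace' extractor argument, e.g. ['true'] or the default ['false'].
    """
    enable_trace = pot_trace_values[0] == 'true'
    if enable_trace:
        return LogLevel.TRACE
    if verbose:
        return LogLevel.DEBUG
    return LogLevel.INFO
```

With this shape, trace output no longer depends on verbose mode: `pot_trace=true` alone is enough, and verbose mode alone yields `DEBUG`.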
@@ -376,7 +420,9 @@ def initialize_pot_director(ie):

 def provider_display_list(providers: Iterable[IEContentProvider]):
     def provider_display_name(provider):
-        display_str = join_nonempty(provider.PROVIDER_NAME, provider.PROVIDER_VERSION if not isinstance(provider, BuiltInIEContentProvider) else None)
+        display_str = join_nonempty(
+            provider.PROVIDER_NAME,
+            provider.PROVIDER_VERSION if not isinstance(provider, BuiltInIEContentProvider) else None)
         statuses = []
         if not isinstance(provider, BuiltInIEContentProvider):
             statuses.append('external')
@@ -393,12 +439,13 @@ def clean_pot(po_token: str):
     # Clean and validate the PO Token. This will strip invalid characters off
     # (e.g. additional url params the user may accidentally include)
     try:
-        return base64.urlsafe_b64encode(base64.urlsafe_b64decode(urllib.parse.unquote(po_token))).decode()
+        return base64.urlsafe_b64encode(
+            base64.urlsafe_b64decode(urllib.parse.unquote(po_token))).decode()
     except (binascii.Error, ValueError):
         raise ValueError('Invalid PO Token')


-def validate_response(response: PoTokenResponse):
+def validate_response(response: PoTokenResponse | None):
     if (
         not isinstance(response, PoTokenResponse)
         or not response.po_token
@@ -415,13 +462,15 @@ def validate_response(response: PoTokenResponse):
         response.expires_at is None
         or (
             isinstance(response.expires_at, int)
-            and (response.expires_at <= 0 or response.expires_at > int(dt.datetime.now(dt.timezone.utc).timestamp()))
+            and (
+                response.expires_at <= 0
+                or response.expires_at > int(dt.datetime.now(dt.timezone.utc).timestamp())
+            )
         )
     )


 def validate_cache_spec(spec: PoTokenCacheSpec):
     return (
         isinstance(spec, PoTokenCacheSpec)
         and isinstance(spec.write_policy, CacheProviderWritePolicy)
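
Note on `clean_pot` earlier in this file: the function normalises a token by URL-unquoting it, decoding it as URL-safe base64, and re-encoding the result. The standalone copy below reproduces the body shown in the diff using only the standard library, so the round trip can be tried in isolation. The sample tokens in the usage lines are made up for illustration and are not real PO Tokens.

```python
import base64
import binascii
import urllib.parse


def clean_pot(po_token: str) -> str:
    # Same round trip as in the diff: unquote, decode, re-encode
    try:
        return base64.urlsafe_b64encode(
            base64.urlsafe_b64decode(urllib.parse.unquote(po_token))).decode()
    except (binascii.Error, ValueError):
        raise ValueError('Invalid PO Token')


# Illustrative inputs only (base64 of b'hello'):
# a percent-encoded '=' padding character is restored by unquote()
print(clean_pot('aGVsbG8%3D'))

# Incorrectly padded input is reported as an invalid token
try:
    clean_pot('abc')
except ValueError as e:
    print(e)
```

Re-encoding after decoding means the returned string is always canonical URL-safe base64, regardless of how the user pasted it.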


@@ -62,7 +62,12 @@ class IEContentProvider(abc.ABC):
     PROVIDER_VERSION: str = '0.0.0'
     BUG_REPORT_LOCATION: str = '(developer has not provided a bug report location)'

-    def __init__(self, ie: InfoExtractor, logger: IEContentProviderLogger, settings: dict[str, list[str]], *_, **__):
+    def __init__(
+        self,
+        ie: InfoExtractor,
+        logger: IEContentProviderLogger,
+        settings: dict[str, list[str]], *_, **__,
+    ):
         self.ie = ie
         self.settings = settings or {}
         self.logger = logger
@@ -103,12 +108,12 @@ class IEContentProvider(abc.ABC):
         pass

     def _configuration_arg(self, key, default=NO_DEFAULT, *, casesense=False):
-        '''
+        """
         @returns A list of values for the setting given by "key"
         or "default" if no such key is present
         @param default The default value to return when the key is not present (default: [])
         @param casesense When false, the values are converted to lower case
-        '''
+        """
         val = traverse_obj(self.settings, key)
         if val is None:
             return [] if default is NO_DEFAULT else default
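
Note on `_configuration_arg`: the hunk ends before the final return, so the sketch below fills in the remainder from the docstring alone (values are lower-cased unless `casesense` is true). It is an assumption about the elided lines, not a copy of them. `configuration_arg` is a hypothetical free function, and a plain `dict.get` stands in for `traverse_obj`.

```python
NO_DEFAULT = object()  # hypothetical sentinel standing in for yt-dlp's NO_DEFAULT


def configuration_arg(settings, key, default=NO_DEFAULT, *, casesense=False):
    """Return the list of values stored under `key`, or `default` if absent."""
    val = settings.get(key)
    if val is None:
        return [] if default is NO_DEFAULT else default
    # Assumed from the docstring: values are lower-cased unless casesense=True
    return list(val) if casesense else [x.lower() for x in val]


# Usage in the style of the WebPoPCSP provider from this diff
settings = {'bind_to_visitor_id': ['TRUE']}
bind = configuration_arg(settings, 'bind_to_visitor_id', default=['true'])[0] == 'true'
print(bind)
```

Returning a list even for single-valued settings is why callers in this diff index with `[0]` before comparing against `'true'`.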


@@ -83,8 +83,8 @@ def register_spec(provider: type[PoTokenCacheSpecProvider]):
     )


-# XXX: I don't think the typing is correct, and that we need py3.10 to properly type this
-def register_preference(*providers: type[PoTokenCacheProvider]) -> typing.Callable[[PCPPreference], PCPPreference]:
+def register_preference(
+        *providers: type[PoTokenCacheProvider]) -> typing.Callable[[CacheProviderPreference], CacheProviderPreference]:
     """Register a preference for a PoTokenCacheProvider"""
     return register_preference_generic(
         PoTokenCacheProvider,
@@ -94,4 +94,4 @@ def register_preference(*providers: type[PoTokenCacheProvider]) -> typing.Callab


 if typing.TYPE_CHECKING:
-    PCPPreference = typing.Callable[[PoTokenCacheProvider, PoTokenRequest, ...], int]
+    CacheProviderPreference = typing.Callable[[PoTokenCacheProvider, PoTokenRequest], int]


@@ -110,12 +110,14 @@ class PoTokenProvider(IEContentProvider, abc.ABC, suffix='PTP'):
     # Innertube Client Name.
     # For example, "WEB", "ANDROID", "TVHTML5".
-    # For a list of WebPO client names, see yt_dlp.extractor.youtube.pot._builtin.utils.WEBPO_CLIENTS.
-    # Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS for a list of client names currently supported by the YouTube extractor.
+    # For a list of WebPO client names, see yt_dlp.extractor.youtube.pot.utils.WEBPO_CLIENTS.
+    # Also see yt_dlp.extractor.youtube._base.INNERTUBE_CLIENTS
+    # for a list of client names currently supported by the YouTube extractor.
     _SUPPORTED_CLIENTS: tuple[str] | None = ()

-    # If making external requests to websites (i.e. to youtube.com) using another library or service (i.e., not _request_webpage),
-    # add the request features that are supported.
+    # If making external requests to websites (i.e. to youtube.com)
+    # using another library or service (i.e., not _request_webpage),
+    # add the request features that are supported.
     # If only using _request_webpage to make external requests, set this to None.
     _SUPPORTED_EXTERNAL_REQUEST_FEATURES: tuple[ExternalRequestFeature] | None = ()
@@ -124,14 +126,20 @@ class PoTokenProvider(IEContentProvider, abc.ABC, suffix='PTP'):
             raise PoTokenProviderRejectedRequest(f'{self.PROVIDER_NAME} is not available')

         # Validate request using built-in settings
-        if self._SUPPORTED_CONTEXTS is not None and request.context not in self._SUPPORTED_CONTEXTS:
-            raise PoTokenProviderRejectedRequest(f'PO Token Context "{request.context}" is not supported by {self.PROVIDER_NAME}')
+        if (
+            self._SUPPORTED_CONTEXTS is not None
+            and request.context not in self._SUPPORTED_CONTEXTS
+        ):
+            raise PoTokenProviderRejectedRequest(
+                f'PO Token Context "{request.context}" is not supported by {self.PROVIDER_NAME}')

         if self._SUPPORTED_CLIENTS is not None:
-            client_name = traverse_obj(request.innertube_context, ('client', 'clientName'))
+            client_name = traverse_obj(
+                request.innertube_context, ('client', 'clientName'))
             if client_name not in self._SUPPORTED_CLIENTS:
                 raise PoTokenProviderRejectedRequest(
-                    f'Client "{client_name}" is not supported by {self.PROVIDER_NAME}. Supported clients: {", ".join(self._SUPPORTED_CLIENTS) or "none"}')
+                    f'Client "{client_name}" is not supported by {self.PROVIDER_NAME}. '
+                    f'Supported clients: {", ".join(self._SUPPORTED_CLIENTS) or "none"}')

         self.__validate_external_request_features(request)
@@ -158,15 +166,25 @@ class PoTokenProvider(IEContentProvider, abc.ABC, suffix='PTP'):
             scheme = urllib.parse.urlparse(request.request_proxy).scheme
             if scheme.lower() not in self._supported_proxy_schemes:
                 raise PoTokenProviderRejectedRequest(
-                    f'External requests by "{self.PROVIDER_NAME}" provider do not support proxy scheme "{scheme}". Supported proxy schemes: {", ".join(self._supported_proxy_schemes) or "none"}')
+                    f'External requests by "{self.PROVIDER_NAME}" provider do not '
+                    f'support proxy scheme "{scheme}". Supported proxy schemes: '
+                    f'{", ".join(self._supported_proxy_schemes) or "none"}')

-        if request.request_source_address and ExternalRequestFeature.SOURCE_ADDRESS not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES:
+        if (
+            request.request_source_address
+            and ExternalRequestFeature.SOURCE_ADDRESS not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES
+        ):
             raise PoTokenProviderRejectedRequest(
-                f'External requests by "{self.PROVIDER_NAME}" provider do not support setting source address')
+                f'External requests by "{self.PROVIDER_NAME}" provider '
+                f'do not support setting source address')

-        if not request.request_verify_tls and ExternalRequestFeature.DISABLE_TLS_VERIFICATION not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES:
+        if (
+            not request.request_verify_tls
+            and ExternalRequestFeature.DISABLE_TLS_VERIFICATION not in self._SUPPORTED_EXTERNAL_REQUEST_FEATURES
+        ):
             raise PoTokenProviderRejectedRequest(
-                f'External requests by "{self.PROVIDER_NAME}" provider do not support ignoring TLS certificate failures')
+                f'External requests by "{self.PROVIDER_NAME}" provider '
+                f'do not support ignoring TLS certificate failures')

     def request_pot(self, request: PoTokenRequest) -> PoTokenResponse:
         self.__validate_request(request)
@@ -229,7 +247,6 @@ def provider_bug_report_message(provider: IEContentProvider, before=';'):
     return (before + ' ' if before else '') + msg


-# XXX: I don't think the typing is correct, and that we need py3.10 to properly type this
 def register_preference(*providers: type[PoTokenProvider]) -> typing.Callable[[Preference], Preference]:
     """Register a preference for a PoTokenProvider"""
     return register_preference_generic(
@@ -240,7 +257,8 @@ def register_preference(*providers: type[PoTokenProvider]) -> typing.Callable[[P


 if typing.TYPE_CHECKING:
-    Preference = typing.Callable[[PoTokenProvider, PoTokenRequest, ...], int]
+    Preference = typing.Callable[[PoTokenProvider, PoTokenRequest], int]
+    __all__.append('Preference')


 # Barebones innertube context. There may be more fields.
 class ClientInfo(typing.TypedDict, total=False):


@ -31,7 +31,12 @@ class ContentBindingType(enum.Enum):
VISITOR_ID = 'visitor_id' VISITOR_ID = 'visitor_id'
def get_webpo_content_binding(request: PoTokenRequest, webpo_clients=WEBPO_CLIENTS, bind_to_visitor_id=False) -> tuple[str | None, ContentBindingType | None]: def get_webpo_content_binding(
request: PoTokenRequest,
webpo_clients=WEBPO_CLIENTS,
bind_to_visitor_id=False,
) -> tuple[str | None, ContentBindingType | None]:
client_name = traverse_obj(request.innertube_context, ('client', 'clientName')) client_name = traverse_obj(request.innertube_context, ('client', 'clientName'))
if not client_name or client_name not in webpo_clients: if not client_name or client_name not in webpo_clients:
return None, None return None, None
@ -49,15 +54,18 @@ def get_webpo_content_binding(request: PoTokenRequest, webpo_clients=WEBPO_CLIEN
elif request.context == PoTokenContext.PLAYER or client_name != 'WEB_REMIX': elif request.context == PoTokenContext.PLAYER or client_name != 'WEB_REMIX':
return request.video_id, ContentBindingType.VIDEO_ID return request.video_id, ContentBindingType.VIDEO_ID
return None, None
def _extract_visitor_id(visitor_data): def _extract_visitor_id(visitor_data):
if not visitor_data: if not visitor_data:
return return None
# Attempt to extract the visitor ID from the visitor_data protobuf # Attempt to extract the visitor ID from the visitor_data protobuf
# xxx: ideally should use a protobuf parser # xxx: ideally should use a protobuf parser
with contextlib.suppress(Exception): with contextlib.suppress(Exception):
visitor_id = base64.urlsafe_b64decode(urllib.parse.unquote_plus(visitor_data))[2:13].decode() visitor_id = base64.urlsafe_b64decode(
urllib.parse.unquote_plus(visitor_data))[2:13].decode()
# check that visitor id is all letters and numbers # check that visitor id is all letters and numbers
if re.fullmatch(r'[A-Za-z0-9_-]{11}', visitor_id): if re.fullmatch(r'[A-Za-z0-9_-]{11}', visitor_id):
return visitor_id return visitor_id
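The visitor ID extraction above can be tried standalone with the sketch below. It reproduces the same steps (URL-unquote, URL-safe base64 decode, slice bytes 2 to 13, validate against the 11-character pattern). The function name `extract_visitor_id` and the sample bytes are assumptions made for illustration; the sample is not real visitor data and only mimics a layout where the ID follows a two-byte prefix.

```python
import base64
import contextlib
import re
import urllib.parse


def extract_visitor_id(visitor_data):
    """Best-effort extraction of an 11-character visitor ID, or None."""
    if not visitor_data:
        return None
    # Any decoding failure is swallowed, mirroring the diff's suppress(Exception)
    with contextlib.suppress(Exception):
        raw = base64.urlsafe_b64decode(urllib.parse.unquote_plus(visitor_data))
        visitor_id = raw[2:13].decode()
        # Accept only letters, digits, '_' and '-', exactly 11 characters
        if re.fullmatch(r'[A-Za-z0-9_-]{11}', visitor_id):
            return visitor_id
    return None


# Hypothetical sample: two prefix bytes, an 11-character ID, two trailing bytes
sample = urllib.parse.quote(
    base64.urlsafe_b64encode(b'\x0a\x0b' + b'abcdefghijk' + b'\x28\x00').decode())
print(extract_visitor_id(sample))
```

Because the slice offsets are fixed, this approach depends on the ID always sitting at the same position, which is why the source comment notes that a protobuf parser would be preferable.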

File diff suppressed because it is too large


@ -150,6 +150,15 @@ class _YoutubeDLHelpFormatter(optparse.IndentedHelpFormatter):
return opts return opts
_PRESET_ALIASES = {
'mp3': ['-f', 'ba[acodec^=mp3]/ba/b', '-x', '--audio-format', 'mp3'],
'aac': ['-f', 'ba[acodec^=aac]/ba[acodec^=mp4a.40.]/ba/b', '-x', '--audio-format', 'aac'],
'mp4': ['--merge-output-format', 'mp4', '--remux-video', 'mp4', '-S', 'vcodec:h264,lang,quality,res,fps,hdr:12,acodec:aac'],
'mkv': ['--merge-output-format', 'mkv', '--remux-video', 'mkv'],
'sleep': ['--sleep-subtitles', '5', '--sleep-requests', '0.75', '--sleep-interval', '10', '--max-sleep-interval', '20'],
}
class _YoutubeDLOptionParser(optparse.OptionParser): class _YoutubeDLOptionParser(optparse.OptionParser):
# optparse is deprecated since Python 3.2. So assume a stable interface even for private methods # optparse is deprecated since Python 3.2. So assume a stable interface even for private methods
ALIAS_DEST = '_triggered_aliases' ALIAS_DEST = '_triggered_aliases'
@ -215,6 +224,22 @@ class _YoutubeDLOptionParser(optparse.OptionParser):
return e.possibilities[0] return e.possibilities[0]
raise raise
def format_option_help(self, formatter=None):
assert formatter, 'Formatter can not be None'
formatted_help = super().format_option_help(formatter=formatter)
formatter.indent()
heading = formatter.format_heading('Preset Aliases')
formatter.indent()
result = []
for name, args in _PRESET_ALIASES.items():
option = optparse.Option('-t', help=shlex.join(args))
formatter.option_strings[option] = f'-t {name}'
result.append(formatter.format_option(option))
formatter.dedent()
formatter.dedent()
help_lines = '\n'.join(result)
return f'{formatted_help}\n{heading}{help_lines}'
def create_parser(): def create_parser():
def _list_from_options_callback(option, opt_str, value, parser, append=True, delim=',', process=str.strip): def _list_from_options_callback(option, opt_str, value, parser, append=True, delim=',', process=str.strip):
@ -317,6 +342,13 @@ def create_parser():
parser.rargs[:0] = shlex.split( parser.rargs[:0] = shlex.split(
opts if value is None else opts.format(*map(shlex.quote, value))) opts if value is None else opts.format(*map(shlex.quote, value)))
def _preset_alias_callback(option, opt_str, value, parser):
if not value:
return
if value not in _PRESET_ALIASES:
raise optparse.OptionValueError(f'Unknown preset alias: {value}')
parser.rargs[:0] = _PRESET_ALIASES[value]
general = optparse.OptionGroup(parser, 'General Options') general = optparse.OptionGroup(parser, 'General Options')
general.add_option( general.add_option(
'-h', '--help', dest='print_help', action='store_true', '-h', '--help', dest='print_help', action='store_true',
@ -519,6 +551,15 @@ def create_parser():
'Alias options can trigger more aliases; so be careful to avoid defining recursive options. ' 'Alias options can trigger more aliases; so be careful to avoid defining recursive options. '
f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. ' f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. '
'This option can be used multiple times')) 'This option can be used multiple times'))
general.add_option(
'-t', '--preset-alias',
metavar='PRESET', dest='_', type='str',
action='callback', callback=_preset_alias_callback,
help=(
'Applies a predefined set of options. e.g. --preset-alias mp3. '
f'The following presets are available: {", ".join(_PRESET_ALIASES)}. '
'See the "Preset Aliases" section at the end for more info. '
'This option can be used multiple times'))
network = optparse.OptionGroup(parser, 'Network Options') network = optparse.OptionGroup(parser, 'Network Options')
network.add_option( network.add_option(
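The preset alias mechanism added in this file relies on an `optparse` callback that prepends the preset's arguments to `parser.rargs`, so they are parsed as if the user had typed them. The sketch below shows that technique with a reduced, hypothetical parser: `PRESETS`, `preset_callback` and `build_parser` are illustrative names, and only three ordinary options are defined so the injected arguments have something to land on.

```python
import optparse

# Reduced preset table for illustration (the diff defines the real one)
PRESETS = {
    'mp3': ['-x', '--audio-format', 'mp3'],
    'mkv': ['--merge-output-format', 'mkv'],
}


def preset_callback(option, opt_str, value, parser):
    if not value:
        return
    if value not in PRESETS:
        raise optparse.OptionValueError(f'Unknown preset alias: {value}')
    # Prepend the preset's arguments to the not-yet-parsed arguments
    parser.rargs[:0] = PRESETS[value]


def build_parser():
    parser = optparse.OptionParser()
    parser.add_option('-x', '--extract-audio', dest='extract_audio',
                      action='store_true', default=False)
    parser.add_option('--audio-format', dest='audio_format', default=None)
    parser.add_option('--merge-output-format', dest='merge_output_format', default=None)
    parser.add_option(
        '-t', '--preset-alias', metavar='PRESET', dest='_', type='str',
        action='callback', callback=preset_callback)
    return parser


opts, args = build_parser().parse_args(['-t', 'mp3', 'https://example.com/v'])
print(opts.extract_audio, opts.audio_format, args)
```

Since the injected arguments are inserted at the front of the remaining arguments, anything the user passes after `-t mp3` is parsed later and can still override the preset's values.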


@ -202,7 +202,7 @@ class UpdateInfo:
requested_version: str | None = None requested_version: str | None = None
commit: str | None = None commit: str | None = None
binary_name: str | None = _get_binary_name() # noqa: RUF009: Always returns the same value binary_name: str | None = _get_binary_name() # noqa: RUF009 # Always returns the same value
checksum: str | None = None checksum: str | None = None


@ -54,7 +54,7 @@ from ..compat import (
from ..dependencies import xattr from ..dependencies import xattr
from ..globals import IN_CLI from ..globals import IN_CLI
__name__ = __name__.rsplit('.', 1)[0] # noqa: A001: Pretend to be the parent module __name__ = __name__.rsplit('.', 1)[0] # noqa: A001 # Pretend to be the parent module
class NO_DEFAULT: class NO_DEFAULT:


@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2025.03.31' __version__ = '2025.04.30'
RELEASE_GIT_HEAD = '5e457af57fae9645b1b8fa0ed689229c8fb9656b' RELEASE_GIT_HEAD = '505b400795af557bdcfd9d4fa7e9133b26ef431c'
VARIANT = None VARIANT = None
@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2025.03.31' _pkg_version = '2025.04.30'