Merge branch 'yt-dlp:master' into pr/13205

This commit is contained in:
bashonly 2025-06-01 18:11:46 -05:00
commit e386616c75
No known key found for this signature in database
GPG Key ID: 783F096F253D15B0
57 changed files with 2123 additions and 995 deletions

View File

@ -256,7 +256,7 @@ jobs:
with: with:
path: | path: |
~/yt-dlp-build-venv ~/yt-dlp-build-venv
key: cache-reqs-${{ github.job }} key: cache-reqs-${{ github.job }}-${{ github.ref }}
- name: Install Requirements - name: Install Requirements
run: | run: |
@ -331,19 +331,16 @@ jobs:
if: steps.restore-cache.outputs.cache-hit == 'true' if: steps.restore-cache.outputs.cache-hit == 'true'
env: env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
cache_key: cache-reqs-${{ github.job }} cache_key: cache-reqs-${{ github.job }}-${{ github.ref }}
repository: ${{ github.repository }}
branch: ${{ github.ref }}
run: | run: |
gh extension install actions/gh-actions-cache gh cache delete "${cache_key}"
gh actions-cache delete "${cache_key}" -R "${repository}" -B "${branch}" --confirm
- name: Cache requirements - name: Cache requirements
uses: actions/cache/save@v4 uses: actions/cache/save@v4
with: with:
path: | path: |
~/yt-dlp-build-venv ~/yt-dlp-build-venv
key: cache-reqs-${{ github.job }} key: cache-reqs-${{ github.job }}-${{ github.ref }}
macos_legacy: macos_legacy:
needs: process needs: process

2
.gitignore vendored
View File

@ -105,6 +105,8 @@ README.txt
*.zsh *.zsh
*.spec *.spec
test/testdata/sigs/player-*.js test/testdata/sigs/player-*.js
test/testdata/thumbnails/empty.webp
test/testdata/thumbnails/foo\ %d\ bar/foo_%d.*
# Binary # Binary
/youtube-dl /youtube-dl

View File

@ -770,3 +770,8 @@ NeonMan
pj47x pj47x
troex troex
WouterGordts WouterGordts
baierjan
GeoffreyFrogeye
Pawka
v3DJG6GL
yozel

View File

@ -4,6 +4,52 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
--> -->
### 2025.05.22
#### Core changes
- **cookies**: [Fix Linux desktop environment detection](https://github.com/yt-dlp/yt-dlp/commit/e491fd4d090db3af52a82863fb0553dd5e17fb85) ([#13197](https://github.com/yt-dlp/yt-dlp/issues/13197)) by [mbway](https://github.com/mbway)
- **jsinterp**: [Fix increment/decrement evaluation](https://github.com/yt-dlp/yt-dlp/commit/167d7a9f0ffd1b4fe600193441bdb7358db2740b) ([#13238](https://github.com/yt-dlp/yt-dlp/issues/13238)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
#### Extractor changes
- **1tv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/41c0a1fb89628696f8bb88e2b9f3a68f355b8c26) ([#13168](https://github.com/yt-dlp/yt-dlp/issues/13168)) by [bashonly](https://github.com/bashonly)
- **amcnetworks**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/464c84fedf78eef822a431361155f108b5df96d7) ([#13147](https://github.com/yt-dlp/yt-dlp/issues/13147)) by [bashonly](https://github.com/bashonly)
- **bitchute**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/1d0f6539c47e5d5c68c3c47cdb7075339e2885ac) ([#13081](https://github.com/yt-dlp/yt-dlp/issues/13081)) by [bashonly](https://github.com/bashonly)
- **cartoonnetwork**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/7dbb47f84f0ee1266a3a01f58c9bc4c76d76794a) ([#13148](https://github.com/yt-dlp/yt-dlp/issues/13148)) by [bashonly](https://github.com/bashonly)
- **iprima**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/a7d9a5eb79ceeecb851389f3f2c88597871ca3f2) ([#12937](https://github.com/yt-dlp/yt-dlp/issues/12937)) by [baierjan](https://github.com/baierjan)
- **jiosaavn**
- artist: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/586b557b124f954d3f625360ebe970989022ad97) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- playlist, show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/317f4b8006c2c0f0f64f095b1485163ad97c9053) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/6839276496d8814cf16f58b637e45663467928e6) ([#12803](https://github.com/yt-dlp/yt-dlp/issues/12803)) by [subrat-lima](https://github.com/subrat-lima)
- **lrtradio**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/abf58dcd6a09e14eec4ea82ae12f79a0337cb383) ([#13200](https://github.com/yt-dlp/yt-dlp/issues/13200)) by [Pawka](https://github.com/Pawka)
- **nebula**: [Support `--mark-watched`](https://github.com/yt-dlp/yt-dlp/commit/20f288bdc2173c7cc58d709d25ca193c1f6001e7) ([#13120](https://github.com/yt-dlp/yt-dlp/issues/13120)) by [GeoffreyFrogeye](https://github.com/GeoffreyFrogeye)
- **niconico**
- [Fix error handling](https://github.com/yt-dlp/yt-dlp/commit/f569be4602c2a857087e495d5d7ed6060cd97abe) ([#13236](https://github.com/yt-dlp/yt-dlp/issues/13236)) by [bashonly](https://github.com/bashonly)
- live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7a7b85c9014d96421e18aa7ea5f4c1bee5ceece0) ([#13045](https://github.com/yt-dlp/yt-dlp/issues/13045)) by [doe1080](https://github.com/doe1080)
- **nytimesarticle**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/b26bc32579c00ef579d75a835807ccc87d20ee0a) ([#13104](https://github.com/yt-dlp/yt-dlp/issues/13104)) by [bashonly](https://github.com/bashonly)
- **once**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/f475e8b529d18efdad603ffda02a56e707fe0e2c) ([#13164](https://github.com/yt-dlp/yt-dlp/issues/13164)) by [bashonly](https://github.com/bashonly)
- **picarto**: vod: [Support `/profile/` video URLs](https://github.com/yt-dlp/yt-dlp/commit/31e090cb787f3504ec25485adff9a2a51d056734) ([#13227](https://github.com/yt-dlp/yt-dlp/issues/13227)) by [subrat-lima](https://github.com/subrat-lima)
- **playsuisse**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/d880e060803ae8ed5a047e578cca01e1f0e630ce) ([#12466](https://github.com/yt-dlp/yt-dlp/issues/12466)) by [v3DJG6GL](https://github.com/v3DJG6GL)
- **sprout**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/cbcfe6378dde33a650e3852ab17ad4503b8e008d) ([#13149](https://github.com/yt-dlp/yt-dlp/issues/13149)) by [bashonly](https://github.com/bashonly)
- **svtpage**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ea8498ed534642dd7e925961b97b934987142fd3) ([#12957](https://github.com/yt-dlp/yt-dlp/issues/12957)) by [diman8](https://github.com/diman8)
- **twitch**: [Support `--live-from-start`](https://github.com/yt-dlp/yt-dlp/commit/00b1bec55249cf2ad6271d36492c51b34b6459d1) ([#13202](https://github.com/yt-dlp/yt-dlp/issues/13202)) by [bashonly](https://github.com/bashonly)
- **vimeo**: event: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/545c1a5b6f2fe88722b41aef0e7485bf3be3f3f9) ([#13216](https://github.com/yt-dlp/yt-dlp/issues/13216)) by [bashonly](https://github.com/bashonly)
- **wat.tv**: [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/f123cc83b3aea45053f5fa1d9141048b01fc2774) ([#13111](https://github.com/yt-dlp/yt-dlp/issues/13111)) by [bashonly](https://github.com/bashonly)
- **weverse**: [Fix live extraction](https://github.com/yt-dlp/yt-dlp/commit/5328eda8820cc5f21dcf917684d23fbdca41831d) ([#13084](https://github.com/yt-dlp/yt-dlp/issues/13084)) by [bashonly](https://github.com/bashonly)
- **xinpianchang**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/83fabf352489d52843f67e6e9cc752db86d27e6e) ([#13245](https://github.com/yt-dlp/yt-dlp/issues/13245)) by [garret1317](https://github.com/garret1317)
- **youtube**
- [Add PO token support for subtitles](https://github.com/yt-dlp/yt-dlp/commit/32ed5f107c6c641958d1cd2752e130de4db55a13) ([#13234](https://github.com/yt-dlp/yt-dlp/issues/13234)) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz)
- [Add `web_embedded` client for age-restricted videos](https://github.com/yt-dlp/yt-dlp/commit/0feec6dc131f488428bf881519e7c69766fbb9ae) ([#13089](https://github.com/yt-dlp/yt-dlp/issues/13089)) by [bashonly](https://github.com/bashonly)
- [Add a PO Token Provider Framework](https://github.com/yt-dlp/yt-dlp/commit/2685654a37141cca63eda3a92da0e2706e23ccfd) ([#12840](https://github.com/yt-dlp/yt-dlp/issues/12840)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract `media_type` for all videos](https://github.com/yt-dlp/yt-dlp/commit/ded11ebc9afba6ba33923375103e9be2d7c804e7) ([#13136](https://github.com/yt-dlp/yt-dlp/issues/13136)) by [bashonly](https://github.com/bashonly)
- [Fix `--live-from-start` support for premieres](https://github.com/yt-dlp/yt-dlp/commit/8f303afb43395be360cafd7ad4ce2b6e2eedfb8a) ([#13079](https://github.com/yt-dlp/yt-dlp/issues/13079)) by [arabcoders](https://github.com/arabcoders)
- [Fix geo-restriction error handling](https://github.com/yt-dlp/yt-dlp/commit/c7e575e31608c19c5b26c10a4229db89db5fc9a8) ([#13217](https://github.com/yt-dlp/yt-dlp/issues/13217)) by [yozel](https://github.com/yozel)
#### Misc. changes
- **build**
- [Bump PyInstaller to v6.13.0](https://github.com/yt-dlp/yt-dlp/commit/17cf9088d0d535e4a7feffbf02bd49cd9dae5ab9) ([#13082](https://github.com/yt-dlp/yt-dlp/issues/13082)) by [bashonly](https://github.com/bashonly)
- [Bump run-on-arch-action to v3](https://github.com/yt-dlp/yt-dlp/commit/9064d2482d1fe722bbb4a49731fe0711c410d1c8) ([#13088](https://github.com/yt-dlp/yt-dlp/issues/13088)) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [7977b32](https://github.com/yt-dlp/yt-dlp/commit/7977b329ed97b216e37bd402f4935f28c00eac9e) by [bashonly](https://github.com/bashonly)
### 2025.04.30 ### 2025.04.30
#### Important changes #### Important changes

View File

@ -18,10 +18,11 @@ pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites \
tar pypi-files lazy-extractors install uninstall tar pypi-files lazy-extractors install uninstall
clean-test: clean-test:
rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \ rm -rf tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \ *.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
*.3gp *.ape *.ass *.avi *.desktop *.f4v *.flac *.flv *.gif *.jpeg *.jpg *.lrc *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 *.mp4 \ *.3gp *.ape *.ass *.avi *.desktop *.f4v *.flac *.flv *.gif *.jpeg *.jpg *.lrc *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 *.mp4 \
*.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp *.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp \
test/testdata/sigs/player-*.js test/testdata/thumbnails/empty.webp "test/testdata/thumbnails/foo %d bar/foo_%d."*
clean-dist: clean-dist:
rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \ rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \
yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS

View File

@ -44,6 +44,7 @@ yt-dlp is a feature-rich command-line audio/video downloader with support for [t
* [Post-processing Options](#post-processing-options) * [Post-processing Options](#post-processing-options)
* [SponsorBlock Options](#sponsorblock-options) * [SponsorBlock Options](#sponsorblock-options)
* [Extractor Options](#extractor-options) * [Extractor Options](#extractor-options)
* [Preset Aliases](#preset-aliases)
* [CONFIGURATION](#configuration) * [CONFIGURATION](#configuration)
* [Configuration file encoding](#configuration-file-encoding) * [Configuration file encoding](#configuration-file-encoding)
* [Authentication with netrc](#authentication-with-netrc) * [Authentication with netrc](#authentication-with-netrc)
@ -348,8 +349,8 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
--no-flat-playlist Fully extract the videos of a playlist --no-flat-playlist Fully extract the videos of a playlist
(default) (default)
--live-from-start Download livestreams from the start. --live-from-start Download livestreams from the start.
Currently only supported for YouTube Currently experimental and only supported
(Experimental) for YouTube and Twitch
--no-live-from-start Download livestreams from the current time --no-live-from-start Download livestreams from the current time
(default) (default)
--wait-for-video MIN[-MAX] Wait for scheduled streams to become --wait-for-video MIN[-MAX] Wait for scheduled streams to become
@ -375,12 +376,12 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
an alias starts with a dash "-", it is an alias starts with a dash "-", it is
prefixed with "--". Arguments are parsed prefixed with "--". Arguments are parsed
according to the Python string formatting according to the Python string formatting
mini-language. E.g. --alias get-audio,-X mini-language. E.g. --alias get-audio,-X "-S
"-S=aext:{0},abr -x --audio-format {0}" aext:{0},abr -x --audio-format {0}" creates
creates options "--get-audio" and "-X" that options "--get-audio" and "-X" that takes an
takes an argument (ARG0) and expands to argument (ARG0) and expands to "-S
"-S=aext:ARG0,abr -x --audio-format ARG0". aext:ARG0,abr -x --audio-format ARG0". All
All defined aliases are listed in the --help defined aliases are listed in the --help
output. Alias options can trigger more output. Alias options can trigger more
aliases; so be careful to avoid defining aliases; so be careful to avoid defining
recursive options. As a safety measure, each recursive options. As a safety measure, each
@ -1105,6 +1106,10 @@ Make chapter entries for, or remove various segments (sponsor,
arguments for different extractors arguments for different extractors
## Preset Aliases: ## Preset Aliases:
Predefined aliases for convenience and ease of use. Note that future
versions of yt-dlp may add or adjust presets, but the existing preset
names will not be changed or removed
-t mp3 -f 'ba[acodec^=mp3]/ba/b' -x --audio-format -t mp3 -f 'ba[acodec^=mp3]/ba/b' -x --audio-format
mp3 mp3
@ -1805,7 +1810,7 @@ The following extractors use this feature:
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage` * `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID) * `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request) * `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be any of `gvs` (Google Video Server URLs), `player` (Innertube player request) or `subs` (Subtitles)
* `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default) * `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default)
* `fetch_pot`: Policy to use for fetching a PO Token from providers. One of `always` (always try fetch a PO Token regardless if the client requires one for the given context), `never` (never fetch a PO Token), or `auto` (default; only fetch a PO Token if the client requires one for the given context) * `fetch_pot`: Policy to use for fetching a PO Token from providers. One of `always` (always try fetch a PO Token regardless if the client requires one for the given context), `never` (never fetch a PO Token), or `auto` (default; only fetch a PO Token if the client requires one for the given context)

View File

@ -2,6 +2,7 @@
set -e set -e
source ~/.local/share/pipx/venvs/pyinstaller/bin/activate source ~/.local/share/pipx/venvs/pyinstaller/bin/activate
python -m devscripts.install_deps -o --include build
python -m devscripts.install_deps --include secretstorage --include curl-cffi python -m devscripts.install_deps --include secretstorage --include curl-cffi
python -m devscripts.make_lazy_extractors python -m devscripts.make_lazy_extractors
python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}" python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}"

View File

@ -36,6 +36,9 @@ def main():
f'--name={name}', f'--name={name}',
'--icon=devscripts/logo.ico', '--icon=devscripts/logo.ico',
'--upx-exclude=vcruntime140.dll', '--upx-exclude=vcruntime140.dll',
# Ref: https://github.com/yt-dlp/yt-dlp/issues/13311
# https://github.com/pyinstaller/pyinstaller/issues/9149
'--exclude-module=pkg_resources',
'--noconfirm', '--noconfirm',
'--additional-hooks-dir=yt_dlp/__pyinstaller', '--additional-hooks-dir=yt_dlp/__pyinstaller',
*opts, *opts,

View File

@ -65,7 +65,7 @@ build = [
"build", "build",
"hatchling", "hatchling",
"pip", "pip",
"setuptools>=71.0.2", # 71.0.0 broke pyinstaller "setuptools>=71.0.2,<81", # See https://github.com/pyinstaller/pyinstaller/issues/9149
"wheel", "wheel",
] ]
dev = [ dev = [

View File

@ -246,7 +246,6 @@ The only reliable way to check if a site is supported is to try it.
- **Canalplus**: mycanal.fr and piwiplus.fr - **Canalplus**: mycanal.fr and piwiplus.fr
- **Canalsurmas** - **Canalsurmas**
- **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine") - **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
- **CartoonNetwork**
- **cbc.ca** - **cbc.ca**
- **cbc.ca:player** - **cbc.ca:player**
- **cbc.ca:player:playlist** - **cbc.ca:player:playlist**
@ -649,7 +648,10 @@ The only reliable way to check if a site is supported is to try it.
- **jiocinema**: [*jiocinema*](## "netrc machine") - **jiocinema**: [*jiocinema*](## "netrc machine")
- **jiocinema:series**: [*jiocinema*](## "netrc machine") - **jiocinema:series**: [*jiocinema*](## "netrc machine")
- **jiosaavn:album** - **jiosaavn:album**
- **jiosaavn:artist**
- **jiosaavn:playlist** - **jiosaavn:playlist**
- **jiosaavn:show**
- **jiosaavn:show:playlist**
- **jiosaavn:song** - **jiosaavn:song**
- **Joj** - **Joj**
- **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR) - **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)
@ -1081,8 +1083,8 @@ The only reliable way to check if a site is supported is to try it.
- **Photobucket** - **Photobucket**
- **PiaLive** - **PiaLive**
- **Piapro**: [*piapro*](## "netrc machine") - **Piapro**: [*piapro*](## "netrc machine")
- **Picarto** - **picarto**
- **PicartoVod** - **picarto:vod**
- **Piksel** - **Piksel**
- **Pinkbike** - **Pinkbike**
- **Pinterest** - **Pinterest**
@ -1390,7 +1392,6 @@ The only reliable way to check if a site is supported is to try it.
- **Spreaker** - **Spreaker**
- **SpreakerShow** - **SpreakerShow**
- **SpringboardPlatform** - **SpringboardPlatform**
- **Sprout**
- **SproutVideo** - **SproutVideo**
- **sr:mediathek**: Saarländischer Rundfunk (**Currently broken**) - **sr:mediathek**: Saarländischer Rundfunk (**Currently broken**)
- **SRGSSR** - **SRGSSR**
@ -1656,6 +1657,7 @@ The only reliable way to check if a site is supported is to try it.
- **vimeo**: [*vimeo*](## "netrc machine") - **vimeo**: [*vimeo*](## "netrc machine")
- **vimeo:album**: [*vimeo*](## "netrc machine") - **vimeo:album**: [*vimeo*](## "netrc machine")
- **vimeo:channel**: [*vimeo*](## "netrc machine") - **vimeo:channel**: [*vimeo*](## "netrc machine")
- **vimeo:event**: [*vimeo*](## "netrc machine")
- **vimeo:group**: [*vimeo*](## "netrc machine") - **vimeo:group**: [*vimeo*](## "netrc machine")
- **vimeo:likes**: [*vimeo*](## "netrc machine") Vimeo user likes - **vimeo:likes**: [*vimeo*](## "netrc machine") Vimeo user likes
- **vimeo:ondemand**: [*vimeo*](## "netrc machine") - **vimeo:ondemand**: [*vimeo*](## "netrc machine")

View File

@ -314,6 +314,20 @@ class TestInfoExtractor(unittest.TestCase):
}, },
{}, {},
), ),
(
# test thumbnail_url key without URL scheme
r'''
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "VideoObject",
"thumbnail_url": "//www.nobelprize.org/images/12693-landscape-medium-gallery.jpg"
}</script>''',
{
'thumbnails': [{'url': 'https://www.nobelprize.org/images/12693-landscape-medium-gallery.jpg'}],
},
{},
),
] ]
for html, expected_dict, search_json_ld_kwargs in _TESTS: for html, expected_dict, search_json_ld_kwargs in _TESTS:
expect_dict( expect_dict(

View File

@ -58,6 +58,14 @@ class TestCookies(unittest.TestCase):
({'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3), ({'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
({'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE), ({'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'gnome'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'mate'}, _LinuxDesktopEnvironment.GNOME),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'kde'}, _LinuxDesktopEnvironment.KDE3),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'xfce'}, _LinuxDesktopEnvironment.XFCE),
({'XDG_CURRENT_DESKTOP': 'my_custom_de', 'DESKTOP_SESSION': 'my_custom_de', 'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
({'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME), ({'GNOME_DESKTOP_SESSION_ID': 1}, _LinuxDesktopEnvironment.GNOME),
({'KDE_FULL_SESSION': 1}, _LinuxDesktopEnvironment.KDE3), ({'KDE_FULL_SESSION': 1}, _LinuxDesktopEnvironment.KDE3),
({'KDE_FULL_SESSION': 1, 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4), ({'KDE_FULL_SESSION': 1, 'DESKTOP_SESSION': 'kde4'}, _LinuxDesktopEnvironment.KDE4),

View File

@ -478,6 +478,14 @@ class TestJSInterpreter(unittest.TestCase):
func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000}) func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
self.assertEqual(func([1]), 1111) self.assertEqual(func([1]), 1111)
def test_increment_decrement(self):
self._test('function f() { var x = 1; return ++x; }', 2)
self._test('function f() { var x = 1; return x++; }', 1)
self._test('function f() { var x = 1; x--; return x }', 0)
self._test('function f() { var y; var x = 1; x++, --x, x--, x--, y="z", "abc", x++; return --x }', -1)
self._test('function f() { var a = "test--"; return a; }', 'test--')
self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -8,6 +8,8 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import subprocess
from yt_dlp import YoutubeDL from yt_dlp import YoutubeDL
from yt_dlp.utils import shell_quote from yt_dlp.utils import shell_quote
from yt_dlp.postprocessor import ( from yt_dlp.postprocessor import (
@ -47,7 +49,18 @@ class TestConvertThumbnail(unittest.TestCase):
print('Skipping: ffmpeg not found') print('Skipping: ffmpeg not found')
return return
file = 'test/testdata/thumbnails/foo %d bar/foo_%d.{}' test_data_dir = 'test/testdata/thumbnails'
generated_file = f'{test_data_dir}/empty.webp'
subprocess.check_call([
pp.executable, '-y', '-f', 'lavfi', '-i', 'color=c=black:s=320x320',
'-c:v', 'libwebp', '-pix_fmt', 'yuv420p', '-vframes', '1', generated_file,
], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
file = test_data_dir + '/foo %d bar/foo_%d.{}'
initial_file = file.format('webp')
os.replace(generated_file, initial_file)
tests = (('webp', 'png'), ('png', 'jpg')) tests = (('webp', 'png'), ('png', 'jpg'))
for inp, out in tests: for inp, out in tests:
@ -55,11 +68,13 @@ class TestConvertThumbnail(unittest.TestCase):
if os.path.exists(out_file): if os.path.exists(out_file):
os.remove(out_file) os.remove(out_file)
pp.convert_thumbnail(file.format(inp), out) pp.convert_thumbnail(file.format(inp), out)
assert os.path.exists(out_file) self.assertTrue(os.path.exists(out_file))
for _, out in tests: for _, out in tests:
os.remove(file.format(out)) os.remove(file.format(out))
os.remove(initial_file)
class TestExec(unittest.TestCase): class TestExec(unittest.TestCase):
def test_parse_cmd(self): def test_parse_cmd(self):
@ -610,3 +625,7 @@ outpoint 10.000000
self.assertEqual( self.assertEqual(
r"'special '\'' characters '\'' galore'\'\'\'", r"'special '\'' characters '\'' galore'\'\'\'",
self._pp._quote_for_ffmpeg("special ' characters ' galore'''")) self._pp._quote_for_ffmpeg("special ' characters ' galore'''"))
if __name__ == '__main__':
unittest.main()

View File

@ -15,6 +15,7 @@ class TestGetWebPoContentBinding:
for context, is_authenticated, expected in [ for context, is_authenticated, expected in [
(PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)), (PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),
(PoTokenContext.PLAYER, False, ('example-video-id', ContentBindingType.VIDEO_ID)), (PoTokenContext.PLAYER, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
(PoTokenContext.SUBS, False, ('example-video-id', ContentBindingType.VIDEO_ID)),
(PoTokenContext.GVS, True, ('example-data-sync-id', ContentBindingType.DATASYNC_ID)), (PoTokenContext.GVS, True, ('example-data-sync-id', ContentBindingType.DATASYNC_ID)),
]], ]],
('WEB_REMIX', PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)), ('WEB_REMIX', PoTokenContext.GVS, False, ('example-visitor-data', ContentBindingType.VISITOR_DATA)),

View File

@ -316,6 +316,10 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js', 'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE', 'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
), ),
(
'https://www.youtube.com/s/player/59b252b9/player_ias.vflset/en_US/base.js',
'D3XWVpYgwhLLKNK4AGX', 'aZrQ1qWJ5yv5h',
),
] ]

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.8 KiB

View File

View File

@ -764,11 +764,11 @@ def _get_linux_desktop_environment(env, logger):
GetDesktopEnvironment GetDesktopEnvironment
""" """
xdg_current_desktop = env.get('XDG_CURRENT_DESKTOP', None) xdg_current_desktop = env.get('XDG_CURRENT_DESKTOP', None)
desktop_session = env.get('DESKTOP_SESSION', None) desktop_session = env.get('DESKTOP_SESSION', '')
if xdg_current_desktop is not None: if xdg_current_desktop is not None:
for part in map(str.strip, xdg_current_desktop.split(':')): for part in map(str.strip, xdg_current_desktop.split(':')):
if part == 'Unity': if part == 'Unity':
if desktop_session is not None and 'gnome-fallback' in desktop_session: if 'gnome-fallback' in desktop_session:
return _LinuxDesktopEnvironment.GNOME return _LinuxDesktopEnvironment.GNOME
else: else:
return _LinuxDesktopEnvironment.UNITY return _LinuxDesktopEnvironment.UNITY
@ -797,35 +797,34 @@ def _get_linux_desktop_environment(env, logger):
return _LinuxDesktopEnvironment.UKUI return _LinuxDesktopEnvironment.UKUI
elif part == 'LXQt': elif part == 'LXQt':
return _LinuxDesktopEnvironment.LXQT return _LinuxDesktopEnvironment.LXQT
logger.info(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"') logger.debug(f'XDG_CURRENT_DESKTOP is set to an unknown value: "{xdg_current_desktop}"')
elif desktop_session is not None: if desktop_session == 'deepin':
if desktop_session == 'deepin': return _LinuxDesktopEnvironment.DEEPIN
return _LinuxDesktopEnvironment.DEEPIN elif desktop_session in ('mate', 'gnome'):
elif desktop_session in ('mate', 'gnome'): return _LinuxDesktopEnvironment.GNOME
return _LinuxDesktopEnvironment.GNOME elif desktop_session in ('kde4', 'kde-plasma'):
elif desktop_session in ('kde4', 'kde-plasma'): return _LinuxDesktopEnvironment.KDE4
elif desktop_session == 'kde':
if 'KDE_SESSION_VERSION' in env:
return _LinuxDesktopEnvironment.KDE4 return _LinuxDesktopEnvironment.KDE4
elif desktop_session == 'kde':
if 'KDE_SESSION_VERSION' in env:
return _LinuxDesktopEnvironment.KDE4
else:
return _LinuxDesktopEnvironment.KDE3
elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
return _LinuxDesktopEnvironment.XFCE
elif desktop_session == 'ukui':
return _LinuxDesktopEnvironment.UKUI
else: else:
logger.info(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"') return _LinuxDesktopEnvironment.KDE3
elif 'xfce' in desktop_session or desktop_session == 'xubuntu':
return _LinuxDesktopEnvironment.XFCE
elif desktop_session == 'ukui':
return _LinuxDesktopEnvironment.UKUI
else: else:
if 'GNOME_DESKTOP_SESSION_ID' in env: logger.debug(f'DESKTOP_SESSION is set to an unknown value: "{desktop_session}"')
return _LinuxDesktopEnvironment.GNOME
elif 'KDE_FULL_SESSION' in env: if 'GNOME_DESKTOP_SESSION_ID' in env:
if 'KDE_SESSION_VERSION' in env: return _LinuxDesktopEnvironment.GNOME
return _LinuxDesktopEnvironment.KDE4 elif 'KDE_FULL_SESSION' in env:
else: if 'KDE_SESSION_VERSION' in env:
return _LinuxDesktopEnvironment.KDE3 return _LinuxDesktopEnvironment.KDE4
else:
return _LinuxDesktopEnvironment.KDE3
return _LinuxDesktopEnvironment.OTHER return _LinuxDesktopEnvironment.OTHER

View File

@ -300,7 +300,6 @@ from .brainpop import (
BrainPOPIlIE, BrainPOPIlIE,
BrainPOPJrIE, BrainPOPJrIE,
) )
from .bravotv import BravoTVIE
from .breitbart import BreitBartIE from .breitbart import BreitBartIE
from .brightcove import ( from .brightcove import (
BrightcoveLegacyIE, BrightcoveLegacyIE,
@ -1262,6 +1261,7 @@ from .nba import (
) )
from .nbc import ( from .nbc import (
NBCIE, NBCIE,
BravoTVIE,
NBCNewsIE, NBCNewsIE,
NBCOlympicsIE, NBCOlympicsIE,
NBCOlympicsStreamIE, NBCOlympicsStreamIE,
@ -1269,6 +1269,7 @@ from .nbc import (
NBCSportsStreamIE, NBCSportsStreamIE,
NBCSportsVPlayerIE, NBCSportsVPlayerIE,
NBCStationsIE, NBCStationsIE,
SyfyIE,
) )
from .ndr import ( from .ndr import (
NDRIE, NDRIE,
@ -2022,7 +2023,6 @@ from .svt import (
SVTSeriesIE, SVTSeriesIE,
) )
from .swearnet import SwearnetEpisodeIE from .swearnet import SwearnetEpisodeIE
from .syfy import SyfyIE
from .syvdk import SYVDKIE from .syvdk import SYVDKIE
from .sztvhu import SztvHuIE from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE from .tagesschau import TagesschauIE
@ -2147,6 +2147,7 @@ from .toggle import (
from .toggo import ToggoIE from .toggo import ToggoIE
from .tonline import TOnlineIE from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE from .toongoggles import ToonGogglesIE
from .toutiao import ToutiaoIE
from .toutv import TouTvIE from .toutv import TouTvIE
from .toypics import ( from .toypics import (
ToypicsIE, ToypicsIE,
@ -2369,6 +2370,7 @@ from .vimeo import (
VHXEmbedIE, VHXEmbedIE,
VimeoAlbumIE, VimeoAlbumIE,
VimeoChannelIE, VimeoChannelIE,
VimeoEventIE,
VimeoGroupsIE, VimeoGroupsIE,
VimeoIE, VimeoIE,
VimeoLikesIE, VimeoLikesIE,

View File

@ -3,6 +3,7 @@ import json
import re import re
import time import time
import urllib.parse import urllib.parse
import uuid
import xml.etree.ElementTree as etree import xml.etree.ElementTree as etree
from .common import InfoExtractor from .common import InfoExtractor
@ -10,6 +11,7 @@ from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
NO_DEFAULT, NO_DEFAULT,
ExtractorError, ExtractorError,
parse_qs,
unescapeHTML, unescapeHTML,
unified_timestamp, unified_timestamp,
urlencode_postdata, urlencode_postdata,
@ -45,6 +47,8 @@ MSO_INFO = {
'name': 'Comcast XFINITY', 'name': 'Comcast XFINITY',
'username_field': 'user', 'username_field': 'user',
'password_field': 'passwd', 'password_field': 'passwd',
'login_hostname': 'login.xfinity.com',
'needs_newer_ua': True,
}, },
'TWC': { 'TWC': {
'name': 'Time Warner Cable | Spectrum', 'name': 'Time Warner Cable | Spectrum',
@ -74,6 +78,12 @@ MSO_INFO = {
'name': 'Verizon FiOS', 'name': 'Verizon FiOS',
'username_field': 'IDToken1', 'username_field': 'IDToken1',
'password_field': 'IDToken2', 'password_field': 'IDToken2',
'login_hostname': 'ssoauth.verizon.com',
},
'Fubo': {
'name': 'Fubo',
'username_field': 'username',
'password_field': 'password',
}, },
'Cablevision': { 'Cablevision': {
'name': 'Optimum/Cablevision', 'name': 'Optimum/Cablevision',
@ -1338,6 +1348,7 @@ MSO_INFO = {
'name': 'Sling TV', 'name': 'Sling TV',
'username_field': 'username', 'username_field': 'username',
'password_field': 'password', 'password_field': 'password',
'login_hostname': 'identity.sling.com',
}, },
'Suddenlink': { 'Suddenlink': {
'name': 'Suddenlink', 'name': 'Suddenlink',
@ -1355,7 +1366,6 @@ MSO_INFO = {
class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should end with BaseIE/InfoExtractor class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should end with BaseIE/InfoExtractor
_SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s' _SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
_USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0' _USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
_MODERN_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; rv:131.0) Gecko/20100101 Firefox/131.0'
_MVPD_CACHE = 'ap-mvpd' _MVPD_CACHE = 'ap-mvpd'
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page' _DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
@ -1367,6 +1377,14 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
return super()._download_webpage_handle( return super()._download_webpage_handle(
*args, **kwargs) *args, **kwargs)
@staticmethod
def _get_mso_headers(mso_info):
# yt-dlp's default user-agent is usually too old for some MSO's like Comcast_SSO
# See: https://github.com/yt-dlp/yt-dlp/issues/10848
return {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:131.0) Gecko/20100101 Firefox/131.0',
} if mso_info.get('needs_newer_ua') else {}
@staticmethod @staticmethod
def _get_mvpd_resource(provider_id, title, guid, rating): def _get_mvpd_resource(provider_id, title, guid, rating):
channel = etree.Element('channel') channel = etree.Element('channel')
@ -1382,7 +1400,13 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
resource_rating.text = rating resource_rating.text = rating
return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>' return '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">' + etree.tostring(channel).decode() + '</rss>'
def _extract_mvpd_auth(self, url, video_id, requestor_id, resource): def _extract_mvpd_auth(self, url, video_id, requestor_id, resource, software_statement):
mso_id = self.get_param('ap_mso')
if mso_id:
mso_info = MSO_INFO[mso_id]
else:
mso_info = {}
def xml_text(xml_str, tag): def xml_text(xml_str, tag):
return self._search_regex( return self._search_regex(
f'<{tag}>(.+?)</{tag}>', xml_str, tag) f'<{tag}>(.+?)</{tag}>', xml_str, tag)
@ -1391,15 +1415,27 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele))) token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele)))
return token_expires and token_expires <= int(time.time()) return token_expires and token_expires <= int(time.time())
def post_form(form_page_res, note, data={}): def post_form(form_page_res, note, data={}, validate_url=False):
form_page, urlh = form_page_res form_page, urlh = form_page_res
post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url') post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
if not re.match(r'https?://', post_url): if not re.match(r'https?://', post_url):
post_url = urllib.parse.urljoin(urlh.url, post_url) post_url = urllib.parse.urljoin(urlh.url, post_url)
if validate_url:
# This request is submitting credentials so we should validate it when possible
url_parsed = urllib.parse.urlparse(post_url)
expected_hostname = mso_info.get('login_hostname')
if expected_hostname and expected_hostname != url_parsed.hostname:
raise ExtractorError(
f'Unexpected login URL hostname; expected "{expected_hostname}" but got '
f'"{url_parsed.hostname}". Aborting before submitting credentials')
if url_parsed.scheme != 'https':
self.write_debug('Upgrading login URL scheme to https')
post_url = urllib.parse.urlunparse(url_parsed._replace(scheme='https'))
form_data = self._hidden_inputs(form_page) form_data = self._hidden_inputs(form_page)
form_data.update(data) form_data.update(data)
return self._download_webpage_handle( return self._download_webpage_handle(
post_url, video_id, note, data=urlencode_postdata(form_data), headers={ post_url, video_id, note, data=urlencode_postdata(form_data), headers={
**self._get_mso_headers(mso_info),
'Content-Type': 'application/x-www-form-urlencoded', 'Content-Type': 'application/x-www-form-urlencoded',
}) })
@ -1432,40 +1468,72 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
} }
guid = xml_text(resource, 'guid') if '<' in resource else resource guid = xml_text(resource, 'guid') if '<' in resource else resource
count = 0 for _ in range(2):
while count < 2:
requestor_info = self.cache.load(self._MVPD_CACHE, requestor_id) or {} requestor_info = self.cache.load(self._MVPD_CACHE, requestor_id) or {}
authn_token = requestor_info.get('authn_token') authn_token = requestor_info.get('authn_token')
if authn_token and is_expired(authn_token, 'simpleTokenExpires'): if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
authn_token = None authn_token = None
if not authn_token: if not authn_token:
mso_id = self.get_param('ap_mso') if not mso_id:
if mso_id: raise_mvpd_required()
username, password = self._get_login_info('ap_username', 'ap_password', mso_id) username, password = self._get_login_info('ap_username', 'ap_password', mso_id)
if not username or not password: if not username or not password:
raise_mvpd_required()
mso_info = MSO_INFO[mso_id]
provider_redirect_page_res = self._download_webpage_handle(
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
'Downloading Provider Redirect Page', query={
'noflash': 'true',
'mso_id': mso_id,
'requestor_id': requestor_id,
'no_iframe': 'false',
'domain_name': 'adobe.com',
'redirect_url': url,
}, headers={
# yt-dlp's default user-agent is usually too old for Comcast_SSO
# See: https://github.com/yt-dlp/yt-dlp/issues/10848
'User-Agent': self._MODERN_USER_AGENT,
} if mso_id == 'Comcast_SSO' else None)
elif not self._cookies_passed:
raise_mvpd_required() raise_mvpd_required()
if not mso_id: device_info, urlh = self._download_json_handle(
pass 'https://sp.auth.adobe.com/indiv/devices',
elif mso_id == 'Comcast_SSO': video_id, 'Registering device with Adobe',
data=json.dumps({'fingerprint': uuid.uuid4().hex}).encode(),
headers={'Content-Type': 'application/json; charset=UTF-8'})
device_id = device_info['deviceId']
mvpd_headers['pass_sfp'] = urlh.get_header('pass_sfp')
mvpd_headers['Ap_21'] = device_id
registration = self._download_json(
'https://sp.auth.adobe.com/o/client/register',
video_id, 'Registering client with Adobe',
data=json.dumps({'software_statement': software_statement}).encode(),
headers={'Content-Type': 'application/json; charset=UTF-8'})
access_token = self._download_json(
'https://sp.auth.adobe.com/o/client/token', video_id,
'Obtaining access token', data=urlencode_postdata({
'grant_type': 'client_credentials',
'client_id': registration['client_id'],
'client_secret': registration['client_secret'],
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})['access_token']
mvpd_headers['Authorization'] = f'Bearer {access_token}'
reg_code = self._download_json(
f'https://sp.auth.adobe.com/reggie/v1/{requestor_id}/regcode',
video_id, 'Obtaining registration code',
data=urlencode_postdata({
'requestor': requestor_id,
'deviceId': device_id,
'format': 'json',
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Authorization': f'Bearer {access_token}',
})['code']
provider_redirect_page_res = self._download_webpage_handle(
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
'Downloading Provider Redirect Page', query={
'noflash': 'true',
'mso_id': mso_id,
'requestor_id': requestor_id,
'no_iframe': 'false',
'domain_name': 'adobe.com',
'redirect_url': url,
'reg_code': reg_code,
}, headers=self._get_mso_headers(mso_info))
if mso_id == 'Comcast_SSO':
# Comcast page flow varies by video site and whether you # Comcast page flow varies by video site and whether you
# are on Comcast's network. # are on Comcast's network.
provider_redirect_page, urlh = provider_redirect_page_res provider_redirect_page, urlh = provider_redirect_page_res
@ -1489,8 +1557,8 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
oauth_redirect_url = extract_redirect_url( oauth_redirect_url = extract_redirect_url(
provider_redirect_page, fatal=True) provider_redirect_page, fatal=True)
provider_login_page_res = self._download_webpage_handle( provider_login_page_res = self._download_webpage_handle(
oauth_redirect_url, video_id, oauth_redirect_url, video_id, self._DOWNLOADING_LOGIN_PAGE,
self._DOWNLOADING_LOGIN_PAGE) headers=self._get_mso_headers(mso_info))
else: else:
provider_login_page_res = post_form( provider_login_page_res = post_form(
provider_redirect_page_res, provider_redirect_page_res,
@ -1500,24 +1568,35 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
provider_login_page_res, 'Logging in', { provider_login_page_res, 'Logging in', {
mso_info['username_field']: username, mso_info['username_field']: username,
mso_info['password_field']: password, mso_info['password_field']: password,
}) }, validate_url=True)
mvpd_confirm_page, urlh = mvpd_confirm_page_res mvpd_confirm_page, urlh = mvpd_confirm_page_res
if '<button class="submit" value="Resume">Resume</button>' in mvpd_confirm_page: if '<button class="submit" value="Resume">Resume</button>' in mvpd_confirm_page:
post_form(mvpd_confirm_page_res, 'Confirming Login') post_form(mvpd_confirm_page_res, 'Confirming Login')
elif mso_id == 'Philo': elif mso_id == 'Philo':
# Philo has very unique authentication method # Philo has very unique authentication method
self._download_webpage( self._request_webpage(
'https://idp.philo.com/auth/init/login_code', video_id, 'Requesting auth code', data=urlencode_postdata({ 'https://idp.philo.com/auth/init/login_code', video_id,
'Requesting Philo auth code', data=json.dumps({
'ident': username, 'ident': username,
'device': 'web', 'device': 'web',
'send_confirm_link': False, 'send_confirm_link': False,
'send_token': True, 'send_token': True,
})) 'device_ident': f'web-{uuid.uuid4().hex}',
'include_login_link': True,
}).encode(), headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
})
philo_code = getpass.getpass('Type auth code you have received [Return]: ') philo_code = getpass.getpass('Type auth code you have received [Return]: ')
self._download_webpage( self._request_webpage(
'https://idp.philo.com/auth/update/login_code', video_id, 'Submitting token', data=urlencode_postdata({ 'https://idp.philo.com/auth/update/login_code', video_id,
'token': philo_code, 'Submitting token', data=json.dumps({'token': philo_code}).encode(),
})) headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
})
mvpd_confirm_page_res = self._download_webpage_handle('https://idp.philo.com/idp/submit', video_id, 'Confirming Philo Login') mvpd_confirm_page_res = self._download_webpage_handle('https://idp.philo.com/idp/submit', video_id, 'Confirming Philo Login')
post_form(mvpd_confirm_page_res, 'Confirming Login') post_form(mvpd_confirm_page_res, 'Confirming Login')
elif mso_id == 'Verizon': elif mso_id == 'Verizon':
@ -1539,7 +1618,7 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
provider_redirect_page_res, 'Logging in', { provider_redirect_page_res, 'Logging in', {
mso_info['username_field']: username, mso_info['username_field']: username,
mso_info['password_field']: password, mso_info['password_field']: password,
}) }, validate_url=True)
saml_login_page, urlh = saml_login_page_res saml_login_page, urlh = saml_login_page_res
if 'Please try again.' in saml_login_page: if 'Please try again.' in saml_login_page:
raise ExtractorError( raise ExtractorError(
@ -1560,7 +1639,7 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
[saml_login_page, saml_redirect_url], 'Logging in', { [saml_login_page, saml_redirect_url], 'Logging in', {
mso_info['username_field']: username, mso_info['username_field']: username,
mso_info['password_field']: password, mso_info['password_field']: password,
}) }, validate_url=True)
if 'Please try again.' in saml_login_page: if 'Please try again.' in saml_login_page:
raise ExtractorError( raise ExtractorError(
'Failed to login, incorrect User ID or Password.') 'Failed to login, incorrect User ID or Password.')
@ -1631,7 +1710,7 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
provider_login_page_res, 'Logging in', { provider_login_page_res, 'Logging in', {
mso_info['username_field']: username, mso_info['username_field']: username,
mso_info['password_field']: password, mso_info['password_field']: password,
}) }, validate_url=True)
provider_refresh_redirect_url = extract_redirect_url( provider_refresh_redirect_url = extract_redirect_url(
provider_association_redirect, url=urlh.url) provider_association_redirect, url=urlh.url)
@ -1682,7 +1761,7 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
provider_login_page_res, 'Logging in', { provider_login_page_res, 'Logging in', {
mso_info['username_field']: username, mso_info['username_field']: username,
mso_info['password_field']: password, mso_info['password_field']: password,
}) }, validate_url=True)
provider_refresh_redirect_url = extract_redirect_url( provider_refresh_redirect_url = extract_redirect_url(
provider_association_redirect, url=urlh.url) provider_association_redirect, url=urlh.url)
@ -1699,6 +1778,27 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
query=hidden_data) query=hidden_data)
post_form(mvpd_confirm_page_res, 'Confirming Login') post_form(mvpd_confirm_page_res, 'Confirming Login')
elif mso_id == 'Fubo':
_, urlh = provider_redirect_page_res
fubo_response = self._download_json(
'https://api.fubo.tv/partners/tve/connect', video_id,
'Authenticating with Fubo', 'Unable to authenticate with Fubo',
query=parse_qs(urlh.url), data=json.dumps({
'username': username,
'password': password,
}).encode(), headers={
'Accept': 'application/json',
'Content-Type': 'application/json',
})
self._request_webpage(
'https://sp.auth.adobe.com/adobe-services/oauth2', video_id,
'Authenticating with Adobe', 'Failed to authenticate with Adobe',
query={
'code': fubo_response['code'],
'state': fubo_response['state'],
})
else: else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh # Some providers (e.g. DIRECTV NOW) have another meta refresh
# based redirect that should be followed. # based redirect that should be followed.
@ -1717,7 +1817,8 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
} }
if mso_id in ('Cablevision', 'AlticeOne'): if mso_id in ('Cablevision', 'AlticeOne'):
form_data['_eventId_proceed'] = '' form_data['_eventId_proceed'] = ''
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', form_data) mvpd_confirm_page_res = post_form(
provider_login_page_res, 'Logging in', form_data, validate_url=True)
if mso_id != 'Rogers': if mso_id != 'Rogers':
post_form(mvpd_confirm_page_res, 'Confirming Login') post_form(mvpd_confirm_page_res, 'Confirming Login')
@ -1727,6 +1828,7 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
'Retrieving Session', data=urlencode_postdata({ 'Retrieving Session', data=urlencode_postdata({
'_method': 'GET', '_method': 'GET',
'requestor_id': requestor_id, 'requestor_id': requestor_id,
'reg_code': reg_code,
}), headers=mvpd_headers) }), headers=mvpd_headers)
except ExtractorError as e: except ExtractorError as e:
if not mso_id and isinstance(e.cause, HTTPError) and e.cause.status == 401: if not mso_id and isinstance(e.cause, HTTPError) and e.cause.status == 401:
@ -1734,7 +1836,6 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
raise raise
if '<pendingLogout' in session: if '<pendingLogout' in session:
self.cache.store(self._MVPD_CACHE, requestor_id, {}) self.cache.store(self._MVPD_CACHE, requestor_id, {})
count += 1
continue continue
authn_token = unescapeHTML(xml_text(session, 'authnToken')) authn_token = unescapeHTML(xml_text(session, 'authnToken'))
requestor_info['authn_token'] = authn_token requestor_info['authn_token'] = authn_token
@ -1755,7 +1856,6 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
}), headers=mvpd_headers) }), headers=mvpd_headers)
if '<pendingLogout' in authorize: if '<pendingLogout' in authorize:
self.cache.store(self._MVPD_CACHE, requestor_id, {}) self.cache.store(self._MVPD_CACHE, requestor_id, {})
count += 1
continue continue
if '<error' in authorize: if '<error' in authorize:
raise ExtractorError(xml_text(authorize, 'details'), expected=True) raise ExtractorError(xml_text(authorize, 'details'), expected=True)
@ -1778,6 +1878,5 @@ class AdobePassIE(InfoExtractor): # XXX: Conventionally, base classes should en
}), headers=mvpd_headers) }), headers=mvpd_headers)
if '<pendingLogout' in short_authorize: if '<pendingLogout' in short_authorize:
self.cache.store(self._MVPD_CACHE, requestor_id, {}) self.cache.store(self._MVPD_CACHE, requestor_id, {})
count += 1
continue continue
return short_authorize return short_authorize

View File

@ -84,6 +84,8 @@ class AdultSwimIE(TurnerBaseIE):
'skip': '404 Not Found', 'skip': '404 Not Found',
}] }]
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwNjg5ZmU2My00OTc5LTQxZmQtYWYxNC1hYjVlNmJjNWVkZWIiLCJuYmYiOjE1MzcxOTA2NzQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwNjc0fQ.Xl3AEduM0s1TxDQ6-XssdKIiLm261hhsEv1C1yo_nitIajZThSI9rXILqtIzO0aujoHhdzUnu_dUCq9ffiSBzEG632tTa1la-5tegHtce80cMhewBN4n2t8n9O5tiaPx8MPY8ALdm5wS7QzWE6DO_LTJKgE8Bl7Yv-CWJT4q4SywtNiQWLVOuhBRnDyfsRezxRwptw8qTn9dv5ZzUrVJaby5fDZ_nOncMKvegOgaKd5KEuCAGQ-mg-PSuValMjGuf6FwDguGaK7IyI5Y2oOrzXmD4Dj7q4WBg8w9QoZhtLeAU56mcsGILolku2R5FHlVLO9xhjResyt-pfmegOkpSw'
def _real_extract(self, url): def _real_extract(self, url):
show_path, episode_path = self._match_valid_url(url).groups() show_path, episode_path = self._match_valid_url(url).groups()
display_id = episode_path or show_path display_id = episode_path or show_path
@ -152,7 +154,7 @@ class AdultSwimIE(TurnerBaseIE):
# CDN_TOKEN_APP_ID from: # CDN_TOKEN_APP_ID from:
# https://d2gg02c3xr550i.cloudfront.net/assets/asvp.e9c8bef24322d060ef87.bundle.js # https://d2gg02c3xr550i.cloudfront.net/assets/asvp.e9c8bef24322d060ef87.bundle.js
'appId': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhcHBJZCI6ImFzLXR2ZS1kZXNrdG9wLXB0enQ2bSIsInByb2R1Y3QiOiJ0dmUiLCJuZXR3b3JrIjoiYXMiLCJwbGF0Zm9ybSI6ImRlc2t0b3AiLCJpYXQiOjE1MzI3MDIyNzl9.BzSCk-WYOZ2GMCIaeVb8zWnzhlgnXuJTCu0jGp_VaZE', 'appId': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhcHBJZCI6ImFzLXR2ZS1kZXNrdG9wLXB0enQ2bSIsInByb2R1Y3QiOiJ0dmUiLCJuZXR3b3JrIjoiYXMiLCJwbGF0Zm9ybSI6ImRlc2t0b3AiLCJpYXQiOjE1MzI3MDIyNzl9.BzSCk-WYOZ2GMCIaeVb8zWnzhlgnXuJTCu0jGp_VaZE',
}, { }, self._SOFTWARE_STATEMENT, {
'url': url, 'url': url,
'site_name': 'AdultSwim', 'site_name': 'AdultSwim',
'auth_required': auth, 'auth_required': auth,

View File

@ -20,13 +20,13 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
_THEPLATFORM_KEY = '43jXaGRQud' _THEPLATFORM_KEY = '43jXaGRQud'
_THEPLATFORM_SECRET = 'S10BPXHMlb' _THEPLATFORM_SECRET = 'S10BPXHMlb'
_DOMAIN_MAP = { _DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'), 'history.com': ('HISTORY', 'history', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1MzZlMTQ3ZS0zMzFhLTQxY2YtYTMwNC01MDA2NzNlOGYwYjYiLCJuYmYiOjE1Mzg2NjMzMDksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM4NjYzMzA5fQ.n24-FVHLGXJe2D4atIQZ700aiXKIajKh5PWFoHJ40Az4itjtwwSFHnvufnoal3T8lYkwNLxce7H-IEGxIykRkZEdwq09pMKMT-ft9ASzE4vQ8fAWbf5ZgDME86x4Jq_YaxkRc9Ne0eShGhl8fgTJHvk07sfWcol61HJ7kU7K8FzzcHR0ucFQgA5VNd8RyjoGWY7c6VxnXR214LOpXsywmit04-vGJC102b_WA2EQfqI93UzG6M6l0EeV4n0_ijP3s8_i8WMJZ_uwnTafCIY6G_731i01dKXDLSFzG1vYglAwDa8DTcdrAAuIFFDF6QNGItCCmwbhjufjmoeVb7R1Gg'),
'aetv.com': ('AETV', 'aetv'), 'aetv.com': ('AETV', 'aetv', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI5Y2IwNjg2Yy03ODUxLTRiZDUtODcyMC00MjNlZTg1YTQ1NzMiLCJuYmYiOjE1Mzg2NjMyOTAsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM4NjYzMjkwfQ.T5Elf0X4TndO4NEgqBas1gDxNHGPVk_daO2Ha5FBzVO6xi3zM7eavdAKfYMCN7gpWYJx03iADaVPtczO_t_aGZczDjpwJHgTUzDgvcLZAVsVDqtDIAMy3S846rPgT6UDbVoxurA7B2VTPm9phjrSXhejvd0LBO8MQL4AZ3sy2VmiPJ2noT1ily5PuHCYlkrT1fheO064duR__Cd9DQ5VTMnKjzY3Cx345CEwKDkUk5gwgxhXM-aY0eblehrq8VD81_aRM_O3tvh7nbTydHOnUpV-k_iKVi49gqz7Sf8zb6Zh5z2Uftn3vYCfE5NQuesitoRMnsH17nW7o_D59hkRgg'),
'mylifetime.com': ('LIFETIME', 'lifetime'), 'mylifetime.com': ('LIFETIME', 'lifetime', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJmODg0MDM1ZC1mZGRmLTRmYjgtYmRkMC05MzRhZDdiYTAwYTciLCJuYmYiOjE1NDkzOTI2NDQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTQ5MzkyNjQ0fQ.vkTIaCpheKdKQd__2-3ec4qkcpbAhyCTvwe5iTl922ItSQfVhpEJG4wseVSNmBTrpBi0hvLedcw6Hj1_UuzBMVuVcCqLprU-pI8recEwL0u7G-eVkylsxe1OTUm1o3V6OykXQ9KlA-QQLL1neUhdhR1n5B1LZ4cmtBmiEpfgf4rFwXD1ScFylIcaWKLBqHoRBNUmxyTmoXXvn_A-GGSj9eCizFzY8W5uBwUcsoiw2Cr1skx7PbB2RSP1I5DsoIJKG-8XV1KS7MWl-fNLjE-hVAsI9znqfEEFcPBiv3LhCP4Nf4OIs7xAselMn0M0c8igRUZhURWX_hdygUAxkbKFtQ'),
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'), 'fyi.tv': ('FYI', 'fyi', 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxOGZiOWM3Ny1mYmMzLTQxYTktYmE1Yi1lMzM0ZmUzNzU4NjEiLCJuYmYiOjE1ODc1ODAzNzcsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTg3NTgwMzc3fQ.AYDuipKswmIfLBfOjHRsfc5fMV5NmJUmiJnkpiep4VEw9QiXkygFj4bN06Si5tFc5Mee5TDrGzDpV6iuKbVpLT5kuqXhAn-Wozf5zKPsg_IpdEKO7gsiCq4calt72ct44KTqtKD_hVcoxQU24_HaJsRgXzu3B-6Ff6UrmsXkyvYifYVC9v2DSkdCuA02_IrlllzVT2kRuefUXgL4vQRtTFf77uYa0RKSTG7uVkiQ_AU41eXevKlO2qgtc14Hk5cZ7-ZNrDyMCXYA5ngdIHP7Gs9PWaFXT36PFHI_rC4EfxUABPzjQFxjpP75aX5qn8SH__HbM9q3hoPWgaEaf76qIQ'),
'fyi.tv': ('FYI', 'fyi'), 'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc', None),
'historyvault.com': (None, 'historyvault'), 'historyvault.com': (None, 'historyvault', None),
'biography.com': (None, 'biography'), 'biography.com': (None, 'biography', None),
} }
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
@ -71,7 +71,7 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
} }
def _extract_aetn_info(self, domain, filter_key, filter_value, url): def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain] requestor_id, brand, software_statement = self._DOMAIN_MAP[domain]
result = self._download_json( result = self._download_json(
f'https://feeds.video.aetnd.com/api/v2/{brand}/videos', f'https://feeds.video.aetnd.com/api/v2/{brand}/videos',
filter_value, query={f'filter[{filter_key}]': filter_value}) filter_value, query={f'filter[{filter_key}]': filter_value})
@ -95,7 +95,7 @@ class AENetworksBaseIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'), theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
traverse_obj(theplatform_metadata, ('ratings', 0, 'rating'))) traverse_obj(theplatform_metadata, ('ratings', 0, 'rating')))
auth = self._extract_mvpd_auth( auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, requestor_id, resource, software_statement)
info.update(self._extract_aen_smil(media_url, video_id, auth)) info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({ info.update({
'title': title, 'title': title,
@ -132,10 +132,11 @@ class AENetworksIE(AENetworksBaseIE):
'tags': 'count:14', 'tags': 'count:14',
'categories': ['Mountain Men'], 'categories': ['Mountain Men'],
'episode_number': 1, 'episode_number': 1,
'episode': 'Episode 1', 'episode': 'Winter Is Coming',
'season': 'Season 1', 'season': 'Season 1',
'season_number': 1, 'season_number': 1,
'series': 'Mountain Men', 'series': 'Mountain Men',
'age_limit': 0,
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -157,18 +158,18 @@ class AENetworksIE(AENetworksBaseIE):
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:^https?://.*\.jpe?g$',
'chapters': 'count:4', 'chapters': 'count:4',
'tags': 'count:23', 'tags': 'count:23',
'episode': 'Episode 1', 'episode': 'Inlawful Entry',
'episode_number': 1, 'episode_number': 1,
'season': 'Season 9', 'season': 'Season 9',
'season_number': 9, 'season_number': 9,
'series': 'Duck Dynasty', 'series': 'Duck Dynasty',
'age_limit': 0,
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True, 'only_matching': True,

View File

@ -1,188 +0,0 @@
from .adobepass import AdobePassIE
from ..networking import HEADRequest
from ..utils import (
extract_attributes,
float_or_none,
get_element_html_by_class,
int_or_none,
merge_dicts,
parse_age_limit,
remove_end,
str_or_none,
traverse_obj,
unescapeHTML,
unified_timestamp,
update_url_query,
url_or_none,
)
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'info_dict': {
'id': '3923059',
'ext': 'mp4',
'title': 'The Top Chef Season 16 Winner Is...',
'description': 'Find out who takes the title of Top Chef!',
'upload_date': '20190314',
'timestamp': 1552591860,
'season_number': 16,
'episode_number': 15,
'series': 'Top Chef',
'episode': 'The Top Chef Season 16 Winner Is...',
'duration': 190.357,
'season': 'Season 16',
'thumbnail': r're:^https://.+\.jpg',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/top-chef/season-20/episode-1/london-calling',
'info_dict': {
'id': '9000234570',
'ext': 'mp4',
'title': 'London Calling',
'description': 'md5:5af95a8cbac1856bd10e7562f86bb759',
'upload_date': '20230310',
'timestamp': 1678410000,
'season_number': 20,
'episode_number': 1,
'series': 'Top Chef',
'episode': 'London Calling',
'duration': 3266.03,
'season': 'Season 20',
'chapters': 'count:7',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-1/closing-night',
'info_dict': {
'id': '3692045',
'ext': 'mp4',
'title': 'Closing Night',
'description': 'md5:3170065c5c2f19548d72a4cbc254af63',
'upload_date': '20180401',
'timestamp': 1522623600,
'season_number': 1,
'episode_number': 1,
'series': 'In Ice Cold Blood',
'episode': 'Closing Night',
'duration': 2629.051,
'season': 'Season 1',
'chapters': 'count:6',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'info_dict': {
'id': '3974019',
'ext': 'mp4',
'title': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'description': 'md5:f9d638dd6946a1c1c0533a9c6100eae5',
'upload_date': '20190617',
'timestamp': 1560790800,
'season_number': 2,
'episode_number': 16,
'series': 'In Ice Cold Blood',
'episode': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'duration': 68.235,
'season': 'Season 2',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url):
site, display_id = self._match_valid_url(url).group('site', 'id')
webpage = self._download_webpage(url, display_id)
settings = self._search_json(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>', webpage, 'settings', display_id)
tve = extract_attributes(get_element_html_by_class('tve-video-deck-app', webpage) or '')
query = {
'manifest': 'm3u',
'formats': 'm3u,mpeg4',
}
if tve:
account_pid = tve.get('data-mpx-media-account-pid') or 'HNK2IC'
account_id = tve['data-mpx-media-account-id']
metadata = self._parse_json(
tve.get('data-normalized-video', ''), display_id, fatal=False, transform_source=unescapeHTML)
video_id = tve.get('data-guid') or metadata['guid']
if tve.get('data-entitlement') == 'auth':
auth = traverse_obj(settings, ('tve_adobe_auth', {dict})) or {}
site = remove_end(site, 'tv')
release_pid = tve['data-release-pid']
resource = self._get_mvpd_resource(
tve.get('data-adobe-pass-resource-id') or auth.get('adobePassResourceId') or site,
tve['data-title'], release_pid, tve.get('data-rating'))
query.update({
'switch': 'HLSServiceSecure',
'auth': self._extract_mvpd_auth(
url, release_pid, auth.get('adobePassRequestorId') or site, resource),
})
else:
ls_playlist = traverse_obj(settings, ('ls_playlist', ..., {dict}), get_all=False) or {}
account_pid = ls_playlist.get('mpxMediaAccountPid') or 'PHSl-B'
account_id = ls_playlist['mpxMediaAccountId']
video_id = ls_playlist['defaultGuid']
metadata = traverse_obj(
ls_playlist, ('videos', lambda _, v: v['guid'] == video_id, {dict}), get_all=False)
tp_url = f'https://link.theplatform.com/s/{account_pid}/media/guid/{account_id}/{video_id}'
tp_metadata = self._download_json(
update_url_query(tp_url, {'format': 'preview'}), video_id, fatal=False)
chapters = traverse_obj(tp_metadata, ('chapters', ..., {
'start_time': ('startTime', {float_or_none(scale=1000)}),
'end_time': ('endTime', {float_or_none(scale=1000)}),
}))
# prune pointless single chapters that span the entire duration from short videos
if len(chapters) == 1 and not traverse_obj(chapters, (0, 'end_time')):
chapters = None
m3u8_url = self._request_webpage(HEADRequest(
update_url_query(f'{tp_url}/stream.m3u8', query)), video_id, 'Checking m3u8 URL').url
if 'mpeg_cenc' in m3u8_url:
self.report_drm(video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'chapters': chapters,
**merge_dicts(traverse_obj(tp_metadata, {
'title': 'title',
'description': 'description',
'duration': ('duration', {float_or_none(scale=1000)}),
'timestamp': ('pubDate', {float_or_none(scale=1000)}),
'season_number': (('pl1$seasonNumber', 'nbcu$seasonNumber'), {int_or_none}),
'episode_number': (('pl1$episodeNumber', 'nbcu$episodeNumber'), {int_or_none}),
'series': (('pl1$show', 'nbcu$show'), (None, ...), {str}),
'episode': (('title', 'pl1$episodeNumber', 'nbcu$episodeNumber'), {str_or_none}),
'age_limit': ('ratings', ..., 'rating', {parse_age_limit}),
}, get_all=False), traverse_obj(metadata, {
'title': 'title',
'description': 'description',
'duration': ('durationInSeconds', {int_or_none}),
'timestamp': ('airDate', {unified_timestamp}),
'thumbnail': ('thumbnailUrl', {url_or_none}),
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode': 'episodeTitle',
'series': 'show',
})),
}

View File

@ -923,10 +923,18 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
errors = json_data.get('errors') errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH': if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
custom_fields = json_data['custom_fields'] custom_fields = json_data['custom_fields']
missing_fields = ', '.join(
key for key in ('source_url', 'software_statement') if not smuggled_data.get(key))
if missing_fields:
raise ExtractorError(
f'Missing fields in smuggled data: {missing_fields}. '
f'This video can be only extracted from the webpage where it is embedded. '
f'Pass the URL of the embedding webpage instead of the Brightcove URL', expected=True)
tve_token = self._extract_mvpd_auth( tve_token = self._extract_mvpd_auth(
smuggled_data['source_url'], video_id, smuggled_data['source_url'], video_id,
custom_fields['bcadobepassrequestorid'], custom_fields['bcadobepassrequestorid'],
custom_fields['bcadobepassresourceid']) custom_fields['bcadobepassresourceid'],
smuggled_data['software_statement'])
json_data = self._download_json( json_data = self._download_json(
api_url, video_id, headers={ api_url, video_id, headers={
'Accept': f'application/json;pk={policy_key}', 'Accept': f'application/json;pk={policy_key}',

View File

@ -329,6 +329,7 @@ class WatchESPNIE(AdobePassIE):
}] }]
_API_KEY = 'ZXNwbiZicm93c2VyJjEuMC4w.ptUt7QxsteaRruuPmGZFaJByOoqKvDP2a5YkInHrc7c' _API_KEY = 'ZXNwbiZicm93c2VyJjEuMC4w.ptUt7QxsteaRruuPmGZFaJByOoqKvDP2a5YkInHrc7c'
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIyZGJmZWM4My03OWE1LTQyNzEtYTVmZC04NTZjYTMxMjRjNjMiLCJuYmYiOjE1NDAyMTI3NjEsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTQwMjEyNzYxfQ.yaK3r4AI2uLVvsyN1GLzqzgzRlxMPtasSaiYYBV0wIstqih5tvjTmeoLmi8Xy9Kp_U7Md-bOffwiyK3srHkpUkhhwXLH2x6RPjmS1tPmhaG7-3LBcHTf2ySPvXhVf7cN4ngldawK4tdtLtsw6rF_JoZE2yaC6XbS2F51nXSFEDDnOQWIHEQRG3aYAj-38P2CLGf7g-Yfhbp5cKXeksHHQ90u3eOO4WH0EAjc9oO47h33U8KMEXxJbvjV5J8Va2G2fQSgLDZ013NBI3kQnE313qgqQh2feQILkyCENpB7g-TVBreAjOaH1fU471htSoGGYepcAXv-UDtpgitDiLy7CQ'
def _call_bamgrid_api(self, path, video_id, payload=None, headers={}): def _call_bamgrid_api(self, path, video_id, payload=None, headers={}):
if 'Authorization' not in headers: if 'Authorization' not in headers:
@ -405,8 +406,8 @@ class WatchESPNIE(AdobePassIE):
# TV Provider required # TV Provider required
else: else:
resource = self._get_mvpd_resource('ESPN', video_data['name'], video_id, None) resource = self._get_mvpd_resource('espn1', video_data['name'], video_id, None)
auth = self._extract_mvpd_auth(url, video_id, 'ESPN', resource).encode() auth = self._extract_mvpd_auth(url, video_id, 'ESPN', resource, self._SOFTWARE_STATEMENT).encode()
asset = self._download_json( asset = self._download_json(
f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb', f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb',

View File

@ -7,161 +7,157 @@ from ..utils import (
int_or_none, int_or_none,
join_nonempty, join_nonempty,
parse_age_limit, parse_age_limit,
remove_end,
remove_start,
traverse_obj,
try_get,
unified_timestamp, unified_timestamp,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import traverse_obj
class GoIE(AdobePassIE): class GoIE(AdobePassIE):
_SITE_INFO = { _SITE_INFO = {
'abc': { 'abc': {
'brand': '001', 'brand': '001',
'requestor_id': 'ABC', 'requestor_id': 'dtci',
'provider_id': 'ABC',
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI4OTcwMjlkYS0yYjM1LTQyOWUtYWQ0NS02ZjZiZjVkZTdhOTUiLCJuYmYiOjE2MjAxNzM5NjksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNjIwMTczOTY5fQ.SC69DVJWSL8sIe-vVUrP6xS_kzHKqwz9PdKYexs_y-f7Vin6mM-7S-W1TE_-K55O0pyf-TL4xYgvm6LIye8CckG-nZfVwNPV4huduov0jmIcxCQFeUwkHULG2IaA44wfBVUBdaHgkhPweZ2amjycO_IXtez-gBXOLbE3B7Gx9j_5ISCFtyVUblThKfoGyQv6KT6t8Vpmc4ZSKCCQp74KWFFypydb9ucego1taW_nQD06Cdf4yByLd6NaTBceMcIKbug9b9gxFm3XBgJ5q3z7KGo1Kr6XalAV5j4m-fQ91wczlTilX8FM4AljMupyRM9mA_aEADILQ4hS79q4SM0w6w',
}, },
'freeform': { 'freeform': {
'brand': '002', 'brand': '002',
'requestor_id': 'ABCFamily', 'requestor_id': 'ABCFamily',
}, 'provider_id': 'ABCFamily',
'watchdisneychannel': { 'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZWM2MGYyNC0xYzRjLTQ1NzQtYjc0Zi03ZmM4N2E5YWMzMzgiLCJuYmYiOjE1ODc2NjU5MjMsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTg3NjY1OTIzfQ.flCn3dhvmvPnWmV0JV8Fm0YFyj07yPez9-n1GFEwVIm_S2wQVWbWyJhqsAyLZVFrhOMZYTqmPS3OHxGwTwXkEYn6PD7o_vIVG3oqi-Xn1m5jRt_Gazw5qEtpat6VE7bvKGSD3ZhcidOrsCk8NcYyq75u61NHDvSl81pcedJjVRVUpsqrEwmo0aVbA0C8PX3ri0mEbGvkMKvHn8E60xp-PSE-VK8SDT0plwPu_TwUszkZ6-_I8_2xcv_WBqcXFkAVg7Q-iNJXgQvmNsrpcrYuLvi6hEH4ZLtoDcXU6MhwTQAJTiHSo8x9aHX1_qFP09CzlNOFQbC2ZEJdP9SvA53SLQ',
'brand': '004',
'resource_id': 'Disney',
},
'watchdisneyjunior': {
'brand': '008',
'resource_id': 'DisneyJunior',
},
'watchdisneyxd': {
'brand': '009',
'resource_id': 'DisneyXD',
}, },
'disneynow': { 'disneynow': {
'brand': '011', 'brand': '011', # also: '004', '008', '009'
'requestor_id': 'DisneyChannels',
'provider_id': 'DisneyChannels',
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1MzAzNTRiOS04NDNiLTRkNjAtYTQ3ZS0yNzk1MzlkOTIyNTciLCJuYmYiOjE1NTg5ODc0NDksImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTU4OTg3NDQ5fQ.Jud6YS6-J2h0h6po0oMheDym0qRTJQGj4kzacrz4DFuEwhcBkkykW6pF5pKuAUJy9HCZ40oDAHe2KcTlDJjCZF5tDaUEfdihakZ9cC_rG7MU-QoRne8qaB_dPDKwGuk-ZyWD8eV3zwTJmbGo8hDxYTEU81YNCxwhyc_BPDr5TYiubbmpP3_pTnXmSpuL58isJ2peSKWlX9BacuXtBY25c_QnPFKk-_EETm7IHkTpDazde1QfHWGu4s4yJpKGk8RVVujVG6h6ELlL-ZeYLilBm7iS7h1TYG1u7fJhyZRL7isaom6NvAzsvN3ngss1fLwt8decP8wzdFHrbYTdTjW8qw',
'resource_id': 'Disney', 'resource_id': 'Disney',
}, },
'fxnow.fxnetworks': { 'fxnetworks': {
'brand': '025', 'brand': '025', # also: '020'
'requestor_id': 'dtci', 'requestor_id': 'dtci',
'provider_id': 'fx', # also 'fxx', 'fxm'
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIzYWRhYWZiNC02OTAxLTRlYzktOTdmNy1lYWZkZTJkODJkN2EiLCJuYmYiOjE1NjIwMjQwNzYsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTYyMDI0MDc2fQ.dhKMpZK50AObbZYrMiYPSfWtzXHUaeMP3jrIY4Cgfvh0GaEgk0Mns_zp78jypFeZgRtPVleQMQDNq2YEloRLcAGqP1aa6WVDglnK77ZWUm4IKai14Rwf3A6YBhSRoO2_lMmUGkuTf6gZY-kMIPqBYKqzTQiQl4HbniPFodIzFRiuI9QJVrkoyTGrJL4oqiX08PoFI3Z-TOti1Heu3EbFC-GveQHhlinYrzU7rbiAqLEz7FImtfBDsnXX1Y3uJDLYM3Bq4Oh0nrzTv1Fd62wNsCNErHHIbELidh1zZF0ujvt7ReuZUwAitm0UhEJ7OxNOUbEQWtae6pVNscvdvTFMpg',
},
'nationalgeographic': {
'brand': '026', # also '023'
'requestor_id': 'dtci',
'provider_id': 'ngc', # also 'ngw'
'software_statement': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIxMzE4YTM1Ni05Mjc4LTQ4NjEtYTFmNi1jMTIzMzg1ZWMzYzMiLCJuYmYiOjE1NjIwMjM4MjgsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTYyMDIzODI4fQ.Le-2OzF9-jrhJ7ZfWtLWk5iSHGVZoxeU1w0_fO--Heli0OwRZsRq2slSmx-oZTzxuWmAgDEiBkWSDcDK6sM25DrCLsdsJa3MBuZ-slBRtH8aq3HpNoqqLkU-vg6gRUEKMtwBUtwCu_9aKUCayYtndWv4b1DjVQeSrteOW5NNudWVYleAe0kxeNJQHo5If9SCzDudKVJktFUjhNks4QPOC_uONPkRRlL9D0fNvtOY-LRFckfcHhf5z9l1iZjeukV0YhdKnuw1wyiaWrQXBUDiBfbkCRd2DM-KnelqPxfiXCaTjGKDURRBO3pz33ebge3IFXSiU5vl4qHQ8xvunzGpFw',
}, },
} }
_VALID_URL = r'''(?x) _URL_PATH_RE = r'(?:video|episode|movies-and-specials)/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})'
https?:// _VALID_URL = [
(?P<sub_domain> fr'https?://(?:www\.)?(?P<site>abc)\.com/{_URL_PATH_RE}',
(?:{}\.)?go|fxnow\.fxnetworks| fr'https?://(?:www\.)?(?P<site>freeform)\.com/{_URL_PATH_RE}',
(?:www\.)?(?:abc|freeform|disneynow) fr'https?://(?:www\.)?(?P<site>disneynow)\.com/{_URL_PATH_RE}',
)\.com/ fr'https?://fxnow\.(?P<site>fxnetworks)\.com/{_URL_PATH_RE}',
(?: fr'https?://(?:www\.)?(?P<site>nationalgeographic)\.com/tv/{_URL_PATH_RE}',
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)| ]
(?:[^/]+/)*(?P<display_id>[^/?\#]+)
)
'''.format(r'\.|'.join(list(_SITE_INFO.keys())))
_TESTS = [{ _TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643', 'url': 'https://abc.com/episode/4192c0e6-26e5-47a8-817b-ce8272b9e440/playlist/PL551127435',
'info_dict': { 'info_dict': {
'id': 'VDKA3807643', 'id': 'VDKA10805898',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Traitor in the White House', 'title': 'Switch the Flip',
'description': 'md5:05b009d2d145a1e85d25111bd37222e8', 'description': 'To help get Brians life in order, Stewie and Brian swap bodies using a machine that Stewie invents.',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'This content is no longer available.',
}, {
'url': 'https://disneynow.com/shows/big-hero-6-the-series',
'info_dict': {
'title': 'Doraemon',
'id': 'SH55574025',
},
'playlist_mincount': 51,
}, {
'url': 'http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood',
'info_dict': {
'id': 'VDKA3609139',
'title': 'This Guilty Blood',
'description': 'md5:f18e79ad1c613798d95fdabfe96cd292',
'age_limit': 14, 'age_limit': 14,
'duration': 1297,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'Family Guy',
'season': 'Season 16',
'season_number': 16,
'episode': 'Episode 17',
'episode_number': 17,
'timestamp': 1746082800.0,
'upload_date': '20250501',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://disneynow.com/episode/21029660-ba06-4406-adb0-a9a78f6e265e/playlist/PL553044961',
'info_dict': {
'id': 'VDKA39546942',
'ext': 'mp4',
'title': 'Zero Friends Again',
'description': 'Relationships fray under the pressures of a difficult journey.',
'age_limit': 0,
'duration': 1721,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'Star Wars: Skeleton Crew',
'season': 'Season 1',
'season_number': 1,
'episode': 'Episode 6',
'episode_number': 6,
'timestamp': 1746946800.0,
'upload_date': '20250511',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://fxnow.fxnetworks.com/episode/09f4fa6f-c293-469e-aebe-32c9ca5842a7/playlist/PL554408064',
'info_dict': {
'id': 'VDKA38112033',
'ext': 'mp4',
'title': 'The Return of Jerry',
'description': 'The vampires long-lost fifth roommate returns. Written by Paul Simms; directed by Kyle Newacheck.',
'age_limit': 17,
'duration': 1493,
'thumbnail': r're:https?://.+/.+\.jpg',
'series': 'What We Do in the Shadows',
'season': 'Season 6',
'season_number': 6,
'episode': 'Episode 1', 'episode': 'Episode 1',
'upload_date': '20170102',
'season': 'Season 2',
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/abcf/Shadowhunters/video/201/ae5f75608d86bf88aa4f9f4aa76ab1b7/579x325-Q100_ae5f75608d86bf88aa4f9f4aa76ab1b7.jpg',
'duration': 2544,
'season_number': 2,
'series': 'Shadowhunters',
'episode_number': 1, 'episode_number': 1,
'timestamp': 1483387200, 'timestamp': 1729573200.0,
'ext': 'mp4', 'upload_date': '20241022',
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
}, },
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, { }, {
'url': 'https://abc.com/shows/the-rookie/episode-guide/season-04/12-the-knock', 'url': 'https://www.freeform.com/episode/bda0eaf7-761a-4838-aa44-96f794000844/playlist/PL553044961',
'info_dict': { 'info_dict': {
'id': 'VDKA26050359', 'id': 'VDKA39007340',
'title': 'The Knock',
'description': 'md5:0c2947e3ada4c31f28296db7db14aa64',
'age_limit': 14,
'ext': 'mp4', 'ext': 'mp4',
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/abc/TheRookie/video/412/daf830d06e83b11eaf5c0a299d993ae3/1556x876-Q75_daf830d06e83b11eaf5c0a299d993ae3.jpg', 'title': 'Angel\'s Landing',
'episode': 'Episode 12', 'description': 'md5:91bf084e785c968fab16734df7313446',
'season_number': 4, 'age_limit': 14,
'season': 'Season 4', 'duration': 2523,
'timestamp': 1642975200, 'thumbnail': r're:https?://.+/.+\.jpg',
'episode_number': 12, 'series': 'How I Escaped My Cult',
'upload_date': '20220123', 'season': 'Season 1',
'series': 'The Rookie', 'season_number': 1,
'duration': 2572, 'episode': 'Episode 2',
}, 'episode_number': 2,
'params': { 'timestamp': 1740038400.0,
'geo_bypass_ip_block': '3.244.239.0/24', 'upload_date': '20250220',
# m3u8 download
'skip_download': True,
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841', 'url': 'https://www.nationalgeographic.com/tv/episode/ca694661-1186-41ae-8089-82f64d69b16d/playlist/PL554408064',
'info_dict': { 'info_dict': {
'id': 'VDKA12782841', 'id': 'VDKA39492078',
'title': 'First Look: Better Things - Season 2',
'description': 'md5:fa73584a95761c605d9d54904e35b407',
'ext': 'mp4', 'ext': 'mp4',
'age_limit': 14, 'title': 'Heart of the Emperors',
'upload_date': '20170825', 'description': 'md5:4fc50a2878f030bb3a7eac9124dca677',
'duration': 161, 'age_limit': 0,
'series': 'Better Things', 'duration': 2775,
'thumbnail': 'http://cdn1.edgedatg.com/aws/v2/fx/BetterThings/video/12782841/b6b05e58264121cc2c98811318e6d507/1556x876-Q75_b6b05e58264121cc2c98811318e6d507.jpg', 'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1503661074, 'series': 'Secrets of the Penguins',
}, 'season': 'Season 1',
'params': { 'season_number': 1,
'geo_bypass_ip_block': '3.244.239.0/24', 'episode': 'Episode 1',
# m3u8 download 'episode_number': 1,
'skip_download': True, 'timestamp': 1745204400.0,
'upload_date': '20250421',
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'https://www.freeform.com/movies-and-specials/c38281fc-9f8f-47c7-8220-22394f9df2e1',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://abc.go.com/shows/world-news-tonight/episode-guide/2017-02/17-021717-intense-stand-off-between-man-with-rifle-and-police-in-oakland', 'url': 'https://abc.com/video/219a454a-172c-41bf-878a-d169e6bc0bdc/playlist/PL5523098420',
'only_matching': True,
}, {
# brand 004
'url': 'http://disneynow.go.com/shows/big-hero-6-the-series/season-01/episode-10-mr-sparkles-loses-his-sparkle/vdka4637915',
'only_matching': True,
}, {
# brand 008
'url': 'http://disneynow.go.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://disneynow.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://www.freeform.com/shows/cruel-summer/episode-guide/season-01/01-happy-birthday-jeanette-turner',
'only_matching': True, 'only_matching': True,
}] }]
@ -171,58 +167,29 @@ class GoIE(AdobePassIE):
f'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/{brand}/001/-1/{show_id}/-1/{video_id}/-1/-1.json', f'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/{brand}/001/-1/{show_id}/-1/{video_id}/-1/-1.json',
display_id)['video'] display_id)['video']
def _extract_global_var(self, name, webpage, video_id):
return self._search_json(
fr'window\[["\']{re.escape(name)}["\']\]\s*=',
webpage, f'{name.strip("_")} JSON', video_id)
def _real_extract(self, url): def _real_extract(self, url):
mobj = self._match_valid_url(url) site, display_id = self._match_valid_url(url).group('site', 'id')
sub_domain = remove_start(remove_end(mobj.group('sub_domain') or '', '.go'), 'www.') webpage = self._download_webpage(url, display_id)
video_id, display_id = mobj.group('id', 'display_id') config = self._extract_global_var('__CONFIG__', webpage, display_id)
site_info = self._SITE_INFO.get(sub_domain, {}) data = self._extract_global_var(config['globalVar'], webpage, display_id)
brand = site_info.get('brand') video_id = traverse_obj(data, (
if not video_id or not site_info: 'page', 'content', 'video', 'layout', (('video', 'id'), 'videoid'), {str}, any))
webpage = self._download_webpage(url, display_id or video_id) if not video_id:
data = self._parse_json( video_id = self._search_regex([
self._search_regex( # data-track-video_id="VDKA39492078"
r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage, # data-track-video_id_code="vdka39492078"
'data', default='{}'), # data-video-id="'VDKA3609139'"
display_id or video_id, fatal=False) r'data-(?:track-)?video[_-]id(?:_code)?=["\']*((?:vdka|VDKA)\d+)',
# https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot # page.analytics.videoIdCode
layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict) r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\d+)'], webpage, 'video ID')
video_id = None
if layout: site_info = self._SITE_INFO[site]
video_id = try_get( brand = site_info['brand']
layout,
(lambda x: x['videoid'], lambda x: x['video']['id']),
str)
if not video_id:
video_id = self._search_regex(
(
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# page.analytics.videoIdCode
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)',
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',
r'data-page-brand=\s*["\']\s*(\d+)'), webpage, 'brand',
default='004')
site_info = next(
si for _, si in self._SITE_INFO.items()
if si.get('brand') == brand)
if not video_id:
# show extraction works for Disney, DisneyJunior and DisneyXD
# ABC and Freeform has different layout
show_id = self._search_regex(r'data-show-id=["\']*(SH\d+)', webpage, 'show id')
videos = self._extract_videos(brand, show_id=show_id)
show_title = self._search_regex(r'data-show-title="([^"]+)"', webpage, 'show title', fatal=False)
entries = []
for video in videos:
entries.append(self.url_result(
video['url'], 'Go', video.get('id'), video.get('title')))
entries.reverse()
return self.playlist_result(entries, show_id, show_title)
video_data = self._extract_videos(brand, video_id)[0] video_data = self._extract_videos(brand, video_id)[0]
video_id = video_data['id'] video_id = video_data['id']
title = video_data['title'] title = video_data['title']
@ -238,26 +205,31 @@ class GoIE(AdobePassIE):
if ext == 'm3u8': if ext == 'm3u8':
video_type = video_data.get('type') video_type = video_data.get('type')
data = { data = {
'video_id': video_data['id'], 'video_id': video_id,
'video_type': video_type, 'video_type': video_type,
'brand': brand, 'brand': brand,
'device': '001', 'device': '001',
'app_name': 'webplayer-abc',
} }
if video_data.get('accesslevel') == '1': if video_data.get('accesslevel') == '1':
requestor_id = site_info.get('requestor_id', 'DisneyChannels') provider_id = site_info['provider_id']
software_statement = traverse_obj(data, ('app', 'config', (
('features', 'auth', 'softwareStatement'),
('tvAuth', 'SOFTWARE_STATEMENTS', 'PRODUCTION'),
), {str}, any)) or site_info['software_statement']
resource = site_info.get('resource_id') or self._get_mvpd_resource( resource = site_info.get('resource_id') or self._get_mvpd_resource(
requestor_id, title, video_id, None) provider_id, title, video_id, None)
auth = self._extract_mvpd_auth( auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, site_info['requestor_id'], resource, software_statement)
data.update({ data.update({
'token': auth, 'token': auth,
'token_type': 'ap', 'token_type': 'ap',
'adobe_requestor_id': requestor_id, 'adobe_requestor_id': provider_id,
}) })
else: else:
self._initialize_geo_bypass({'countries': ['US']}) self._initialize_geo_bypass({'countries': ['US']})
entitlement = self._download_json( entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json', 'https://prod.gatekeeper.us-abc.symphony.edgedatg.go.com/vp2/ws-secure/entitlement/2020/playmanifest_secure.json',
video_id, data=urlencode_postdata(data)) video_id, data=urlencode_postdata(data))
errors = entitlement.get('errors', {}).get('errors', []) errors = entitlement.get('errors', {}).get('errors', [])
if errors: if errors:
@ -267,7 +239,7 @@ class GoIE(AdobePassIE):
error['message'], countries=['US']) error['message'], countries=['US'])
error_message = ', '.join([error['message'] for error in errors]) error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError(f'{self.IE_NAME} said: {error_message}', expected=True) raise ExtractorError(f'{self.IE_NAME} said: {error_message}', expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey'] asset_url += '?' + entitlement['entitlement']['uplynkData']['sessionKey']
fmts, subs = self._extract_m3u8_formats_and_subtitles( fmts, subs = self._extract_m3u8_formats_and_subtitles(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False) asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False)
formats.extend(fmts) formats.extend(fmts)

View File

@ -19,7 +19,8 @@ from ..utils import (
class NBACVPBaseIE(TurnerBaseIE): class NBACVPBaseIE(TurnerBaseIE):
def _extract_nba_cvp_info(self, path, video_id, fatal=False): def _extract_nba_cvp_info(self, path, video_id, fatal=False):
return self._extract_cvp_info( return self._extract_cvp_info(
f'http://secure.nba.com/{path}', video_id, { # XXX: The 3rd argument (None) needs to be the AdobePass software_statement
f'http://secure.nba.com/{path}', video_id, None, {
'default': { 'default': {
'media_src': 'http://nba.cdn.turner.com/nba/big', 'media_src': 'http://nba.cdn.turner.com/nba/big',
}, },
@ -94,6 +95,7 @@ class NBAWatchBaseIE(NBACVPBaseIE):
class NBAWatchEmbedIE(NBAWatchBaseIE): class NBAWatchEmbedIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch:embed' IE_NAME = 'nba:watch:embed'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'embed\?.*?\bid=(?P<id>\d+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'embed\?.*?\bid=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -115,6 +117,7 @@ class NBAWatchEmbedIE(NBAWatchBaseIE):
class NBAWatchIE(NBAWatchBaseIE): class NBAWatchIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch' IE_NAME = 'nba:watch'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'(?:nba/)?video/(?P<id>.+?(?=/index\.html)|(?:[^/]+/)*[^/?#&]+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'(?:nba/)?video/(?P<id>.+?(?=/index\.html)|(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -167,6 +170,7 @@ class NBAWatchIE(NBAWatchBaseIE):
class NBAWatchCollectionIE(NBAWatchBaseIE): class NBAWatchCollectionIE(NBAWatchBaseIE):
_WORKING = False
IE_NAME = 'nba:watch:collection' IE_NAME = 'nba:watch:collection'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'list/collection/(?P<id>[^/?#&]+)' _VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'list/collection/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -336,6 +340,7 @@ class NBABaseIE(NBACVPBaseIE):
class NBAEmbedIE(NBABaseIE): class NBAEmbedIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba:embed' IE_NAME = 'nba:embed'
_VALID_URL = r'https?://secure\.nba\.com/assets/amp/include/video/(?:topI|i)frame\.html\?.*?\bcontentId=(?P<id>[^?#&]+)' _VALID_URL = r'https?://secure\.nba\.com/assets/amp/include/video/(?:topI|i)frame\.html\?.*?\bcontentId=(?P<id>[^?#&]+)'
_TESTS = [{ _TESTS = [{
@ -358,6 +363,7 @@ class NBAEmbedIE(NBABaseIE):
class NBAIE(NBABaseIE): class NBAIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba' IE_NAME = 'nba'
_VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?!{NBABaseIE._CHANNEL_PATH_REGEX})video/(?P<id>(?:[^/]+/)*[^/?#&]+)' _VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?!{NBABaseIE._CHANNEL_PATH_REGEX})video/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
@ -385,6 +391,7 @@ class NBAIE(NBABaseIE):
class NBAChannelIE(NBABaseIE): class NBAChannelIE(NBABaseIE):
_WORKING = False
IE_NAME = 'nba:channel' IE_NAME = 'nba:channel'
_VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?:{NBABaseIE._CHANNEL_PATH_REGEX})/(?P<id>[^/?#&]+)' _VALID_URL = NBABaseIE._VALID_URL_BASE + f'(?:{NBABaseIE._CHANNEL_PATH_REGEX})/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{

View File

@ -6,7 +6,7 @@ import xml.etree.ElementTree
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE, default_ns from .theplatform import ThePlatformBaseIE, ThePlatformIE, default_ns
from ..networking import HEADRequest from ..networking import HEADRequest
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -14,26 +14,130 @@ from ..utils import (
UserNotLive, UserNotLive,
clean_html, clean_html,
determine_ext, determine_ext,
extract_attributes,
float_or_none, float_or_none,
get_element_html_by_class,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
make_archive_id,
mimetype2ext, mimetype2ext,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
parse_iso8601,
remove_end, remove_end,
smuggle_url,
traverse_obj,
try_get, try_get,
unescapeHTML, unescapeHTML,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
url_basename, url_basename,
url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE class NBCUniversalBaseIE(ThePlatformBaseIE):
_VALID_URL = r'https?(?P<permalink>://(?:www\.)?nbc\.com/(?:classic-tv/)?[^/]+/video/[^/]+/(?P<id>(?:NBCE|n)?\d+))' _GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_M3U8_RE = r'https?://[^/?#]+/prod/[\w-]+/(?P<folders>[^?#]+/)cmaf/mpeg_(?:cbcs|cenc)\w*/master_cmaf\w*\.m3u8'
def _download_nbcu_smil_and_extract_m3u8_url(self, tp_path, video_id, query):
smil = self._download_xml(
f'https://link.theplatform.com/s/{tp_path}', video_id,
'Downloading SMIL manifest', 'Failed to download SMIL manifest', query={
**query,
'format': 'SMIL', # XXX: Do not confuse "format" with "formats"
'manifest': 'm3u',
'switch': 'HLSServiceSecure', # Or else we get broken mp4 http URLs instead of HLS
}, headers=self.geo_verification_headers())
ns = f'//{{{default_ns}}}'
if url := traverse_obj(smil, (f'{ns}video/@src', lambda _, v: determine_ext(v) == 'm3u8', any)):
return url
exc = traverse_obj(smil, (f'{ns}param', lambda _, v: v.get('name') == 'exception', '@value', any))
if exc == 'GeoLocationBlocked':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(traverse_obj(smil, (f'{ns}ref/@abstract', ..., any)), expected=exc == 'Expired')
def _extract_nbcu_formats_and_subtitles(self, tp_path, video_id, query):
# formats='mpeg4' will return either a working m3u8 URL or an m3u8 template for non-DRM HLS
# formats='m3u+none,mpeg4' may return DRM HLS but w/the "folders" needed for non-DRM template
query['formats'] = 'm3u+none,mpeg4'
m3u8_url = self._download_nbcu_smil_and_extract_m3u8_url(tp_path, video_id, query)
if mobj := re.fullmatch(self._M3U8_RE, m3u8_url):
query['formats'] = 'mpeg4'
m3u8_tmpl = self._download_nbcu_smil_and_extract_m3u8_url(tp_path, video_id, query)
# Example: https://vod-lf-oneapp-prd.akamaized.net/prod/video/{folders}master_hls.m3u8
if '{folders}' in m3u8_tmpl:
self.write_debug('Found m3u8 URL template, formatting URL path')
m3u8_url = m3u8_tmpl.format(folders=mobj.group('folders'))
if '/mpeg_cenc' in m3u8_url or '/mpeg_cbcs' in m3u8_url:
self.report_drm(video_id)
return self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
def _extract_nbcu_video(self, url, display_id, old_ie_key=None):
webpage = self._download_webpage(url, display_id)
settings = self._search_json(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>',
webpage, 'settings', display_id)
query = {}
tve = extract_attributes(get_element_html_by_class('tve-video-deck-app', webpage) or '')
if tve:
account_pid = tve.get('data-mpx-media-account-pid') or tve['data-mpx-account-pid']
account_id = tve['data-mpx-media-account-id']
metadata = self._parse_json(
tve.get('data-normalized-video') or '', display_id, fatal=False, transform_source=unescapeHTML)
video_id = tve.get('data-guid') or metadata['guid']
if tve.get('data-entitlement') == 'auth':
auth = settings['tve_adobe_auth']
release_pid = tve['data-release-pid']
resource = self._get_mvpd_resource(
tve.get('data-adobe-pass-resource-id') or auth['adobePassResourceId'],
tve['data-title'], release_pid, tve.get('data-rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, auth['adobePassRequestorId'],
resource, auth['adobePassSoftwareStatement'])
else:
ls_playlist = traverse_obj(settings, (
'ls_playlist', lambda _, v: v['defaultGuid'], any, {require('LS playlist')}))
video_id = ls_playlist['defaultGuid']
account_pid = ls_playlist.get('mpxMediaAccountPid') or ls_playlist['mpxAccountPid']
account_id = ls_playlist['mpxMediaAccountId']
metadata = traverse_obj(ls_playlist, ('videos', lambda _, v: v['guid'] == video_id, any)) or {}
tp_path = f'{account_pid}/media/guid/{account_id}/{video_id}'
formats, subtitles = self._extract_nbcu_formats_and_subtitles(tp_path, video_id, query)
tp_metadata = self._download_theplatform_metadata(tp_path, video_id, fatal=False)
parsed_info = self._parse_theplatform_metadata(tp_metadata)
self._merge_subtitles(parsed_info['subtitles'], target=subtitles)
return {
**parsed_info,
**traverse_obj(metadata, {
'title': ('title', {str}),
'description': ('description', {str}),
'duration': ('durationInSeconds', {int_or_none}),
'timestamp': ('airDate', {parse_iso8601}),
'thumbnail': ('thumbnailUrl', {url_or_none}),
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode': ('episodeTitle', {str}),
'series': ('show', {str}),
}),
'id': video_id,
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
'_old_archive_ids': [make_archive_id(old_ie_key, video_id)] if old_ie_key else None,
}
class NBCIE(NBCUniversalBaseIE):
_VALID_URL = r'https?(?P<permalink>://(?:www\.)?nbc\.com/(?:classic-tv/)?[^/?#]+/video/[^/?#]+/(?P<id>\w+))'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.nbc.com/the-tonight-show/video/jimmy-fallon-surprises-fans-at-ben-jerrys/2848237', 'url': 'http://www.nbc.com/the-tonight-show/video/jimmy-fallon-surprises-fans-at-ben-jerrys/2848237',
@ -49,47 +153,20 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'episode_number': 86, 'episode_number': 86,
'season': 'Season 2', 'season': 'Season 2',
'season_number': 2, 'season_number': 2,
'series': 'Tonight Show: Jimmy Fallon', 'series': 'Tonight',
'duration': 237.0, 'duration': 236.504,
'chapters': 'count:1', 'tags': 'count:2',
'tags': 'count:4',
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'categories': ['Series/The Tonight Show Starring Jimmy Fallon'], 'categories': ['Series/The Tonight Show Starring Jimmy Fallon'],
'media_type': 'Full Episode', 'media_type': 'Full Episode',
'age_limit': 14,
'_old_archive_ids': ['theplatform 2848237'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{ {
'url': 'http://www.nbc.com/saturday-night-live/video/star-wars-teaser/2832821',
'info_dict': {
'id': '2832821',
'ext': 'mp4',
'title': 'Star Wars Teaser',
'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
'timestamp': 1417852800,
'upload_date': '20141206',
'uploader': 'NBCU-COM',
},
'skip': 'page not found',
},
{
# HLS streams requires the 'hdnea3' cookie
'url': 'http://www.nbc.com/Kings/video/goliath/n1806',
'info_dict': {
'id': '101528f5a9e8127b107e98c5e6ce4638',
'ext': 'mp4',
'title': 'Goliath',
'description': 'When an unknown soldier saves the life of the King\'s son in battle, he\'s thrust into the limelight and politics of the kingdom.',
'timestamp': 1237100400,
'upload_date': '20090315',
'uploader': 'NBCU-COM',
},
'skip': 'page not found',
},
{
# manifest url does not have extension
'url': 'https://www.nbc.com/the-golden-globe-awards/video/oprah-winfrey-receives-cecil-b-de-mille-award-at-the-2018-golden-globes/3646439', 'url': 'https://www.nbc.com/the-golden-globe-awards/video/oprah-winfrey-receives-cecil-b-de-mille-award-at-the-2018-golden-globes/3646439',
'info_dict': { 'info_dict': {
'id': '3646439', 'id': '3646439',
@ -99,48 +176,47 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'episode_number': 1, 'episode_number': 1,
'season': 'Season 75', 'season': 'Season 75',
'season_number': 75, 'season_number': 75,
'series': 'The Golden Globe Awards', 'series': 'Golden Globes',
'description': 'Oprah Winfrey receives the Cecil B. de Mille Award at the 75th Annual Golden Globe Awards.', 'description': 'Oprah Winfrey receives the Cecil B. de Mille Award at the 75th Annual Golden Globe Awards.',
'uploader': 'NBCU-COM', 'uploader': 'NBCU-COM',
'upload_date': '20180107', 'upload_date': '20180107',
'timestamp': 1515312000, 'timestamp': 1515312000,
'duration': 570.0, 'duration': 569.703,
'tags': 'count:8', 'tags': 'count:8',
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'chapters': 'count:1', 'media_type': 'Highlight',
'age_limit': 0,
'categories': ['Series/The Golden Globe Awards'],
'_old_archive_ids': ['theplatform 3646439'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{ {
# new video_id format # Needs to be extracted from webpage instead of GraphQL
'url': 'https://www.nbc.com/quantum-leap/video/bens-first-leap-nbcs-quantum-leap/NBCE125189978', 'url': 'https://www.nbc.com/paris2024/video/ali-truwit-found-purpose-pool-after-her-life-changed/para24_sww_alitruwittodayshow_240823',
'info_dict': { 'info_dict': {
'id': 'NBCE125189978', 'id': 'para24_sww_alitruwittodayshow_240823',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ben\'s First Leap | NBC\'s Quantum Leap', 'title': 'Ali Truwit found purpose in the pool after her life changed',
'description': 'md5:a82762449b7ec4bb83291a7b355ebf8e', 'description': 'md5:c16d7489e1516593de1cc5d3f39b9bdb',
'uploader': 'NBCU-COM', 'uploader': 'NBCU-SPORTS',
'series': 'Quantum Leap', 'duration': 311.077,
'season': 'Season 1',
'season_number': 1,
'episode': 'Ben\'s First Leap | NBC\'s Quantum Leap',
'episode_number': 1,
'duration': 170.171,
'chapters': [],
'timestamp': 1663956155,
'upload_date': '20220923',
'tags': 'count:10',
'age_limit': 0,
'thumbnail': r're:https?://.+\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'categories': ['Series/Quantum Leap 2022'], 'episode': 'Ali Truwit found purpose in the pool after her life changed',
'media_type': 'Highlight', 'timestamp': 1724435902.0,
'upload_date': '20240823',
'_old_archive_ids': ['theplatform para24_sww_alitruwittodayshow_240823'],
}, },
'params': { 'params': {
'skip_download': 'm3u8', 'skip_download': 'm3u8',
}, },
}, },
{
'url': 'https://www.nbc.com/quantum-leap/video/bens-first-leap-nbcs-quantum-leap/NBCE125189978',
'only_matching': True,
},
{ {
'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310', 'url': 'https://www.nbc.com/classic-tv/charles-in-charge/video/charles-in-charge-pilot/n3310',
'only_matching': True, 'only_matching': True,
@ -151,6 +227,7 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'only_matching': True, 'only_matching': True,
}, },
] ]
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI1Yzg2YjdkYy04NDI3LTRjNDUtOGQwZi1iNDkzYmE3MmQwYjQiLCJuYmYiOjE1Nzg3MDM2MzEsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTc4NzAzNjMxfQ.QQKIsBhAjGQTMdAqRTqhcz2Cddr4Y2hEjnSiOeKKki4nLrkDOsjQMmqeTR0hSRarraxH54wBgLvsxI7LHwKMvr7G8QpynNAxylHlQD3yhN9tFhxt4KR5wW3as02B-W2TznK9bhNWPKIyHND95Uo2Mi6rEQoq8tM9O09WPWaanE5BX_-r6Llr6dPq5F0Lpx2QOn2xYRb1T4nFxdFTNoss8GBds8OvChTiKpXMLHegLTc1OS4H_1a8tO_37jDwSdJuZ8iTyRLV4kZ2cpL6OL5JPMObD4-HQiec_dfcYgMKPiIfP9ZqdXpec2SVaCLsWEk86ZYvD97hLIQrK5rrKd1y-A'
def _real_extract(self, url): def _real_extract(self, url):
permalink, video_id = self._match_valid_url(url).groups() permalink, video_id = self._match_valid_url(url).groups()
@ -196,62 +273,50 @@ class NBCIE(ThePlatformIE): # XXX: Do not subclass from concrete IE
'userId': '0', 'userId': '0',
}), }),
})['data']['bonanzaPage']['metadata'] })['data']['bonanzaPage']['metadata']
query = {
'mbr': 'true', if not video_data:
'manifest': 'm3u', # Some videos are not available via GraphQL API
'switch': 'HLSServiceSecure', webpage = self._download_webpage(url, video_id)
} video_data = self._search_json(
r'<script>\s*PRELOAD\s*=', webpage, 'video data',
video_id)['pages'][urllib.parse.urlparse(url).path]['base']['metadata']
video_id = video_data['mpxGuid'] video_id = video_data['mpxGuid']
tp_path = 'NnzsPC/media/guid/{}/{}'.format(video_data.get('mpxAccountId') or '2410887629', video_id) tp_path = f'NnzsPC/media/guid/{video_data["mpxAccountId"]}/{video_id}'
tpm = self._download_theplatform_metadata(tp_path, video_id) tpm = self._download_theplatform_metadata(tp_path, video_id, fatal=False)
title = tpm.get('title') or video_data.get('secondaryTitle') title = traverse_obj(tpm, ('title', {str})) or video_data.get('secondaryTitle')
query = {}
if video_data.get('locked'): if video_data.get('locked'):
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
video_data.get('resourceId') or 'nbcentertainment', video_data['resourceId'], title, video_id, video_data.get('rating'))
title, video_id, video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource) url, video_id, 'nbcentertainment', resource, self._SOFTWARE_STATEMENT)
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/{}/{}'.format(video_data.get('mpxAccountId') or '2410887629', video_id),
query), {'force_smil_url': True})
# Empty string or 0 can be valid values for these. So the check must be `is None` formats, subtitles = self._extract_nbcu_formats_and_subtitles(tp_path, video_id, query)
description = video_data.get('description') parsed_info = self._parse_theplatform_metadata(tpm)
if description is None: self._merge_subtitles(parsed_info['subtitles'], target=subtitles)
description = tpm.get('description')
episode_number = int_or_none(video_data.get('episodeNumber'))
if episode_number is None:
episode_number = int_or_none(tpm.get('nbcu$airOrder'))
rating = video_data.get('rating')
if rating is None:
try_get(tpm, lambda x: x['ratings'][0]['rating'])
season_number = int_or_none(video_data.get('seasonNumber'))
if season_number is None:
season_number = int_or_none(tpm.get('nbcu$seasonNumber'))
series = video_data.get('seriesShortTitle')
if series is None:
series = tpm.get('nbcu$seriesShortTitle')
tags = video_data.get('keywords')
if tags is None or len(tags) == 0:
tags = tpm.get('keywords')
return { return {
'_type': 'url_transparent', **traverse_obj(video_data, {
'age_limit': parse_age_limit(rating), 'description': ('description', {str}, filter),
'description': description, 'episode': ('secondaryTitle', {str}, filter),
'episode': title, 'episode_number': ('episodeNumber', {int_or_none}),
'episode_number': episode_number, 'season_number': ('seasonNumber', {int_or_none}),
'age_limit': ('rating', {parse_age_limit}),
'tags': ('keywords', ..., {str}, filter, all, filter),
'series': ('seriesShortTitle', {str}),
}),
**parsed_info,
'id': video_id, 'id': video_id,
'ie_key': 'ThePlatform',
'season_number': season_number,
'series': series,
'tags': tags,
'title': title, 'title': title,
'url': theplatform_url, 'formats': formats,
'subtitles': subtitles,
'_old_archive_ids': [make_archive_id('ThePlatform', video_id)],
} }
class NBCSportsVPlayerIE(InfoExtractor): class NBCSportsVPlayerIE(InfoExtractor):
_WORKING = False
_VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/' _VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'
_VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)' _VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)'
_EMBED_REGEX = [rf'(?:iframe[^>]+|var video|div[^>]+data-(?:mpx-)?)[sS]rc\s?=\s?"(?P<url>{_VALID_URL_BASE}[^\"]+)'] _EMBED_REGEX = [rf'(?:iframe[^>]+|var video|div[^>]+data-(?:mpx-)?)[sS]rc\s?=\s?"(?P<url>{_VALID_URL_BASE}[^\"]+)']
@ -286,6 +351,7 @@ class NBCSportsVPlayerIE(InfoExtractor):
class NBCSportsIE(InfoExtractor): class NBCSportsIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TESTS = [{ _TESTS = [{
@ -321,6 +387,7 @@ class NBCSportsIE(InfoExtractor):
class NBCSportsStreamIE(AdobePassIE): class NBCSportsStreamIE(AdobePassIE):
_WORKING = False
_VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)' _VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559', 'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559',
@ -354,7 +421,7 @@ class NBCSportsStreamIE(AdobePassIE):
source_url = video_source['ottStreamUrl'] source_url = video_source['ottStreamUrl']
is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live' is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live'
resource = self._get_mvpd_resource('nbcsports', title, video_id, '') resource = self._get_mvpd_resource('nbcsports', title, video_id, '')
token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource) token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource, None) # XXX: None arg needs to be software_statement
tokenized_url = self._download_json( tokenized_url = self._download_json(
'https://token.playmakerservices.com/cdn', 'https://token.playmakerservices.com/cdn',
video_id, data=json.dumps({ video_id, data=json.dumps({
@ -534,22 +601,26 @@ class NBCOlympicsIE(InfoExtractor):
IE_NAME = 'nbcolympics' IE_NAME = 'nbcolympics'
_VALID_URL = r'https?://www\.nbcolympics\.com/videos?/(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://www\.nbcolympics\.com/videos?/(?P<id>[0-9a-z-]+)'
_TEST = { _TESTS = [{
# Geo-restricted to US # Geo-restricted to US
'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold', 'url': 'https://www.nbcolympics.com/videos/watch-final-minutes-team-usas-mens-basketball-gold',
'md5': '54fecf846d05429fbaa18af557ee523a',
'info_dict': { 'info_dict': {
'id': 'WjTBzDXx5AUq', 'id': 'SAwGfPlQ1q01',
'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Rose\'s son Leo was in tears after his dad won gold', 'display_id': 'watch-final-minutes-team-usas-mens-basketball-gold',
'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.', 'title': 'Watch the final minutes of Team USA\'s men\'s basketball gold',
'timestamp': 1471274964, 'description': 'md5:f704f591217305c9559b23b877aa8d31',
'upload_date': '20160815',
'uploader': 'NBCU-SPORTS', 'uploader': 'NBCU-SPORTS',
'duration': 387.053,
'thumbnail': r're:https://.+/.+\.jpg',
'chapters': [],
'timestamp': 1723346984,
'upload_date': '20240811',
}, },
'skip': '404 Not Found', }, {
} 'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -578,6 +649,7 @@ class NBCOlympicsIE(InfoExtractor):
class NBCOlympicsStreamIE(AdobePassIE): class NBCOlympicsStreamIE(AdobePassIE):
_WORKING = False
IE_NAME = 'nbcolympics:stream' IE_NAME = 'nbcolympics:stream'
_VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)' _VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)'
_TESTS = [ _TESTS = [
@ -630,7 +702,8 @@ class NBCOlympicsStreamIE(AdobePassIE):
event_config.get('resourceId', 'NBCOlympics'), event_config.get('resourceId', 'NBCOlympics'),
re.sub(r'[^\w\d ]+', '', event_config['eventTitle']), pid, re.sub(r'[^\w\d ]+', '', event_config['eventTitle']), pid,
event_config.get('ratingId', 'NO VALUE')) event_config.get('ratingId', 'NO VALUE'))
media_token = self._extract_mvpd_auth(url, pid, event_config.get('requestorId', 'NBCOlympics'), ap_resource) # XXX: The None arg below needs to be the software_statement for this requestor
media_token = self._extract_mvpd_auth(url, pid, event_config.get('requestorId', 'NBCOlympics'), ap_resource, None)
source_url = self._download_json( source_url = self._download_json(
'https://tokens.playmakerservices.com/', pid, 'Retrieving tokenized URL', 'https://tokens.playmakerservices.com/', pid, 'Retrieving tokenized URL',
@ -848,3 +921,178 @@ class NBCStationsIE(InfoExtractor):
'is_live': is_live, 'is_live': is_live,
**info, **info,
} }
class BravoTVIE(NBCUniversalBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:bravotv|oxygen)\.com/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'info_dict': {
'id': '3923059',
'ext': 'mp4',
'title': 'The Top Chef Season 16 Winner Is...',
'display_id': 'the-top-chef-season-16-winner-is',
'description': 'Find out who takes the title of Top Chef!',
'upload_date': '20190315',
'timestamp': 1552618860,
'season_number': 16,
'episode_number': 15,
'series': 'Top Chef',
'episode': 'Finale',
'duration': 190,
'season': 'Season 16',
'thumbnail': r're:^https://.+\.jpg',
'uploader': 'NBCU-BRAV',
'categories': ['Series', 'Series/Top Chef'],
'tags': 'count:10',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/top-chef/season-20/episode-1/london-calling',
'info_dict': {
'id': '9000234570',
'ext': 'mp4',
'title': 'London Calling',
'display_id': 'london-calling',
'description': 'md5:5af95a8cbac1856bd10e7562f86bb759',
'upload_date': '20230310',
'timestamp': 1678418100,
'season_number': 20,
'episode_number': 1,
'series': 'Top Chef',
'episode': 'London Calling',
'duration': 3266,
'season': 'Season 20',
'chapters': 'count:7',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Top Chef'],
'tags': 'count:10',
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-1/closing-night',
'info_dict': {
'id': '3692045',
'ext': 'mp4',
'title': 'Closing Night',
'display_id': 'closing-night',
'description': 'md5:c8a5bb523c8ef381f3328c6d9f1e4632',
'upload_date': '20230126',
'timestamp': 1674709200,
'season_number': 1,
'episode_number': 1,
'series': 'In Ice Cold Blood',
'episode': 'Closing Night',
'duration': 2629,
'season': 'Season 1',
'chapters': 'count:6',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/In Ice Cold Blood'],
'tags': ['ice-t', 'in ice cold blood', 'law and order', 'oxygen', 'true crime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'info_dict': {
'id': '3974019',
'ext': 'mp4',
'title': '\'Handling The Horwitz House After The Murder (Season 2, Episode 16)',
'display_id': 'handling-the-horwitz-house-after-the-murder-season-2',
'description': 'md5:f9d638dd6946a1c1c0533a9c6100eae5',
'upload_date': '20190618',
'timestamp': 1560819600,
'season_number': 2,
'episode_number': 16,
'series': 'In Ice Cold Blood',
'episode': 'Mother Vs Son',
'duration': 68,
'season': 'Season 2',
'thumbnail': r're:^https://.+\.jpg',
'age_limit': 14,
'uploader': 'NBCU-OXY',
'categories': ['Series/In Ice Cold Blood'],
'tags': ['in ice cold blood', 'ice-t', 'law and order', 'true crime', 'oxygen'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_nbcu_video(url, display_id)
class SyfyIE(NBCUniversalBaseIE):
_VALID_URL = r'https?://(?:www\.)?syfy\.com/[^/?#]+/(?:season-\d+/episode-\d+/(?:videos/)?|videos/)(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.syfy.com/face-off/season-13/episode-10/videos/keyed-up',
'info_dict': {
'id': '3774403',
'ext': 'mp4',
'display_id': 'keyed-up',
'title': 'Keyed Up',
'description': 'md5:feafd15bee449f212dcd3065bbe9a755',
'age_limit': 14,
'duration': 169,
'thumbnail': r're:https://www\.syfy\.com/.+/.+\.jpg',
'series': 'Face Off',
'season': 'Season 13',
'season_number': 13,
'episode': 'Through the Looking Glass Part 2',
'episode_number': 10,
'timestamp': 1533711618,
'upload_date': '20180808',
'media_type': 'Excerpt',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Face Off'],
'tags': 'count:15',
'_old_archive_ids': ['theplatform 3774403'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.syfy.com/face-off/season-13/episode-10/through-the-looking-glass-part-2',
'info_dict': {
'id': '3772391',
'ext': 'mp4',
'display_id': 'through-the-looking-glass-part-2',
'title': 'Through the Looking Glass Pt.2',
'description': 'md5:90bd5dcbf1059fe3296c263599af41d2',
'age_limit': 0,
'duration': 2599,
'thumbnail': r're:https://www\.syfy\.com/.+/.+\.jpg',
'chapters': [{'start_time': 0.0, 'end_time': 679.0, 'title': '<Untitled Chapter 1>'},
{'start_time': 679.0, 'end_time': 1040.967, 'title': '<Untitled Chapter 2>'},
{'start_time': 1040.967, 'end_time': 1403.0, 'title': '<Untitled Chapter 3>'},
{'start_time': 1403.0, 'end_time': 1870.0, 'title': '<Untitled Chapter 4>'},
{'start_time': 1870.0, 'end_time': 2496.967, 'title': '<Untitled Chapter 5>'},
{'start_time': 2496.967, 'end_time': 2599, 'title': '<Untitled Chapter 6>'}],
'series': 'Face Off',
'season': 'Season 13',
'season_number': 13,
'episode': 'Through the Looking Glass Part 2',
'episode_number': 10,
'timestamp': 1672570800,
'upload_date': '20230101',
'media_type': 'Full Episode',
'uploader': 'NBCU-MPAT',
'categories': ['Series/Face Off'],
'tags': 'count:15',
'_old_archive_ids': ['theplatform 3772391'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'This video requires AdobePass MSO credentials',
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._extract_nbcu_video(url, display_id, old_ie_key='ThePlatform')

View File

@ -32,7 +32,7 @@ from ..utils import (
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
from ..utils.traversal import find_element, traverse_obj from ..utils.traversal import find_element, require, traverse_obj
class NiconicoBaseIE(InfoExtractor): class NiconicoBaseIE(InfoExtractor):
@ -283,35 +283,54 @@ class NiconicoIE(NiconicoBaseIE):
lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1 lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1
yield video_fmt yield video_fmt
def _extract_server_response(self, webpage, video_id, fatal=True):
try:
return traverse_obj(
self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id),
('data', 'response', {dict}, {require('server response')}))
except ExtractorError:
if not fatal:
return {}
raise
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
try: try:
webpage, handle = self._download_webpage_handle( webpage, handle = self._download_webpage_handle(
'https://www.nicovideo.jp/watch/' + video_id, video_id) f'https://www.nicovideo.jp/watch/{video_id}', video_id,
headers=self.geo_verification_headers())
if video_id.startswith('so'): if video_id.startswith('so'):
video_id = self._match_id(handle.url) video_id = self._match_id(handle.url)
api_data = traverse_obj( api_data = self._extract_server_response(webpage, video_id)
self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id),
('data', 'response', {dict}))
if not api_data:
raise ExtractorError('Server response data not found')
except ExtractorError as e: except ExtractorError as e:
try: try:
api_data = self._download_json( api_data = self._download_json(
f'https://www.nicovideo.jp/api/watch/v3/{video_id}?_frontendId=6&_frontendVersion=0&actionTrackId=AAAAAAAAAA_{round(time.time() * 1000)}', video_id, f'https://www.nicovideo.jp/api/watch/v3/{video_id}', video_id,
note='Downloading API JSON', errnote='Unable to fetch data')['data'] 'Downloading API JSON', 'Unable to fetch data', query={
'_frontendId': '6',
'_frontendVersion': '0',
'actionTrackId': f'AAAAAAAAAA_{round(time.time() * 1000)}',
}, headers=self.geo_verification_headers())['data']
except ExtractorError: except ExtractorError:
if not isinstance(e.cause, HTTPError): if not isinstance(e.cause, HTTPError):
# Raise if original exception was from _parse_json or utils.traversal.require
raise raise
# The webpage server response has more detailed error info than the API response
webpage = e.cause.response.read().decode('utf-8', 'replace') webpage = e.cause.response.read().decode('utf-8', 'replace')
error_msg = self._html_search_regex( reason_code = self._extract_server_response(
r'(?s)<section\s+class="(?:(?:ErrorMessage|WatchExceptionPage-message)\s*)+">(.+?)</section>', webpage, video_id, fatal=False).get('reasonCode')
webpage, 'error reason', default=None) if not reason_code:
if not error_msg:
raise raise
raise ExtractorError(clean_html(error_msg), expected=True) if reason_code in ('DOMESTIC_VIDEO', 'HIGH_RISK_COUNTRY_VIDEO'):
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
elif reason_code == 'HIDDEN_VIDEO':
raise ExtractorError(
'The viewing period of this video has expired', expected=True)
elif reason_code == 'DELETED_VIDEO':
raise ExtractorError('This video has been deleted', expected=True)
raise ExtractorError(f'Niconico says: {reason_code}')
availability = self._availability(**(traverse_obj(api_data, ('payment', 'video', { availability = self._availability(**(traverse_obj(api_data, ('payment', 'video', {
'needs_premium': ('isPremium', {bool}), 'needs_premium': ('isPremium', {bool}),

View File

@ -340,8 +340,9 @@ class PatreonIE(PatreonBaseIE):
'channel_follower_count': ('attributes', 'patron_count', {int_or_none}), 'channel_follower_count': ('attributes', 'patron_count', {int_or_none}),
})) }))
# all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, Vimeo # Must be all-lowercase 'referer' so we can smuggle it to Generic, SproutVideo, and Vimeo.
headers = {'referer': 'https://patreon.com/'} # patreon.com URLs redirect to www.patreon.com; this matters when requesting mux.com m3u8s
headers = {'referer': 'https://www.patreon.com/'}
# handle Vimeo embeds # handle Vimeo embeds
if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo': if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo':
@ -352,7 +353,7 @@ class PatreonIE(PatreonBaseIE):
v_url, video_id, 'Checking Vimeo embed URL', headers=headers, v_url, video_id, 'Checking Vimeo embed URL', headers=headers,
fatal=False, errnote=False, expected_status=429): # 429 is TLS fingerprint rejection fatal=False, errnote=False, expected_status=429): # 429 is TLS fingerprint rejection
entries.append(self.url_result( entries.append(self.url_result(
VimeoIE._smuggle_referrer(v_url, 'https://patreon.com/'), VimeoIE._smuggle_referrer(v_url, headers['referer']),
VimeoIE, url_transparent=True)) VimeoIE, url_transparent=True))
embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none})) embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none}))
@ -379,11 +380,13 @@ class PatreonIE(PatreonBaseIE):
'url': post_file['url'], 'url': post_file['url'],
}) })
elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8': elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8':
formats, subtitles = self._extract_m3u8_formats_and_subtitles(post_file['url'], video_id) formats, subtitles = self._extract_m3u8_formats_and_subtitles(
post_file['url'], video_id, headers=headers)
entries.append({ entries.append({
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'http_headers': headers,
}) })
can_view_post = traverse_obj(attributes, 'current_user_can_view') can_view_post = traverse_obj(attributes, 'current_user_can_view')

View File

@ -10,7 +10,8 @@ from ..utils import (
class PicartoIE(InfoExtractor): class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)' IE_NAME = 'picarto'
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[^/#?]+)/?(?:$|[?#])'
_TEST = { _TEST = {
'url': 'https://picarto.tv/Setz', 'url': 'https://picarto.tv/Setz',
'info_dict': { 'info_dict': {
@ -89,7 +90,8 @@ class PicartoIE(InfoExtractor):
class PicartoVodIE(InfoExtractor): class PicartoVodIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?:videopopout|\w+/videos)/(?P<id>[^/?#&]+)' IE_NAME = 'picarto:vod'
_VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?:videopopout|\w+(?:/profile)?/videos)/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://picarto.tv/videopopout/ArtofZod_2017.12.12.00.13.23.flv', 'url': 'https://picarto.tv/videopopout/ArtofZod_2017.12.12.00.13.23.flv',
'md5': '3ab45ba4352c52ee841a28fb73f2d9ca', 'md5': '3ab45ba4352c52ee841a28fb73f2d9ca',
@ -111,6 +113,18 @@ class PicartoVodIE(InfoExtractor):
'channel': 'ArtofZod', 'channel': 'ArtofZod',
'age_limit': 18, 'age_limit': 18,
}, },
}, {
'url': 'https://picarto.tv/DrechuArt/profile/videos/400347',
'md5': 'f9ea54868b1d9dec40eb554b484cc7bf',
'info_dict': {
'id': '400347',
'ext': 'mp4',
'title': 'Welcome to the Show',
'thumbnail': r're:^https?://.*\.jpg',
'channel': 'DrechuArt',
'age_limit': 0,
},
}, { }, {
'url': 'https://picarto.tv/videopopout/Plague', 'url': 'https://picarto.tv/videopopout/Plague',
'only_matching': True, 'only_matching': True,

View File

@ -9,11 +9,10 @@ from ..utils import (
int_or_none, int_or_none,
join_nonempty, join_nonempty,
parse_qs, parse_qs,
traverse_obj,
update_url_query, update_url_query,
urlencode_postdata, urlencode_postdata,
) )
from ..utils.traversal import unpack from ..utils.traversal import traverse_obj, unpack
class PlaySuisseIE(InfoExtractor): class PlaySuisseIE(InfoExtractor):

View File

@ -5,11 +5,13 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
OnDemandPagedList, OnDemandPagedList,
float_or_none, float_or_none,
int_or_none,
orderedSet,
str_or_none, str_or_none,
str_to_int,
traverse_obj,
unified_timestamp, unified_timestamp,
url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class PodchaserIE(InfoExtractor): class PodchaserIE(InfoExtractor):
@ -21,24 +23,25 @@ class PodchaserIE(InfoExtractor):
'id': '104365585', 'id': '104365585',
'title': 'Ep. 285 freeze me off', 'title': 'Ep. 285 freeze me off',
'description': 'cam ahn', 'description': 'cam ahn',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:https?://.+/.+\.jpg',
'ext': 'mp3', 'ext': 'mp3',
'categories': ['Comedy'], 'categories': ['Comedy', 'News', 'Politics', 'Arts'],
'tags': ['comedy', 'dark humor'], 'tags': ['comedy', 'dark humor'],
'series': 'Cum Town', 'series': 'The Adam Friedland Show Podcast',
'duration': 3708, 'duration': 3708,
'timestamp': 1636531259, 'timestamp': 1636531259,
'upload_date': '20211110', 'upload_date': '20211110',
'average_rating': 4.0, 'average_rating': 4.0,
'series_id': '36924',
}, },
}, { }, {
'url': 'https://www.podchaser.com/podcasts/the-bone-zone-28853', 'url': 'https://www.podchaser.com/podcasts/the-bone-zone-28853',
'info_dict': { 'info_dict': {
'id': '28853', 'id': '28853',
'title': 'The Bone Zone', 'title': 'The Bone Zone',
'description': 'Podcast by The Bone Zone', 'description': r're:The official home of the Bone Zone podcast.+',
}, },
'playlist_count': 275, 'playlist_mincount': 275,
}, { }, {
'url': 'https://www.podchaser.com/podcasts/sean-carrolls-mindscape-scienc-699349/episodes', 'url': 'https://www.podchaser.com/podcasts/sean-carrolls-mindscape-scienc-699349/episodes',
'info_dict': { 'info_dict': {
@ -51,19 +54,33 @@ class PodchaserIE(InfoExtractor):
@staticmethod @staticmethod
def _parse_episode(episode, podcast): def _parse_episode(episode, podcast):
return { info = traverse_obj(episode, {
'id': str(episode.get('id')), 'id': ('id', {int}, {str_or_none}, {require('episode ID')}),
'title': episode.get('title'), 'title': ('title', {str}),
'description': episode.get('description'), 'description': ('description', {str}),
'url': episode.get('audio_url'), 'url': ('audio_url', {url_or_none}),
'thumbnail': episode.get('image_url'), 'thumbnail': ('image_url', {url_or_none}),
'duration': str_to_int(episode.get('length')), 'duration': ('length', {int_or_none}),
'timestamp': unified_timestamp(episode.get('air_date')), 'timestamp': ('air_date', {unified_timestamp}),
'average_rating': float_or_none(episode.get('rating')), 'average_rating': ('rating', {float_or_none}),
'categories': list(set(traverse_obj(podcast, (('summary', None), 'categories', ..., 'text')))), })
'tags': traverse_obj(podcast, ('tags', ..., 'text')), info.update(traverse_obj(podcast, {
'series': podcast.get('title'), 'series': ('title', {str}),
} 'series_id': ('id', {int}, {str_or_none}),
'categories': (('summary', None), 'categories', ..., 'text', {str}, filter, all, {orderedSet}),
'tags': ('tags', ..., 'text', {str}),
}))
info['vcodec'] = 'none'
if info.get('series_id'):
podcast_slug = traverse_obj(podcast, ('slug', {str})) or 'podcast'
episode_slug = traverse_obj(episode, ('slug', {str})) or 'episode'
info['webpage_url'] = '/'.join((
'https://www.podchaser.com/podcasts',
'-'.join((podcast_slug[:30].rstrip('-'), info['series_id'])),
'-'.join((episode_slug[:30].rstrip('-'), info['id']))))
return info
def _call_api(self, path, *args, **kwargs): def _call_api(self, path, *args, **kwargs):
return self._download_json(f'https://api.podchaser.com/{path}', *args, **kwargs) return self._download_json(f'https://api.podchaser.com/{path}', *args, **kwargs)
@ -93,5 +110,5 @@ class PodchaserIE(InfoExtractor):
OnDemandPagedList(functools.partial(self._fetch_page, podcast_id, podcast), self._PAGE_SIZE), OnDemandPagedList(functools.partial(self._fetch_page, podcast_id, podcast), self._PAGE_SIZE),
str_or_none(podcast.get('id')), podcast.get('title'), podcast.get('description')) str_or_none(podcast.get('id')), podcast.get('title'), podcast.get('description'))
episode = self._call_api(f'episodes/{episode_id}', episode_id) episode = self._call_api(f'podcasts/{podcast_id}/episodes/{episode_id}/player_ids', episode_id)
return self._parse_episode(episode, podcast) return self._parse_episode(episode, podcast)

View File

@ -697,7 +697,7 @@ class SoundcloudIE(SoundcloudBaseIE):
try: try:
return self._extract_info_dict(info, full_title, token) return self._extract_info_dict(info, full_title, token)
except ExtractorError as e: except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or not e.cause.status == 429: if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
raise raise
self.report_warning( self.report_warning(
'You have reached the API rate limit, which is ~600 requests per ' 'You have reached the API rate limit, which is ~600 requests per '

View File

@ -1,58 +0,0 @@
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class SyfyIE(AdobePassIE):
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
'info_dict': {
'id': '2968097',
'ext': 'mp4',
'title': 'The Internet Ruined My Life: Season 1 Trailer',
'description': 'One tweet, one post, one click, can destroy everything.',
'uploader': 'NBCU-MPAT',
'upload_date': '20170113',
'timestamp': 1484345640,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'Redirects to main page',
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
syfy_mpx = next(iter(self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
display_id)['syfy']['syfy_mpx'].values()))
video_id = syfy_mpx['mpxGUID']
title = syfy_mpx['episodeTitle']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
if syfy_mpx.get('entitlement') == 'auth':
resource = self._get_mvpd_resource(
'syfy', title, video_id,
syfy_mpx.get('mpxRating', 'TV-14'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'syfy', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
self._proto_relative_url(syfy_mpx['releaseURL']), query),
{'force_smil_url': True}),
'title': title,
'id': video_id,
'display_id': display_id,
}

View File

@ -32,6 +32,10 @@ class TBSIE(TurnerBaseIE):
'url': 'http://www.tntdrama.com/movies/star-wars-a-new-hope', 'url': 'http://www.tntdrama.com/movies/star-wars-a-new-hope',
'only_matching': True, 'only_matching': True,
}] }]
_SOFTWARE_STATEMENT_MAP = {
'tbs': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJkZTA0NTYxZS1iMTFhLTRlYTgtYTg5NC01NjI3MGM1NmM2MWIiLCJuYmYiOjE1MzcxODkzOTAsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTg5MzkwfQ.Z7ny66kaqNDdCHf9Y9KsV12LrBxrLkGGxlYe2XGm6qsw2T-k1OCKC1TMzeqiZP735292MMRAQkcJDKrMIzNbAuf9nCdIcv4kE1E2nqUnjPMBduC1bHffZp8zlllyrN2ElDwM8Vhwv_5nElLRwWGEt0Kaq6KJAMZA__WDxKWC18T-wVtsOZWXQpDqO7nByhfj2t-Z8c3TUNVsA_wHgNXlkzJCZ16F2b7yGLT5ZhLPupOScd3MXC5iPh19HSVIok22h8_F_noTmGzmMnIRQi6bWYWK2zC7TQ_MsYHfv7V6EaG5m1RKZTV6JAwwoJQF_9ByzarLV1DGwZxD9-eQdqswvg',
'tntdrama': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwOTMxYTU4OS1jZjEzLTRmNjMtYTJmYy03MzhjMjE1NWU5NjEiLCJuYmYiOjE1MzcxOTA4MjcsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwODI3fQ.AucKvtws7oekTXi80_zX4-BlgJD9GLvlOI9FlBCjdlx7Pa3eJ0AqbogynKMiatMbnLOTMHGjd7tTiq422unmZjBz70dhePAe9BbW0dIo7oQ57vZ-VBYw_tWYRPmON61MwAbLVlqROD3n_zURs85S8TlkQx9aNx9x_riGGELjd8l05CVa_pOluNhYvuIFn6wmrASOKI1hNEblBDWh468UWP571-fe4zzi0rlYeeHd-cjvtWvOB3bQsWrUVbK4pRmqvzEH59j0vNF-ihJF9HncmUicYONe47Mib3elfMok23v4dB1_UAlQY_oawfNcynmEnJQCcqFmbHdEwTW6gMiYsA',
}
def _real_extract(self, url): def _real_extract(self, url):
site, path, display_id = self._match_valid_url(url).groups() site, path, display_id = self._match_valid_url(url).groups()
@ -48,7 +52,7 @@ class TBSIE(TurnerBaseIE):
drupal_settings['ngtv_token_url']).query) drupal_settings['ngtv_token_url']).query)
info = self._extract_ngtv_info( info = self._extract_ngtv_info(
media_id, tokenizer_query, { media_id, tokenizer_query, self._SOFTWARE_STATEMENT_MAP[site], {
'url': url, 'url': url,
'site_name': site[:3].upper(), 'site_name': site[:3].upper(),
'auth_required': video_data.get('authRequired') == '1' or is_live, 'auth_required': video_data.get('authRequired') == '1' or is_live,

View File

@ -156,6 +156,7 @@ class TeamcocoIE(TeamcocoBaseIE):
class ConanClassicIE(TeamcocoBaseIE): class ConanClassicIE(TeamcocoBaseIE):
_WORKING = False
_VALID_URL = r'https?://(?:(?:www\.)?conanclassic|conan25\.teamcoco)\.com/(?P<id>([^/]+/)*[^/?#]+)' _VALID_URL = r'https?://(?:(?:www\.)?conanclassic|conan25\.teamcoco)\.com/(?P<id>([^/]+/)*[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://conanclassic.com/video/ice-cube-kevin-hart-conan-share-lyft', 'url': 'https://conanclassic.com/video/ice-cube-kevin-hart-conan-share-lyft',
@ -263,7 +264,7 @@ class ConanClassicIE(TeamcocoBaseIE):
info.update(self._extract_ngtv_info(media_id, { info.update(self._extract_ngtv_info(media_id, {
'accessToken': token, 'accessToken': token,
'accessTokenType': 'jws', 'accessTokenType': 'jws',
})) }, None)) # TODO: the None arg needs to be the AdobePass software_statement
else: else:
formats, subtitles = self._get_formats_and_subtitles( formats, subtitles = self._get_formats_and_subtitles(
traverse_obj(response, ('data', 'findRecordVideoMetadata')), video_id) traverse_obj(response, ('data', 'findRecordVideoMetadata')), video_id)

View File

@ -6,32 +6,32 @@ from ..utils import int_or_none, traverse_obj, url_or_none, urljoin
class TenPlayIE(InfoExtractor): class TenPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})' IE_NAME = '10play'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/?#]+/)+(?P<id>tpv\d{6}[a-z]{5})'
_NETRC_MACHINE = '10play' _NETRC_MACHINE = '10play'
_TESTS = [{ _TESTS = [{
'url': 'https://10play.com.au/neighbours/web-extras/season-41/heres-a-first-look-at-mischa-bartons-neighbours-debut/tpv230911hyxnz', # Geo-restricted to Australia
'url': 'https://10play.com.au/australian-survivor/web-extras/season-10-brains-v-brawn-ii/myless-journey/tpv250414jdmtf',
'info_dict': { 'info_dict': {
'id': '6336940246112', 'id': '7440980000013868',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Here\'s A First Look At Mischa Barton\'s Neighbours Debut', 'title': 'Myles\'s Journey',
'alt_title': 'Here\'s A First Look At Mischa Barton\'s Neighbours Debut', 'alt_title': 'Myles\'s Journey',
'description': 'Neighbours Premieres Monday, September 18 At 4:30pm On 10 And 10 Play And 6:30pm On 10 Peach', 'description': 'Relive Myles\'s epic Brains V Brawn II journey to reach the game\'s final two',
'duration': 74,
'season': 'Season 41',
'season_number': 41,
'series': 'Neighbours',
'thumbnail': r're:https://.*\.jpg',
'uploader': 'Channel 10', 'uploader': 'Channel 10',
'age_limit': 15,
'timestamp': 1694386800,
'upload_date': '20230910',
'uploader_id': '2199827728001', 'uploader_id': '2199827728001',
'age_limit': 15,
'duration': 249,
'thumbnail': r're:https://.+/.+\.jpg',
'series': 'Australian Survivor',
'season': 'Season 10',
'season_number': 10,
'timestamp': 1744629420,
'upload_date': '20250414',
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': True,
},
'skip': 'Only available in Australia',
}, { }, {
# Geo-restricted to Australia
'url': 'https://10play.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp', 'url': 'https://10play.com.au/neighbours/episodes/season-42/episode-9107/tpv240902nzqyp',
'info_dict': { 'info_dict': {
'id': '9000000000091177', 'id': '9000000000091177',
@ -45,17 +45,38 @@ class TenPlayIE(InfoExtractor):
'season': 'Season 42', 'season': 'Season 42',
'season_number': 42, 'season_number': 42,
'series': 'Neighbours', 'series': 'Neighbours',
'thumbnail': r're:https://.*\.jpg', 'thumbnail': r're:https://.+/.+\.jpg',
'age_limit': 15, 'age_limit': 15,
'timestamp': 1725517860, 'timestamp': 1725517860,
'upload_date': '20240905', 'upload_date': '20240905',
'uploader': 'Channel 10', 'uploader': 'Channel 10',
'uploader_id': '2199827728001', 'uploader_id': '2199827728001',
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': True, }, {
# Geo-restricted to Australia; upgrading the m3u8 quality fails and we need the fallback
'url': 'https://10play.com.au/tiny-chef-show/episodes/season-1/episode-2/tpv240228pofvt',
'info_dict': {
'id': '9000000000084116',
'ext': 'mp4',
'uploader': 'Channel 10',
'uploader_id': '2199827728001',
'duration': 1297,
'title': 'The Tiny Chef Show - S1 Ep. 2',
'alt_title': 'S1 Ep. 2 - Popcorn/banana',
'description': 'md5:d4758b52b5375dfaa67a78261dcb5763',
'age_limit': 0,
'series': 'The Tiny Chef Show',
'season_number': 1,
'episode_number': 2,
'timestamp': 1747957740,
'thumbnail': r're:https://.+/.+\.jpg',
'upload_date': '20250522',
'season': 'Season 1',
'episode': 'Episode 2',
}, },
'skip': 'Only available in Australia', 'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information: HTTP Error 502'],
}, { }, {
'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc', 'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
'only_matching': True, 'only_matching': True,
@ -86,8 +107,11 @@ class TenPlayIE(InfoExtractor):
if '10play-not-in-oz' in m3u8_url: if '10play-not-in-oz' in m3u8_url:
self.raise_geo_restricted(countries=['AU']) self.raise_geo_restricted(countries=['AU'])
# Attempt to get a higher quality stream # Attempt to get a higher quality stream
m3u8_url = m3u8_url.replace(',150,75,55,0000', ',300,150,75,55,0000') formats = self._extract_m3u8_formats(
formats = self._extract_m3u8_formats(m3u8_url, content_id, 'mp4') m3u8_url.replace(',150,75,55,0000', ',300,150,75,55,0000'),
content_id, 'mp4', fatal=False)
if not formats:
formats = self._extract_m3u8_formats(m3u8_url, content_id, 'mp4')
return { return {
'id': content_id, 'id': content_id,
@ -112,21 +136,22 @@ class TenPlayIE(InfoExtractor):
class TenPlaySeasonIE(InfoExtractor): class TenPlaySeasonIE(InfoExtractor):
IE_NAME = '10play:season'
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])' _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?P<show>[^/?#]+)/episodes/(?P<season>[^/?#]+)/?(?:$|[?#])'
_TESTS = [{ _TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-14', 'url': 'https://10play.com.au/masterchef/episodes/season-15',
'info_dict': { 'info_dict': {
'title': 'Season 14', 'title': 'Season 15',
'id': 'MjMyOTIy', 'id': 'MTQ2NjMxOQ==',
}, },
'playlist_mincount': 64, 'playlist_mincount': 50,
}, { }, {
'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2022', 'url': 'https://10play.com.au/the-bold-and-the-beautiful-fast-tracked/episodes/season-2024',
'info_dict': { 'info_dict': {
'title': 'Season 2022', 'title': 'Season 2024',
'id': 'Mjc0OTIw', 'id': 'Mjc0OTIw',
}, },
'playlist_mincount': 256, 'playlist_mincount': 159,
}] }]
def _entries(self, load_more_url, display_id=None): def _entries(self, load_more_url, display_id=None):

View File

@ -12,11 +12,13 @@ from ..utils import (
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
parse_age_limit,
parse_qs, parse_qs,
traverse_obj, traverse_obj,
unsmuggle_url, unsmuggle_url,
update_url, update_url,
update_url_query, update_url_query,
url_or_none,
urlhandle_detect_ext, urlhandle_detect_ext,
xpath_with_ns, xpath_with_ns,
) )
@ -63,62 +65,53 @@ class ThePlatformBaseIE(AdobePassIE):
return formats, subtitles return formats, subtitles
def _download_theplatform_metadata(self, path, video_id): def _download_theplatform_metadata(self, path, video_id, fatal=True):
info_url = f'http://link.theplatform.{self._TP_TLD}/s/{path}?format=preview' return self._download_json(
return self._download_json(info_url, video_id) f'https://link.theplatform.{self._TP_TLD}/s/{path}', video_id,
fatal=fatal, query={'format': 'preview'}) or {}
def _parse_theplatform_metadata(self, info): @staticmethod
subtitles = {} def _parse_theplatform_metadata(tp_metadata):
captions = info.get('captions') def site_specific_filter(*fields):
if isinstance(captions, list): return lambda k, v: v and k.endswith(tuple(f'${f}' for f in fields))
for caption in captions:
lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
subtitles.setdefault(lang, []).append({
'ext': mimetype2ext(mime),
'url': src,
})
duration = info.get('duration') info = traverse_obj(tp_metadata, {
tp_chapters = info.get('chapters', []) 'title': ('title', {str}),
chapters = [] 'episode': ('title', {str}),
if tp_chapters: 'description': ('description', {str}),
def _add_chapter(start_time, end_time): 'thumbnail': ('defaultThumbnailUrl', {url_or_none}),
start_time = float_or_none(start_time, 1000) 'duration': ('duration', {float_or_none(scale=1000)}),
end_time = float_or_none(end_time, 1000) 'timestamp': ('pubDate', {float_or_none(scale=1000)}),
if start_time is None or end_time is None: 'uploader': ('billingCode', {str}),
return 'creators': ('author', {str}, filter, all, filter),
chapters.append({ 'categories': (
'start_time': start_time, 'categories', lambda _, v: v.get('label') in ['category', None],
'end_time': end_time, 'name', {str}, filter, all, filter),
}) 'tags': ('keywords', {str}, filter, {lambda x: re.split(r'[;,]\s?', x)}, filter),
'age_limit': ('ratings', ..., 'rating', {parse_age_limit}, any),
'season_number': (site_specific_filter('seasonNumber'), {int_or_none}, any),
'episode_number': (site_specific_filter('episodeNumber', 'airOrder'), {int_or_none}, any),
'series': (site_specific_filter('show', 'seriesTitle', 'seriesShortTitle'), (None, ...), {str}, any),
'location': (site_specific_filter('region'), {str}, any),
'media_type': (site_specific_filter('programmingType', 'type'), {str}, any),
})
for chapter in tp_chapters[:-1]: chapters = traverse_obj(tp_metadata, ('chapters', ..., {
_add_chapter(chapter.get('startTime'), chapter.get('endTime')) 'start_time': ('startTime', {float_or_none(scale=1000)}),
_add_chapter(tp_chapters[-1].get('startTime'), tp_chapters[-1].get('endTime') or duration) 'end_time': ('endTime', {float_or_none(scale=1000)}),
}))
# Ignore pointless single chapters from short videos that span the entire video's duration
if len(chapters) > 1 or traverse_obj(chapters, (0, 'end_time')):
info['chapters'] = chapters
def extract_site_specific_field(field): info['subtitles'] = {}
# A number of sites have custom-prefixed keys, e.g. 'cbc$seasonNumber' for caption in traverse_obj(tp_metadata, ('captions', lambda _, v: url_or_none(v['src']))):
return traverse_obj(info, lambda k, v: v and k.endswith(f'${field}'), get_all=False) info['subtitles'].setdefault(caption.get('lang') or 'en', []).append({
'url': caption['src'],
'ext': mimetype2ext(caption.get('type')),
})
return { return info
'title': info['title'],
'subtitles': subtitles,
'description': info['description'],
'thumbnail': info['defaultThumbnailUrl'],
'duration': float_or_none(duration, 1000),
'timestamp': int_or_none(info.get('pubDate'), 1000) or None,
'uploader': info.get('billingCode'),
'chapters': chapters,
'creator': traverse_obj(info, ('author', {str})) or None,
'categories': traverse_obj(info, (
'categories', lambda _, v: v.get('label') in ('category', None), 'name', {str})) or None,
'tags': traverse_obj(info, ('keywords', {lambda x: re.split(r'[;,]\s?', x) if x else None})),
'location': extract_site_specific_field('region'),
'series': extract_site_specific_field('show') or extract_site_specific_field('seriesTitle'),
'season_number': int_or_none(extract_site_specific_field('seasonNumber')),
'episode_number': int_or_none(extract_site_specific_field('episodeNumber')),
'media_type': extract_site_specific_field('programmingType') or extract_site_specific_field('type'),
}
def _extract_theplatform_metadata(self, path, video_id): def _extract_theplatform_metadata(self, path, video_id):
info = self._download_theplatform_metadata(path, video_id) info = self._download_theplatform_metadata(path, video_id)

121
yt_dlp/extractor/toutiao.py Normal file
View File

@ -0,0 +1,121 @@
import json
import urllib.parse
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
str_or_none,
try_call,
url_or_none,
)
from ..utils.traversal import find_element, traverse_obj
class ToutiaoIE(InfoExtractor):
IE_NAME = 'toutiao'
IE_DESC = '今日头条'
_VALID_URL = r'https?://www\.toutiao\.com/video/(?P<id>\d+)/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.toutiao.com/video/7505382061495176511/',
'info_dict': {
'id': '7505382061495176511',
'ext': 'mp4',
'title': '新疆多地现不明飞行物,目击者称和月亮一样亮,几秒内突然加速消失,气象部门回应',
'comment_count': int,
'duration': 9.753,
'like_count': int,
'release_date': '20250517',
'release_timestamp': 1747483344,
'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
'uploader': '极目新闻',
'uploader_id': 'MS4wLjABAAAAeateBb9Su8I3MJOZozmvyzWktmba5LMlliRDz1KffnM',
'view_count': int,
},
}, {
'url': 'https://www.toutiao.com/video/7479446610359878153/',
'info_dict': {
'id': '7479446610359878153',
'ext': 'mp4',
'title': '小伙竟然利用两块磁铁制作成磁力减震器,简直太有创意了!',
'comment_count': int,
'duration': 118.374,
'like_count': int,
'release_date': '20250308',
'release_timestamp': 1741444368,
'thumbnail': r're:https?://p\d+-sign\.toutiaoimg\.com/.+$',
'uploader': '小莉创意发明',
'uploader_id': 'MS4wLjABAAAA4f7d4mwtApALtHIiq-QM20dwXqe32NUz0DeWF7wbHKw',
'view_count': int,
},
}]
def _real_initialize(self):
if self._get_cookies('https://www.toutiao.com').get('ttwid'):
return
urlh = self._request_webpage(
'https://ttwid.bytedance.com/ttwid/union/register/', None,
'Fetching ttwid', 'Unable to fetch ttwid', headers={
'Content-Type': 'application/json',
}, data=json.dumps({
'aid': 24,
'needFid': False,
'region': 'cn',
'service': 'www.toutiao.com',
'union': True,
}).encode(),
)
if ttwid := try_call(lambda: self._get_cookies(urlh.url)['ttwid'].value):
self._set_cookie('.toutiao.com', 'ttwid', ttwid)
return
self.raise_login_required()
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = traverse_obj(webpage, (
{find_element(tag='script', id='RENDER_DATA')},
{urllib.parse.unquote}, {json.loads}, 'data', 'initialVideo',
))
formats = []
for video in traverse_obj(video_data, (
'videoPlayInfo', 'video_list', lambda _, v: v['main_url'],
)):
formats.append({
'url': video['main_url'],
**traverse_obj(video, ('video_meta', {
'acodec': ('audio_profile', {str}),
'asr': ('audio_sample_rate', {int_or_none}),
'audio_channels': ('audio_channels', {float_or_none}, {int_or_none}),
'ext': ('vtype', {str}),
'filesize': ('size', {int_or_none}),
'format_id': ('definition', {str}),
'fps': ('fps', {int_or_none}),
'height': ('vheight', {int_or_none}),
'tbr': ('real_bitrate', {float_or_none(scale=1000)}),
'vcodec': ('codec_type', {str}),
'width': ('vwidth', {int_or_none}),
})),
})
return {
'id': video_id,
'formats': formats,
**traverse_obj(video_data, {
'comment_count': ('commentCount', {int_or_none}),
'duration': ('videoPlayInfo', 'video_duration', {float_or_none}),
'like_count': ('repinCount', {int_or_none}),
'release_timestamp': ('publishTime', {int_or_none}),
'thumbnail': (('poster', 'coverUrl'), {url_or_none}, any),
'title': ('title', {str}),
'uploader': ('userInfo', 'name', {str}),
'uploader_id': ('userInfo', 'userId', {str_or_none}),
'view_count': ('playCount', {int_or_none}),
'webpage_url': ('detailUrl', {url_or_none}),
}),
}

View File

@ -20,6 +20,7 @@ class TruTVIE(TurnerBaseIE):
'skip_download': True, 'skip_download': True,
}, },
} }
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhYzQyOTkwMi0xMDYzLTQyNTQtYWJlYS1iZTY2ODM4MTVmZGIiLCJuYmYiOjE1MzcxOTA4NjgsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNTM3MTkwODY4fQ.ewXl5LDMDvvx3nDXV4jCdSwUq_sOluKoOVsIjznAo6Zo4zrGe9rjlZ9DOmQKW66g6VRMexJsJ5vM1EkY8TC5-YcQw_BclK1FPGO1rH3Wf7tX_l0b1BVbSJQKIj9UgqDp_QbGcBXz24kN4So3U22mhs6di9PYyyfG68ccKL2iRprcVKWCslIHwUF-T7FaEqb0K57auilxeW1PONG2m-lIAcZ62DUwqXDWvw0CRoWI08aVVqkkhnXaSsQfLs5Ph1Pfh9Oq3g_epUm9Ss45mq6XM7gbOb5omTcKLADRKK-PJVB_JXnZnlsXbG0ttKE1cTKJ738qu7j4aipYTf-W0nKF5Q'
def _real_extract(self, url): def _real_extract(self, url):
series_slug, clip_slug, video_id = self._match_valid_url(url).groups() series_slug, clip_slug, video_id = self._match_valid_url(url).groups()
@ -39,7 +40,7 @@ class TruTVIE(TurnerBaseIE):
title = video_data['title'].strip() title = video_data['title'].strip()
info = self._extract_ngtv_info( info = self._extract_ngtv_info(
media_id, {}, { media_id, {}, self._SOFTWARE_STATEMENT, {
'url': url, 'url': url,
'site_name': 'truTV', 'site_name': 'truTV',
'auth_required': video_data.get('isAuthRequired'), 'auth_required': video_data.get('isAuthRequired'),

View File

@ -22,7 +22,7 @@ class TurnerBaseIE(AdobePassIE):
def _extract_timestamp(self, video_data): def _extract_timestamp(self, video_data):
return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts')) return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, custom_tokenizer_query=None): def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data, software_statement, custom_tokenizer_query=None):
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*' secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
token = self._AKAMAI_SPE_TOKEN_CACHE.get(secure_path) token = self._AKAMAI_SPE_TOKEN_CACHE.get(secure_path)
if not token: if not token:
@ -34,7 +34,8 @@ class TurnerBaseIE(AdobePassIE):
else: else:
query['videoId'] = content_id query['videoId'] = content_id
if ap_data.get('auth_required'): if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], content_id, ap_data['site_name'], ap_data['site_name']) query['accessToken'] = self._extract_mvpd_auth(
ap_data['url'], content_id, ap_data['site_name'], ap_data['site_name'], software_statement)
auth = self._download_xml( auth = self._download_xml(
tokenizer_src, content_id, query=query) tokenizer_src, content_id, query=query)
error_msg = xpath_text(auth, 'error/msg') error_msg = xpath_text(auth, 'error/msg')
@ -46,7 +47,7 @@ class TurnerBaseIE(AdobePassIE):
self._AKAMAI_SPE_TOKEN_CACHE[secure_path] = token self._AKAMAI_SPE_TOKEN_CACHE[secure_path] = token
return video_url + '?hdnea=' + token return video_url + '?hdnea=' + token
def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}, fatal=False): def _extract_cvp_info(self, data_src, video_id, software_statement, path_data={}, ap_data={}, fatal=False):
video_data = self._download_xml( video_data = self._download_xml(
data_src, video_id, data_src, video_id,
transform_source=lambda s: fix_xml_ampersands(s).strip(), transform_source=lambda s: fix_xml_ampersands(s).strip(),
@ -101,7 +102,7 @@ class TurnerBaseIE(AdobePassIE):
video_url = self._add_akamai_spe_token( video_url = self._add_akamai_spe_token(
secure_path_data['tokenizer_src'], secure_path_data['tokenizer_src'],
secure_path_data['media_src'] + video_url, secure_path_data['media_src'] + video_url,
content_id, ap_data) content_id, ap_data, software_statement)
elif not re.match('https?://', video_url): elif not re.match('https?://', video_url):
base_path_data = path_data.get(ext, path_data.get('default', {})) base_path_data = path_data.get(ext, path_data.get('default', {}))
media_src = base_path_data.get('media_src') media_src = base_path_data.get('media_src')
@ -215,10 +216,12 @@ class TurnerBaseIE(AdobePassIE):
'is_live': is_live, 'is_live': is_live,
} }
def _extract_ngtv_info(self, media_id, tokenizer_query, ap_data=None): def _extract_ngtv_info(self, media_id, tokenizer_query, software_statement, ap_data=None):
if not isinstance(ap_data, dict):
ap_data = {}
is_live = ap_data.get('is_live') is_live = ap_data.get('is_live')
streams_data = self._download_json( streams_data = self._download_json(
f'http://medium.ngtv.io/media/{media_id}/tv', f'https://medium.ngtv.io/media/{media_id}/tv',
media_id)['media']['tv'] media_id)['media']['tv']
duration = None duration = None
chapters = [] chapters = []
@ -230,8 +233,8 @@ class TurnerBaseIE(AdobePassIE):
continue continue
if stream_data.get('playlistProtection') == 'spe': if stream_data.get('playlistProtection') == 'spe':
m3u8_url = self._add_akamai_spe_token( m3u8_url = self._add_akamai_spe_token(
'http://token.ngtv.io/token/token_spe', 'https://token.ngtv.io/token/token_spe',
m3u8_url, media_id, ap_data or {}, tokenizer_query) m3u8_url, media_id, ap_data, software_statement, tokenizer_query)
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, media_id, 'mp4', m3u8_id='hls', live=is_live, fatal=False)) m3u8_url, media_id, 'mp4', m3u8_id='hls', live=is_live, fatal=False))

View File

@ -1,4 +1,5 @@
import base64 import base64
import hashlib
import itertools import itertools
import re import re
@ -16,6 +17,7 @@ from ..utils import (
str_to_int, str_to_int,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
@ -171,6 +173,10 @@ class TwitCastingIE(InfoExtractor):
'player': 'pc_web', 'player': 'pc_web',
}) })
password_params = {
'word': hashlib.md5(video_password.encode()).hexdigest(),
} if video_password else None
formats = [] formats = []
# low: 640x360, medium: 1280x720, high: 1920x1080 # low: 640x360, medium: 1280x720, high: 1920x1080
qq = qualities(['low', 'medium', 'high']) qq = qualities(['low', 'medium', 'high'])
@ -178,7 +184,7 @@ class TwitCastingIE(InfoExtractor):
'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]), 'tc-hls', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
)): )):
formats.append({ formats.append({
'url': m3u8_url, 'url': update_url_query(m3u8_url, password_params),
'format_id': f'hls-{quality}', 'format_id': f'hls-{quality}',
'ext': 'mp4', 'ext': 'mp4',
'quality': qq(quality), 'quality': qq(quality),
@ -192,7 +198,7 @@ class TwitCastingIE(InfoExtractor):
'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]), 'llfmp4', 'streams', {dict.items}, lambda _, v: url_or_none(v[1]),
)): )):
formats.append({ formats.append({
'url': ws_url, 'url': update_url_query(ws_url, password_params),
'format_id': f'ws-{mode}', 'format_id': f'ws-{mode}',
'ext': 'mp4', 'ext': 'mp4',
'quality': qq(mode), 'quality': qq(mode),

View File

@ -187,7 +187,7 @@ class TwitchBaseIE(InfoExtractor):
'url': thumbnail, 'url': thumbnail,
}] if thumbnail else None }] if thumbnail else None
def _extract_twitch_m3u8_formats(self, path, video_id, token, signature): def _extract_twitch_m3u8_formats(self, path, video_id, token, signature, live_from_start=False):
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={ f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={
'allow_source': 'true', 'allow_source': 'true',
@ -204,7 +204,10 @@ class TwitchBaseIE(InfoExtractor):
for fmt in formats: for fmt in formats:
if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'): if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'):
# mpegts does not yet have proper support for av1 # mpegts does not yet have proper support for av1
fmt['downloader_options'] = {'ffmpeg_args_out': ['-f', 'mp4']} fmt.setdefault('downloader_options', {}).update({'ffmpeg_args_out': ['-f', 'mp4']})
if live_from_start:
fmt.setdefault('downloader_options', {}).update({'ffmpeg_args': ['-live_start_index', '0']})
fmt['is_from_start'] = True
return formats return formats
@ -550,7 +553,8 @@ class TwitchVodIE(TwitchBaseIE):
access_token = self._download_access_token(vod_id, 'video', 'id') access_token = self._download_access_token(vod_id, 'video', 'id')
formats = self._extract_twitch_m3u8_formats( formats = self._extract_twitch_m3u8_formats(
'vod', vod_id, access_token['value'], access_token['signature']) 'vod', vod_id, access_token['value'], access_token['signature'],
live_from_start=self.get_param('live_from_start'))
formats.extend(self._extract_storyboard(vod_id, video.get('storyboard'), info.get('duration'))) formats.extend(self._extract_storyboard(vod_id, video.get('storyboard'), info.get('duration')))
self._prefer_source(formats) self._prefer_source(formats)
@ -633,6 +637,10 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
_PAGE_LIMIT = 100 _PAGE_LIMIT = 100
def _entries(self, channel_name, *args): def _entries(self, channel_name, *args):
"""
Subclasses must define _make_variables() and _extract_entry(),
as well as set _OPERATION_NAME, _ENTRY_KIND, _EDGE_KIND, and _NODE_KIND
"""
cursor = None cursor = None
variables_common = self._make_variables(channel_name, *args) variables_common = self._make_variables(channel_name, *args)
entries_key = f'{self._ENTRY_KIND}s' entries_key = f'{self._ENTRY_KIND}s'
@ -672,7 +680,22 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
break break
class TwitchVideosIE(TwitchPlaylistBaseIE): class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
_OPERATION_NAME = 'FilterableVideoTower_Videos'
_ENTRY_KIND = 'video'
_EDGE_KIND = 'VideoEdge'
_NODE_KIND = 'Video'
@staticmethod
def _make_variables(channel_name, broadcast_type, sort):
return {
'channelOwnerLogin': channel_name,
'broadcastType': broadcast_type,
'videoSort': sort.upper(),
}
class TwitchVideosIE(TwitchVideosBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)' _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)'
_TESTS = [{ _TESTS = [{
@ -751,11 +774,6 @@ class TwitchVideosIE(TwitchPlaylistBaseIE):
'views': 'Popular', 'views': 'Popular',
} }
_OPERATION_NAME = 'FilterableVideoTower_Videos'
_ENTRY_KIND = 'video'
_EDGE_KIND = 'VideoEdge'
_NODE_KIND = 'Video'
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return (False return (False
@ -764,14 +782,6 @@ class TwitchVideosIE(TwitchPlaylistBaseIE):
TwitchVideosCollectionsIE)) TwitchVideosCollectionsIE))
else super().suitable(url)) else super().suitable(url))
@staticmethod
def _make_variables(channel_name, broadcast_type, sort):
return {
'channelOwnerLogin': channel_name,
'broadcastType': broadcast_type,
'videoSort': sort.upper(),
}
@staticmethod @staticmethod
def _extract_entry(node): def _extract_entry(node):
return _make_video_result(node) return _make_video_result(node)
@ -919,7 +929,7 @@ class TwitchVideosCollectionsIE(TwitchPlaylistBaseIE):
playlist_title=f'{channel_name} - Collections') playlist_title=f'{channel_name} - Collections')
class TwitchStreamIE(TwitchBaseIE): class TwitchStreamIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:stream' IE_NAME = 'twitch:stream'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -982,6 +992,7 @@ class TwitchStreamIE(TwitchBaseIE):
'skip_download': 'Livestream', 'skip_download': 'Livestream',
}, },
}] }]
_PAGE_LIMIT = 1
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
@ -995,6 +1006,20 @@ class TwitchStreamIE(TwitchBaseIE):
TwitchClipsIE)) TwitchClipsIE))
else super().suitable(url)) else super().suitable(url))
@staticmethod
def _extract_entry(node):
if not isinstance(node, dict) or not node.get('id'):
return None
video_id = node['id']
return {
'_type': 'url',
'ie_key': TwitchVodIE.ie_key(),
'id': 'v' + video_id,
'url': f'https://www.twitch.tv/videos/{video_id}',
'title': node.get('title'),
'timestamp': unified_timestamp(node.get('publishedAt')) or 0,
}
def _real_extract(self, url): def _real_extract(self, url):
channel_name = self._match_id(url).lower() channel_name = self._match_id(url).lower()
@ -1029,6 +1054,16 @@ class TwitchStreamIE(TwitchBaseIE):
if not stream: if not stream:
raise UserNotLive(video_id=channel_name) raise UserNotLive(video_id=channel_name)
timestamp = unified_timestamp(stream.get('createdAt'))
if self.get_param('live_from_start'):
self.to_screen(f'{channel_name}: Extracting VOD to download live from start')
entry = next(self._entries(channel_name, None, 'time'), None)
if entry and entry.pop('timestamp') >= (timestamp or float('inf')):
return entry
self.report_warning(
'Unable to extract the VOD associated with this livestream', video_id=channel_name)
access_token = self._download_access_token( access_token = self._download_access_token(
channel_name, 'stream', 'channelName') channel_name, 'stream', 'channelName')
@ -1038,7 +1073,6 @@ class TwitchStreamIE(TwitchBaseIE):
self._prefer_source(formats) self._prefer_source(formats)
view_count = stream.get('viewers') view_count = stream.get('viewers')
timestamp = unified_timestamp(stream.get('createdAt'))
sq_user = try_get(gql, lambda x: x[1]['data']['user'], dict) or {} sq_user = try_get(gql, lambda x: x[1]['data']['user'], dict) or {}
uploader = sq_user.get('displayName') uploader = sq_user.get('displayName')

View File

@ -20,7 +20,6 @@ from ..utils import (
remove_end, remove_end,
str_or_none, str_or_none,
strip_or_none, strip_or_none,
traverse_obj,
truncate_string, truncate_string,
try_call, try_call,
try_get, try_get,
@ -29,6 +28,7 @@ from ..utils import (
url_or_none, url_or_none,
xpath_text, xpath_text,
) )
from ..utils.traversal import require, traverse_obj
class TwitterBaseIE(InfoExtractor): class TwitterBaseIE(InfoExtractor):
@ -1342,7 +1342,7 @@ class TwitterIE(TwitterBaseIE):
'tweet_mode': 'extended', 'tweet_mode': 'extended',
}) })
except ExtractorError as e: except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or not e.cause.status == 429: if not isinstance(e.cause, HTTPError) or e.cause.status != 429:
raise raise
self.report_warning('Rate-limit exceeded; falling back to syndication endpoint') self.report_warning('Rate-limit exceeded; falling back to syndication endpoint')
status = self._call_syndication_api(twid) status = self._call_syndication_api(twid)
@ -1596,8 +1596,8 @@ class TwitterAmplifyIE(TwitterBaseIE):
class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE): class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
IE_NAME = 'twitter:broadcast' IE_NAME = 'twitter:broadcast'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/(?P<type>broadcasts|events)/(?P<id>\w+)'
_TESTS = [{ _TESTS = [{
# untitled Periscope video # untitled Periscope video
'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj', 'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
@ -1605,6 +1605,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1yNGaQLWpejGj', 'id': '1yNGaQLWpejGj',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Andrea May Sahouri - Periscope Broadcast', 'title': 'Andrea May Sahouri - Periscope Broadcast',
'display_id': '1yNGaQLWpejGj',
'uploader': 'Andrea May Sahouri', 'uploader': 'Andrea May Sahouri',
'uploader_id': 'andreamsahouri', 'uploader_id': 'andreamsahouri',
'uploader_url': 'https://twitter.com/andreamsahouri', 'uploader_url': 'https://twitter.com/andreamsahouri',
@ -1612,6 +1613,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20200601', 'upload_date': '20200601',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
}, },
}, { }, {
'url': 'https://twitter.com/i/broadcasts/1ZkKzeyrPbaxv', 'url': 'https://twitter.com/i/broadcasts/1ZkKzeyrPbaxv',
@ -1619,6 +1622,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1ZkKzeyrPbaxv', 'id': '1ZkKzeyrPbaxv',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starship | SN10 | High-Altitude Flight Test', 'title': 'Starship | SN10 | High-Altitude Flight Test',
'display_id': '1ZkKzeyrPbaxv',
'uploader': 'SpaceX', 'uploader': 'SpaceX',
'uploader_id': 'SpaceX', 'uploader_id': 'SpaceX',
'uploader_url': 'https://twitter.com/SpaceX', 'uploader_url': 'https://twitter.com/SpaceX',
@ -1626,6 +1630,8 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20210303', 'upload_date': '20210303',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
}, },
}, { }, {
'url': 'https://twitter.com/i/broadcasts/1OyKAVQrgzwGb', 'url': 'https://twitter.com/i/broadcasts/1OyKAVQrgzwGb',
@ -1633,6 +1639,7 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'id': '1OyKAVQrgzwGb', 'id': '1OyKAVQrgzwGb',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Starship Flight Test', 'title': 'Starship Flight Test',
'display_id': '1OyKAVQrgzwGb',
'uploader': 'SpaceX', 'uploader': 'SpaceX',
'uploader_id': 'SpaceX', 'uploader_id': 'SpaceX',
'uploader_url': 'https://twitter.com/SpaceX', 'uploader_url': 'https://twitter.com/SpaceX',
@ -1640,21 +1647,58 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
'upload_date': '20230420', 'upload_date': '20230420',
'thumbnail': r're:^https?://[^?#]+\.jpg\?token=', 'thumbnail': r're:^https?://[^?#]+\.jpg\?token=',
'view_count': int, 'view_count': int,
'concurrent_view_count': int,
'live_status': 'was_live',
},
}, {
'url': 'https://x.com/i/events/1910629646300762112',
'info_dict': {
'id': '1LyxBWDRNqyKN',
'ext': 'mp4',
'title': '#ガンニバル ウォッチパーティー',
'concurrent_view_count': int,
'display_id': '1910629646300762112',
'live_status': 'was_live',
'release_date': '20250423',
'release_timestamp': 1745409000,
'tags': ['ガンニバル'],
'thumbnail': r're:https?://[^?#]+\.jpg\?token=',
'timestamp': 1745403328,
'upload_date': '20250423',
'uploader': 'ディズニープラス公式',
'uploader_id': 'DisneyPlusJP',
'uploader_url': 'https://twitter.com/DisneyPlusJP',
'view_count': int,
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
broadcast_id = self._match_id(url) broadcast_type, display_id = self._match_valid_url(url).group('type', 'id')
if broadcast_type == 'events':
timeline = self._call_api(
f'live_event/1/{display_id}/timeline.json', display_id)
broadcast_id = traverse_obj(timeline, (
'twitter_objects', 'broadcasts', ..., ('id', 'broadcast_id'),
{str}, any, {require('broadcast ID')}))
else:
broadcast_id = display_id
broadcast = self._call_api( broadcast = self._call_api(
'broadcasts/show.json', broadcast_id, 'broadcasts/show.json', broadcast_id,
{'ids': broadcast_id})['broadcasts'][broadcast_id] {'ids': broadcast_id})['broadcasts'][broadcast_id]
if not broadcast: if not broadcast:
raise ExtractorError('Broadcast no longer exists', expected=True) raise ExtractorError('Broadcast no longer exists', expected=True)
info = self._parse_broadcast_data(broadcast, broadcast_id) info = self._parse_broadcast_data(broadcast, broadcast_id)
info['title'] = broadcast.get('status') or info.get('title') info.update({
info['uploader_id'] = broadcast.get('twitter_username') or info.get('uploader_id') 'display_id': display_id,
info['uploader_url'] = format_field(broadcast, 'twitter_username', 'https://twitter.com/%s', default=None) 'title': broadcast.get('status') or info.get('title'),
'uploader_id': broadcast.get('twitter_username') or info.get('uploader_id'),
'uploader_url': format_field(
broadcast, 'twitter_username', 'https://twitter.com/%s', default=None),
})
if info['live_status'] == 'is_upcoming': if info['live_status'] == 'is_upcoming':
self.raise_no_formats('This live broadcast has not yet started', expected=True)
return info return info
media_key = broadcast['media_key'] media_key = broadcast['media_key']

View File

@ -32,6 +32,7 @@ class ViceBaseIE(InfoExtractor):
class ViceIE(ViceBaseIE, AdobePassIE): class ViceIE(ViceBaseIE, AdobePassIE):
_WORKING = False
IE_NAME = 'vice' IE_NAME = 'vice'
_VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})' _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
_EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})'] _EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})']
@ -99,6 +100,7 @@ class ViceIE(ViceBaseIE, AdobePassIE):
'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1', 'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1',
'only_matching': True, 'only_matching': True,
}] }]
_SOFTWARE_STATEMENT = 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiIwMTVjODBlZC04ZDcxLTQ4ZGEtOTZkZi00NzU5NjIwNzJlYTQiLCJuYmYiOjE2NjgwMTM0ODQsImlzcyI6ImF1dGguYWRvYmUuY29tIiwiaWF0IjoxNjY4MDEzNDg0fQ.CjhUnTrlh-bmYnEFHyC2Y4it5Y_Zfza1x66O4-ki5gBR7JT6aUunYI_YflXomQPACriMpObkITFz4grVaDwdd8Xp9hrQ2R0SwRBdaklkdy1_j68RqSP5PnexJIa0q_ThtOwfRBd5uGcb33nMJ9Qs92W4kVXuca0Ta-i7SJyWgXUaPDlRDdgyCL3hKj5wuM7qUIwrd9A5CMm-j3dMIBCDgw7X6TwRK65eUQe6gTWqcvL2yONHHTpmIfeOTUxGwwKFr29COOTBowm0VJ6HE08xjXCShP08Neusu-JsgkjzhkEbiDE2531EKgfAki_7WCd2JUZVsAsCusv4a1maokk6NA'
def _real_extract(self, url): def _real_extract(self, url):
locale, video_id = self._match_valid_url(url).groups() locale, video_id = self._match_valid_url(url).groups()
@ -116,7 +118,7 @@ class ViceIE(ViceBaseIE, AdobePassIE):
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
'VICELAND', title, video_id, rating) 'VICELAND', title, video_id, rating)
query['tvetoken'] = self._extract_mvpd_auth( query['tvetoken'] = self._extract_mvpd_auth(
url, video_id, 'VICELAND', resource) url, video_id, 'VICELAND', resource, self._SOFTWARE_STATEMENT)
# signature generation algorithm is reverse engineered from signatureGenerator in # signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in # webpack:///../shared/~/vice-player/dist/js/vice-player.js in
@ -181,6 +183,7 @@ class ViceIE(ViceBaseIE, AdobePassIE):
class ViceShowIE(ViceBaseIE): class ViceShowIE(ViceBaseIE):
_WORKING = False
IE_NAME = 'vice:show' IE_NAME = 'vice:show'
_VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
_PAGE_SIZE = 25 _PAGE_SIZE = 25
@ -221,6 +224,7 @@ class ViceShowIE(ViceBaseIE):
class ViceArticleIE(ViceBaseIE): class ViceArticleIE(ViceBaseIE):
_WORKING = False
IE_NAME = 'vice:article' IE_NAME = 'vice:article'
_VALID_URL = r'https?://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)' _VALID_URL = r'https?://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'

View File

@ -3,6 +3,7 @@ import functools
import itertools import itertools
import json import json
import re import re
import time
import urllib.parse import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
@ -13,10 +14,12 @@ from ..utils import (
OnDemandPagedList, OnDemandPagedList,
clean_html, clean_html,
determine_ext, determine_ext,
filter_dict,
get_element_by_class, get_element_by_class,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
js_to_json, js_to_json,
jwt_decode_hs256,
merge_dicts, merge_dicts,
parse_filesize, parse_filesize,
parse_iso8601, parse_iso8601,
@ -39,6 +42,9 @@ class VimeoBaseInfoExtractor(InfoExtractor):
_NETRC_MACHINE = 'vimeo' _NETRC_MACHINE = 'vimeo'
_LOGIN_REQUIRED = False _LOGIN_REQUIRED = False
_LOGIN_URL = 'https://vimeo.com/log_in' _LOGIN_URL = 'https://vimeo.com/log_in'
_REFERER_HINT = (
'Cannot download embed-only video without embedding URL. Please call yt-dlp '
'with the URL of the page that embeds this video.')
_IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw==' _IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw=='
_IOS_CLIENT_HEADERS = { _IOS_CLIENT_HEADERS = {
'Accept': 'application/vnd.vimeo.*+json; version=3.4.10', 'Accept': 'application/vnd.vimeo.*+json; version=3.4.10',
@ -47,6 +53,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
} }
_IOS_OAUTH_CACHE_KEY = 'oauth-token-ios' _IOS_OAUTH_CACHE_KEY = 'oauth-token-ios'
_ios_oauth_token = None _ios_oauth_token = None
_viewer_info = None
@staticmethod @staticmethod
def _smuggle_referrer(url, referrer_url): def _smuggle_referrer(url, referrer_url):
@ -60,8 +67,21 @@ class VimeoBaseInfoExtractor(InfoExtractor):
headers['Referer'] = data['referer'] headers['Referer'] = data['referer']
return url, data, headers return url, data, headers
def _jwt_is_expired(self, token):
return jwt_decode_hs256(token)['exp'] - time.time() < 120
def _fetch_viewer_info(self, display_id=None, fatal=True):
if self._viewer_info and not self._jwt_is_expired(self._viewer_info['jwt']):
return self._viewer_info
self._viewer_info = self._download_json(
'https://vimeo.com/_next/viewer', display_id, 'Downloading web token info',
'Failed to download web token info', fatal=fatal, headers={'Accept': 'application/json'})
return self._viewer_info
def _perform_login(self, username, password): def _perform_login(self, username, password):
viewer = self._download_json('https://vimeo.com/_next/viewer', None, 'Downloading login token') viewer = self._fetch_viewer_info()
data = { data = {
'action': 'login', 'action': 'login',
'email': username, 'email': username,
@ -96,11 +116,10 @@ class VimeoBaseInfoExtractor(InfoExtractor):
expected=True) expected=True)
return password return password
def _verify_video_password(self, video_id): def _verify_video_password(self, video_id, path=None):
video_password = self._get_video_password() video_password = self._get_video_password()
token = self._download_json( token = self._fetch_viewer_info(video_id)['xsrft']
'https://vimeo.com/_next/viewer', video_id, 'Downloading viewer info')['xsrft'] url = join_nonempty('https://vimeo.com', path, video_id, delim='/')
url = f'https://vimeo.com/{video_id}'
try: try:
self._request_webpage( self._request_webpage(
f'{url}/password', video_id, f'{url}/password', video_id,
@ -117,6 +136,10 @@ class VimeoBaseInfoExtractor(InfoExtractor):
raise ExtractorError('Wrong password', expected=True) raise ExtractorError('Wrong password', expected=True)
raise raise
def _extract_config_url(self, webpage, **kwargs):
return self._html_search_regex(
r'\bdata-config-url="([^"]+)"', webpage, 'config URL', **kwargs)
def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs): def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
vimeo_config = self._search_regex( vimeo_config = self._search_regex(
r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));', r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));',
@ -164,6 +187,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
sep_pattern = r'/sep/video/' sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'): for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items(): for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
# TODO: Also extract 'avc_url'? Investigate if there are 'hevc_url', 'av1_url'?
manifest_url = cdn_data.get('url') manifest_url = cdn_data.get('url')
if not manifest_url: if not manifest_url:
continue continue
@ -244,7 +268,10 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'live_status': live_status, 'live_status': live_status,
'release_timestamp': traverse_obj(live_event, ('ingest', 'scheduled_start_time', {parse_iso8601})), 'release_timestamp': traverse_obj(live_event, ('ingest', (
('scheduled_start_time', {parse_iso8601}),
('start_time', {int_or_none}),
), any)),
# Note: Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps # Note: Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
# at the same time without actual units specified. # at the same time without actual units specified.
'_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'), '_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'),
@ -353,7 +380,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
(?: (?:
(?P<u>user)| (?P<u>user)|
(?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/) (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?:.*?/)?? (?:(?!event/).*?/)??
(?P<q> (?P<q>
(?: (?:
play_redirect_hls| play_redirect_hls|
@ -933,8 +960,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'vimeo\.com/(?:album|showcase)/([^/]+)', url, 'album id', default=None) r'vimeo\.com/(?:album|showcase)/([^/]+)', url, 'album id', default=None)
if not album_id: if not album_id:
return return
viewer = self._download_json( viewer = self._fetch_viewer_info(album_id, fatal=False)
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
if not viewer: if not viewer:
webpage = self._download_webpage(url, album_id) webpage = self._download_webpage(url, album_id)
viewer = self._parse_json(self._search_regex( viewer = self._parse_json(self._search_regex(
@ -992,9 +1018,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
raise raise
errmsg = error.cause.response.read() errmsg = error.cause.response.read()
if b'Because of its privacy settings, this video cannot be played here' in errmsg: if b'Because of its privacy settings, this video cannot be played here' in errmsg:
raise ExtractorError( raise ExtractorError(self._REFERER_HINT, expected=True)
'Cannot download embed-only video without embedding URL. Please call yt-dlp '
'with the URL of the page that embeds this video.', expected=True)
# 403 == vimeo.com TLS fingerprint or DC IP block; 429 == player.vimeo.com TLS FP block # 403 == vimeo.com TLS fingerprint or DC IP block; 429 == player.vimeo.com TLS FP block
status = error.cause.status status = error.cause.status
dcip_msg = 'If you are using a data center IP or VPN/proxy, your IP may be blocked' dcip_msg = 'If you are using a data center IP or VPN/proxy, your IP may be blocked'
@ -1039,8 +1063,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
channel_id = self._search_regex( channel_id = self._search_regex(
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None) r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
if channel_id: if channel_id:
config_url = self._html_search_regex( config_url = self._extract_config_url(webpage, default=None)
r'\bdata-config-url="([^"]+)"', webpage, 'config URL', default=None)
video_description = clean_html(get_element_by_class('description', webpage)) video_description = clean_html(get_element_by_class('description', webpage))
info_dict.update({ info_dict.update({
'channel_id': channel_id, 'channel_id': channel_id,
@ -1333,8 +1356,7 @@ class VimeoAlbumIE(VimeoBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
album_id = self._match_id(url) album_id = self._match_id(url)
viewer = self._download_json( viewer = self._fetch_viewer_info(album_id, fatal=False)
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
if not viewer: if not viewer:
webpage = self._download_webpage(url, album_id) webpage = self._download_webpage(url, album_id)
viewer = self._parse_json(self._search_regex( viewer = self._parse_json(self._search_regex(
@ -1626,3 +1648,377 @@ class VimeoProIE(VimeoBaseInfoExtractor):
return self.url_result(vimeo_url, VimeoIE, video_id, url_transparent=True, return self.url_result(vimeo_url, VimeoIE, video_id, url_transparent=True,
description=description) description=description)
class VimeoEventIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:event'
_VALID_URL = r'''(?x)
https?://(?:www\.)?vimeo\.com/event/(?P<id>\d+)(?:/
(?:
(?:embed/)?(?P<unlisted_hash>[\da-f]{10})|
videos/(?P<video_id>\d+)
)
)?'''
_EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=["\'](?P<url>https?://vimeo\.com/event/\d+/embed(?:[/?][^"\']*)?)["\'][^>]*>']
_TESTS = [{
# stream_privacy.view: 'anybody'
'url': 'https://vimeo.com/event/5116195',
'info_dict': {
'id': '1082194134',
'ext': 'mp4',
'display_id': '5116195',
'title': 'Skidmore College Commencement 2025',
'description': 'md5:1902dd5165d21f98aa198297cc729d23',
'uploader': 'Skidmore College',
'uploader_id': 'user116066434',
'uploader_url': 'https://vimeo.com/user116066434',
'comment_count': int,
'like_count': int,
'duration': 9810,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'timestamp': 1747502974,
'upload_date': '20250517',
'release_timestamp': 1747502998,
'release_date': '20250517',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# stream_privacy.view: 'embed_only'
'url': 'https://vimeo.com/event/5034253/embed',
'info_dict': {
'id': '1071439154',
'ext': 'mp4',
'display_id': '5034253',
'title': 'Advancing Humans with AI',
'description': r're:AI is here to stay, but how do we ensure that people flourish in a world of pervasive AI use.{322}$',
'uploader': 'MIT Media Lab',
'uploader_id': 'mitmedialab',
'uploader_url': 'https://vimeo.com/mitmedialab',
'duration': 23235,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'chapters': 'count:37',
'release_timestamp': 1744290000,
'release_date': '20250410',
'live_status': 'was_live',
},
'params': {
'skip_download': 'm3u8',
'http_headers': {'Referer': 'https://www.media.mit.edu/events/aha-symposium/'},
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# Last entry on 2nd page of the 37 video playlist, but use clip_to_play_id API param shortcut
'url': 'https://vimeo.com/event/4753126/videos/1046153257',
'info_dict': {
'id': '1046153257',
'ext': 'mp4',
'display_id': '4753126',
'title': 'January 12, 2025 The True Vine (Pastor John Mindrup)',
'description': 'The True Vine (Pastor \tJohn Mindrup)',
'uploader': 'Salem United Church of Christ',
'uploader_id': 'user230181094',
'uploader_url': 'https://vimeo.com/user230181094',
'comment_count': int,
'like_count': int,
'duration': 4962,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'timestamp': 1736702464,
'upload_date': '20250112',
'release_timestamp': 1736702543,
'release_date': '20250112',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# "24/7" livestream
'url': 'https://vimeo.com/event/4768062',
'info_dict': {
'id': '1079901414',
'ext': 'mp4',
'display_id': '4768062',
'title': r're:GRACELAND CAM \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'description': '24/7 camera at Graceland Mansion',
'uploader': 'Elvis Presley\'s Graceland',
'uploader_id': 'visitgraceland',
'uploader_url': 'https://vimeo.com/visitgraceland',
'release_timestamp': 1745975450,
'release_date': '20250430',
'live_status': 'is_live',
},
'params': {'skip_download': 'livestream'},
}, {
# stream_privacy.view: 'unlisted' with unlisted_hash in URL path (stream_privacy.embed: 'whitelist')
'url': 'https://vimeo.com/event/4259978/3db517c479',
'info_dict': {
'id': '939104114',
'ext': 'mp4',
'display_id': '4259978',
'title': 'Enhancing Credibility in Your Community Science Project',
'description': 'md5:eab953341168b9c146bc3cfe3f716070',
'uploader': 'NOAA Research',
'uploader_id': 'noaaresearch',
'uploader_url': 'https://vimeo.com/noaaresearch',
'comment_count': int,
'like_count': int,
'duration': 3961,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'timestamp': 1716408008,
'upload_date': '20240522',
'release_timestamp': 1716408062,
'release_date': '20240522',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# "done" event with video_id in URL and unlisted_hash in VimeoIE URL
'url': 'https://vimeo.com/event/595460/videos/498149131/',
'info_dict': {
'id': '498149131',
'ext': 'mp4',
'display_id': '595460',
'title': '2021 Eighth Annual John Cardinal Foley Lecture on Social Communications',
'description': 'Replay: https://vimeo.com/catholicphilly/review/498149131/544f26a12f',
'uploader': 'Kearns Media Consulting LLC',
'uploader_id': 'kearnsmediaconsulting',
'uploader_url': 'https://vimeo.com/kearnsmediaconsulting',
'comment_count': int,
'like_count': int,
'duration': 4466,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'timestamp': 1612228466,
'upload_date': '20210202',
'release_timestamp': 1612228538,
'release_date': '20210202',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# stream_privacy.view: 'password'; stream_privacy.embed: 'public'
'url': 'https://vimeo.com/event/4940578',
'info_dict': {
'id': '1059263570',
'ext': 'mp4',
'display_id': '4940578',
'title': 'TMAC AKC AGILITY 2-22-2025',
'uploader': 'Paws \'N Effect',
'uploader_id': 'pawsneffect',
'uploader_url': 'https://vimeo.com/pawsneffect',
'comment_count': int,
'like_count': int,
'duration': 33115,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'timestamp': 1740261836,
'upload_date': '20250222',
'release_timestamp': 1740261873,
'release_date': '20250222',
'live_status': 'was_live',
},
'params': {
'videopassword': '22',
'skip_download': 'm3u8',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
# API serves a playlist of 37 videos, but the site only streams the newest one (changes every Sunday)
'url': 'https://vimeo.com/event/4753126',
'only_matching': True,
}, {
# Scheduled for 2025.05.15 but never started; "unavailable"; stream_privacy.view: "anybody"
'url': 'https://vimeo.com/event/5120811/embed',
'only_matching': True,
}, {
'url': 'https://vimeo.com/event/5112969/embed?muted=1',
'only_matching': True,
}, {
'url': 'https://vimeo.com/event/5097437/embed/interaction?muted=1',
'only_matching': True,
}, {
'url': 'https://vimeo.com/event/5113032/embed?autoplay=1&muted=1',
'only_matching': True,
}, {
# Ended livestream with video_id
'url': 'https://vimeo.com/event/595460/videos/507329569/',
'only_matching': True,
}, {
# stream_privacy.view: 'unlisted' with unlisted_hash in URL path (stream_privacy.embed: 'public')
'url': 'https://vimeo.com/event/4606123/embed/358d60ce2e',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
# Same result as https://vimeo.com/event/5034253/embed
'url': 'https://www.media.mit.edu/events/aha-symposium/',
'info_dict': {
'id': '1071439154',
'ext': 'mp4',
'display_id': '5034253',
'title': 'Advancing Humans with AI',
'description': r're:AI is here to stay, but how do we ensure that people flourish in a world of pervasive AI use.{322}$',
'uploader': 'MIT Media Lab',
'uploader_id': 'mitmedialab',
'uploader_url': 'https://vimeo.com/mitmedialab',
'duration': 23235,
'thumbnail': r're:https://i\.vimeocdn\.com/video/\d+-[\da-f]+-d',
'chapters': 'count:37',
'release_timestamp': 1744290000,
'release_date': '20250410',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}]
_EVENT_FIELDS = (
'title', 'uri', 'schedule', 'stream_description', 'stream_privacy.embed', 'stream_privacy.view',
'clip_to_play.name', 'clip_to_play.uri', 'clip_to_play.config_url', 'clip_to_play.live.status',
'clip_to_play.privacy.embed', 'clip_to_play.privacy.view', 'clip_to_play.password',
'streamable_clip.name', 'streamable_clip.uri', 'streamable_clip.config_url', 'streamable_clip.live.status',
)
_VIDEOS_FIELDS = ('items', 'uri', 'name', 'config_url', 'duration', 'live.status')
def _call_events_api(
self, event_id, ep=None, unlisted_hash=None, note=None,
fields=(), referrer=None, query=None, headers=None,
):
resource = join_nonempty('event', ep, note, 'API JSON', delim=' ')
return self._download_json(
join_nonempty(
'https://api.vimeo.com/live_events',
join_nonempty(event_id, unlisted_hash, delim=':'), ep, delim='/'),
event_id, f'Downloading {resource}', f'Failed to download {resource}',
query=filter_dict({
'fields': ','.join(fields) or [],
# Correct spelling with 4 R's is deliberate
'referrer': referrer,
**(query or {}),
}), headers=filter_dict({
'Accept': 'application/json',
'Authorization': f'jwt {self._fetch_viewer_info(event_id)["jwt"]}',
'Referer': referrer,
**(headers or {}),
}))
@staticmethod
def _extract_video_id_and_unlisted_hash(video):
if not traverse_obj(video, ('uri', {lambda x: x.startswith('/videos/')})):
return None, None
video_id, _, unlisted_hash = video['uri'][8:].partition(':')
return video_id, unlisted_hash or None
def _vimeo_url_result(self, video_id, unlisted_hash=None, event_id=None):
# VimeoIE can extract more metadata and formats for was_live event videos
return self.url_result(
join_nonempty('https://vimeo.com', video_id, unlisted_hash, delim='/'), VimeoIE,
video_id, display_id=event_id, live_status='was_live', url_transparent=True)
@classmethod
def _extract_embed_urls(cls, url, webpage):
for embed_url in super()._extract_embed_urls(url, webpage):
yield cls._smuggle_referrer(embed_url, url)
def _real_extract(self, url):
url, _, headers = self._unsmuggle_headers(url)
# XXX: Keep key name in sync with _unsmuggle_headers
referrer = headers.get('Referer')
event_id, unlisted_hash, video_id = self._match_valid_url(url).group('id', 'unlisted_hash', 'video_id')
for retry in (False, True):
try:
live_event_data = self._call_events_api(
event_id, unlisted_hash=unlisted_hash, fields=self._EVENT_FIELDS,
referrer=referrer, query={'clip_to_play_id': video_id or '0'},
headers={'Accept': 'application/vnd.vimeo.*+json;version=3.4.9'})
break
except ExtractorError as e:
if retry or not isinstance(e.cause, HTTPError) or e.cause.status not in (400, 403):
raise
response = traverse_obj(e.cause.response.read(), ({json.loads}, {dict})) or {}
error_code = response.get('error_code')
if error_code == 2204:
self._verify_video_password(event_id, path='event')
continue
if error_code == 3200:
raise ExtractorError(self._REFERER_HINT, expected=True)
if error_msg := response.get('error'):
raise ExtractorError(f'Vimeo says: {error_msg}', expected=True)
raise
# stream_privacy.view can be: 'anybody', 'embed_only', 'nobody', 'password', 'unlisted'
view_policy = live_event_data['stream_privacy']['view']
if view_policy == 'nobody':
raise ExtractorError('This event has not been made available to anyone', expected=True)
clip_data = traverse_obj(live_event_data, ('clip_to_play', {dict})) or {}
# live.status can be: 'streaming' (is_live), 'done' (was_live), 'unavailable' (is_upcoming OR dead)
clip_status = traverse_obj(clip_data, ('live', 'status', {str}))
start_time = traverse_obj(live_event_data, ('schedule', 'start_time', {str}))
release_timestamp = parse_iso8601(start_time)
if clip_status == 'unavailable' and release_timestamp and release_timestamp > time.time():
self.raise_no_formats(f'This live event is scheduled for {start_time}', expected=True)
live_status = 'is_upcoming'
config_url = None
elif view_policy == 'embed_only':
webpage = self._download_webpage(
join_nonempty('https://vimeo.com/event', event_id, 'embed', unlisted_hash, delim='/'),
event_id, 'Downloading embed iframe webpage', impersonate=True, headers=headers)
# The _parse_config result will overwrite live_status w/ 'is_live' if livestream is active
live_status = 'was_live'
config_url = self._extract_config_url(webpage)
else: # view_policy in ('anybody', 'password', 'unlisted')
if video_id:
clip_id, clip_hash = self._extract_video_id_and_unlisted_hash(clip_data)
if video_id == clip_id and clip_status == 'done' and (clip_hash or view_policy != 'unlisted'):
return self._vimeo_url_result(clip_id, clip_hash, event_id)
video_filter = lambda _, v: self._extract_video_id_and_unlisted_hash(v)[0] == video_id
else:
video_filter = lambda _, v: v['live']['status'] in ('streaming', 'done')
for page in itertools.count(1):
videos_data = self._call_events_api(
event_id, 'videos', unlisted_hash=unlisted_hash, note=f'page {page}',
fields=self._VIDEOS_FIELDS, referrer=referrer, query={'page': page},
headers={'Accept': 'application/vnd.vimeo.*;version=3.4.1'})
video = traverse_obj(videos_data, ('data', video_filter, any))
if video or not traverse_obj(videos_data, ('paging', 'next', {str})):
break
live_status = {
'streaming': 'is_live',
'done': 'was_live',
}.get(traverse_obj(video, ('live', 'status', {str})))
if not live_status: # requested video_id is unavailable or no videos are available
raise ExtractorError('This event video is unavailable', expected=True)
elif live_status == 'was_live':
return self._vimeo_url_result(*self._extract_video_id_and_unlisted_hash(video), event_id)
config_url = video['config_url']
if config_url: # view_policy == 'embed_only' or live_status == 'is_live'
info = filter_dict(self._parse_config(
self._download_json(config_url, event_id, 'Downloading config JSON'), event_id))
else: # live_status == 'is_upcoming'
info = {'id': event_id}
if info.get('live_status') == 'post_live':
self.report_warning('This live event recently ended and some formats may not yet be available')
return {
**traverse_obj(live_event_data, {
'title': ('title', {str}),
'description': ('stream_description', {str}),
}),
'display_id': event_id,
'live_status': live_status,
'release_timestamp': release_timestamp,
**info,
}

View File

@ -1,4 +1,5 @@
import base64 import base64
import functools
import hashlib import hashlib
import hmac import hmac
import itertools import itertools
@ -17,99 +18,227 @@ from ..utils import (
UserNotLive, UserNotLive,
float_or_none, float_or_none,
int_or_none, int_or_none,
join_nonempty,
jwt_decode_hs256,
str_or_none, str_or_none,
traverse_obj,
try_call, try_call,
update_url_query, update_url_query,
url_or_none, url_or_none,
) )
from ..utils.traversal import require, traverse_obj
class WeverseBaseIE(InfoExtractor): class WeverseBaseIE(InfoExtractor):
_NETRC_MACHINE = 'weverse' _NETRC_MACHINE = 'weverse'
_ACCOUNT_API_BASE = 'https://accountapi.weverse.io/web/api' _ACCOUNT_API_BASE = 'https://accountapi.weverse.io'
_CLIENT_PLATFORM = 'WEB'
_SIGNING_KEY = b'1b9cb6378d959b45714bec49971ade22e6e24e42'
_ACCESS_TOKEN_KEY = 'we2_access_token'
_REFRESH_TOKEN_KEY = 'we2_refresh_token'
_DEVICE_ID_KEY = 'we2_device_id'
_API_HEADERS = { _API_HEADERS = {
'Accept': 'application/json', 'Accept': 'application/json',
'Origin': 'https://weverse.io',
'Referer': 'https://weverse.io/', 'Referer': 'https://weverse.io/',
'WEV-device-Id': str(uuid.uuid4()),
} }
_LOGIN_HINT_TMPL = (
'You can log in using your refresh token with --username "{}" --password "REFRESH_TOKEN" '
'(replace REFRESH_TOKEN with the actual value of the "{}" cookie found in your web browser). '
'You can add an optional username suffix, e.g. --username "{}" , '
'if you need to manage multiple accounts. ')
_LOGIN_ERRORS_MAP = {
'login_required': 'This content is only available for logged-in users. ',
'invalid_username': '"{}" is not valid login username for this extractor. ',
'invalid_password': (
'Your password is not a valid refresh token. Make sure that '
'you are passing the refresh token, and NOT the access token. '),
'no_refresh_token': (
'Your access token has expired and there is no refresh token available. '
'Refresh your session/cookies in the web browser and try again. '),
'expired_refresh_token': (
'Your refresh token has expired. Log in to the site again using '
'your web browser to get a new refresh token or export fresh cookies. '),
}
_OAUTH_PREFIX = 'oauth'
_oauth_tokens = {}
_device_id = None
def _perform_login(self, username, password): @property
if self._API_HEADERS.get('Authorization'): def _oauth_headers(self):
return return {
**self._API_HEADERS,
headers = { 'X-ACC-APP-SECRET': '5419526f1c624b38b10787e5c10b2a7a',
'x-acc-app-secret': '5419526f1c624b38b10787e5c10b2a7a', 'X-ACC-SERVICE-ID': 'weverse',
'x-acc-app-version': '3.3.6', 'X-ACC-TRACE-ID': str(uuid.uuid4()),
'x-acc-language': 'en',
'x-acc-service-id': 'weverse',
'x-acc-trace-id': str(uuid.uuid4()),
'x-clog-user-device-id': str(uuid.uuid4()),
} }
valid_username = traverse_obj(self._download_json(
f'{self._ACCOUNT_API_BASE}/v2/signup/email/status', None, note='Checking username',
query={'email': username}, headers=headers, expected_status=(400, 404)), 'hasPassword')
if not valid_username:
raise ExtractorError('Invalid username provided', expected=True)
headers['content-type'] = 'application/json' @functools.cached_property
def _oauth_cache_key(self):
username = self._get_login_info()[0]
if not username:
return 'cookies'
return join_nonempty(self._OAUTH_PREFIX, username.partition('+')[2])
@property
def _is_logged_in(self):
return bool(self._oauth_tokens.get(self._ACCESS_TOKEN_KEY))
def _access_token_is_valid(self):
response = self._download_json(
f'{self._ACCOUNT_API_BASE}/api/v1/token/validate', None,
'Validating access token', 'Unable to valid access token',
expected_status=401, headers={
**self._oauth_headers,
'Authorization': f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}',
})
return traverse_obj(response, ('expiresIn', {int}), default=0) > 60
def _token_is_expired(self, key):
is_expired = jwt_decode_hs256(self._oauth_tokens[key])['exp'] - time.time() < 3600
if key == self._REFRESH_TOKEN_KEY or not is_expired:
return is_expired
return not self._access_token_is_valid()
def _refresh_access_token(self):
if not self._oauth_tokens.get(self._REFRESH_TOKEN_KEY):
self._report_login_error('no_refresh_token')
if self._token_is_expired(self._REFRESH_TOKEN_KEY):
self._report_login_error('expired_refresh_token')
headers = {'Content-Type': 'application/json'}
if self._is_logged_in:
headers['Authorization'] = f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}'
try: try:
auth = self._download_json( response = self._download_json(
f'{self._ACCOUNT_API_BASE}/v3/auth/token/by-credentials', None, data=json.dumps({ f'{self._ACCOUNT_API_BASE}/api/v1/token/refresh', None,
'email': username, 'Refreshing access token', 'Unable to refresh access token',
'otpSessionId': 'BY_PASS', headers={**self._oauth_headers, **headers},
'password': password, data=json.dumps({
}, separators=(',', ':')).encode(), headers=headers, note='Logging in') 'refreshToken': self._oauth_tokens[self._REFRESH_TOKEN_KEY],
}, separators=(',', ':')).encode())
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401: if isinstance(e.cause, HTTPError) and e.cause.status == 401:
raise ExtractorError('Invalid password provided', expected=True) self._oauth_tokens.clear()
if self._oauth_cache_key == 'cookies':
self.cookiejar.clear(domain='.weverse.io', path='/', name=self._ACCESS_TOKEN_KEY)
self.cookiejar.clear(domain='.weverse.io', path='/', name=self._REFRESH_TOKEN_KEY)
else:
self.cache.store(self._NETRC_MACHINE, self._oauth_cache_key, self._oauth_tokens)
self._report_login_error('expired_refresh_token')
raise raise
WeverseBaseIE._API_HEADERS['Authorization'] = f'Bearer {auth["accessToken"]}' self._oauth_tokens.update(traverse_obj(response, {
self._ACCESS_TOKEN_KEY: ('accessToken', {str}, {require('access token')}),
self._REFRESH_TOKEN_KEY: ('refreshToken', {str}, {require('refresh token')}),
}))
def _real_initialize(self): if self._oauth_cache_key == 'cookies':
if self._API_HEADERS.get('Authorization'): self._set_cookie('.weverse.io', self._ACCESS_TOKEN_KEY, self._oauth_tokens[self._ACCESS_TOKEN_KEY])
self._set_cookie('.weverse.io', self._REFRESH_TOKEN_KEY, self._oauth_tokens[self._REFRESH_TOKEN_KEY])
else:
self.cache.store(self._NETRC_MACHINE, self._oauth_cache_key, self._oauth_tokens)
def _get_authorization_header(self):
if not self._is_logged_in:
return {}
if self._token_is_expired(self._ACCESS_TOKEN_KEY):
self._refresh_access_token()
return {'Authorization': f'Bearer {self._oauth_tokens[self._ACCESS_TOKEN_KEY]}'}
def _report_login_error(self, error_id):
error_msg = self._LOGIN_ERRORS_MAP[error_id]
username = self._get_login_info()[0]
if error_id == 'invalid_username':
error_msg = error_msg.format(username)
username = f'{self._OAUTH_PREFIX}+{username}'
elif not username:
username = f'{self._OAUTH_PREFIX}+USERNAME'
raise ExtractorError(join_nonempty(
error_msg, self._LOGIN_HINT_TMPL.format(self._OAUTH_PREFIX, self._REFRESH_TOKEN_KEY, username),
'Or else you can u', self._login_hint(method='session_cookies')[1:], delim=''), expected=True)
def _perform_login(self, username, password):
if self._is_logged_in:
return return
token = try_call(lambda: self._get_cookies('https://weverse.io/')['we2_access_token'].value) if username.partition('+')[0] != self._OAUTH_PREFIX:
if token: self._report_login_error('invalid_username')
WeverseBaseIE._API_HEADERS['Authorization'] = f'Bearer {token}'
self._oauth_tokens.update(self.cache.load(self._NETRC_MACHINE, self._oauth_cache_key, default={}))
if self._is_logged_in and self._access_token_is_valid():
return
rt_key = self._REFRESH_TOKEN_KEY
if not self._oauth_tokens.get(rt_key) or self._token_is_expired(rt_key):
if try_call(lambda: jwt_decode_hs256(password)['scope']) != 'refresh':
self._report_login_error('invalid_password')
self._oauth_tokens[rt_key] = password
self._refresh_access_token()
def _real_initialize(self):
cookies = self._get_cookies('https://weverse.io/')
if not self._device_id:
self._device_id = traverse_obj(cookies, (self._DEVICE_ID_KEY, 'value')) or str(uuid.uuid4())
if self._is_logged_in:
return
self._oauth_tokens.update(traverse_obj(cookies, {
self._ACCESS_TOKEN_KEY: (self._ACCESS_TOKEN_KEY, 'value'),
self._REFRESH_TOKEN_KEY: (self._REFRESH_TOKEN_KEY, 'value'),
}))
if self._is_logged_in and not self._access_token_is_valid():
self._refresh_access_token()
def _call_api(self, ep, video_id, data=None, note='Downloading API JSON'): def _call_api(self, ep, video_id, data=None, note='Downloading API JSON'):
# Ref: https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/2488.a09b41ff.chunk.js # Ref: https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/2488.a09b41ff.chunk.js
# From https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/main.e206f7c1.js: # From https://ssl.pstatic.net/static/wevweb/2_3_2_11101725/public/static/js/main.e206f7c1.js:
key = b'1b9cb6378d959b45714bec49971ade22e6e24e42'
api_path = update_url_query(ep, { api_path = update_url_query(ep, {
# 'gcc': 'US', # 'gcc': 'US',
'appId': 'be4d79eb8fc7bd008ee82c8ec4ff6fd4', 'appId': 'be4d79eb8fc7bd008ee82c8ec4ff6fd4',
'language': 'en', 'language': 'en',
'os': 'WEB', 'os': self._CLIENT_PLATFORM,
'platform': 'WEB', 'platform': self._CLIENT_PLATFORM,
'wpf': 'pc', 'wpf': 'pc',
}) })
wmsgpad = int(time.time() * 1000) for is_retry in (False, True):
wmd = base64.b64encode(hmac.HMAC( wmsgpad = int(time.time() * 1000)
key, f'{api_path[:255]}{wmsgpad}'.encode(), digestmod=hashlib.sha1).digest()).decode() wmd = base64.b64encode(hmac.HMAC(
headers = {'Content-Type': 'application/json'} if data else {} self._SIGNING_KEY, f'{api_path[:255]}{wmsgpad}'.encode(),
try: digestmod=hashlib.sha1).digest()).decode()
return self._download_json(
f'https://global.apis.naver.com/weverse/wevweb{api_path}', video_id, note=note, try:
data=data, headers={**self._API_HEADERS, **headers}, query={ return self._download_json(
'wmsgpad': wmsgpad, f'https://global.apis.naver.com/weverse/wevweb{api_path}', video_id, note=note,
'wmd': wmd, data=data, headers={
}) **self._API_HEADERS,
except ExtractorError as e: **self._get_authorization_header(),
if isinstance(e.cause, HTTPError) and e.cause.status == 401: **({'Content-Type': 'application/json'} if data else {}),
self.raise_login_required( 'WEV-device-Id': self._device_id,
'Session token has expired. Log in again or refresh cookies in browser') }, query={
elif isinstance(e.cause, HTTPError) and e.cause.status == 403: 'wmsgpad': wmsgpad,
if 'Authorization' in self._API_HEADERS: 'wmd': wmd,
raise ExtractorError('Your account does not have access to this content', expected=True) })
self.raise_login_required() except ExtractorError as e:
raise if is_retry or not isinstance(e.cause, HTTPError):
raise
elif self._is_logged_in and e.cause.status == 401:
self._refresh_access_token()
continue
elif e.cause.status == 403:
if self._is_logged_in:
raise ExtractorError(
'Your account does not have access to this content', expected=True)
self._report_login_error('login_required')
raise
def _call_post_api(self, video_id): def _call_post_api(self, video_id):
path = '' if 'Authorization' in self._API_HEADERS else '/preview' path = '' if self._is_logged_in else '/preview'
return self._call_api(f'/post/v1.0/post-{video_id}{path}?fieldSet=postV1', video_id) return self._call_api(f'/post/v1.0/post-{video_id}{path}?fieldSet=postV1', video_id)
def _get_community_id(self, channel): def _get_community_id(self, channel):

View File

@ -45,7 +45,7 @@ class XinpianchangIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id=video_id) webpage = self._download_webpage(url, video_id=video_id, headers={'Referer': url})
video_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['detail']['video'] video_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['detail']['video']
data = self._download_json( data = self._download_json(

View File

@ -35,6 +35,7 @@ from ...utils import (
class _PoTokenContext(enum.Enum): class _PoTokenContext(enum.Enum):
PLAYER = 'player' PLAYER = 'player'
GVS = 'gvs' GVS = 'gvs'
SUBS = 'subs'
# any clients starting with _ cannot be explicitly requested by the user # any clients starting with _ cannot be explicitly requested by the user
@ -787,6 +788,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
def _download_ytcfg(self, client, video_id): def _download_ytcfg(self, client, video_id):
url = { url = {
'mweb': 'https://m.youtube.com',
'web': 'https://www.youtube.com', 'web': 'https://www.youtube.com',
'web_music': 'https://music.youtube.com', 'web_music': 'https://music.youtube.com',
'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1', 'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1',

View File

@ -72,6 +72,9 @@ from ...utils.networking import clean_headers, clean_proxies, select_proxy
STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client' STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token' STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token'
STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context'
PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide' PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'
@ -2225,7 +2228,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _extract_n_function_name(self, jscode, player_url=None): def _extract_n_function_name(self, jscode, player_url=None):
varname, global_list = self._interpret_player_js_global_var(jscode, player_url) varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
if debug_str := traverse_obj(global_list, (lambda _, v: v.endswith('_w8_'), any)): if debug_str := traverse_obj(global_list, (lambda _, v: v.endswith('-_w8_'), any)):
funcname = self._search_regex( funcname = self._search_regex(
r'''(?xs) r'''(?xs)
[;\n](?: [;\n](?:
@ -2286,8 +2289,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode, rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode,
f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)] f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)]
def _extract_player_js_global_var(self, jscode, player_url): def _interpret_player_js_global_var(self, jscode, player_url):
"""Returns tuple of strings: variable assignment code, variable name, variable value code""" """Returns tuple of: variable name string, variable value list"""
extract_global_var = self._cached(self._search_regex, 'js global array', player_url) extract_global_var = self._cached(self._search_regex, 'js global array', player_url)
varcode, varname, varvalue = extract_global_var( varcode, varname, varvalue = extract_global_var(
r'''(?x) r'''(?x)
@ -2305,27 +2308,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.write_debug(join_nonempty( self.write_debug(join_nonempty(
'No global array variable found in player JS', 'No global array variable found in player JS',
player_url and f' player = {player_url}', delim='\n'), only_once=True) player_url and f' player = {player_url}', delim='\n'), only_once=True)
return varcode, varname, varvalue return None, None
def _interpret_player_js_global_var(self, jscode, player_url): jsi = JSInterpreter(varcode)
"""Returns tuple of: variable name string, variable value list"""
_, varname, array_code = self._extract_player_js_global_var(jscode, player_url)
jsi = JSInterpreter(array_code)
interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url) interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url)
return varname, interpret_global_var(array_code, {}, allow_recursion=10) return varname, interpret_global_var(varvalue, {}, allow_recursion=10)
def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url): def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url):
varcode, varname, _ = self._extract_player_js_global_var(jscode, player_url) varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
if varcode and varname: if varname and global_list:
nsig_code = varcode + '; ' + nsig_code nsig_code = f'var {varname}={json.dumps(global_list)}; {nsig_code}'
_, global_list = self._interpret_player_js_global_var(jscode, player_url)
else: else:
varname = 'dlp_wins' varname = 'dlp_wins'
global_list = [] global_list = []
undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+' undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+'
fixed_code = re.sub( fixed_code = re.sub(
rf'''(?x) fr'''(?x)
;\s*if\s*\(\s*typeof\s+[a-zA-Z0-9_$]+\s*===?\s*(?: ;\s*if\s*\(\s*typeof\s+[a-zA-Z0-9_$]+\s*===?\s*(?:
(["\'])undefined\1| (["\'])undefined\1|
{re.escape(varname)}\[{undefined_idx}\] {re.escape(varname)}\[{undefined_idx}\]
@ -2399,6 +2398,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return sts return sts
def _mark_watched(self, video_id, player_responses): def _mark_watched(self, video_id, player_responses):
# cpn generation algorithm is reverse engineered from base.js.
# In fact it works even with dummy cpn.
CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
for is_full, key in enumerate(('videostatsPlaybackUrl', 'videostatsWatchtimeUrl')): for is_full, key in enumerate(('videostatsPlaybackUrl', 'videostatsWatchtimeUrl')):
label = 'fully ' if is_full else '' label = 'fully ' if is_full else ''
url = get_first(player_responses, ('playbackTracking', key, 'baseUrl'), url = get_first(player_responses, ('playbackTracking', key, 'baseUrl'),
@ -2409,11 +2413,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
parsed_url = urllib.parse.urlparse(url) parsed_url = urllib.parse.urlparse(url)
qs = urllib.parse.parse_qs(parsed_url.query) qs = urllib.parse.parse_qs(parsed_url.query)
# cpn generation algorithm is reverse engineered from base.js.
# In fact it works even with dummy cpn.
CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
# # more consistent results setting it to right before the end # # more consistent results setting it to right before the end
video_length = [str(float((qs.get('len') or ['1.5'])[0]) - 1)] video_length = [str(float((qs.get('len') or ['1.5'])[0]) - 1)]
@ -2863,7 +2862,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
continue continue
def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None, def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None,
data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None, **kwargs): data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None,
required=False, **kwargs):
""" """
Fetch a PO Token for a given client and context. This function will validate required parameters for a given context and client. Fetch a PO Token for a given client and context. This function will validate required parameters for a given context and client.
@ -2878,6 +2878,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
@param player_url: player URL. @param player_url: player URL.
@param video_id: video ID. @param video_id: video ID.
@param webpage: video webpage. @param webpage: video webpage.
@param required: Whether the PO Token is required (i.e. try to fetch unless policy is "never").
@param kwargs: Additional arguments to pass down. May be more added in the future. @param kwargs: Additional arguments to pass down. May be more added in the future.
@return: The fetched PO Token. None if it could not be fetched. @return: The fetched PO Token. None if it could not be fetched.
""" """
@ -2926,6 +2927,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
player_url=player_url, player_url=player_url,
video_id=video_id, video_id=video_id,
video_webpage=webpage, video_webpage=webpage,
required=required,
**kwargs, **kwargs,
) )
@ -2945,6 +2947,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
or ( or (
fetch_pot_policy == 'auto' fetch_pot_policy == 'auto'
and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS'] and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
and not kwargs.get('required', False)
) )
): ):
return None return None
@ -3133,6 +3136,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
player_url = self._download_player_url(video_id) player_url = self._download_player_url(video_id)
tried_iframe_fallback = True tried_iframe_fallback = True
pr = initial_pr if client == 'web' else None
visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg) visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg)
data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg) data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg)
@ -3147,12 +3152,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ytcfg': player_ytcfg or self._get_default_ytcfg(client), 'ytcfg': player_ytcfg or self._get_default_ytcfg(client),
} }
player_po_token = self.fetch_po_token( # Don't need a player PO token for WEB if using player response from webpage
player_po_token = None if pr else self.fetch_po_token(
context=_PoTokenContext.PLAYER, **fetch_po_token_args) context=_PoTokenContext.PLAYER, **fetch_po_token_args)
gvs_po_token = self.fetch_po_token( gvs_po_token = self.fetch_po_token(
context=_PoTokenContext.GVS, **fetch_po_token_args) context=_PoTokenContext.GVS, **fetch_po_token_args)
fetch_subs_po_token_func = functools.partial(
self.fetch_po_token,
context=_PoTokenContext.SUBS,
**fetch_po_token_args,
)
required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS'] required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
if ( if (
@ -3179,7 +3191,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
only_once=True) only_once=True)
deprioritize_pr = True deprioritize_pr = True
pr = initial_pr if client == 'web' else None
try: try:
pr = pr or self._extract_player_response( pr = pr or self._extract_player_response(
client, video_id, client, video_id,
@ -3197,10 +3208,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if pr_id := self._invalid_player_response(pr, video_id): if pr_id := self._invalid_player_response(pr, video_id):
skipped_clients[client] = pr_id skipped_clients[client] = pr_id
elif pr: elif pr:
# Save client name for introspection later # Save client details for introspection later
sd = traverse_obj(pr, ('streamingData', {dict})) or {} innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT')
sd = pr.setdefault('streamingData', {})
sd[STREAMING_DATA_CLIENT_NAME] = client sd[STREAMING_DATA_CLIENT_NAME] = client
sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context
sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func
for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})): for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
f[STREAMING_DATA_CLIENT_NAME] = client f[STREAMING_DATA_CLIENT_NAME] = client
f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
@ -3262,6 +3276,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else: else:
self.report_warning(msg, only_once=True) self.report_warning(msg, only_once=True)
def _report_pot_subtitles_skipped(self, video_id, client_name, msg=None):
msg = msg or (
f'{video_id}: Some {client_name} client subtitles require a PO Token which was not provided. '
'They will be discarded since they are not downloadable as-is. '
f'You can manually pass a Subtitles PO Token for this client with '
f'--extractor-args "youtube:po_token={client_name}.subs+XXX" . '
f'For more information, refer to {PO_TOKEN_GUIDE_URL}')
subs_wanted = any((
self.get_param('writesubtitles'),
self.get_param('writeautomaticsub'),
self.get_param('listsubtitles')))
# Only raise a warning for non-default clients, to not confuse users.
if not subs_wanted or client_name in (*self._DEFAULT_CLIENTS, *self._DEFAULT_AUTHED_CLIENTS):
self.write_debug(msg, only_once=True)
else:
self.report_warning(msg, only_once=True)
def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration): def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
CHUNK_SIZE = 10 << 20 CHUNK_SIZE = 10 << 20
PREFERRED_LANG_VALUE = 10 PREFERRED_LANG_VALUE = 10
@ -3365,8 +3398,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self._decrypt_signature(encrypted_sig, video_id, player_url), self._decrypt_signature(encrypted_sig, video_id, player_url),
) )
except ExtractorError as e: except ExtractorError as e:
self.report_warning('Signature extraction failed: Some formats may be missing', self.report_warning(
video_id=video_id, only_once=True) f'Signature extraction failed: Some formats may be missing\n'
f' player = {player_url}\n'
f' {bug_reports_message(before="")}',
video_id=video_id, only_once=True)
self.write_debug(
f'{video_id}: Signature extraction failure info:\n'
f' encrypted sig = {encrypted_sig}\n'
f' player = {player_url}')
self.write_debug(e, only_once=True) self.write_debug(e, only_once=True)
continue continue
@ -3553,6 +3593,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}' hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}'
fmts, subs = self._extract_m3u8_formats_and_subtitles( fmts, subs = self._extract_m3u8_formats_and_subtitles(
hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live') hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
for sub in traverse_obj(subs, (..., ..., {dict})):
# HLS subs (m3u8) do not need a PO token; save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles) subtitles = self._merge_subtitles(subs, subtitles)
for f in fmts: for f in fmts:
if process_manifest_format(f, 'hls', client_name, self._search_regex( if process_manifest_format(f, 'hls', client_name, self._search_regex(
@ -3564,6 +3607,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if po_token: if po_token:
dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}' dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}'
formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False) formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
for sub in traverse_obj(subs, (..., ..., {dict})):
# TODO: Investigate if DASH subs ever need a PO token; save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH
for f in formats: for f in formats:
if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token): if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token):
@ -3755,7 +3801,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
reason = self._get_text(pemr, 'reason') or get_first(playability_statuses, 'reason') reason = self._get_text(pemr, 'reason') or get_first(playability_statuses, 'reason')
subreason = clean_html(self._get_text(pemr, 'subreason') or '') subreason = clean_html(self._get_text(pemr, 'subreason') or '')
if subreason: if subreason:
if subreason == 'The uploader has not made this video available in your country.': if subreason.startswith('The uploader has not made this video available in your country'):
countries = get_first(microformats, 'availableCountries') countries = get_first(microformats, 'availableCountries')
if not countries: if not countries:
regions_allowed = search_meta('regionsAllowed') regions_allowed = search_meta('regionsAllowed')
@ -3890,47 +3936,85 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'), 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'),
} }
def get_lang_code(track):
return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
or track.get('languageCode'))
def process_language(container, base_url, lang_code, sub_name, client_name, query):
lang_subs = container.setdefault(lang_code, [])
for fmt in self._SUBTITLE_FORMATS:
query = {**query, 'fmt': fmt}
lang_subs.append({
'ext': fmt,
'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
'name': sub_name,
STREAMING_DATA_CLIENT_NAME: client_name,
})
subtitles = {} subtitles = {}
pctr = traverse_obj(player_responses, (..., 'captions', 'playerCaptionsTracklistRenderer'), expected_type=dict) skipped_subs_clients = set()
if pctr:
def get_lang_code(track):
return (remove_start(track.get('vssId') or '', '.').replace('.', '-')
or track.get('languageCode'))
# Converted into dicts to remove duplicates # Only web/mweb clients provide translationLanguages, so include initial_pr in the traversal
captions = { translation_languages = {
get_lang_code(sub): sub lang['languageCode']: self._get_text(lang['languageName'], max_runs=1)
for sub in traverse_obj(pctr, (..., 'captionTracks', ...))} for lang in traverse_obj(player_responses, (
translation_languages = { ..., 'captions', 'playerCaptionsTracklistRenderer', 'translationLanguages',
lang.get('languageCode'): self._get_text(lang.get('languageName'), max_runs=1) lambda _, v: v['languageCode'] and v['languageName']))
for lang in traverse_obj(pctr, (..., 'translationLanguages', ...))} }
# NB: Constructing the full subtitle dictionary is slow
get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and (
self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles'))
def process_language(container, base_url, lang_code, sub_name, query): # Filter out initial_pr which does not have streamingData (smuggled client context)
lang_subs = container.setdefault(lang_code, []) prs = traverse_obj(player_responses, (
for fmt in self._SUBTITLE_FORMATS: lambda _, v: v['streamingData'] and v['captions']['playerCaptionsTracklistRenderer']))
query.update({ all_captions = traverse_obj(prs, (
'fmt': fmt, ..., 'captions', 'playerCaptionsTracklistRenderer', 'captionTracks', ..., {dict}))
}) need_subs_langs = {get_lang_code(sub) for sub in all_captions if sub.get('kind') != 'asr'}
lang_subs.append({ need_caps_langs = {
'ext': fmt, remove_start(get_lang_code(sub), 'a-')
'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)), for sub in all_captions if sub.get('kind') == 'asr'}
'name': sub_name,
})
# NB: Constructing the full subtitle dictionary is slow for pr in prs:
get_translated_subs = 'translated_subs' not in self._configuration_arg('skip') and ( pctr = pr['captions']['playerCaptionsTracklistRenderer']
self.get_param('writeautomaticsub', False) or self.get_param('listsubtitles')) client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME]
for lang_code, caption_track in captions.items(): innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName']
base_url = caption_track.get('baseUrl') required_contexts = self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
orig_lang = parse_qs(base_url).get('lang', [None])[-1] fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN]
if not base_url:
continue pot_params = {}
already_fetched_pot = False
for caption_track in traverse_obj(pctr, ('captionTracks', lambda _, v: v['baseUrl'])):
base_url = caption_track['baseUrl']
qs = parse_qs(base_url)
lang_code = get_lang_code(caption_track)
requires_pot = (
# We can detect the experiment for now
any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv'))
or _PoTokenContext.SUBS in required_contexts)
if not already_fetched_pot:
already_fetched_pot = True
if subs_po_token := fetch_subs_po_token_func(required=requires_pot):
pot_params.update({
'pot': subs_po_token,
'potc': '1',
'c': innertube_client_name,
})
if not pot_params and requires_pot:
skipped_subs_clients.add(client_name)
self._report_pot_subtitles_skipped(video_id, client_name)
break
orig_lang = qs.get('lang', [None])[-1]
lang_name = self._get_text(caption_track, 'name', max_runs=1) lang_name = self._get_text(caption_track, 'name', max_runs=1)
if caption_track.get('kind') != 'asr': if caption_track.get('kind') != 'asr':
if not lang_code: if not lang_code:
continue continue
process_language( process_language(
subtitles, base_url, lang_code, lang_name, {}) subtitles, base_url, lang_code, lang_name, client_name, pot_params)
if not caption_track.get('isTranslatable'): if not caption_track.get('isTranslatable'):
continue continue
for trans_code, trans_name in translation_languages.items(): for trans_code, trans_name in translation_languages.items():
@ -3950,10 +4034,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Add an "-orig" label to the original language so that it can be distinguished. # Add an "-orig" label to the original language so that it can be distinguished.
# The subs are returned without "-orig" as well for compatibility # The subs are returned without "-orig" as well for compatibility
process_language( process_language(
automatic_captions, base_url, f'{trans_code}-orig', f'{trans_name} (Original)', {}) automatic_captions, base_url, f'{trans_code}-orig',
f'{trans_name} (Original)', client_name, pot_params)
# Setting tlang=lang returns damaged subtitles. # Setting tlang=lang returns damaged subtitles.
process_language(automatic_captions, base_url, trans_code, trans_name, process_language(
{} if orig_lang == orig_trans_code else {'tlang': trans_code}) automatic_captions, base_url, trans_code, trans_name, client_name,
pot_params if orig_lang == orig_trans_code else {'tlang': trans_code, **pot_params})
# Avoid duplication if we've already got everything we need
need_subs_langs.difference_update(subtitles)
need_caps_langs.difference_update(automatic_captions)
if not (need_subs_langs or need_caps_langs):
break
if skipped_subs_clients and (need_subs_langs or need_caps_langs):
self._report_pot_subtitles_skipped(video_id, True, msg=join_nonempty(
f'{video_id}: There are missing subtitles languages because a PO token was not provided.',
need_subs_langs and f'Subtitles for these languages are missing: {", ".join(need_subs_langs)}.',
need_caps_langs and f'Automatic captions for {len(need_caps_langs)} languages are missing.',
delim=' '))
info['automatic_captions'] = automatic_captions info['automatic_captions'] = automatic_captions
info['subtitles'] = subtitles info['subtitles'] = subtitles

View File

@ -39,6 +39,7 @@ __all__ = [
class PoTokenContext(enum.Enum): class PoTokenContext(enum.Enum):
GVS = 'gvs' GVS = 'gvs'
PLAYER = 'player' PLAYER = 'player'
SUBS = 'subs'
@dataclasses.dataclass @dataclasses.dataclass

View File

@ -51,7 +51,7 @@ def get_webpo_content_binding(
return visitor_id, ContentBindingType.VISITOR_ID return visitor_id, ContentBindingType.VISITOR_ID
return request.visitor_data, ContentBindingType.VISITOR_DATA return request.visitor_data, ContentBindingType.VISITOR_DATA
elif request.context == PoTokenContext.PLAYER or client_name != 'WEB_REMIX': elif request.context in (PoTokenContext.PLAYER, PoTokenContext.SUBS):
return request.video_id, ContentBindingType.VIDEO_ID return request.video_id, ContentBindingType.VIDEO_ID
return None, None return None, None

View File

@ -590,39 +590,12 @@ class JSInterpreter:
return ret, True return ret, True
return ret, False return ret, False
for m in re.finditer(rf'''(?x)
(?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
(?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
var = m.group('var1') or m.group('var2')
start, end = m.span()
sign = m.group('pre_sign') or m.group('post_sign')
ret = local_vars[var]
local_vars[var] += 1 if sign[0] == '+' else -1
if m.group('pre_sign'):
ret = local_vars[var]
expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
if not expr:
return None, should_return
m = re.match(fr'''(?x) m = re.match(fr'''(?x)
(?P<assign>
(?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s* (?P<out>{_NAME_RE})(?:\[(?P<index>{_NESTED_BRACKETS})\])?\s*
(?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})? (?P<op>{"|".join(map(re.escape, set(_OPERATORS) - _COMP_OPERATORS))})?
=(?!=)(?P<expr>.*)$ =(?!=)(?P<expr>.*)$
)|(?P<return> ''', expr)
(?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$ if m: # We are assigning a value to a variable
)|(?P<attribute>
(?P<var>{_NAME_RE})(?:
(?P<nullish>\?)?\.(?P<member>[^(]+)|
\[(?P<member2>{_NESTED_BRACKETS})\]
)\s*
)|(?P<indexing>
(?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
)|(?P<function>
(?P<fname>{_NAME_RE})\((?P<args>.*)\)$
)''', expr)
if m and m.group('assign'):
left_val = local_vars.get(m.group('out')) left_val = local_vars.get(m.group('out'))
if not m.group('index'): if not m.group('index'):
@ -640,7 +613,35 @@ class JSInterpreter:
m.group('op'), self._index(left_val, idx), m.group('expr'), expr, local_vars, allow_recursion) m.group('op'), self._index(left_val, idx), m.group('expr'), expr, local_vars, allow_recursion)
return left_val[idx], should_return return left_val[idx], should_return
elif expr.isdigit(): for m in re.finditer(rf'''(?x)
(?P<pre_sign>\+\+|--)(?P<var1>{_NAME_RE})|
(?P<var2>{_NAME_RE})(?P<post_sign>\+\+|--)''', expr):
var = m.group('var1') or m.group('var2')
start, end = m.span()
sign = m.group('pre_sign') or m.group('post_sign')
ret = local_vars[var]
local_vars[var] += 1 if sign[0] == '+' else -1
if m.group('pre_sign'):
ret = local_vars[var]
expr = expr[:start] + self._dump(ret, local_vars) + expr[end:]
if not expr:
return None, should_return
m = re.match(fr'''(?x)
(?P<return>
(?!if|return|true|false|null|undefined|NaN)(?P<name>{_NAME_RE})$
)|(?P<attribute>
(?P<var>{_NAME_RE})(?:
(?P<nullish>\?)?\.(?P<member>[^(]+)|
\[(?P<member2>{_NESTED_BRACKETS})\]
)\s*
)|(?P<indexing>
(?P<in>{_NAME_RE})\[(?P<idx>.+)\]$
)|(?P<function>
(?P<fname>{_NAME_RE})\((?P<args>.*)\)$
)''', expr)
if expr.isdigit():
return int(expr), should_return return int(expr), should_return
elif expr == 'break': elif expr == 'break':

View File

@ -230,6 +230,9 @@ class _YoutubeDLOptionParser(optparse.OptionParser):
formatter.indent() formatter.indent()
heading = formatter.format_heading('Preset Aliases') heading = formatter.format_heading('Preset Aliases')
formatter.indent() formatter.indent()
description = formatter.format_description(
'Predefined aliases for convenience and ease of use. Note that future versions of yt-dlp '
'may add or adjust presets, but the existing preset names will not be changed or removed')
result = [] result = []
for name, args in _PRESET_ALIASES.items(): for name, args in _PRESET_ALIASES.items():
option = optparse.Option('-t', help=shlex.join(args)) option = optparse.Option('-t', help=shlex.join(args))
@ -238,7 +241,7 @@ class _YoutubeDLOptionParser(optparse.OptionParser):
formatter.dedent() formatter.dedent()
formatter.dedent() formatter.dedent()
help_lines = '\n'.join(result) help_lines = '\n'.join(result)
return f'{formatted_help}\n{heading}{help_lines}' return f'{formatted_help}\n{heading}{description}\n{help_lines}'
def create_parser(): def create_parser():
@ -470,7 +473,7 @@ def create_parser():
general.add_option( general.add_option(
'--live-from-start', '--live-from-start',
action='store_true', dest='live_from_start', action='store_true', dest='live_from_start',
help='Download livestreams from the start. Currently only supported for YouTube (Experimental)') help='Download livestreams from the start. Currently experimental and only supported for YouTube and Twitch')
general.add_option( general.add_option(
'--no-live-from-start', '--no-live-from-start',
action='store_false', dest='live_from_start', action='store_false', dest='live_from_start',
@ -545,9 +548,9 @@ def create_parser():
help=( help=(
'Create aliases for an option string. Unless an alias starts with a dash "-", it is prefixed with "--". ' 'Create aliases for an option string. Unless an alias starts with a dash "-", it is prefixed with "--". '
'Arguments are parsed according to the Python string formatting mini-language. ' 'Arguments are parsed according to the Python string formatting mini-language. '
'E.g. --alias get-audio,-X "-S=aext:{0},abr -x --audio-format {0}" creates options ' 'E.g. --alias get-audio,-X "-S aext:{0},abr -x --audio-format {0}" creates options '
'"--get-audio" and "-X" that takes an argument (ARG0) and expands to ' '"--get-audio" and "-X" that takes an argument (ARG0) and expands to '
'"-S=aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. ' '"-S aext:ARG0,abr -x --audio-format ARG0". All defined aliases are listed in the --help output. '
'Alias options can trigger more aliases; so be careful to avoid defining recursive options. ' 'Alias options can trigger more aliases; so be careful to avoid defining recursive options. '
f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. ' f'As a safety measure, each alias may be triggered a maximum of {_YoutubeDLOptionParser.ALIAS_TRIGGER_LIMIT} times. '
'This option can be used multiple times')) 'This option can be used multiple times'))

View File

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2025.04.30' __version__ = '2025.05.22'
RELEASE_GIT_HEAD = '505b400795af557bdcfd9d4fa7e9133b26ef431c' RELEASE_GIT_HEAD = '7977b329ed97b216e37bd402f4935f28c00eac9e'
VARIANT = None VARIANT = None
@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2025.04.30' _pkg_version = '2025.05.22'