Compare commits

...

7 Commits

Author SHA1 Message Date
coletdjnz
93f8ffdad0
generalize provider setup 2025-05-17 13:44:49 +12:00
coletdjnz
c1fd13a9b8
reorder extractor args 2025-05-17 13:39:42 +12:00
coletdjnz
37c5b6a6f2
Update README.md
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-05-17 13:38:15 +12:00
coletdjnz
2d594b4744
fixes from review 2025-05-17 13:37:16 +12:00
coletdjnz
e7382323bc
better cache key name for internal versioning 2025-05-17 13:37:16 +12:00
coletdjnz
ac58980b33
Update README.md
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-05-17 13:37:03 +12:00
coletdjnz
b4f0d272be
Update yt_dlp/extractor/youtube/pot/README.md
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-05-17 13:20:19 +12:00
5 changed files with 26 additions and 32 deletions

View File

@ -1795,6 +1795,7 @@ The following extractors use this feature:
* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
@ -1804,13 +1805,12 @@ The following extractors use this feature:
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
* `pot_trace`: Enable debug logging for PO Token fetching. Either `true` or `false` (default)
* `fetch_pot`: Policy to use for fetching a PO Token from providers. `always` to always try fetch a PO Token regardless if the client requires one for the given context, `never` to never fetch a PO Token, or `auto` to only fetch a PO Token if the client requires one for the given context (default)
* `fetch_pot`: Policy to use for fetching a PO Token from providers. One of `always` (always try fetch a PO Token regardless if the client requires one for the given context), `never` (never fetch a PO Token), or `auto` (default; only fetch a PO Token if the client requires one for the given context)
#### youtubepot-webpo
* `bind_to_visitor_id`: Whether to use the Visitor ID instead of Visitor Data for caching WebPO tokens. Either `true` or `false` (default `true`)
* `bind_to_visitor_id`: Whether to use the Visitor ID instead of Visitor Data for caching WebPO tokens. Either `true` (default) or `false`
#### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)

View File

@ -578,7 +578,7 @@ class TestPoTokenCache:
assert cache.get(pot_request) is None
cache.store(pot_request, response)
assert len(memorypcp.cache) == 1
assert hashlib.sha256(f'_pExampleProvider_ytv1v{pot_request.video_id}'.encode()).hexdigest() in memorypcp.cache
assert hashlib.sha256(f'_dlp_cachev1_pExampleProviderv{pot_request.video_id}'.encode()).hexdigest() in memorypcp.cache
# The second spec provider returns the exact same key bindings as the first one,
# however the PoTokenCache should use the provider key to differentiate between them
@ -591,7 +591,7 @@ class TestPoTokenCache:
assert cache.get(pot_request) is None
cache.store(pot_request, response)
assert len(memorypcp.cache) == 2
assert hashlib.sha256(f'_pExampleProviderTwo_ytv1v{pot_request.video_id}'.encode()).hexdigest() in memorypcp.cache
assert hashlib.sha256(f'_dlp_cachev1_pExampleProviderTwov{pot_request.video_id}'.encode()).hexdigest() in memorypcp.cache
def test_cache_provider_preferences(self, pot_request, ie, logger):
pcp_one = create_memory_pcp(ie, logger, provider_key='memory_pcp_one')

View File

@ -911,6 +911,7 @@ class YoutubeDL:
def add_close_hook(self, ch):
"""Add a close hook, called when YoutubeDL.close() is called"""
assert callable(ch), 'Close hook must be callable'
self._close_hooks.append(ch)
def add_progress_hook(self, ph):

View File

@ -1,6 +1,6 @@
# YoutubeIE PO Token Provider Framework
As part of the YouTube extractor, we have a framework for providing PO Tokens programmatically. This can be used in plugins.
As part of the YouTube extractor, we have a framework for providing PO Tokens programmatically. This can be used by plugins.
Refer to the [PO Token Guide](https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide) for more information on PO Tokens.

View File

@ -37,7 +37,7 @@ from yt_dlp.extractor.youtube.pot.provider import (
PoTokenResponse,
provider_bug_report_message,
)
from yt_dlp.utils import ExtractorError, bug_reports_message, format_field, join_nonempty, traverse_obj
from yt_dlp.utils import bug_reports_message, format_field, join_nonempty
if typing.TYPE_CHECKING:
from yt_dlp.extractor.youtube.pot.cache import CacheProviderPreference
@ -135,7 +135,7 @@ class PoTokenCache:
bindings_cleaned = {
**{k: v for k, v in spec.key_bindings.items() if v is not None},
# Allow us to invalidate caches if such need arises
'_yt': 'v1',
'_dlp_cache': 'v1',
}
if spec._provider:
bindings_cleaned['_p'] = spec._provider.PROVIDER_KEY
@ -351,36 +351,33 @@ EXTRACTOR_ARG_PREFIX = 'youtubepot'
def initialize_pot_director(ie):
if not ie._downloader:
raise ExtractorError('Downloader not set', expected=False)
assert ie._downloader is not None, 'Downloader not set'
enable_trace = ie._configuration_arg(
'pot_trace', ['false'], ie_key='youtube', casesense=False)[0] == 'true'
if enable_trace:
log_level = IEContentProviderLogger.LogLevel.TRACE
elif ie._downloader.params.get('verbose', False):
elif ie.get_param('verbose', False):
log_level = IEContentProviderLogger.LogLevel.DEBUG
else:
log_level = IEContentProviderLogger.LogLevel.INFO
def get_provider_logger_and_settings(provider, logger_key):
logger_prefix = f'{logger_key}:{provider.PROVIDER_NAME}'
extractor_key = f'{EXTRACTOR_ARG_PREFIX}-{provider.PROVIDER_KEY.lower()}'
return (
YoutubeIEContentProviderLogger(ie, logger_prefix, log_level=log_level),
ie.get_param('extractor_args', {}).get(extractor_key, {}))
cache_providers = []
for cache_provider in _pot_cache_providers.value.values():
settings = traverse_obj(
ie._downloader.params,
('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_provider.PROVIDER_KEY.lower()}'))
cache_provider_logger = YoutubeIEContentProviderLogger(
ie, f'pot:cache:{cache_provider.PROVIDER_NAME}', log_level=log_level)
cache_providers.append(cache_provider(ie, cache_provider_logger, settings or {}))
logger, settings = get_provider_logger_and_settings(cache_provider, 'pot:cache')
cache_providers.append(cache_provider(ie, logger, settings))
cache_spec_providers = []
for cache_spec_provider in _pot_pcs_providers.value.values():
settings = traverse_obj(
ie._downloader.params,
('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{cache_spec_provider.PROVIDER_KEY.lower()}'))
cache_spec_provider_logger = YoutubeIEContentProviderLogger(
ie, f'pot:cache:spec:{cache_spec_provider.PROVIDER_NAME}', log_level=log_level)
cache_spec_providers.append(cache_spec_provider(ie, cache_spec_provider_logger, settings or {}))
logger, settings = get_provider_logger_and_settings(cache_spec_provider, 'pot:cache:spec')
cache_spec_providers.append(cache_spec_provider(ie, logger, settings))
cache = PoTokenCache(
logger=YoutubeIEContentProviderLogger(ie, 'pot:cache', log_level=log_level),
@ -397,12 +394,8 @@ def initialize_pot_director(ie):
ie._downloader.add_close_hook(director.close)
for provider in _pot_providers.value.values():
settings = traverse_obj(
ie._downloader.params,
('extractor_args', f'{EXTRACTOR_ARG_PREFIX}-{provider.PROVIDER_KEY.lower()}'))
logger = YoutubeIEContentProviderLogger(
ie, f'pot:{provider.PROVIDER_NAME}', log_level=log_level)
director.register_provider(provider(ie, logger, settings or {}))
logger, settings = get_provider_logger_and_settings(provider, 'pot')
director.register_provider(provider(ie, logger, settings))
for preference in _ptp_preferences.value:
director.register_preference(preference)
@ -448,8 +441,8 @@ def clean_pot(po_token: str):
def validate_response(response: PoTokenResponse | None):
if (
not isinstance(response, PoTokenResponse)
or not response.po_token
or not isinstance(response.po_token, str)
or not response.po_token
): # noqa: SIM103
return False
@ -478,5 +471,5 @@ def validate_cache_spec(spec: PoTokenCacheSpec):
and isinstance(spec.key_bindings, dict)
and all(isinstance(k, str) for k in spec.key_bindings)
and all(v is None or isinstance(v, str) for v in spec.key_bindings.values())
and len({k for k in spec.key_bindings.values() if k is not None}) > 0
and bool([v for v in spec.key_bindings.values() if v is not None])
)