Extension:External Data/Throttling data retrievals

From Linux Web Expert

Retrieval of web data and program data can be throttled per data source; that is, a minimum delay can be enforced between calls to the same web service or program. If a throttled data source is accessed before the specified delay has passed, the outcome depends on caching: if a cache is in use, a stale cached value is returned; otherwise, a message informing users that calls are throttled is shown, and a job that actually fetches the data is scheduled.
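The decision described above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the ThrottleState class, the retrieve function and the status strings are hypothetical names, the state is kept in memory, and the cache is assumed to be keyed by the individual source (e.g. a URL) while the delay is tracked per throttle key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ThrottleState:
    # Hypothetical in-memory bookkeeping; a real wiki would persist this.
    last_call: Dict[str, float] = field(default_factory=dict)  # throttle key -> time of last real call
    cache: Dict[str, str] = field(default_factory=dict)        # source (e.g. URL) -> last fetched value
    jobs: List[str] = field(default_factory=list)              # sources with a deferred fetch queued


def retrieve(state: ThrottleState, source: str, key: str, interval: float,
             now: float, fetch: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (status, value), where status is 'fresh', 'stale' or 'throttled'."""
    last = state.last_call.get(key)
    # A zero or negative interval means throttling is off.
    too_soon = interval > 0 and last is not None and now - last < interval
    if not too_soon:
        # Enough time has passed (or this is the first call): fetch for real.
        value = fetch()
        state.last_call[key] = now
        state.cache[source] = value
        return "fresh", value
    if source in state.cache:
        # Throttled, but a cached copy exists: serve it even though it is stale.
        return "stale", state.cache[source]
    # Throttled and nothing cached: queue a deferred fetch; the caller shows a message.
    state.jobs.append(source)
    return "throttled", None
```

With a five-second interval, a second request for the same source within the interval yields the stale cached value, while a request for a different source that shares the throttle key and has never been fetched is queued as a job.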

For web sites, SOAP services and server-side programs, throttling is configured by the settings throttle key and throttle interval within $wgExternalDataSources:

  • throttle key is a template string from which the throttle key is generated. Placeholders within the string, such as $host$, $url$ or $2nd_lvl_domain$ (for web services) or $param$ (for parameters to programs), are replaced with their corresponding values. The default value is $2nd_lvl_domain$, meaning that, for example, a call to any Wikipedia page in any language has the throttle key wikipedia.org.
  • throttle interval is a float holding the minimum interval, in seconds, between calls to web services or server-side programs with the same throttle key.
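The substitution of $host$, $url$ and $2nd_lvl_domain$ for a web service can be illustrated with a short sketch. The function name expand_throttle_key is hypothetical, and the second-level domain is approximated naively as the last two labels of the host name, which is an assumption that does not hold for suffixes such as co.uk; $param$ for programs is not modelled here.

```python
import re
from urllib.parse import urlsplit

# Matches only the web-service placeholders named in the text above.
PLACEHOLDER = re.compile(r"\$(url|host|2nd_lvl_domain)\$")


def expand_throttle_key(template: str, url: str) -> str:
    """Replace placeholders in a throttle key template with values taken from url."""
    host = urlsplit(url).hostname or ""
    values = {
        "url": url,
        "host": host,
        # Naive assumption: the second-level domain is the last two host labels.
        "2nd_lvl_domain": ".".join(host.split(".")[-2:]),
    }
    # A single pass, so text inserted by one replacement is never re-expanded.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

Under these assumptions, the default template $2nd_lvl_domain$ maps both https://en.wikipedia.org/wiki/Berlin and https://de.wikipedia.org/wiki/Berlin to wikipedia.org, so the two calls share one throttle, whereas $host$ would throttle each language edition separately.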

As with other settings, these are set per data source, which can be:

  • the full URL,
  • the host,
  • the second-level domain,
  • '*', the default fallback for any site.
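How a setting might be looked up across these levels is sketched below. The most-specific-first order (URL, then host, then second-level domain, then '*') is an assumption made for illustration, as are the function name find_setting and the plain dictionary standing in for $wgExternalDataSources.

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def find_setting(sources: Dict[str, Dict[str, Any]], url: str,
                 setting: str, default: Any = None) -> Any:
    """Return the value of setting from the most specific data source matching url."""
    host = urlsplit(url).hostname or ""
    # Naive assumption: the second-level domain is the last two host labels.
    second_level = ".".join(host.split(".")[-2:])
    # Assumed precedence: full URL, host, second-level domain, then the '*' fallback.
    for candidate in (url, host, second_level, "*"):
        if setting in sources.get(candidate, {}):
            return sources[candidate][setting]
    return default
```

For instance, with a 'throttle interval' defined under wikipedia.org and another under '*', a Wikipedia URL would pick up the former and any other site the latter.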

Throttling makes sense when numerous parser function calls (e.g., from a template embedded many times) address the same external service, or the same program that either requires substantial computational resources or is, effectively, a call to an external service, like youtube-dl.

If there is no throttle key, or the throttle interval is zero or not set, there is no throttling. This is the default.