Extension:Regex Fun/nan
This extension is currently not actively maintained! Although it may still work, any bug reports or feature requests will more than likely be ignored. If you are interested in taking on the task of developing and maintaining this extension, you can request repository ownership (see Gerrit/Privilege policy#Requesting Gerrit privileges). As a courtesy, you may want to contact the author. You should also remove this template and list yourself as maintaining the extension in the page's {{Extension}} infobox.
Regex Fun | Release status: unmaintained |
---|---|
Implementation | Parser function |
Description | Adds parser functions allowing the use of regular expressions within wiki pages |
Author(s) | Daniel Werner (Danwetalk) |
Latest version | 1.3.0 (2017-07-27) |
MediaWiki | 1.23+ |
PHP | 5.3+ |
Database changes | No |
License | ISC License |
Download | README CHANGELOG |
Translate the Regex Fun extension if it is available at translatewiki.net |
The Regex Fun extension provides four parser functions for regular expression handling within wiki pages. Besides offering richer functionality, it differs from other regex extensions such as RegexFunctions in three ways: it provides a function #regexquote for encoding user-provided strings for use within a regular expression; it provides a function #regex_var to access subexpression results; and it introduces the MediaWiki-specific regular expression modifier r and redefines the e modifier in a useful and secure way.
Usage
This section will introduce each of the four regular expression related functions which come along with this extension.
#regex
This function performs a simple search via regular expression and returns the first match within a string. It also has a replacement mode in which each match of the expression is replaced (up to an optional limit). It accepts the PHP PCRE regex modifiers; in replace mode it changes the meaning of the e modifier and introduces the r modifier.
- syntax
  - search
    {{#regex: text | pattern }}
    E.g. "{{#regex: foo 10$, Baa 21$, baa 3$ | /baa\s+\d+\s*\$/i }}" would return "Baa 21$".
  - replace
    {{#regex: text | pattern | replacement | limit }}
    E.g. "{{#regex: Foo 10, Baa 21 | /(.+?)\s+(\d+)\s*/ | $1 $2\$ }}" would return "Foo 10$, Baa 21$".
    Parameter limit is optional (default -1, meaning no limit).
    Parameter replacement allows back-references "$n", where n is a number. "\n" is possible as well but not recommended.
- MW modifiers (flags)
  #regex can use all PCRE regex modifiers with their exact meaning, except for the e modifier, which would be a security risk. Instead, the e modifier has a different meaning in replacement mode. In addition, the r modifier is introduced.
  - r: Only in replacement mode. If set, the function will return an empty string if no replacement could be done (if the expression didn't match anything). Without the flag the function would simply return the input string.
  - e: Only in replacement mode. Before matching strings are replaced, references in the replacement string (such as $0 or \1) are replaced by their matches. If e is set, the replacement string is then parsed as wikitext after this substitution and before it is inserted into the original string. In addition, the wikitext within the replacement parameter is not parsed before the regex runs. This makes it possible to use parser functions or templates within the replacement string that operate on the substituted references.
  - Example:
    {{#regex: Foo 10, Baa 21 | /(.+?)\s+(\d+)\s*/e | {{uc:$1}} $2\$ }}
    would return "FOO 10$, BAA 21$" (considering Template:(( and Template:)) exist within the wiki)
  - Two more examples to show the difference (without/with e-flag):
    {{#regex: cat, dog, bear |/\w+/ | {{uc:a $0}} }}
    returns "A cat, A dog, A bear" because {{uc:a $0}} is parsed first, which results in just "A $0", even before the "#regex" is parsed.
    {{#regex: cat, dog, bear |/\w+/e | {{uc:a $0}} }}
    returns "A CAT, A DOG, A BEAR" because the e-flag delays the parsing of the third parameter until it is used as the replacement.
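The search mode, the replace mode with its limit, and the r flag described above can be modelled outside the wiki. The following is a minimal Python sketch, not the extension's PHP code: the function name `regex_fun`, the separate `flags` argument (instead of `/pattern/flags` delimiters) and the empty-string result for a failed search are assumptions made for illustration, and the e flag is not modelled because it depends on the wikitext parser.

```python
import re

# Matches an escaped "\$" (literal dollar) or a "$n" back-reference.
_REF = re.compile(r"\\\$|\$(\d+)")


def _expand(template, match):
    """Fill "$n" references in the replacement template from one match."""
    def fill(ref):
        if ref.group(1) is None:
            return "$"  # escaped dollar sign
        index = int(ref.group(1))
        # Unknown or unmatched groups become empty strings (assumption).
        return (match.group(index) or "") if index <= match.re.groups else ""
    return _REF.sub(fill, template)


def regex_fun(text, pattern, replacement=None, limit=-1, flags=""):
    """Hypothetical model of #regex: search without, replace with a replacement."""
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    if replacement is None:
        found = rx.search(text)
        # Returning "" when nothing matches is an assumption of this sketch.
        return found.group(0) if found else ""
    # A negative limit means "no limit"; Python expresses that as count=0.
    result, replaced = rx.subn(
        lambda m: _expand(replacement, m), text, count=max(limit, 0)
    )
    if replaced == 0 and "r" in flags:
        return ""  # r flag: empty result when nothing was replaced
    return result


print(regex_fun("foo 10$, Baa 21$, baa 3$", r"baa\s+\d+\s*\$", flags="i"))
print(regex_fun("Foo 10, Baa 21", r"(.+?)\s+(\d+)\s*", r"$1 $2\$"))
```

The two calls mirror the search and replace examples given above; passing `flags="r"` with a pattern that never matches yields an empty string instead of the unchanged input.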
#regexall
Searches the whole string for as many matches as possible and returns them separated by a separator. Also gives control from which match to start and where to end. This function can be particularly useful together with extensions Arrays and HashTables.
- syntax
  {{#regexall: text | pattern | separator | offset | length }}
  E.g. "{{#regexall: 0+11+2+33 | /\d+/ | , | 1 | 2 }}" would return "11,2".
- Optional parameters
  - separator (default ",")
  - offset (default 0): If non-negative, the first item will come from that offset. If negative, the first item comes that far from the end of all items.
  - length (default ""): If set and non-negative, the result will contain that many items. If negative, the last item comes that far from the end of all items.
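The offset and length rules above behave like ordinary list slicing. A minimal Python sketch of that reading follows; the name `regexall` and its signature are hypothetical and simply mirror the parameter order of the parser function, and this is a model of the documented behaviour rather than the extension's implementation.

```python
import re


def regexall(text, pattern, separator=",", offset=0, length=None):
    """Hypothetical model of #regexall: collect all matches, slice, then join."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    # Non-negative offset: start there. Negative offset: count from the end.
    items = matches[offset:]
    if length is not None:
        # Non-negative length: keep that many items.
        # Negative length: stop that far from the end of all items.
        items = items[:length]
    return separator.join(items)


print(regexall("0+11+2+33", r"\d+", ",", 1, 2))
```

With the input from the example above, the four matches are 0, 11, 2 and 33; starting at offset 1 and keeping 2 items leaves 11 and 2.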
#regex_var
This function gives access to the subexpression references of the most recently used #regex function. Subexpressions are the parts within parentheses "()", which can be referenced as "$n" (where n is a number) within #regex in replacement mode. Parameter 1 can be either a specific index or a whole string containing references, following the rules of the #regex replacement string.
- syntax
  - using specific index
    {{#regex_var: index | default }}
    E.g. {{#regex_var: 0 | nothing }}
  - using reference string
    {{#regex_var: references | default }}
    E.g. {{#regex_var: $3, $1 and $2 }}
  - Parameter default can contain a string which will be used if the given index doesn't exist, the last use of #regex failed, or #regex hasn't been called yet.
  - Accessing named subexpressions (like (?P<name>expr)) has not been implemented.
- rules
  - There are a few points you should be aware of when using this function:
    - default will be used if #regex wasn't executed before or the last execution caused an error.
    - If a template is called after the last #regex use and #regex is called from within that template, a later #regex_var will refer to the call made inside the template. This can lead to confusing output. This might be fixed in a later version.
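The two calling styles (a single index, or a string containing "$n" references) and the role of default can be illustrated with a small Python model. Everything named here is hypothetical: `regex_var` takes the last match object explicitly, whereas the extension keeps that state internally, and replacing a missing reference inside a reference string with an empty string is an assumption of this sketch.

```python
import re

_REF = re.compile(r"\$(\d+)")  # "$n" references inside a reference string


def _group(match, index):
    """Return group `index`, or None if there is no match or no such group."""
    if match is None or index > match.re.groups:
        return None
    return match.group(index)


def regex_var(last_match, arg, default=""):
    """Hypothetical model of #regex_var for index mode and reference-string mode."""
    arg = arg.strip()
    if arg.isdecimal():
        # Index mode: fall back to the default if the index is unavailable.
        value = _group(last_match, int(arg))
        return default if value is None else value
    if last_match is None:
        # No earlier successful #regex: use the default.
        return default
    # Reference-string mode: fill every "$n"; missing groups become "".
    return _REF.sub(lambda r: _group(last_match, int(r.group(1))) or "", arg)


last = re.search(r"(\w+) (\w+) (\w+)", "one two three")
print(regex_var(last, "$3, $1 and $2"))
print(regex_var(None, "0", "nothing"))
```

The first call reorders the three captured words; the second shows the default being used when no earlier match exists.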
#regexquote
An important function for escaping user-provided data that is to be used as part of a regular expression. User-provided input, for example data passed in via template parameters, should always be run through this function first to make sure that special characters like "." or "\" in the input won't interfere with the regular expression. Technically, this function runs the PHP function preg_quote over the string and, if the first character has a special meaning in MW, replaces it with its hexadecimal notation, e.g. "\x23" instead of "#" (to prevent the line from becoming a MW list).
- syntax
  {{#regexquote: text | delimiter }}
  Parameter delimiter is optional and should be the character used as delimiter within the regular expression in which the text will be used. By default, the delimiter is set to "/", since it is the most common delimiter in most examples.
- example
{{#regex: {{{Items|}}} | %(?:^{{!}}(?<=,))(\s*{{#regexquote: {{{Favorite}}} | % }}\s*)(?:${{!}}(?=\,))% | '''$''' }}
This will highlight an item, provided by template parameter Favorite, within a comma-separated list of items provided by parameter Items. If #regexquote weren't used here and Favorite contained a special character, the whole expression would break and return an error message!
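The escaping described for #regexquote can be sketched in Python as follows. This is a model under stated assumptions, not the extension's code: the set of quoted characters is assumed to match the one PHP documents for preg_quote, and the set of characters treated as special at the start of a wikitext line ("#", "*", ":", ";") is a guess, since the text above only names "#".

```python
import re

# Characters assumed to be quoted by PHP's preg_quote.
_PREG_SPECIAL = set(".\\+*?[^]$(){}=!<>|:-#")
# Characters assumed to start special wikitext lines (lists, indentation).
_WIKI_LINE_START = set("#*:;")


def _quote(text, special):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in special else ch for ch in text)


def regexquote(text, delimiter="/"):
    """Hypothetical model of #regexquote: quote text for use inside a pattern."""
    special = _PREG_SPECIAL | set(delimiter)
    if text and text[0] in _WIKI_LINE_START:
        # Hex-escape the first character, e.g. "#" becomes "\x23".
        return "\\x%02x" % ord(text[0]) + _quote(text[1:], special)
    return _quote(text, special)


user_input = "1+1=2?"
print(regexquote(user_input))
print(bool(re.fullmatch(regexquote(user_input), user_input)))
```

The quoted string matches the original input literally, which is the property the parser function is meant to guarantee for template parameters such as Favorite.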
Invalid regular expression handling
Instead of outputting a PHP notice in the event of an invalid regular expression, the extension outputs an inline wiki error message, which can be caught with the #iferror function of the ParserFunctions extension.
Installation
- Download and move the extracted RegexFun folder to your extensions/ directory.
  Developers and code contributors should install the extension from Git instead, using:
    cd extensions/
    git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/RegexFun
- Add the following code at the bottom of your LocalSettings.php file:
    require_once "$IP/extensions/RegexFun/RegexFun.php";
- Configure as required.
- Done. Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
Configuration
Regex Fun comes with two global configuration variables. Their default values can be changed by setting them in LocalSettings.php after the inclusion of RegexFun.php.
$egRegexFunDisabledFunctions
- Array defining functions which should not be available for use within the wiki. For example, if you want to prevent users from using #regex_var and #regexall, simply set this to: $egRegexFunDisabledFunctions = array( 'regexall', 'regex_var' );
$egRegexFunMaxRegexPerParse
- Defines the maximum number of regular expression executions per ongoing parser process. This counts all major regular expression usages triggered by this extension. The counter is increased by '#regex', '#regexall' and by '#regex_var' if a reference string is given, but not if only an index is requested. '#regexquote' is not affected. Once the limit would be exceeded, an error message catchable by #iferror is output instead of the result of the called function. By default the limit is set to -1, which disables any limitation.
- Note: Instead of using a limit per page, this limit is per Parser::parse() process bound to each Parser object. This avoids complications on page import or when the job queue is updating pages, because a single increasing global counter would not really be per page but rather per session.
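The per-parse counting described above can be pictured as a small budget object that is reset whenever a new parse starts. The Python sketch below is only a model: the class name `RegexBudget` and its methods are hypothetical, and it assumes that a limit of N permits exactly N counted executions before errors are returned.

```python
class RegexBudget:
    """Hypothetical model of a per-parse regex execution limit."""

    def __init__(self, max_per_parse=-1):
        self.max_per_parse = max_per_parse  # -1 disables the limit
        self.used = 0

    def start_parse(self):
        """Reset the counter, as would happen for each new parse process."""
        self.used = 0

    def charge(self):
        """Count one execution; return False once the budget is used up."""
        if 0 <= self.max_per_parse <= self.used:
            return False  # caller would output an error message instead
        self.used += 1
        return True


budget = RegexBudget(max_per_parse=2)
print([budget.charge() for _ in range(3)])
budget.start_parse()
print(budget.charge())
```

Binding one such object to each parser, rather than using a single global counter, is what keeps page imports and job-queue updates from sharing one ever-growing count.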
See also
- Extension:RegexFunctions - just another regular expression extension with less functionality but with more global customization variables for further limitations.
- Extension:ReplaceSet - Allows several replacement strings within one function call.