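Capturing group
(abc){3} matches abcabcabc.
(這個){2} matches 這個這個. (這個 is Chinese for "this"; it is used throughout as sample text.)
# R: wrap each five-digit number in a clickable span, reusing the captured digits as \\1
keywordList = gsub('(\\d{5})', paste0("<span onclick=\'xunbao\\(\"", "\\1\"", "\\)\'>", "\\1", "</span>"), keywordList)
Capturing Groups, Non-Captured Group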
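A minimal JavaScript sketch of the two ideas above: a quantifier applied to a whole group, and a captured group reused in a replacement. The sample input string is made up for illustration, and xunbao is simply the handler name taken from the R call; it is not defined here.

```javascript
// A quantifier after a group repeats the whole grouped sequence.
const repeated = /^(abc){3}$/;
console.log(repeated.test("abcabcabc")); // true
console.log(repeated.test("abcabc"));    // false

// Multi-character non-ASCII text is grouped the same way.
console.log(/^(這個){2}$/.test("這個這個")); // true

// Hypothetical sample input. In the replacement string, $1 stands for
// the text captured by (\d{5}).
const keywordList = "codes 12345 and 67890";
const linked = keywordList.replace(
  /(\d{5})/g,
  "<span onclick='xunbao(\"$1\")'>$1</span>"
);
console.log(linked);
```

JavaScript writes the group reference as $1 in the replacement string, where R's gsub writes \\1.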
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses, for example (ssssss).

non-capturing group
A non-capturing group groups a set of characters without capturing the matched text. It tells the engine not to store the matched text in a separate memory slot.
syntax: (?:expression)
The (?:) syntax denotes a non-capturing group, and expression represents the regular expression pattern to be matched.

capturing group example:
codechiname = "asdfghjkl"
gsub('(^....).*', '\\1', codechiname)
"asdf"
(^....) is the capturing group and is remembered; \\1 recalls the remembered group.

non-capturing group example:
Regex Code: (?:animal)(?:=)(\w+)(,)\1\2
Search String:
Line 1 - animal=cat,dog,cat,tiger,dog
Line 2 - animal=cat,cat,dog,dog,tiger
Line 3 - animal=dog,dog,cat,cat,tiger
(?:animal) --> Non-Captured Group 1
(?:=) --> Non-Captured Group 2
(\w+) --> Captured Group 1
(,) --> Captured Group 2
\1 - the text of captured group 1: cat in Line 2, dog in Line 3. In Line 1 the group first captures cat, but the word after the comma is dog, so \1 fails and Line 1 does not match.
\2 - the text of captured group 2: the comma (,).
The regex therefore matches animal=cat,cat, in Line 2 and animal=dog,dog, in Line 3.

By writing \1 and \2 we recall, later in the pattern, the text matched by captured groups 1 and 2. Going by the order of the parentheses, (?:animal) would be group 1 and (?:=) would be group 2, and so on. The ?: makes a group non-capturing: it is not counted, so numbering starts from the first capturing group, and the text matched by (?:animal) cannot be recalled later in the pattern.

Capturing groups can be used later in the regex to match the same text again, or in the replacement part of a search and replace. Making a group non-capturing simply exempts it from both uses. Non-capturing groups are great if you are trying to capture many different things and there are some groups you don't want to capture. That's pretty much the reason they exist. While you are learning about groups, learn about atomic groups too; they do a lot. There are also lookaround groups, but they are a little more complex and used less often.

Example of using a group later in the regex (backreference):
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
[ Finds an xml tag (without ns support) ]
([A-Z][A-Z0-9]*) is a capturing group (in this case, the tag name). Later in the regex, \1 matches only the same text that the first group matched (in this case, the name in the end tag).

To explain the significance in JavaScript, consider a scenario where you want to match "cat is animal" and get back cat and animal, with " is " required between them.
// the non-capturing group requires " is " but leaves it out of the results
"cat is animal".match(/(cat)(?: is )(animal)/) ; result ["cat is animal", "cat", "animal"]
// a lookahead matches only "cat": the text inside the lookahead
// is checked but is not part of the match
"cat is animal".match(/cat(?= is animal)/) ; result ["cat"]
// so add a capturing group for animal inside the lookahead
// to get animal as well
"cat is animal".match(/(cat)(?= is (animal))/) ; result ["cat", "cat", "animal"]
// that returns cat twice, so drop the group around cat
"cat is animal".match(/cat(?= is (animal))/) ; result ["cat", "animal"]
not containing </a>
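not containing 這個
^((?!這個).)*$
// matches a line that does not contain 這個 anywhere
Test text: test one two a這個s three

not include \t
^[^\t]*$
line with no tab anywhere (same result, written with a lookahead)
^((?!\t).)*$
span whose remaining text up to the end of the line contains no /
<span class="brown">((?!/).)*$
not include </a>
^((?!</a>).)*$
Finding tags Not Containing img
<[^img].+?>
Note: [^img] is a negated character class, so this only requires the first character after < to be something other than i, m, or g. It therefore also skips tags such as <meta> or <input>. A negative lookahead states the intent directly: <(?!img\b)[^>]*>
To find all instances of "foo" neither preceded by a "." nor followed by a "/", combine a negative lookbehind with a negative lookahead.
Lookahead and Lookbehind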
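The "line does not contain X" filters above all rely on negative lookahead. A minimal JavaScript sketch, with made-up sample lines, of how ^((?!X).)*$ behaves next to the simpler negated character class:

```javascript
// Hypothetical sample lines: the second contains 這個, the third contains a tab.
const lines = ["test one", "two a這個s", "three\tfour"];

// ^((?!這個).)*$ : before consuming each character, assert that 這個 does
// not start at this position. The line matches only if 這個 never occurs.
const withoutWord = /^((?!這個).)*$/;
const noWord = lines.filter((line) => withoutWord.test(line));

// For a single character, a negated character class is enough.
const withoutTab = /^[^\t]*$/;
const noTab = lines.filter((line) => withoutTab.test(line));

console.log(noWord); // keeps "test one" and "three\tfour"
console.log(noTab);  // keeps "test one" and "two a這個s"
```

The lookahead form is only needed when the excluded text is longer than one character; a character class cannot express "not this sequence".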
(?=subexp) look-ahead
(?<=subexp) look-behind
(?!subexp) negative look-ahead
(?<!subexp) negative look-behind

Lookahead: the text that follows is foo: (?=foo)
Lookbehind: the text that precedes is foo: (?<=foo)
Negative lookahead: the text that follows is not foo: (?!foo)
Negative lookbehind: the text that precedes is not foo: (?<!foo)

Test text: test one two a這個s three look這個look這個look這個
look-ahead: a character followed by 這個; the position is just before 這個: .(?=這個)
look-behind: a character preceded by 這個; the position is just after 這個: (?<=這個).
negative look-ahead: a character NOT followed by 這個; the position is before text other than 這個: .(?!這個)
negative look-behind: a character NOT preceded by 這個: (?<!這個).

"foo" not preceded by "." and not followed by "/":
(?<!\.)foo(?!/)

Negated Character Classes
The ^ inside square brackets negates the character class. So a "foo" not preceded by a "." could be written [^.]foo, but that version also consumes the preceding character and cannot match at the very start of the text; (?<!\.)foo has neither problem.
<[^img] matches < followed by one character that is not i, m, or g.
[^0-9\r\n] matches any character that is not a digit or a line break.
q[^u] means: "a q followed by a character that is not a u"
Grouping
(x)
Matches x and remembers the match. These are called capturing groups. For example, /(foo)/ matches and remembers "foo" in "foo bar". Capturing groups are numbered according to the order of their left parentheses, starting from 1. The matched substring can be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.

(?:x)
Matches x but does not remember the match. These are called non-capturing groups. The matched substring cannot be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.
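A short JavaScript sketch of how (x) and (?:x) show up in the result array. The input string and variable names are only for illustration.

```javascript
// Both groups capture, so they land at indexes 1 and 2 of the result.
const both = "foo bar".match(/(foo) (bar)/);
console.log(both[1], both[2]); // foo bar

// (?:foo) groups without capturing, so (bar) becomes group 1.
const onlyBar = "foo bar".match(/(?:foo) (bar)/);
console.log(onlyBar.length, onlyBar[1]); // 2 bar

// Captured text can be reused in a replacement string as $1, $2, ...
const swapped = "foo bar".replace(/(foo) (bar)/, "$2 $1");
console.log(swapped); // bar foo
```

Index 0 of the array is always the whole match, which is why group numbering starts at 1.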
Regular Expression Recipes
strip all HTML tags
<(.|\n)+?>
strip digits
\d{1,3}.?
Note: the trailing .? also removes one character of any kind after the digits; use \.? if only a literal dot should go.
strip digits with decimals
(\d*\.)?\d+
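A minimal sketch of two of these recipes applied with JavaScript's replace. The sample strings are made up, and regex-based tag stripping is only an approximation (a > inside an attribute value, for instance, will confuse it), so treat this as an illustration and not as an HTML sanitizer.

```javascript
// Strip all HTML tags: the lazy +? stops at the first > after each <.
const html = "<p>Hello <b>world</b></p>";
const text = html.replace(/<(.|\n)+?>/g, "");
console.log(text); // Hello world

// Strip numbers with an optional decimal part.
const priced = "price 12.50 and 3";
const noNumbers = priced.replace(/(\d*\.)?\d+/g, "");
console.log(JSON.stringify(noNumbers)); // "price  and "
```

The g flag is what makes replace act on every match; without it only the first tag or number would be removed.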
Feature | Syntax | Description | Example |
---|---|---|---|
Capturing group | (regex) | Parentheses group the regex between them. They capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference. They allow you to apply regex operators to the entire grouped regex. | (abc){3} matches abcabcabc. First group matches abc. |
Capturing group | \(regex\) | Escaped parentheses group the regex between them. They capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference. They allow you to apply regex operators to the entire grouped regex. | \(abc\){3} matches abcabcabc. First group matches abc. |
Non-capturing group | (?:regex) | Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything. | (?:abc){3} matches abcabcabc. No groups. |
Backreference | \1 through \9 | Substituted with the text matched between the 1st through 9th numbered capturing group. | (abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \10 through \99 | Substituted with the text matched between the 10th through 99th numbered capturing group. | |
Backreference | \k<1> through \k<99> | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\k<1> matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \k'1' through \k'99' | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\k'1' matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \g1 through \g99 | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\g1 matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \g{1} through \g{99} | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\g{1} matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \g<1> through \g<99> | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\g<1> matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | \g'1' through \g'99' | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=\g'1' matches abc=abc or def=def, but not abc=def or def=abc. |
Backreference | (?P=1) through (?P=99) | Substituted with the text matched between the 1st through 99th numbered capturing group. | (abc|def)=(?P=1) matches abc=abc or def=def, but not abc=def or def=abc. |
Relative Backreference | \k<-1>, \k<-2>, etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\k<-3> matches abcdb. |
Relative Backreference | \k'-1', \k'-2', etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\k'-3' matches abcdb. |
Relative Backreference | \g-1, \g-2, etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\g-3 matches abcdb. |
Relative Backreference | \g{-1}, \g{-2}, etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\g{-3} matches abcdb. |
Relative Backreference | \g<-1>, \g<-2>, etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\g<-3> matches abcdb. |
Relative Backreference | \g'-1', \g'-2', etc. | Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference. | (a)(b)(c)(d)\g'-3' matches abcdb. |
Failed backreference | Any numbered backreference | Backreferences to groups that did not participate in the match attempt fail to match. | (a)?\1 matches aa but fails to match b. |
Invalid backreference | Any numbered backreference | Backreferences to groups that do not exist at all are valid but fail to match anything. | (a)?\2|b matches b in aab. |
Nested backreference | Any numbered backreference | Backreferences can be used inside the group they reference. | (a\1?){3} matches aaaaaa. |
Forward reference | Any numbered backreference | Backreferences can be used before the group they reference. | (\2?(a)){3} matches aaaaaa. |
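Which of the syntaxes in the table are accepted depends on the regex flavor. As a minimal sketch, the (abc|def)=\1 example can be checked in JavaScript, using only the \1 form and the \k<name> form for a named group. The group name word is made up for this illustration.

```javascript
// Numbered backreference: \1 must repeat exactly what group 1 matched.
const sameBothSides = /^(abc|def)=\1$/;
console.log(sameBothSides.test("abc=abc")); // true
console.log(sameBothSides.test("def=def")); // true
console.log(sameBothSides.test("abc=def")); // false

// Named group with a \k<name> backreference ("word" is an illustrative name).
const named = /^(?<word>abc|def)=\k<word>$/;
console.log(named.test("def=def")); // true
console.log(named.test("def=abc")); // false
```

The rows on failed, nested, and forward references describe behavior that differs between engines, so it is worth testing those cases in the engine you actually use.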
Given this text:
Hello my name is bob
And this search term:
Find what: my name is (\w)+
Replace with: my name used to be $(1)
The search term works just fine but I can't figure out a way to actually do a replace using the regexp group.
Use $1 or \1 (backslash one) for the first capture group (the first match of a pattern in parentheses). So maybe try:
my name used to be \1
or
my name used to be $1
UPDATE:
As several people have pointed out, your original capture pattern is incorrect and will only capture the final letter of the name rather than the whole name. You should use the following pattern to capture all of the letters of the name:
my name is (\w+)
Find part:
my name is (\w)+
With replace part:
my name used to be \1
Would return:
Hello my name used to be b
Change find part to:
my name is (\w+)
And replace will be what you expect:
Hello my name used to be bob
While (\w)+ will match "bob", it is not the grouping you want for replacement.
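The difference between (\w)+ and (\w+) can be reproduced outside the editor. A minimal JavaScript sketch, where replace uses the same $1 notation:

```javascript
const text = "Hello my name is bob";

// (\w)+ repeats a one-character group; the group keeps only its last iteration.
const lastLetter = text.replace(/my name is (\w)+/, "my name used to be $1");
console.log(lastLetter); // Hello my name used to be b

// (\w+) puts the quantifier inside the group, so the whole word is captured.
const wholeName = text.replace(/my name is (\w+)/, "my name used to be $1");
console.log(wholeName); // Hello my name used to be bob
```

Both patterns match the same text; only what ends up stored in group 1 differs.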
Use the ( ) parentheses in your search string
There is an important thing to emphasize! Every matched segment of your search string that you want to use in your replacement string must be enclosed in ( ) parentheses; otherwise it won't be reachable through variables such as $1, $2, ... or \1, \2, ...
EXAMPLE:
We want to replace 'em' with 'px' but preserve the number values:
margin: 10em
margin: 2em
So we use margin: $1px as the replacement string.
CORRECT: Enclose the segment you want available as $1 in ( ) parentheses, as follows:
FIND: margin: ([0-9]*)em (With parentheses)
REPLACE TO: margin: $1px
RESULT:
margin: 10px
margin: 2px
WRONG: The following regex pattern will match the desired lines, but the matched segments will not be available in the replacement string as variables such as $1:
FIND: margin: [0-9]*em (Without parentheses)
REPLACE TO: margin: $1px
RESULT: ($1 is undefined)
margin: px
margin: px
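The same em-to-px replacement, sketched in JavaScript. One difference to be aware of: when the pattern has no group 1, JavaScript leaves $1 in the output as literal text, where the editor result above shows an empty value.

```javascript
const css = "margin: 10em\nmargin: 2em";

// With parentheses: the digits are captured and reused as $1.
const converted = css.replace(/margin: ([0-9]*)em/g, "margin: $1px");
console.log(converted);
// margin: 10px
// margin: 2px

// Without parentheses there is no group 1, so the number is lost.
const noGroup = css.replace(/margin: [0-9]*em/g, "margin: $1px");
console.log(noGroup);
// margin: $1px
// margin: $1px
```

Either way the conclusion is the same: the number survives only if it was matched inside a capturing group.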
Note that if you use more than 9 capture groups, you have to use the syntax ${10}. $10, \10, or \{10} will not work.