Регулярные выражения для самых маленьких

Содержание:

Syntax
Oracle’s REGEXP Functions
Скобочные группы ― ()
Example 6 : REGEXP_LIKE and REGEXP_SUBSTR
Символы¶
- Простые совпадения
- Непечатные символы (escape-коды)
Oracle REGEXP_COUNT Examples
Why Use Regular Expressions?

Syntax

REGEXP_REPLACE( string, target  ] ] ] )

Parameters

string

The or string to search for a regular expression pattern match. If string exists in a column of a flex or columnar table, cast string to a before searching for pattern.

target

The regular expression to search for within the string. The syntax of the regular expression is compatible with the Perl 5 regular expression syntax. See the Perl Regular Expressions Documentation for details.

replacement

The string to replace matched substrings. If you do not supply a replacement, the function deletes matched substrings. The replacement string can contain backreferences for substrings captured by the regular expression. The first captured substring is inserted into the replacement string using , the second , and so on.

position

The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the nth character you specify.

Default value: 1

occurrence

Controls which occurrence of a pattern match in the string to return. By default, the function returns the position of the first matching substring. Use this parameter to find the position of subsequent matching substrings. For example, setting this parameter to 3 returns the position of the third substring that matches the pattern.

Default value: 1

regexp_modifier

One or more single-character flags that modify how the regular expression finds matches in string:

b	Treat strings as binary octets, rather than UTF-8 characters.
c	Force the match to be case sensitive (the default).
i	Force the match to be case insensitive.
m	Treat the string to match as multiple lines. Using this modifier, the start of line () and end of line ( regular expression operators match line breaks () within the string. Without the modifier, the start and end of line operators match only the start and end of the string.
n	Allow the single character regular expression operator () to match a newline (). Without the modifier, the operator matches any character except a newline.
x	Add comments to your regular expressions. Using the modifier causes the function to ignore all unescaped space characters and comments in the regular expression. Comments start with a hash () character and end with a newline (). All spaces in the regular expression that you want to be matched in strings must be escaped with a backslash () character.

Oracle’s REGEXP Functions

Oracle Database 10g offers four regular expression functions. You can use these equally in your SQL and PL/SQL statements.

REGEXP_LIKE(source, regexp, modes) is probably the one you’ll use most. You can use it in the WHERE and HAVING clauses of a SELECT statement. In a PL/SQL script, it returns a Boolean value. You can also use it in a CHECK constraint. The source parameter is the string or column the regex should be matched against. The regexp parameter is a string with your regular expression. The modes parameter is optional. It sets the matching modes.

SELECT * FROM mytable WHERE REGEXP_LIKE(mycolumn, 'regexp', 'i');
IF REGEXP_LIKE('subject', 'regexp') THEN /* Match */ ELSE /* No match */ END IF;
ALTER TABLE mytable ADD (CONSTRAINT mycolumn_regexp CHECK (REGEXP_LIKE(mycolumn, '^regexp$')));

REGEXP_SUBSTR(source, regexp, position, occurrence, modes) returns a string with the part of source matched by the regular expression. If the match attempt fails, NULL is returned. You can use REGEXP_SUBSTR with a single string or with a column. You can use it in SELECT clauses to retrieve only a certain part of a column. The position parameter specifies the character position in the source string at which the match attempt should start. The first character has position 1. The occurrence parameter specifies which match to get. Set it to 1 to get the first match. If you specify a higher number, Oracle will continue to attempt to match the regex starting at the end of the previous match, until it found as many matches as you specified. The last match is then returned. If there are fewer matches, NULL is returned. Do not confuse this parameter with backreferences. Oracle does not provide a function to return the part of the string matched by a capturing group. The last three parameters are optional.

SELECT REGEXP_SUBSTR(mycolumn, 'regexp') FROM mytable;
match := REGEXP_SUBSTR('subject', 'regexp', 1, 1, 'i')

REGEXP_REPLACE(source, regexp, replacement, position, occurrence, modes) returns the source string with one or all regex matches replaced. If no matches can be found, the original string is replaced. If you specify a positive number for occurrence (see the above paragraph) only that match is replaced. If you specify zero or omit the parameter, all matches are replaced. The last three parameters are optional. The replacement parameter is a string that each regex match will be replaced with. You can use the backreferences \1 through \9 in the replacement text to re-insert text matched by a capturing group. You can reference the same group more than once. There’s no replacement text token to re-insert the whole regex match. To do that, put parentheses around the whole regexp, and use \1 in the replacement. If you want to insert \1 literally, use the string '\\1'. Backslashes only need to be escaped if they’re followed by a digit or another backslash. To insert \\ literally, use the string '\\\\'. While SQL does not require backslashes to be escaped in strings, the REGEXP_REPLACE function does.

SELECT REGEXP_REPLACE(mycolumn, 'regexp', 'replacement') FROM mytable;
result := REGEXP_REPLACE('subject', 'regexp', 'replacement', 1, 0, 'i');

REGEXP_INSTR(source, regexp, position, occurrence, return_option, modes) returns the beginning or ending position of a regex match in the source string. This function takes the same parameters as REGEXP_SUBSTR, plus one more. Set return_option to zero or omit the parameter to get the position of the first character in match. Set it to one to get the position of the first character after the match. The first character in the string has position 1. REGEXP_INSTR returns zero if the match cannot be found. The last 4 parameters are optional.

SELECT REGEXP_INSTR(mycolumn, 'regexp', 1, 1, 0, 'i') FROM mytable;

REGEXP_COUNT(source, regexp, position, modes) returns the number of times the regex can be matched in the source string. It returns zero if the regex finds no matches at all. This function is only available in Oracle 11g and later.

SELECT REGEXP_COUNT(mycolumn, 'regexp', 1, 'i') FROM mytable;

Скобочные группы ― ()

a(bc)       создаём группу со значением bc -> тестa(?:bc)*    оперетор ?: отключает группу -> тестa(?<foo>bc) так, мы можем присвоить имя группе -> тест

Этот оператор очень полезен, когда нужно извлечь информацию из строк или данных, используя ваш любимый язык программирования. Любые множественные совпадения, по нескольким группам, будут представлены в виде классического массива: доступ к их значениям можно получить с помощью индекса из результатов сопоставления.

Если присвоить группам имена (используя ), то можно получить их значения, используя результат сопоставления, как словарь, где ключами будут имена каждой группы.

Example 6 : REGEXP_LIKE and REGEXP_SUBSTR

We have strings containing parentheses. We want to return the text within the parentheses for those rows that contain parentheses.

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);

INSERT INTO t1 VALUES ('This is some text (with parentheses) in it.');
INSERT INTO t1 VALUES ('This text has no parentheses.');
INSERT INTO t1 VALUES ('This text has (parentheses too).');
COMMIT;

The basic pattern for text between parentheses is «\(.*\)». The «\» characters are escapes for the parentheses, making them literals. Without the escapes they would be assumed to define a sub-expression. That pattern alone is fine to identify the rows of interest using a operator, but it is not appropriate in a , as it would return the parentheses also. To omit the parentheses we need to include a sub-expression inside the literal parentheses «\((.*)\)». We can then using the first sub expression.

COLUMN with_parentheses FORMAT A20
COLUMN without_parentheses FORMAT A20

SELECT data,
       REGEXP_SUBSTR(data, '\(.*\)') AS with_parentheses,
       REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1) AS without_parentheses
FROM   t1
WHERE  REGEXP_LIKE(data, '\(.*\)');

DATA                                               WITH_PARENTHESES     WITHOUT_PARENTHESES
-------------------------------------------------- -------------------- --------------------
This is some text (with parentheses) in it.        (with parentheses)   with parentheses
This text has (parentheses too).                   (parentheses too)    parentheses too

2 rows selected.

SQL>

Символы¶

Простые совпадения

Любой отдельный символ соответствует самому себе.

Серия символов соответствует этой серии символов во входной строке.

RegEx	Находит

Непечатные символы (escape-коды)

Для представления непечатаемого символа в регулярном выражении вы используете :

RegEx
Находит

символ с шестнадцатеричным кодом « nn«

символ с шестнадцатеричным кодом (один байт для простого текста и два байта для Unicode)

(обратите внимание на пробел в середине)

Существует ряд предопределенных для непечатных символов, как в языке :

RegEx	Находит
	tab (HT/TAB), тоже что
	символ новой строки (NL), то же что
	car.return (CR), тоже что
	form feed (FF), то же что
	звонок (BEL), тоже что
	escape (ESC), то же что
	Ctrl-комбинация () Например соответствует , поскольку имеет код

Oracle REGEXP_COUNT Examples

Here are some examples of the REGEX_COUNT function.

Example 1

This example finds the number of occurrences that contain two consecutive vowels.

TITLE	REG
Box
Chair	1
Vacuum	1
Desk
Round	1
Under
Waterfall
Dismiss
Superficial	1
Suspicious	1

Example 2

This example finds the number of occurrences that contain vowels.

TITLE	REG
Box	1
Chair	2
Vacuum	3
Desk	1
Round	2
Under	1
Waterfall	3
Dismiss	2
Superficial	5
Suspicious	5

Example 3

This example finds the number of occurrences that contain digits.

TITLE	REG
Wishful
Wellness
Xylophone
Yeti
Zucchini
Summer of 69	2
The year is 2017	4
1955	4

Example 4

This example finds the number of occurrences that contain vowels, starting from position 3.

TITLE	REG
Box
Chair	2
Vacuum	2
Desk
Round	1
Under	1
Waterfall	2
Dismiss	1
Superficial	4
Suspicious	4

Example 5

This example finds the number of occurrences that contain a capital letter, starting from position 2.

TITLE	REG
Mathematical
Press
QUICKLY	6
Roger
Simple
TraVERse	3
Exclaim
Undercover
Vain

Why Use Regular Expressions?

The main reason to use regular expressions is that they provide a very specific level of control over the pattern matching for your strings.

Oracle SQL has many string functions that allow you to do some comparisons. You can use UPPER to find upper case values, for example, and can use a combination of LIKE and wildcard characters % and _ to find certain values.

However, for more complicated checks, these functions are not enough.

This is where regular expressions come in.

They can be used to:

Check phone number formats
Check email address formats
Check URLs match a specific format
Check any other type of string value to see if it matches a desired format.

Many programming languages include regular expressions in their libraries. However, using it at the database level allows for the check to be done closer to where the data is stored, making it more efficient. It’s also more consistent, which means any application that uses the data you’re working with will see and require it in the same format.