preg_replace

(PHP 4, PHP 5, PHP 7, PHP 8)

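preg_replace - Perform a regular expression search and replace

Description

Searches subject for matches to pattern and replaces them with replacement.

To match an exact string rather than a pattern, consider using str_replace() or str_ireplace() instead of this function.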
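As a rough illustration of why the plain string functions are the safer choice for exact text, the sketch below replaces a dot in three ways. The variable names are only illustrative.

```php
<?php
$version = '1.2.3';

// str_replace() treats '.' as a literal character.
$literal = str_replace('.', '-', $version);      // '1-2-3'

// In a pattern, an unescaped '.' matches any character (except newline).
$asRegex = preg_replace('/./', '-', $version);   // '-----'

// Escaping the dot restores the literal meaning, at the cost of readability.
$escaped = preg_replace('/\./', '-', $version);  // '1-2-3'

echo $literal, "\n", $asRegex, "\n", $escaped, "\n";
```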

Parameters

pattern

The pattern to search for. It can be either a string or an array of strings.

Several PCRE modifiers are also available.

replacement

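The string or array of strings to replace with. If this parameter is a string and pattern is an array, every pattern is replaced by that string. If both pattern and replacement are arrays, each pattern is replaced by the corresponding element of replacement. If replacement has fewer elements than pattern, the extra patterns are replaced by an empty string.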
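The three combinations of string and array arguments described for replacement can be sketched as follows. The sample strings are made up for illustration.

```php
<?php
// A single replacement string is shared by every pattern.
$masked = preg_replace(['/\d/', '/\s/'], '*', 'a1 b2');      // 'a**b*'

// Two arrays: each pattern uses the replacement at the same position.
$paired = preg_replace(['/a/', '/b/'], ['1', '2'], 'abba');  // '1221'

// Fewer replacements than patterns: '/b/' has no counterpart,
// so its matches are replaced by an empty string.
$short = preg_replace(['/a/', '/b/'], ['X'], 'aabb');        // 'XX'

echo $masked, "\n", $paired, "\n", $short, "\n";
```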

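replacement may contain backreferences of the form \\n or $n, with the latter form being preferred. Each such reference is replaced by the text captured by the n-th capturing subgroup. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Capturing subgroups are numbered by counting their opening parentheses from left to right, starting from 1. To use a literal backslash in replacement, write four backslashes ("\\\\") in the PHP string literal: PHP's own string escaping reduces them to two, and preg_replace() then reads those two as one literal backslash.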
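A small sketch of the numbering rule, the whole-match reference, and the four-backslash case may help here. The patterns and inputs are invented for the example.

```php
<?php
// Groups are numbered by their opening parenthesis, left to right:
// group 1 is the whole range, group 2 the start, group 3 the end.
$swapped = preg_replace('/((\d+)-(\d+))/', '$3..$2 (was $1)', '10-20');
// '20..10 (was 10-20)'

// $0 stands for the text matched by the whole pattern.
$bracketed = preg_replace('/\w+/', '[$0]', 'hi there');
// '[hi] [there]'

// Four backslashes in the literal become two in the PHP string,
// which preg_replace() reads as one literal backslash.
$backslashed = preg_replace('#/#', '\\\\', 'a/b');
// a\b

echo $swapped, "\n", $bracketed, "\n", $backslashed, "\n";
```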

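When a backreference in the replacement is immediately followed by another digit (for example, when a literal number should follow a matched group), the \\1 notation cannot be used. \\11, for instance, is ambiguous: preg_replace() cannot tell whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. The solution in this case is to use ${1}1, which creates an isolated $1 backreference followed by a literal 1.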

subject

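The string or array of strings to search and replace.

If subject is an array, the search and replace is performed on every element of subject, and the return value is an array as well.

If subject is an associative array, its keys are preserved in the returned value.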
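A minimal sketch of the array form of subject, using an invented associative array to show that the keys survive:

```php
<?php
$fields = [
    'title' => 'Hello   world',
    'body'  => "two\n\nlines",
];

// The same pattern is applied to every element; keys are kept as they are.
$clean = preg_replace('/\s+/', ' ', $fields);

print_r($clean);
// Array
// (
//     [title] => Hello world
//     [body] => two lines
// )
```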

limit

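The maximum number of replacements for each pattern in each subject string. Defaults to -1 (no limit).

count

If specified, this variable is filled with the number of replacements performed.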
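The interaction of limit and count can be sketched like this. Note that the limit is applied per pattern, while count is the total over all patterns. The inputs are illustrative.

```php
<?php
// Replace only the first 'o'.
$first = preg_replace('/o/', '0', 'foo boo', 1, $count);
// $first is 'f0o boo', $count is 1

// With an array of patterns, each pattern gets its own limit of 1,
// and $total counts the replacements made by all of them.
$both = preg_replace(['/o/', '/b/'], ['0', 'B'], 'foo bob', 1, $total);
// $both is 'f0o Bob', $total is 2

echo $first, ' (', $count, ")\n", $both, ' (', $total, ")\n";
```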

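Return Values

preg_replace() returns an array if subject is an array, or a string otherwise.

If matches are found, the new subject is returned; otherwise subject is returned unchanged. If an error occurs, null is returned.

Errors/Exceptions

Using the "\e" modifier is an error; an E_WARNING is emitted in this case.

If the regular expression passed cannot be compiled, an E_WARNING is emitted.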
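One way to handle the failure case is sketched below: a temporary error handler collects the E_WARNING, and the null return value is checked explicitly. The deliberately broken pattern and the variable names are assumptions made for the example, and the exact warning text depends on the PCRE version.

```php
<?php
$warnings = [];

// Collect warnings instead of letting PHP print them.
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // the warning has been handled
}, E_WARNING);

// The opening parenthesis is never closed, so the pattern cannot compile.
$result = preg_replace('/(unclosed/', '', 'some text');

restore_error_handler();

if ($result === null) {
    $message = 'preg_replace failed: ' . ($warnings[0] ?? 'unknown error');
    echo $message, "\n";
}
```

Comparing with `=== null` matters here, because an empty string is a legitimate result when everything in subject was replaced.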

Examples

Example #1 Using backreferences followed by numeric literals

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>

The above example will output:

April1,2003

Example #2 Using indexed arrays with preg_replace()

<?php
$string = 'The quick brown fox jumps over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The bear black slow jumps over the lazy dog.

By sorting patterns and replacements by key with ksort(), we get the expected result.

<?php
$string = 'The quick brown fox jumps over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The slow black bear jumps over the lazy dog.

Example #3 Replacing several values

<?php
$patterns = array ('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
                   '/^\s*{(\w+)}\s*=/');
$replace = array ('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
?>

The above example will output:

$startDate = 5/27/1999

Example #4 Strip whitespace

This example strips excess whitespace from a string.

<?php
$str = 'foo   o';
$str = preg_replace('/\s\s+/', ' ', $str);
// This will be 'foo o' now
echo $str;
?>

Example #5 Using the count parameter

<?php
$count = 0;

echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1 , $count);
echo $count; //3
?>

The above example will output:

xp***to
3

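Notes

Note:
When using arrays for pattern and replacement, the keys are processed in the order they appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, perform a ksort() on each array before calling preg_replace().

Note:
When both pattern and replacement are arrays, the matching rules operate sequentially. That is, the second pattern/replacement pair operates on the string produced by the first pattern/replacement pair, not on the original string. If you want to simulate replacements operating in parallel, such as swapping two values, replace one pattern by an intermediary placeholder, then in a later pair replace that intermediary placeholder with the desired replacement.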
<?php
$p = array('/a/', '/b/', '/c/');
$r = array('b', 'c', 'd');
print_r(preg_replace($p, $r, 'a'));
// prints d
?>
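
The placeholder technique for simulating parallel replacement might look like the following sketch. The placeholder `__TMP__` is an arbitrary choice and is assumed not to occur in the subject.

```php
<?php
$subject = 'cat chases dog';

// Naive attempt: the second pair undoes the first one.
$naive = preg_replace(['/\bcat\b/', '/\bdog\b/'], ['dog', 'cat'], $subject);
// 'cat chases cat'

// Swap through a placeholder that is assumed not to appear in $subject.
$patterns     = ['/\bcat\b/', '/\bdog\b/', '/__TMP__/'];
$replacements = ['__TMP__',   'cat',       'dog'];
$swapped = preg_replace($patterns, $replacements, $subject);
// 'dog chases cat'

echo $naive, "\n", $swapped, "\n";
```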

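See Also

PCRE Patterns
preg_quote() - Quote regular expression characters
preg_filter() - Perform a regular expression search and replace
preg_match() - Perform a regular expression match
preg_replace_callback() - Perform a regular expression search and replace using a callback
preg_split() - Split string by a regular expression
preg_last_error() - Returns the error code of the last PCRE regex execution
str_replace() - Replace all occurrences of the search string with the replacement string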

User Contributed Notes 10 notes

arkani at iol dot pt ¶

17 years ago

Because I search for this a lot:

The following should be escaped if you are trying to match that character

\ ^ . $ | ( ) [ ]
* + ? { } ,

Special Character Definitions
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
More Special Character Stuff
\t tab (HT, TAB)
\n newline (LF, NL)
\r return (CR)
\f form feed (FF)
\a alarm (bell) (BEL)
\e escape (think troff) (ESC)
\033 octal char (think of a PDP-11)
\x1B hex char
\c[ control char
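\l lowercase next char (think vi) (Perl only; not supported by PCRE)
\u uppercase next char (think vi) (Perl only; not supported by PCRE)
\L lowercase till \E (think vi) (Perl only; not supported by PCRE)
\U uppercase till \E (think vi) (Perl only; not supported by PCRE)
\E end case modification (think vi) (in PCRE, \E only ends a \Q quoted section)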
\Q quote (disable) pattern metacharacters till \E
Even More Special Characters
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
\G Match only where previous m//g left off (works only with /g)
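
Rather than escaping these characters by hand, preg_quote() can do it when the text to match comes from a variable. A minimal sketch, with an invented needle and subject:

```php
<?php
$needle = 'Hello.World?(1+1=2)';

// The second argument also escapes the delimiter used around the pattern.
$pattern = '/' . preg_quote($needle, '/') . '/';

$result = preg_replace($pattern, '[match]', 'Say Hello.World?(1+1=2) now');
// 'Say [match] now'

echo $result, "\n";
```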

Anonymous ¶

2 years ago

You can only use numeric backreferences in the replacement string, but not named ones:
<?php 
echo preg_replace('#(\d+)#', '\1 $1 ${1}', '123');
// 123 123 123
echo preg_replace('#(?<digits>\d+)#', '\digits $digits ${digits}', '123');
// \digits $digits ${digits}
?>

To use named backreferences, you have to use preg_replace_callback:
<?php
echo preg_replace_callback('#(?<digits>\d+)#', function( $m ){
  return "$m[1] $m[digits] {$m['digits']}";
}, '123');
// 123 123 123

echo preg_replace_callback('#(?<digits>\d+)#', fn($m) => "$m[1] $m[digits] {$m['digits']}", '123');
// 123 123 123
?>

See https://bugs.php.net/bug.php?id=81469

nik at rolls dot cc ¶

13 years ago

To split Pascal/CamelCase into Title Case (for example, converting descriptive class names for use in human-readable frontends), you can use the below function:

<?php
function expandCamelCase($source) {
  return preg_replace('/(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])/', ' $1', $source);
}
?>

Before:
  ExpandCamelCaseAPIDescriptorPHP5_3_4Version3_21Beta
After:
  Expand Camel Case API Descriptor PHP 5_3_4 Version 3_21 Beta

ismith at nojunk dot motorola dot com ¶

19 years ago

Be aware that when using the "/u" modifier, if your input text contains any bad UTF-8 code sequences, then preg_replace will return NULL (which prints as an empty string), regardless of whether there were any matches.

This is due to the PCRE library returning an error code if the string contains bad UTF-8.
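
A short sketch of how this failure can be detected, assuming a subject that contains a stray ISO-8859-1 byte. The fallback message is only an example.

```php
<?php
// "\xE9" is 'é' in ISO-8859-1 but an incomplete sequence in UTF-8.
$bad = "caf\xE9";

$result = preg_replace('/\w+/u', 'x', $bad);

$status = 'ok';
if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    // The subject was rejected before any matching took place.
    $status = 'invalid UTF-8 input';
}

echo $status, "\n";
```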

sternkinder at gmail dot com ¶

18 years ago

From what I can see, the problem is that if you simply substitute all 'A's with 'T's, you can't tell afterwards which 'T's to substitute with 'A's. This can be solved by first replacing all 'A's with another character (for instance '_' or whatever you like), then replacing all 'T's with 'A's, and finally replacing all '_'s (or whatever character you chose) with 'T's:

<?php
$dna = "AGTCTGCCCTAG";
echo str_replace(array("A","G","C","T","_","-"), array("_","-","G","A","T","C"), $dna); //output will be TCAGACGGGATC
?>

Although I don't know how transliteration in Perl works (though I remember it is similar to the UNIX command "tr"), I would suggest the following function for "switching" single chars:

<?php
function switch_chars($subject,$switch_table,$unused_char="_") {
    foreach ( $switch_table as $_1 => $_2 ) {
        $subject = str_replace($_1,$unused_char,$subject);
        $subject = str_replace($_2,$_1,$subject);
        $subject = str_replace($unused_char,$_2,$subject);
    }
    return $subject;
}

echo switch_chars("AGTCTGCCCTAG", array("A"=>"T","G"=>"C")); //output will be TCAGACGGGATC
?>
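
For single characters, PHP's built-in strtr() behaves much like "tr" and needs no placeholder, because it scans the subject once and never re-translates its own output. A sketch using the same DNA string:

```php
<?php
$dna = 'AGTCTGCCCTAG';

// Character-list form: the n-th character of 'AGCT' maps to the n-th of 'TCGA'.
$complement = strtr($dna, 'AGCT', 'TCGA');
// 'TCAGACGGGATC'

// Array form, which also works for multi-character keys.
$viaArray = strtr($dna, ['A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G']);

echo $complement, "\n", $viaArray, "\n";
```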

php-comments-REMOVE dot ME at dotancohen dot com ¶

18 years ago

Below is a function for converting Hebrew final characters to their
normal equivalents should they appear in the middle of a word.
The \b assertion does not treat Hebrew letters as part of a word,
so I had to work around that limitation.

<?php

$text="עברית מבולגנת";

function hebrewNotWordEndSwitch ($from, $to, $text) {
   $text=
    preg_replace('/'.$from.'([א-ת])/u',$to.'$1',$text);
   return $text;
}

do {
   $text_before=$text;
   $text=hebrewNotWordEndSwitch("ך","כ",$text);
   $text=hebrewNotWordEndSwitch("ם","מ",$text);
   $text=hebrewNotWordEndSwitch("ן","נ",$text);
   $text=hebrewNotWordEndSwitch("ף","פ",$text);
   $text=hebrewNotWordEndSwitch("ץ","צ",$text);
}   while ( $text_before!=$text );

print $text; // עברית מסודרת!

?>

The do-while is necessary for multiple instances of letters, such
as "אנני" which would start off as "אןןי". Note that there's still the
problem of acronyms with gershiim but that's not a difficult one
to solve. The code is in use at http://gibberish.co.il which you can
use to translate wrongly-encoded Hebrew, transliterate, and perform some
other Hebrew-related functions.

To ensure that there will be no regular characters at the end of a
word, just convert all regular characters to their final forms, then
run this function. Enjoy!

me at perochak dot com ¶

15 years ago

If you would like to remove a tag along with the text inside it then use the following code.

<?php
preg_replace('/(<tag>.+?)+(<\/tag>)/i', '', $string);
?>

example
<?php $string='<span class="normalprice">55 PKR</span>'; ?>

<?php
$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>

This results in an empty string.

<?php
$string='My String <span class="normalprice">55 PKR</span>';

$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>

This results in "My String " (note the trailing space).
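
The same idea can be written with a simpler pattern that avoids the nested quantifier. The helper below is a hypothetical sketch, not part of PHP; it does not handle nested tags of the same name, and for real HTML a parser is usually the safer tool.

```php
<?php
// Hypothetical helper: removes every <tag ...>...</tag> pair, content included.
function stripTagWithContent(string $html, string $tag): string
{
    $t = preg_quote($tag, '/');
    // .*? is lazy, so each match stops at the first closing tag.
    // The s modifier lets . span newlines; i ignores case.
    $pattern = '/<' . $t . '\b[^>]*>.*?<\/' . $t . '>/is';

    // Fall back to the unchanged input if preg_replace() reports an error.
    return preg_replace($pattern, '', $html) ?? $html;
}

$result = stripTagWithContent('My String <span class="normalprice">55 PKR</span>', 'span');
// 'My String ' (with the trailing space)

echo '[', $result, "]\n";
```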

razvan_bc at yahoo dot com ¶

3 years ago

How to remove all comments from code without removing the line endings (CRLF = \r\n, or CR = \r) on each line?

<?php
$txt_target=<<<t1
this;//    dsdsds
    nope
    
/*
    ok
    */
is;huge
/*text bla*/
    /*bla*/
 
t1;

/*
=======================================================================
expected result:
=======================================================================
this;
    nope

is;huge
=======================================================================
visualizing in a hex viewer .. to_check_with_a_hex_viewer.txt ...
 t  h  i  s  ; LF TAB n  o  p  e CR LF CR LF  i  s  ;  h  u  g  e CR LF
74 68 69 73 3b 0a 09 6e 6f 70 65 0d 0a 0d 0a 69 73 3b 68 75 67 65 0d 0a
I used F3 (viewer + options 3: hex) in mythical TOTAL COMMANDER!
=======================================================================
*/

echo '<hr><pre>';
echo  $txt_target;
echo '</pre>';

//  a single line '//' comments
$txt_target = preg_replace('![ \t]*//.*[ \t]*!', '', $txt_target);

//  /* comment */
$txt_target = preg_replace('/\/\*([^\/]*)\*\/(\s+)/smi', '', $txt_target);
echo '<hr><pre>';
echo  $txt_target;
echo '</pre><hr>';

file_put_contents('to_check_with_a_hex_viewer.txt',$txt_target);

?>

bublifuk at mailinator dot com ¶

8 years ago

A delimiter can be any ASCII non-alphanumeric, non-backslash, non-whitespace character:  !"#$%&'*+,./:;=?@^_`|~-  and  ({[<>]})
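
A few delimiter choices in practice, as a sketch with invented inputs. Picking a delimiter that does not occur in the pattern saves escaping.

```php
<?php
$path = '/usr/local/bin';

// '#' as the delimiter avoids escaping the slashes in the pattern.
$moved = preg_replace('#^/usr/local#', '/opt', $path);   // '/opt/bin'

// Bracket-style delimiters are closed by the matching closing bracket.
$digits = preg_replace('{\d+}', 'N', 'a1b22');            // 'aNbN'

// '~' is another common choice.
$spaces = preg_replace('~\s+~', ' ', "x \t y");           // 'x y'

echo $moved, "\n", $digits, "\n", $spaces, "\n";
```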

mail at johanvandemerwe dot nl ¶

6 years ago

Sample for replacing bracketed short-codes

The short-codes used here are for educational purposes only; they could be shorter, such as 'i' instead of 'italic' or 'b' instead of 'bold'.

Sample text
----
This sample shows how to have [italic]italic[/italic], [bold]bold[/bold] and [underline]underlined[/underline] and [strikethrough]striked[/striketrhough] text. 

with this function:

<?php
function textDecoration($html)
{
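    // Group 1 is the short-code name (reused as \1 in the closing tag);
    // group 2 is the text between the opening and closing tags.
    $patterns = [
        '/\[(italic)\](.*?)\[\/\1\]/',
        '/\[(bold)\](.*?)\[\/\1\]/',
        '/\[(underline)\](.*?)\[\/\1\]/'
    ];

    // $2 inserts the enclosed text, not the short-code name.
    $replacements = [
        '<i>$2</i>',
        '<strong>$2</strong>',
        '<u>$2</u>'
    ];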

    return preg_replace($patterns, $replacements, $html);
}

$html = textDecoration($html);

echo $html; // or return
?>

results in:
----
This sample shows how to have <i>italic</i>, <strong>bold</strong> and <u>underlined</u> and [strikethrough]striked[/striketrhough] text.

Notice!
The patterns and replacements arrays have no entry for the strikethrough short-code, so [strikethrough]striked[/striketrhough] is left unchanged.