php - Regex lookahead complex pattern
This code is replacing many single pipes with double pipes. To keep the change small, I'd prefer to correct the second regex so that it allows spaces between the "|" and the ",".
So, the question is how to modify the second regex so that it does not match \|[[:blank:]]*[^,\r\n].
Code:
$patterns = array (
  '/\\\\\|,/',
  "/(?<=[^,])\|(?=[^,\n\r])/"
);
$replacements = array (
  '|,',
  '||'
);
$line = preg_replace ($patterns, $replacements, $line);

Example:
For the string: "|di|,|15| ,|c00413914|,||         ,|f|"
Expected / desired result:
"|di|,|15| ,|c00413914|,||         ,|f|"
Actual result:
"|di|,|15|| ,|c00413914|,|||         ,|f|"
I've tried this, but it didn't work:
"/(?<=[^,])\|(?=[[:blank:]]*[^,\n\r])/"
Please note:
This question is about fixing a bug with the smallest change possible. The current regex may be suboptimal (like using negative character classes instead of negative lookarounds), but the first priority is to minimize changes, not to optimize the regex.
Update:
In other words, based on my interpretation of the original regex, the revised one should match a single | followed by 0 or more spaces that is not at the beginning or end of a line, is not preceded by a comma, and is not followed by ,, \r, or \n.
More examples:
5|foo     should match
5| foo    should match
5|,       should not match
5| ,      should not match
5|\r      should not match
5| \r     should not match
,||,      should not match
,|| ,     should not match
Discovered while applying the suggestions to real data. The original regex appears to observe this behavior:
|foo|,     should not match. The pipe is the first character on the line.
|foo| ,    should not match. The pipe is the first character on the line.
,|foo|     should not match. The pipe is the last character on the line; the newline may not exist (such as at EOF).
,|foo|     should not match. The pipe + whitespace are the last characters on the line; the newline may not exist (such as at EOF).
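The regex you were looking for in the beginning can be written as
(?<!,)\|(?![[:blank:]]*[,\n\r])
It matches a pipe if it is not followed by optional whitespace and then a comma or line break, and is not preceded by a comma.
Note that this regex does not need a possessive quantifier, because inside the lookahead PHP's PCRE engine applies possessive behavior by default due to internal optimizations. A blank can never match [,\n\r] anyway, so backtracking into the blanks could not change the result here.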
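As a quick check, the pattern above can be run against the sample inputs from the question. This is only a sketch: the variable names ($pattern, $cases, $results, $matchesAtLineStart) are made up for illustration, and the expected values are taken from the "More examples" list.

```php
<?php
// Simpler pattern from the answer: negative lookbehind plus negative lookahead.
$pattern = '/(?<!,)\|(?![[:blank:]]*[,\n\r])/';

// Sample inputs from the question, mapped to whether a match is wanted.
$cases = [
    "5|foo"  => true,
    "5| foo" => true,
    "5|,"    => false,
    "5| ,"   => false,
    "5|\r"   => false,
    "5| \r"  => false,
    ",||,"   => false,
    ",|| ,"  => false,
];

$results = [];
foreach ($cases as $input => $expected) {
    // preg_match returns 1 when the pattern matches somewhere in the input.
    $results[$input] = preg_match($pattern, $input) === 1;
    printf("%-10s %s\n", json_encode($input), $results[$input] ? 'match' : 'no match');
}

// This pattern has no guard for the start of a line, so a leading pipe
// still matches; the question's later examples ask for it not to.
$matchesAtLineStart = preg_match($pattern, "|foo|,") === 1;
```

The last line suggests why a stricter lookbehind is still needed for the line-boundary cases: (?<!,) is satisfied at the very start of the input, so the leading pipe of |foo|, would be doubled.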
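Your final regex can look like
(?<=[^\r\n,])\|(?=[[:blank:]]*+[^,\n\r])
It checks that the pipe is preceded by a character other than a comma or line break, and is followed by 0 or more blanks that are not followed by a comma or line break. The possessive behavior is forced with *+ rather than relying on the PCRE library's optimizations: with a plain *, the engine can give back a blank and let [^,\n\r] match it, which is why the attempt in the question still matched "| ,".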
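To see the effect of the possessive quantifier, the final pattern can be compared with a copy that uses a plain greedy *. This is a minimal sketch: $greedy is a comparison pattern introduced here, not part of the original code, and the trailing blank in the last edge case is assumed from the description "pipe + whitespace are the last characters".

```php
<?php
// Final pattern from the answer: possessive *+ inside the lookahead.
$possessive = '/(?<=[^\r\n,])\|(?=[[:blank:]]*+[^,\n\r])/';
// Hypothetical comparison pattern: identical except for a plain greedy *.
$greedy     = '/(?<=[^\r\n,])\|(?=[[:blank:]]*[^,\n\r])/';

// Sample line from the question; the desired output is the line unchanged.
$line = '|di|,|15| ,|c00413914|,||         ,|f|';

$fixed  = preg_replace($possessive, '||', $line);
$broken = preg_replace($greedy, '||', $line);

var_dump($fixed === $line);   // does the possessive version leave the line alone?
var_dump($broken === $line);  // does the greedy version leave the line alone?

// Line-boundary cases from the question: none of these should match.
// The last entry ends with a blank (assumed from the question's description).
$edgeCases = ["|foo|,", "|foo| ,", ",|foo|", ",|foo| "];
$edgeMatches = [];
foreach ($edgeCases as $input) {
    $edgeMatches[$input] = preg_match($possessive, $input);
}

// Cases that should still match.
$wanted = [preg_match($possessive, "5|foo"), preg_match($possessive, "5| foo")];
```

In the greedy variant the lookahead can succeed by matching zero blanks and then treating the blank itself as the "not a comma" character, so pipes followed by " ," are still doubled. The possessive *+ keeps every blank it has consumed, which closes that path.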