php - Regex lookahead complex pattern


This code replaces many single pipes with double pipes. To keep the change small, I'd prefer to correct the second regex so that it allows spaces between the "|" and the ",".

So, the question is how to modify the second regex so that it does not match \|[[:blank:]]*[^,\r\n].

Code:

    $patterns = array (
      '/\\\\\|,/',
      "/(?<=[^,])\|(?=[^,\n\r])/"
    );
    $replacements = array (
      '|,',
      '||'
    );
    $line = preg_replace ($patterns, $replacements, $line);

Example:

For the string: "|di|,|15| ,|c00413914|,|| ,|f|"

Expected / desired result:

"|di|,|15| ,|c00413914|,|| ,|f|"

Actual result:

"|di|,|15|| ,|c00413914|,||| ,|f|"

I've tried this, but it didn't work:

  • "/(?<=[^,])\|(?=[[:blank:]]*[^,\n\r])/"
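To see which pipes are affected, here is a minimal sketch that lists the offsets matched by the original second pattern and by the attempted variation on the example string. The helper name matchOffsets is made up for this illustration and is not part of the original code. Both patterns appear to hit the same two pipes, the ones followed by " ,". A likely explanation for the attempt is backtracking: [[:blank:]]* can give the space back, and [^,\n\r] then accepts that space.

```php
<?php
$line = '|di|,|15| ,|c00413914|,|| ,|f|';

// Second pattern from the question, and the attempted variation.
$original  = "/(?<=[^,])\|(?=[^,\n\r])/";
$attempted = "/(?<=[^,])\|(?=[[:blank:]]*[^,\n\r])/";

// Hypothetical helper: offsets of every pipe the pattern matches.
function matchOffsets($pattern, $subject)
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    return array_column($m[0], 1); // column 1 holds the offset of each match
}

print_r(matchOffsets($original, $line));  // offsets 8 and 24
print_r(matchOffsets($attempted, $line)); // the same two offsets
```

Offsets 8 and 24 are the pipes in "15| ," and "|| ,", which matches the extra pipes in the actual result.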

Please note:

This question is about fixing a bug with the smallest change possible. The current regex may be suboptimal (like using negative character classes instead of negative lookarounds), but the first priority is to minimize changes, not to optimize the regex.

Update:

In other words, based on my interpretation of the original regex, the revised one should match a single | followed by 0 or more spaces that's not at the beginning or end of a line, not preceded by a comma, and not followed by ,, \r, or \n.

More examples:

  1. 5|foo should match
  2. 5| foo should match
  3. 5|, should not match
  4. 5| , should not match
  5. 5|\r should not match
  6. 5| \r should not match
  7. ,||, should not match
  8. ,|| , should not match

I discovered the following while applying the suggestions to real data. The original regex appears to observe this behavior:

  1. |foo|, should not match. The pipe is the first character on the line.
  2. |foo| , should not match. The pipe is the first character on the line.
  3. ,|foo| should not match. The pipe is the last character on the line, and a newline may not exist (such as at EOF).
  4. ,|foo| followed by trailing whitespace should not match. The pipe plus whitespace are the last characters on the line, and a newline may not exist (such as at EOF).

The regex you were looking for in the beginning can be written as

(?<!,)\|(?![[:blank:]]*[,\n\r]) 

It matches a pipe only if it is not followed by optional whitespace and then a comma or a linebreak, and is not preceded by a comma.

Note that in this regex you do not need a possessive quantifier, because inside the lookahead possessive behavior is turned on by default in PHP due to internal PCRE optimizations.
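As a quick check, here is a minimal sketch that applies this pattern to the example string from the question. It leaves the "| ," sequences alone, but because both lookarounds are negative it also matches a pipe at the very start or end of the line, which the update in the question rules out.

```php
<?php
// Negative lookbehind for a comma, negative lookahead for optional blanks
// followed by a comma or a line break.
$pattern = '/(?<!,)\|(?![[:blank:]]*[,\n\r])/';

$line   = '|di|,|15| ,|c00413914|,|| ,|f|';
$result = preg_replace($pattern, '||', $line);

echo $result, PHP_EOL; // ||di|,|15| ,|c00413914|,|| ,|f||
```

Only the first and last pipes are doubled here, since nothing precedes the first one and nothing follows the last one.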

Your final regex can look like

(?<=[^\r\n,])\|(?=[[:blank:]]*+[^,\n\r]) 

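It checks if the pipe is preceded by a character other than a comma or a linebreak, and is followed by 0 or more spaces that are not followed by a comma or a linebreak. Possessive behavior can be forced with *+ in case the PCRE library was compiled without optimizations. In this positive lookahead the *+ matters: with a plain *, the engine can backtrack and let [^,\n\r] match one of the spaces itself.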
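The sketch below runs this final pattern against the examples listed in the question. The $cases table and the variable names are assumptions made for this illustration; the inputs themselves are taken from the question, with the trailing-whitespace case written as ",|foo| ". A greedy variant of the same pattern is included only for comparison.

```php
<?php
$final = '/(?<=[^\r\n,])\|(?=[[:blank:]]*+[^,\n\r])/';

// Inputs from the question mapped to whether a pipe should be matched.
$cases = array(
    "5|foo"   => true,
    "5| foo"  => true,
    "5|,"     => false,
    "5| ,"    => false,
    "5|\r"    => false,
    "5| \r"   => false,
    ",||,"    => false,
    ",|| ,"   => false,
    "|foo|,"  => false,
    "|foo| ," => false,
    ",|foo|"  => false,
    ",|foo| " => false,
);

// Collect every input whose outcome differs from the expectation.
$failures = array();
foreach ($cases as $input => $expected) {
    if ((preg_match($final, $input) === 1) !== $expected) {
        $failures[] = $input;
    }
}
var_dump($failures); // expected to be an empty array

// The example line from the question should come back unchanged.
$line = '|di|,|15| ,|c00413914|,|| ,|f|';
var_dump(preg_replace($final, '||', $line) === $line);

// Same pattern with a plain greedy * instead of *+, for comparison.
$greedy = '/(?<=[^\r\n,])\|(?=[[:blank:]]*[^,\n\r])/';
echo preg_replace($greedy, '||', '5| ,'), PHP_EOL; // 5|| ,
echo preg_replace($final, '||', '5| ,'), PHP_EOL;  // 5| ,
```

The last two lines show the difference the possessive quantifier makes: the greedy variant still doubles the pipe in "5| ,", while the possessive one leaves it alone.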

