regex - Longest match (from the beginning) not containing a substring, accessible in the replacement

- June 15, 2014

I wondered whether it is possible, using sed, to match the longest string (from the beginning) that does not contain a given substring, while keeping the match accessible afterwards through sed's regex replacement variables \n.

Regarding the following snippet:

echo "blabla/a/b/dee/per" | sed -r -e 's:([^/a]*):\1:g'

I am trying to print the longest match containing any characters (indicated by the *) but not including the substring /a, in such a way that the snippet above prints

blabla

Regarding the following input (with /a deleted/replaced):

echo "blabla/b/b/dee/per" | sed -r -e 's:([^/a]*):\1:g'

I am expecting

blabla/b/b/dee/per

as output, because the substring /a is not present and the longest match therefore extends to the end of the string. I am stuck at describing the substring /a.

Caution: [^/a] is only a placeholder to describe the problem. In my opinion it needs to be replaced by a correct description of the substring. Is this possible using sed?

Thank you in advance.

Edit: john1024's third answer completes the question. The following snippet was used:

 sed -r -e 's:(/a|$):\x00:;s:^(.*)\x00(.*):\1:g'

Edit: To fulfil my original task of prepending values to paths with different prefixes, where the substring is surrounded by other characters, I came up with:

 $ echo -ne "blabla/a/b/dee/per\nblabla/b/dee/per" | \
    sed -r -e 's:(.*)/a/b:\1\x00:;s:(.*)/b:\1\x01:;s:^(.*)\x00(.*):\1/foo/a/b\2:g;s:^(.*)\x01(.*):\1/foo/b\2:g'
 blabla/foo/a/b/dee/per
 blabla/foo/b/dee/per

This first replaces the path prefixes /a/b or /b with \x00 or \x01 respectively, which makes the sed groups (the prefix and suffix paths) accessible through \n as described below.

Note: the additional trick used here to keep (.*)/b from matching inside (.*)/a/b is to replace the longest path prefixes first. Thanks again, @john1024.
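The effect of the ordering can be seen by swapping the first two substitutions. The following is only a sketch: it assumes GNU sed (for `-r` and the `\x00`/`\x01` escapes), and the function names `insert_foo` and `insert_foo_wrong_order` are made up for this illustration.

```shell
# Longest prefix (/a/b) is marked first, as in the snippet above.
insert_foo() {
    printf '%s\n' "$1" | sed -r -e 's:(.*)/a/b:\1\x00:;s:(.*)/b:\1\x01:;s:^(.*)\x00(.*):\1/foo/a/b\2:g;s:^(.*)\x01(.*):\1/foo/b\2:g'
}

# Same substitutions, but /b is marked before /a/b (hypothetical wrong order).
insert_foo_wrong_order() {
    printf '%s\n' "$1" | sed -r -e 's:(.*)/b:\1\x01:;s:(.*)/a/b:\1\x00:;s:^(.*)\x00(.*):\1/foo/a/b\2:g;s:^(.*)\x01(.*):\1/foo/b\2:g'
}

insert_foo "blabla/a/b/dee/per"              # blabla/foo/a/b/dee/per
insert_foo_wrong_order "blabla/a/b/dee/per"  # blabla/a/foo/b/dee/per
```

In the swapped version, (.*)/b consumes the /b that belongs to /a/b, so the /a/b substitution no longer finds anything to mark and /foo ends up in the wrong place.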

Find the string from the beginning until the first occurrence of `/a` (2nd version of the question)

$ echo "blabla/a/b/dee/per" | sed 's|/a.*||'
blabla
$ echo "blabla/b/b/dee/per" | sed 's|/a.*||'
blabla/b/b/dee/per
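When the string is already held in a shell variable, the same prefix can be taken without sed. This is not part of the original answer; it is a minimal sketch using POSIX parameter expansion, and the function name `prefix_before_a` is hypothetical.

```shell
# Print everything before the first "/a"; print the whole string if "/a" is absent.
prefix_before_a() {
    # ${1%%/a*} removes the longest suffix matching the glob "/a*",
    # i.e. everything from the first occurrence of "/a" onwards.
    printf '%s\n' "${1%%/a*}"
}

prefix_before_a "blabla/a/b/dee/per"   # blabla
prefix_before_a "blabla/b/b/dee/per"   # blabla/b/b/dee/per
```

Like the sed command, this treats `/a` as a plain substring, so it would also cut at a component such as `/abc`.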

Find the longest string not containing `/a` (original question)

This problem is a more natural match for awk:

$ echo "blabla/a/b/dee/per" | awk -v RS='/a' 'length($0)>max{longest=$0; max=length(longest);} END{print longest;}'
/b/dee/per
$ echo "blabla/b/b/dee/per" | awk -v RS='/a' 'length($0)>max{longest=$0; max=length(longest);} END{print longest;}'
blabla/b/b/dee/per

How it works:

-v RS='/a'

This sets the record separator to /a, which divides the input at every occurrence of /a.

length($0)>max{longest=$0; max=length(longest);}

If the current record, $0, is longer than the previous longest record, update longest and max with the new record.

END{print longest;}

When we reach the end of the input, print out the longest record that we saw.
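To see which records awk actually compares, each one can be printed together with its length. This is only a sketch: it assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves the behaviour unspecified), and the function name `show_records` is hypothetical.

```shell
show_records() {
    # printf '%s' avoids a trailing newline, which would otherwise become
    # part of the last record because RS is no longer the newline.
    printf '%s' "$1" | awk -v RS='/a' '{ printf "%d: [%s] length=%d\n", NR, $0, length($0) }'
}

show_records "blabla/a/b/dee/per"
# 1: [blabla] length=6
# 2: [/b/dee/per] length=10
```

The second record is the longer one, which is why the awk command above prints /b/dee/per for this input.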

Capture the string from the beginning to the first `/a` in a sed group (3rd version of the question)

$ echo "blabla/a/b/dee/per" | sed -r 's!(/a|$)!\x00!; s|^(.*)\x00.*|I found "\1".|'
I found "blabla".
$ echo "blabla/b/b/dee/per" | sed -r 's!(/a|$)!\x00!; s|^(.*)\x00.*|I found "\1".|'
I found "blabla/b/b/dee/per".

How it works:

s!(/a|$)!\x00!

This replaces the first occurrence of /a with the NUL character, \x00. If no occurrence of /a is found, the NUL character is placed at the end of the string (signified in the regex by $). (The NUL character was chosen because it can never be held in a bash variable and is thus extremely unlikely to be in the input string.)

s|^(.*)\x00.*|I found "\1".|

This saves in group 1 all the characters up to the location where the first /a used to be. You can use \1 in the replacement as you please.

As written, this requires a sed, such as GNU sed, that supports the NUL character, hex 00. If your sed does not support NUL, replace \x00 with a character that won't be in your input string and that your sed does support. \x01 might be a good second choice.
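The two substitutions can be wrapped in a small helper that uses \x01 as the marker instead of NUL. This is a sketch under stated assumptions: GNU sed (for `-r` and `\x01`), and input that never contains the byte \x01. The function name `capture_prefix` is hypothetical.

```shell
capture_prefix() {
    # Step 1: put the marker at the first "/a", or at end of line if there is none.
    # Step 2: keep group 1, i.e. everything before the marker, and reuse it as \1.
    printf '%s\n' "$1" | sed -r 's!(/a|$)!\x01!; s|^(.*)\x01.*|I found "\1".|'
}

capture_prefix "blabla/a/b/dee/per"   # I found "blabla".
capture_prefix "blabla/b/b/dee/per"   # I found "blabla/b/b/dee/per".
```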
