Regex: split and concatenate the path base and pattern with the filename, deleting the part of the path between them
I have a URL like this:
a) <a href=\"http://example.com/path-pattern-to-match/subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">
or:
b) <a href=\"http://example.com/path-pattern-to-match/somearticle2\">
I need to split the path at the path pattern base URL, counting from the start of the <a> tag, and concatenate it with its somearticle. Everything in between needs to be deleted.
Case 'b' remains untouched. Case 'a' needs to become:
<a href=\"http://example.com/path-pattern-to-match/somearticle1\">
Please answer with a regex; that is what I need. Other solutions are interesting if explained, using Perl or a bash script. Please avoid just suggesting a programming module or function, or saying that parsing with a regex is not the best solution, without giving at least one real solution.
PS: I need to parse a non-multiline file. somearticle is variable.
If you have look-behind support, use
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)(?:[^\/]+\/)*([^\/>"]*)(?=\\">)
See demo
Explanation:
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/) - a fixed-width lookbehind making sure we have the literal text <a href=\"http://example.com/path-pattern-to-match/ in front of...
(?:[^\/]+\/)* - 0 or more sequences of 1 or more characters other than / ([^\/]+) followed by a literal / (i.e. subpaths)
([^\/>"]*) - a capturing group that matches our keyword "somearticle" (0 or more characters other than ", >, or /)
(?=\\">) - a positive lookahead checking if there is \"> right after the preceding subpattern.
Using $1 as the replacement string, you can remove the subpaths and keep the "somearticle" part.
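To see the replacement in action, here is a minimal sketch in Python. The choice of Python is mine, not something the question asked for, and the function name strip_subpaths is made up for illustration. The regex itself is the one above, unchanged. Python's re module writes the backreference as \1 rather than $1, and the sketch assumes the input really contains a literal backslash before each quote (\"), as in the question's samples.

```python
import re

# The pattern from the answer, split over two lines for readability only.
# \\" in the regex matches a literal backslash followed by a double quote.
PATTERN = re.compile(
    r'(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)'
    r'(?:[^\/]+\/)*([^\/>"]*)(?=\\">)'
)


def strip_subpaths(line: str) -> str:
    """Replace each match with group 1, dropping the subpaths before it.

    strip_subpaths is a hypothetical helper name, not part of any library.
    """
    return PATTERN.sub(r'\1', line)


# The two sample inputs from the question, kept as raw strings so the
# backslashes before the quotes survive.
case_a = (r'<a href=\"http://example.com/path-pattern-to-match/subpath/'
          r'onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">')
case_b = r'<a href=\"http://example.com/path-pattern-to-match/somearticle2\">'

print(strip_subpaths(case_a))
print(strip_subpaths(case_b))
```

With these inputs, case 'a' loses its subpaths and case 'b' comes back unchanged, because the (?:[^\/]+\/)* part is allowed to match zero times.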