Regex to split off the base path and pattern, concatenate them with the filename, and delete the part of the path between them
I have a URL like this:
a) <a href=\"http://example.com/path-pattern-to-match/subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">
or:
b) <a href=\"http://example.com/path-pattern-to-match/somearticle2\">
I need to split off the base URL with the path pattern (from the start of the <a> tag) and concatenate it with its somearticle. Everything in between needs to be deleted.
Case 'b' remains untouched. Case 'a' needs to become:
<a href=\"http://example.com/path-pattern-to-match/somearticle1\">
Please answer with a regex; that is what I need. Other solutions are also interesting if explained, using Perl or a bash script. Please avoid suggesting a programming module or function, or saying that parsing with regex is not the best solution, without giving a real solution.
PS: I need to parse a file that is not multiline. somearticle is variable.
If you have look-behind support, you can use
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)(?:[^\/]+\/)*([^\/>"]*)(?=\\">)
Explanation:
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/) - a fixed-width lookbehind making sure the literal text <a href=\"http://example.com/path-pattern-to-match/ appears in front of...
(?:[^\/]+\/)* - 0 or more sequences of 1 or more characters other than / ([^\/]+) followed by a literal / (i.e. the subpaths)
([^\/>"]*) - a capturing group that matches our keyword "somearticle" (0 or more characters other than ", >, or /)
(?=\\">) - a positive lookahead checking that \"> comes right after the preceding subpattern.
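To see which part of the input each piece actually consumes, here is a minimal sketch using Python's re module (chosen only for illustration; it also supports fixed-width lookbehind). It assumes, as the question shows, that the file contains literal backslash-quote sequences (\"), which is why the sample is written as a raw string. Only the subpaths and the article name end up in the match; the lookbehind prefix and the \"> lookahead are checked but not consumed.

```python
import re

# The answer's regex, split into its four parts (implicit string concatenation).
PATTERN = re.compile(
    # Fixed-width lookbehind: the literal prefix must precede the match,
    # but it is not part of the match.
    r'(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)'
    # Zero or more subpaths: 1+ characters other than "/" followed by "/".
    r'(?:[^\/]+\/)*'
    # Group 1: the final segment ("somearticle"), containing no /, > or ".
    r'([^\/>"]*)'
    # Lookahead: \"> must follow, but it is not consumed either.
    r'(?=\\">)'
)

# Case a from the question; the raw string keeps the literal backslashes.
case_a = r'<a href=\"http://example.com/path-pattern-to-match/subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">'

m = PATTERN.search(case_a)
print(m.group(0))  # subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1
print(m.group(1))  # somearticle1
```

Because the whole match is exactly "subpaths + article" while group 1 is just the article, replacing the match with group 1 drops the subpaths and nothing else.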
Using $1 as the replacement string, you can remove the subpaths and keep only the "somearticle" part.
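The substitution itself can be sketched with Python's re.sub, where the $1 back-reference is spelled \1. This is an illustration under the same assumption as above (the input contains literal \" sequences), and strip_subpaths is a hypothetical helper name. The last call puts both links on one line, since the question mentions a file that is not multiline.

```python
import re

# Same regex as in the answer; \1 below is Python's spelling of $1.
PATTERN = re.compile(
    r'(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)(?:[^\/]+\/)*([^\/>"]*)(?=\\">)'
)


def strip_subpaths(text: str) -> str:
    """Hypothetical helper: replace every match (subpaths + article) with group 1."""
    return PATTERN.sub(r'\1', text)


case_a = r'<a href=\"http://example.com/path-pattern-to-match/subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">'
case_b = r'<a href=\"http://example.com/path-pattern-to-match/somearticle2\">'

print(strip_subpaths(case_a))  # <a href=\"http://example.com/path-pattern-to-match/somearticle1\">
print(strip_subpaths(case_b))  # unchanged: zero subpaths, article replaced by itself
# Both links on one line:
print(strip_subpaths(case_a + 'text</a> ' + case_b))
```

Case b comes out unchanged because the subpath group matches zero times, so the match is just "somearticle2" and it is replaced by itself.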