Regex: split and concatenate the path base and pattern with the filename, deleting the part of the path between them


I have URLs like this:

a) <a href=\"http://example.com/path-pattern-to-match/subpath/onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">

or:

b) <a href=\"http://example.com/path-pattern-to-match/somearticle2\">

I need to split the URL at the path pattern (the base URL from the start of the <a> tag) and concatenate it with its somearticle. Everything in between needs to be deleted.

Case 'b' remains untouched. Case 'a' needs to become:

<a href=\"http://example.com/path-pattern-to-match/somearticle1\">

Please answer with a regex; that is what I need. Other solutions are interesting if explained, using Perl or a Bash script. Please avoid only suggesting a programming module or function, or saying that parsing with regex is not the best solution, without offering a real solution.

PS: I need to parse a non-multiline file. somearticle is variable.

If you have look-behind support, you can use

(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)(?:[^\/]+\/)*([^\/>"]*)(?=\\">) 


Explanation

  • (?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/) - a fixed-width lookbehind that makes sure the literal text <a href=\"http://example.com/path-pattern-to-match/ is in front of...
  • (?:[^\/]+\/)* - 0 or more sequences of 1 or more characters other than / ([^\/]+) followed by a literal / (i.e. the subpaths)
  • ([^\/>"]*) - a capturing group that matches our keyword "somearticle" (0 or more characters other than ", >, or /).
  • (?=\\">) - a positive lookahead checking that there is \"> right after the preceding subpattern.
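To see what each part of the pattern ends up holding, here is a minimal sketch in Python. Python is my assumption (the question mentions Perl or Bash), chosen only because its re module supports fixed-width lookbehind. The helper name match_parts and the constants CASE_A and CASE_B are made up for illustration; the pattern itself is copied from above.

```python
import re

# The pattern from the answer, copied unchanged. The input holds a literal
# backslash before each quote (\"), which is why the pattern spells it \\".
PATTERN = re.compile(
    r'(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)'
    r'(?:[^\/]+\/)*([^\/>"]*)(?=\\">)'
)

# The two sample links from the question, written as raw strings so the
# backslashes stay literal.
CASE_A = (r'<a href=\"http://example.com/path-pattern-to-match/subpath/'
          r'onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">')
CASE_B = r'<a href=\"http://example.com/path-pattern-to-match/somearticle2\">'


def match_parts(tag):
    """Return (whole match, group 1) for the first match in tag, or None."""
    m = PATTERN.search(tag)
    return (m.group(0), m.group(1)) if m else None


if __name__ == "__main__":
    for tag in (CASE_A, CASE_B):
        print(match_parts(tag))
```

For case 'a' the whole match covers the subpaths plus the article name, while group 1 holds only the article name. For case 'b' the two are identical, which is why that link comes out unchanged after replacement.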

Using $1 as the replacement string, you can remove the subpaths and keep the "somearticle" part.
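The replacement step could look roughly like this in Python (again my assumption, not something the question asked for), where $1 is spelled \1. ORIGINAL is the pattern from this answer. STRICT is a hypothetical variant of my own that also excludes " and \ from both character classes. The reason: [^\/]+ also matches quotes and >, so on a one-line file the subpath part can run past the end of a tag when a later href is relative and contains a slash, for instance <a href=\"/foo\">. The names strip_subpaths and LINE are illustrative only.

```python
import re

# Lookbehind shared by both patterns: the fixed base URL inside the tag.
PREFIX = r'(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)'

# The answer's pattern, unchanged.
ORIGINAL = re.compile(PREFIX + r'(?:[^\/]+\/)*([^\/>"]*)(?=\\">)')

# Hypothetical stricter variant (not from the answer): a subpath segment or
# article name may not contain a quote or a backslash, so a match cannot
# extend beyond the href value it started in.
STRICT = re.compile(PREFIX + r'(?:[^\/"\\]+\/)*([^\/>"\\]*)(?=\\">)')


def strip_subpaths(line, pattern=ORIGINAL):
    """Replace every match with group 1, i.e. drop the subpaths."""
    return pattern.sub(r'\1', line)


# Both sample links on a single line, as in a non-multiline file.
LINE = (r'<a href=\"http://example.com/path-pattern-to-match/subpath/'
        r'onemoresubpath/arbitrary-number-of-subpaths/somearticle1\">one</a> '
        r'<a href=\"http://example.com/path-pattern-to-match/somearticle2\">two</a>')

if __name__ == "__main__":
    print(strip_subpaths(LINE))
    print(strip_subpaths(LINE, STRICT))
```

On the two sample links both patterns give the same result. Whether the stricter classes are needed depends on what else is on the line, so test against your real data.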

