Regex PHP: Get specific content from a block of code from another website
I have a site from which I want to pull specific content out of 7 posts. All seven posts have the same HTML layout (see below):
<div class="eventinfo">
  <h3>z's(矢沢永吉)</h3>
  <h4>z's tour 2015</h4>
  <dl>
    <dt><img src="/event/img/btn_day.png" alt="公演日時" width="92" height="20"></dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
          <td width="9%" nowrap="nowrap">2015年6月</td>
          <td width="74%">4日 (木) 19:00開演</td>
        </tr>
        </tbody></table>
    </dd>
    <dt><img src="/event/img/btn_price.png" alt="料金" width="92" height="20"></dt>
    <dd>s¥10,500 a¥7,500 (全席指定・消費税込)<br><span class="attention">※</span>注意事項の詳細を<a href="http://www.siteurl.com/info/live/guidelines.html" target="_blank">矢沢永吉公式サイト</a>より必ずご確認ください</dd>
    <dt><img src="/event/img/btn_ticket.png" alt="一般発売" width="92" height="20"></dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
          <td width="9%" nowrap="nowrap">2015年5月</td>
          <td width="74%">16日(土)</td>
        </tr>
        </tbody></table>
    </dd>
    <dt><img src="/event/img/btn_contact.png" alt="お問合わせ" width="92" height="20"></dt>
    <dd><a href="http://www.siteurl.com/" target="_blank">ソーゴー大阪</a> 06-6344-3326</dd>
    <dt><img src="/event/img/btn_info.png" alt="公演詳細" width="92" height="20"></dt>
    <dd><a href="http://www.siteurl.com/zs/index_pc.html" target="_blank">http://www.siteurl.com</a></dd>
  </dl>
</div>
I want to fetch the h3 and the first table in this code. Which regex should I use to get the desired result?
All 7 posts use the code above, and I need the h3 and the first table from each of them.
I have tested this, but I'm not sure whether it is the correct way or not: https://regex101.com/r/so6tj8/1
As you can see, it also captures unwanted data such as the h4, dt and img :(
I don't think regex is the best choice here. If you can get away without using regex, do so. Have a look at Goutte, a PHP web scraper.
use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://www.example.com/some-page');
$heading = $crawler->filter('h3')->first();   // first <h3> on the page
$table = $crawler->filter('table')->first();  // first <table> on the page
This is not only more readable, it also makes things easier to fix when the HTML structure changes.
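If you'd rather not pull in a library, roughly the same idea can be sketched with PHP's built-in DOMDocument and DOMXPath instead of Goutte. This is only a sketch under assumptions: the sample markup in $html is a trimmed-down, hypothetical stand-in for the real page, and the array keys 'heading' and 'table' are names made up for illustration. It walks every div.eventinfo and takes the first h3 and first table inside each one.

```php
<?php
// Hypothetical sample: two trimmed-down posts with the same layout as in the question.
$html = <<<HTML
<div class="eventinfo"><h3>Post one</h3><h4>Tour</h4>
<dl><dd><table><tr><td>2015-06</td><td>4th</td></tr></table></dd>
<dd><table><tr><td>2015-05</td><td>16th</td></tr></table></dd></dl></div>
<div class="eventinfo"><h3>Post two</h3><h4>Tour</h4>
<dl><dd><table><tr><td>2015-07</td><td>1st</td></tr></table></dd></dl></div>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
// The XML declaration prefix is a common workaround so the input is read as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// One iteration per post; the relative queries (.//) only look inside that post.
foreach ($xpath->query('//div[@class="eventinfo"]') as $post) {
    $heading = $xpath->query('.//h3', $post)->item(0);
    $table = $xpath->query('.//table', $post)->item(0);

    $results[] = [
        'heading' => $heading ? trim($heading->textContent) : null,
        'table' => $table ? $doc->saveHTML($table) : null,
    ];
}

print_r($results);
```

Because each query is scoped to one post, the h4, dt and img elements never end up in the result, and a second table in the same post is simply ignored.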
If you must use regex, you could do the following for the h3 (I haven't tested it, though):
// Replaces every <h3>...</h3> in $html with just its inner text.
$html = preg_replace_callback(
    '/<h3>(.*?)<\/h3>/u',
    function ($match) {
        return $match[1];
    },
    $html
);
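For the table you could do something similar, but since the table spans several lines you have to add the s (dotall) modifier so that . also matches newlines; the multiline modifier m only changes how ^ and $ behave
(it wouldn't hurt to add it for the h3 as well, but in your example you don't need it).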
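To actually pull the values out rather than rewrite $html, something along these lines with preg_match_all might work. Treat it as a rough sketch: the sample markup is a trimmed-down, hypothetical version of the real page, the 'heading' and 'table' keys are illustrative names, and the pattern assumes the opening tag is written exactly as <div class="eventinfo">. It uses the s modifier because . has to match across newlines inside the table.

```php
<?php
// Hypothetical sample: two trimmed-down posts with the same layout as in the question.
$html = <<<HTML
<div class="eventinfo">
<h3>Post one</h3>
<h4>Tour</h4>
<dl><dd><table>
<tr><td>2015-06</td></tr>
</table></dd>
<dd><table><tr><td>2015-05</td></tr></table></dd></dl>
</div>
<div class="eventinfo">
<h3>Post two</h3>
<dl><dd><table><tr><td>2015-07</td></tr></table></dd></dl>
</div>
HTML;

// u: treat pattern and subject as UTF-8; s: let "." match newlines as well.
// The lazy ".*?" stops at the first closing tag, so only the first table is captured.
$pattern = '/<div class="eventinfo">.*?<h3>(.*?)<\/h3>.*?(<table\b.*?<\/table>)/us';

// PREG_SET_ORDER groups the captures per post instead of per capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$results = [];
foreach ($matches as $m) {
    $results[] = ['heading' => trim($m[1]), 'table' => $m[2]];
}

print_r($results);
```

This stays fragile: an extra attribute on the div or a nested table would break it, which is the main reason to prefer a DOM parser when you can.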