Regex PHP: Get specific content from a block of code from another website


I have a site from which I want to get specific content from 7 posts. All 7 posts have the same HTML layout (see below).

<div class="eventinfo">
  <h3>z's(矢沢永吉)</h3>
  <h4>z's tour 2015</h4>
  <dl>
    <dt><img src="/event/img/btn_day.png" alt="公演日時" width="92" height="20"> </dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
          <td width="9%" nowrap="nowrap">2015年6月</td>
          <td width="74%">4日 (木) 19:00開演</td>
        </tr>
        </tbody></table>
    </dd>
    <dt><img src="/event/img/btn_price.png" alt="料金" width="92" height="20"> </dt>
    <dd>s¥10,500 a¥7,500 (全席指定・消費税込)<br><span class="attention">※</span>注意事項の詳細を<a href="http://www.siteurl.com/info/live/guidelines.html" target="_blank">矢沢永吉公式サイト</a>より必ずご確認ください</dd>
    <dt><img src="/event/img/btn_ticket.png" alt="一般発売" width="92" height="20"> </dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
          <td width="9%" nowrap="nowrap">2015年5月</td>
          <td width="74%">16日(土)</td>
        </tr>
        </tbody></table>
    </dd>
    <dt><img src="/event/img/btn_contact.png" alt="お問合わせ" width="92" height="20"> </dt>
    <dd><a href="http://www.siteurl.com/" target="_blank">ソーゴー大阪</a> 06-6344-3326</dd>
    <dt><img src="/event/img/btn_info.png" alt="公演詳細" width="92" height="20"> </dt>
    <dd><a href="http://www.siteurl.com/zs/index_pc.html" target="_blank">http://www.siteurl.com</a> </dd>
  </dl>
</div>

I want to fetch the h3 and the first table in this code. Which regex method should I use to get the desired results?

Also, all 7 posts look like the code above, and I need the h3 and the first table from each of them.

I have tested this, but I am not sure whether it is the correct way or not: https://regex101.com/r/so6tj8/1

But as you can see, it also picks up unwanted data such as the h4, dt and img :(

I don't think regex is the best choice here. If you can get away without using regex, do so. Have a look at Goutte, a PHP web scraper.

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://www.example.com/some-page');

// First <h3> and first <table> on the page
$heading = $crawler->filter('h3')->first();
$table = $crawler->filter('table')->first();

This is not only more readable, it also makes things easier to fix when the HTML structure changes.
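The same idea can be sketched without installing anything. The example below swaps Goutte for PHP's built-in DOM extension (DOMDocument and DOMXPath), so it is not the Goutte API shown above. The sample markup, the function name extractEvents and the array keys are assumptions made up for illustration; the only thing taken from the question is the div class="eventinfo" layout with one h3 and several tables per post.

```php
<?php
// Hypothetical, trimmed-down sample: two posts sharing the layout from the question.
$html = <<<'HTML'
<div class="eventinfo"><h3>Post one</h3><h4>Tour 2015</h4>
<dl><dd><table><tbody><tr><td>2015年6月</td><td>4日 (木) 19:00開演</td></tr></tbody></table></dd>
<dd><table><tbody><tr><td>2015年5月</td><td>16日(土)</td></tr></tbody></table></dd></dl></div>
<div class="eventinfo"><h3>Post two</h3>
<dl><dd><table><tbody><tr><td>2015年7月</td><td>1日</td></tr></tbody></table></dd></dl></div>
HTML;

// Hypothetical helper: returns the h3 text and the cells of the first table of each post.
function extractEvents(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about real-world markup instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prefix is a common way to hint that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $events = [];
    foreach ($xpath->query('//div[@class="eventinfo"]') as $post) {
        // Queries are relative to the current post, so posts cannot mix.
        $heading = $xpath->query('.//h3', $post)->item(0);
        $table = $xpath->query('(.//table)[1]', $post)->item(0);

        $cells = [];
        if ($table !== null) {
            foreach ($xpath->query('.//td', $table) as $td) {
                $cells[] = trim($td->textContent);
            }
        }
        $events[] = [
            'heading' => $heading !== null ? trim($heading->textContent) : null,
            'cells' => $cells,
        ];
    }
    return $events;
}

$events = extractEvents($html);
print_r($events);
```

Because every lookup is scoped to one div.eventinfo, the h4, dt and img elements are simply never selected, and a layout change usually means adjusting one XPath expression.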

If you must use regex, you could try something like the following for the h3 (I haven't tested it):

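// Replaces every <h3>...</h3> in $html with just its inner text
$html = preg_replace_callback(
    '/<h3>(.*?)<\/h3>/u',
    function ($match) {
        // $match[1] is the text between the tags
        return $match[1];
    },
    $html
);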

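For the table it would be similar, but the match has to span several lines, so you need the s (dotall) modifier, which lets . match newlines; the multiline modifier m only changes how ^ and $ behave. It wouldn't hurt to add s for the h3 as well, but in your example you don't need it.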
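A minimal sketch of what the regex route could look like for both the h3 and the first table of each post. It uses preg_match_all rather than preg_replace_callback, since the goal is to collect values, and the s modifier so that . also matches newlines. The sample markup, the variable names and the pattern itself are assumptions for illustration, and the pattern relies on every post containing both an h3 and at least one table.

```php
<?php
// Hypothetical, trimmed-down sample with the same layout as the question.
$html = <<<'HTML'
<div class="eventinfo"> <h3>Post one</h3> <h4>Tour 2015</h4>
<dl><dd>
  <table width="99%"><tbody><tr>
    <td>2015年6月</td><td>4日 (木) 19:00開演</td>
  </tr></tbody></table>
</dd><dd><table><tbody><tr><td>second table</td></tr></tbody></table></dd></dl>
</div>
<div class="eventinfo"> <h3>Post two</h3>
<dl><dd><table><tbody><tr><td>2015年7月</td></tr></tbody></table></dd></dl>
</div>
HTML;

// One match per post: capture the h3 text, then lazily skip ahead to the first table.
// s = "." also matches newlines, u = treat pattern and subject as UTF-8.
$pattern = '~<div class="eventinfo">.*?<h3>(.*?)</h3>.*?(<table\b.*?</table>)~su';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$posts = [];
foreach ($matches as $m) {
    // $m[1] is the heading text, $m[2] is the first table's full markup
    $posts[] = ['heading' => trim($m[1]), 'table' => $m[2]];
}
print_r($posts);
```

The lazy quantifiers (.*?) are what keep the h4, dt and img markup out of the captured groups; with greedy ones the table group would run to the last closing table tag on the page.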

