parsing - PHP script works locally but not when placed on a web server
The following code scrapes a list of links from a given webpage and passes them to a script that scrapes the text from each link and places the data in a CSV document. The code runs fine on localhost (WampServer, PHP 5.5) but fails horribly when placed on my domain.
You can check out the functionality of the script at http://miskai.tk/anofm/csv.php. Also, file HTML and cURL are both enabled on the server.
<?php
header('Content-Type: application/excel');
header('Content-Disposition: attachment; filename="mehedinti.csv"');

include_once 'simple_html_dom.php';
include_once 'csv.php';

$urls = scrape_main_page();

function scraping($url)
{
    $ret = array();

    // create HTML DOM
    $html = file_get_html($url);

    // article block
    if ($html && is_object($html) && isset($html->nodes)) {
        foreach ($html->find('/html/body/table') as $article) {
            $item = array();
            // title
            $item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
            // body: rows 2-11, second cell of each row
            for ($row = 2; $row <= 11; $row++) {
                $item['tr' . $row] = trim($article->find("/tbody/tr[{$row}]/td[2]", 0)->plaintext);
            }
            $item['tr12'] = trim($article->find('/tbody/tr[12]/td/div', 0)->plaintext);
            $ret[] = $item;
        }

        // clean memory
        $html->clear();
        unset($html);
    }

    return $ret;
}

$output = fopen('php://output', 'w');
foreach ($urls as $url) {
    foreach (scraping($url) as $v) {
        fputcsv($output, $v);
    }
}
fclose($output);
exit();
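The CSV step of the first script can be isolated and checked without any scraping. This is a minimal sketch, not the author's code: `write_csv_rows()` is a hypothetical helper that writes an optional header row and then one line per scraped row with `fputcsv()`. It writes to `php://memory` so the result can be inspected; in the script above you would pass the `php://output` handle instead. The delimiter, enclosure and escape arguments are passed explicitly so the output does not depend on defaults.

```php
<?php
// Hypothetical helper (not from the question): write an optional header row,
// then one CSV line per scraped row, to any writable stream handle.
function write_csv_rows($handle, array $header, array $rows)
{
    if (!empty($header)) {
        fputcsv($handle, $header, ',', '"', '\\');
    }
    foreach ($rows as $row) {
        // fputcsv() writes only the values, in insertion order;
        // string keys such as 'titlu' or 'tr2' are ignored.
        fputcsv($handle, $row, ',', '"', '\\');
    }
}

// Usage: write to an in-memory stream so the CSV text can be inspected.
$out = fopen('php://memory', 'w+');
write_csv_rows($out, array('titlu', 'tr2'), array(
    array('titlu' => 'Job A', 'tr2' => 'Mehedinti, RO'),
));
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

Fields containing the delimiter (here the comma in `Mehedinti, RO`) are enclosed in double quotes automatically, so the file opens correctly in a spreadsheet.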
Second file:

<?php
// Use cURL instead of file_get_contents: it is more future-proof
// (for example, you can set a timeout).
function get_contents($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(CURLOPT_RETURNTRANSFER => true));
    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}

function scrape_main_page()
{
    set_time_limit(300);
    // Keep DOMDocument from spraying errors onto the page; collect them internally instead.
    libxml_use_internal_errors(true);

    $html = get_contents("http://lmvz.anofm.ro:8080/lmv/index2.jsp?judet=26");
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $results = $xpath->query("//table[@width=\"645\"]/tr");

    $all = array();
    // Start at 1 to skip the header row.
    for ($i = 1; $i < $results->length; $i++) {
        $tr = $results->item($i);
        $all[] = $tr->childNodes->item(0)->textContent;
    }

    $urls = array();
    foreach ($all as $xtr) {
        $urls[] = "http://lmvz.anofm.ro:8080/lmv/detalii.jsp?uniquejvid=" . urlencode($xtr) . "&judet=26";
    }
    return $urls;
}
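The listing-page parsing in `scrape_main_page()` can be tested offline against an HTML string instead of the live site. The sketch below is a hypothetical `build_detail_urls()` helper. It assumes, as the code above does, that the listing table has `width="645"`, that its first row is a header, and that the job ID is in the first cell of each remaining row; the `$judet` parameter is an assumption that replaces the hardcoded `26`. It selects the cell with the relative XPath `td[1]` rather than relying on the first child node, and it returns an empty list when the fetch produced nothing.

```php
<?php
// Hypothetical helper: turn the listing page's HTML into detail-page URLs.
// Assumes (from the question's code) a table with width="645", a header row
// first, and the job ID in the first cell of every other row.
function build_detail_urls($html, $judet)
{
    $urls = array();
    if (!is_string($html) || $html === '') {
        return $urls; // fetch failed or returned nothing; loadHTML('') would error
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = $xpath->query('//table[@width="645"]/tr');

    // Start at 1 to skip the header row, as the original loop does.
    for ($i = 1; $i < $rows->length; $i++) {
        $cell = $xpath->query('td[1]', $rows->item($i))->item(0);
        if ($cell === null) {
            continue; // row without a <td>, e.g. a spacer row
        }
        $id = trim($cell->textContent);
        $urls[] = 'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?uniquejvid='
            . urlencode($id) . '&judet=' . urlencode($judet);
    }
    return $urls;
}

// Usage with a small, made-up listing table.
$sample = '<html><body><table width="645">'
    . '<tr><th>ID</th><th>Post</th></tr>'
    . '<tr><td>123</td><td>Job A</td></tr>'
    . '<tr><td>456</td><td>Job B</td></tr>'
    . '</table></body></html>';
print_r(build_detail_urls($sample, '26'));
```

Because `curl_exec()` returns `false` on failure, passing its result straight in yields an empty array instead of a parse error, which makes a failed fetch on the server easier to spot.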
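Yeah, the problem here is the php.ini configuration. Make sure the server supports cURL and fopen: the cURL extension must be loaded, and allow_url_fopen must be enabled, because simple_html_dom's file_get_html() reads remote URLs through file_get_contents(). If it doesn't, start your own Linux server.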
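To act on this advice, a small diagnostic page can report whether both features are available. This is a hypothetical sketch (`scraper_environment_report()` is not part of the question's code) that uses only `extension_loaded()`, `ini_get()` and `PHP_VERSION`. Run it as its own script, since the CSV script sends download headers. Note that `allow_url_fopen` can only be changed in php.ini or the server configuration, not with `ini_set()`.

```php
<?php
// Hypothetical diagnostic: report whether the features the scraper needs exist.
function scraper_environment_report()
{
    return array(
        // get_contents() in the second file calls curl_init()/curl_exec().
        'curl_loaded' => extension_loaded('curl'),
        // file_get_html() reads remote URLs via file_get_contents(), which needs this.
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        'php_version' => PHP_VERSION,
    );
}

// Usage: run this as its own page, not inside the CSV script.
foreach (scraper_environment_report() as $name => $value) {
    echo $name, ': ', var_export($value, true), "\n";
}
```

`filter_var(..., FILTER_VALIDATE_BOOLEAN)` normalizes the ini string (`"1"`, `"On"`, `""`) to a real boolean, so the report reads the same on both machines.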