| 1. |
Solve: Farming Webpages ... any suggestions for data acquisition? |
|
Answer» I have been farming web sites for information through automated macros which dump the copy/pasted data into a database.
And here's an example:
Code:
<?php
/****************************************
 * BEGIN: Configure cURL
 ****************************************/
$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
// The cookie jar keeps the session cookie from the login for later requests
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/****************************************
 * END: Configure cURL
 ****************************************/

$post = 'username=username&password=password';
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/login');
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);

if ($page = curl_exec($ch)) {
    // That was the login; now to retrieve the pages:
    $regexp = '|insert regexp here with (brackets around text we want to save)|isU';

    // Page to parse
    $url = "http://www.example.com/start";

    // Load the page. Switch back to GET first; otherwise the handle would
    // re-send the login form data as a POST to this URL.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    $page = curl_exec($ch);

    // Find the desired text
    if (preg_match_all($regexp, $page, $result)) {
        // do something with the matches
    }

    // Save the page
    if (file_put_contents("/somewhere/file.txt", $page)) {
        echo "succeeded<br/>";
    } else {
        echo "FAILED<br/>";
    }
} else {
    echo "Login failed";
}
curl_close($ch);
?>

Thanks Rob! I am going to try this out.

Cool. The hardest part is getting the regular expression right. Let me know if you need any more help. Out of interest, this technique is usually called "web scraping" or sometimes (inaccurately) "screen scraping". |
|
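Since the reply singles out the regular expression as the hardest part, here is a minimal sketch of what a filled-in pattern might look like. The sample markup, the link pattern, and the variable names `$links` and `$greedyCount` are made up for illustration and are not from the original thread; a real page will need its own pattern. It keeps the `|` delimiters and the `isU` modifiers from the placeholder in the script above.

```php
<?php
// Hypothetical sample markup standing in for a fetched page; in the script
// above, $page would come from curl_exec() instead.
$page = <<<HTML
<ul>
  <li class="item"><A HREF="/a">First
title</A></li>
  <li class="item"><a href="/b">Second title</a></li>
</ul>
HTML;

// '|' is the pattern delimiter, as in the placeholder pattern.
//   i : case-insensitive, so <A HREF=...> matches as well
//   s : '.' also matches newlines, so text spanning two lines is captured
//   U : quantifiers are ungreedy, so (.*) stops at the first closing </a>
$regexp = '|<a href="([^"]*)">(.*)</a>|isU';

$links = [];
if (preg_match_all($regexp, $page, $result, PREG_SET_ORDER)) {
    foreach ($result as $match) {
        // $match[0] is the whole match; [1] and [2] are the bracketed groups.
        $links[] = [
            'href' => $match[1],
            'text' => trim(preg_replace('/\s+/', ' ', $match[2])),
        ];
    }
}

// For comparison: without U, the greedy (.*) runs on to the last </a>,
// so both links collapse into a single match.
$greedyCount = preg_match_all('|<a href="([^"]*)">(.*)</a>|is', $page, $unused);

print_r($links);
echo "matches without U: $greedyCount\n";
```

With `PREG_SET_ORDER`, each entry of `$result` holds one complete match, which tends to be easier to loop over than the default grouping by bracket number. Regular expressions are brittle against markup changes, so it may be worth checking the match count on every run and logging when it drops to zero.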