Code, tips and tricks, discussions and solutions related to C#
how to screen scrape or grab some parts of a website?

Tue Jun 23, 2009 2:34 am

I want to grab the traffic news from this website:
Code:
http://www.onemotoring.com.sg/publish/o ... _news.html


I was able to screen scrape the whole page; however, I only want to grab the traffic news, which is in the table. Is there any way I could do that?

Code in my Prac1.aspx:
Code:
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Prac1.aspx.cs" Inherits="Prac1" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Untitled Page</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        Displaying a web page on your own page using Screen Scraping
        <br />
        <asp:Button ID="btnDisplay" runat="server" onclick="btnDisplay_Click"
            Text="Display webpage now" />
        <br />
        <br />
        <asp:Label ID="lblWebpage" runat="server"></asp:Label>
    </div>
    </form>
</body>
</html>
 

Code in my Prac1.aspx.cs:
Code:

using System;
using System.Collections;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using System.Net; // namespace for WebClient
using System.Text;

public partial class Prac1 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {

    }

    protected void btnDisplay_Click(object sender, EventArgs e)
    {
        WebClient webClient = new WebClient();
        const string strUrl = "http://www.onemotoring.com.sg/publish/onemotoring/en/on_the_roads/traffic_news.html";

        // Download the raw bytes of the whole page.
        byte[] reqHTML;
        reqHTML = webClient.DownloadData(strUrl);

        // Decode the bytes as UTF-8 and show the entire page in the label.
        UTF8Encoding objUTF8 = new UTF8Encoding();
        lblWebpage.Text = objUTF8.GetString(reqHTML);
    }
}


Any help?
Thank you in advance, guys :))



Re: how to screen scrape or grab some parts of a website?

Sat Jun 27, 2009 8:25 am

I was able to screen scrape the whole page; however, I only want to grab the traffic news, which is in the table. Is there any way I could do that?

Re: how to screen scrape or grab some parts of a website?

Tue Jun 30, 2009 8:41 am

Can you get at it using the DOM?

Or maybe you will need to use PREG (regular-expression matching) on the relevant section.

I have done something similar with cURL, using pattern matching to grab the markup I wanted.
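
In C# the pattern-matching route could look roughly like the sketch below. It is only an illustration, not tested against the actual traffic news page: the class name TableScraper, the methods ExtractTables and ExtractCells, and the sample HTML are all made up for the example, and it assumes the table you want is not nested inside another table (a regular expression cannot handle nesting reliably).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original page code.
public static class TableScraper
{
    // Returns the raw HTML of every <table>...</table> block in the page.
    // The lazy match stops at the first </table>, so nested tables are not handled.
    public static List<string> ExtractTables(string html)
    {
        var tables = new List<string>();
        var pattern = new Regex(@"<table\b[^>]*>.*?</table>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in pattern.Matches(html))
        {
            tables.Add(m.Value);
        }
        return tables;
    }

    // Returns the text of each <td>/<th> cell in one table, with inner tags removed.
    public static List<string> ExtractCells(string tableHtml)
    {
        var cells = new List<string>();
        var cellPattern = new Regex(@"<t[dh]\b[^>]*>(.*?)</t[dh]>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in cellPattern.Matches(tableHtml))
        {
            string text = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "").Trim();
            cells.Add(text);
        }
        return cells;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        string sample = "<html><body><p>intro</p>" +
            "<table id=\"news\"><tr><td>10:05</td>" +
            "<td>Accident on <b>PIE</b></td></tr></table></body></html>";

        List<string> tables = TableScraper.ExtractTables(sample);
        List<string> cells = TableScraper.ExtractCells(tables[0]);
        Console.WriteLine(string.Join(" | ", cells));
    }
}
```

In btnDisplay_Click you would pass the string returned by objUTF8.GetString(reqHTML) to ExtractTables and assign the table you want to lblWebpage.Text instead of the whole page. Which index holds the traffic news is something you would have to find by inspecting the page source. If the markup turns out to be messy, a real HTML parser is usually more robust than a regex.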

Re: how to screen scrape or grab some parts of a website?

Sun Jul 19, 2009 6:15 pm

Download biterscripting from
Code:
http://www.biterscripting.com

Start biterscripting and enter the following command. The entire code below is just one command, so enter the whole command on one line.

Code:
script SS_WebPageToCSV.txt page("http://www.onemotoring.com.sg/publish/onemotoring/en/on_the_roads/traffic_news.html") number(11)


Try it now. This particular script seems to have been written just for you :-) It is open source. I did not write it, but I have been using it and other biter scripts.

Hope this helps. I am assuming you are getting this data only for your personal use and not to republish.

Randi
