• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Check next class after "linkcell" class when scraping data

YasserKhalil

Well-Known Member
Hello everyone
I am trying to learn about scraping data and I did half of my target .. There is a list of functions and its description
Depending on class name "linkcell" I could grab functions list properly
but as for description I am confused because the class name related sometimes is "definecell" and the last one in each table "functab" is "definecelllast"
Here's the code
Code:
Sub Test()
    Dim element        As IHTMLElement
    Dim elements        As IHTMLElementCollection
    Dim ie              As InternetExplorer
    Dim html            As HTMLDocument
    Dim counter        As Long
    Dim erow            As Long

    Set ie = New InternetExplorer
    ie.Visible = True

    ie.navigate "http://www.excelfunctions.net/vba-functions.html"

    Do While ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading Web page …"
        DoEvents
    Loop

  Set html = ie.document
    Set elements = html.getElementsByClassName("linkcell")
    counter = 0

    For Each element In elements
            erow = Sheet1.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1) = html.getElementsByClassName("linkcell")(counter).innerText
           
            'Need help at this point
            'Cells(erow, 2) = html.getElementsByTagName("href")(counter).innerText
           
            counter = counter + 1
    Next element

    Application.StatusBar = ""
    ie.Quit
   
    MsgBox "Done...", 64
End Sub

Thanks advanced for help
 
Hi !

It may be better to loop Table objects in order to read their elements …

Working with class names could be a mess as it can fail on some computers.
 
Thanks a lot Mr. Marc
I missed your replies since a while
In fact I am novice at scarping .. Can you give me a model code of how to loop through the table objects>> ?
Another point I think there are many tables so I would first loop through each table ... How can I do that?
 
As you can see in many threads of this forum there is no model code
'cause each scrapping is specific to the source webpage …

Loop on getElementsByTagName("TABLE") and foresee a table object structure (within VBE Locals window) as all is here :
observe & respect object model …
You will see in the table model some Rows objects
and within Rows some Cells objects.
 
Hi ,

Try this :
Code:
Sub Test()
    Dim element        As IHTMLElement
    Dim elements        As IHTMLElementCollection
    Dim ie              As InternetExplorer
    Dim html            As HTMLDocument
    Dim counter        As Long
    Dim erow            As Long

    Set ie = New InternetExplorer
    ie.Visible = True

    ie.navigate "http://www.excelfunctions.net/vba-functions.html"

    Do While ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading Web page …"
        DoEvents
    Loop
    Application.StatusBar = ""

    Set html = ie.document
    Set elements = html.getElementsByClassName("linkcell")
    counter = 0

    For Each element In elements
        erow = Sheet1.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
        With element
            Cells(erow, 1) = .innerText
            Cells(erow, 2) = .NextSibling.innerText
        End With
        counter = counter + 1
    Next element

    ie.Quit
    Sheet1.Columns(1).Resize(, 2).AutoFit
    MsgBox "Done...", 64
End Sub
Narayan
 
You're awesome and wonderful ..
The clue was in that magic part (NextSibling.innerText) ..
Thanks a lot for great help

Can you help me with looping through exisiting tables as I need to learn that too?

Best Regards
 
First demonstration :​
Code:
Sub DemoIE1()
    Dim oTable As HTMLTable, oRow As HTMLTableRow, C%, R&
    Application.StatusBar = "      Downloading …"
    Sheet1.UsedRange.Clear
With New InternetExplorer
        .Navigate "http://www.excelfunctions.net/vba-functions.html"
'       .Visible = True
   While .Busy Or .ReadyState < 4:  DoEvents:  Wend
    For Each oTable In .Document.getElementsByClassName("functab")
        For Each oRow In oTable.Rows
                    R = R + 1
            With oRow.Cells
                For C = 1 To .Length
                    Sheet1.Cells(R, C).Value = .Item(C - 1).innerText
                Next
                If .Length = 1 Then
                    Sheet1.Cells(R, 1).Font.Bold = True
                Else
                    Sheet1.Cells(R, 1).IndentLevel = 3
                    Sheet1.Hyperlinks.Add Sheet1.Cells(R, 1), .Item(0).Children(0).href
                End If
            End With
        Next
    Next
        .Quit
End With
    Sheet1.UsedRange.Columns.AutoFit
    Application.StatusBar = ""
    MsgBox "Done …", vbInformation, "    Demo IE #1"
End Sub
 
An easy IE way no needing to add any reference (late binding) :​
Code:
Sub DemoIE2()
        Dim oDoc As Object, oTable As Object, R&
        Set oDoc = CreateObject("htmlfile")
        Application.StatusBar = "      Downloading …"
        Sheet1.UsedRange.Clear
        R = 1
With CreateObject("InternetExplorer.Application")
        .Navigate "http://www.excelfunctions.net/vba-functions.html"
'       .Visible = True
   While .Busy Or .ReadyState < 4:  DoEvents:  Wend
    For Each oTable In .Document.getElementsByClassName("functab")
        If oDoc.frames.clipboardData.setData("Text", oTable.outerHTML) Then
            Sheet1.Paste Sheet1.Cells(R, 1)
            R = R + oTable.Rows.Length
        End If
    Next
        .Quit
End With
            oDoc.frames.clipboardData.clearData "Text"
        Set oDoc = Nothing
        Sheet1.UsedRange.Columns.AutoFit
        Application.StatusBar = ""
        MsgBox "Done …", vbInformation, "    Demo IE #2"
End Sub
 
That's great and awesome Mr MarcL
Thank you very very much for these magic solutions ..
I will study these codes further to learn from them so please let me ask for anything that is unclear for me
Best Regards
 
I tried to get this part but couldn't get it ..
Code:
.Item(0).Children(0).href

And as for Demo2
Code:
If oDoc.frames.clipboardData.setData("Text", oTable.outerHTML) Then
 
I tried to get this part but couldn't get it ..
Code:
.Item(0).Children(0).href
Item(0) : first item
Children(0) : first child
href : link property …
… as you can see in VBE Locals window …

And as for Demo2
Code:
If oDoc.frames.clipboardData.setData("Text", oTable.outerHTML) Then
Copy table data to the clipboard, see IE documentation on MSDN …
 
Last but not least, the faster request way (no reference to add) :​
Code:
Sub DemoReq()
    Dim oTable As Object, R&, T$
    Application.StatusBar = "      Downloading …"
    Sheet1.UsedRange.Clear
With CreateObject("WinHttp.WinHttpRequest.5.1")
    .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
    .setRequestHeader "DNT", "1"
     On Error Resume Next
    .send
     If Err.Number Then Beep: Exit Sub
     On Error GoTo 0
 If .Status = 200 Then T = .responseText Else Beep: Exit Sub
End With
     R = 1
With CreateObject("htmlfile")
        .body.innerHTML = T
    For Each oTable In .getElementsByTagName("TABLE")
'        If oTable.className = "functab" Then
             If .frames.clipboardData.setData("Text", oTable.outerHTML) Then
                Sheet1.Paste Sheet1.Cells(R, 1)
                R = R + oTable.Rows.Length
            End If
'        End If
    Next
        .frames.clipboardData.clearData "Text"
End With
    Sheet1.UsedRange.Columns.AutoFit
    Application.StatusBar = ""
    MsgBox "Done …", vbInformation, " Demo Request"
End Sub
 
You're a legend Mr. Marc
That's really awesome and fascinating demos

What is this line
Code:
.setRequestHeader "DNT", "1"
What "DNT" is it a header? Where did you get it? (By inspecting elements ..?!) and what 1 ?
 
Another way to get the required data from that specific webpage with adding nothing to the reference library:
Code:
Sub Tab_Data()
    Dim hdoc as Object, tbl As Object, tRow As Object, tCel As Object
  
    With CreateObject("MSxml2.xmlhttp")
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .send
        Set hdoc = CreateObject("htmlfile")
        hdoc.body.innerHTML = .responseText
    End With
  
    For Each tbl In hdoc.getElementsByTagName("table")
        For Each tRow In tbl.getElementsByTagName("tr")
            For Each tCel In tRow.getElementsByTagName("td")
                y = y + 1
                Cells(x, y) = tCel.innerText
            Next tCel
            y = 0
            x = x + 1
        Next tRow
    Next tbl
End Sub
 
Last edited:
Finally, if you wanna stick to the class then this one works fine as well:
Code:
Sub Vba_tabledata()

    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim htable As Object, data As Object

    With http
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .send
        html.body.innerHTML = .responseText
    End With

    Set htable = html.getElementsByClassName("linkcell")
    For Each data In htable
        x = x + 1
        Cells(x, 1) = data.innerText
        Cells(x, 2) = data.NextSibling.innerText
    Next data

End Sub

Btw, add the following reference to the library:
Microsoft XML and Microsoft HTML object Library
 
What is this line
Code:
.setRequestHeader "DNT", "1"
What "DNT" is it a header? Where did you get it? (By inspecting elements ..?!) and what 1 ?
DNT means Do Not Track and "1" is like switching the light on …
I got it by inspecting requests under Mozilla Firefox …
This header is not mandatory, at the convenience of the coder.
From RFC blank book

As you can see, there is no model to scrap data,
as it depends on the webpage object model
and the coder feeling how to reach them …
 
Thanks to Shahin and its last code, getElementsByClassName
fails on my side in late binding on "functab" but in early binding
- with reference Microsoft HTML Object Library added -
no more issue pointing this functab class :​
Code:
Sub DemoReqRevA()
    Dim oTable As HTMLTable, R&, T$
    Application.StatusBar = "      Downloading …"
    Sheet1.UsedRange.Clear
With CreateObject("WinHttp.WinHttpRequest.5.1")
    .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
    .setRequestHeader "DNT", "1"
     On Error Resume Next
    .send
     If Err.Number Then Beep: Exit Sub
     On Error GoTo 0
 If .Status = 200 Then T = .responseText Else Beep: Exit Sub
End With
     R = 1
With New HTMLDocument
        .body.innerHTML = T
    For Each oTable In .getElementsByClassName("functab")
        If .frames.clipboardData.setData("Text", oTable.outerHTML) Then
            Sheet1.Paste Sheet1.Cells(R, 1)
            R = R + oTable.Rows.Length
        End If
    Next
        .frames.clipboardData.clearData "Text"
End With
    Sheet1.UsedRange.Columns.AutoFit
    Application.StatusBar = ""
    MsgBox "Done …", vbInformation, " Demo Request"
End Sub

Shahin - again - do not forget to free object variables …
 
@Marc L,
Whatever thread it is, I find it very thrilling to get in contact with you. Something I would like to ask. The other day you taught me how to write code without using "On Error Resume Next" so that in every situation the code will remain invulnerable. However, you might have noticed in my second post here that my written code is very much fragile and will break if it fails to find class name anywhere in the table. I could have handled it if two class elements are there by using "with statement" in the later expression in combination with "length" and "item" property. I can't do it here because of single class element. Could you please take a look how can it be if i want to use "with" statement along with "length" and "item" property in my below code. I tried but this is not the way you know:
Code:
Sub Tabledata()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim htable As Object, data As Object

    With http
        .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each data In html.getElementsByClassName("linkcell")
        With data
            If .Length Then x = x + 1: Cells(x, 1) = .item(0).innerText
            If .Length Then Cells(x, 2) = .item(0).NextSibling.innerText
        End With
    Next data
End Sub
 
No matter using a For Each loop :
without any element, the loop directly ends …
[Just try next code with "linkcellS" for example.]

So in this case it's just a misobservation of the object model !
As data variable is not a collection but just an element
so without a length property …

For Each element In collection : length property on collection,
not under element as you can foresee in VBE Locals window …

Your code amended :​
Code:
Sub Tabledata()
    Dim oDoc As New HTMLDocument, oCell As HTMLTableCell, R&
    ActiveSheet.UsedRange.Clear
With New XMLHTTP60
    .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
    .send
     oDoc.body.innerHTML = .responseText
End With
For Each oCell In oDoc.getElementsByClassName("linkcell")
    R = R + 1
    Cells(R, 1).Resize(, 2).Value = Array(oCell.innerText, oCell.nextSibling.innerText)
Next
    Set oDoc = Nothing
End Sub
 
This thread will be definitely great reference for scraping web data .. Thanks a lot for both of you
Best Regards
 
You're welcome Yasser !

Same result but pointing and checking the collection :​
Code:
Sub TableData2()
    Dim oDoc As New HTMLDocument, N&, R&
    ActiveSheet.UsedRange.Clear
With New XMLHTTP60
    .Open "GET", "http://www.excelfunctions.net/vba-functions.html", False
    .send
     oDoc.body.innerHTML = .responseText
End With
With oDoc.getElementsByClassName("linkcell")
'   If .Length Then
         For N = 0 To .Length - 1
            R = R + 1
            Cells(R, 1).Resize(, 2).Value = Array(.Item(N).innerText, .Item(N).nextSibling.innerText)
        Next
'   End If
End With
    Set oDoc = Nothing
End Sub
Try with an non existing class like"linkcellS", no issue …

Obviously you must notice the convenience of For Each.
 
Hats off to you, Marc L. You are just awesome!!! Sorry, YasserKhalil for creating a sub-thread within your thread. Basically, it is hard for me to control myself from erecting any question specially when I get in contact with Marc L.
 
Back
Top