1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Can't go on to the next page

Discussion in 'VBA Macros' started by shahin, Mar 17, 2017.

  1. shahin

    shahin Member

    Messages:
    119
    Hi there!! Hope you all are well. I got stuck at some point creating a parser. I have already made it and it is running smoothly. However, the problem I'm facing is that I can't find any element triggering to the next page except for clicking the button. As I'm using xmlhttp method, I can't avail that click button option. Any help would be a great relief for me. Thanks in advance. Here is the code:

    Code (vb):

    Sub ExtractingEmail()
    Const url = "http://www.sportfocus.com/comdir/keyword.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim Posts As Object, Post As Object
    Dim x As Long, i As Long, Z As Long

    x = 2

    http.Open "GET", url, False
    http.send
    html.body.innerHTML = http.responseText

    Set Posts = html.getElementsByClassName("dirListingPlus")
        For Each Post In Posts
            On Error Resume Next
            Cells(x, 1) = Post.getElementsByTagName("a")(0).innerText
            Cells(x, 2) = Split(Post.getElementsByTagName("a")(1).href, ":")(1)
            x = x + 1
        Next Post
    End Sub
     
  2. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    Hi !

    So it seems it's a bad request or you forgot some argument.
    Just observe via a webbrowser inner inspector tool
    which request is launched when you click on button and its arguments …
  3. shahin

    shahin Member

    Messages:
    119
    Ist page: "http://www.sportfocus.com/comdir/keyword.cfm?cid=11&maj=cricket&ckid=102&min=clubs and Associations&kid=860&key=England"

    2nd Page: "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs and Associations&kid=860&key=England"

    3rd Page: "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs and Associations&kid=860&key=England"

    4th Page: "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs and Associations&kid=860&key=England"

    Only change noticed between first and second url is= "keyword" to "morekeywords"

    2, 3 and 4 page links are identical.

    Most importantly when you copy the 2, 3 or 4 page links and try to go to the page, it says file or directory not found.


    Html element for the next button:

    Code (vb):

    <div align="center">
                <input type="submit" value="Next 20 Results ->" class="buttonlg">
            </div>
     
  4. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    If there is only same URL between page I just may say :
    so it is not possible !

    But 'cause that works within webbrowser, I'm very sure
    you forgot to inspect parameters which should point to
    first item of the page and maybe how many items within page …
    shahin likes this.
  5. shahin

    shahin Member

    Messages:
    119
    Dear Marc L, right you are. It is POST method and there are three parameters hidden in the form element. I've tried to rectify my code but it is now parsing only three records which is far less than I was having before. Something I'm doing wrong. The postdata parameter should be filled in the way so that I can parse the whole data cause you know there are two things that might help you give me a hint about what i should do now. 1. rowlimit 2. count.



    Code (vb):

    Sub ExtractingEmail()
    Dim http As Object: Set http = CreateObject("MSXML2.serverXMLHTTP")
    Dim html As New HTMLDocument
    Dim Posts As Object, Post As Object, postdata As String
    Dim x As Long, i As Long, Z As Long

    x = 2

    postdata = "rowlimit=20&count=21"
    http.Open "Post", "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England", False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send postdata

    html.body.innerHTML = http.responseText
    Set Posts = html.getElementsByClassName("dirListingPlus")
        For Each Post In Posts
            Cells(x, 1) = Post.getElementsByTagName("a")(0).innerText
            Cells(x, 2) = Split(Post.getElementsByTagName("a")(1).href, ":")(1)
            x = x + 1
        Next Post
    End Sub
     

    The form data elements are:

    Code (vb):

    <form action="morekeywords.cfm?cid=11&amp;maj=cricket&amp;ckid=102&amp;min=clubs%20and%20Associations&amp;kid=860&amp;key=England" method="post">
            <input type="hidden" name="type" value="Name">
            <input type="hidden" name="rowlimit" value="20">
            <input type="hidden" name="count" value="21">
            <div align="center">
                <input type="submit" value="Next 20 Results ->" class="buttonlg">
            </div>
        </form>
     

    Thanks in advance.
  6. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    So it seems parameters are rowlimit for the items number in page
    and count for the page first item …

    You can use these parameters within request send command
    like in this thread.

    Some webpages may need specific request headers to well work …
  7. shahin

    shahin Member

    Messages:
    119
    Dear Marc L, I've modified my code a little bit but still no luck. This time it is fetching data (20) of the first page over and over again. Here is the code:

    Code (vb):

    Sub ExtractingEmail()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim Posts As Object, Post As Object, postdata As String
    Dim x As Long, y As Long

    x = 2
    For y = 1 To 21
        postdata = "type=Name&rowlimit=729&count=" & y
        http.Open "Post", "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England", False
        http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        http.send postdata
     
        html.body.innerHTML = http.responseText
        Set Posts = html.getElementsByClassName("dirListingPlus")
        For Each Post In Posts
            On Error Resume Next
            Cells(x, 1) = Post.getElementsByTagName("a")(0).innerText
            Cells(x, 2) = Split(Post.getElementsByTagName("a")(1).href, ":")(1)
            x = x + 1
        Next Post
    Next y
    End Sub
     
  8. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    As first 20 items page starts from 1
    second page starts from 21, third from 41, …

    (Logic at child level)
  9. shahin

    shahin Member

    Messages:
    119
    Very unfortunate that I can't fix it. :confused:
  10. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    Too many not understandable things within your code !

    • Why a loop as you read all item ?!
    • When you use a flag for items, check at least
    if it is common to all needed items !

    You can fix it just by reading source code page #2,
    inspect for example last item then
    check if needed data are in responseText.
    If you don't see anything so you have to pilot IE.

    But here - as always - it is just a purpose of well observation,
    if you can't follow Document Object Model of this webpage,
    do not ever use htmlfile object but just extract data
    from responseText using VBA basics text function
    and like the Logic is at very child level, it's not difficult …
    shahin likes this.
  11. shahin

    shahin Member

    Messages:
    119
    Thanks dear Marc L, you are always right. I should have studied the page source attentively. There is always a clue to work with as I got here. You hinted me in the first place, though! This is the first time I read the source code of 2nd page, third page etc. Thanks again. Here is the refined one:

    Code (vb):

    Sub ExtractingEmail()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim Posts As Object, post As Object, postdata As String
    Dim x As Long, y As Long

    x = 2
    For y = 0 To 5
        postdata = "type=Name&rowlimit=20&count=" & (y * 20) + 1
        http.Open "Post", "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England", False
        http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        http.send postdata
        html.body.innerHTML = http.responseText
        Set Posts = html.getElementsByClassName("dirListingPlus")
        For Each post In Posts
            On Error Resume Next
            Cells(x, 1) = post.getElementsByTagName("a")(0).innerText
            Cells(x, 2) = Split(post.getElementsByTagName("a")(1).href, ":")(1)
            x = x + 1
        Next post
    Next y
    End Sub
     
  12. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    The loop is useless as you can read all data at once,
    like in your previous code but without any loop …

    Is your code working as expected ? Which data you need to extract ?
  13. shahin

    shahin Member

    Messages:
    119
    Right you are. No, the code is not working as i expected. I need to parse the name of 4 pages from 1 to 4. Thanks again for your response.
  14. shahin

    shahin Member

    Messages:
    119
    From 2nd page, class name is blank. Elements for class is as below:

    Code (vb):

    <dl class="">
            <dt><a href="cditem.cfm?NID=10995">Abberton &amp; District Cricket Club</a></dt>
            <dd>
                COLCHESTER<br>  &nbsp;
               
            </dd>
           
        </dl>
     
  15. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    As I yet wrote …
    So remove y loop - very useless except to waste time ! - and
    update send parameters for the items number to read
    and from the first item !

    But above all finally apply correct logic as your actual code is looking
    for specific items only and not all those you need ‼ As I yet wrote :
    So when at least you achieve to check that,
    you must finally understand dirListingPlus class name is a no sense ‼
    As you could be warned without using On Error Resume Next (a bad use !) …

    Edit : you finally checked it and found class name is not a good flag !

    As using a class name could be a mess (often !), better is to watch
    within responseText if a flag may be an ID, a name or a tag name …

    And all your need is doable without using htmlfile, just an easy parsing text …

    As a training, follow this thread and try on your side
    before cheating with my codes (from post #10) …
  16. shahin

    shahin Member

    Messages:
    119
    Found a solution but this is tedious. I can't run the loop but if i input the value of "count" then it scrapes successfully. Basically, I was not getting records of 2nd page, third page and so on cause class elements for the first page and the rest of the pages are different. From second page the class is with no value. So I tried responsetext method to make it right.
    Code (vb):

    Sub ExtractingEmail()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim postdata As String, str As Variant, y As Long, i As Long
    Dim x As Long

    x = 2
    postdata = "type=Name&rowlimit=20&count=1"
    http.Open "Post", "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England", False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send postdata

    html.body.innerHTML = http.responseText
    str = Split(http.responseText, "<dt><a")
    y = UBound(str)
        For i = 1 To y
            On Error Resume Next
            Cells(x, 1) = Split(Split(Split(str(i), " href=""")(1), """>")(1), "<")(0)
            Cells(x, 2) = Split(Split(str(i), " href=""mailto:")(1), """")(0)
            x = x + 1
        Next i
    End Sub
     
  17. shahin

    shahin Member

    Messages:
    119
    Ha ha ha!!!! Well said, Cheating with your code.:DD Yeah that's right. I tried few times with your code to serve my purpose. Anyways, now I know how to use split function. Btw, do have a look If you are not bothered already, to give me a clue how i can run the loop now.
  18. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    Your parameter is for reading only 20 items but you wrote « for 4 pages » :
    so the question is why didn't you mod the items number
    according to your need ?!
    If a page contains 20 items, how many items for 4 pages ?
    My child just solves this arithmetic !

    (As reminders …)
    shahin likes this.
  19. shahin

    shahin Member

    Messages:
    119
    Dear Marc L, Thanks for the answer. Before modifying my code I tried manipulating with rowlimits but after writing responsetext method I didn't notice that mistake. Now, It is working like magic. However, The issue is clearly solved.
  20. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    Last but not least : I wanna see your working code !
  21. shahin

    shahin Member

    Messages:
    119
    Dear Marc L, here it is. The count parameter is just useless. No looping is required. The total data is 729 and it takes 5 seconds or less. Thanks again.
    Code (vb):

    Sub ExtractingEmail()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim postdata As String, str As Variant, y As Long, i As Long
    Dim x As Long

    x = 2
    postdata = "type=Name&rowlimit=729&count=1"
    http.Open "Post", "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs%20and%20Associations&kid=860&key=England", False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send postdata

    html.body.innerHTML = http.responseText
    str = Split(http.responseText, "<dt><a")
    y = UBound(str)
        For i = 1 To y
            On Error Resume Next
            Cells(x, 1) = Split(Split(Split(str(i), " href=""")(1), """>")(1), "<")(0)
            Cells(x, 2) = Split(Split(str(i), " href=""mailto:")(1), """")(0)
            x = x + 1
        Next i
    End Sub
     
  22. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    Line #3 : Dim html As New HTMLDocument
    Useless as you do not use this object to parse data !
    Waste of memory as unused and you forgot to release objects
    from memory at procedure ending (see Nothing in VBA help).

    And using With statement object variable could be useless too !
    (To see in next demonstration …)
    Line #13 : html.body.innerHTML = http.responseText
    Same as previous !
    Line #4 : Dim postdata As String, str As Variant, y As Long, i As Long
    str variable is used to load Split result,
    can be declared str() as String 'cause all you grab is only text …
    Line #8 : postdata = "type=Name&rowlimit=729&count=1"
    • A variable to only use it once may be useless,
    text can directly be in send codeline …
    Except for long constant like an URL for example …
    • What occurs if a new item is created ? You will missed it !
    Line #14 : str = Split(http.responseText, "<dt><a")
    Well done for the flag !
    Line #15 : y = UBound(str)
    Like previous postdata variable, useless to just use it once !
    UBound(str) can directly be in next codeline …
    Line #10 : without this header the request fails !

    Line #17 : On Error Resume Next
    To avoid within a loop, you could place it just before …

    But can be removed from code !
    Just by checking items number within a Split result variable,
    to see in next demonstration …
  23. Marc L

    Marc L Excel Ninja

    Messages:
    2,713
    As reminders before writing any codeline,
    well observe how webpage works,
    if any request is used when clicking a button,
    if the request has parameters,
    if the request needs specific headers,
    checking if data are in request response, …​

    Respecting this will save your time !

    Demonstration to read all items without On Error statement
    using classic historic via romana way :​
    Code (vb):
    Sub Demo1()
             Const URL = "http://www.sportfocus.com/comdir/morekeywords.cfm?cid=11&maj=cricket&ckid=102&min=clubs and Associations&kid=860&key=England"
        With New XMLHTTP60
            .Open "POST", URL, False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .setRequestHeader "DNT", "1"
            .send "type=name&rowlimit=999&count=1"
             SPQ = Split(.responseText, "<dt><a href=")
        End With
        If UBound(SPQ) > 0 Then
           ReDim VA$(1 To UBound(SPQ), 1)
            For R& = 1 To UBound(SPQ)
                                           SP = Split(SPQ(R), "<a href=""mailto:")
                                     VA(R, 0) = Split(Split(SP(0), ">")(1), "<")(0)
              If UBound(SP) > 0 Then VA(R, 1) = Split(SP(1), """")(0)
            Next
                ActiveSheet.UsedRange.Offset(1).Clear
                [A2].Resize(UBound(VA), 2).Value = VA
        End If
    End Sub
    Note some webpage request may not accept 999 parameter value
    as above the real items number or superior to a max limit …
    shahin likes this.
  24. shahin

    shahin Member

    Messages:
    119
    OMG!!! It's really an explicit explanation. I will follow it from now on. The thing is that I'm not from an IT background and what I write I learned from different tutorials on the web. So, my basic is not that strong in such technical things. However, better late than never and I'm learning while mistaking. That's it.
    Btw, should I set http=nothing in the end?
  25. shahin

    shahin Member

    Messages:
    119
    It's an OSCAR winning coding. I just love the way you write. Thanks a zillion.

Share This Page