1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Trouble Parsing Image and Price

Discussion in 'VBA Macros' started by shahin, Mar 5, 2017.

  1. shahin

    shahin Member

    Messages:
    119
    Hi there! I've made a parser which is pulling 3 categories excluding Image and price out of five categories. I could not find any "img" tag there. Moreover the page contains lots of data but only 25 are being scraped. The page is loading slowly to display the full content though! Here is the code I've written. Any help would be a great help to me. Thanks in advance.

    Code (vb):

    Sub VariousWines()
    Const URL = "https://www.vivino.com/explore?e=eJzLLbI1UcvNzLM1NFDLTaywNTFQS660LS1WK7ZNTlQrS7YtKSpNVSsviY61NYRQRhDKGEKZQChzqJwJABUoGa4%3D"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, posts As Object, topic As HTMLHtmlElement, post As HTMLHtmlElement
    Dim i As Long, x As Long

    x = 2

    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText

    Set topics = html.getElementsByClassName("card card-lg")

        For i = 0 To topics.Length - 1
        Set topic = topics(i)
            Cells(x, 1).Value = topic.getElementsByClassName("bold")(0).innerText
            Cells(x, 2).Value = topic.getElementsByClassName("text-block wine-card__region")(0).innerText
            Cells(x, 3).Value = topic.getElementsByClassName("text-inline-block light average__number")(0).innerText
            Cells(x, 4).Value = topic.getElementsByClassName("wine-price-value")(0).innerText
            On Error Resume Next
            Cells(x, 5).Value = topic.getElementsByClassName("wine-card__image")(0).getElementsByTagName("img")(0).src
            x = x + 1
        Next i
     
    End Sub
     
    Last edited: Mar 5, 2017
  2. shahin

    shahin Member

    Messages:
    119
    No "img", "src" I could find in the element. FYC, html elements for the Image portion is given below:

    Code (vb):

    <div class="wine-card__image-wrapper">
    <a href="/wineries/tschida/wines/angerhof-eiswein-gruner-veltliner-2012">
    <figure class="wine-card__image" style="background-image: url(//images.vivino.com/thumbs/qlER3oggQVKh1FZn7YGxZg_375x500.jpg)">
    <div class="image-inner"></div>
    </figure>
    </a>
    </div>
     
  3. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    Hi !

    So just mod your code to use the correct class name …
  4. shahin

    shahin Member

    Messages:
    119
    Hi Marc L, nice to find you in the loop. If I could understand you properly then you meant this. It is not working either.

    Code (vb):

    Set topics = html.getElementsByClassName("card card-lg")

        For i = 0 To topics.Length - 1
        Set topic = topics(i)
            Cells(x, 1).Value = topic.getElementsByClassName("wine-price-value")(0).innerText
            Cells(x, 2).Value = topic.getElementsByClassName("wine-card__image-wrapper")(0).getElementsByTagName("img")(0).src
            x = x + 1
        Next i
     
  5. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    So like it does not work on your side, you must change your strategy !

    As per example search for a tag name …
  6. shahin

    shahin Member

    Messages:
    119
    Thanks Marc L for your kind reply. In case of image I have clearly messed up with my code but what about "The Price" portion? Have I not rightly written it?

    Code (vb):

    Set topics = html.getElementsByClassName("wine-price-value")
    For Each topic In topics
        Cells(x, 1).Value = topic.innerText
        x = x + 1
    Next topic
     
  7. shahin

    shahin Member

    Messages:
    119
    FYC, Here is the html elements for the price:
    Code (vb):

    <div class="text-inline-block header-large light wine-price average__number">
    <span class="wine-price-prefix">$</span>
    <span class="wine-price-value">29.98</span>
    <span class="wine-price-suffix"></span>
    </div>
     
  8. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,888
    For image... since URL is within style element in Class, easiest way is to get innerHTML of the wrapper and do string manipulation.
    Ex. To extract entire string in Style="xxx"
    Code (vb):
    Cells(x, 5).Value = Split(Split(topic.getElementsByClassName("wine-card__image-wrapper")(0).innerHTML, "style=""")(1), """>")
    As for price. It's likely filled by script. When page is loaded in IE, values are updated via script. However, responseText will not fire script to fill those values. Thus they are empty.

    Quick and dirty workaround is to load responseText to IE and read values from there.
  9. Marc L

    Marc L Excel Ninja

    Messages:
    2,713


    As always observing how webpage works before to start any code …​
  10. Chihiro

    Chihiro Well-Known Member

    Messages:
    2,888
    Yep. That's usually key to scraping web sites. Not so much VBA skill.

    @shahin
    You can see in image how price info is brought in.
    upload_2017-3-6_11-27-42.png
    shahin likes this.
  11. shahin

    shahin Member

    Messages:
    119
    Thanks sir Chihiro for the demo. Btw, I have got the basic portion from some NATHAN and edited the code for working nicely to parse image. Here it is:

    Code (vb):

    Set topics = html.getElementsByClassName("wine-card__image-wrapper")
    For Each topic In topics
     Cells(x, 1) = "http:" & Replace(Replace(topic.Children(0).Children(0).Style.backgroundImage, "url(", ""), ")", "")
     x = x + 1
    Next topic
     
  12. shahin

    shahin Member

    Messages:
    119
    Another chunk of html elements out of which I can't sift through the Img and src to get the Image. Here is the portion:

    Code (vb):


    <div class="swipe" style="visibility: visible;"><div class="swipe-wrap" style="width: 1500px;"><div data-index="0" style="width: 300px; left: 0px; transition-duration: 0ms; transform: translate(0px, 0px);"><img alt=""class="" src="https://images.craigslist.org/00l0l_hW1X6OeifVZ_300x300.jpg"></div><div data-index="1" style="width: 300px; left: -300px; transition-duration: 0ms;
    transform: translate(300px, 0px);"
    ></div><div data-index="2" style="width: 300px; left: -600px; transition-duration: 0ms; transform: translate(300px, 0px);"></div><div data-index="3" style="width: 300px; left: -900px; transition-duration: 0ms; transform: translate(300px, 0px);"></div>'<div data-index="4" style="width: 300px; left: -1200px; transition-duration: 0ms; transform: translate(-300px, 0px);"></div></div></div>

     
  13. shahin

    shahin Member

    Messages:
    119
    Ain't it possible to parse images using REGEX in this case? Code and html elements are pasted below:
    Code:
    Code (vb):

    Set topics = html.getElementsByClassName("result-image gallery")
    For Each topic In topics
         Cells(x,1)= topic.getElementsByTagName("img")(0).src
         x = x +1
    Next topic
     
    Elements:
    Code (vb):

    <a href="/mnh/atq/6033903864.html"class="result-image gallery" data-ids="1:00l0l_auIVAPKuweh"><img alt=""class="" src="https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg"><span class="result-price">$120</span></a>
     
  14. shahin

    shahin Member

    Messages:
    119
    Dear Marc L, you must be right with what you said but the thing is that I am not good at understanding your point oftentimes because of my incapability. However, in this case even if I use the basic code to get images from a page, it fetches only two "gif" files but not a single "jpg". I tried with:

    PageLink: https://newyork.craigslist.org/search/ata

    Code (vb):

    Set topics = html.getElementsByTagName("img")
    For Each topic In topics
        Cells(x, 1) = topic.src
        x = x + 1
    Next topic
     
  15. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    I deleted my previous code thinking I was wrong 'cause of late binding
    for objects but trying now early binding I get same result :
    getElementsByClassName collection is always empty …

    Post a complete code, it's far easier to try to follow you !
    And indicating added references.

    As a reminder : a valid code for a webpage often is not on another one !
    At each new webpage, you must start from blank and study its html code …
    shahin likes this.
  16. shahin

    shahin Member

    Messages:
    119
    @Marc L,
    Hi!! Terribly sorry to say that I was not around my pc that is why I got late to respond. I might have mistaken choosing the right class. Anyways, here is the code :

    Code (vb):

    Sub CraigImg()
    Const URL = "https://newyork.craigslist.org/search/ata"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, topic As Object
    Dim x As Long
    x = 2
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText
    Set topics = html.getElementsByClassName("result-image gallery")
        For Each topic In topics
            Cells(x, 1) = topic.getElementsByTagName("img")(0).src
            x = x + 1
        Next topic
    End Sub
     
  17. Marc L

    Marc L Excel Ninja

    Messages:
    2,713


    Does this code work on your side ?​
  18. shahin

    shahin Member

    Messages:
    119
    Nope. Not at all.
  19. shahin

    shahin Member

    Messages:
    119
  20. Marc L

    Marc L Excel Ninja

    Messages:
    2,713

    So save responseText to a text file and check it over your code
    - or just use InStr function or Like operator -
    it will be easier to understand !

    Once understood, well read Chihiro's post #8 …
  21. shahin

    shahin Member

    Messages:
    119
    Last edited: Mar 20, 2017 at 10:06 AM
  22. Marc L

    Marc L Excel Ninja

    Messages:
    2,713


    So now you can add images as hyperlinks on cells …​
  23. shahin

    shahin Member

    Messages:
    119
    Yep, successfully.

Share This Page