Can't parse phone numbers from a web page when they are separated by <br> tags

shahin · Apr 22, 2017

Hi there, I hope you are all doing well. I wrote a script to parse data from a web page, but it only grabs the first element of each category and can't reach the items that are separated by <br> tags within the same category. The phone number, fax, etc. are in the first category, inside a "p" tag, but I can't get them out. Here is what I tried. I hope somebody can show me a way to make it work. By the way, I tried two ways of selecting the elements and have included both (the second one is commented out).

Code:

Sub RestData()
    ' Early binding: needs references to "Microsoft XML, v6.0" and
    ' "Microsoft HTML Object Library" (Tools > References)
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim ele As Object, post As Object, docs As Object
    Dim x As Long

    With CreateObject("MSXML2.serverXMLHTTP")
        .Open "GET", "http://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Attempt 1: every <p> inside the first contact-details block
    Set ele = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")
    For Each post In ele
        x = x + 1
        Cells(x, 1) = post.innerText
    Next post

    ' Attempt 2: only the first <p> of each contact-details block
'    Set ele = html.getElementsByClassName("contact-details block dark")
'    For Each post In ele
'        Set docs = post.getElementsByTagName("p")(0)
'        x = x + 1
'        Cells(x, 1) = docs.innerText
'    Next post

    Set html = Nothing: Set ele = Nothing: Set docs = Nothing
End Sub
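One way to reach the separate lines inside a single <p> is to split its innerHTML on the <br> tag instead of writing the whole innerText into one cell. A minimal sketch, assuming the page uses one of the spellings <br>, <BR>, <br/> or <br /> (MSHTML may report tags in upper case, so the replacements are case-insensitive). SplitOnBr is a hypothetical helper name, not part of any library:

```vba
' Hypothetical helper: splits the inner HTML of one <p> element into the
' pieces separated by <br> tags. Returns a zero-based array of strings.
Function SplitOnBr(ByVal innerHtml As String) As Variant
    Dim parts As Variant, i As Long

    ' Normalise the common <br> spellings to a single line-feed character.
    ' vbTextCompare makes the match case-insensitive (<br> and <BR>).
    innerHtml = Replace(innerHtml, "<br />", vbLf, 1, -1, vbTextCompare)
    innerHtml = Replace(innerHtml, "<br/>", vbLf, 1, -1, vbTextCompare)
    innerHtml = Replace(innerHtml, "<br>", vbLf, 1, -1, vbTextCompare)

    parts = Split(innerHtml, vbLf)
    For i = LBound(parts) To UBound(parts)
        parts(i) = Trim$(parts(i))   ' drop stray spaces around each piece
    Next i
    SplitOnBr = parts
End Function
```

In the loop from the question, parts = SplitOnBr(post.innerHTML) followed by writing parts(j) to Cells(x, j + 1) for j = 0 To UBound(parts) would put each line of a paragraph in its own column. The "Web:" piece would still contain the <a> tag, so it needs extra handling.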

shahin · Apr 22, 2017

After multiple tries I found a partial workaround. Now I can parse four categories, but I still can't get the "address" and "web" fields. I hope someone here has a solution. Thanks in advance.

Code:

str = Split(.responseText, " class=""contact-details block dark"">")
y = UBound(str)
On Error Resume Next
For i = 1 To y
    Cells(x, 1) = Split(Split(str(i), "Company Name:")(1), "<")(0)
    Cells(x, 2) = Split(Split(str(i), "Phone:")(1), "<")(0)
    Cells(x, 3) = Split(Split(str(i), "Fax:")(1), "<")(0)
    Cells(x, 4) = Split(Split(str(i), "mailto:")(1), """")(0)
    x = x + 1
Next i
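The Split(...)(1) pattern raises error 9 ("Subscript out of range") whenever a label is missing from the block, which is why the loop needs On Error Resume Next; that setting also hides every other error in the loop. A small helper that returns an empty string when the label is absent avoids this. A sketch; TextAfter is a hypothetical name:

```vba
' Hypothetical helper: returns the text that follows label in source, up to
' (not including) the next occurrence of terminator. Returns "" when the
' label is not found, instead of raising error 9 like Split(...)(1) does.
Function TextAfter(ByVal source As String, ByVal label As String, _
                   Optional ByVal terminator As String = "<") As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, source, label, vbBinaryCompare)
    If startPos = 0 Then Exit Function          ' label missing: return ""
    startPos = startPos + Len(label)

    endPos = InStr(startPos, source, terminator, vbBinaryCompare)
    If endPos = 0 Then endPos = Len(source) + 1 ' no terminator: take the rest

    TextAfter = Trim$(Mid$(source, startPos, endPos - startPos))
End Function
```

With it, a line such as Cells(x, 2) = TextAfter(str(i), "Phone:") writes an empty cell for a missing label, and the loop can run without On Error Resume Next.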

HTML element for the address:

Code:

<p>Level 3, 2 Bulletin Place<br>Sydney<br>NSW<br>2000</p>
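Because the address is the first <p> after the "Address" heading and has no labels, it can be taken whole (from <p> to the next </p>) with its <br> tags turned into separators. A sketch, assuming the heading is written exactly as Address</h4> as in the markup above; AddressAfterHeading is a hypothetical name:

```vba
' Hypothetical helper: returns the first <p>...</p> after the "Address" heading,
' with each <br> replaced by sep. Returns "" if any marker is missing.
Function AddressAfterHeading(ByVal block As String, _
                             Optional ByVal sep As String = ", ") As String
    Dim headPos As Long, startPos As Long, endPos As Long
    Dim inner As String

    headPos = InStr(1, block, "Address</h4>", vbTextCompare)
    If headPos = 0 Then Exit Function

    startPos = InStr(headPos, block, "<p>", vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len("<p>")

    endPos = InStr(startPos, block, "</p>", vbTextCompare)
    If endPos = 0 Then Exit Function

    inner = Mid$(block, startPos, endPos - startPos)
    ' Same separator for all common spellings of the line-break tag.
    inner = Replace(inner, "<br />", sep, 1, -1, vbTextCompare)
    inner = Replace(inner, "<br/>", sep, 1, -1, vbTextCompare)
    inner = Replace(inner, "<br>", sep, 1, -1, vbTextCompare)
    AddressAfterHeading = Trim$(inner)
End Function
```

For the markup above, Cells(x, 3) = AddressAfterHeading(str(i)) would write "Level 3, 2 Bulletin Place, Sydney, NSW, 2000" into a single cell.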

HTML element containing the web address:

Code:

<p>Company Name: Harveys Chartered Accountants<br>Phone: +61 2 9247 2227<br>Fax: +61 2 9247 8550<br>Web: <a target="_blank" href="http://www.harveys.com.au">http://www.harveys.com.au</a></p>
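The web address sits in an href attribute rather than in plain text, so reading up to the next "<" or ">" picks up the surrounding quotes. Reading between href=" and the closing double quote avoids that. A sketch, assuming the attribute is written with double quotes as in the snippet above; HrefAfter is a hypothetical name:

```vba
' Hypothetical helper: finds label in block, then returns the value of the
' first href="..." attribute after it. Returns "" if either part is missing.
Function HrefAfter(ByVal block As String, ByVal label As String) As String
    Const HREF_OPEN As String = "href="""   ' the text  href="
    Dim p As Long, q As Long

    p = InStr(1, block, label, vbBinaryCompare)
    If p = 0 Then Exit Function

    p = InStr(p, block, HREF_OPEN, vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(HREF_OPEN)

    q = InStr(p, block, """")                ' closing double quote
    If q = 0 Then Exit Function

    HrefAfter = Mid$(block, p, q - p)
End Function
```

The same helper works for the e-mail link: HrefAfter(str(i), "Email:") returns the whole "mailto:..." value, from which Mid$(..., 8) drops the "mailto:" prefix.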

Full HTML of the contact-details block:

Code:

<div class="contact-details block dark">
                <h3>Contact Details</h3><p>Company Name: Harveys Chartered Accountants<br>Phone: +61 2 9247 2227<br>Fax: +61 2 9247 8550<br>Web: <a target="_blank" href="http://www.harveys.com.au">http://www.harveys.com.au</a></p><h4>Address</h4><p>Level 3, 2 Bulletin Place<br>Sydney<br>NSW<br>2000</p><h4>Contact</h4><p>Name: Cecilia Holmes<br>Phone: +61 2 9247 2227<br>Fax: +61 2 9247 8550<br>Email: <a href="mailto:cholmes@harveys.com.au">cholmes@harveys.com.au</a></p>
            </div>
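Since each <p> in this block is a list of "Label: value" lines, another option is to read one paragraph at a time and store its lines in a dictionary keyed by label. This keeps the company "Phone:" (first paragraph) apart from the contact's "Phone:" (last paragraph), which a search over the whole block cannot do. A sketch using Scripting.Dictionary through late binding; LabelMap and StripTags are hypothetical names, the tag stripping is a naive character scan that assumes no ">" inside attribute values, and HTML entities such as &amp; are left as they are:

```vba
' Hypothetical helper: removes anything between "<" and ">" (naive scan;
' assumes attribute values contain no ">" characters).
Function StripTags(ByVal s As String) As String
    Dim i As Long, ch As String, inTag As Boolean, result As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch = "<" Then
            inTag = True
        ElseIf ch = ">" Then
            inTag = False
        ElseIf Not inTag Then
            result = result & ch
        End If
    Next i
    StripTags = result
End Function

' Hypothetical helper: turns the inner HTML of one <p> made of
' "Label: value" lines separated by <br> into a dictionary keyed by label
' (case-insensitive). Lines without a colon are skipped.
Function LabelMap(ByVal pHtml As String) As Object
    Dim d As Object, parts As Variant, i As Long, k As Long
    Dim lineText As String

    Set d = CreateObject("Scripting.Dictionary")
    d.CompareMode = vbTextCompare      ' must be set before any key is added

    pHtml = Replace(pHtml, "<br />", vbLf, 1, -1, vbTextCompare)
    pHtml = Replace(pHtml, "<br/>", vbLf, 1, -1, vbTextCompare)
    pHtml = Replace(pHtml, "<br>", vbLf, 1, -1, vbTextCompare)

    parts = Split(pHtml, vbLf)
    For i = LBound(parts) To UBound(parts)
        lineText = StripTags(CStr(parts(i)))
        k = InStr(lineText, ":")             ' first colon ends the label
        If k > 0 Then
            d(Trim$(Left$(lineText, k - 1))) = Trim$(Mid$(lineText, k + 1))
        End If
    Next i
    Set LabelMap = d
End Function
```

With the DOM approach from the first post, looping For Each post over the block's getElementsByTagName("p") and calling Set m = LabelMap(post.innerHTML) would give m("Phone"), m("Fax") and so on for each paragraph separately. The Address paragraph has no labels, so it comes back as an empty dictionary and still has to be joined line by line.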

shahin · Apr 26, 2017

Finally solved it:

Code:

str = Split(http.responseText, " class=""contact-details block dark"">")
y = UBound(str)
On Error Resume Next
    For i = 1 To y
        x = x + 1
        Cells(x, 1) = Split(Split(Split(str(i), "Contact Details</h3>")(1), "Company Name:")(1), "<")(0)
        Cells(x, 2) = Split(Split(Split(str(i), "Contact</h4>")(1), "Name:")(1), "<")(0)
        Cells(x, 3) = Replace(Split(Split(Split(Split(str(i), "Address</h4>")(1), "<p>")(1), "<br>")(0), "</p>")(0), "<br />", " ")
        Cells(x, 4) = Split(Split(str(i), "Phone:")(1), "<")(0)
        Cells(x, 5) = Split(Split(str(i), "Fax:")(1), "<")(0)
        Cells(x, 6) = Split(Split(str(i), "mailto:")(1), """")(0)
        Cells(x, 7) = Split(Split(Split(str(i), "Web: ")(1), "href=""")(1), """")(0)
    Next i


shahin

Active Member
