• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

SCRAPE THE SPECIFIC DATA FROM HTML

blazy

Member
Hello!

I need help from one of you Excel gurus, I hope its not too complicated.

If you follow the below example links you will see in what format I have the data in html which I need to import it into Excel:

http://www.sec.gov/Archives/edgar/data/1319067/000139834415000423/fp0012923_nq.htm

http://www.sec.gov/Archives/edgar/data/1319067/000139834415000459/fp0012926_nq.htm

...

This data must be then reorganized and the final output is like in the attached file '360 Funds'.

So from each links I need only the following information: Fund Name, Sector, Company, Shares, Report Date.

The most important is Shares data, I just need that (+company name, sector, Fund name, date). Other data I do not need.

So I am looking for a possible VBA solution, where I could just copy paste a link and the data will be imported into the excel file -> and then reorganized so the only data I have is above described data.

If you have any other solution that is fine, I would greatly appreciate any help, since I have to import the data from 130 each month, whichit is really time consuimg job.


Thank you!
 

Attachments

  • 360 Funds.xlsx
    17.1 KB · Views: 11
Hi,

you had already done a request via a QueryTable to load data,
what else ?‼

Hello! sorry, I dont quite understand... I have not, so far I was doing it manually.

I was thinking about the following vba solution:

When I run a macro one window opens where I can paste in the link, and then all the data is imported into excel spreadsheet. Now I do not know what is better to make vba that automatically sort the data before importing, or is better to sort the data after we have it excel spreadsheet.

Hope I make any sense, what do you think is most appropriate solution to this?

I am also attaching the excel file with all the links from where I need to import the data. As I said it is more than 1000 links (.txt files on ftp link, please see attached)

Thanks!
 

Attachments

  • SEC Index of Form N-Q Filings Q1 2015.xlsx
    45.2 KB · Views: 1
Hello, I have found one code but its probably nothing big. Please see attached file.

Thanks.
 

Attachments

  • 360 Funds_1.xlsm
    26.4 KB · Views: 8
Hi:

Glad that you solved it, I was really busy yesterday with lots of meetings. I would suggest you to list down all the URLs in one excel sheet and loop through the same, so that your link will be dynamic.

Thanks
 
Hey!

I would be glad if I really did work it out.. :) I did not. I can provide you with the file with all the links in a couple of days. Can we meanwhile do the code to loop through all the links that are in sheet or workbook, like you suggested? The second step is tricky and I rely on your expertese (wheter to scrape just specifi data or scrape all and than restructure it once in excel file.

Thanks.
 
HI, Nebu, thanks you are awesome!!!!!

Could we extend the code to loop through 1000 links when I provide you with the file of all the links?

Thanks!
 
Hey @Nebu !

Thank you so far.

I am attaching the list of first links. I will be adding the list further down, like in the attached file.

Some links go to reports with many funds - I am keeping track of all the funds (even bonds) but the VBA should only find 'common stocks', and the funds containing them. If that is not possible it is fine to leave it out.

Also notice that not in all links the titles and positions are the same (I hope that does not complicate things too much).

For exmple: here you have 'shares' in two columns:
https://www.sec.gov/Archives/edgar/data/878719/000119312514354686/d776420dnq.htm

in here the left column is shares and the right one is companies name:

https://www.sec.gov/Archives/edgar/data/1539996/000091047214005502/formnq.htm

here the title is not just "shares" but it is "Shares/ Principal Amount ($)" or "numebr of shares"

https://www.sec.gov/Archives/edgar/data/1353176/000135317614000025/acgf103114nqdoc.htm

https://www.sec.gov/Archives/edgar/data/766285/000076628514000030/amana20140831n-q.htm

here the structure is just a little odd:

http://www.sec.gov/Archives/edgar/data/1175813/000116204415000169/conestoganq.htm

and so on...

Hope we can find some pattern which meets all the requirements?

Thank you very much!
 

Attachments

  • LINK LIST.xlsx
    23.9 KB · Views: 4
Hi:

I went through a couple of links you had given in link list workbook. The following are my observations

say for a the URL
http://www.sec.gov/Archives/edgar/data/1319067/000139834415000423/fp0012923_nq.htm

.Name = "fp0012923_nq_1" has to be dynamic (which is easy to do since it is a part of the URL)

.WebTables = "3,4" (these are the tables which you want to download from the above URL)

3 and 4 means 3rd and 4th table on the website.


These table numbers have to be determined first before running the macro, since your webpage layout is different from one link to the other without determining these table numbers it would be difficult to code it. Let me know your thoughts.


Thanks



 
Hi:

Please find the attached. You will have to fill in the table nos in the link tab which will be tedious because you are pulling different fund details from the same URL, but once if you can get the table nos correct you will be able to scrape the data with some level of accuracy. I am attaching a copy for your perusal.

Thanks
 

Attachments

  • 360 Funds_1.xlsm
    62.1 KB · Views: 7
Hello, thank you Nebu!

I however have one question and final remark, because I still did not get it to work.

Macro works (and never stops) if I just run it, without writing in the table numbers. Also if I insert just numbers for few links, the file crashes...

Could you write a macro, in a way if no numbers are filled in, then skip that link?

And please can you tell me in what format do I have to write the numbers in, so the code will run correctly?

I think I understand the logic, but the file keeps crashing.

Thank you.
 
Hi :

The macro might be crashing because I had hard coded the table number as 3,4. If the table no 3 or 4 is not present in the URL it may error out. Without determining the table numbers, I am afraid you could download and scrape data.

Thanks
 
Back
Top