Web Scraping Free Cash Flow from Marketwatch.com
Author: richard1000
Creation Date: 3/3/2017 7:42 PM

richard1000

#1
Web scraping request: I would like to scrape the "Free Cash Flow" value (the last line of the table) for the latest quarter from the following website:

http://www.marketwatch.com/investing/stock/aapl/financials/cash-flow/quarter#

YCharts data includes "Free Cash Flow", but with a significant delay.

Eugene

#2
I got it done earlier today, after noticing your previous post in another thread. After pasting the code, check System.Xml in the Strategy's References dialog.


Note how it caches data in Wealth-Lab's global memory to speed up execution by avoiding repeated web requests.
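The thread's own code isn't shown here, so the caching idea is sketched below in Python as an illustration only. Assumptions: a module-level dictionary stands in for Wealth-Lab's global memory, and the key format and the `fetch` callback are hypothetical. The first request for a symbol goes to the web; later requests are answered from memory.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for Wealth-Lab's global memory: a module-level dict
# that persists for as long as the process (here, the script) is alive.
_fcf_cache: Dict[str, float] = {}

def get_fcf(symbol: str, fetch: Callable[[str], float]) -> float:
    """Return the cached FCF for symbol; call fetch() only on a cache miss."""
    key = "FCF_" + symbol  # hypothetical key format
    if key not in _fcf_cache:
        _fcf_cache[key] = fetch(symbol)  # the slow web request happens only here
    return _fcf_cache[key]

# Usage with a fake fetcher that records how often it is called.
requests_made: List[str] = []

def fake_fetch(symbol: str) -> float:
    requests_made.append(symbol)
    return 1.0

get_fcf("AAPL", fake_fetch)
get_fcf("AAPL", fake_fetch)
print(len(requests_made))  # prints 1: the second call is served from the cache
```

One consequence of this design: a wrong value, once cached, keeps being returned until the cache is cleared.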

richard1000

#3
Thanks. I guess the previous Yahoo scraping code did the trick.

I may want to pick other data from this table in the future, so can you explain how this part works? (I don't know HTML or HtmlAgilityPack.)


When I opened the HTML source, 'mainRow' showed up in 20 different rows, and I can't figure out why the 8th row was used. If there were a reference to the "Financing Activities" table, the 8th row would make sense, but I don't see one in the code.
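The actual page and code aren't reproduced here, so the Python sketch below uses a hypothetical, heavily simplified page and the standard library's ElementTree (which understands a subset of XPath) instead of HtmlAgilityPack. It illustrates one general point: a row index is only meaningful relative to the node set it was taken from. The 8th `mainRow` of a query scoped to one table is a different row from the 8th `mainRow` of the whole page, so an index of 8 can be right if the code first narrows the search to a particular table, even one picked by position rather than by name. Looking a row up by its label avoids depending on either count.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup: two tables whose rows share the
# class "mainRow". The real MarketWatch page is larger and differs in detail.
PAGE = """
<html><body>
  <table id="operating">
    <tr class="mainRow"><td>Net Income</td><td>100M</td></tr>
    <tr class="mainRow"><td>Net Operating Cash Flow</td><td>200M</td></tr>
  </table>
  <table id="financing">
    <tr class="mainRow"><td>Cash Dividends Paid</td><td>(50M)</td></tr>
    <tr class="mainRow"><td>Free Cash Flow</td><td>(460M)</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

def label_of_nth_row_in_page(n: int) -> str:
    """n is 1-based and counts mainRow rows across the WHOLE page."""
    rows = root.findall(".//tr[@class='mainRow']")
    return rows[n - 1].findall("td")[0].text

def label_of_nth_row_in_table(table_id: str, n: int) -> str:
    """n is 1-based and counts mainRow rows inside ONE table only."""
    table = root.find(f".//table[@id='{table_id}']")
    rows = table.findall("tr[@class='mainRow']")
    return rows[n - 1].findall("td")[0].text

def value_for_label(label: str):
    """Find the row by its first-cell text instead of by position."""
    for row in root.findall(".//tr[@class='mainRow']"):
        cells = row.findall("td")
        if cells and cells[0].text.strip() == label:
            return cells[-1].text.strip()
    return None

print(label_of_nth_row_in_page(2))                # Net Operating Cash Flow
print(label_of_nth_row_in_table("financing", 2))  # Free Cash Flow
print(value_for_label("Free Cash Flow"))          # (460M)
```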

Eugene

#4
Sorry, but teaching XPath and web developer tools goes beyond the quick help I intended to provide.
profile picture

richard1000

#5
OK. Can you link to some references so I can teach myself?

Eugene

#6
If I had references, I'd give them, but I learn these things (XPath, DOM, HTML, JSON, scraping libraries...) here and there. Pointers are all over Google search results.

richard1000

#7
1. While testing the above code, I found that the symbols CVX and GE return a value of 0 because their FCF is negative: Marketwatch.com formats negative values in parentheses rather than with a minus sign.

2. For some strange reason, the symbols AMZN and VZ return a value of -1, even though marketwatch.com shows a positive and a negative FCF for them, respectively.

Eugene

#8
Good catch; let's fix the parsing routine. At first I expected NumberStyles.AllowParentheses to handle numbers like "(460M)", but for some reason it doesn't. Probably this number format isn't built in. Therefore the fix below is not elegant, but it works for me.

To clear the -1s and 0s that previous runs left in global memory, uncomment ClearGlobals and run Execute once, then comment ClearGlobals out again and compile. Alternatively, paste the new code and restart WLP.
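Uncommenting ClearGlobals resets the whole cache. A narrower alternative is to delete only the entries that hold failure values. This is a minimal Python sketch, assuming the cache behaves like a plain dictionary and that 0 and -1 were the only values the old parser wrote on failure; the keys and values shown are hypothetical. Note that evicting 0 would also drop a genuine zero FCF.

```python
from typing import Dict

# Hypothetical cache contents left over from runs of the old parser.
cache: Dict[str, float] = {"FCF_AAPL": 123.0, "FCF_GE": 0.0, "FCF_AMZN": -1.0}

FAILED = {0.0, -1.0}  # assumed sentinel values written on parse failure

def evict_failed(entries: Dict[str, float]) -> int:
    """Delete entries whose value is a failure sentinel; return how many."""
    stale = [key for key, value in entries.items() if value in FAILED]
    for key in stale:
        del entries[key]
    return len(stale)

removed = evict_failed(cache)
print(removed, sorted(cache))  # 2 ['FCF_AAPL']
```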


P.S. Perhaps an easier approach would be to replace "(" with "-" and trim the ")", but this occurred to me only later.
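A likely reason NumberStyles.AllowParentheses didn't help is the unit letter rather than the parentheses: that flag lets .NET treat "(460)" as negative, but no NumberStyles value accepts a trailing M, so "(460M)" still fails. The Python sketch below is not the thread's code (which isn't shown here). It follows the P.S. idea and then strips an assumed K/M/B unit suffix before converting. The suffix table is an assumption about how the site labels amounts.

```python
from typing import Optional

# Assumed unit suffixes; the thread shows values such as "(460M)".
SCALE = {"K": 1e3, "M": 1e6, "B": 1e9}

def parse_amount(text: str) -> Optional[float]:
    """Parse cells like '1,234M', '(460M)' or '-' into a float, or return None."""
    # P.S. idea: "(" becomes "-" and ")" is dropped; thousands separators too.
    s = text.strip().replace("(", "-").replace(")", "").replace(",", "")
    factor = 1.0
    if s and s[-1].upper() in SCALE:
        factor = SCALE[s[-1].upper()]
        s = s[:-1]  # strip the unit letter that blocks a plain numeric parse
    try:
        return float(s) * factor
    except ValueError:
        return None  # e.g. a lone dash for "no data"

print(parse_amount("(460M)"))  # -460000000.0
```

Returning None instead of a number such as 0 or -1 keeps a failed parse from being mistaken for, or cached as, a real value.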