Help with XPath for ImportXML Google Sheets.

Discuss all general (i.e. non-personal) investing questions and issues, investing news, and theory.
Post Reply
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

[2014 thread bumped in 2024 --admin LadyGeek]

Hello all,

I am trying to track one of the trickier funds that I have with Vanguard.

It's a 529 Aggressive Portfolio.

https://personal.vanguard.com/us/funds/ ... IntExt=INT

I use a spreadsheet to track my investments. And I am trying to pull the 'price' on that webpage.

So, this is what I am doing to get the Xpath for my ImportXML function for google sheets.

Overview -> Go to actual price -> right click -> Inspect Element -> right click on highlighted xml element -> Copy XPath

And I got this.

Code: Select all

//*[@id="colMcontent"]/table[1]/tbody/tr/td[1]/div/table/tbody/tr[1]/td[2]
Now, when I got to the googlesheets, I did this.

Code: Select all

=importXML(A1,"//*[@id="colMcontent"]/table[1]/tbody/tr/td[1]/div/table/tbody/tr[1]/td[2]")
And it gives me a parse error. Anybody who knows XPath maybe able to help me in getting the actual value of that field in the webpage?
boombaz
Posts: 41
Joined: Sat Dec 28, 2013 12:13 am

Re: Help with XPath for ImportXML Google Sheets.

Post by boombaz »

You're double quoting a string that already has double quotes in it. Either escape the double quotes in the XPath or use single quotes to enclose the XPath

Code: Select all

=importXML(A1,'//*[@id="colMcontent"]/table[1]/tbody/tr/td[1]/div/table/tbody/tr[1]/td[2]')
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

Thank you, but, that did not work either.

SO, I got a better XPath from an extension and stick this in there like so:

Code: Select all

=importXML(A1,"/html[@class='yui3-js-enabled']/body[@id='body']/div[@id='vg0']/div[@id='main']/table[@id='columnContainer']/tbody/tr/td[@id='colM']/div[@id='colMcontent']/table[1]/tbody/tr/td[1]/div[@class='cbdTable2']/table/tbody[@class='right']/tr[1]/td[2]")
And also tried this

Code: Select all

=importXML("//*[@id='colMcontent']/table[1]/tbody/tr/td[1]/div/table/tbody/tr[1]/td[2]")
And the double-quote work like this for my other data.

Both the above trials did not work.
ericinvest
Posts: 57
Joined: Sat Jun 14, 2014 10:57 am

Re: Help with XPath for ImportXML Google Sheets.

Post by ericinvest »

I experienced the same issue to track vanguard 529 funds. After testing a little more the following macro should do it,
importXML("https://personal.vanguard.com/us/funds/ ... IntExt=INT","//*[@id='colMcontent']/table[1]/tr/td[1]/div/table[1]/tbody[@class='right']/tr[1]/td[2]")
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

ericinvest wrote:I experienced the same issue to track vanguard 529 funds. After testing a little more the following macro should do it,
importXML("https://personal.vanguard.com/us/funds/ ... IntExt=INT","//*[@id='colMcontent']/table[1]/tr/td[1]/div/table[1]/tbody[@class='right']/tr[1]/td[2]")
That worked!!! Thank you.
ericinvest
Posts: 57
Joined: Sat Jun 14, 2014 10:57 am

Re: Help with XPath for ImportXML Google Sheets.

Post by ericinvest »

You are welcome. Glad it helped.
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

This basically fills the last piece of my investment data puzzle.

Next goal: Figuring out my IRR based on all the transactional histories.

(401K's, IRA's, Taxable, 529's)

The data feeds for each account are so different. I don't think I will be able to ever to achieve this.
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

Note that if the HTML on Vanguard's site ever changes, this will stop working, as it's just scraping the HTML structure of the page to parse out the value.

Also, you can simplify the XPath to this and it will still work (and be more generic):

Code: Select all

=importXML("https://personal.vanguard.com/us/funds/snapshot?FundId=4510&FundIntExt=INT","(//div[@class='cbdTable2']//td[2])[1]")
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

Reviving this thread for a similar question... What would be the proper ImportXML formula and XPath to:

1. extract the Price or SEC Yield from: https://personal.vanguard.com/us/funds/ ... =INT#tab=0

2. extract the P/E or P/B from: https://personal.vanguard.com/us/funds/ ... =INT#tab=2

Help welcome!
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

siamond wrote:Reviving this thread for a similar question... What would be the proper ImportXML formula and XPath to:

1. extract the Price or SEC Yield from: https://personal.vanguard.com/us/funds/ ... =INT#tab=0

2. extract the P/E or P/B from: https://personal.vanguard.com/us/funds/ ... =INT#tab=2

Help welcome!
Unfortunately you cannot grab the XPath of data in a tab with ImportXML for any Vanguard page that has tabs like the ones above. This is because the content in those tabs loads after the initial page load. Google Sheets ImportXML only grabs XML (HTML in this case) data from the initial page load. It cannot run Javascript and because the data in those tabs is fetched via Javascript it will never see the data generated in those tabs.

What I've done instead is use Morningstar to grab such data. For example from here:

http://portfolios.morningstar.com/fund/summary?t=VTSMX

The added bonus here is that they use the ticker symbol for the URL so it's easy to make a spreadsheet that uses the ticker from a table cell to generate the URL (no need to figure it out like w/ Vanguard). You can do this with the following in place of your normal URL in the importXML call:

Code: Select all

CONCATENATE("http://portfolios.morningstar.com/fund/summary?t=",A1)
Where value of cell "A1" is VTSMX
Last edited by vitaflo on Wed Jul 19, 2017 1:14 pm, edited 1 time in total.
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

After some digging, it looks like Vanguard is loading those tabs with straight HTML. So I watched the page load in the Dev Inspector Tools and found the corresponding url for each tab that you can use for an ImportXML call. Here they are:

https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085

To select a different fund to view, just change the number at the end to the number associated with the fund you want (0085 = VTSMX). You can find this number in the URL of any fund when you're on said fund page.

With this you can get the data you want siamond. For example if you want P/E for VTSMX you can get it with the following:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPortFolioMgmtTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='portfolioForm:equityCharacteristicsDataTabletbody0']/tr[4]/td[2]")
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

vitaflo wrote:With this you can get the data you want siamond. For example if you want P/E for VTSMX you can get it with the following:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPortFolioMgmtTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='portfolioForm:equityCharacteristicsDataTabletbody0']/tr[4]/td[2]")
Fantastic! Ok, I am all set with the portfolio & management tab, thanks to your example.

I must be missing something about the overview tab though. Mind providing the exact ImportXML formula for the Expense Ratio and the Price?
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

siamond wrote:I must be missing something about the overview tab though. Mind providing the exact ImportXML formula for the Expense Ratio and the Price?
Solved my own problem! Using the Price & Performance tab, here are the Price and SEC Yield.

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPricePerfTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='currentQuoteForm:priceTabletbody0']/tr[1]/td[2]")

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPricePerfTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='currentQuoteForm:priceTabletbody0']/tr[3]/td[3]")
And back to the Overview tab, here is the Expense Ratio:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='fundFactsTabletbody0']/tr[3]/td[2]")
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

siamond wrote: Solved my own problem! Using the Price & Performance tab, here are the Price and SEC Yield.

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPricePerfTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='currentQuoteForm:priceTabletbody0']/tr[1]/td[2]")
FYI, there's an even easier way to get price:

Code: Select all

=GOOGLEFINANCE("VTSMX")
Note you can also get things like PE and Expense Ratio with this function but they may be different than what's on Vanguard's site since it uses Morningstar data (and I know you were looking for Vanguard specific data). See here for available parameters:

https://support.google.com/docs/answer/3093281?hl=en
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

vitaflo wrote: FYI, there's an even easier way to get price:

Code: Select all

=GOOGLEFINANCE("VTSMX")
Yeah, I also have been using but, it fails miserably from one perspective

Code: Select all

=INDEX(GoogleFinance(H14,"close",B14), 2, 2)
This gives the closing price on the day, but, sometimes all of a sudden, it stops loading for some days/months. No particular rhyme/reason.

So, a question I have, probably geared more towards google spreadsheets is:

If a formula like this brings in some data for the first time, I don't want google to look it up every time the spreadsheet loads. Anyway to do this?. Some sort of macro like

Code: Select all

NVL(cellvalue,  GoogleFinance())?
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

NOgmacks wrote: If a formula like this brings in some data for the first time, I don't want google to look it up every time the spreadsheet loads. Anyway to do this?. Some sort of macro like

Code: Select all

NVL(cellvalue,  GoogleFinance())?
Even if your code example worked, the spreadsheet would still attempt to load the GoogleFinance data. It always will, and you'd still have the problem of not having the previous value. So if the problem you're trying to solve is you just want it to show the last "good" return of data instead of errors when it fails, you'd need to write a script of your own for that. Something that checks GoogleFinance and if it succeeds writes the value to a cell (just the value) and if not, fails gracefully without writing anything.

Scripts in Google Sheets can do a lot (I wrote one to email me when my AA gets out of whack by a certain %, checks every day), but they're not trivial. Certainly more involved than just the standard cell formulas we've been talking about here, but I wouldn't doubt what you wanted to do is doable if someone dug into it.
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

vitaflo wrote: Scripts in Google Sheets can do a lot (I wrote one to email me when my AA gets out of whack by a certain %, checks every day), but they're not trivial.
This sounds awesome. Didn't know about scripts. Any way you can share this particular script you wrote? Maybe gives me a direction to start with. You will take away the personal part of the logic of course.
Afieldsmd
Posts: 5
Joined: Fri Jan 24, 2014 11:09 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by Afieldsmd »

Hello,
Google washed me up on these shores. I'm ok using
Index(ImportHtml) but I'm struggling with parse errors using the Index(IMPORTXML) commands.
I'm trying to pull the data for a 529 plan from an XPath. Could anyone tell me what I'm doing wrong? Thank you!

=Index(IMPORTXML("https://www.collegeinvest529.com/cotpl/ ... ormance.cs", "//*[@id='form']/section/div[1]/div[1]/table/tbody/tr[1]/td[2]"))

Aaron
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

vitaflo wrote: Wed Jul 19, 2017 1:05 pm After some digging, it looks like Vanguard is loading those tabs with straight HTML. So I watched the page load in the Dev Inspector Tools and found the corresponding url for each tab that you can use for an ImportXML call. Here they are:

https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085

To select a different fund to view, just change the number at the end to the number associated with the fund you want (0085 = VTSMX). You can find this number in the URL of any fund when you're on said fund page.

With this you can get the data you want siamond. For example if you want P/E for VTSMX you can get it with the following:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPortFolioMgmtTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='portfolioForm:equityCharacteristicsDataTabletbody0']/tr[4]/td[2]")
Hi vitaflo, could you please try to also figure out tab1a? Meaning this kind of display: here. Thanks!
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

siamond wrote: Fri Jan 05, 2018 7:15 pm Hi vitaflo, could you please try to also figure out tab1a? Meaning this kind of display: here. Thanks!
Found the answer to my own question. This is actually the same Web page as the regular performance page. It's just that the corresponding data isn't displayed by the browser when the page is queried, but it is part of the HTML source, and can be scraped.
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

siamond wrote: Fri Jan 05, 2018 10:53 pm
siamond wrote: Fri Jan 05, 2018 7:15 pm Hi vitaflo, could you please try to also figure out tab1a? Meaning this kind of display: here. Thanks!
Found the answer to my own question. This is actually the same Web page as the regular performance page. It's just that the corresponding data isn't displayed by the browser when the page is queried, but it is part of the HTML source, and can be scraped.
Yes, this is the hard part about scraping sites for data. It's not always easy to tell if a tab is loading new data from the server (in which case it's not part of the original HTML request and can't be accessed with ImportXML) or is just doing a hide/show (in which case it can be scraped just fine).
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

Afieldsmd wrote: Thu Nov 09, 2017 1:55 pm Hello,
Google washed me up on these shores. I'm ok using
Index(ImportHtml) but I'm struggling with parse errors using the Index(IMPORTXML) commands.
I'm trying to pull the data for a 529 plan from an XPath. Could anyone tell me what I'm doing wrong? Thank you!

=Index(IMPORTXML("https://www.collegeinvest529.com/cotpl/ ... ormance.cs", "//*[@id='form']/section/div[1]/div[1]/table/tbody/tr[1]/td[2]"))

Aaron
Late to seeing this, but the problem here is one siamond also ran into a while back and that is this site requires a cookie to load the page. Browsers of course have no problem with this, but the ImportXML function is not a browser and thus doesn't support cookies. As such it's not going to work for this site.

There are technically ways around this but that becomes very technical very quickly and is beyond the scope of this thread.
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

NOgmacks wrote: Thu Jul 20, 2017 2:28 pm
vitaflo wrote: Scripts in Google Sheets can do a lot (I wrote one to email me when my AA gets out of whack by a certain %, checks every day), but they're not trivial.
This sounds awesome. Didn't know about scripts. Any way you can share this particular script you wrote? Maybe gives me a direction to start with. You will take away the personal part of the logic of course.
I will share a my simple script to send emails, but if you don't know programming, you may run into trouble, just a caveat.

To start you need to go to your spreadsheet and in the menu go to Tools > Script Editor. This is where you'll be writing code to check your AA and send you an email if it's out of whack. Here's an example:

Code: Select all

function readCell() {
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var sheet = ss.getSheetByName("MySheet");  // sheet name with your AA values in it

    var TotValue = sheet.getRange("A1").getValue() * 100; //make value in A1 into percent integer
    TotValue = TotValue.toFixed(2); //round to 2 places.

    var StockValue = sheet.getRange("B2").getValue() * 100; //make value in B2 into percent integer
    StockValue = StockValue.toFixed(2); //round to 2 places.

    if (TotValue >= 5.0) {  //if off by more than 5%
        sendMail(TotValue, "TOTAL");  // send email
    }

    if (StockValue >= 5.0) {  //if off by more than 5%
        sendMail(StockValue, "Stock");  // send email
    }
};

function sendMail(value, allocation) {
    var email = "user@example.com";  //your email
    var subject = allocation + " Allocation Alert: " + value + "%";
    var body = "Your " + allocation + " asset allocation is off by " + value + "%";
    MailApp.sendEmail(email,subject,body); 
}
"MySheet" is the name of the sheet you're targeting. "A1" and "B2" are the cells where you've calculated how far out of whack your AA is for your total portfolio (stocks to bonds) and stocks (US to Int). The script will check these values and if they are >5% (5.0 in the "if" statement) it will call the sendMail() function (to "user@example.com" in this case). Use your own values in the above, these are just for example.

For this to run you need to set a trigger. Save the script and then in the menu go to Edit > Current Project's Triggers. You will need to create a new trigger. It should default to run readCell() which is what you want. For Events you want it set to Time Driven > Day Timer > X-X, where "X-X" is a time you want the trigger to run (mine is set for 9am-10am for example). During that time frame every day the script will run and if your AA is out of whack by 5% or more it will send you an email saying so.

When you save the trigger it will ask you to authorize and view permissions. Start this process and eventually you will get to a screen saying the app isn't authorized. Hit Advanced here and click on "Go to Script Name (unsafe)". You will then get a screen asking for you to allow the script to run, do so (don't worry, it's not unsafe, just don't share the sheet with anyone). Your script is now hooked up and running.

You may want to test the script to make sure there are no errors when it runs. To do so, go to Run > Run Function, and pick either function. Note if you pick readCell() and your AA isn't off by 5%, you won't get an email. But if there's an obvious error (like you mistyped the sheet name) the script page will bark at you to fix it. You may also want to manually change the values in your sheet to make sure the script runs and sends you the email properly.

Hopefully this helps someone has a little programming experience and wants some automated alerts. It certainly helps me not peek at my portfolio. The portfolio now lets me know when it's time to rebalance, I don't need to check it.
Afieldsmd
Posts: 5
Joined: Fri Jan 24, 2014 11:09 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by Afieldsmd »

Thank you!
User avatar
Kevin M
Posts: 15789
Joined: Mon Jun 29, 2009 3:24 pm
Contact:

Re: Help with XPath for ImportXML Google Sheets.

Post by Kevin M »

vitaflo wrote: Wed Jul 19, 2017 1:05 pm After some digging, it looks like Vanguard is loading those tabs with straight HTML. So I watched the page load in the Dev Inspector Tools and found the corresponding url for each tab that you can use for an ImportXML call. Here they are:

https://personal.vanguard.com/us/JSP/Fu ... undId=0085 <snip>
It would be great if you could provide some details on how you "watch the page load" to find the URLs.

I'm just barely familiar with using the Network tool in dev tools to view the preview and/or responses of the various scripts. Is this what you're talking about, or is there a faster way?

Also, siamond mentioned that these web pages have changed, and it's no longer obvious how to find the URLs with the dev tools. Any updates on that?
vitaflo wrote: Wed Jul 19, 2017 1:05 pm With this you can get the data you want siamond. For example if you want P/E for VTSMX you can get it with the following:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPortFolioMgmtTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='portfolioForm:equityCharacteristicsDataTabletbody0']/tr[4]/td[2]")
For those not familiar with parsing xml, the data also can be retrieved using ImportHTML. The only trick is finding the right table number. Once you've found the table of interest, the values can be extracted with INDEX, and if necessary other functions to strip off unwanted characters.

Note that the table number is different for different funds. For example, it seems that price and sec yield for bond fund investor shares are in table 6, while admiral shares are in table 5. Also, three rows are returned for bond funds (SEC yield in row three) and four rows are returned for the money market funds (SEC yield in row four). At least this is what I've found so far.

Kevin
If I make a calculation error, #Cruncher probably will let me know.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

I'm sure Vitaflo will have more authoritative comments than me, but here is a quick example of why I think things changed a good deal on the Vanguard Web pages since this thread started.

I took a new look at this individual fund page, I was looking for the expense ratio, I quickly noticed on the Network activity that there is a request with a rather glaring link:

Code: Select all

https://api.vanguard.com/rs/ire/01/ind/fund/4509/expense.jsonp?callback=angular.callbacks._u&planId=null
Checking the preview of the response with the arrows, we can find the ER (0.15%) field right away. And clearly, this is not using the old static Web page. I just hope they will keep those old static pages working for a while (they do for now, I checked)...

As a side note, JSON is a lightweight format to transport data between server and client (wikipedia JSON). Unsurprisingly, It appears that there is some solid support for it in Google scripts:
https://developers.google.com/apps-scri ... s/external
User avatar
vitaflo
Posts: 1905
Joined: Sat Sep 03, 2011 3:02 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by vitaflo »

Kevin M wrote: Sat Apr 06, 2019 6:44 pm It would be great if you could provide some details on how you "watch the page load" to find the URLs.

I'm just barely familiar with using the Network tool in dev tools to view the preview and/or responses of the various scripts. Is this what you're talking about, or is there a faster way?
This is all there is to it yes, however, it does seem that their site has changed. They seem to be using Angular now with API calls to fetch data. This is bad news because Vanguard (smartly) seems to have locked down their API to themselves only. While you can see the response in the Network dev tool, if you try to load up a Vanguard API URL on your own you're not going to get a response from it (you'll just get an error). That makes most of this thread's info moot.

So while the old links still work, it's probably only a matter of time before they don't and you'll have to find somewhere else to scrape your data.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

vitaflo wrote: Sat Apr 06, 2019 8:05 pmThis is bad news because Vanguard (smartly) seems to have locked down their API to themselves only. While you can see the response in the Network dev tool, if you try to load up a Vanguard API URL on your own you're not going to get a response from it (you'll just get an error). That makes most of this thread's info moot.
Well, I don't think it is that bad. Please check this Google Sheet script I cobbled together. It works fine, I did log and return the proper expense ratio value at the end. Now one must use a script for this, as the refer[r]er header is mandatory for it to work...

Code: Select all

function scrapeVG_ER(qFundId) {
  var response, urlString, webContent=null, dataJSON, err;
  var idx, len, valER=-1;
      
  var options = {'headers': {'Referer': ' https://investor.vanguard.com'}};
  
  urlString = "https://api.vanguard.com/rs/ire/01/ind/fund/" +
              qFundId +
              "/expense.jsonp?callback=angular.callbacks._u&planId=null";
  console.log("Web query %s", urlString);
   
  // Perform the Web query while catching errors
  try {
    response = UrlFetchApp.fetch(urlString, options);         
    console.log("Response %s", response);
  }
  catch (err) {
    response = null;
  }

  if (response != null) {
    errcode = response.getResponseCode();  
    console.log("Errcode %s", errcode);
    if (errcode == 200) {                                   // HTTP 200 OK?
      webContent = response.getContentText();
      console.log("Content %s", webContent);
    }
  }
  
  if (webContent != null) {
    // the Web content may include some extraneous function name around the JSON context => strip it
    // e.g. something like angular.callbacks._u({JSON-content})
    
    idx = webContent.indexOf("{");                                   // search first JSON curly brace
    if (idx > 0)
      webContent = webContent.substr(idx, webContent.length-idx);    // get rid of any extraneous header 
    
    len = webContent.length;
    while ((len > 0) && (webContent[len-1] != "}"))                  // search last JSON curly brace
      len--;
    
    if (len < webContent.length)
      webContent = webContent.substr(0, len);                        // get rid of any extraneous trailer 
    
    console.log("JSON to parse %s", webContent);
    
    // now let's parse the JSON content while recovering from parsing errors
    if (webContent.length > 0) {
      try {
        dataJSON = JSON.parse(webContent);
        console.log("JSON object %s", dataJSON);
      }
      catch (err) {
        dataJSON = null;
      }
    
      // finally, we can get to the real content... Expense Ratio in this case
      // note the conversion to an integer object by use of the "+" unary operator
      if (dataJSON != null) {
        valER = +dataJSON.expenseRatio;
        console.log("Exp Ratio %s", valER);   
      }
    }
  }

  return(valER);
}
Topic Author
NOgmacks
Posts: 169
Joined: Mon Dec 30, 2013 9:22 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by NOgmacks »

I'm the OP on this original question, and I'm loving that this discussion is still going after these many years. I still use similar Vanguard funds for the kid's 529's. Recently, I became acquainted with writing code in Python, Jupyter notebook, and using the selenium module. I was safely able to let the browser load, wait a couple of seconds, pull the values off using XPath.

Now, if I can only load this into my Google drive, and get my sheet refer to those fields, I'm good. This pipeline seems more like a pipe dream, but nothing is stopping me from pursuing it further.
User avatar
Kevin M
Posts: 15789
Joined: Mon Jun 29, 2009 3:24 pm
Contact:

Re: Help with XPath for ImportXML Google Sheets.

Post by Kevin M »

vitaflo wrote: Sat Apr 06, 2019 8:05 pm
Kevin M wrote: Sat Apr 06, 2019 6:44 pm It would be great if you could provide some details on how you "watch the page load" to find the URLs.

I'm just barely familiar with using the Network tool in dev tools to view the preview and/or responses of the various scripts. Is this what you're talking about, or is there a faster way?
This is all there is to it yes,<snip>
Thanks. Any tips on honing in on the scripts that actually return data?

Kevin
If I make a calculation error, #Cruncher probably will let me know.
User avatar
Kevin M
Posts: 15789
Joined: Mon Jun 29, 2009 3:24 pm
Contact:

Re: Help with XPath for ImportXML Google Sheets.

Post by Kevin M »

siamond wrote: Sat Apr 06, 2019 10:16 pm <snip>Please check this Google Sheet script I cobbled together. It works fine, I did log and return the proper expense ratio value at the end. Now one must use a script for this, as the refer[r]er header is mandatory for it to work... <snip>
Thanks siamond! And thanks for doing the extra work to make the script even better than the version you posted earlier today.

Kevin
If I make a calculation error, #Cruncher probably will let me know.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Help with XPath for ImportXML Google Sheets.

Post by siamond »

Kevin M wrote: Sun Apr 07, 2019 1:26 pmAny tips on honing in on the scripts that actually return data?
Here are a couple of things I do:
- ignore the .js/.css/.png/etc names
- focus on names that seem to have a clear meaning
- focus on entries where the waterfall area shows that the response took a bit longer than the rest of the stuff (blue area is larger)
- with the new pages from Vanguard, focus on entries with the 'json' letters inside the name

I played around a bit more with JSON parsing and corresponding scripting, and this is pretty cool. I found very useful to copy and paste raw Web responses including JSON data in a friendly JSON viewer, in order to figure out the structure of the data, hence the object names to use.

Hopefully such approach will allow to go straight to the data of interest, while building scraping scripts that will not have to change (or rarely so) when Vanguard changes the display of their Web pages...
User avatar
goldenage
Posts: 3
Joined: Wed Jul 27, 2022 11:27 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by goldenage »

I found this thread to be incredibly helpful - however after reading I still needed a fair amount of trial and error in order to get things to work. As such - I want to reiterate my steps below to hopefully save others some trouble. All steps are in Google Chrome with final output inserted into a cell in Google Sheets.

In my case - I wanted to retrieve the SEC Yield for both VTEB and BND so that I could automatically compare the Tax Equivalent Yields.

In order to do this the first thing you need to do is find the Vanguard Fund ID. You can do this by navigating to the product page (e.g. https://investor.vanguard.com/investmen ... rofile/bnd). Then Inspect and search for "fundid". In the case of BND it is "0928".

Once you have the Fund ID - navigate to the following URL where FundId is replaced with the appropriate Fund Id:.

Code: Select all

https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPricePerfTabContent.jsf?FundIntExt=INT&FundId=0928
Once open - Inspect and search for the value you want to retrieve (e.g. I want the SEC Yield so I search for "3.31" in the Inspect Console. Right click the found section and select Copy > Copy XPath.

Construct your code as follows - =importXML("URL","XPath")

This will look like the following. This example pulls the SEC Yield for BND:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPricePerfTabContent.jsf?FundIntExt=INT&FundId=0928","//*[@id='currentQuoteForm:priceTabletbody0']/tr[5]/td[2]/text()")
Hope this helps someone! Thanks for the great community and thread!
Disclaimer: Personal Opinion, Not Financial Advice
king__of_universe
Posts: 1
Joined: Wed Feb 28, 2024 6:28 pm

Re: Help with XPath for ImportXML Google Sheets.

Post by king__of_universe »

vitaflo wrote: Wed Jul 19, 2017 1:05 pm After some digging, it looks like Vanguard is loading those tabs with straight HTML. So I watched the page load in the Dev Inspector Tools and found the corresponding url for each tab that you can use for an ImportXML call. Here they are:

https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085
https://personal.vanguard.com/us/JSP/Fu ... undId=0085

To select a different fund to view, just change the number at the end to the number associated with the fund you want (0085 = VTSMX). You can find this number in the URL of any fund when you're on said fund page.

With this you can get the data you want siamond. For example if you want P/E for VTSMX you can get it with the following:

Code: Select all

=importXML("https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundPortFolioMgmtTabContent.jsf?FundIntExt=INT&FundId=0085","//*[@id='portfolioForm:equityCharacteristicsDataTabletbody0']/tr[4]/td[2]")
It looks like the Portfolio & Management tab URL no longer works. I'm not sure how you extracted those originally. Is there a new URL that does work?
Post Reply