R and Data Sets
This is a bit of a carry over from another thread on how to hedge.
I am looking for resources on how to learn the R Programming language, how to apply it to investment theory, and packages that I should be looking at. For the packages I am looking for ones that have total return indexes in particular.
Any pointers in the right direction would be appreciated.
 AtlasShrugged?
 Posts: 572
 Joined: Wed Jul 15, 2015 6:08 pm
Re: R and Data Sets
Alex, you might be better off using SPSS. It has a much friendlier user interface than R. The downside is the license cost of SPSS.
“If you don't know, the thing to do is not to get scared, but to learn.”
 DartThrower
 Posts: 821
 Joined: Wed Mar 11, 2009 4:10 pm
 Location: Philadelphia
Re: R and Data Sets
I would take one of the free online R courses available at Coursera or edX. Duke and Johns Hopkins in particular have a lot of course offerings on Coursera. Then work on projects, and when you get stuck, ask questions at websites such as Stack Overflow or Quora. You will stumble across many finance packages as you learn and ask questions about R. Try this link to get started:
https://cran.r-project.org/web/views/Finance.html
For additional motivation and camaraderie you could join a programming club or meetup group. There are tons of those in larger metro areas. You could also try other packages such as SPSS mentioned already, but these can get very expensive.
A Boglehead can stay the course longer than the market can stay irrational.
Re: R and Data Sets
Start with small, clean datasets; messy datasets are much harder to deal with in R. Learn to manipulate your datasets before you have to use them for analyses. And RStudio is lovely compared to base R.
Re: R and Data Sets
Deleted
Last edited by ACM4297 on Sun Jun 18, 2017 7:30 pm, edited 1 time in total.

 Posts: 281
 Joined: Thu Apr 01, 2010 1:18 pm
Re: R and Data Sets
Maybe try Data Camp?
Re: R and Data Sets
Big R fan here. I always point new R users to the 'tidyverse' set of packages: http://tidyverse.org/. You will make your life much easier learning how to do data manipulation with a dplyr + tidyverse mindset as opposed to learning it the 'base R' way. Hadley Wickham, the man behind these R packages, has made data analysis literally 5-10x easier. Once you get your head around what 'tidy data' is, it makes your life much easier.
A great textbook that will take you from beginning to end is: http://r4ds.had.co.nz/.
In terms of investment related packages, I've wanted to look into https://github.com/abresler/fundManageR, I've heard good things.
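To make the 'tidy data' idea concrete, here is a minimal sketch. It uses Python's pandas rather than dplyr (an assumption made for illustration, not something the post above prescribes), and the fund names and return figures are hypothetical. A "wide" table with one column per fund is reshaped so that each row holds exactly one observation:

```python
import pandas as pd

# Hypothetical "wide" table: one row per year, one column per fund.
wide = pd.DataFrame({
    "year": [2015, 2016],
    "fund_a": [0.012, 0.118],
    "fund_b": [-0.004, 0.026],
})


def to_tidy(df):
    """Reshape to one row per (year, fund) observation."""
    return df.melt(id_vars="year", var_name="fund", value_name="annual_return")


tidy = to_tidy(wide)

# Once the data is tidy, grouped summaries become one-liners.
mean_by_fund = tidy.groupby("fund")["annual_return"].mean()

print(tidy)
print(mean_by_fund)
```

The same reshaping in the tidyverse would typically be done with a pivot-to-long step followed by a grouped summary; the point is the shape of the data, not the tool.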
Re: R and Data Sets
DartThrower wrote: I would take one of the free online R courses available at Coursera or edX. Duke and Johns Hopkins in particular have a lot of course offerings on Coursera. Then work on projects, and when you get stuck, ask questions at websites such as Stack Overflow or Quora. You will stumble across many finance packages as you learn and ask questions about R. Try this link to get started:
https://cran.r-project.org/web/views/Finance.html
For additional motivation and camaraderie you could join a programming club or meetup group. There are tons of those in larger metro areas. You could also try other packages such as SPSS mentioned already, but these can get very expensive.
This is good advice. There is a free book online, Processing and Analyzing Financial Data with R.
Re: R and Data Sets
I would also suggest starting with EdX and Coursera classes ... many statistics classes use R. The Analytics Edge class begins June 6th on EdX...you can take it for free. DataCamp also has introductory R classes as well as finance specific classes..."Importing and Managing Financial Data in R" & "Introduction to PortfolioAnalytics in R" were a couple that I liked. As for packages, quantmod, PerformanceAnalytics, and PortfolioAnalytics are the packages that I use most often.
I find the Quick-R website a helpful resource: http://www.statmethods.net/. The companion book, R in Action, is also good.
Most often you'll download required data from the web. I download price data directly into R from Yahoo or Quandl with the quantmod package. Economic data is downloaded from the FED. Once you've learned R basics, the numerous packages make it easy to work with data without having to write a lot of code.
As another poster suggested, install RStudio in addition to R. The RStudio IDE makes R easier and more pleasant to work with.
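Since the original question asked about total return indexes, here is a rough sketch of how one can be built from a downloaded price series plus dividends. It is written in Python with pandas as an illustration, not with quantmod; the function name `total_return_index` and all the numbers are hypothetical, and it assumes each dividend is reinvested at that period's closing price.

```python
import pandas as pd


def total_return_index(prices, dividends, base=100.0):
    """Build a total return index from per-period prices and dividends.

    Assumes the dividend paid in a period is reinvested at that period's
    closing price. The first period has no prior price, so its return is 0.
    """
    period_return = ((prices + dividends) / prices.shift(1) - 1.0).fillna(0.0)
    return base * (1.0 + period_return).cumprod()


# Hypothetical data: four period-end prices and one dividend of 1.00.
prices = pd.Series([100.0, 102.0, 101.0, 105.0])
dividends = pd.Series([0.0, 0.0, 1.0, 0.0])

tri = total_return_index(prices, dividends)
price_only = 100.0 * prices / prices.iloc[0]

print(pd.DataFrame({"price_index": price_only, "total_return_index": tri}))
```

The gap between the two columns is the cumulative effect of reinvested dividends, which is why a price-only series understates long-run returns.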
Re: R and Data Sets
Everybody, thanks for the feedback. Sometimes in a vast sea it helps to get a few pointers to orientate oneself.
As for Python, that might be next on my list. Any specific resources that I should be looking at?
Re: R and Data Sets
Your path with Python would be similar to the one with R. DataCamp has R and Python classes. Coursera has "Programming for Everybody" from the University of Michigan, and Rice University has "An Introduction to Interactive Programming in Python". The pandas documentation is a good intro to using Python for data work: http://pandas.pydata.org/pandas-docs/stable/ . Here is a basic Python finance tutorial... http://nbviewer.jupyter.org/github/twie ... sics.ipynb
Re: R and Data Sets
I taught myself the basics of R many years ago using the book by Alain Zuur et al, "A beginner's guide to R".
What I especially like about this book for beginners is that it focuses on getting data into R, as well as data manipulation, rather than immediately jumping into statistics. It's also quite accessible if you don't have any experience in programming.
Re: R and Data Sets
It is a mistake to believe that the purpose of R or SPSS is to analyze data. The purpose is to test hypotheses and develop models. To understand how to use R effectively is to understand how to construct experiments and models properly. This includes an understanding of what data to collect and what tests to commit to using when the experiment is designed.
Once the above is understood, understanding how to execute a selected test in R is more of an afterthought.
There is a misconception that if you have an implementation of a statistical test (R, portfoliovisualizer, etc.) and feed it whatever data you have available, then the theoretical purpose of the test is satisfied and the outcome may be interpreted accordingly. Nothing could be further from the truth, widespread usage in that manner notwithstanding.
Index fund investor since 1987.
 DartThrower
 Posts: 821
 Joined: Wed Mar 11, 2009 4:10 pm
 Location: Philadelphia
Re: R and Data Sets
jalbert wrote: It is a mistake to believe that the purpose of R or SPSS is to analyze data. The purpose is to test hypotheses and develop models. To understand how to use R effectively is to understand how to construct experiments and models properly.
I believe that testing hypotheses and developing models is part of data analysis. I understand you to be saying that statistical packages in general are just tools that allow their user to implement what's really important: the application of correct statistical reasoning to correctly constructed data. Users of statistical packages often produce garbage because they see the packages themselves as "black boxes" that magically do the thinking work for you. Statistical packages relieve the user of the time-consuming chore of coding and testing known algorithms. This fact is enormously powerful, however, in the right hands.
I agree with what you said, but have a slight issue with how you said it.
A Boglehead can stay the course longer than the market can stay irrational.

 Posts: 46
 Joined: Tue Mar 14, 2017 1:15 pm
Re: R and Data Sets
I can't recommend python and the pandas package enough.
Re: R and Data Sets
DartThrower wrote: I believe that testing hypotheses and developing models is part of data analysis.
....
I agree with what you said, but have a slight issue with how you said it.
I said it in a provocative way to make the point because misuse of statistical tests is so pervasive these days. But I believe you have it backwards. When statistical methods are used, data collection and analysis are part of hypothesis testing and model construction, not the other way around.
Most statistical tests that involve sampling a population or sampling a distribution require committing to hypotheses to be tested, and committing to the tests to be used in the analysis, before the data is collected, or at least before it is viewed. Random samples are used because it typically is impractical or impossible to measure the entire population, and nonrandom samples invalidate many statistical tests. T-tests on portfoliovisualizer are not at all valid, for instance, because of nonrandom samples of correlated data points, but knowledge of that doesn't seem to stop people from citing the t-statistic values generated.
What is pervasive today is viewing data sets as the first class object being studied regardless of how collected, analyzing them with statistical tests, and then expecting the results to generalize by magic.
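The point about correlated data points can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are generated as a zero-mean AR(1) process (a stand-in for serially correlated observations, not a model of any real return series), the autocorrelation of 0.8 is arbitrary, and the function name is hypothetical. A naive one-sample t-test of "mean = 0" is applied to each simulated series, and the code counts how often it wrongly rejects.

```python
import numpy as np

N_OBS = 100
# Approximate two-sided 5% critical value of Student's t with 99 degrees
# of freedom; only valid for N_OBS = 100.
T_CRITICAL = 1.984


def false_positive_rate(phi, n_sims=2000, seed=0):
    """Share of simulations in which a naive t-test rejects a true zero mean.

    Each series follows x[t] = phi * x[t-1] + noise[t]; phi = 0 gives
    independent draws, phi > 0 gives positively autocorrelated draws.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_sims, N_OBS))
    x = np.empty_like(noise)
    x[:, 0] = noise[:, 0]
    for t in range(1, N_OBS):
        x[:, t] = phi * x[:, t - 1] + noise[:, t]
    # Standard one-sample t-statistic, which assumes independent observations.
    t_stats = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(N_OBS))
    return float((np.abs(t_stats) > T_CRITICAL).mean())


print("independent data:    ", false_positive_rate(0.0))
print("autocorrelated data: ", false_positive_rate(0.8))
```

With independent draws the rejection rate should sit near the nominal 5%. With autocorrelated draws it comes out far higher, even though the true mean is zero in both cases, because the test's standard error assumes independence.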
Last edited by jalbert on Wed May 03, 2017 3:50 pm, edited 2 times in total.
Index fund investor since 1987.
Re: R and Data Sets
jalbert wrote: I said it in a provocative way to make the point because misuse of statistical tests is so pervasive these days.
I will side with Jalbert on this. I have seen statistical tests used badly in so many different ways that a clear denunciation of sloppy methods is appreciated.
And now 2 great links.
The first is FiveThirtyEight's great p-hacking example:
https://fivethirtyeight.com/features/sc ... ntbroken/
Plus insight on jelly beans from XKCD:
https://xkcd.com/882/
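The arithmetic behind the jelly bean comic is short enough to check directly. In this sketch (function names are hypothetical) there is no real effect anywhere, 20 independent tests are run at the 5% level, and the question is how often at least one of them comes up "significant". The simulation relies on the fact that a p-value is uniformly distributed on [0, 1] when the null hypothesis is true.

```python
import numpy as np


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent true-null tests rejects."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulated_chance(n_tests, alpha=0.05, n_sims=20000, seed=0):
    """Monte Carlo check of the formula above."""
    rng = np.random.default_rng(seed)
    # Under a true null hypothesis, each p-value is uniform on [0, 1].
    p_values = rng.uniform(size=(n_sims, n_tests))
    return float((p_values.min(axis=1) < alpha).mean())


print(chance_of_false_positive(20))  # about 0.64
print(simulated_chance(20))
```

So testing 20 jelly bean colors gives roughly a 64% chance of at least one spurious "discovery" at the 5% level.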
Re: R and Data Sets
Plus insight on jelly beans from XKCD:
https://xkcd.com/882/
DW and I got a huge chuckle out of this one. Thank you!

 Posts: 1076
 Joined: Tue Feb 23, 2016 4:24 am
Re: R and Data Sets
+100 Wickham's stuff - it has saved me a lot of hours
Also like RStudio
UCLA has good code to copy http://stats.idre.ucla.edu/other/dae/
More specifics depend on your background - are you really good with SAS or Python and want to expand - are you an economist or stats person - etc
G.E. Box "All models are wrong, but some are useful."