R and Data Sets

alex_686
Posts: 3920
Joined: Mon Feb 09, 2015 2:39 pm

R and Data Sets

Post by alex_686 » Mon May 01, 2017 11:22 am

This is a bit of a carryover from another thread on how to hedge.

I am looking for resources on how to learn the R programming language, how to apply it to investment theory, and which packages I should be looking at. For packages, I am looking in particular for ones that provide total return indexes.

Any pointers in the right direction would be appreciated.

AtlasShrugged?
Posts: 608
Joined: Wed Jul 15, 2015 6:08 pm

Re: R and Data Sets

Post by AtlasShrugged? » Mon May 01, 2017 12:14 pm

Alex, you might be better off using SPSS. It has a much friendlier user interface than R. The downside is the cost of the SPSS license.
“If you don't know, the thing to do is not to get scared, but to learn.”

DartThrower
Posts: 845
Joined: Wed Mar 11, 2009 4:10 pm
Location: Philadelphia

Re: R and Data Sets

Post by DartThrower » Mon May 01, 2017 12:57 pm

I would take one of the free online R courses available at Coursera or edX. Duke and Johns Hopkins in particular have a lot of course offerings on Coursera. Then work on projects, and when you get stuck, ask questions at websites such as Stack Overflow or Quora. You will stumble across many finance packages as you learn and ask questions about R. Try this link to get started:

https://cran.r-project.org/web/views/Finance.html


For additional motivation and camaraderie you could join a programming club or meetup group. There are tons of those in larger metro areas. You could also try other packages such as SPSS mentioned already, but these can get very expensive.
A Boglehead can stay the course longer than the market can stay irrational.

Twood
Posts: 84
Joined: Sun Nov 06, 2016 12:15 am

Re: R and Data Sets

Post by Twood » Mon May 01, 2017 2:49 pm

Start with small, clean datasets; messy datasets are much harder to deal with in R. Learn to manipulate your datasets before you have to use them for analyses. And RStudio is lovely compared to base R.

ACM4297
Posts: 59
Joined: Mon Mar 27, 2017 8:37 pm

Re: R and Data Sets

Post by ACM4297 » Mon May 01, 2017 3:12 pm

Deleted
Last edited by ACM4297 on Sun Jun 18, 2017 7:30 pm, edited 1 time in total.

understandingJH
Posts: 284
Joined: Thu Apr 01, 2010 1:18 pm

Re: R and Data Sets

Post by understandingJH » Mon May 01, 2017 3:19 pm

Maybe try DataCamp?

charis23
Posts: 31
Joined: Mon Jan 02, 2017 3:25 pm

Re: R and Data Sets

Post by charis23 » Mon May 01, 2017 3:20 pm

Big R fan here. I always point new R users to the 'tidyverse' set of packages: http://tidyverse.org/. Doing data manipulation with a dplyr + tidyverse mindset is much easier than learning it the 'base R' way. Hadley Wickham, the man behind these R packages, has made data analysis literally 5-10x easier. Once you get your head around what 'tidy data' is, everything else falls into place.

A great textbook that will take you from beginning to end is: http://r4ds.had.co.nz/.

In terms of investment-related packages, I've wanted to look into https://github.com/abresler/fundManageR; I've heard good things.

Beliavsky
Posts: 690
Joined: Sun Jun 29, 2014 10:21 am

Re: R and Data Sets

Post by Beliavsky » Mon May 01, 2017 3:21 pm

DartThrower wrote:I would take one of the free online R courses available at Coursera or edX. Duke and Johns Hopkins in particular have a lot of course offerings on Coursera. Then work on projects, and when you get stuck, ask questions at websites such as Stack Overflow or Quora. You will stumble across many finance packages as you learn and ask questions about R. Try this link to get started:

https://cran.r-project.org/web/views/Finance.html


For additional motivation and camaraderie you could join a programming club or meetup group. There are tons of those in larger metro areas. You could also try other packages such as SPSS mentioned already, but these can get very expensive.
This is good advice. There is also a free online book, Processing and Analyzing Financial Data with R.

GLState
Posts: 141
Joined: Wed Feb 15, 2017 10:38 am

Re: R and Data Sets

Post by GLState » Mon May 01, 2017 3:58 pm

I would also suggest starting with edX and Coursera classes; many statistics classes use R. The Analytics Edge class begins June 6th on edX, and you can take it for free. DataCamp also has introductory R classes as well as finance-specific classes; "Importing and Managing Financial Data in R" and "Introduction to PortfolioAnalytics in R" were a couple that I liked. As for packages, quantmod, PerformanceAnalytics, and PortfolioAnalytics are the ones I use most often.

I find the Quick-R website (http://www.statmethods.net/) a helpful resource. The companion book, R in Action, is also good.

Most often you'll download the data you need from the web. I download price data directly into R from Yahoo or Quandl with the quantmod package, and economic data from the Fed. Once you've learned the R basics, the numerous packages make it easy to work with data without having to write a lot of code.

As another poster suggested, install RStudio in addition to R. The RStudio IDE makes R easier and more pleasant to work with.

alex_686
Posts: 3920
Joined: Mon Feb 09, 2015 2:39 pm

Re: R and Data Sets

Post by alex_686 » Mon May 01, 2017 5:50 pm

Everybody, thanks for the feedback. Sometimes in a vast sea it helps to get a few pointers to orient oneself.

As for Python, that might be next on my list. Any specific resources that I should be looking at?

GLState
Posts: 141
Joined: Wed Feb 15, 2017 10:38 am

Re: R and Data Sets

Post by GLState » Mon May 01, 2017 6:24 pm

Your path with Python would be similar to the one with R. DataCamp has both R and Python classes. Coursera has "Programming for Everybody" from the University of Michigan, and Rice University has "An Introduction to Interactive Programming in Python". The pandas documentation is a good intro to using Python for data work: http://pandas.pydata.org/pandas-docs/stable/ . Here is a basic Python finance tutorial: http://nbviewer.jupyter.org/github/twie ... sics.ipynb
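
Since the original question was about total return indexes, here is a minimal sketch of the underlying calculation in plain Python, so it runs without pandas. It is an illustration, not a reference implementation: the function name, the price and dividend figures, and the assumption that each dividend is reinvested at that period's closing price are all made up for the example.

```python
def total_return_index(prices, dividends, base=100.0):
    """Build a total return index from per-period prices and dividends.

    Assumption: each dividend is reinvested at that period's closing price.
    """
    if len(prices) != len(dividends):
        raise ValueError("prices and dividends must have the same length")
    index = [base]
    for t in range(1, len(prices)):
        # Period return includes both the price change and the dividend paid.
        period_return = (prices[t] + dividends[t]) / prices[t - 1] - 1.0
        index.append(index[-1] * (1.0 + period_return))
    return index


# Hypothetical numbers, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]
dividends = [0.0, 0.0, 1.0, 0.0]

tr = total_return_index(prices, dividends)
price_only = [100.0 * p / prices[0] for p in prices]

print([round(x, 2) for x in tr])
print([round(x, 2) for x in price_only])
```

The gap between the two series is the effect of the reinvested dividend, which is why a price-only series understates long-run returns.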

h8(N)++
Posts: 12
Joined: Sun May 15, 2016 12:11 am

Re: R and Data Sets

Post by h8(N)++ » Mon May 01, 2017 7:56 pm

I taught myself the basics of R many years ago using the book by Alain Zuur et al, "A beginner's guide to R".

What I especially like about this book for beginners is that it focuses on getting data into R, as well as data manipulation, rather than immediately jumping into statistics. It's also quite accessible if you don't have any experience in programming.

jalbert
Posts: 3811
Joined: Fri Apr 10, 2015 12:29 am

Re: R and Data Sets

Post by jalbert » Mon May 01, 2017 10:08 pm

It is a mistake to believe that the purpose of R or SPSS is to analyze data. The purpose is to test hypotheses and develop models. To understand how to use R effectively is to understand how to construct experiments and models properly. This includes an understanding of what data to collect and what tests to commit to using when the experiment is designed.

Once the above is understood, understanding how to execute a selected test in R is more of an afterthought.

There is a misconception that if you have an implementation of a statistical test (R, Portfolio Visualizer, etc.) and feed it whatever data you have available, then the theoretical purpose of the test is satisfied and the outcome may be interpreted accordingly. Nothing could be further from the truth, widespread usage in that manner notwithstanding.
Risk is not a guarantor of return.

DartThrower
Posts: 845
Joined: Wed Mar 11, 2009 4:10 pm
Location: Philadelphia

Re: R and Data Sets

Post by DartThrower » Wed May 03, 2017 9:04 am

jalbert wrote:It is a mistake to believe that the purpose of R or SPSS is to analyze data. The purpose is to test hypotheses and develop models. To understand how to use R effectively is to understand how to construct experiments and models properly.
I believe that testing hypotheses and developing models is part of data analysis. I understand you to be saying that statistical packages in general are just tools that allow their user to implement what's really important: the application of correct statistical reasoning to correctly constructed data. Users of statistical packages often produce garbage because they see the packages themselves as "black boxes" that magically do the thinking for them. Statistical packages relieve the user of the time-consuming chore of coding and testing known algorithms. In the right hands, however, that is enormously powerful.

I agree with what you said, but have a slight issue with how you said it.
A Boglehead can stay the course longer than the market can stay irrational.

whanaumark
Posts: 46
Joined: Tue Mar 14, 2017 1:15 pm

Re: R and Data Sets

Post by whanaumark » Wed May 03, 2017 9:22 am

I can't recommend Python and the pandas package enough.

jalbert
Posts: 3811
Joined: Fri Apr 10, 2015 12:29 am

Re: R and Data Sets

Post by jalbert » Wed May 03, 2017 2:31 pm

DartThrower wrote: I believe that testing hypotheses and developing models is part of data analysis.
....
I agree with what you said, but have a slight issue with how you said it.
I said it in a provocative way to make the point because misuse of statistical tests is so pervasive these days. But I believe you have it backwards. When statistical methods are used, data collection and analysis are part of hypothesis testing and model construction, not the other way around.

Most statistical tests that involve sampling a population or sampling a distribution require committing to the hypotheses to be tested, and to the tests to be used in the analysis, before the data is collected, or at least before it is viewed. Random samples are used because it typically is impractical or impossible to measure the entire population, and non-random samples invalidate many statistical tests. For instance, t-tests on Portfolio Visualizer are not at all valid, because they are run on non-random samples of correlated data points, but knowledge of that doesn't seem to stop people from citing the t-statistic values generated.
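
The effect of correlated data points on a t-test can be seen in a small simulation. This is only a sketch under stated assumptions: the data are a simulated zero-mean AR(1) series (not real returns), the autocorrelation of 0.9, the sample size of 100, and the cutoff of 1.98 (roughly the two-sided 5% level for that sample size) are choices made for the illustration, and all function names are hypothetical. Because the true mean is zero, every "significant" result is a false positive.

```python
import math
import random
import statistics


def naive_t_stat(xs):
    """One-sample t-statistic for mean == 0, treating xs as independent draws."""
    n = len(xs)
    return statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(n))


def ar1_series(n, phi, rng):
    """Zero-mean AR(1) series: x[t] = phi * x[t-1] + noise.

    phi = 0 gives independent draws; phi near 1 gives strongly correlated ones.
    """
    xs, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        xs.append(prev)
    return xs


def false_positive_rate(phi, n=100, trials=2000, seed=0):
    """Share of simulated series whose naive |t| exceeds 1.98."""
    rng = random.Random(seed)
    hits = sum(
        abs(naive_t_stat(ar1_series(n, phi, rng))) > 1.98 for _ in range(trials)
    )
    return hits / trials


rate_independent = false_positive_rate(phi=0.0)
rate_correlated = false_positive_rate(phi=0.9)

print(f"independent draws: {rate_independent:.3f}")
print(f"correlated draws:  {rate_correlated:.3f}")
```

With independent draws the naive test should reject close to its nominal 5% of the time; with strongly correlated draws it rejects far more often, even though nothing is there to find.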

What is pervasive today is viewing data sets as the first class object being studied regardless of how collected, analyzing them with statistical tests, and then expecting the results to generalize by magic.
Last edited by jalbert on Wed May 03, 2017 3:50 pm, edited 2 times in total.
Risk is not a guarantor of return.

alex_686
Posts: 3920
Joined: Mon Feb 09, 2015 2:39 pm

Re: R and Data Sets

Post by alex_686 » Wed May 03, 2017 2:55 pm

jalbert wrote:I said it in a provocative way to make the point because misuse of statistical tests is so pervasive these days.
I will side with jalbert on this. I have seen statistical tests used badly in so many different ways that a clear denunciation of sloppy methods is appreciated.

And now two great links.

The first is FiveThirtyEight's great p-hacking example:
https://fivethirtyeight.com/features/sc ... nt-broken/

Plus insight on jelly beans from XKCD:
https://xkcd.com/882/
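
The arithmetic behind both links is the multiple-comparisons problem, and it can be sketched in a few lines. The assumptions here are a 5% significance level, twenty independent tests, and no real effect in any of them; under a true null hypothesis a p-value is uniformly distributed on [0, 1], which is what the simulation draws.

```python
import random

ALPHA = 0.05   # significance level of each individual test (assumed)
N_TESTS = 20   # number of independent tests run on pure noise (assumed)

# Analytic chance of at least one false positive across independent tests.
analytic = 1.0 - (1.0 - ALPHA) ** N_TESTS


def any_false_positive(rng):
    """Draw N_TESTS null p-values and report whether any falls below ALPHA."""
    return any(rng.random() < ALPHA for _ in range(N_TESTS))


rng = random.Random(0)
trials = 20000
simulated = sum(any_false_positive(rng) for _ in range(trials)) / trials

print(f"analytic:  {analytic:.3f}")
print(f"simulated: {simulated:.3f}")
```

The analytic value is about 64%, so running twenty tests and reporting only the one that "worked" finds something more often than not.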

Abel
Posts: 15
Joined: Sun Aug 07, 2016 7:35 pm

Re: R and Data Sets

Post by Abel » Thu May 04, 2017 5:46 pm

alex_686 wrote:Plus insight on jelly beans from XKCD:
https://xkcd.com/882/
DW and I got a huge chuckle out of this one. Thank you!

qwertyjazz
Posts: 1079
Joined: Tue Feb 23, 2016 4:24 am

Re: R and Data Sets

Post by qwertyjazz » Thu May 04, 2017 6:57 pm

+100 for Wickham's stuff; it has saved me a lot of hours.
I also like RStudio.
UCLA has good code to copy: http://stats.idre.ucla.edu/other/dae/

More specifics depend on your background. Are you already really good with SAS or Python and looking to expand? Are you an economist or a stats person? And so on.
G.E. Box "All models are wrong, but some are useful."
