Most of us have heard statistics like the following: “you are 10 times more likely to develop lung cancer from smoking cigarettes on a daily basis” or “you are 4.8 times less likely to suffer a heart attack if you perform strenuous exercise 3 or more times a week”.

I’ve been working with survey data lately in order to derive exactly those types of results. Let’s say, for example, I’m trying to model the likelihood of a yes answer to whether a person listens to heavy metal music based on their gender. If I perform a linear regression of the yes/no response on gender, I may get very strange results. When there are only two distinct choices, the response on an x/y plot can take only two values, either a 1 for yes or a 0 for no. This makes for a strange-looking graph. Luckily for us, someone created logistic regression.
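As a rough illustration of the idea, here is a minimal sketch in R. The survey counts and the names survey, gender, and metal are made up for this example and are not from my actual data.

```r
# Hypothetical survey data: the counts below are invented for illustration only.
survey <- data.frame(
  gender = factor(rep(c("female", "male"), each = 100)),
  metal  = c(rep(c(1, 0), times = c(20, 80)),   # 20 of 100 women answered yes
             rep(c(1, 0), times = c(40, 60)))   # 40 of 100 men answered yes
)

# Logistic regression: model the log-odds of a yes answer by gender.
# "female" is the reference level because factor levels sort alphabetically.
fit <- glm(metal ~ gender, data = survey, family = binomial)

# Exponentiating the gender coefficient gives the odds ratio (male vs. female).
odds_ratio <- exp(coef(fit)[["gendermale"]])
odds_ratio
```

With these made-up counts the odds ratio works out to (40/60)/(20/80), or about 2.67. Note that this is a ratio of odds, which is not quite the same thing as being "2.67 times more likely".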



Desperate for information on R when first starting out, I searched the internet for any resource I could find.  Fortunately, most of the resources were free; unfortunately, very few were helpful to a novice user.  Instead I turned to finding a book that provided comprehensive coverage of R and its hundreds of add-on packages.  I immediately stumbled across The R Book by Michael J. Crawley.  Undismayed by the high price ($110 new), I placed my order.
Now that my copy of The R Book is sufficiently creased and worn, I feel comfortable enough to write my review.
What The R Book offers in breadth of coverage, it lacks in depth.  Do not expect to find detailed explanations of advanced statistics.  For example, the chapter on Multivariate Statistics is 15 pages long and contains no more than a paragraph on neural networks.  Many of the explanations of functions seem to skip major steps and are not always complete.
Now that my criticisms are out of the way, I’ll move on to the pros.  The R Book is, as I mentioned above, the most comprehensive source for R on the market.  A serious R user should keep a copy, at least as a reference book.  For novices, the book does contain explanations and guidance for statistical models using R.  Crawley’s coverage of statistical models and classical tests is adequate.  For example, he takes the time to explain appropriate model types by type of response variable (e.g., proportion, count, etc.).  He even includes a fairly brief but adequate explanation of logistic regression.  Expect a more thorough explanation of classical statistical methods in the first three-quarters of the book and a briefer explanation of more advanced methods at the end.
To the novice user, Crawley’s explanations may seem overwhelmingly complex and rife with statistical jargon; after all, R is used primarily by statisticians.  However, any resourceful student or professional with a basic understanding of statistics will find enough resources on the web to aid their reading.  Additionally, Crawley provides adequate coverage of R’s graphics capabilities.  However, if you are or plan to be a sophisticated R graphics user, I would recommend purchasing a book that covers graphics specifically.
 
The R Book is recommended reading for novice to expert users: for the former as both a reference and a learning tool, and for the latter as a reference book.  The $110 is not a stretch for those of us used to paying high prices for textbooks; after all, The R Book contains nearly 900 pages.
 
Good luck and happy reading!
 
 If you agree/disagree with my write-up, please feel free to comment or email me at ryan@rstatx.com.


If you are familiar with R, you may find the code below immediately helpful; if you are not, I explain below how and why to use it.

 

  1. rt <- function(x) read.delim(paste("C:\\Desktop\\R\\", x, ".txt", sep=""))
    1. data <- rt("filename")
  2. write.table(data, "C:\\Desktop\\table.txt", col.names=FALSE, row.names=FALSE)
  3. write.table(data, "clipboard", sep="\t", col.names=NA)
  4. data <- read.table(file.choose(), header=TRUE)

Please see Ben Bolker’s comments below on setting the working directory instead of setting a working path in a function as outlined above in 1 and 2. 
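For reference, the working-directory approach might look something like the sketch below. This is my own rough illustration, not Ben's code; the temporary folder, the file name table.txt, and the example data frame are all placeholders standing in for your own folder and data.

```r
# Hypothetical folder: a temporary directory stands in for your own data folder.
data_dir <- tempdir()
old_wd <- setwd(data_dir)   # setwd() returns the previous working directory

# With the working directory set, bare file names are enough.
example <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))
write.table(example, "table.txt", sep = "\t", row.names = FALSE)
data <- read.delim("table.txt")

setwd(old_wd)               # restore the original working directory
```

The advantage is that the folder is named once, at the top of the script, rather than being buried inside a helper function.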


A colleague of mine reminded me of the importance of the P-value last week and I thought it would make a great blog topic. 

The P-value is one of the most commonly occurring metrics in almost any type of analysis where any level of statistics is used.  Often, on the job, an analyst (novice or expert) will incorporate a simple regression using MS Excel into their analysis.  The analyst, happy to have added a certain level of sophistication to their work, finds and promotes a relationship between two variables; let’s say animal control enforcement versus monthly cat food sales.  The analyst is happy to have made such a find, and the marketing manager is so happy she begins to divert advertising dollars to increase local lobbying for animal control enforcement.  Sooner or later, the local pound is full of stray cats, cat food sales are down, the marketing manager is running for city council, and the analyst is out of a job.

So what happened?


Why R?

R is a statistical computing and graphics package popular among academics.  However, its popularity is also increasing in the business world, among other places.  This is no doubt due to the fact that R is completely free!

I began using R for this very reason.  As a quasi-statistician with a limited budget and an even more limited amount of time to produce results, R was the perfect fit.  It was a challenge at first, as R did not, at the time, have an adequate graphical user interface and there were limited resources on the web suitable for novice users (there still aren’t).  In the absence of resources I purchased a book and got to work learning R.  In the first few months I quickly learned the good and the bad.  So here they are:
