Wednesday, 15 May 2013

2013-05-15 Meeting day, and new regression equation


This morning at 8:30 I was supposed to have an individual meeting with Dr. S. He had mentioned in an email that he was busy with a phone call and would not be able to get to the department on time, but I decided to wait in the hope that he might come. After 45 minutes of waiting, which I spent reading the book I mention in the next paragraph, I decided to return to my office.

I was reading a small book called "Statistics: A Very Short Introduction" by David J. Hand.
Its preface says that "The modern discipline [statistics] is all about the use of advanced software tools to aid perception and provide ways to shed light, routes to understanding, instruments for monitoring and guiding, and systems to assist decision-making", which is very encouraging for students learning and using statistics.

The author says, "the misconception of statistics lies with those who do not understand what the numbers are saying, or who wilfully misuse the results. We do not blame a gun for murdering someone: rather it is the person firing the gun who is blamed." Personally, I think that is very true.

A news article from domain.b.com reported a finding from a journal article in Nature Chemistry, "The hydrodeoxygenation of bioderived furans into alkanes", saying that the furans generated from the hemicellulose and cellulose of non-food biomass can be converted to medium-chain alkanes, which are essentially the gasoline that we use in our cars! One of my colleagues, Dr. Kumar, is working on optimizing the enzymatic conversion of cellulose to glucose for ethanol production. He has published a number of journal articles on that topic, so he might be interested to know about this new process.

For the remainder of the day, I continued working on my linear regression with help from David. He told me an important point about linear regression: "If you cannot explain the interaction terms, then leave them. They will ask you what do you mean by the product of temperature and moisture content. Stick with only the main effects (storage time X1, moisture content X2, temperature X3, storage configuration X4 and pellet type X5)." Really, what does the product of temperature and moisture content tell you?
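
One common way to read such a product term can be written out as a short worked equation. This is only a sketch of the general idea: the symbols y (standing for the response NCVi/NCVf), the coefficients beta, and the single interaction coefficient beta_23 are notation assumed here for illustration, not values from my fitted model, and X4 and X5 are treated as numerically coded variables.

```latex
\begin{aligned}
\text{Main effects only:}\quad
  y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon \\
\text{With one interaction:}\quad
  y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5
       + \beta_{23} X_2 X_3 + \varepsilon \\
\text{Effect of moisture content:}\quad
  \frac{\partial y}{\partial X_2} &= \beta_2 + \beta_{23} X_3
\end{aligned}
```

Read this way, the product term says that the effect of moisture content on the ratio is not a single number but changes with temperature. In the main-effects model the same derivative is just the constant beta_2, which is why that model is easier to explain.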

So, without the interaction terms, I only have this not-so-sophisticated equation, with a lower R-squared value than before:
But I'm happy with it because it is much easier to explain. For example, the term +0.01163X2 means that the ratio NCVi/NCVf increases by 0.01163 for each unit increase in moisture content X2, with the other variables held fixed. To validate my regression model, I performed a cross-validation with the R function cv.glm. The average residual I got is 0.009, which is bigger than I expected; I was expecting somewhere around 0.005 or less.
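
To make the cross-validation step concrete, here is a minimal sketch of k-fold cross-validation for a main-effects linear model. It is written in plain Python rather than R, so it is not my actual cv.glm call, only an illustration of the idea: fit on all folds but one, predict the held-out fold, and average the squared prediction errors. As far as I understand, that average is roughly what cv.glm reports by default, so it is on a squared scale rather than the scale of the response. The function names (`fit_ols`, `predict`, `kfold_cv_mse`) and the demo data are hypothetical and made up for this example; they are not my pellet data.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    """Prediction of the main-effects model for one observation x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def kfold_cv_mse(X, y, k=10, seed=0):
    """Mean squared prediction error over k held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        sq_err += sum((y[i] - predict(beta, X[i])) ** 2 for i in fold)
    return sq_err / len(y)


# Made-up, noise-free demo data: y = 1.0 + 0.5*x1 - 0.2*x2 (not real measurements)
X_demo = [[float(i), float((i * i) % 7)] for i in range(30)]
y_demo = [1.0 + 0.5 * x1 - 0.2 * x2 for x1, x2 in X_demo]
```

Calling `kfold_cv_mse(X_demo, y_demo, k=5)` on the noise-free demo data should give an error that is essentially zero, since the model can reproduce the data exactly. With real, noisy data, taking the square root of the result gives a number that can be compared directly with the response values.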
The next step is to perform a probit regression on the same data to see whether that method helps.
