The Secret Life of Data Analysis

Once again I’ve gone too long without posting, but I’ve been really busy with TAing as well as working on the topic of today’s post ::drumroll please:: data analysis! So readers, I hope my awesome title got you excited for another thrilling post describing my day-to-day life as a scientist. If you remember my earlier post on exciting science and the mundane tasks that go with it, I described what my day-to-day life was like while I was doing lab experiments. Now that my experiments are finished, my day-to-day is slightly different. So let’s get started! This summer I collected tons and tons of data, which is a really awesome thing, but it also means lots and lots of data analysis. So what exactly does data analysis entail? Well, dear reader, let me show you. In my previous post I said that I was spending so much time with the balance that it became my best friend forever. Well, move over Mettler Toledo, I have a new BFF…

Mettler Toledo my former best friend forever…


Hello to new best friends, both my Apple Display and the MacBook Pro off in the corner!


Look at all the pretty data on the left of the screen, with enough space for chatting with colleagues on the right! Yep, I’ve been spending 80% of my day organizing my data and fighting with it in my favorite and simultaneously least favorite statistical program, R. The other 20% of my day is usually somehow related to teaching or preparing to teach Genetics. It’s been quite a change from the crazy summer in the lab. So here’s how data analysis works!

Saving and Organizing the Data

I originally collected all of my data in Microsoft Excel, in many .xlsx spreadsheets, and I saved them both on my computer and in the mythical cloud server known as Dropbox. When I eventually defend my dissertation and graduate, Dropbox will be getting a huge shout-out from me. It lets me save my data on an internet server that I can access from anywhere on any computer. This gives me the safety of knowing that even if my trusty MacBook Pro dies, my dissertation research won’t go down with it. My only complaint is that I am cheap, and unless you pay for Dropbox you are limited to around 2GB of data. This means that as I gathered more data, something in my Dropbox had to go. I love my nephew, and he’s a cutie, but his adorable pictures were taking up precious PhD data space, so unfortunately his collection of pictures didn’t make the cut.

Someday, nephew, if/when you become a graduate student, you’ll understand.


Mmmm sweet, sweet data taking up my Dropbox space!


At the time I was just focused on getting all of the data down in (digital) writing, and I didn’t keep it as neat or organized as I should have. Sometimes I had more than one version of a data sheet, and sometimes I had my own weird code abbreviations for things. It all needed to be organized so that I could make sense of it and see what the heck was happening! I spent a good week organizing my data, but now it is beautiful and all ready for statistical analysis.

Actual Data Analysis

Once the data was organized, it was time to begin the statistical analysis. I have to admit that I have not always been a fan of statistics, but now I have an appreciation for how awesome it can be. Statistics makes sense out of mounds of numbers, and it helps scientists discern differences in their data. As I mentioned earlier, to actually “do” the statistics I use the awesome (and sometimes pain-in-the-butt) program called R. What makes R so great is the incredible number of things it can do: it does basic statistics, it makes phylogenetic trees, it makes beautiful graphs and figures, and so much more. Also, it is free! Overall R is great because of its immense power, but it is still a programming language, and it has a steep learning curve that I am still slowly climbing. This comic from PhD Comics sums up my feelings on doing statistics in R.

Coding can be frustrating to say the least… “Piled Higher and Deeper” by Jorge Cham

Luckily, I have a lot of help from my fantastic advisor, who has been working with R for some years now. With his help and the help of others, I have been able to work through my analysis and get some pretty cool results! It’s great to finally see your data turn into something that makes sense; however, it can also be slightly demoralizing to see months of data turn into two measly graphs.



In the end, it’s all about being able to complete the story you started, and that is what keeps me going. After staring (as well as yelling/screaming/crying) at my computer for weeks, it is awesome to see the end of the research story come together. At the end of data analysis I often think back to the beginning, when I came up with the research idea, and I think of how far it has come. Science is exciting because I can take an idea and turn it into scientific data out there for everyone to read (if/when it gets published). I myself can make a contribution to our collective scientific knowledge, and that is a cool feeling.