For people doing ecological research, the trend of increasing demand for bioinformatic skills is obvious. I’ll never forget seeing the picture below for the first time, when my current adviser gave a talk at N.C. State University and finished with this image to describe how future ocean microbiome research may be conducted. There will be autonomous devices taking abundant measurements, and “ecogenomic sensors” sending genetic data back to the lab in near real time.
Oceanographers seem to like to show off pictures of field work in far-off, beautiful places to get people excited about their research and possibly encourage young minds to study it. While field work does occasionally happen, what they forget to mention is that your three weeks in the field are followed by months (more likely years) in front of a computer, analyzing the data you collected and writing. With devices like the ones portrayed above, a trend of less field work and more computational time could easily become reality. This isn’t a particularly futuristic idea either; the Monterey Bay Aquarium Research Institute (MBARI) is already building devices called Environmental Sample Processors to do molecular and chemical tests automatically in the ocean.
If this all sounds good to you, then great! I’ve noticed most people have learned the biology first, and then tried to pick up the computational skills later. If you fall into that category, I’ve tried to come up with a list of things I think are important to start looking into and learning. If you are in high school or are an undergraduate, definitely also consider taking at least an introductory computer science class.
1. Get comfortable using Linux
I won’t get into all the details of what exactly Linux is and isn’t. The main idea is that Linux comes in many different distributions (distros for short), and these are operating systems for your computer (like Windows or Mac OS). One of the advantages of Linux (besides being totally free) is its powerful command line. Being able to use the command line is, without a doubt, 100 percent necessary, and the only way to get comfortable with it is just to use it. The best thing you can probably do is take an old computer and install Linux on it (Ubuntu is popular for beginners, but I’m more partial to Xubuntu, which just uses a different desktop environment called Xfce). Try to see if you can do essentially everything you normally do, besides browsing the Internet, on the command line (navigating to different directories, editing files, etc.). While you are at it, try using vim, which is a popular text editor.
2. Learn some basic scripting
I basically only use Python, R, and Bash (shell script) these days. They are all very good to know and are completely free. Once you have Linux set up, try installing Python and R and doing some tutorials. Come up with little projects for yourself and try to write a program that can accomplish your desired task. Again, you will learn so much more just by trying to figure out what it is you want to do compared to reading a book that describes a bunch of functions. Most people who write code have only learned some basics and are just really efficient at googling and skimming documentation to find what they need. Also, don’t spend your time trying to reinvent the wheel. The odds of you running into a problem that someone else has already had are high. There is a good chance someone has already posted a solution somewhere on the Internet.
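For instance, a first little project might be reading a few DNA sequences and reporting the GC content of each one. The following is only a minimal sketch of what that could look like in Python; the function names (`parse_fasta`, `gc_content`) and the example sequences are made up for illustration, and it assumes simple FASTA-style text where each record starts with a `>` line.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word after ">" as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines can wrap, so append to the current record.
            records[name] += line.upper()
    return records


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Made-up sequences, purely for illustration.
EXAMPLE = """>seq1 made-up example
ATGCGC
GGCC
>seq2
ATATAT
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE).items():
        print(f"{name}\t{len(seq)}\t{gc_content(seq):.2f}")
```

Once something like this works, a natural next step would be to read the sequences from a real file instead of a string, and then to look for an existing library that already does the parsing for you.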
3. Use version control
This is a must and something most people often overlook. Every good biologist keeps a detailed lab notebook and backs up all of their data. Fortunately, version control software can take care of both for your code. Start playing around with git (another tool you can install in Linux) for all of the code you’re writing. Also, set up accounts and try using Bitbucket and GitHub. These are free services that allow you to back up your code, keep track of changes, and easily share it with the world.
4. Be familiar with basic bioinformatic tools
Fortunately, there is a pretty good Coursera course for this: Bioinformatic Methods 1. This fairly basic class will introduce you to programs like BLAST and provide hands-on examples that walk you through looking up sequences in databases, doing alignments, and building phylogenetic trees. It is completely self-paced and available any time.
Looking for more?
Great! Here is an excellent article in PLOS Computational Biology: A Quick Guide for Developing Effective Bioinformatics Programming Skills