One of the things I love most about grad school is the opportunity for collaboration, especially across disciplines. Being surrounded by so many people with a passion for different topics, all working toward discovery and creation, is really an amazing experience. One interdisciplinary area I’ve fallen in love with is bioinformatics. I’ve been interested in computational biology for a long time, as can be seen in my work on SynthNet, but this has been my first opportunity to work firsthand with others in these areas, including experts squarely in the biology fields, which has been an extremely helpful learning experience.
Enter Bioinformatics
I’ve found bioinformatics, genomics, proteomics, etc. to be especially interesting, as there are a remarkable number of parallels between what occurs in nature and what we’ve discovered and devised in computer science. The same underlying Turing-complete, algorithmic nature drives them both, but it’s still awe-inspiring to see these processes happening naturally in genetics, and then to be able to make predictions using the same rules one would use in CS.
PALADIN: software for rapid functional characterization of metagenomes
One such area in bioinformatics that we’ve been focused on for the last 6 months or so is the problem of identifying genes and proteins from metagenomic read sets. A metagenomic sample contains many organisms, perhaps thousands, with all their small pieces of DNA mixed together, which presents a problem when you want to identify what was actually in there. Or more aptly, in our case, the function of what was in there. I love making the analogy to taking 500 different jigsaw puzzle boxes, opening them up, and dumping them all together. To make things worse, though the puzzles are different, some of them feature a lot of the same themes: flowers, grass, sky, etc. But let’s step it up: you also lose some pieces in the process, some get damaged and misshapen, and there are duplicates of others. Now try reconstructing all 500 puzzles. Not so easy!
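To make the jigsaw analogy a bit more concrete, here is a toy sketch in Python. It is purely illustrative and is not how PALADIN works; the function names, the tiny made-up “genomes”, and the drop, duplication, and error rates are all assumptions invented for this example. It shreds each sequence into fixed-length reads, loses some, damages some, duplicates others, and shuffles everything together. A second helper flags reads that could have come from more than one source, the “shared sky” pieces.

```python
import random


def simulate_metagenome(genomes, read_len=8, drop_rate=0.2,
                        dup_rate=0.1, error_rate=0.05, seed=0):
    """Toy model: shred genomes into reads, then lose, damage, and duplicate some.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    reads = []
    for seq in genomes.values():
        # Cut each genome into non-overlapping pieces of read_len bases.
        for start in range(0, len(seq) - read_len + 1, read_len):
            if rng.random() < drop_rate:
                continue  # a lost puzzle piece
            bases = list(seq[start:start + read_len])
            for i in range(len(bases)):
                if rng.random() < error_rate:
                    # A damaged piece (the random base may equal the original).
                    bases[i] = rng.choice("ACGT")
            read = "".join(bases)
            copies = 2 if rng.random() < dup_rate else 1  # a duplicated piece
            reads.extend([read] * copies)
    rng.shuffle(reads)  # dump all the boxes together; origin labels are gone
    return reads


def ambiguous_reads(genomes, reads):
    """Return the reads that appear verbatim in more than one genome."""
    return {
        read for read in reads
        if sum(read in seq for seq in genomes.values()) > 1
    }


if __name__ == "__main__":
    # Two made-up organisms that share the region "TTTTCCCC".
    toy_genomes = {
        "alpha": "ACGTACGTTTTTCCCC",
        "beta": "GGGGAAAATTTTCCCC",
    }
    pile = simulate_metagenome(toy_genomes)
    print("mixed reads:", pile)
    print("could belong to several organisms:", ambiguous_reads(toy_genomes, pile))
```

Even with two tiny sequences, a shared region produces reads that cannot be assigned to a single source. Scale that up to thousands of organisms and millions of reads, and the difficulty of the real problem starts to show.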
While there are lots of strategies for computers to “reconstruct these puzzles”, so to speak, many of them are slow and have other inherent issues. We attempt to address both the speed and those other issues with our new software, PALADIN. I’ve been lucky enough to be its lead developer, though a project like this is a 100% team effort.
I won’t go into the full details of the software here, but if you’d like to learn more about it, our team, and the upcoming manuscript, you can read Professor Matt MacManes’ blog post.
And if you’re looking to try it out, visit our GitHub repository.