Give me a Platform: Genomics in the Cloud


Reading Steve Yegge’s rant recently about Google’s failure to truly understand platforms made me realize I should probably be thinking more about platforms. It’s kind of crazy how influential these things have become. Amazon’s prize for figuring this out early is that they now host the majority of all applications around the world. As a tepid Amazon Prime user, I tend to forget what the company is really up to. But this is huge. Gartner estimated in 2015 that Amazon Web Services (AWS) has more than a million users every month, “including 2,000 government agencies, 5,000 education institutions, and more than 17,500 nonprofits.” Notably, NASA and the CIA are in the mix.

So, what is a platform anyway? Conceptually, an ideal open platform is one that begins as a simple system, the most essential infrastructure, and is then allowed to evolve through outside contributions and participation. So applications (like FarmVille) sit on top of platforms (like Facebook). But the layers go deeper: applications themselves can be platforms when they host other software, as Excel does with macros. Cloud computing, browsers, and hardware are all examples of platforms too.

And why do platforms matter? Because they make creation and adaptation cheaper, easier, and probably more secure. The first two are pretty self-explanatory: if you can use AWS’s server farms rather than building your own, or use the Google Maps API instead of starting from scratch, you can spend your unique resources on building something new. The security point is more contentious, but the logic goes something like this: in our software-eaten, Internet of Things world, the weakest link may provide access to a whole lot more than itself. Rather than relying on every cash-strapped start-up to reinvent the security wheel every time, we can more reasonably expect Amazon (making $100 billion in revenue this year) to make darn sure it’s getting security right.

However, the implications of platforms for power dynamics are enormous. Amazon now holds the keys to an incredible amount of sensitive information. Its policies prevent it from accessing or using customer content outside of its agreements, unless required to by law. But anything Amazon decides to do with its platform has consequences all the way up. And how Amazon’s developers decide to organize their options and tools will nudge users toward certain kinds of creation and use.

Within AWS is one service in particular that I have been fascinated by: Genomics in the Cloud. This tool (and others like it) has changed the way that genomics is done. The two biggest initial hurdles to large-scale genome-wide association studies are storing and processing petabytes of genomic data, and having access to datasets large enough to find meaningful results. Centralized cloud storage addresses both of these issues, which is a huge boon for biomedical research and a potentially enormous value-creator. Teams from all over the world can use these systems, and have resources, tools, and opportunities for collaboration at their fingertips.

For example, a lab at UC Berkeley (my lovely alma mater) called the Algorithms, Machines, and People Lab (AMPLab) has been able to scale its work on genomics and cancer research by combining AWS cloud computing with machine learning, processing genomic data across many machines simultaneously to find novel associations.


When you combine the technological feasibility and ease of massive bioinformatic sharing with political and cultural will (see, for example, the Global Alliance for Genomics & Health), the “sharing is caring” mentality starts to look like a foregone conclusion. However, questions around agency and jurisdiction should not go unasked. Most people who consent to donate their genomic information do so within a particular institutional context, and may tend to believe that their consent is limited to use by that institution. But this is changing rapidly. You could find that your DNA was taken out of the particular cancer research you were interested in and used to help create pharmaceutical drugs, or compared against criminal DNA databases to link you to a crime. Data of all kinds tells stories about us, but the clues hidden within our genomic data are likely more sensitive than most.

When the majority of the world’s applications rely upon your platform, you kind of rule the world. A commitment to democracy surely means we should be wary of any single platform that flies too close to the sun.


