Proteomics, Data and the Fight Against Cancer

Cancer detection and prevention

In recent years there have been rapid advances in the world of cancer treatment and disease prevention. But there is no simple way to deal with cancer yet. A patient may have to undergo a medley of different treatments, such as chemotherapy, radiation and surgery. These processes take a toll on the patient, both physically and mentally.

Many researchers and scientists have devoted their lives to the war on cancer. To get a sense of how large our collective investment in this war is, consider that an average of $4.9 billion is spent on research alone every year in the United States. Still, cancer accounts for 9 million deaths worldwide every year.

But today (and this probably isn’t a surprise to our readers), data is leading the charge in the fight against cancer — through genomic and proteomic research.

What hope does this research raise? That eventually doctors all over the world will be able to catch cancerous cells early and prevent them from spreading.

Genomics and proteomics: A New Hope

The development of genomics and proteomics gives cancer researchers new hope because it opens a path to attacking the root causes of cancer, in contrast to traditional treatments, which react to later-stage cancers.

To explain: all the DNA contained in your cells makes up your genome. Genomics is the study of the sequences of the four nucleotide bases of a DNA strand (adenine, cytosine, guanine and thymine, represented by the letters A, C, G and T) and of how each sequence of letters passes on information that helps each cell in your body work properly. When looking at data from a cancer patient, researchers conduct a genomic analysis to identify genome-scale features such as DNA sequence, structural variation, gene expression, or regulatory and functional element annotation.
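To make the "sequence of letters" idea concrete, here is a minimal Python sketch that treats a DNA strand as a string over A, C, G and T and computes two very simple summaries: how often each base occurs, and what fraction of the bases are G or C. The function names and the example sequence are made up for illustration; real genomic analyses work on far longer sequences and use dedicated tools.

```python
from collections import Counter


def base_counts(sequence: str) -> Counter:
    """Count how often each nucleotide letter occurs in a DNA sequence."""
    seq = sequence.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return Counter(seq)


def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    counts = base_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sequence")
    return (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    example = "ATGCGCTA"  # hypothetical sequence, for illustration only
    print(base_counts(example))
    print(gc_fraction(example))
```

Even summaries this simple hint at why the data volumes grow quickly: a full human genome is billions of letters long, and the features researchers actually look for are far richer than letter counts.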

Genomics enables scientists and doctors to look at the genetic make-up of a patient, but there’s a downside: it is very difficult and time-consuming, and until recently the process rarely produced tangible or actionable research results.

The Human Genome Project determined that the human genome contains more than 30,000 unique genes which give rise to as many as 100,000 distinct protein products. The abundance of proteins in the body enables researchers to view the human body at a much finer level of detail.

Proteomics is the study of these proteins in the body; it is the use of quantitative protein-level measurements of gene expression to characterize biological processes (e.g., disease processes and drug effects), and to decipher the mechanisms of gene expression control.

The human body contains mechanisms to fight cancer. When a patient has cancer, it means these mechanisms aren’t working properly, and the body cannot fight the cancerous cells as well as it potentially could. Proteins are the first molecules to react to changes in the body. Because they are influenced by both genetic and environmental factors, proteins are a valuable source of data to help scientists understand a patient’s actual health status, not just risk or predisposition. Through proteomics, in theory, a doctor can help these mechanisms operate the way they should to fight cancer.

Proteomic researchers are still working to understand the full potential of treatments based on their research. They’ve had to develop numerous processes to isolate, view and analyze a variety of data and so identify prospective paths to treatment. Examples include patient data overlap (overlaying data from multiple patients to find irregularities), increasingly advanced fractionation methods to isolate proteins, and mass spectrometry to determine the mass-to-charge ratio of protein ions.
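A mass spectrometer does not report mass and charge separately; it reports each ion by its mass-to-charge ratio (m/z). The short Python sketch below shows the arithmetic under one common assumption that the article itself does not specify: that the molecule is ionized by gaining protons (positive-ion mode). The function names and the 1000 Da example mass are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # approximate mass of a proton, in daltons


def mass_to_charge(neutral_mass_da: float, charge: int) -> float:
    """m/z of a molecule that has picked up `charge` protons (assumed positive-ion mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def neutral_mass(mz: float, charge: int) -> float:
    """Invert mass_to_charge: recover the neutral mass from an observed m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS_DA


if __name__ == "__main__":
    hypothetical_mass = 1000.0  # made-up neutral mass in daltons
    for z in (1, 2, 3):
        print(z, round(mass_to_charge(hypothetical_mass, z), 4))
```

The same molecule shows up at a different m/z for each charge state, which is one reason the raw instrument output needs substantial software analysis before it says anything about which proteins are present.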

Using protein data to catch disease at the earliest stages

All of these techniques are used by laboratories to identify protein characteristics that help clinicians differentiate between the healthy and disease states of the patient. Laboratory experiments need to be complemented by data analysis done on the computer. Tactics range from deploying software packages that analyze the electrophoretic separation of DNA and proteins, to the use of protein bioinformatics. The ultimate goal of the field of proteomics is to catalog proteins and to understand their role in biology and pathology so this data may be applied to early diagnosis and to optimizing treatments.

One company involved in this important field is Applied Proteomics Inc. (API). API says its aim is to lead in the fight against cancer and other diseases by stopping them at the source. Using proteomics, its goal is to catch diseases at the earliest stages of growth, not only stopping the disease in its tracks but tracking it every step of the way.

Measuring and analyzing proteins has challenged researchers for decades. API built Linus, which it describes as the first platform to reproducibly measure an unprecedented number of proteins simultaneously. These measurements and analyses require the manipulation of vast amounts of data, and according to API the Linus platform is the most efficient system for protein biomarker discovery and test delivery to date. API seeks to stop advanced disease before it starts, to alleviate perhaps the greatest burden on our healthcare systems and, most importantly, to save lives.

Explosion in the amount, variety and importance of data must be managed

As a result of these advances in research techniques, the scientific community and healthcare industry, like all industries, are facing an explosion in the amount, variety and importance of data created — a new Data Age. (And in turn, the Data Age will continue to enable ever-greater advances.)

The quantity of information generated from the necessary analyses of just a single cancer patient adds up to about one terabyte of data. With millions of cancer patients around the world, the demands on data infrastructure to capture, store, manage, analyze, and rapidly access and share a constantly growing reservoir of critical information keep rising. Without the most advanced data infrastructure to process the information, none of this research would be possible.
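The scale is easy to underestimate, so here is a back-of-the-envelope Python sketch. It takes the one-terabyte-per-patient figure above at face value and multiplies it out for a few patient counts, including the 11 million new cases per year cited later in this article. Decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) are an assumption, and the function name is made up for illustration.

```python
TB_PER_PATIENT = 1.0   # from the text: about one terabyte per patient
TB_PER_PB = 1_000      # decimal units assumed
TB_PER_EB = 1_000_000  # decimal units assumed


def total_storage_tb(patients: int, tb_per_patient: float = TB_PER_PATIENT) -> float:
    """Raw storage needed, in terabytes, ignoring compression, backups and replicas."""
    if patients < 0:
        raise ValueError("patients must be non-negative")
    return patients * tb_per_patient


if __name__ == "__main__":
    for patients in (1_000, 1_000_000, 11_000_000):
        tb = total_storage_tb(patients)
        print(f"{patients:>10,} patients: {tb / TB_PER_PB:>8,.0f} PB = {tb / TB_PER_EB:.3f} EB")
```

Under these assumptions, one million patients already amounts to about one exabyte of raw data, before any copies, derived results or shared datasets are counted.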

This sheer mass of data poses other new challenges to the analyst. Not all data is equally important, and it’s useful only when context is understood and when analytic tools are available to help researchers identify and focus on the critical subsets of data that result in the highest impact. Science and healthcare professionals are now employing high-performance computing systems that can manipulate massive amounts of data, enabling genomics and proteomics researchers to innovate faster and to move closer to the end goal: to treat, and even cure, cancer.

Cancer treatment breakthrough in the foreseeable future

With further advances in supercomputing and data infrastructure, and with the rise of machine learning, the capabilities of data analysis will increase tremendously. In the foreseeable future cancer will be caught before it is given a chance to spread, treatment will be less painful, patient medical costs will decline and, most importantly, our loved ones will be better prepared to fight disease.

There are 11 million new cancer cases per year and cancer accounts for 13 percent of global mortality. Cancer mortality does not arise from a lack of available remedies per se, but rather from the diagnosis of such conditions at stages that are too late for remedies to be effective. Prevention, early detection and early intervention are the primary aims of oncologists and cancer biologists. If genes are viewed as the master controllers of cellular behavior, proteins are the effectors, and as such, protein expression and activity must comprise the molecular basis of health or disease.