What is Bioinformatics?

Tuesday, March 16, 2004

My favorite books in Bioinformatics would include:
1. Bioinformatics: Sequence and Genome Analysis by David W. Mount

This is extremely comprehensive and targeted at an advanced undergraduate or graduate course. Of all the books I have looked at, it does the best job of surveying the literature on the approaches in use, with excellent discussions of the algorithms and statistics behind the most popular ones. It is a great textbook to have in your library, and it lays good foundations for the various approaches. It cannot be used as a lab manual for hands-on practice. If you are interested in knowing and understanding the concepts, this book is for you.

2. Genomes by T. A. Brown

I really love this book because it has a good introduction to the subject and a good combination of the history behind the experiments and the current trends in genomics. I feel Dr. Brown has presented the topic very well in simple words, and I think people who do not have a background in bioinformatics will also be able to follow the book.

3. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins by Baxevanis and Ouellette

This is the strongest book on the informatics side of bioinformatics, covering data representation issues as well as data analysis methods. I think it works as a textbook, but it is targeted more at bioinformatics practitioners. Compared with the Mount book (#1 above), its scope is much narrower and its coverage of the computational approaches is shallower. Together, the Mount and Baxevanis/Ouellette books provide outstanding coverage of sequence data representation, datasets, and analysis methods.

4. Durbin et al, Biological Sequence Analysis

The approach is more mathematical than most life sciences graduate students are used to. I think it would be a great book for a second course in Computational Biology.

5. Introduction to Bioinformatics by Arthur M. Lesk

Good introduction.

6. Baldi and Brunak - Bioinformatics, the machine learning approach

This provides a Computer Science/Machine learning perspective but is out of date. Not of much help for biologists.

7. Robin J. Wilson - Introduction to Graph Theory

A great introduction to Graph Theory... clear definitions and examples, great figures, useful exercises, and even some clever quotes. Everything you could ask for - if only all texts were this clear and well-organized. This was my first foray into the topic, and Wilson's text made it enjoyable.

8. Fundamentals of Neural Networks by Laurene V. Fausett

The author explains each network model in detail. The book also gives a detailed, structured description of each algorithm, which makes it easy to code. With the step-by-step numeric examples, the written program can be checked for accuracy.

9. Fundamentals of Molecular Evolution by Dan Graur, Wen-Hsiung Li

This is a very complex, in-depth, informative book on molecular and genetic evolution. It has good explanations of genetic drift, mutation rates, times to fixation, and patterns in evolutionary change, with lots of statistical information on how allele frequencies change. It is written for a more knowledgeable audience with a good understanding of evolution and genetics. It gives an informative view of trends in evolution beyond natural selection, and it supports the neutral theory of evolution quite strongly.

All said and done, can you tell me the difference between proteomics and genomics?

The term "proteomics" refers to the PROTEINS expressed by a GENOME and is the systematic analysis of the protein profiles of tissues. The term "proteome" refers to all proteins produced by a species, much as the genome is the entire set of genes. Unlike the genome, the proteome varies with time and is defined as "the proteins present in one sample (tissue, organism, cell culture) at a certain point in time". Proteomics parallels the related field of genomics. Now that the human genome has been sequenced, we face the greater challenge of making use of this information for improving healthcare and discovering new drugs. There is increasing interest in proteomics technologies now because deoxyribonucleic acid (DNA) sequence information provides only a static snapshot of the various ways in which the cell might use its proteins, whereas the life of the cell is a dynamic process. Against this background, DNA/RNA (ribonucleic acid) sequences, per se, are not enough for the clear identification of a therapeutic target, because proteins, and not DNA/RNA, are the basis of the mode of action of drugs. Structural genomics is the determination of the structure of proteins, RNA and other biological macromolecules. Functional genomics is an ambitious attempt at high-throughput basic research through the integration of multiple automated technologies including RNA profiling, proteomics, genetics of animal models, assays, structural biology and bioinformatics. Parallel to these developments, there is an interest in functional proteomics.

Oh wow.. How about applications of proteomics in drug discovery and pharmaceuticals? Can you give me some information on that?

Proteomics will also play an important role for drug discovery and development (Müllner et al 1998).

Proteomics is the link between genes, proteins and disease. Many of the best-selling drugs either act by targeting proteins or are proteins. In addition, many molecular markers of disease, the basis of diagnostics, are proteins. Patterns of protein expression can be used as a guide to drug design. Application of proteomics to study underlying pharmaceutical mechanisms and use these for drug development is referred to as pharmaceutical proteomics. Unlike classical genomic approaches that discover genes related to a disease, proteomics could characterize the disease process directly by finding sets of proteins (pathways or clusters) that together participate in causing it. The same technology is used to study the effects of candidate drugs intended to reverse a disease process.

Alright Karthik, You made it such a big deal about Proteomics..Can you tell me in a nutshell the applications of Proteomics? Thanks man.

Proteomics will contribute greatly to our understanding of gene function in the post-genomic era. Differential display proteomics for comparison of protein levels has potential application in a wide range of diseases. Because it is often difficult to predict the function of a protein based on homology to other proteins or even their three-dimensional structure, determination of components of a protein complex or of a cellular structure is central in functional analysis. This aspect of proteomic studies is perhaps the area of greatest promise (Pandey and Mann 2000). After the revolution in molecular biology exemplified by the ease of cloning by DNA methods, proteomics will add to our understanding of the biochemistry of proteins, processes and pathways for years to come.
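The idea of comparing protein levels between samples can be sketched in a few lines of Python. This is only a minimal illustration, assuming we already have positive abundance values for each protein in a healthy and a diseased sample; the protein names, the numbers, and the function name log2_fold_changes are all made up for the example.

```python
import math


def log2_fold_changes(healthy, diseased, threshold=1.0):
    """Return proteins whose abundance changes by at least `threshold`
    on a log2 scale (1.0 means a two-fold change up or down).

    Both arguments map protein name -> abundance (assumed positive).
    Only proteins measured in both samples are compared.
    """
    changes = {}
    for protein in healthy.keys() & diseased.keys():
        ratio = math.log2(diseased[protein] / healthy[protein])
        if abs(ratio) >= threshold:
            changes[protein] = ratio
    return changes


# Hypothetical abundances, for illustration only.
healthy = {"P1": 10.0, "P2": 5.0, "P3": 8.0}
diseased = {"P1": 40.0, "P2": 5.5, "P3": 2.0}

for name, change in sorted(log2_fold_changes(healthy, diseased).items()):
    print(name, change)
```

A positive value means the protein is more abundant in the diseased sample, a negative value means it is less abundant. Real studies also need replicates and statistics before calling a change significant.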

I realize the term "Proteomics" is very new.. Is it a totally new field or an old field with a new name?

Good question.. "Proteomics" is a new field, but the methods used are old. In my opinion, it is the same book with a different cover. However, I also have to mention that rapid developments keep this field up to date, and that's why everybody talks about it so much. Some scientists do not like the term proteomics and continue to use terms describing the various protein technologies, such as protein separation. However, there is a distinction to be made between the molecular function of an isolated protein and the function of that protein in the complex cellular environment as studied by proteomic technologies. Proteomics attempts to catalog and characterize these proteins, compare variations in their expression levels in health and disease, study their interactions, and identify their functional roles. I would also like to mention that proteomics is not the study of individual proteins, as has been done traditionally, but rather their study in an automated, large-scale manner. This requires new technologies, and considerable effort is currently being devoted to the development of novel tools.

What is Proteomics?

Proteomics represents the genome at work and is a dynamic process. Proteomics can be divided into expression proteomics, the study of global changes in protein expression, and cell-map proteomics, the systematic study of protein-protein interactions through the isolation of protein complexes (Blackstock and Weir 1999). Proteins expressed by an organism change during growth, disease, and the death of cells and tissues. Modifications of proteins that occur during and after their synthesis, such as the attachment of sugar residues or lipids, change the proteome complement. The minimum proteome size can be calculated from the size and 2-D polyacrylamide gel electrophoresis (2-D PAGE) separated proteins. Proteomics is based on leading edge technological capability for undertaking the mass screening of proteins and their post-translational modifications in whole organisms as well as in their tissues in normal and diseased states.

What are different steps in Proteome research?

There are three main steps in proteome research. They are listed as follows:
1. Separation of individual proteins by 2-D polyacrylamide gel electrophoresis (2-D PAGE).
2. Identification by mass spectrometry or N-terminal sequencing of individual proteins recovered from the gel.
3. Storage, manipulation, and comparison of the data using bioinformatics.
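Steps 2 and 3 above can be illustrated with a small Python sketch. It assumes a simplified trypsin rule (cut after every K or R, ignoring the proline exception and missed cleavages) and looks for a peptide whose calculated monoisotopic mass matches a mass observed in the spectrometer. The protein sequence, the function names and the tolerance are assumptions made for this example, not part of any real tool.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(protein):
    """Simplified trypsin digest: cut after every K or R."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def match_mass(protein, observed_mass, tolerance=0.01):
    """Return the tryptic peptides whose mass is within `tolerance` Da."""
    return [p for p in tryptic_peptides(protein)
            if abs(peptide_mass(p) - observed_mass) <= tolerance]


# Hypothetical protein sequence, for illustration only.
protein = "MKWVTFISLLFLFSSAYSRGVFRR"
for pep in tryptic_peptides(protein):
    print(pep, round(peptide_mass(pep), 4))
```

In practice the observed masses are compared against the digested sequences of every protein in a database, which is where the bioinformatics of step 3 comes in.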

OK....I think I am getting what you are saying.. Tell me more about cancer, gene therapy and evolution of new medicines...

The cells in our bodies are exposed to dangerous substances - some natural, some man-made - capable of causing genetic mutations in our DNA. Radiation from space and from the sun, chemical solvents and tobacco smoke are just a few. And sometimes, mistakes in the genetic code can happen for no other reason than a momentary glitch in the cell machinery. But have no fear, our cells have methods at work to combat these mistakes, and are often able to fix them before they can do any harm. But not always. Over a lifetime, these left-over mistakes can build up. If a single cell gets enough of them and if they are located within a few very sensitive genes, it may begin to grow out of control, and stop performing the function it was supposed to perform. A cell like this represents the beginning stages of cancer.

Typically, when a person is diagnosed with cancer, this single cell has grown into a mass of perhaps millions of cells - all of which contain the genetic mutations of the original. They soon learn to move to other areas of the body - to metastasize - and cause a tumor in a totally different organ. Modern cancer therapy aims to rid the body of every single one of these mutated cancer cells, no matter where they lie, since a single remaining cell may well cause the cancer to recur at a later time. Radiation treatment is usually used as a "local therapy", that is, directed at killing cells within the tumor site itself. In a similar vein, surgery is often used to remove the tumor itself. Chemotherapy aims to kill those cells which have wandered from the original tumor site, whether residing in the bloodstream or lodged within another organ.

And while there are numerous cancers where these methods of treatment are highly successful, there is still hope that we can do a better job. A number of new gene therapies directed towards cancer are now being developed. At the British Columbia Medical School, scientists have pioneered Canada’s first gene therapy against cancer, for treatment of men with prostate cancer. Gene-based treatments like this one offer a renewed sense of optimism that one day, we will indeed conquer cancer.

But the impact of genetic science does not end at cancer treatment. Indeed, one of the most exciting possibilities lies in the prevention and detection of cancer. Once we know which mutations lead to cancer, we can test whether a person has them. For instance, inherited mutations in the recently discovered BRCA1 and BRCA2 genes have been found to increase a woman’s risk of developing breast cancer. Patients who have a family history of breast cancer may now request genetic screening to see if they have inherited these mutations and are susceptible to the disease. In positive cases, patients are referred to a genetic counselor for help in dealing with this information. In such cases, it may be possible to increase the surveillance of these patients, in order to detect the cancer earlier, at a stage where available treatments are able to effect a cure. In the history of mankind, never before have we learned so much, in such a short space of time. The Human Genome Project, dedicated to creating a complete map of all human genes, is symbolic of our ever-widening quest for knowledge. It is a testament to human achievement that will stand throughout history as one of the greatest scientific endeavors ever undertaken. With a united front of gene mappers, gene sequencing specialists, technical staff and medical professionals, the mission is to uncover the genetic mysteries at the heart of diseases like cancer. This understanding can be used to create new diagnostic tools and improved treatments.

The next episode along these lines is Proteomics. I will talk about it later.

Ok...Karthik...tell me this..how do you find mistakes in the gene sequence?

Well.. once the gene sequence is known, the road to a better world seems simple: find out who has mistakes in their gene sequence, and then correct the mistake! But in most cases, we don’t know how the gene works or what kind of protein it makes. Most genes control a number of functions within the human body. And more often than not, how one gene is expressed (turned "on" to make proteins) can have a direct effect on how other genes function. To make matters even worse, some mistakes in the code may be disastrous, and some may have little or even no consequences. It’s a complicated system, but that’s what makes it more interesting!

Scientists studying functional genomics are the ones trying to solve these riddles. Starting with the genetic sequence, they have, at their disposal, a number of methods to determine what a gene does within the cell. Sometimes, they will construct a replica of the gene and insert it into a different type of cell to see what it does - how it changes the cell’s appearance, if it changes any other genes and how it affects the cell’s growth. They will make or find a copy of the protein that the gene makes and learn about that protein’s role in the cell machinery. And sometimes they will alter the sequence of the gene on the chromosomes themselves, to see the effect it has on the cell. In the end, scientists can find how each single gene functions in the body, and learn more about how mistakes in this gene can lead to the symptoms of disease.
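The point that some mistakes are disastrous and others harmless can be shown with a short Python sketch. It assumes two already-aligned coding sequences of equal length that start in frame, and it uses the standard genetic code; the example sequences and the function name classify_point_mutations are made up for illustration.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codon -> one-letter amino acid, '*' = stop.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutations(reference, sample):
    """Compare two aligned coding sequences codon by codon.

    Returns a list of (codon number, reference codon, sample codon, kind),
    where kind is 'silent' (same amino acid), 'nonsense' (new stop codon)
    or 'missense' (any other amino acid change).
    """
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be aligned and made of whole codons")
    results = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        alt_codon = sample[start:start + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa == alt_aa:
            kind = "silent"
        elif alt_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        results.append((start // 3 + 1, ref_codon, alt_codon, kind))
    return results


# Hypothetical sequences, for illustration only.
reference = "ATGGAGCTGTGGTAA"
sample = "ATGGTGCTATGATAA"
for change in classify_point_mutations(reference, sample):
    print(change)
# (2, 'GAG', 'GTG', 'missense')
# (3, 'CTG', 'CTA', 'silent')
# (4, 'TGG', 'TGA', 'nonsense')
```

A silent change leaves the protein untouched, a missense change swaps one amino acid, and a nonsense change cuts the protein short. Insertions and deletions are not handled here because they need a real alignment step first.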

Hmm..interesting...can you tell me more about the sequencing of the genome?

The scientists constructing the genetic maps of each of our chromosomes pass on their information to another group of specialists, who perform "gene sequence analysis". When you hear on the news that the gene for a specific disease has been found, it is more than likely that it is a group of these people who has announced their discovery. Gene sequencing gets right to the heart of the genetic code, deciphering the exact sequence of lettered bases which compose a gene. As we mentioned earlier, there can be millions of these bases in a gene and so in the past, this process has taken a number of years to find the code for each gene.

But over the past 30 years, scientists have made some remarkable advances in gene sequencing technology. The methods are the same as they were in the beginning, but now it is possible to determine genetic sequences using machines. These automated sequencers allow scientists to learn more about our genes, faster than ever before. And now, because of these advances in technology, scientists around the world are working towards a day when every gene, on every chromosome has been sequenced. This is the mission of the Human Genome Project, perhaps the most ambitious scientific undertaking throughout history, I would say...

Alright..that makes sense..How do you locate genes?

By gene mapping... Of course, if we’re to find these mistakes in our genes, we must first find which genes control what, and even before that, we must know where they are. But where to start? You see, not every piece of DNA in the human body encodes a gene. Some of it is so-called "junk DNA" - sequences of DNA that do not code for proteins - which is scattered throughout the genome, blending in with the important gene-encoding DNA. The first task of scientists, then, was to find the positions of the genes which are used to make proteins, and to ignore all the junk pieces.

In 1971, when scientists devised a method to cut the large pieces of DNA on each chromosome into smaller, more manageable pieces, the job got a lot easier. Within each of these smaller pieces, scientists were finally able to locate the regions containing genes. As the positions of more and more genes were found, a "genetic map" was constructed which showed the positions of the genes relative to each other, and relative to the ends and center of the chromosomes. The science of locating these genes is called "Genetic Mapping" and although we now know the location of a number of very important genes, the map is far from complete. Scientists around the world are busy working to fill in the holes, to produce a complete map of the human genome.
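Cutting DNA into smaller pieces is easy to mimic on a computer. The Python sketch below is a minimal illustration, assuming a single strand and the recognition site of the restriction enzyme EcoRI (GAATTC, cut between the G and the first A); the input sequence and the function name digest are made up for the example.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut `dna` at every occurrence of `site`.

    `cut_offset` is the position inside the site where the cut falls
    (1 for EcoRI: G^AATTC). Only one strand is modelled here.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Hypothetical sequence with two EcoRI sites, for illustration only.
pieces = digest("AAGAATTCCCGAATTCTT")
print(pieces)                  # ['AAG', 'AATTCCCG', 'AATTCTT']
print([len(p) for p in pieces])
```

The fragment lengths are what a gel would show, and comparing the lengths produced by different enzymes is one way the relative positions of sites on a chromosome were worked out.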

OK, what is the deal with human genome...What is it...So many people talk about it...Tell me more.

The human body contains tens of trillions of cells. Most of these cells contain a nucleus (mature red blood cells do not have nuclei). Each nucleus has 23 pairs of chromosomes, formed during conception when 23 individual chromosomes from each parent come together to form the new offspring. DNA (deoxyribonucleic acid), the "molecule of life", is the chemical which makes up each of these chromosomes. Our chromosomes carry thousands of genes, which determine who we are - they define the traits and characteristics passed down to us from our parents. The term "human genome" refers to all of the DNA found in a human cell, including all of the genes. Early estimates put the number of human genes between 50,000 and 100,000, but analysis of the genome sequence points to a much lower figure, roughly 20,000 to 25,000 protein-coding genes, and the function of many of them is still unknown. Genes are difficult to locate because they are hidden in the packages of 23 pairs of chromosomes. Genes, stored inside the cell nucleus, are used by cells to make proteins, and it is these that define our individual characteristics. They are the instructions which lead to brown hair or blonde, blue eyes or green, and in some cases, whether we are susceptible to diseases such as cancer and diabetes.

What is the relationship between Biochemistry and Bioinformatics?

BIOCHEMISTRY forms the basis for understanding protein structure and genome structure. It deals with the chemical reactions and formulae of DNA and RNA. Biochemistry gives Bioinformatics an in-depth knowledge of protein structure, which helps in working out the chemical formulae of drugs against harmful diseases and in understanding the chemical reactions going on inside the cell nucleus.

The importance of Biochemistry in Bioinformatics can be looked at in two ways, viz. genomics and proteomics.

What is Biochemistry then?

Biochemistry is the study of the chemistry of living organisms. It includes identification of cellular molecules and their formation and degradation in cells, the production of energy by cells, and the reproduction of cells. Biochemistry seeks a molecular explanation for life, or the processes that support life. Biochemists investigate all aspects of cellular chemistry which is the foundation of the understanding of all of the biological sciences.

Many biochemists directly study or use the protein molecules called enzymes. Enzymes are the extremely efficient catalysts of cells. They allow the conversions of the chemicals of life to occur within cells, without the side-reactions that would occur in non-enzymatic systems. Some enzymes also act as control-points, to allow the chemical reactions to occur only when they are needed. In some instances vitamins are converted to organic chemicals called coenzymes that assist certain enzymes in their catalytic roles. Nucleic acids are information molecules that direct the formation of the proteins in the cells. Deoxyribonucleic acid (DNA) is the blueprint for reproduction or copying of all cellular elements. The isolation of DNA from one type of cell and its insertion into a different type of cell (recombinant DNA technology) can now be used to prepare large quantities of the gene product. For example, human insulin can be synthesized by easily grown bacteria containing the gene that encodes insulin.

Carbohydrates or sugars are not just sources of energy, but can be recognition molecules on the surfaces of cells. Lipids or fats are essential elements in the structure of the membranes that surround cells and intracellular organelles, and in intracellular signalling. A major objective in Biochemistry is to describe these functions in exact biochemical terms.

What experimental/scientific background should I have?

1. Cell Biology
2. Physical Chemistry

What computational background should I have to get in?

* Introduction to C
* Introduction to Probability and/or Statistics
* Introduction to Algorithms
* Knowledge of PERL

Why do we need bioinformatics?

Over the last 50 years, the amount of publicly available genetic data has grown exponentially. Genome research, such as the Human Genome Project, has resulted in vast amounts of sequencing data that will not result in useful products until the data are properly interpreted. Also, the number of available protein structures is increasing rapidly with the rise of high-throughput structural biology projects. The latest microarray technologies make it possible to measure the transcription levels of thousands of genes in a single experiment. To approach and integrate these datasets to understand biological processes holistically, advanced computational and analytical methods are needed.

What can I do with a Bioinformatics degree?

People can work at the interface between biochemistry, computer science, math and statistics, creating new solutions for high-throughput chemistry, designing analysis systems for drug design, and many other things.

Okay....Okay..What kind of research do these so-called "Bioinformaticists" do?

They will do one of the following three things:
1. Tool building.
They will be creating new programs and methods for analyzing and organizing data.
2. Usage of tools
Using existing programs and data to answer biologically interesting questions
3. Maintenance of tools
Setting up databases, translating biologists' questions into ones that programs can answer, keeping the tools working and the databases up to date.

What is Bioinformatics anyway?

Bioinformatics is the use of computers and statistics to make sense out of the huge mounds of data that are accumulating from high-throughput biological and chemical experiments, such as sequencing of whole genomes, DNA microarray chips, two-hybrid experiments, and tandem mass spectrometry.

Hi there, the objective of this blog is to answer the frequently asked questions about Bioinformatics. I will be updating this page from time to time, so do check this page periodically.