Bioinformatics, also called Computational Biology, is an
interdisciplinary area at the intersection of computer science,
mathematics, and biology. Bioinformatics aims to formalize problems
that originate in molecular or evolutionary biology, in biotechnology,
or in medicine, and to propose computable, algorithmic solutions to
these problems.
The genome encodes all the inherited information (genes) needed to
build an organism from the initial egg (fertilized cell). Genes code
for proteins, which adopt a specific 3D structure and fulfill one or
more biological functions (activities). The central dogma of biology
states that information flows from the genome to the transcripts
and then to the proteins.
Typical questions arising from these disciplines include:
Can one infer the function or the activity of a gene or of a
protein from what is known about other genes or proteins that are
similar in sequence to the one under investigation?
This has led to the problems of comparing two sequences (sequence
comparison) and, for an input sequence, of searching a sequence
databank for the most similar sequences. These are string algorithm
problems [6].
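As an illustration of sequence comparison, the following is a minimal sketch of the classic dynamic-programming edit distance, together with a naive databank search built on it. The function names `edit_distance` and `closest`, the unit costs, and the toy sequences are assumptions made for this example; practical databank search tools rely on far faster heuristics and on biologically motivated scoring schemes.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def closest(query: str, databank: list[str]) -> str:
    """Return the databank sequence with the smallest edit distance
    to the query (exhaustive scan, for illustration only)."""
    return min(databank, key=lambda s: edit_distance(query, s))


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
    print(closest("ACGT", ["TTTT", "ACGA", "GGGG"]))
```

The exhaustive scan costs time proportional to the product of the query length and the total databank size, which is one reason why this question became an algorithmic research topic in its own right.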
When several genes or proteins have a similar function, which
parts of their sequences, presumably shared, are responsible for this
function?
The comparison or alignment of multiple sequences [2], as well as
sequence motif inference [1], are two problems that formalize this
biological question. These are also algorithmic problems on strings,
as well as inference problems in machine learning.
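To make the link between alignment and shared sequence parts concrete, here is a minimal sketch that scans the columns of an already computed multiple alignment and reports the conserved ones. It assumes the alignment is given as equal-length strings with `-` for gaps; the name `conserved_columns`, the `threshold` parameter, and the toy sequences are illustrative assumptions, and this simple column count is not one of the motif inference methods of [1].

```python
from collections import Counter


def conserved_columns(aligned: list[str], threshold: float = 1.0):
    """Return (position, symbol) for every alignment column in which the
    most frequent non-gap symbol reaches the given frequency threshold."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty set of equal-length sequences")
    hits = []
    for pos, column in enumerate(zip(*aligned)):
        symbol, count = Counter(column).most_common(1)[0]
        if symbol != "-" and count / len(aligned) >= threshold:
            hits.append((pos, symbol))
    return hits


if __name__ == "__main__":
    alignment = ["ACGTGCA", "TCGTGAA", "GCGTGTA"]  # made-up aligned sequences
    print(conserved_columns(alignment))
```

Runs of consecutive conserved positions are candidate regions for the shared function; lowering `threshold` tolerates some variation within a column.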
Can one infer the three-dimensional structure of a protein from
its sequence or from a set of similar sequences? According to the
Central Dogma, the information in the sequence is sufficient for the
cell to fold the amino-acid chain that constitutes the protein.
This gave rise to the inference problem of structure prediction.
A biotechnological question arose in DNA sequencing, especially
in genome sequencing, given that the basic technique can only read
sequences up to 800 bp long. How can one recover a large DNA sequence
from a set of small (< 800 bp) DNA pieces?
This problem is known as sequence assembly and was successfully
solved for the sequencing of the fly and human genomes.
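The idea behind sequence assembly can be sketched with a greedy heuristic that repeatedly merges the two fragments sharing the longest suffix-prefix overlap. This is only an illustration under strong assumptions (error-free reads, a single strand, no repeats); the names `overlap` and `greedy_assemble` and the toy reads are hypothetical, and this is not the procedure used in the genome projects mentioned above.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str]) -> list[str]:
    """Greedily merge reads by largest overlap; returns the contigs left
    once no two of them overlap any more."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are entirely contained in another read.
    contigs = [r for r in contigs
               if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        # Keep the other contigs, except those now contained in the merge.
        contigs = [r for idx, r in enumerate(contigs)
                   if idx not in (best_i, best_j) and r not in merged]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"]))
```

Greedy merging can go wrong when the target sequence contains repeats longer than the reads, which is one of the main difficulties that real assemblers have to address.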
A paradigmatic evolutionary question is: how can one recover the
evolutionary relationships among a set of species from the sequences
of the genes they bear?
This is the problem of phylogeny reconstruction, for which many
algorithmic and mathematical approaches have been proposed [5].
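As one simple instance among these approaches, the sketch below builds a tree with UPGMA, a distance-based clustering method, from the proportion of differing positions between aligned gene sequences. The choice of UPGMA, the function names `p_distance` and `upgma`, and the made-up sequences are assumptions for illustration; UPGMA presumes roughly constant evolutionary rates, and the text above does not single out any particular method.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of positions at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("expected non-empty aligned sequences of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]):
    """Return a rooted tree as nested tuples, built by repeatedly joining
    the two closest clusters (average linkage)."""
    trees = list(seqs)            # current clusters, initially the leaves
    sizes = [1] * len(trees)      # number of leaves in each cluster
    d = [[p_distance(seqs[a], seqs[b]) for b in trees] for a in trees]
    while len(trees) > 1:
        n = len(trees)
        # Closest pair of clusters (i < j).
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: d[ij[0]][ij[1]])
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the new cluster to the others.
        new_row = [(d[i][k] * sizes[i] + d[j][k] * sizes[j])
                   / (sizes[i] + sizes[j]) for k in keep]
        d = [[d[r][c] for c in keep] + [new_row[idx]]
             for idx, r in enumerate(keep)]
        d.append(new_row + [0.0])
        trees = [trees[k] for k in keep] + [(trees[i], trees[j])]
        sizes = [sizes[k] for k in keep] + [sizes[i] + sizes[j]]
    return trees[0]


if __name__ == "__main__":
    genes = {  # made-up aligned sequences, for illustration only
        "human": "ACGTACGT",
        "chimp": "ACGTACGA",
        "mouse": "ACGAATGA",
        "fly":   "TTGAATCC",
    }
    print(upgma(genes))
```

The nested tuples group the most similar sequences first, so the innermost pair corresponds to the most closely related species under this method's assumptions.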
Besides sequence comparison, molecular structure prediction, and
phylogeny reconstruction, other cornerstone problems are: sequence
structural analysis (e.g., finding repeats in the sequence), gene
expression analysis, gene regulation or metabolic network inference
and simulation, drug design, etc. (see [3, 4] for a recent survey).
Once solutions have been developed for a problem, bioinformatics is
concerned with applying these methods to real and/or simulated data
to check the quality of their answers, their applicability in
practice, and their relevance to the biological question that gave
rise to the problem, and thereby to critically assess the
formalization of the problem.
Biologists and bioinformaticians collaborate on applying
bioinformatics methods to a specific instance of the problem, in order
to gain new insights into this instance and to infer biological
knowledge.
On the other hand, questions arising from the realm of biology and
medicine have given rise to many new problems, which have forced
computer scientists and mathematicians to develop new concepts in
their fields or to investigate new aspects of already established
concepts. Computer science areas like string algorithms or formal
languages bear witness to this.
Sometimes solving a single bioinformatics problem may require research
in mathematics, combinatorics, probability, statistical inference,
machine learning, algorithmics, theoretical computer science, etc.
Together with the diversity of applications, this may explain why
bioinformatics is a broad (and loosely delimited) interdisciplinary
area of research.
Thomas Lengauer, editor.
Bioinformatics - From Genome to Drugs, volume I: Basic
Technologies of Methods and Principles in Medicinal Chemistry.
Wiley-VCH Verlag, Weinheim, 2002.
Thomas Lengauer, editor.
Bioinformatics - From Genome to Drugs, volume II:
Applications of Methods and Principles in Medicinal Chemistry.
Wiley-VCH Verlag, Weinheim, 2002.
David Sankoff and Joseph B. Kruskal, editors.
Time Warps, String Edits and Macromolecules: the Theory and
Practice of Sequence Comparison.
CSLI Publications, second edition, 1999.