Introduction to bioinformatics and systems biology

Aims

This course provides an introduction to bioinformatics and to the principles of the emerging science of quantitative systems biology. It aims at: describing the context that ignited the revolution in high-throughput biomolecular analysis, which changed the way problems in the life sciences are approached and resolved; defining the fields of bioinformatics and systems biology and discussing the new opportunities, challenges and directions in the life sciences; presenting the high-throughput “omic” technological platforms and methodologies for the analysis of gene sequence, gene expression, protein concentration, metabolic flux distribution and metabolite concentration; providing an introduction to the multivariate statistical analysis used in high-throughput data analysis; and presenting examples of integrated “omic” studies of plant physiology.

Prerequisites

Students should have had at least undergraduate-level exposure to Genetics, Molecular Biology, Cell Biology, Biochemistry and Statistics. Knowledge of linear algebra and programming is recommended but not required.

Learning Outcomes

After attending the class, the students: will have learned the basic principles and aims of bioinformatics and systems biology research; will have understood the need for holistic and systematic analyses of biological systems in the post-genomic era; will have learned about the technological platforms for gene expression analysis (DNA microarrays) and, mainly, for metabolic network analysis, including metabolic flux analysis and metabolomics; will have been taught the basic methods of multivariate statistical analysis used for high-throughput biomolecular data analysis; will have been introduced to, and have used, an open-access software package for high-throughput biomolecular data analysis; will have been presented with a thorough example of an integrated “omic” analysis of plant physiology; and will have been exposed to recent publications in the molecular systems biology field and provided with resources for future reference.

Syllabus

    This unit will present the history of the high-throughput revolution and the technological advances that triggered a new mentality in approaching problems in the life sciences and opened new opportunities for research. In this context, the terms and fields of bioinformatics and systems biology will be defined, and the need for integrated, systemic and systematic analyses will be discussed. Next, current analytical and computational challenges in the post-genomic era will be discussed, focusing on (a) metabolic network analysis through metabolic flux analysis and metabolomics, (b) the normalization and multivariate statistical analysis techniques for “omic” data (a short description of transcriptomic and proteomic analyses will be provided, attempting to minimize the overlap with the “microarray analysis” course), and (c) the challenges of integrated “omic” analyses, including the data storage requirements and the need for tools to upgrade the information content of these databases and to visualize the integrated results. Finally, a case study of tomato plant hydroponic cultures will be presented.

Content Delivery

    On the first day, the students will be introduced to the history of the genomic revolution, from the discovery of DNA as the genetic material of an organism in the mid-1940s, to the invention of recombinant DNA technology in the 1970s, which eventually led to the technological revolution of fast sequencing methods and the development of DNA microarrays, through to the full sequencing of the human genome in 2003. I will discuss the effect that this technological revolution in high-throughput biomolecular analysis had on the way problems in the life sciences are now approached, which led to the emergence of the field of bioinformatics and the new post-genomic “quantitative systems biology” era. Following the chronological order of the technological inventions, the new opportunities, challenges and directions for research in the systems biology era will be briefly discussed at all levels of cellular function (metabolic, proteomic, transcriptional). A PowerPoint presentation will be used, and further details and explanations will be provided on the board. The teaching will be interactive; I will engage the students in conversation about the biology, about the interconnection between the various levels of cellular function, about what they know of the genomic revolution, and about how they perceive the new opportunities of the emerging systems biology era. The new knowledge will be presented through their answers, in a way that they can easily understand and assimilate.
On the second day of classes, emphasis will be given to transcriptomic analysis using microarrays and next-generation sequencing (NGS). A detailed review of both techniques will be provided, including the challenges that still exist. Students will be provided with the PowerPoint slides ahead of the class, and the same level of interaction with the students (as described earlier) will be pursued.
On the third day of classes, the students will be introduced to proteomic and metabolomic analyses. Emphasis will be given to the need for data normalization after data acquisition, so that the datasets become comparable. The rest of the day will be dedicated to multivariate statistical analysis, beginning with the need for and importance of such analysis in handling high-throughput biomolecular data and in studying cellular physiology as a whole. The students will be presented with the major clustering, projection/visualization and hypothesis testing algorithms. Students will be provided with the PowerPoint slides ahead of the class, and the same level of interaction with the students (as described earlier) will be pursued.
On the fourth day, the students will work in a computer room with one of the main publicly available software packages for high-throughput data analysis, and will use the techniques that they will have learned in the morning to analyze a given dataset.
After the students have been presented with all the available technological platforms for “omic” analysis at the different levels of cellular function, I will initiate a discussion of the need for integrated “omic” analyses, the big opportunities and expectations for them, and the challenges of designing such experiments. The value of time-series analyses will also be discussed (how many time points, how many samples, etc.). Further major challenges of this type of analysis are data storage, the need for integrated databases, and the analysis and visualization of the acquired information. The students will contribute their own opinions and points of view on what needs to be done and how, especially concerning plant research. A case study on tomato plant “omic” analysis will also be presented. As most of the technologies discussed during the class are employed in this study, and the challenges of experimental design and data handling were also encountered in it, the students will have the chance to see a real-case scenario of the new opportunities, challenges and directions of quantitative systems biology research.

Coursework And Assignment Details

    The students will take a written exam on the course material, based on the presentations and the notes provided to them.