Preface
Science is fundamentally about learning from uncertain and incomplete data. We cannot measure everything exactly, and our data may be numerous, high-dimensional, and correlated, with potentially subtle underlying structures.
This book introduces data science methods by analyzing data in astronomy. It presents the fundamental concepts of probability, statistics, and machine learning and explores the computational tools we need to extract meaning from data.
Learning from data inherently involves data modeling, which is the process of finding or defining a model that represents the data's underlying structure. As a result, fitting and comparing models is one of the cornerstones of data analysis. This process of obtaining meaningful descriptions from data is called inference.
This book is aimed primarily at junior scientists and at scientists who want a more substantial base in data science. It is not intended to be a complete introduction to data science, but I hope it will serve as a good entry point and reference. It assumes some knowledge of calculus but no specific experience with probability, statistics, or machine learning. In this book, I emphasize the concepts and the understanding of the methods through examples, using both analytical and numerical methods. This manuscript is not a mathematical textbook; it aims instead to develop intuition and knowledge of the methods. For complex derivations and formal proofs, I will point to references in the literature.
I hope this book will also be helpful to more experienced scientists with limited exposure to these techniques, and to those raising awareness of them, by providing an overview of the main concepts and methods. One of my goals is to show how one can convert ideas into working code and solutions, and this is one motivation for using Jupyter Book as the format of this book. Throughout these pages, you will find Python code and notebooks that you can run with Binder.