Essentials of Python programming for Data Science#
In this part, I aim to introduce the fundamentals of using Python for data science. We’ll cover data structures, basic programming, code testing, and documentation, and we’ll use libraries like NumPy and Pandas for data exploration and analysis.
Through these chapters, you will first learn how to
translate fundamental programming concepts, such as loops and conditionals, into Python code;
understand the elementary data structures, write functions, and assess whether they are correct via unit testing;
apply good practice to abstract code (e.g., into functions or classes) to make it more modular and robust;
produce human-readable code that incorporates best practices of programming, documentation, and coding style;
use NumPy to perform typical data wrangling and computational tasks in Python;
use Pandas to create and manipulate data structures like Series and DataFrames.
In later chapters, we will cover more advanced topics, such as code optimization, Cython, and parallelization.
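As a small taste of the first learning goals listed above, here is a minimal sketch that combines a loop, a conditional, a function, and a unit test. The names `count_even` and `test_count_even` are hypothetical and chosen only for illustration. The test is written with plain `assert` statements, which is one simple way to check a function and not necessarily the approach used in later chapters.

```python
def count_even(numbers):
    """Return how many integers in `numbers` are even."""
    count = 0
    for n in numbers:      # loop over every element
        if n % 2 == 0:     # conditional: keep only even values
            count += 1
    return count


def test_count_even():
    """Tiny unit test: compare results with answers worked out by hand."""
    assert count_even([1, 2, 3, 4]) == 2
    assert count_even([1, 3, 5]) == 0
    assert count_even([]) == 0


# Run the test; an AssertionError is raised if any check fails.
test_count_even()
```

Keeping the logic in a small function makes it easy to test in isolation, which is the kind of modularity the list above refers to.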
Why Python?#
Python is a programming language named after the sketch troupe Monty Python. It is an interpreted, high-level, general-purpose language, in contrast with compiled programming languages such as C/C++ and Fortran.
For a few reasons, Python is (today) one of the most common languages in astronomy. First, it’s easy to learn and use. Python is (relatively) new, and it removed many of the archaic annoyances of older languages. In programming terms, it is high-level: closer to human language than to machine-level language (such as assembly or binary code).
Second, Python is open-source and free, unlike proprietary languages (such as IDL or Matlab), which require you to buy expensive licenses.
Finally, Python is supported by a large community of users collaborating to improve the Python ecosystem (e.g., NumPy, Matplotlib, Astropy).
Hello World!#
It’s a tradition to begin any instructional text by showing the canonical phrase “Hello World” on the screen.
Python:
    print("Hello World!")

C++:
    #include <iostream>

    int main(const int argc, const char **argv) {
        std::cout << "Hello World!";
        return 0;
    }

Java:
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello World!");
        }
    }

Fortran:
    PROGRAM HelloWorld
        PRINT *, "Hello World!"
    END PROGRAM HelloWorld
A programmer can be 5-10 times more productive in Python than in other languages.

Fig. 1 Quantitative study based on 80 programmers implementing the same program that finds, for a given phone number, all possible encodings by words, and prints them. Analysis by Lutz Prechelt.#
No reason to replace the other languages; rather, cooperate#
There are reference languages in science, such as C/C++ and Fortran. These workhorses of science support zillions of codes optimized for runtime performance. There is no reason to replace or translate these codes at this time; it is better to cooperate with them or interface them with Python (e.g., f2py, SWIG; we’ll touch upon this topic later).
The drawback of these languages is that they are not really general-purpose and offer relatively primitive data types. Some require manual memory management (free the pointers!), and they often suffer from the curse of a slow edit/compile/test cycle.
Will Python remain as popular in the future? Should you wait for the next language?#
The only certainty is that Python will live for a long time on any computer, similar to C/C++, Fortran, and Java. Its popularity is still growing, especially with data science libraries developed by gigantic companies such as NVIDIA, Google, Facebook, Amazon, and others. Learning Python will always be beneficial if you later move to another growing language like Julia or Rust, which build on lessons from Python.
Setting the foundations and requirements for scientific programming#
Researchers and scientific software developers write software daily, but only a few of them have received specific training. We learn by doing, and we often learn from our mistakes. This is not a bad thing, but it is not always the most efficient way to learn. We can also learn a lot from open source software and open science practices.
Through this book, I hope to convince you that good programming practices are not difficult to implement and can make a HUGE difference!