Python: Scripts and Notebooks

Python’s flexibility, ease of use, and rich libraries make it well suited to data analysis and scientific computation. Its wide collection of scientific libraries includes, for instance, NumPy, Pandas, Matplotlib, and SciPy. Scientists can import these libraries to perform complex computations and produce data visualizations.

A typical Python application may parse some command line arguments, read a data file, and call one or more libraries to process the data and parameters, finally producing output as new files or plots. From end to end, this corresponds to a sequence of Python instructions that we do not want to retype every time we run the analysis. Instead, we use a script.

A Python script is a sequence of instructions written in the Python programming language that can be executed by a Python interpreter. Python scripts can be used for various tasks such as automating repetitive tasks, processing data, building applications, and more. They are typically saved in plain text files with the .py file extension.

To execute a Python script, you need a Python interpreter installed on your computer. You can then run the script from the command line by typing:

python myscript.py

Python scripts can be simple or complex depending on the task at hand. They can include variables, conditional statements, loops, functions, classes, and modules. With the help of libraries and frameworks, Python scripts can be used for a wide range of applications, including web development, scientific computing, data analysis, machine learning, and more.

Typical Python script and how to run it

To start with the basics, here is a Hello World script in Python:

%%file sample_script.py
#!/usr/bin/env python3
""" Example of python script """

print("hello world!")

which you can run with the Python interpreter:

! python3 sample_script.py
hello world!

The first line in this example is a shebang (#!), which tells your Unix shell how to execute the file. It is needed only if you plan to run the script directly as an executable, as shown below. The second line contains a triple-quoted string (the docstring), which documents your script. This is the place to describe what the script does, possibly state your name as the author, etc. (We’ll see below why this is important.)

! chmod +x sample_script.py
! ./sample_script.py
hello world!

The last line of this script calls the print function with the argument "hello world!". print is a built-in Python function that writes messages to standard output, which is usually the terminal.

For a slightly more advanced script, we create a function called greetings, which takes three arguments.

%%file sample_script.py

def greetings(first_name, last_name, where):
  """ Greetings to you """
  print(f"Hello, {first_name:s} {last_name:s}! How's the weather today in {where:s}?")

first_name = "Joe"
last_name = "Doe"
where = "Antartica"

greetings(first_name, last_name, where)

! python3 sample_script.py
Hello, Joe Doe! How's the weather today in Antartica?

The __name__ == '__main__' condition

Writing more complex scripts is relatively straightforward, but you may not want very long files with thousands of lines of code. Splitting code across multiple files is simple in Python thanks to the import mechanism, which lets one script use almost anything defined in another. For example, one file can contain functions that read and write data from/to a remote database, while another deals with the complex training of your neural network model. The most natural way to separate these two is to group all I/O functionality in one file and the training operations in another. The training file can then import the I/O file through import ... or from ... import A, B, C.
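
The two import forms mentioned above can be sketched with the standard library math module, used here only as a stand-in for your own I/O or training file (the variable names are illustrative):

```python
# Form 1: import the whole module and reach its names through the module
import math

area = math.pi * 2.0 ** 2  # area of a disk of radius 2

# Form 2: import selected names directly into the current namespace
from math import pi, sqrt

radius = sqrt(area / pi)  # recover the radius from the area
print(radius)
```

The first form keeps the origin of each name visible (math.pi), while the second is shorter to type but can hide where a name comes from in a long file.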

import sample_script
Hello, Joe Doe! How's the weather today in Antartica?

It is important to realize that import makes Python run the script, which is why we see the output of the print line. However, Python can distinguish between the two situations (running a file versus importing it) by checking the name of the executing environment, i.e.:

if __name__ == '__main__':
  ...

Note

By convention, a name starting with a single underscore (e.g., _variablename) is meant for internal use. Names with two leading and two trailing underscores (e.g., __name__) are even more special: Python reserves them for its own machinery. These are called dunder, for “Double Under (Underscores)”.

The block under this test is executed only when we run the file as a script, not when we import it. __main__ is the name of the top-level script environment.
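
To see the two values of __name__ directly, here is a minimal sketch that writes a one-line file to a temporary directory, imports it, and then executes it the way the interpreter would run a script. The file name whoami_demo.py and the variable seen_name are hypothetical, chosen only for this illustration; importlib and runpy are standard library modules.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical one-line module: it records the name it is executed under
SOURCE = "seen_name = __name__\n"


def observe_names():
    """Return the __name__ seen on import and when run as a script."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "whoami_demo.py"
        path.write_text(SOURCE)

        # Load the file as a module called "whoami_demo" (what import does)
        spec = importlib.util.spec_from_file_location("whoami_demo", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Execute the same file as the top-level script
        script_globals = runpy.run_path(str(path), run_name="__main__")

    return module.seen_name, script_globals["seen_name"]


imported_name, script_name = observe_names()
print(imported_name)  # name seen when imported
print(script_name)    # name seen when run as a script
```

On import the file sees its own module name, while as a script it sees __main__, which is exactly what the if test relies on.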

%%file sample_script.py

def greetings(first_name, last_name, where):
  """ Greetings to you """
  print(f"Hello, {first_name:s} {last_name:s}! How's the weather today in {where:s}?")

first_name = "Joe"
last_name = "Doe"
where = "Antartica"

if __name__ == '__main__':
  greetings(first_name, last_name, where)

! python3 sample_script.py
Hello, Joe Doe! How's the weather today in Antartica?

Now the following import does not print anything. Note, however, that everything defined outside the __main__ block is still available:

import sample_script

print("Variables:", sample_script.first_name, sample_script.last_name, sample_script.where)
Variables: Joe Doe Antartica

import sample_script
help(sample_script)
Help on module sample_script:

NAME
    sample_script

FUNCTIONS
    greetings(first_name, last_name, where)
        Greetings to you

DATA
    first_name = 'Joe'
    last_name = 'Doe'
    where = 'Antartica'

FILE
    /home/runner/work/astro_ds/astro_ds/astro_ds/chapters/python/sample_script.py

import sample_script
help(sample_script.greetings)
Help on function greetings in module sample_script:

greetings(first_name, last_name, where)
    Greetings to you
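
The help() output above is built from docstrings, which is why the documentation string mentioned earlier matters. As a minimal sketch (greetings is re-declared here only so that the block is self-contained), the same text can be read from the __doc__ attribute or through the standard inspect.getdoc helper:

```python
import inspect


def greetings(first_name, last_name, where):
    """ Greetings to you """
    print(f"Hello, {first_name:s} {last_name:s}! How's the weather today in {where:s}?")


# The docstring is stored on the function object itself
raw_doc = greetings.__doc__

# inspect.getdoc returns the docstring with its indentation cleaned up
clean_doc = inspect.getdoc(greetings)

# strip() removes the padding spaces left inside the triple quotes
print(raw_doc.strip())
print(clean_doc.strip())
```

A function or module without a docstring has __doc__ set to None, and help() then has nothing to show beyond the signature.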

Using command line arguments

The previous script is very static: the variables are hard-coded. However, a script can read arguments passed on the command line through the sys module and its sys.argv variable.

%%file sample_script.py
#!/usr/bin/env python3
import sys

if __name__ == '__main__':
  print("You passed {0:,d} arguments to the script:\n\t {1:s}".format(len(sys.argv), str(sys.argv)))

! python sample_script.py Lorem ipsum dolor sit amet
You passed 6 arguments to the script:
	 ['sample_script.py', 'Lorem', 'ipsum', 'dolor', 'sit', 'amet']
! chmod +x sample_script.py
! ./sample_script.py "Lorem ipsum dolor sit amet"
You passed 2 arguments to the script:
	 ['./sample_script.py', 'Lorem ipsum dolor sit amet']

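Regardless of how you call your script, the first element of sys.argv (sys.argv[0]) is the script name as it was invoked (following Unix script conventions), so the count printed above includes it. Note also that the shell splits arguments on spaces unless they are quoted, which is why the second call passes a single string.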
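
Reading sys.argv by hand quickly becomes tedious once a script takes several arguments. One common option, not used in the text above, is the standard library argparse module. The sketch below is only an illustration: the argument names and the build_parser and main helpers are assumptions made for this example, and greetings returns its message instead of printing it.

```python
import argparse


def greetings(first_name, last_name, where):
    """Build the greeting message (returned rather than printed)."""
    return f"Hello, {first_name} {last_name}! How's the weather today in {where}?"


def build_parser():
    """Describe the expected command line (argument names are illustrative)."""
    parser = argparse.ArgumentParser(description="Greet someone by name.")
    parser.add_argument("first_name", help="given name")
    parser.add_argument("last_name", help="family name")
    parser.add_argument("where", help="location to ask about")
    return parser


def main(argv=None):
    """Parse argv; when argv is None, argparse reads sys.argv[1:] itself."""
    args = build_parser().parse_args(argv)
    return greetings(args.first_name, args.last_name, args.where)


# An explicit list stands in for what the shell would pass after the script name
message = main(["Ada", "Lovelace", "London"])
print(message)
```

In a real script, the last two lines would sit under the if __name__ == '__main__' test and call main() without arguments, so that the values come from the command line.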

Exercise

Below is an asteroids.csv file containing information on 50 near-Earth asteroids (taken from the Small-Body Database). Given this list, write a script that finds all asteroids whose semi-major axis (a, in AU) is within 0.2 AU of Earth’s, i.e. |a - 1| < 0.2, and whose eccentricity (e) is less than 0.5. Print the selection twice: once in alphabetical order, and once sorted by the distance of the semi-major axis from Earth’s (|a - 1|).

%%file asteroids.csv
name,a,e,orbclass
Eros,1.457916888347732,0.2226769029627053,AMO
Albert,2.629584157344544,0.551788195302116,AMO
Alinda,2.477642943521562,0.5675993715753302,AMO
Ganymed,2.662242764279804,0.5339300994578989,AMO
Amor,1.918987277620309,0.4354863345648127,AMO
Icarus,1.077941311539208,0.826950446001521,APO
Betulia,2.196489260519891,0.4876246891992282,AMO
Geographos,1.245477192797457,0.3355407124897842,APO
Ivar,1.862724540418448,0.3968541470639658,AMO
Toro,1.367247622946547,0.4358829575017499,APO
Apollo,1.470694262588244,0.5598306817483757,APO
Antinous,2.258479598510079,0.6070051516585434,APO
Daedalus,1.460912865705988,0.6144629118218898,APO
Cerberus,1.079965807367047,0.4668134997419173,APO
Sisyphus,1.893726635847921,0.5383319204425762,APO
Quetzalcoatl,2.544270656955212,0.5704591861565643,AMO
Boreas,2.271958775354725,0.4499332278634067,AMO
Cuyo,2.150453953345012,0.5041719257675564,AMO
Anteros,1.430262719980132,0.2558054402785934,AMO
Tezcatlipoca,1.709753263222791,0.3647772103513082,AMO
Midas,1.775954494579457,0.6503697243919138,APO
Baboquivari,2.646202507670927,0.5295611095751231,AMO
Anza,2.26415089613359,0.5371603112900858,AMO
Aten,0.9668828078092987,0.1827831025175614,ATE
Bacchus,1.078135348117527,0.3495569270441645,APO
Ra-Shalom,0.8320425524852308,0.4364726062545577,ATE
Adonis,1.874315684524321,0.763949321566,APO
Tantalus,1.289997492877751,0.2990853014998932,APO
Aristaeus,1.599511990737142,0.5030618532252225,APO
Oljato,2.172056090036035,0.7125729402616418,APO
Pele,2.291471988746353,0.5115484924883255,AMO
Hephaistos,2.159619960333728,0.8374146846143349,APO
Orthos,2.404988778495748,0.6569133796135244,APO
Hathor,0.8442121506103012,0.4498204013480316,ATE
Beltrovata,2.104690977122337,0.413731105995413,AMO
Seneca,2.516402574514213,0.5708728441169761,AMO
Krok,2.152545170235639,0.4478259793515817,AMO
Eger,1.404478323548423,0.3542971360331806,APO
Florence,1.768227407864309,0.4227761019048867,AMO
Nefertiti,1.574493139339916,0.283902719273878,AMO
Phaethon,1.271195939723604,0.8898716672181355,APO
Ul,2.102493486378346,0.3951143067760007,AMO
Seleucus,2.033331705805067,0.4559159977082651,AMO
McAuliffe,1.878722427225527,0.3691521497610656,AMO
Syrinx,2.469752836845105,0.7441934504192601,APO
Orpheus,1.209727780883745,0.3229034563257626,APO
Khufu,0.989473784873371,0.468479627898914,ATE
Verenia,2.093231870619781,0.4865133359612604,AMO
Don Quixote,4.221712367193639,0.7130894892477316,AMO
Mera,1.644476057737928,0.3201425983025733,AMO

Solution:

%%file exercise.py
import csv

def is_selected(row):
  """ semi-major axis (`a - 1`) within 0.2 AU of earth, and with eccentricities (`e`) less than 0.5 """
  _, a, e, _ = row
  return abs(float(a) - 1) < 0.2 and float(e) < 0.5


def get_selected_data(fname, select_function):
  """ extract selected asteroids from the CSV file `fname` """
  data = []
  with open(fname, newline='') as incsv:
    reader = csv.reader(incsv)
    next(reader) # skip header
    for row in reader:
      if select_function(row):
        data.append(row)
  return data


def row2str(row):
  """ Nicely print a row """
  return "{0:>12s}  {1:.5f}  {2:.5f}  {3:s}".format(
    row[0], float(row[1]), float(row[2]), row[3])


if __name__ == '__main__':
  fname = 'asteroids.csv'
  data = get_selected_data(fname, is_selected)
  print("Sorted by name")
  for obj in sorted(data, key=lambda row: row[0]):
    print(row2str(obj))
  print('')  # empty line
  print("Sorted by semi-major axis")
  for obj in sorted(data, key=lambda row: abs(float(row[1]) - 1)):
    print(row2str(obj))

! python exercise.py
Sorted by name
        Aten  0.96688  0.18278  ATE
     Bacchus  1.07814  0.34956  APO
    Cerberus  1.07997  0.46681  APO
      Hathor  0.84421  0.44982  ATE
       Khufu  0.98947  0.46848  ATE
   Ra-Shalom  0.83204  0.43647  ATE

Sorted by semi-major axis
       Khufu  0.98947  0.46848  ATE
        Aten  0.96688  0.18278  ATE
     Bacchus  1.07814  0.34956  APO
    Cerberus  1.07997  0.46681  APO
      Hathor  0.84421  0.44982  ATE
   Ra-Shalom  0.83204  0.43647  ATE
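
As an alternative sketch of the same selection, csv.DictReader lets each column be accessed by its header name instead of by position. To stay self-contained, this block reads four rows copied from the table above out of an in-memory string instead of asteroids.csv; the select helper and its parameter names are assumptions made for this illustration.

```python
import csv
import io

# Four rows copied from asteroids.csv, kept in memory for the example
SAMPLE = """name,a,e,orbclass
Eros,1.457916888347732,0.2226769029627053,AMO
Aten,0.9668828078092987,0.1827831025175614,ATE
Icarus,1.077941311539208,0.826950446001521,APO
Khufu,0.989473784873371,0.468479627898914,ATE
"""


def select(rows, max_offset=0.2, max_ecc=0.5):
    """Keep rows with |a - 1| < max_offset and e < max_ecc."""
    selected = []
    for row in rows:
        a, e = float(row["a"]), float(row["e"])
        if abs(a - 1) < max_offset and e < max_ecc:
            selected.append({"name": row["name"], "a": a, "e": e,
                             "orbclass": row["orbclass"]})
    return selected


selected = select(csv.DictReader(io.StringIO(SAMPLE)))

# Sort once by name and once by distance of the semi-major axis from 1 AU
by_name = [r["name"] for r in sorted(selected, key=lambda r: r["name"])]
by_offset = [r["name"] for r in sorted(selected, key=lambda r: abs(r["a"] - 1))]
print(by_name)    # ['Aten', 'Khufu']
print(by_offset)  # ['Khufu', 'Aten']
```

Converting a and e to floats once, at selection time, avoids repeating float(...) in every sort key.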
# Clean up
! rm -f exercise.py asteroids.csv