Python: Scripts and Notebooks#
Python’s flexibility, ease of use, and rich libraries make it an ideal language for scientists working on data analysis and scientific computation. Its wide collection of libraries and packages for scientific computing includes, for instance, NumPy, Pandas, Matplotlib, SciPy, and many more. Scientists can import these libraries and use them to perform complex computations and data visualizations.
A typical Python application may parse some command line arguments, read a data file, and call one or more libraries to process the data and parameters, finally producing output as new files or plots. From end to end, this corresponds to a sequence of Python instructions that we do not want to retype every time we run the analysis. Instead, we use a script.
A Python script is a sequence of instructions written in the Python programming language that can be executed by a Python interpreter.
Python scripts can be used for various tasks such as automating repetitive tasks, processing data, building applications, and more. They are typically saved in plain text files with the .py file extension.
To execute a Python script, you need to have a Python interpreter installed on your computer. Once you have the interpreter installed, you can run the script from the command line by typing
python myscript.py
Python scripts can be simple or complex depending on the task at hand. They can include variables, conditional statements, loops, functions, classes, and modules. With the help of libraries and frameworks, Python scripts can be used for a wide range of applications, including web development, scientific computing, data analysis, machine learning, and more.
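As a rough illustration of the read, process, and output sequence described above, here is a minimal sketch. The column name `value`, the helper names, and the in-memory text standing in for real files are all assumptions made for this example.

```python
import csv
import io
import statistics


def read_values(handle):
    """Read the numeric column 'value' (hypothetical layout) from CSV text."""
    return [float(row["value"]) for row in csv.DictReader(handle)]


def summarize(values):
    """Process step: compute a few summary statistics."""
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}


def write_report(summary, handle):
    """Output step: write one 'key: value' line per statistic."""
    for key, val in summary.items():
        handle.write(f"{key}: {val}\n")


# In-memory stand-ins for an input data file and an output file
source = io.StringIO("value\n1.0\n2.0\n6.0\n")
report = io.StringIO()

write_report(summarize(read_values(source)), report)
print(report.getvalue())
```

In a real script, the two `io.StringIO` objects would be replaced by files opened with `open(...)`, and the file names would typically come from the command line.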
Typical Python script and how to run it#
To start with the basics, here is a Hello World script in Python:
%%file sample_script.py
#!/usr/bin/env python3
""" Example of python script """
print("hello world!")
which you can run with the Python interpreter
! python3 sample_script.py
hello world!
The first line in this example is a shebang (#!), which tells your Unix shell which interpreter to use to execute the file. This line is only needed if you plan to run the file directly as an executable. The second line contains a triple-quoted string, which is the documentation (docstring) of your script. This is the place to describe what the script does, and possibly to state your name as the author, etc. (We’ll see below why this is important.) To run the script directly, first make it executable:
! chmod +x sample_script.py
! ./sample_script.py
hello world!
The last line of this script calls the print function with the argument "hello world!". print is a built-in Python function that writes messages to the standard output, here the command line.
For a slightly more advanced script, we create a function called greetings, which takes three arguments.
%%file sample_script.py
def greetings(first_name, last_name, where):
    """ Greetings to you """
    print(f"Hello, {first_name:s} {last_name:s}! How's the weather today in {where:s}?")
first_name = "Joe"
last_name = "Doe"
where = "Antartica"
greetings(first_name, last_name, where)
! python3 sample_script.py
Hello, Joe Doe! How's the weather today in Antartica?
The __name__ == '__main__' condition#
Writing more complex scripts is relatively straightforward, but you may not want very long files with thousands of lines of code. Splitting code across multiple files is simple in Python thanks to the import mechanism, which lets one script use almost anything defined in another. For example, one file can contain functions that read and write data from/to a remote database, while another deals with the complex training of your neural network model. The most natural way to separate the two is to group all I/O functionality in one file and the training operations in another. The training file can then import the I/O file through import ... or from ... import A, B, C.
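The split between an I/O file and a training file can be sketched as follows. The module names `demo_io` and `demo_training`, their contents, and the use of a temporary directory are assumptions made so that the example is self-contained; in practice the two files would simply sit next to each other on disk.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of the I/O file
io_source = '''
def load_numbers():
    """Stand-in for reading data from a file or a database."""
    return [1.0, 2.0, 3.0]
'''

# Hypothetical contents of the training file, which imports the I/O file
training_source = '''
from demo_io import load_numbers

def train():
    """Stand-in for a training step: return the mean of the data."""
    data = load_numbers()
    return sum(data) / len(data)
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_io.py").write_text(io_source)
    Path(tmp, "demo_training.py").write_text(training_source)

    sys.path.insert(0, tmp)          # make the directory importable
    importlib.invalidate_caches()    # make sure the new files are seen
    try:
        demo_training = importlib.import_module("demo_training")
        result = demo_training.train()
    finally:
        sys.path.remove(tmp)

print(result)  # 2.0
```

The training module never needs to know how the data are obtained; it only relies on the name it imports from the I/O module.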
import sample_script
Hello, Joe Doe! How's the weather today in Antartica?
It is important to realize that import makes Python run the script. This is why we see the output of the print call. However, Python can distinguish between the two situations (running the file directly and importing it) by checking the name of the executing environment, i.e.:
if __name__ == '__main__':
    ...
Note
By convention, a name starting with a single underscore (e.g., _variablename) marks a variable as special, typically one intended for internal use. Names with two leading and two trailing underscores (e.g., __name__) are even more special: they are reserved for attributes and methods defined by Python itself. These are called dunder names, for “Double Under (Underscores)”.
The block under this test is executed only when the file is run as a script, not when it is loaded through an import statement. __main__ is the name of the top-level script environment.
%%file sample_script.py
def greetings(first_name, last_name, where):
    """ Greetings to you """
    print(f"Hello, {first_name:s} {last_name:s}! How's the weather today in {where:s}?")
first_name = "Joe"
last_name = "Doe"
where = "Antartica"
if __name__ == '__main__':
    greetings(first_name, last_name, where)
! python3 sample_script.py
Hello, Joe Doe! How's the weather today in Antartica?
Now importing the script no longer prints the greeting, but everything defined outside the __main__ block is still available:
import sample_script
print("Variables:", sample_script.first_name, sample_script.last_name, sample_script.where)
Variables: Joe Doe Antartica
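The two values that __name__ can take may also be observed directly. The sketch below writes a one-line file (the name `name_demo.py` is hypothetical), executes it once as a top-level script using the standard runpy module, and once as an imported module using importlib. This is only one way to show the difference, not something the script above relies on.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# The file simply records the value of __name__ at execution time
source = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "name_demo.py"  # hypothetical file name
    path.write_text(source)

    # Execute the file as the top-level script, similar to `python name_demo.py`
    as_script = runpy.run_path(str(path), run_name="__main__")

    # Load the same file as a module, similar to `import name_demo`
    spec = importlib.util.spec_from_file_location("name_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(as_script["seen_name"])  # __main__
print(module.seen_name)        # name_demo
```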
import sample_script
help(sample_script)
Help on module sample_script:
NAME
sample_script
FUNCTIONS
greetings(first_name, last_name, where)
Greetings to you
DATA
first_name = 'Joe'
last_name = 'Doe'
where = 'Antartica'
FILE
/home/runner/work/astro_ds/astro_ds/astro_ds/chapters/python/sample_script.py
import sample_script
help(sample_script.greetings)
Help on function greetings in module sample_script:
greetings(first_name, last_name, where)
Greetings to you
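The text shown by help comes from the docstrings, which is why they are worth writing. As a small sketch, the same information can be retrieved programmatically with the __doc__ attribute and the standard inspect module. The function below is a copy of greetings that returns its message instead of printing it, a change made only to keep this example quiet.

```python
import inspect


def greetings(first_name, last_name, where):
    """ Greetings to you """
    return f"Hello, {first_name} {last_name}! How's the weather today in {where}?"


# The docstring is stored on the function object itself
print(greetings.__doc__)

# inspect.getdoc returns the docstring with its indentation cleaned up
print(inspect.getdoc(greetings))

# The signature displayed by help() is also available
print(inspect.signature(greetings))
```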
Using command line arguments#
The previous script is very static: its variables are hard-coded. However, a Python script can read the arguments passed on the command line through the sys module and its sys.argv variable.
%%file sample_script.py
#!/usr/bin/env python3
import sys
if __name__ == '__main__':
    print("You passed {0:,d} arguments to the script:\n\t {1:s}".format(len(sys.argv), str(sys.argv)))
! python sample_script.py Lorem ipsum dolor sit amet
You passed 6 arguments to the script:
['sample_script.py', 'Lorem', 'ipsum', 'dolor', 'sit', 'amet']
! chmod +x sample_script.py
! ./sample_script.py "Lorem ipsum dolor sit amet"
You passed 2 arguments to the script:
['./sample_script.py', 'Lorem ipsum dolor sit amet']
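Regardless of how you call your script, the first element of sys.argv is the script name itself (following Unix script conventions), so the reported count is one more than the number of arguments you typed. Note also that quoting on the command line groups several words into a single argument.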
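Reading sys.argv by hand becomes tedious once a script needs options or default values. The standard library module argparse, which the examples above do not use, is one common alternative. The following is a minimal sketch; the argument names `words` and `--upper` are made up for illustration.

```python
import argparse


def build_parser():
    """Describe the expected command line (hypothetical arguments)."""
    parser = argparse.ArgumentParser(
        description="Echo the words passed on the command line.")
    parser.add_argument("words", nargs="+", help="one or more words to echo")
    parser.add_argument("--upper", action="store_true",
                        help="convert the output to upper case")
    return parser


def main(argv=None):
    # With argv=None, argparse reads sys.argv[1:], i.e. it skips the script name
    args = build_parser().parse_args(argv)
    text = " ".join(args.words)
    return text.upper() if args.upper else text


# Simulated command lines, passed as lists instead of typed in a shell
print(main(["Lorem", "ipsum"]))             # Lorem ipsum
print(main(["Lorem", "ipsum", "--upper"]))  # LOREM IPSUM
```

Passing the argument list explicitly to `main` makes the function easy to test; in a script, one would call `main()` inside the `if __name__ == '__main__':` block.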
Exercise#
Below is an asteroids.csv file containing information on 50 near-Earth asteroids (taken from the Small-Body Database). Given this list, write a script that finds all asteroids with a semi-major axis a within 0.2 AU of Earth's (i.e., |a - 1| < 0.2) and an eccentricity e less than 0.5. Print the resulting list twice: once in alphabetical order, and once sorted by how close the semi-major axis is to Earth's.
%%file asteroids.csv
name,a,e,orbclass
Eros,1.457916888347732,0.2226769029627053,AMO
Albert,2.629584157344544,0.551788195302116,AMO
Alinda,2.477642943521562,0.5675993715753302,AMO
Ganymed,2.662242764279804,0.5339300994578989,AMO
Amor,1.918987277620309,0.4354863345648127,AMO
Icarus,1.077941311539208,0.826950446001521,APO
Betulia,2.196489260519891,0.4876246891992282,AMO
Geographos,1.245477192797457,0.3355407124897842,APO
Ivar,1.862724540418448,0.3968541470639658,AMO
Toro,1.367247622946547,0.4358829575017499,APO
Apollo,1.470694262588244,0.5598306817483757,APO
Antinous,2.258479598510079,0.6070051516585434,APO
Daedalus,1.460912865705988,0.6144629118218898,APO
Cerberus,1.079965807367047,0.4668134997419173,APO
Sisyphus,1.893726635847921,0.5383319204425762,APO
Quetzalcoatl,2.544270656955212,0.5704591861565643,AMO
Boreas,2.271958775354725,0.4499332278634067,AMO
Cuyo,2.150453953345012,0.5041719257675564,AMO
Anteros,1.430262719980132,0.2558054402785934,AMO
Tezcatlipoca,1.709753263222791,0.3647772103513082,AMO
Midas,1.775954494579457,0.6503697243919138,APO
Baboquivari,2.646202507670927,0.5295611095751231,AMO
Anza,2.26415089613359,0.5371603112900858,AMO
Aten,0.9668828078092987,0.1827831025175614,ATE
Bacchus,1.078135348117527,0.3495569270441645,APO
Ra-Shalom,0.8320425524852308,0.4364726062545577,ATE
Adonis,1.874315684524321,0.763949321566,APO
Tantalus,1.289997492877751,0.2990853014998932,APO
Aristaeus,1.599511990737142,0.5030618532252225,APO
Oljato,2.172056090036035,0.7125729402616418,APO
Pele,2.291471988746353,0.5115484924883255,AMO
Hephaistos,2.159619960333728,0.8374146846143349,APO
Orthos,2.404988778495748,0.6569133796135244,APO
Hathor,0.8442121506103012,0.4498204013480316,ATE
Beltrovata,2.104690977122337,0.413731105995413,AMO
Seneca,2.516402574514213,0.5708728441169761,AMO
Krok,2.152545170235639,0.4478259793515817,AMO
Eger,1.404478323548423,0.3542971360331806,APO
Florence,1.768227407864309,0.4227761019048867,AMO
Nefertiti,1.574493139339916,0.283902719273878,AMO
Phaethon,1.271195939723604,0.8898716672181355,APO
Ul,2.102493486378346,0.3951143067760007,AMO
Seleucus,2.033331705805067,0.4559159977082651,AMO
McAuliffe,1.878722427225527,0.3691521497610656,AMO
Syrinx,2.469752836845105,0.7441934504192601,APO
Orpheus,1.209727780883745,0.3229034563257626,APO
Khufu,0.989473784873371,0.468479627898914,ATE
Verenia,2.093231870619781,0.4865133359612604,AMO
Don Quixote,4.221712367193639,0.7130894892477316,AMO
Mera,1.644476057737928,0.3201425983025733,AMO
A possible solution:
%%file exercise.py
import csv

def is_selected(row):
    """ Select rows with semi-major axis `a` within 0.2 AU of Earth's (1 AU) and eccentricity `e` less than 0.5 """
    _, a, e, _ = row
    return abs(float(a) - 1) < 0.2 and float(e) < 0.5

def get_selected_data(fname, select_function):
    """ Extract the selected asteroids from the CSV file `fname` """
    data = []
    with open(fname) as incsv:
        reader = csv.reader(incsv)
        next(reader)  # skip header
        for row in reader:
            if select_function(row):
                data.append(row)
    return data

def row2str(row):
    """ Format a row as a nicely aligned string """
    return "{0:>12s} {1:.5f} {2:.5f} {3:s}".format(
        row[0], float(row[1]), float(row[2]), row[3])

if __name__ == '__main__':
    fname = 'asteroids.csv'
    data = get_selected_data(fname, is_selected)
    print("Sorted by name")
    for obj in sorted(data, key=lambda row: row[0]):
        print(row2str(obj))
    print('')  # empty line
    print("Sorted by semi-major axis")
    # sort by distance of the semi-major axis from Earth's (1 AU)
    for obj in sorted(data, key=lambda row: abs(float(row[1]) - 1)):
        print(row2str(obj))
! python exercise.py
Sorted by name
Aten 0.96688 0.18278 ATE
Bacchus 1.07814 0.34956 APO
Cerberus 1.07997 0.46681 APO
Hathor 0.84421 0.44982 ATE
Khufu 0.98947 0.46848 ATE
Ra-Shalom 0.83204 0.43647 ATE
Sorted by semi-major axis
Khufu 0.98947 0.46848 ATE
Aten 0.96688 0.18278 ATE
Bacchus 1.07814 0.34956 APO
Cerberus 1.07997 0.46681 APO
Hathor 0.84421 0.44982 ATE
Ra-Shalom 0.83204 0.43647 ATE
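An alternative sketch of the same selection uses csv.DictReader, which reads the header line and lets each column be accessed by name instead of by position. To stay self-contained, this example works on a four-row excerpt of asteroids.csv held in memory; the function name `select_asteroids` and its keyword arguments are assumptions made for this example.

```python
import csv
import io

# Four rows copied from asteroids.csv, used here as an in-memory stand-in
sample = """name,a,e,orbclass
Eros,1.457916888347732,0.2226769029627053,AMO
Icarus,1.077941311539208,0.826950446001521,APO
Aten,0.9668828078092987,0.1827831025175614,ATE
Khufu,0.989473784873371,0.468479627898914,ATE
"""


def select_asteroids(handle, max_offset=0.2, max_ecc=0.5):
    """Keep rows with |a - 1| < max_offset and e < max_ecc."""
    selected = []
    for row in csv.DictReader(handle):
        a, e = float(row["a"]), float(row["e"])
        if abs(a - 1) < max_offset and e < max_ecc:
            selected.append({"name": row["name"], "a": a, "e": e,
                             "orbclass": row["orbclass"]})
    return selected


rows = select_asteroids(io.StringIO(sample))
by_name = sorted(rows, key=lambda r: r["name"])
by_distance = sorted(rows, key=lambda r: abs(r["a"] - 1))

print([r["name"] for r in by_name])      # ['Aten', 'Khufu']
print([r["name"] for r in by_distance])  # ['Khufu', 'Aten']
```

Converting the numeric columns once, while reading, avoids repeating `float(...)` in every sort key and format call.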
# Clean up
! rm -f exercise.py asteroids.csv