Python 03 | How to Use Control Flow to Achieve Batch Processing and Automation of Geoscience Data?

Introduction

In the previous session, we discussed Python’s most basic variables and some data types. They form the fundamental cells of our program design.

So, how can we chain together the processing methods for individual problems so that by modifying only part of the content, we can obtain results corresponding to new inputs, thereby avoiding manual, repetitive changes to variables and code?

Two Examples

A more relatable example: suppose we need to process a year’s worth of temperature data. By reading the NetCDF file (.nc) for the corresponding year, we can calculate the average temperature for that year.

Suppose our time series combines temperature data from a GCM in CMIP6 under the SSP2-4.5 pathway with the corresponding historical simulation, so that it spans the years 1850-2100. Moreover, the data are not stored with many years combined in one nc file but, unbelievably, with one nc file per year (yes, we’re looking at you, EC family models).

If we only had 20 years, perhaps we could grit our teeth and manually change the nc file path 20 times, running the reading code repeatedly to extract averages and combine them into a series.

However, for a project covering 251 years (1850 through 2100), this is clearly too laborious. We need a more efficient solution.

This is where the power of control flow comes in. Assuming our nc filenames differ only by the year, we can use a loop over the years to achieve batch extraction and processing.
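
The idea can be sketched as follows. The directory and the filename pattern are hypothetical placeholders (real CMIP6 filenames are longer), and the loop only builds the paths; reading each file and computing its mean would go inside the loop, using whichever NetCDF library you prefer.

```python
from pathlib import Path

# Hypothetical directory and filename pattern; adjust to your own data layout.
data_dir = Path("data/tas")
start_year, end_year = 1850, 2100

file_paths = []
for year in range(start_year, end_year + 1):   # range() excludes its end point
    file_paths.append(data_dir / f"tas_{year}.nc")

print(len(file_paths))       # 251
print(file_paths[0].name)    # tas_1850.nc
print(file_paths[-1].name)   # tas_2100.nc
```

Only the year changes between iterations, so switching to another period means editing two numbers rather than hundreds of paths.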

Taking it a step further, suppose we are now extracting precipitation data. Monthly data often requires multiplying by the number of days in the corresponding month. Particularly for February, we need to determine if it’s a common year or a leap year to accurately calculate the total monthly precipitation.

We can add a conditional check within our loop to determine if it’s a leap year and get the corresponding number of days for February.
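
A minimal sketch of that check, assuming the precipitation is given as a mean daily rate in mm/day (the value below is made up for illustration) and that the data follow the Gregorian calendar. Some climate models use a fixed 365-day or 360-day calendar, in which case this check does not apply.

```python
mean_daily_precipitation = 2.0   # mm/day, illustrative value only

february_totals = {}
for year in [1900, 2000, 2023, 2024]:
    # Gregorian rule: divisible by 4, except century years not divisible by 400
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        days = 29
    else:
        days = 28
    february_totals[year] = mean_daily_precipitation * days

print(february_totals)
# {1900: 56.0, 2000: 58.0, 2023: 56.0, 2024: 58.0}
```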

These are two very common application scenarios for control flow, but they are sufficient to glimpse the important role control flow plays in program design.

If the various variables and data types mentioned earlier are individual cells, then control flow is the nervous system and circulatory system spread throughout the body.

Only when control flow connects its components does a program become a true organic whole.

Conditional Statements

As the previous example shows, we often cannot process every situation in the same way. We need to define conditions that distinguish between cases and perform a different operation for each. This is what conditional statements are for.

In Python, we can use if to set a condition. If the condition is met, the subsequent operations will be executed; if not, the content under the if statement will be skipped.

However, many times we have many different situations requiring different handling methods. Therefore, we can use multiple elif statements at the same logical level as the if statement to perform conditional checks.

Furthermore, if we are certain that all other cases besides the ones we specify should be handled identically, we can use the else statement.

python

month = 3

if month in [1, 3, 5, 7, 8, 10, 12]:
    print("31 days")
elif month in [4, 6, 9, 11]:
    print("30 days")
else:
    print("28 or 29 days")

# Output: 31 days

Here, we use a conditional statement to determine and output the number of days in a given month. By providing lists grouping months with the same number of days, and using in to check if the month belongs to a list, we can output the corresponding number of days.

Note a fundamental aspect of Python formatting here: indentation is part of Python’s syntax, which distinguishes it from many other programming languages.

We mentioned earlier that “if, elif, and else are at the same logical level.” This implies that other content sits at a different level, such as the statements contained within an if block.

In other programming languages like C, we use symbols like {} to enclose that content, clarifying the containment relationship of code blocks.

In Python, we use : to indicate the start of a code block and use different levels of indentation to express different logical hierarchies.

In the code above, print("31 days") follows if month in [1, 3, 5, 7, 8, 10, 12]: and is indented, meaning it is subordinate to the if statement.

elif month in [4, 6, 9, 11]: is indented to the same position as the if statement, indicating they are at the same logical level.

When programming in Python, special attention must be paid to indentation to avoid bugs caused by hierarchical confusion.
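
A small sketch of how indentation alone decides what belongs to the if block. The temperature value and the messages are arbitrary.

```python
temperature = 5   # °C, arbitrary value

messages = []
if temperature < 0:
    messages.append("below freezing")          # inside the if block: skipped here
    messages.append("take care on icy roads")  # same level, also skipped
messages.append("check finished")              # back at the outer level: always runs

print(messages)   # ['check finished']
```

Indenting the last line by four spaces would move it into the if block, and nothing would be appended for this temperature.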

Understanding this allows us to perform “Russian doll”-style nesting of conditional statements:

python

year, month = 2100, 2

if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
    if month == 2:
        print("February has 29 days.")
    elif month in [4, 6, 9, 11]:
        print("April, June, September, and November have 30 days.")
    else:
        print("All other months have 31 days.")
else:
    if month == 2:
        print("February has 28 days.")
    elif month in [4, 6, 9, 11]:
        print("April, June, September, and November have 30 days.")
    else:
        print("All other months have 31 days.")

# Output: February has 28 days.

Here, we first check if the year is a leap year, then determine the number of days in February. Thus, we find that since 2100 is a common year, its February has 28 days.
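
Because the two branches above differ only in February, the nesting can also be flattened by making the leap-year check in one place. The following is one possible restructuring, not the only one. The function name days_in_month is our own, and calendar.monthrange from the standard library is used only to cross-check the result.

```python
import calendar

def days_in_month(year, month):
    """Return the number of days in a month of the Gregorian calendar."""
    if month == 2:
        is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if is_leap else 28
    elif month in [4, 6, 9, 11]:
        return 30
    else:
        return 31

print(days_in_month(2100, 2))   # 28
print(days_in_month(2000, 2))   # 29

# Cross-check against the standard library for a few years
for year in [1900, 2000, 2023, 2024, 2100]:
    for month in range(1, 13):
        assert days_in_month(year, month) == calendar.monthrange(year, month)[1]
```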

Loop Statements

Above, we handled a specific problem by determining which case it belonged to and performing the corresponding operation.

But if we deal with multiple problems that are largely similar in their processing, we can use loops to achieve repeated calculations by modifying corresponding parts within the loop body.

This avoids rewriting code with the same functionality, improving code reusability.

for Loop

The most common case is a loop where we know in advance what we are iterating over, and the number of iterations equals the number of input elements.

A for loop handles this case:

python

temperature = [-10, -5, 0, 5, 10, 15, 20, 20, 15, 10, 5, 0]

for i in range(len(temperature)):
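    print(f'Month {i + 1:02d} average temperature: {temperature[i] + 273.15}K')

print()

for t in temperature:
    print(f'Average temperature: {t + 273.15}K')

# Output:
# Month 01 average temperature: 263.15K
# Month 02 average temperature: 268.15K
# Month 03 average temperature: 273.15K
# Month 04 average temperature: 278.15K
# Month 05 average temperature: 283.15K
# Month 06 average temperature: 288.15K
# Month 07 average temperature: 293.15K
# Month 08 average temperature: 293.15K
# Month 09 average temperature: 288.15K
# Month 10 average temperature: 283.15K
# Month 11 average temperature: 278.15K
# Month 12 average temperature: 273.15K
#
# Average temperature: 263.15K
# Average temperature: 268.15K
# Average temperature: 273.15K
# Average temperature: 278.15K
# Average temperature: 283.15K
# Average temperature: 288.15K
# Average temperature: 293.15K
# Average temperature: 293.15K
# Average temperature: 288.15K
# Average temperature: 283.15K
# Average temperature: 278.15K
# Average temperature: 273.15K

We introduced two writing styles above. The former is similar to many programming languages: generate the loop index, then use it to index the variable within the loop body. The second loop does not print the month number. Recovering it with temperature.index(t) would give wrong results here, because index returns the position of the first match and the list contains repeated values (0, 5, 10, 15, and 20 each appear twice), so the later months would be labelled 07, 06, 05, 04, and 03.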

The latter is one of Python’s distinctive features. It can directly iterate over iterable objects like lists, automatically taking elements from the object as the loop variable.

Similarly, dictionaries, tuples, even arrays we learned before, and DataFrames in Pandas (which we’ll learn next), can all be directly iterated over.
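
What you get on each iteration depends on the object. A short sketch with made-up station data: iterating over a dictionary yields its keys (use .items() for key-value pairs), a tuple yields its elements in order, and a 2-D NumPy array yields one row at a time. Iterating directly over a pandas DataFrame yields its column labels, which is not shown here.

```python
import numpy as np

station_elevation = {"A": 120, "B": 850}   # hypothetical stations, elevation in m
coordinates = (39.9, 116.4)                # a (latitude, longitude) tuple
grid = np.array([[1, 2, 3], [4, 5, 6]])    # 2 rows x 3 columns

keys = []
for name in station_elevation:             # a dict yields its keys
    keys.append(name)

pairs = []
for name, elevation in station_elevation.items():   # key-value pairs
    pairs.append((name, elevation))

values = []
for value in coordinates:                  # a tuple yields its elements in order
    values.append(value)

row_sums = []
for row in grid:                           # a 2-D array yields one row at a time
    row_sums.append(int(row.sum()))

print(keys)       # ['A', 'B']
print(pairs)      # [('A', 120), ('B', 850)]
print(values)     # [39.9, 116.4]
print(row_sums)   # [6, 15]
```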

Of course, for situations needing both the index and the iterated variable, Python provides a built-in function:

python

precipitation = [10, 20, 30, 40, 50, 100, 250, 300, 50, 40, 30, 20]

for i, pr in enumerate(precipitation):
    print(f'Month {i + 1:02d} precipitation: {pr:3d} mm')

# Output:
# Month 01 precipitation:  10 mm
# Month 02 precipitation:  20 mm
# Month 03 precipitation:  30 mm
# Month 04 precipitation:  40 mm
# Month 05 precipitation:  50 mm
# Month 06 precipitation: 100 mm
# Month 07 precipitation: 250 mm
# Month 08 precipitation: 300 mm
# Month 09 precipitation:  50 mm
# Month 10 precipitation:  40 mm
# Month 11 precipitation:  30 mm
# Month 12 precipitation:  20 mm

Additionally, when several sequences need to be looped over together, besides using range to index elements at the same position, you can use the zip function to iterate over all of them at once.

python

temperature = [-10, -5, 0, 5, 10, 15, 20, 20, 15, 10, 5, 0]
precipitation = [10, 20, 30, 40, 50, 100, 250, 300, 50, 40, 30, 20]

# Traditional method: index both lists by position
for i in range(len(temperature)):
    print(f'Month {i + 1:02d}: average temperature is {temperature[i]}°C, precipitation is {precipitation[i]} mm')

print()

# Using zip to iterate over both sequences together; enumerate supplies the month number
for i, (tmp, pr) in enumerate(zip(temperature, precipitation)):
    print(f'Month {i + 1:02d}: average temperature is {tmp}°C, precipitation is {pr} mm')

# Output:
# Month 01: average temperature is -10°C, precipitation is 10 mm
# Month 02: average temperature is -5°C, precipitation is 20 mm
# Month 03: average temperature is 0°C, precipitation is 30 mm
# Month 04: average temperature is 5°C, precipitation is 40 mm
# Month 05: average temperature is 10°C, precipitation is 50 mm
# Month 06: average temperature is 15°C, precipitation is 100 mm
# Month 07: average temperature is 20°C, precipitation is 250 mm
# Month 08: average temperature is 20°C, precipitation is 300 mm
# Month 09: average temperature is 15°C, precipitation is 50 mm
# Month 10: average temperature is 10°C, precipitation is 40 mm
# Month 11: average temperature is 5°C, precipitation is 30 mm
# Month 12: average temperature is 0°C, precipitation is 20 mm
#
# Month 01: average temperature is -10°C, precipitation is 10 mm
# Month 02: average temperature is -5°C, precipitation is 20 mm
# Month 03: average temperature is 0°C, precipitation is 30 mm
# Month 04: average temperature is 5°C, precipitation is 40 mm
# Month 05: average temperature is 10°C, precipitation is 50 mm
# Month 06: average temperature is 15°C, precipitation is 100 mm
# Month 07: average temperature is 20°C, precipitation is 250 mm
# Month 08: average temperature is 20°C, precipitation is 300 mm
# Month 09: average temperature is 15°C, precipitation is 50 mm
# Month 10: average temperature is 10°C, precipitation is 40 mm
# Month 11: average temperature is 5°C, precipitation is 30 mm
# Month 12: average temperature is 0°C, precipitation is 20 mm

while Loop

Besides the above looping methods, sometimes we face situations where we need to repeatedly iterate a piece of code until certain conditions are met (e.g., calibrating a parameter until the model error is less than a tolerance). In such cases, the while loop is needed.

Suppose we know the surface temperature and set the temperature lapse rate to 6.5 °C/km (0.0065 °C per meter). To find the height of the 0°C line, we can use a while loop:

python

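surface_temperature = 25.5                   # Surface temperature (°C)
lapse_rate = 0.0065                          # Lapse rate (°C per meter, i.e. 6.5 °C/km)

tolerance = 0.01                             # Stop once the temperature is within 0.01°C of 0°C
error = float('inf')                         # Current error; starts large so the loop runs at least once
height = 0                                   # Initial altitude (m)

while error > tolerance:
    temperature_height = surface_temperature - height * lapse_rate         # Calculate temperature at corresponding height
    error = abs(temperature_height - 0)                                     # Calculate error between current altitude temperature and 0°C
    height += 1                                                             # Increase altitude by 1 meter

print(height - 1)                            # height was incremented once more after the final check
print(surface_temperature - 3922 * lapse_rate)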

# Output:
# 3922
# 0.0070000000000014495

Here we can see that with a surface temperature of 25.5°C and a lapse rate of 6.5°C/km, the 0°C line is at approximately 3922 meters altitude. The check confirms this: the temperature at that height is only 0.007°C, below the 0.01°C threshold used in the loop condition.

Of course, this is a very simple example that could easily be solved without a while loop (the height is simply the surface temperature divided by the lapse rate). It serves merely as an introductory case. While loops have much broader applications: numerical algorithms such as Newton’s method and the bisection method, as well as the model parameter calibration we are familiar with, all rely on iterating until a convergence condition is met. The loop needs to be adapted to the specific scenario.
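
As an illustration of the bisection method mentioned above, here is a minimal sketch that solves the same 0°C problem. It assumes the same linear temperature profile as before; the search bracket of 0 to 10,000 m and the 0.01°C tolerance are our own choices.

```python
surface_temperature = 25.5   # °C
lapse_rate = 0.0065          # °C per meter (6.5 °C/km)
tolerance = 0.01             # °C

low, high = 0.0, 10000.0     # bracket: above freezing at low, below freezing at high
mid = (low + high) / 2
temperature_mid = surface_temperature - lapse_rate * mid
iterations = 0

while abs(temperature_mid) > tolerance:
    if temperature_mid > 0:
        low = mid            # still above freezing: the 0°C level is higher up
    else:
        high = mid           # below freezing: the 0°C level is lower down
    mid = (low + high) / 2
    temperature_mid = surface_temperature - lapse_rate * mid
    iterations += 1

print(f"{mid:.1f} m after {iterations} iterations")
```

Each pass halves the bracket, so the loop needs at most about a dozen iterations here, compared with the thousands of 1 m steps in the simple version.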

Keywords in Control Flow

Within conditional and loop statements, besides the regular execution of programs, there are some special keywords that help us control the program’s execution flow. Mainly: pass, break, continue:

pass

This statement performs no operation; it’s used only as a placeholder to maintain the program’s structural integrity and logic. For example, when collaborating on code with others, we can use the pass statement to reserve a position for a module belonging to another person and use comments to describe the module’s content.

break

The break statement is used to terminate the current loop and exit the loop body. For instance, if we discover during a loop that certain conditions are met, we can use break to exit the loop and avoid an infinite loop.

continue

The continue statement is used to skip the current iteration of the loop and start the next one. For example, if we find during a loop that certain conditions are met, we can use continue to skip the current iteration and proceed to the next.

python

# pass statement used only as a placeholder
if 894656 > 56323:
    pass                            # This branch is taken, but the pass statement performs no operation
else:
    print("894656 is not greater than 56323")

# Using break to exit a loop early
for i in range(10):
    if i == 5:
        break

print(i, '\n')

# Using continue to skip the rest of the current iteration
for i in range(5):
    if i == 3:
        continue
    print(i)                    # Output for 3 will be skipped

# Output:
# 5
#
# 0
# 1
# 2
# 4
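
In data processing, continue is often used to skip invalid records and break to stop once nothing useful remains. A small sketch, assuming a made-up series in which -9999 marks a missing value and None marks the end of the valid record; both conventions are assumptions for this example.

```python
MISSING = -9999   # assumed missing-value flag
daily_precipitation = [1.2, MISSING, 0.0, 3.4, MISSING, 5.0, None, 2.0]

total = 0.0
valid_days = 0
for value in daily_precipitation:
    if value is None:
        break          # end of record: leave the loop entirely
    if value == MISSING:
        continue       # missing value: skip to the next day
    total += value
    valid_days += 1

print(valid_days)        # 4
print(round(total, 1))   # 9.6
```

The final 2.0 is never read, because break ends the loop as soon as None is encountered.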

『Case』Nested Control Flow

Below, we combine the control-flow constructs covered above into a simple nested structure:

python

import numpy as np

# Create a temperature grid array
temperature = np.array([[20, 25, 30, 35, 40], [10, 15, 20, 25, 30], [0, 5, 10, 15, 20]])
print(temperature, '\n')

# Loop over all latitudes
for i in range(temperature.shape[0]):
    # Loop over all longitudes
    for j in range(temperature.shape[1]):
        # If temperature >= 20°C, convert to Kelvin
        # (the array has an integer dtype, so 293.15 is stored as 293)
        if temperature[i][j] >= 20:
            temperature[i][j] += 273.15
        # If temperature < 20°C and >= 10°C, assign value 1
        elif 10 <= temperature[i][j] < 20:
            temperature[i][j] = 1
        # Other cases
        else:
            # If temperature is 5°C, skip this iteration
            if temperature[i][j] == 5:
                continue
            # Otherwise assign value 0
            else:
                temperature[i][j] = 0

print(temperature)

# Output:
# [[20 25 30 35 40]
#  [10 15 20 25 30]
#  [ 0  5 10 15 20]]
#
# [[293 298 303 308 313]
#  [  1   1 293 298 303]
#  [  0   5   1   1 293]]
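
One detail worth noting: the array above holds integers, so adding 273.15 stores 293 rather than 293.15. A variant of the same loop on a floating-point array keeps the decimals. This is only a sketch; the dtype=float choice and the flattened elif chain are ours.

```python
import numpy as np

temperature = np.array([[20, 25, 30, 35, 40],
                        [10, 15, 20, 25, 30],
                        [0, 5, 10, 15, 20]], dtype=float)   # floats keep the .15

for i in range(temperature.shape[0]):
    for j in range(temperature.shape[1]):
        value = temperature[i, j]
        if value >= 20:
            temperature[i, j] = value + 273.15   # convert to Kelvin
        elif value >= 10:
            temperature[i, j] = 1
        elif value == 5:
            continue                             # leave 5°C cells unchanged
        else:
            temperature[i, j] = 0

print(temperature[0, 0])   # 293.15
print(temperature[2, 1])   # 5.0
```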

Postscript

The above covers the basic concepts and usage of control flow in Python; only the exception handling part remains unmentioned. Based on personal experience, that’s content used later on. We’ll save it for when we discuss functional programming or some advanced techniques.

The power of control flow lies in connecting our scattered, independent modules. Through data indexing and variable assignment, it makes batch processing possible, thereby granting us the ability to use programming languages to develop large-scale numerical simulation projects and complex data processing workflows.
