Introduction
When we need to compute values at specific positions or regions within an array, how should we proceed? Translated to earth science: when we have global gridded data, we generally need to extract the portion corresponding to our study area.
Another issue: the arrays we created previously had their dimensions and sizes specified at creation. However, in reality, many arrays might be stored separately and need to be aggregated to achieve the functionality we desire. How should this be handled?
An example:
Our monthly temperature data are stored in separate arrays. To calculate the annual average temperature, we need to aggregate them into one array and compute the mean.
You might say, “It’s only 12 months, I can read them separately, add from item 1 to 12, then divide by 12.”
That’s possible. But what if there are 1,200 months? How would you handle that?
Then you say, “Then I’d use a loop.”
Sorry, we haven’t covered loops yet; I’ll assume we don’t know how.
Besides, even if a loop can simply compute the mean, we often have more complex parameters to calculate.
For example, finding the time trend for each grid cell, which is the slope
kin a linear function. For a bunch of scattered arrays, I don’t think using a loop can easily solve this problem.Therefore, array restructuring is particularly important. Of course, this isn’t the only simple example; they have broader applications, which can be explored gradually later.
Personally, I feel that extracting parts of arrays containing earth science data is also a change in data structure, so “indexing” doesn’t get a place in the title here.
And the title looks odd, ending with
½. That’s because while writing, I found I had written a messy bunch, and the first section’s length had already seriously exceeded expectations (underestimating my own ability to ramble, I guess).So, let’s split it into two parts!
Indexing and Slicing
Indexing
We previously mentioned the concept of indexing: indexing refers to accessing an element at a specified position; list indexing is similar to array indexing.
Moreover, since Python follows C language’s syntax conventions, indexing starts from 0. This differs from more mathematics-oriented languages like MATLAB, R, and Julia, and requires attention. When an array has multiple dimensions, multiple subscripts are needed to access a specific element.
It’s worth mentioning that array indexing includes forward and reverse order. When indexing from start to finish, counting from 0 determines the element’s subscript. If it’s more convenient to determine an element’s position from the end, we can count from the last element backwards as -1, -2, -3, and so on.
Let’s demonstrate with examples:
python
import numpy as np arr = np.array([1, 2, 3, 4, 5]) print(arr[0]) # First element print(arr[4]) # Fifth element print(arr[-1]) # Last element print(arr[-3]) # Third element from the end # Output: # 1 # 5 # 5 # 3
When an array has multiple dimensions, we need multiple subscripts to access a specific element.
For example, for a 2D array, we can index using arr[i][j], where i is the row index and j is the column index.
python
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) print(arr[0]) # First row # The following two notations are ultimately equivalent print(arr[0][0]) # Element at row 0, column 0 print(arr[0, 0]) # Same as above print(arr[-1, -3]) # Last row, third element from the end of that row # Output: # [1 2 3] # 1 # 1 # 7
Slicing
Clearly, indexing has a limitation: we can only get one value per position. Is there a more advanced way to cut out a large chunk of area to form a new array at once? Of course, that’s the higher-level indexing method – slicing.
Regular Slicing
Slicing modifies the index position to use a colon-specified range of elements.
The syntax is array[start:stop:step], meaning indexing from position start to stop (exclusive) with step size step. step defaults to 1 and can be omitted.
Note: Slicing uses a left-closed, right-open interval; the element at the last index (stop) is not included.
python
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) print(arr[1:5]) # Forward slice: elements at indices 1-4 print(arr[-5:-1]) # Reverse slice: elements at indices -5 to -2 print(arr[0:10:2]) # Forward slice with step 2 # Output: # [2 3 4 5] # [6 7 8 9] # [1 3 5 7 9] arr = np.random.rand(180, 360) # Define 1°x1° gridded data print(arr[30:61, 60:101]) # Extract grid data for lat range 30°N-60°N, lon range 60°E-100°E (since stop=101 is exclusive, 100°E is included) # Output (example): # [[0.62427043 0.82133876 0.50815278 ... 0.43005637 0.84763936 0.99592113] # [0.01961363 0.7792136 0.83044477 ... 0.08328295 0.24768328 0.02536089] # ... # [0.66423806 0.94671443 0.24224496 ... 0.46171495 0.13174959 0.21355304]]
Now a question arises: if we want to include the last element when slicing, filling stop with -1 is obviously wrong. 0? Even worse. Would we have to use the array’s size attribute and add 1 every time to ensure we get the last element? Of course not. If we leave start and stop empty when slicing, it defaults to including elements from both ends.
python
print(arr[::2]) # Forward slice with step 2 print(arr[::-1]) # Reverse the array print(arr[1:8:3]) # Forward slice indices 1-7 with step 3 # Output: # [1 3 5 7 9] # [10 9 8 7 6 5 4 3 2 1] # [2 5 8]
Of course, if you have an odd number of elements and slice with array[::2], the last element won’t appear. But it’s not that it wasn’t selected; the step size of 2 skipped over it.
python
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8]) print(arr[::2]) arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) print(arr[::2]) # Output: # [1 3 5 7] # [1 3 5 7 9]
Boolean Slicing
You might say that needing to know fixed positions for slicing is too rigid. Is there a more advanced way, like filtering precipitation for grid cells where temperature is greater than 30°C?
Thus, Boolean indexing based on logical operations appears. When we evaluate an array against a condition, the array becomes a mask array with values True and False.
We can then use this mask for more complex extractions.
python
temperature = np.random.rand(180, 360) * 100 # Define 1°x1° temperature grid data another_index = np.random.rand(180, 360) # Define 1°x1° index grid data temperature_mask = temperature > 30 # Check if temperature is above 30°C print(temperature_mask) # The mask array print(another_index[temperature_mask]) # Index another array of the same shape using the mask array # Output (mask example, index values): # [[False True True ... True False True] # [ True True True ... False True False] # ... # [ True False True ... True True True]] # [0.23491348 0.56643524 0.1230861 ... 0.97229519 0.05542499 0.35975104]
Special Slicing
Finally, let’s mention a special type of indexing for specifying irregular positions, to handle some possible special scenarios.
python
arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print(arr)
print(arr[[0, 1, 1], [0, 2, 8]]) # Fancy indexing: elements at (0,0), (1,2), (1,8)
# Output:
# [[ 1 2 3 4 5 6 7 8 9 10]
# [11 12 13 14 15 16 17 18 19 20]
# [21 22 23 24 25 26 27 28 29 30]]
# [ 1 13 19] # Corresponding to arr[0,0], arr[1,2], arr[1,8]
Array Assignment
We’ve talked a lot about using indexing and slicing to get values at specified positions. In fact, they also provide endless possibilities for modifying array element values. Let’s give some practical examples:
Without indexing/slicing: We can only operate on the entire array, like converting Celsius to Kelvin.
python
temperature_celsius = np.random.rand(180, 360) * 100 temperature_kelvin = temperature_celsius + 273.15 print(temperature_celsius) print(temperature_kelvin) # Outputs (example values)
Using indexing: Knowing a specific point’s temperature is anomalously high, we can index it and subtract the anomaly.
python
anomaly = 1 # Define anomaly value print(temperature_celsius[12, 12]) temperature_celsius[12, 12] = temperature_celsius[12, 12] - anomaly print(temperature_celsius[12, 12]) # Output (example): # 25.76296894744049 # 24.76296894744049
Using slicing: Knowing that reanalysis data for a certain region is generally high, we can slice it out and uniformly subtract the anomaly.
python
anomaly = 10 print(temperature_celsius[20:30, 50:60]) temperature_celsius[20:30, 50:60] = temperature_celsius[20:30, 50:60] - anomaly print(temperature_celsius[20:30, 50:60]) # Output (example values before and after subtraction)
More complex applications: Suppose an index exceeds a critical value when temperature is >50°C. We mask them out and apply boundary conditions.
python
another_index = np.random.rand(180, 360) * 10 # Define a 1°x1° index grid data print(another_index[temperature_celsius >= 50]) boundary_condition = 0.5 another_index[temperature_celsius >= 50] = boundary_condition # Set values where temp >= 50°C to the boundary condition print(another_index[temperature_celsius >= 50]) # Output (example): # [3.31433926 7.22395752 6.10150125 ... 0.74224767 0.4194778 8.52261236] # [0.5 0.5 0.5 ... 0.5 0.5 0.5]
Thus, the nan_to_num function for handling missing values mentioned last issue becomes somewhat unnecessary; we can implement it ourselves:
python
arr = np.array([1, 2, np.nan, 4, 5]) arr[np.isnan(arr)] = -1000 print(arr) # Output: # [ 1. 2. -1000. 4. 5.]
Supplementary
In the array computation section last time, I think there were a few more practical functions not mentioned. Let’s supplement them here.
Matrix multiplication .dot:
python
arr0 = np.array([[1, 2], [3, 4]]) arr1 = np.array([[5, 6], [7, 8]]) print(arr0.dot(arr1)) # Output: # [[19 22] # [43 50]]
In MATLAB, arrays default to matrix operations; element-wise operations require a dot to differentiate (e.g., .*, ./). In NumPy, the more commonly used element-wise operations are the default, while matrix operations are encapsulated in functions.
Degree-radian conversion deg2rad and rad2deg:
python
arr = np.array([0, 30, 45, 60, 90]) # Define angle array in degrees print(np.sin(arr * np.pi / 180)) # Convert degrees to radians manually, then compute sine print(np.deg2rad(arr)) # Built-in function conversion print(np.sin(np.deg2rad(arr))) print(np.sin(np.deg2rad(np.rad2deg(np.deg2rad(arr))))) # Russian nesting doll # Output: # [0. 0.5 0.70710678 0.8660254 1. ] # [0. 0.52359878 0.78539816 1.04719755 1.57079633] # [0. 0.5 0.70710678 0.8660254 1. ] # [0. 0.5 0.70710678 0.8660254 1. ]
Here, 2 is a homophone for “to” in English, consistent with the naming style of data type conversion functions in MATLAB. Those who know should understand.
Of course, we could calculate degree-radian conversion based on their formula definitions. But to avoid those moments when you just can’t remember what the formula looks like, let’s use the “syntactic sugar” NumPy has packaged for us.