Python 02 | Quick Start: Variables and Data Types

Introduction

Before moving on to geoscience applications proper, we still need to learn Pandas and Matplotlib. Pandas handles DataFrames (think of them as tables) and Matplotlib handles visualization; together they let us quickly process time series and analyze multiple variables.

But while preparing to start with Pandas as previously announced, I realized that postponing Python’s basic syntax further would create a lot of complications.

An obvious example: in an Excel spreadsheet, we rely on column names to tell what each column contains. In Python, a very common way to create a DataFrame and name its columns is to pass a dictionary. If dictionaries had not been introduced by then, we would have to explain them on the spot, as we did with lists earlier, which would take several sections and bloat the whole series.

So, the plan is to spend two installments quickly introducing Python’s basic syntax to clear the path for future content. Today we start with Python’s variables and data types; next time, we will briefly cover control flow and functions. Mastering these topics will satisfy the basic requirements for Python programming.

Python 01 | How to quickly set up a Python environment that meets basic geoscience requirements?

Variables and Assignment

Variables are the most basic units in Python. They store data, and content of any type can be assigned to a variable name. Whenever we want to reuse a value or the result of a calculation later, we assign it to a variable.

It’s worth noting that, unlike languages such as C and Fortran, Python does not require declaring a variable’s data type first; the type is determined at run time from the value assigned. (This makes the language easier to learn, since we don’t have to manage details such as memory, but it is also one of the reasons Python is slower than compiled languages.)

Assignment

Assignment means using the assignment symbol = to assign the value on the right to the variable on the left.

Since Python is a dynamic language, variable types can change at any time, and there is no need to declare the data type during assignment.

Therefore, when we assign a value, the variable is also created.

python

temperature = 15
wind_direction = 'N'
city = 'Peking (北京)'
msg = 'Hello World!'

print(temperature)
print(wind_direction)
print(city)
print(msg)

text

15
N
Peking (北京)
Hello World!

Above, we performed a few simple assignment operations. Variables can hold numbers and strings (words or sentences). Note that Python has no separate character type: a single letter such as 'N' is simply a string of length 1.

In fact, a variable can hold not only the lists mentioned earlier but also the dictionaries, tuples, etc. that we discuss below, and later even Shapefiles, NetCDF files, or rasters that we read in directly.

Variables are the building blocks of Python program design.
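
As a minimal sketch of the dynamic typing described above (the variable name here is hypothetical, chosen only for illustration), the same name can be rebound to a value of a different type without any declaration:

```python
# A variable name is just a label; it can be rebound to a value of another type.
station_value = 15                    # starts out as an integer
first_type = type(station_value)

station_value = 'fifteen'             # the same name now refers to a string
second_type = type(station_value)

print(first_type)                     # <class 'int'>
print(second_type)                    # <class 'str'>
```

This flexibility is convenient, but it also means Python will not warn us if a variable silently ends up holding a type we did not expect.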

Data Types: Individual Data

In the variable assignment operations above, we used several built-in native data types in Python. Let’s take a closer look at what they specifically include.

Numeric Types

Since our machine is called a computer, it is naturally tied to calculation, so numeric types come first. They mainly include integers (int), floating-point numbers (float), and complex numbers (complex).

Let’s see how to represent them:

python

int_var = 10
float_var = 3.14
complex_var = 2 + 3j
sci_var = 3.14e10                   # Floating-point number using scientific notation, 3.14 * 10^10
sci_var2 = 3.14e-3                  # 3.14 * 10^-3

print(int_var, type(int_var))
print(float_var, type(float_var))
print(complex_var, type(complex_var))
print(sci_var, type(sci_var))
print(sci_var2, type(sci_var2))

text

10 <class 'int'>
3.14 <class 'float'>
(2+3j) <class 'complex'>
31400000000.0 <class 'float'>
0.00314 <class 'float'>

Here, we used Python’s built-in function type() to identify the variable’s data type. It can be widely applied in various scenarios to ensure the data type meets our requirements.
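
For checking a type inside a program, a common companion to type() is the built-in isinstance(). The sketch below uses a hypothetical elevation value and is only one way such a check might look:

```python
elevation = 1250.0                                  # hypothetical sample value, in metres

is_exact_float = type(elevation) is float           # exact type comparison
is_number = isinstance(elevation, (int, float))     # accepts either int or float
text_is_number = isinstance('1250', (int, float))   # digits inside quotes are still a string

print(is_exact_float, is_number, text_is_number)    # True True False
```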

Since we have created numbers, we naturally want to calculate with them. The following are Python’s built-in arithmetic operators:

python

a = 10
b = 3
c = a + b
d = a - b
e = a * b
f = a / b
g = a ** b              # Exponentiation
h = a // b              # Floor division
i = a % b               # Modulus

print(a, b, c, d, e, f, g, h, i)

text

10 3 13 7 30 3.3333333333333335 1000 3 1

Sharp-eyed readers may have noticed an inaccuracy in the result above: 10/3 should be 3.333..., an infinitely repeating decimal, yet the last digit printed is 5.

This is a floating-point rounding error, not an overflow: a float is stored in a fixed number of binary digits (64 bits), so most decimal fractions can only be approximated. Such errors are usually tiny and can be ignored, although they do mean that two floats that should be equal may differ in their last digits.
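
A short sketch of how this rounding shows up in practice, and two standard ways to deal with it (math.isclose() for comparisons, round() for display):

```python
import math

total = 0.1 + 0.2
print(total)                           # 0.30000000000000004
print(total == 0.3)                    # False: the last digits differ

close_enough = math.isclose(total, 0.3)
print(close_enough)                    # True: equal within a small tolerance

rounded = round(10 / 3, 2)
print(rounded)                         # 3.33
```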

Additionally, Python’s arithmetic operators have a shorthand form called augmented assignment: a += 1 means a = a + 1. For immutable values such as numbers, the result is a new object bound to the same name; nothing is modified in place, and any speed difference from the longer form is negligible. The main benefit is conciseness:

python

a = 10

a += 1
print(a)

a -= 2
print(a)

a *= 3
print(a)

a /= 4
print(a)

a **= 2
print(a)

a //= 3
print(a)

a %= 5
print(a)

text

11
9
27
6.75
45.5625
15.0
0.0
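
To see what augmented assignment does to the underlying objects, the sketch below compares an integer with a list using the built-in id() function. The variable names are hypothetical, and the id() comparison reflects CPython behavior:

```python
count = 10
id_before = id(count)
count += 1                             # ints are immutable: the name is rebound to a new object
rebound = id(count) != id_before
print(count, rebound)                  # 11 True

values = [1, 2]
alias = values
values += [3]                          # lists are mutable: += extends the same list in place
print(alias, values is alias)          # [1, 2, 3] True
```

So for numbers, += is simply a shorter way of writing the assignment; only mutable containers such as lists are changed in place.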

String Type

After covering numeric types, we also need a way to describe what those numbers mean. Encoding meanings with yet another set of numbers would be too abstract, so here we introduce the string type.

python

city = 'プリンストン'
province = '新泽西州'
nation = "U.S."

print("I come from", city, "in", province, "of", nation)

text

I come from プリンストン in 新泽西州 of U.S.

We can see that Python accepts a rich variety of character types.

Here, we mixed '' and "" quotes. In Python, there is no difference between them, as long as a string starts and ends with the same kind of quote. Note that this differs from some other languages, such as Julia, where "" denotes strings and '' denotes single characters only.
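
Having both quote styles is handy when the text itself contains a quote. The sketch below also shows an f-string, a built-in alternative to passing several arguments to print(); the values are hypothetical:

```python
city = 'Princeton'
temperature = 15.456                   # hypothetical reading, in degrees Celsius

quote = "It's windy today"             # double quotes let us keep the apostrophe as-is
message = f"{city}: {temperature:.1f} degC"   # .1f formats with one decimal place

print(quote)                           # It's windy today
print(message)                         # Princeton: 15.5 degC
```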

Strings consist of multiple characters, so various operations can be performed on strings, such as concatenation, slicing, and indexing.

python

looong_string = "This is a very looooong string that is used to demonstrate the use of Python's string manipulation functions!"

# Indexing/Slicing
print(looong_string[0])
print(looong_string[-1])
print(looong_string[10:23])
print(looong_string[-10:])
print(looong_string[::-1], '\n')

# Case Conversion
print(looong_string.upper())                # All uppercase
print(looong_string.lower())                # All lowercase
print(looong_string.capitalize())           # Capitalize first letter
print(looong_string.title(), '\n')          # Capitalize first letter of each word

# String Concatenation
print("Hello World! " + looong_string + '\nActually, a very looooong string.\n')

# String Replacement
print(looong_string.replace('looooong', 'long'), '\n')

# String Splitting
print(looong_string.split(' '))             # Split into list by spaces
print(looong_string.split(' ', 2))          # Split into list by spaces, only 2 times

# String Statistics
print(looong_string.count('o'))             # Count occurrences of character 'o'
print(looong_string.find('string'))         # Find position of substring 'string'
print(looong_string.rfind('string'))        # Find position of substring 'string', searching from the right

text

T
!
very looooong
functions!
!snoitcnuf noitalupinam gnirts s'nohtyP fo esu eht etartsnomed ot desu si taht gnirts gnoooool yrev a si sihT

THIS IS A VERY LOOOOONG STRING THAT IS USED TO DEMONSTRATE THE USE OF PYTHON'S STRING MANIPULATION FUNCTIONS!
this is a very looooong string that is used to demonstrate the use of python's string manipulation functions!
This is a very looooong string that is used to demonstrate the use of python's string manipulation functions!
This Is A Very Looooong String That Is Used To Demonstrate The Use Of Python'S String Manipulation Functions!

Hello World! This is a very looooong string that is used to demonstrate the use of Python's string manipulation functions!
Actually, a very looooong string.

This is a very long string that is used to demonstrate the use of Python's string manipulation functions!

['This', 'is', 'a', 'very', 'looooong', 'string', 'that', 'is', 'used', 'to', 'demonstrate', 'the', 'use', 'of', "Python's", 'string', 'manipulation', 'functions!']
['This', 'is', "a very looooong string that is used to demonstrate the use of Python's string manipulation functions!"]
11
24
79

Boolean Type

Often, we are not limited to basic numerical operations; logical operations are also very important. Therefore, to judge the relationships between different variables, we need to use Boolean data.

python

print(3 == 3.0)                                         # Use == to judge equality
print(3 != 3.1)                                         # Use != to judge inequality
print('a' in 'apple')                                   # Use in to judge containment
print('a' not in 'apple')                               # Use not in to judge non-containment
print(True == 1)                                        # Special usage: True equals 1
print(False == 0)                                       # Special usage: False equals 0
print(9 > 8 > 7)                                        # Chain comparison
print(9 > 8 < 7)                                        # Chain comparison
print('a' in 'apple' and 'e' in 'banana')               # Use and to judge if multiple conditions are all true
print('a' in 'apple' or 'e' in 'banana')                # Use or to judge if any condition is true

# Can be used to judge beginning and end of strings
looong_string = "This is a very looooong string that is used to demonstrate the use of Python's string manipulation functions!"
print(looong_string.startswith('This'))                 # Judge if string starts with 'This'
print(looong_string.endswith('functions!'))             # Judge if string ends with 'functions!'

text

True
True
True
False
True
True
True
False
False
True
True
True
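
As a small applied sketch, chained comparisons are a compact way to check that a value lies within a range. The coordinates below are hypothetical. The last two lines show how bool() treats other types: zero and empty values count as False, and everything else counts as True.

```python
lat = 40.35                            # hypothetical coordinates
lon = -74.66

lat_ok = -90 <= lat <= 90              # chained comparison: both bounds in one expression
lon_ok = -180 <= lon <= 180
print(lat_ok and lon_ok)               # True

print(bool(0), bool(''), bool([]))     # False False False
print(bool(-1), bool('N'), bool([0]))  # True True True
```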

Data Type Conversion

Based on the above data types, we can easily convert between them. Note that int() discards the fractional part rather than rounding.

python

a = 10.5
print(a, type(a))

b = int(a)
print(b, type(b))

c = str(a)
print(c, type(c))

text

10.5 <class 'float'>
10 <class 'int'>
10.5 <class 'str'>
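
A few more conversions worth knowing, sketched with arbitrary sample values: int() truncates toward zero, round() rounds, and text can be converted to numbers only if it is written in the form that the target type expects.

```python
truncated_pos = int(10.9)              # 10: the fractional part is dropped
truncated_neg = int(-10.9)             # -10: truncation is toward zero
rounded = round(10.9)                  # 11
from_text = float('3.14')              # 3.14
summed = int('42') + 1                 # 43

print(truncated_pos, truncated_neg, rounded, from_text, summed)

try:
    int('10.5')                        # a string with a decimal point cannot go straight to int
except ValueError as err:
    print('ValueError:', err)

via_float = int(float('10.5'))         # convert to float first, then truncate
print(via_float)                       # 10
```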

Data Types: Data Containers

The data types introduced above each hold a single value (strings are a partial exception: being made up of multiple characters, they support slicing and indexing). To group multiple values into a whole, we need a different kind of type.

Let’s call these types that can store multiple values “data containers.”

Python’s built-in data containers are mainly the following four: lists, tuples, sets, and dictionaries.

List

A list is an ordered collection that can store multiple values and can contain data of any type, represented by []. We mentioned it in the first installment and have used it many times.

Let’s briefly review it through a code block:

python

ls = [1, 'Alice', 3.14, True, '上海', 'プリンストン']

# Indexing/Slicing
print(ls)
print(ls[1])
print(ls[-3])
print(ls[1:3])
print(ls[::2])
print(ls[::-1])
print(ls[-2:], '\n')

# Assignment Modification
ls[1] = 'Iris'
print(ls)

# Append
ls.append('Antarctica')
print(ls)

text

[1, 'Alice', 3.14, True, '上海', 'プリンストン']
Alice
True
['Alice', 3.14]
[1, 3.14, '上海']
['プリンストン', '上海', True, 3.14, 'Alice', 1]
['上海', 'プリンストン']

[1, 'Iris', 3.14, True, '上海', 'プリンストン']
[1, 'Iris', 3.14, True, '上海', 'プリンストン', 'Antarctica']

The above are list operations we are already familiar with. Now, let’s add a few more.

python

ls = [1, 'Iris', 3.14, True, '上海', 'プリンストン', 'Antarctica']

print(len(ls))                       # List length

print('Iris' in ls)                  # Judge if 'Iris' is in the list

ls.pop()                             # Remove last element
print(ls)

ls.pop(2)                            # Remove element at index 2
print(ls)

ls.remove('上海')                    # Remove element with value '上海'
print(ls)

ls.insert(2, 'Neptune')              # Insert 'Neptune' element at index 2
print(ls)

print(ls.index('Neptune'))           # Find index of element with value 'Neptune'

ls.reverse()                         # Reverse list
print(ls)

ls0 = [2, 4, 6, 8, 10]
ls.extend(ls0)                       # Merge lists
print(ls)

ls1 = ls + ls0                       # Concatenate into a new list (ls and ls0 stay unchanged)
print(ls1)

ls0.sort()                           # Sort list
print(ls0)

ls0.sort(reverse=True)               # Reverse sort list
print(ls0)

ls.clear()                           # Clear list
print(ls)

text

7
True
[1, 'Iris', 3.14, True, '上海', 'プリンストン']
[1, 'Iris', True, '上海', 'プリンストン']
[1, 'Iris', True, 'プリンストン']
[1, 'Iris', 'Neptune', True, 'プリンストン']
2
['プリンストン', True, 'Neptune', 'Iris', 1]
['プリンストン', True, 'Neptune', 'Iris', 1, 2, 4, 6, 8, 10]
['プリンストン', True, 'Neptune', 'Iris', 1, 2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
[2, 4, 6, 8, 10]
[10, 8, 6, 4, 2]
[]
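
One detail in the operations above is easy to miss: some list operations change the list in place, while others return a new list. A minimal sketch with hypothetical values:

```python
depths = [30, 10, 20]                  # hypothetical sample values

ordered = sorted(depths)               # sorted() returns a new list
print(depths, ordered)                 # [30, 10, 20] [10, 20, 30]

result = depths.sort()                 # .sort() sorts in place and returns None
print(depths, result)                  # [10, 20, 30] None

combined = depths + [40]               # + builds a new list; depths is unchanged
depths.extend([50])                    # .extend() modifies depths itself
print(combined, depths)                # [10, 20, 30, 40] [10, 20, 30, 50]
```

A common slip is writing something like depths = depths.sort(), which replaces the list with None.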

Tuple

A tuple is almost identical to a list. The key difference is immutability: once a tuple is created, its elements cannot be reassigned, added, or removed the way a list’s can.

Therefore, tuples are well suited to storing constants used in our modeling, protecting them from accidental modification. Tuples are written with parentheses ().

python

tp = (1, 'Iris', 3.14, True, '上海', 'プリンストン', 'Antarctica')

# Indexing/Slicing
print(tp[1])
print(tp[1:3])
print(tp[-3:])

print(tp.index('Iris'))        # Find position of element
print(len(tp))                 # Tuple length

tp[1] = 'Setosa'               # Using assignment to modify tuple elements is not allowed

text

Iris
('Iris', 3.14)
('上海', 'プリンストン', 'Antarctica')
1
7

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Cell In[75], line 11
      8 print(tp.index('Iris'))        # Find position of element
      9 print(len(tp))                 # Tuple length
---> 11 tp[1] = 'Setosa'               # Using assignment to modify tuple elements is not allowed

TypeError: 'tuple' object does not support item assignment
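
The sketch below shows how a tuple of constants might be used in practice. The values are illustrative only: unpacking assigns each element to its own name, and a try/except lets the program continue after the failed modification instead of stopping with a traceback.

```python
EARTH = ('Earth', 6371.0, 9.81)        # illustrative constants: name, radius (km), gravity (m/s^2)

name, radius_km, gravity = EARTH       # unpack into separate variables
print(name, radius_km, gravity)        # Earth 6371.0 9.81

try:
    EARTH[1] = 6400.0                  # item assignment is rejected for tuples
except TypeError as err:
    print('TypeError:', err)

single = (5,)                          # a trailing comma makes a one-element tuple
not_a_tuple = (5)                      # without the comma this is just the integer 5
print(type(single), type(not_a_tuple)) # <class 'tuple'> <class 'int'>
```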

Set

Sets are similar to the concept in mathematics, representing an unordered collection of unique elements. They are written with curly braces {} (an empty set, however, must be written set(), since {} on its own creates an empty dictionary).

python

set1 = {1, 2, 3, 4, 5}

set1.add(6)                       # Add element
print(set1)

set1.remove(2)                    # Remove element
print(set1)

print(2 in set1)                  # Check if element exists

set1[0]                           # Since sets are unordered, elements cannot be accessed by index

text

{1, 2, 3, 4, 5, 6}
{1, 3, 4, 5, 6}
False

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Cell In[83], line 11
      7 print(set1)
      9 print(2 in set1)                  # Check if element exists
---> 11 set1[0]                           # Since sets are unordered, elements cannot be accessed by index

TypeError: 'set' object is not subscriptable

The main uses of sets are also consistent with mathematics, mainly for finding union, intersection, difference, symmetric difference, etc.

python

set0 = {1, 2, 3, 4, 5}
set1 = {4, 5, 6, 7, 8}

set2 = set0.union(set1)                                 # Union
print(set2)

set3 = set0.intersection(set1)                          # Intersection
print(set3)

set4 = set0.difference(set1)                            # Difference
print(set4)

set5 = set0.symmetric_difference(set1)                  # Symmetric difference
print(set5)

text

{1, 2, 3, 4, 5, 6, 7, 8}
{4, 5}
{1, 2, 3}
{1, 2, 3, 6, 7, 8}

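Additionally, sets can be created directly with the set() function. Since set elements are unique, this gives us a handy side use: finding the unique values in a list (deduplication). A set does not promise any particular order, so the sorted-looking output below should not be relied on: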

python

ls = [1, 3, 1, 0, 2, 4, 1, 5, 6, 2, 3, 4, 5, 6, 7, 8, 9, 10]

set0 = set(ls)
print(set0)

text

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
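
If the order of the deduplicated values matters, two common patterns are sketched below (the list is a made-up example): sorted(set(...)) for sorted output, and dict.fromkeys(...) to keep the order of first appearance.

```python
ls = [3, 1, 3, 2, 1]

unique_sorted = sorted(set(ls))              # deduplicate, then sort explicitly
unique_in_order = list(dict.fromkeys(ls))    # deduplicate, keeping first-seen order

print(unique_sorted)                         # [1, 2, 3]
print(unique_in_order)                       # [3, 1, 2]

print(type({}), type(set()))                 # <class 'dict'> <class 'set'>
```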

Dictionary

A dictionary is a collection of key-value pairs in Python, used to look up values by their keys (since Python 3.7, it also preserves the order in which keys were inserted). We can understand it literally, like the “Xinhua Dictionary,” in which each pinyin entry corresponds to many characters: think of the pinyin as the key and the characters as the value.

This gives us the following rules: dictionary keys are unique (each pinyin appears only once in the index), while a value can be any object, including a list of many items (one pinyin can correspond to multiple characters).

Dictionaries are created using {} and use : to separate keys and values.

python

continent_info = {'name': 'Europe',
               'population': 73000000,
               'area': 102e4,
               'abbreviation': 'EU',
               'capital': 'Brussels',
               'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'],
               'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm']
               }

print(continent_info)

text

{'name': 'Europe', 'population': 73000000, 'area': 1020000.0, 'abbreviation': 'EU', 'capital': 'Brussels', 'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], 'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm']}

Thus, we can index, add, delete, and modify corresponding values based on the key names:

python

print(continent_info.keys())                           # Use keys() method to get all keys
print(continent_info.values())                         # Use values() method to get all values

print(continent_info['name'])                          # Use key to index value
print(continent_info['bigcity'][4])                    # Continue indexing from multiple values

del continent_info['area']                             # Use del to delete key-value pair
print(continent_info)

continent_info.pop('population')                       # Use pop to delete key-value pair
print(continent_info)

continent_info['area'] = 102e4                         # Directly assign to add key-value pair
print(continent_info)

continent_info.update({'population': 732000000, 'Chinese name': '欧洲'})          # Use update to update key-value pairs
print(continent_info)

print('capital' in continent_info)                     # Use in to check if key exists

text

dict_keys(['name', 'population', 'area', 'abbreviation', 'capital', 'language', 'bigcity'])
dict_values(['Europe', 73000000, 1020000.0, 'EU', 'Brussels', ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm']])
Europe
Zurich
{'name': 'Europe', 'population': 73000000, 'abbreviation': 'EU', 'capital': 'Brussels', 'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], 'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm']}
{'name': 'Europe', 'abbreviation': 'EU', 'capital': 'Brussels', 'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], 'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm']}
{'name': 'Europe', 'abbreviation': 'EU', 'capital': 'Brussels', 'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], 'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm'], 'area': 1020000.0}
{'name': 'Europe', 'abbreviation': 'EU', 'capital': 'Brussels', 'language': ['English', 'French', 'German', 'Italian', 'Spanish', 'Portuguese', 'Romanian', 'Turkish', 'Russian'], 'bigcity': ['Paris', 'London', 'Berlin', 'Rome', 'Zurich', 'Moscow', 'Vienna', 'Barcelona', 'Istanbul', 'Oslo', 'Stockholm'], 'area': 1020000.0, 'population': 732000000, 'Chinese name': '欧洲'}
True
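
Two more dictionary operations that come up often are sketched below with a hypothetical record: get() returns a default instead of raising KeyError when a key is missing, and items() lets us loop over keys and values together.

```python
station = {'name': 'Princeton', 'elevation_m': 30}     # hypothetical record

found = station.get('name')                  # existing key: returns its value
missing = station.get('country')             # missing key: returns None, no error
fallback = station.get('country', 'unknown') # missing key with an explicit default

print(found, missing, fallback)              # Princeton None unknown

for key, value in station.items():           # iterate over key-value pairs
    print(key, '->', value)
```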

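Additionally, dictionaries share the copy issue mentioned for lists in the previous installment. A plain assignment does not copy anything: both names refer to the same dictionary, so a change made through one name is visible through the other. To get an independent dictionary, use the copy() method. Note that copy() is itself a shallow copy: the top level is new, but nested objects such as lists are still shared (copy.deepcopy() from the standard library copies those as well).

python

# Plain assignment: both names refer to the same dictionary
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = dict1
dict2['d'] = 4

print(dict1)
print(dict2, '\n')

# copy(): a new, independent dictionary (shallow copy)
dict2 = dict1.copy()
dict2['e'] = 5

print(dict1)
print(dict2)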

text

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
{'a': 1, 'b': 2, 'c': 3, 'd': 4}

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
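
One caveat is worth checking when values are themselves containers: dict.copy() duplicates only the top level, so nested lists remain shared. The sketch below, with made-up data, contrasts it with copy.deepcopy() from the standard library:

```python
import copy

original = {'cities': ['Paris', 'London']}

shallow = original.copy()              # new dictionary, but the inner list is shared
deep = copy.deepcopy(original)         # new dictionary and a new inner list

original['cities'].append('Berlin')    # modify the nested list through the original

print(shallow['cities'])               # ['Paris', 'London', 'Berlin']
print(deep['cities'])                  # ['Paris', 'London']
```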

Afterword

That is all for this installment, which introduced Python’s variables and built-in data types. For readers with some programming background, these are much simpler than their counterparts in languages like C. Python also shares many similarities with languages like MATLAB and R, which makes switching between them fairly smooth for people who work across platforms.
