MongoDB is an open-source, non-relational database written in C++ and built on distributed file storage. It stores data as JSON-like documents whose field values can hold other documents, arrays, and arrays of documents, which makes it highly flexible. In this article, CloudDude (Yunduojun) walks through MongoDB storage operations in Python 3 with you.
1. Preparations
Before starting, please ensure MongoDB is installed, its service is started, and the PyMongo library for Python is installed.
2. Connecting to MongoDB
To connect to MongoDB, we need to use MongoClient from the PyMongo library. Generally, you need to pass the IP address and port of MongoDB. The first parameter is the host address, and the second is the port (if not specified, it defaults to 27017):
python
import pymongo
client = pymongo.MongoClient(host='localhost', port=27017)
This creates a MongoDB connection object.
Alternatively, the first parameter host of MongoClient can also accept a MongoDB connection string starting with mongodb://, for example:
python
client = pymongo.MongoClient('mongodb://localhost:27017/')
This achieves the same connection result.
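When the host and port come from configuration, it can help to build the connection string in one place. The following is a minimal sketch, not part of PyMongo: `build_mongo_uri` is a hypothetical helper, and the username/password handling is an assumption that goes beyond the unauthenticated local setup used in this article.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Return a mongodb:// connection string (hypothetical helper)."""
    if username is None:
        return f"mongodb://{host}:{port}/"
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the structure of the URI.
    user = quote_plus(username)
    pwd = quote_plus(password or "")
    return f"mongodb://{user}:{pwd}@{host}:{port}/"


uri = build_mongo_uri()
print(uri)  # mongodb://localhost:27017/
```

The resulting string can be passed as the first argument of `pymongo.MongoClient`, exactly like the literal connection string above.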
3. Specifying a Database
MongoDB can have multiple databases. Next, we need to specify which database to operate on. Here, we’ll use the test database as an example. Specify the database to use in your program:
python
db = client.test
Here, calling the test attribute of client returns the test database. Alternatively, you can specify it like this:
python
db = client['test']
Both methods are equivalent.
4. Specifying a Collection
Each MongoDB database contains many collections, which are similar to tables in relational databases.
Next, specify the collection to operate on. Here, we’ll use a collection named students. Similar to specifying a database, there are two ways to specify a collection:
python
collection = db.students
# or: collection = db['students']
This declares a Collection object.
5. Inserting Data
Now, you can insert data. For the students collection, let’s create a new student record represented as a dictionary:
python
student = {
'id': '20170101',
'name': 'Jordan',
'age': 20,
'gender': 'male'
}
This specifies the student’s ID, name, age, and gender. Next, simply call the insert() method of the collection to insert the data:
python
result = collection.insert(student)
print(result)
In MongoDB, each piece of data has an _id property for unique identification. If this property is not explicitly specified, MongoDB automatically generates an _id of type ObjectId. The insert() method returns the _id value after execution.
Output:
text
5932a68615c2606814c91f3d
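An ObjectId is not random: its first 4 bytes store the creation time in seconds since the Unix epoch. As a small illustration, the sketch below decodes that timestamp from the hex string printed above using only the standard library. `objectid_created_at` is a hypothetical helper name; with PyMongo installed, the `generation_time` attribute of an `ObjectId` should give the same information.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("5932a68615c2606814c91f3d")
print(created.isoformat())  # 2017-06-03T12:07:34+00:00
```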
You can also insert multiple documents at once by passing a list:
python
student1 = {
'id': '20170101',
'name': 'Jordan',
'age': 20,
'gender': 'male'
}
student2 = {
'id': '20170202',
'name': 'Mike',
'age': 21,
'gender': 'male'
}
result = collection.insert([student1, student2])
print(result)
The returned result is a list of the corresponding _id values:
text
[ObjectId('5932a80115c2606a59e8a048'), ObjectId('5932a80115c2606a59e8a049')]
Note that insert() is deprecated in PyMongo 3.x and was removed in PyMongo 4.0, so the two examples above only run on older versions. The recommended methods are insert_one() and insert_many() for inserting single and multiple records, respectively. Example:
python
student = {
'id': '20170101',
'name': 'Jordan',
'age': 20,
'gender': 'male'
}
result = collection.insert_one(student)
print(result)
print(result.inserted_id)
Output:
text
<pymongo.results.InsertOneResult object at 0x10d68b558>
5932ab0f15c2606f0c1cf6c5
Unlike insert(), this returns an InsertOneResult object. You can access its inserted_id attribute to get the _id.
For insert_many(), pass the data as a list:
python
student1 = {...} # As above
student2 = {...} # As above
result = collection.insert_many([student1, student2])
print(result)
print(result.inserted_ids)
Output:
text
<pymongo.results.InsertManyResult object at 0x101dea558>
[ObjectId('5932abf415c2607083d3b2ac'), ObjectId('5932abf415c2607083d3b2ad')]
This method returns an InsertManyResult type. Calling its inserted_ids attribute returns a list of _ids for the inserted data.
6. Querying
After inserting data, you can query using find_one() or find(). find_one() returns a single result, while find() returns a Cursor object that yields the matching documents as you iterate over it. Example:
python
result = collection.find_one({'name': 'Mike'})
print(type(result))
print(result)
Here we query data where the name is ‘Mike’. The return type is a dictionary.
Output:
text
<class 'dict'>
{'_id': ObjectId('5932a80115c2606a59e8a049'), 'id': '20170202', 'name': 'Mike', 'age': 21, 'gender': 'male'}
Note the added _id property, which MongoDB automatically adds during insertion.
You can also query by ObjectId, which requires importing ObjectId from bson.objectid:
python
from bson.objectid import ObjectId
result = collection.find_one({'_id': ObjectId('593278c115c2602667ec6bae')})
print(result)
The query result is still a dictionary. If no result is found, None is returned.
For querying multiple documents, use the find() method. For example, to find data where age is 20:
python
results = collection.find({'age': 20})
print(results)
for result in results:
    print(result)
Output:
text
<pymongo.cursor.Cursor object at 0x1032d5128>
{'_id': ObjectId('593278c115c2602667ec6bae'), 'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278c815c2602678bb2b8d'), 'id': '20170102', 'name': 'Kevin', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278d815c260269d7645a8'), 'id': '20170103', 'name': 'Harden', 'age': 20, 'gender': 'male'}
The return type is Cursor, which acts like a generator. You need to iterate through it to get all results, each being a dictionary.
To query for data where age is greater than 20:
python
results = collection.find({'age': {'$gt': 20}})
Here the query condition key’s value is not a simple number but a dictionary with the key $gt (greater than) and value 20.
Here’s a summary table of comparison operators:
| Symbol | Meaning | Example |
|---|---|---|
| $lt | Less than | {'age': {'$lt': 20}} |
| $gt | Greater than | {'age': {'$gt': 20}} |
| $lte | Less than or equal | {'age': {'$lte': 20}} |
| $gte | Greater than or equal | {'age': {'$gte': 20}} |
| $ne | Not equal | {'age': {'$ne': 20}} |
| $in | In array | {'age': {'$in': [20, 23]}} |
| $nin | Not in array | {'age': {'$nin': [20, 23]}} |
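To make the meaning of these operators concrete, here is a small in-memory sketch that evaluates such a query dictionary against plain Python dicts. It is only an illustration of the semantics in the table, not how MongoDB implements matching: `matches` and `COMPARATORS` are hypothetical names, and the sketch assumes flat documents in which every queried field is present and holds a scalar value.

```python
import operator

# Map each comparison operator from the table to a Python predicate.
COMPARATORS = {
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$lte": operator.le,
    "$gte": operator.ge,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(document, query):
    """Return True if a flat document satisfies every condition in query."""
    for field, condition in query.items():
        value = document[field]  # assumption: the field exists
        if isinstance(condition, dict):
            # Operator form, e.g. {'age': {'$gt': 20}}: all operators must hold.
            for op, operand in condition.items():
                if not COMPARATORS[op](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {'age': 20}: simple equality.
            return False
    return True


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Kevin", "age": 23},
]
older = [s["name"] for s in students if matches(s, {"age": {"$gt": 20}})]
print(older)  # ['Mike', 'Kevin']
```

Several operators can be combined in one condition, for example {'age': {'$gte': 20, '$lt': 23}} for a range.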
You can also perform regular expression queries. For example, to find students whose names start with ‘M’:
python
results = collection.find({'name': {'$regex': '^M.*'}})
Here, $regex specifies the regular expression match. ^M.* is a regex meaning “starts with M”.
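The pattern can be tried out locally before sending it to the server. The sketch below uses Python's own `re` module on a sample list of names; MongoDB evaluates `$regex` with its own regular expression engine, so treat this as an approximation that should hold for simple patterns like this one. The `prefix` variable and the escaping step are illustrative assumptions for the case where the search text comes from user input.

```python
import re

names = ["Harden", "Jordan", "Kevin", "Mark", "Mike"]

# '^M.*' and '^M' accept the same strings: the anchor plus 'M' already decides
# the match, and '.*' only consumes the rest of the name.
pattern = re.compile(r"^M.*")
starts_with_m = [name for name in names if pattern.search(name)]
print(starts_with_m)  # ['Mark', 'Mike']

# Escape user-supplied text before embedding it in a pattern so that
# characters such as '.' are matched literally.
prefix = "M."
safe_pattern = re.compile("^" + re.escape(prefix))
print(bool(safe_pattern.search("Mike")))  # False
print(bool(safe_pattern.search("M. Jordan")))  # True
```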
Here’s a table summarizing some functional operators:
| Symbol | Meaning | Example | Explanation |
|---|---|---|---|
| $regex | Match regex | {'name': {'$regex': '^M.*'}} | Name starts with M |
| $exists | Property exists | {'name': {'$exists': True}} | Name property exists |
| $type | Type check | {'age': {'$type': 'int'}} | Age is of type int |
| $mod | Modulo operation | {'age': {'$mod': [5, 0]}} | Age modulo 5 equals 0 |
| $text | Text search | {'$text': {'$search': 'Mike'}} | Text-type property contains string ‘Mike’ |
| $where | Advanced conditional query | {'$where': 'obj.fans_count == obj.follows_count'} | Own follower count equals following count |
For more detailed usage of these operators, refer to the official MongoDB documentation:
https://docs.mongodb.com/manual/reference/operator/query/
7. Counting
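To count the documents matching a query, call count_documents() on the collection. Older tutorials call count() on the cursor returned by find(), but that method was deprecated in PyMongo 3.7 and removed in PyMongo 4.0. For example, to count all documents, pass an empty filter:
python
count = collection.count_documents({})
print(count)
Or to count documents matching a condition:
python
count = collection.count_documents({'age': 20})
print(count)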
The result is a numerical value representing the count.
8. Sorting
To sort, call the sort() method, passing the field to sort by and the sort order flag. Example:
python
results = collection.find().sort('name', pymongo.ASCENDING)
print([result['name'] for result in results])
Output:
text
['Harden', 'Jordan', 'Kevin', 'Mark', 'Mike']
Here, pymongo.ASCENDING specifies ascending order. For descending order, use pymongo.DESCENDING.
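sort() also accepts a list of (field, direction) pairs when more than one sort key is needed, for example `sort([('age', pymongo.DESCENDING), ('name', pymongo.ASCENDING)])`. The sketch below reproduces that ordering in plain Python so the result can be checked without a server. `sort_documents` is a hypothetical helper, and the ages of the sample students are made up for the illustration.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def sort_documents(documents, spec):
    """Sort dicts by a list of (field, direction) pairs, first pair most significant."""
    result = list(documents)
    # Python's sort is stable, so applying the keys from least to most
    # significant produces a correct multi-key ordering.
    for field, direction in reversed(spec):
        result.sort(key=lambda doc: doc[field], reverse=(direction == DESCENDING))
    return result


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Kevin", "age": 20},
    {"name": "Harden", "age": 20},
    {"name": "Mark", "age": 21},
]
ordered = sort_documents(students, [("age", DESCENDING), ("name", ASCENDING)])
print([s["name"] for s in ordered])  # ['Mark', 'Mike', 'Harden', 'Jordan', 'Kevin']
```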
9. Offset and Limit
In some cases, you might want to skip a number of results. Use skip() to offset. For example, to skip the first two results:
python
results = collection.find().sort('name', pymongo.ASCENDING).skip(2)
print([result['name'] for result in results])
Output:
text
['Kevin', 'Mark', 'Mike']
You can also use limit() to specify the number of results to return:
python
results = collection.find().sort('name', pymongo.ASCENDING).skip(2).limit(2)
print([result['name'] for result in results])
Output:
text
['Kevin', 'Mark']
Without limit(), three results would be returned. With the limit, only two are returned.
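A common use of skip() and limit() together is page-based navigation. The following sketch shows one way to turn a page number into the two arguments; `page_to_skip_limit` is a hypothetical helper, and the list of names stands in for the sorted query result shown above.

```python
def page_to_skip_limit(page, page_size):
    """Translate a 1-based page number into skip() and limit() arguments."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


names = ["Harden", "Jordan", "Kevin", "Mark", "Mike"]  # already sorted by name
skip, limit = page_to_skip_limit(page=2, page_size=2)
print(skip, limit)  # 2 2
print(names[skip:skip + limit])  # ['Kevin', 'Mark']
```

With a real collection, the same two values would be passed on as `collection.find().sort('name', pymongo.ASCENDING).skip(skip).limit(limit)`.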
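Note: On very large collections (tens of millions or billions of documents), avoid large offsets. The server still has to walk through every skipped document before it returns anything, so the query gets slower as the offset grows and can run into memory problems. Instead, start from a known position with a query like this: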
python
from bson.objectid import ObjectId
collection.find({'_id': {'$gt': ObjectId('593278c815c2602678bb2b8d')}})
This requires keeping track of the last _id from the previous query.
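Put together, this approach might look like the loop below: each query sorts by _id, takes one page, and remembers the last _id as the starting point for the next page. This is a sketch under assumptions, not an official recipe. `iter_pages` is a hypothetical helper that only relies on find(), sort(), and limit(); `FakeCollection` and `FakeCursor` are in-memory stand-ins, written just so the example runs without a server, and they use integers instead of ObjectId values.

```python
def iter_pages(collection, page_size):
    """Yield lists of documents, resuming each query after the last _id seen."""
    last_id = None
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        # Sorting by _id makes "the last _id of the page" well defined.
        page = list(collection.find(query).sort("_id", 1).limit(page_size))
        if not page:
            return
        yield page
        last_id = page[-1]["_id"]


class FakeCursor:
    """In-memory stand-in for a cursor (demo only)."""

    def __init__(self, docs):
        self._docs = docs

    def sort(self, key, direction):
        return FakeCursor(sorted(self._docs, key=lambda d: d[key], reverse=direction < 0))

    def limit(self, n):
        return FakeCursor(self._docs[:n])

    def __iter__(self):
        return iter(self._docs)


class FakeCollection:
    """In-memory stand-in that understands only {'_id': {'$gt': ...}} (demo only)."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, query=None):
        after = (query or {}).get("_id", {}).get("$gt")
        return FakeCursor([d for d in self._docs if after is None or d["_id"] > after])


fake = FakeCollection([{"_id": i} for i in (3, 1, 5, 2, 4)])
pages = [[doc["_id"] for doc in page] for page in iter_pages(fake, page_size=2)]
print(pages)  # [[1, 2], [3, 4], [5]]
```

Because every query starts from an _id value rather than an offset, the cost of fetching a page does not depend on how many pages came before it.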
10. Updating
To update data, use the update() method, specifying the condition and the new data. Example:
python
condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 25
result = collection.update(condition, student)
print(result)
Here we update the age for the student named ‘Kevin’: first query the data, modify the age, then call update() with the original condition and the modified data.
Output:
text
{'ok': 1, 'nModified': 1, 'n': 1, 'updatedExisting': True}
The result is a dictionary. ok indicates success, nModified indicates the number of documents affected.
Alternatively, you can use the $set operator to update data:
python
result = collection.update(condition, {'$set': student})
This updates only the fields present in the student dictionary. Other existing fields remain unchanged and are not deleted. Without $set, the entire document would be replaced by the student dictionary, potentially deleting other fields.
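The difference only becomes visible when the dictionary you pass holds just part of the document. In the example above, student is the full document fetched from the database, so both forms end up with the same fields. The in-memory sketch below mimics the two behaviours on a plain dict. `replace_document` and `set_fields` are hypothetical helpers that illustrate the semantics described here, not PyMongo functions, and the _id value 1 is a placeholder.

```python
def replace_document(stored, new_doc):
    """Mimic a full replacement: only _id survives from the stored document."""
    replaced = {"_id": stored["_id"]}
    replaced.update({k: v for k, v in new_doc.items() if k != "_id"})
    return replaced


def set_fields(stored, fields):
    """Mimic {'$set': fields}: listed fields are overwritten, the rest kept."""
    updated = dict(stored)
    updated.update(fields)
    return updated


stored = {"_id": 1, "id": "20170102", "name": "Kevin", "age": 20, "gender": "male"}
change = {"age": 25}

print(replace_document(stored, change))  # {'_id': 1, 'age': 25}
print(set_fields(stored, change))
# {'_id': 1, 'id': '20170102', 'name': 'Kevin', 'age': 25, 'gender': 'male'}
```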
Like insert(), the update() method is deprecated in PyMongo 3.x and was removed in PyMongo 4.0. The recommended methods are update_one() and update_many(), which are stricter: their second parameter must use an update operator such as $set as a dictionary key. Example:
python
condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 26
result = collection.update_one(condition, {'$set': student})
print(result)
print(result.matched_count, result.modified_count)
This calls update_one(). The second parameter is {'$set': student}. The return type is UpdateResult. The matched_count and modified_count attributes give the number of matched and modified documents.
Output:
text
<pymongo.results.UpdateResult object at 0x10d17b678>
1 0
Another example:
python
condition = {'age': {'$gt': 20}}
result = collection.update_one(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)
Here, the query condition is age > 20, and the update operation is {'$inc': {'age': 1}} (increment age by 1). This increments the age of the first matching document by 1.
Output:
text
<pymongo.results.UpdateResult object at 0x10b8874c8>
1 1
If update_many() is called, all matching documents are updated:
python
condition = {'age': {'$gt': 20}}
result = collection.update_many(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)
Output:
text
<pymongo.results.UpdateResult object at 0x10c6384c8>
3 3
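The two counters are easy to mix up: matched_count is the number of documents selected by the condition, while modified_count only counts documents whose content actually changed. The in-memory sketch below imitates this for $set and $inc. `simulate_update` is a hypothetical function written for this illustration and the sample ages are made up, so the numbers differ from the outputs above.

```python
def simulate_update(documents, predicate, update, many=False):
    """Apply a $set/$inc update in memory and return (matched, modified)."""
    matched = modified = 0
    for doc in documents:
        if not predicate(doc):
            continue
        matched += 1
        before = dict(doc)
        doc.update(update.get("$set", {}))
        for field, amount in update.get("$inc", {}).items():
            doc[field] = doc.get(field, 0) + amount
        if doc != before:
            modified += 1  # counted only when something actually changed
        if not many:
            break  # like update_one: stop at the first match
    return matched, modified


def is_kevin(doc):
    return doc["name"] == "Kevin"


def older_than_20(doc):
    return doc["age"] > 20


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Kevin", "age": 26},
]

# Setting a field to the value it already has: matched but not modified.
print(simulate_update(students, is_kevin, {"$set": {"age": 26}}))  # (1, 0)
# One-document update: only the first match (Mike) is incremented.
print(simulate_update(students, older_than_20, {"$inc": {"age": 1}}))  # (1, 1)
# Many-document update: every match (Mike and Kevin) is incremented.
print(simulate_update(students, older_than_20, {"$inc": {"age": 1}}, many=True))  # (2, 2)
```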
11. Deleting
Deletion is straightforward. Call remove() with the deletion condition. All matching documents will be deleted. Example:
python
result = collection.remove({'name': 'Kevin'})
print(result)
Output:
text
{'ok': 1, 'n': 1}
Like insert() and update(), remove() is deprecated in PyMongo 3.x and was removed in PyMongo 4.0. The two recommended methods are delete_one() and delete_many(). Example:
python
result = collection.delete_one({'name': 'Kevin'})
print(result)
print(result.deleted_count)
result = collection.delete_many({'age': {'$lt': 25}})
print(result.deleted_count)
Output:
text
<pymongo.results.DeleteResult object at 0x10e6ba4c8>
1
4
delete_one() deletes the first matching document. delete_many() deletes all matching documents. Both return a DeleteResult object, and the deleted_count attribute gives the number of deleted documents.
12. Other Operations
PyMongo also provides combined methods like find_one_and_delete(), find_one_and_replace(), and find_one_and_update() for find-and-delete, find-and-replace, and find-and-update operations. Their usage is similar to the methods above.
You can also perform index operations using methods like create_index(), create_indexes(), and drop_index().
For detailed usage of PyMongo, refer to the official documentation: http://api.mongodb.com/python/current/api/pymongo/collection.html.
Operations on databases and collections themselves are also supported. Refer to the official documentation for more: http://api.mongodb.com/python/current/api/pymongo/.