MongoDB with Python: PyMongo Usage
Now that you've learned how to work with MongoDB using Node.js, let's explore how to integrate MongoDB with Python using PyMongo. Python's simplicity and extensive data science ecosystem make it an excellent choice for MongoDB applications, especially in data analysis, machine learning, and scripting scenarios.
Learning Goals:
- Install and configure PyMongo
- Connect to MongoDB from Python applications
- Perform CRUD operations using PyMongo
- Work with PyMongo's cursor objects and data conversion
- Handle common Python-MongoDB integration patterns
Installing PyMongo
First, install PyMongo using pip:
pip install pymongo
To use mongodb+srv:// connection strings (for example with MongoDB Atlas), you may also need the srv extra:
pip install "pymongo[srv]" # Quoted so the shell does not expand the brackets
Connecting to MongoDB
PyMongo provides several ways to connect to your MongoDB instance:
from pymongo import MongoClient
from pymongo.errors import PyMongoError
# Basic local connection
client = MongoClient('mongodb://localhost:27017/')
# Connection with authentication (replace the placeholder credentials)
# client = MongoClient(
#     'mongodb://username:password@localhost:27017/'
# )
# MongoDB Atlas connection (using SRV)
# client = MongoClient('mongodb+srv://username:password@cluster.mongodb.net/')
# Test the connection
try:
    client.admin.command('ping')
    print("Successfully connected to MongoDB!")
except PyMongoError as e:
    print(f"Connection failed: {e}")
# Access a database and collection
db = client['school']
students = db['students']
Create one MongoClient and reuse it for the lifetime of your application rather than creating a new client for each operation. The client is thread-safe and maintains its own connection pool.
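Credentials that contain characters such as @, : or / need to be percent-encoded before they are placed in a connection string. The following is a minimal sketch using only the standard library; build_mongo_uri is a hypothetical helper name, and the host and port defaults are assumptions chosen to match the local examples above.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host="localhost", port=27017):
    """Return a mongodb:// URI with percent-encoded credentials.

    Hypothetical helper for illustration; host and port are assumed defaults.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/"


uri = build_mongo_uri("app_user", "p@ss:word/1")
print(uri)
```

The resulting string can be passed to MongoClient once at startup, and the client then reused everywhere.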
Basic CRUD Operations
Creating Documents
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['school']
students = db['students']
# Insert a single document
student1 = {
    "name": "Alice Johnson",
    "age": 20,
    "major": "Computer Science",
    "gpa": 3.8,
    "courses": ["CS101", "MATH202", "PHYS101"]
}
result = students.insert_one(student1)
print(f"Inserted document with ID: {result.inserted_id}")
# Insert multiple documents
student2 = {
    "name": "Bob Smith",
    "age": 22,
    "major": "Mathematics",
    "gpa": 3.5,
    "courses": ["MATH202", "STAT301"]
}
student3 = {
    "name": "Carol Davis",
    "age": 21,
    "major": "Physics",
    "gpa": 3.9,
    "courses": ["PHYS101", "MATH202", "CHEM101"]
}
result = students.insert_many([student2, student3])
print(f"Inserted {len(result.inserted_ids)} documents")
Reading Documents
from pymongo import MongoClient
import pprint
client = MongoClient('mongodb://localhost:27017/')
students = client['school']['students']
# Find one document
alice = students.find_one({"name": "Alice Johnson"})
print("Found Alice:")
pprint.pprint(alice)
# Find all documents
print("\nAll students:")
for student in students.find():
    pprint.pprint(student)
# Find with projection (only return specific fields)
print("\nStudent names and majors only:")
for student in students.find(
    {},
    {"name": 1, "major": 1, "_id": 0}
):
    pprint.pprint(student)
# Find with query filters
cs_students = students.find({"major": "Computer Science"})
print(f"\nComputer Science students: {list(cs_students)}")
Updating Documents
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
students = client['school']['students']
# Update one document
result = students.update_one(
    {"name": "Alice Johnson"},
    {"$set": {"gpa": 3.9, "graduated": True}}
)
print(f"Modified {result.modified_count} document(s)")
# Update multiple documents
result = students.update_many(
    {"age": {"$gte": 21}},
    {"$set": {"senior": True}}
)
print(f"Modified {result.modified_count} document(s)")
# Upsert - insert if doesn't exist
result = students.update_one(
    {"name": "David Wilson"},
    {
        "$set": {
            "age": 23,
            "major": "Chemistry",
            "gpa": 3.4
        }
    },
    upsert=True
)
# upserted_id is None when an existing document was updated instead
print(f"Upserted document ID: {result.upserted_id}")
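An upsert can also set fields only when a new document is created, using the $setOnInsert operator alongside $set. The sketch below wraps that pattern in a function that accepts any PyMongo collection; upsert_student and the enrolled field are hypothetical names used only for illustration.

```python
def upsert_student(students, name, fields, defaults=None):
    """Update a student by name, inserting the document if none matches.

    `students` is expected to be a PyMongo Collection. `defaults` are
    written only when a new document is inserted, and must not repeat
    keys from `fields` (MongoDB rejects conflicting updates).
    """
    update = {"$set": fields}
    if defaults:
        update["$setOnInsert"] = defaults
    result = students.update_one({"name": name}, update, upsert=True)
    # upserted_id is None when an existing document was matched.
    return "inserted" if result.upserted_id is not None else "updated"
```

With the collection from the examples above, a call might look like upsert_student(students, "David Wilson", {"gpa": 3.4}, {"enrolled": True}).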
Deleting Documents
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
students = client['school']['students']
# Delete one document
result = students.delete_one({"name": "David Wilson"})
print(f"Deleted {result.deleted_count} document(s)")
# Delete multiple documents
result = students.delete_many({"senior": True})
print(f"Deleted {result.deleted_count} document(s)")
# Delete all documents (be careful!)
# result = students.delete_many({})
# print(f"Deleted all {result.deleted_count} documents")
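Because an empty filter matches every document, one way to reduce the risk of an accidental full wipe is a small guard around delete_many. This is a sketch only; delete_matching is a hypothetical helper, not part of PyMongo.

```python
def delete_matching(collection, query):
    """Delete documents matching `query`, refusing an empty filter.

    `collection` is expected to be a PyMongo Collection.
    """
    if not query:
        raise ValueError(
            "refusing to delete with an empty filter; pass an explicit query"
        )
    result = collection.delete_many(query)
    return result.deleted_count
```

Code that really does intend to clear a collection can still call delete_many({}) directly, which keeps that decision explicit.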
Working with Cursors and Data Types
PyMongo automatically converts between BSON and Python types:
from pymongo import MongoClient
from datetime import datetime, timezone
import uuid
# uuidRepresentation='standard' lets PyMongo 4.x encode native uuid.UUID values
client = MongoClient('mongodb://localhost:27017/', uuidRepresentation='standard')
events = client['analytics']['events']
# Insert document with various data types
event = {
    "event_id": uuid.uuid4(),
    "timestamp": datetime.now(timezone.utc),  # MongoDB stores dates in UTC
    "user_id": 12345,
    "action": "login",
    "metadata": {
        "ip_address": "192.168.1.1",
        "user_agent": "Mozilla/5.0"
    },
    "tags": ["authentication", "success"],
    "is_active": True
}
result = events.insert_one(event)
print(f"Inserted event with ID: {result.inserted_id}")
# Query and work with results
recent_events = events.find({
    "timestamp": {"$gte": datetime(2024, 1, 1, tzinfo=timezone.utc)}
}).sort("timestamp", -1).limit(5)
print("Recent events:")
for event in recent_events:
    print(f"- {event['action']} at {event['timestamp']}")
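Documents read back from MongoDB often contain values such as ObjectId, datetime and UUID that the standard json module cannot serialize directly. A simple, lossy approach is to fall back to str() for unknown types, as sketched here; to_json is a hypothetical helper and the sample event is made up for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def to_json(document):
    """Serialize a document, converting non-JSON types with str()."""
    return json.dumps(document, default=str, sort_keys=True)


event = {
    "event_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "action": "login",
}
print(to_json(event))
```

This conversion is one-way: the strings do not turn back into their original types on load. When round-tripping matters, the bson.json_util module that ships with PyMongo is worth a look.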
Advanced Query Operations
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
students = client['school']['students']
# Complex queries with operators
high_gpa_students = students.find({
    "gpa": {"$gte": 3.7},
    "age": {"$lte": 22}
})
# Array queries
math_students = students.find({
    "courses": {"$in": ["MATH202", "STAT301"]}
})
# Text search (requires text index)
# students.create_index([("name", "text")])
# johnsons = students.find({"$text": {"$search": "Johnson"}})
# Aggregation pipeline
pipeline = [
    {"$group": {
        "_id": "$major",
        "average_gpa": {"$avg": "$gpa"},
        "count": {"$sum": 1}
    }},
    {"$sort": {"average_gpa": -1}}
]
major_stats = students.aggregate(pipeline)
print("\nGPA by major:")
for stat in major_stats:
    print(f"{stat['_id']}: {stat['average_gpa']:.2f} (n={stat['count']})")
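Since an aggregation pipeline is just a list of dictionaries, it can be assembled conditionally in plain Python. The sketch below builds the same GPA-by-major pipeline with an optional $match stage in front; gpa_by_major_pipeline and its min_gpa parameter are hypothetical names.

```python
def gpa_by_major_pipeline(min_gpa=None):
    """Build the GPA-by-major pipeline, optionally filtering by GPA first."""
    pipeline = []
    if min_gpa is not None:
        # Filtering first means later stages process fewer documents.
        pipeline.append({"$match": {"gpa": {"$gte": min_gpa}}})
    pipeline.append({
        "$group": {
            "_id": "$major",
            "average_gpa": {"$avg": "$gpa"},
            "count": {"$sum": 1},
        }
    })
    pipeline.append({"$sort": {"average_gpa": -1}})
    return pipeline


for stage in gpa_by_major_pipeline(min_gpa=3.5):
    print(stage)
```

The returned list can be passed straight to students.aggregate().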
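PyMongo cursors fetch results lazily in batches, but converting a cursor with list() loads every matching document into memory. For large result sets, iterate over the cursor, cap results with .limit(), and process documents in batches. A cursor is closed once it has been fully iterated; if you stop early, close it explicitly with .close() or use it in a with block.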
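One way to process a large result set in fixed-size chunks is a small generator that pulls documents from the cursor a batch at a time. This is a sketch under the assumption that the cursor behaves like any Python iterable, which PyMongo cursors do; iter_batches is a hypothetical helper name.

```python
from itertools import islice


def iter_batches(cursor, batch_size):
    """Yield lists of up to `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(cursor)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Demonstrated on a plain range; a PyMongo cursor can be passed the same way.
for batch in iter_batches(range(5), 2):
    print(batch)
```

Only one batch is held in memory at a time, so peak memory depends on batch_size rather than on the size of the collection.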
Error Handling and Best Practices
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure, DuplicateKeyError, OperationFailure
# Creating the client does not connect yet; the first operation does
client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=5000)
try:
    # Test connection with timeout
    client.admin.command('ping')
    db = client['school']
    students = db['students']
    # Ensure unique index
    students.create_index("email", unique=True)
    try:
        # This will fail if email already exists
        students.insert_one({
            "name": "Test User",
            "email": "test@example.com"
        })
    except DuplicateKeyError:
        print("Email already exists!")
except ConnectionFailure:
    print("Failed to connect to MongoDB")
except OperationFailure as e:
    print(f"Database operation failed: {e}")
finally:
    client.close()
Common Pitfalls
- Not closing connections: While PyMongo's connection pooling helps, explicitly close connections in long-running applications
- Ignoring BSON types: Remember that PyMongo uses BSON types which may differ from Python native types
- Large result sets: Loading entire collections into memory can crash your application; use batching
- Date handling: Python datetime objects work well, but timezone awareness is important
- Connection strings: Different environments (local, Atlas, replica sets) require different connection string formats
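On the date handling point: MongoDB stores dates in UTC, and by default PyMongo returns them as naive datetime objects (timezone-aware values can be requested with the tz_aware client option). A minimal sketch of normalizing both kinds of value to aware UTC follows; to_utc is a hypothetical helper, and treating naive values as UTC is an assumption that only holds for values read from MongoDB.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return an aware UTC datetime; naive inputs are assumed to be UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


# A naive value, as PyMongo returns by default.
stored = datetime(2024, 1, 1, 12, 0)
# An aware value in a UTC+2 zone, as an application might produce.
local = datetime(2024, 1, 1, 14, 0, tzinfo=timezone(timedelta(hours=2)))

print(to_utc(stored))
print(to_utc(local))
```

Both values describe the same instant, so after normalization they compare equal.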
Summary
PyMongo provides a Pythonic interface to MongoDB that feels natural to Python developers. You've learned how to connect to MongoDB, perform all CRUD operations, work with different data types, execute advanced queries, and handle errors properly. The library's design mirrors MongoDB's query language while providing Python-specific conveniences.
Quiz
- What is the recommended way to manage MongoDB connections in a Python application?
  A) Create a new MongoClient for each operation
  B) Use a global MongoClient instance
  C) Create connection pools manually
  D) Use connection strings with every call
- Which method would you use to insert multiple documents at once?
  A) insert()
  B) insert_one()
  C) insert_many()
  D) bulk_insert()
- How does PyMongo handle the conversion between Python dates and MongoDB dates?
  A) It doesn't support date conversion
  B) Uses strings for all dates
  C) Automatically converts between Python datetime and BSON Date
  D) Requires manual conversion using strftime
- What is the purpose of the upsert parameter in update operations?
  A) To update only specific fields
  B) To insert a document if it doesn't exist
  C) To update multiple documents at once
  D) To optimize query performance
- Why should you be cautious when working with large query results in PyMongo?
  A) PyMongo has a 100-document limit
  B) Large results can exhaust memory
  C) MongoDB charges per document retrieved
  D) Python cannot handle large datasets
Answers:
- B - Use a global MongoClient instance (it's thread-safe and manages connection pooling)
- C - insert_many() is used for inserting multiple documents
- C - PyMongo automatically converts between Python datetime and BSON Date
- B - The upsert parameter inserts a document if no matching document exists
- B - Large results can exhaust memory; use batching and limits for large datasets