Troubleshooting Common System Issues
Welcome to the final lesson of our Ubuntu course! By now, you've built a solid foundation in Linux system administration. In this lesson, we'll put all those skills together to tackle real-world system problems. You'll learn systematic approaches to diagnose and resolve common issues that administrators face daily.
Learning Goals
- Develop a systematic troubleshooting methodology
- Diagnose and fix boot problems
- Resolve package dependency and installation issues
- Troubleshoot network connectivity problems
- Recover from disk space and filesystem issues
- Fix user authentication and permission problems
Systematic Troubleshooting Approach
Effective troubleshooting follows a logical process. Start with the most obvious solutions before diving deep.
# 1. Gather information
journalctl -f # Monitor system logs in real-time
sudo dmesg | tail -20 # Check recent kernel messages (root is required on recent Ubuntu releases)
systemctl status <service> # Check service status
# 2. Reproduce the issue
# Try to recreate the problem consistently
# 3. Isolate the cause
# Test components individually
# 4. Implement and test solutions
# Apply fixes one at a time
Always start with the system logs! The journalctl command is your best friend for understanding what's happening behind the scenes.
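The information-gathering step can be scripted so that the same evidence is captured every time. The following is a minimal sketch rather than a standard tool: the helper names run_if_present and collect_snapshot are hypothetical, and the selection of commands is only an assumption about what is worth recording.

```shell
# Hypothetical helper: run a command only if it is installed, labelling its output.
run_if_present() {
    local cmd="$1"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "=== $* ==="
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "=== $cmd: not installed, skipped ==="
    fi
}

# Hypothetical helper: write a one-shot diagnostic snapshot to the given file.
collect_snapshot() {
    local report="$1"
    {
        run_if_present date
        run_if_present uname -a
        run_if_present df -h
        run_if_present journalctl -p err -b --no-pager -n 20   # errors from this boot
        run_if_present systemctl --failed --no-pager           # units that failed
    } > "$report"
}

# Example:
#   collect_snapshot /tmp/snapshot.txt && less /tmp/snapshot.txt
```

Saving a snapshot before and after a change makes it easier to see what the change actually affected.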
Boot Issues and Recovery
Boot problems can be stressful, but Ubuntu provides several recovery options.
GRUB Bootloader Issues
# During boot, hold Shift (or Esc for UEFI) to access GRUB menu
# Select "Advanced options for Ubuntu"
# Choose recovery mode for troubleshooting options
# From recovery mode, you can:
- fsck: Check and repair filesystem
- clean: Free disk space
- dpkg: Repair broken packages
- root: Drop to root shell prompt
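In the recovery root shell, the root filesystem is typically mounted read-only, so edits fail until it is remounted with mount -o remount,rw /. A small check such as the sketch below can confirm the current state first. The function name is_readonly is hypothetical, and the optional second argument exists only so the logic can be tried against a sample file instead of /proc/mounts.

```shell
# Hypothetical helper: succeed (exit 0) if MOUNTPOINT is mounted read-only.
# Usage: is_readonly MOUNTPOINT [MOUNTS_FILE]   (defaults to /proc/mounts)
# Exit status: 0 = read-only, 1 = read-write, 2 = mount point not listed.
is_readonly() {
    local target="$1" mounts="${2:-/proc/mounts}"
    awk -v t="$target" '
        $2 == t { opts = $4; found = 1 }   # the last matching entry wins
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$mounts"
}

# Example (in the recovery root shell):
#   is_readonly / && mount -o remount,rw /
```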
Emergency Mode and Root Shell
If your system won't boot normally, you might need emergency mode:
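# In the GRUB menu, press e to edit the boot entry, then append this
# to the end of the line that starts with "linux":
systemd.unit=emergency.target
# Or, for more functionality (local filesystems mounted, basic services started):
systemd.unit=rescue.target
# Press Ctrl+X or F10 to boot with the modified entry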
Package Management Problems
APT dependency issues are common but usually fixable.
# Update package lists
sudo apt update
# Fix broken dependencies
sudo apt --fix-broken install
# Remove outdated cached package files and dependencies that are no longer needed
sudo apt autoclean
sudo apt autoremove
# Finish configuring packages left unpacked by an interrupted install
sudo dpkg --configure -a
# As a last resort, purge and reinstall the package
sudo apt remove --purge <package>
sudo apt install <package>
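Before running repairs, it can help to see which packages dpkg itself considers incomplete. The sketch below filters the three-letter status abbreviation that dpkg-query prints (for example "ii " for a healthy package). The function name list_unhealthy is hypothetical, and treating every state other than installed, config-files, or not-installed as unhealthy is an assumption made for illustration.

```shell
# Hypothetical helper: read "STATUS PACKAGE" lines on stdin and print packages
# whose state is neither fully installed nor cleanly removed.
list_unhealthy() {
    awk '{
        state = substr($1, 2, 1)     # second letter: current state (i = installed)
        flagged = (length($1) > 2)   # third letter present: error flag such as R
        if ((state != "i" && state != "c" && state != "n") || flagged)
            print $2 " (" $1 ")"
    }'
}

# Example:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | list_unhealthy
```

An empty result suggests the dpkg database is consistent and the problem is more likely in the repositories or in a dependency conflict.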
Held Packages Issue
# Check for held packages
apt-mark showhold
# Check installed and candidate versions and pin priorities
apt-cache policy <package-name>
# Release the hold and upgrade, if safe
sudo apt-mark unhold <package-name>
sudo apt install <package-name>
Repository Problems
# Check repository connectivity
sudo apt update
# If repositories fail, check sources
# (on Ubuntu 24.04 and later, also check /etc/apt/sources.list.d/ubuntu.sources)
sudo nano /etc/apt/sources.list
# Clear cached package lists
sudo rm -rf /var/lib/apt/lists/*
sudo apt update
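When a repository fails, it is often quicker to list only the active entries than to read through every commented line in an editor. The following sketch does that for the classic one-line "deb ..." format. The function name list_apt_sources is hypothetical, and the sketch does not parse the newer deb822 .sources files.

```shell
# Hypothetical helper: print active (uncommented) deb / deb-src lines
# from the files given as arguments. Unreadable or missing files are skipped.
list_apt_sources() {
    grep -hE '^[[:space:]]*deb(-src)?[[:space:]]' "$@" 2>/dev/null
}

# Example:
#   list_apt_sources /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```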
Network Troubleshooting
When network connectivity fails, follow this diagnostic path.
# Check interface status
ip addr show
ip link show
# Test connectivity
ping -c 4 8.8.8.8 # Test basic connectivity
ping -c 4 google.com # Test DNS resolution
# Check routing
ip route show
traceroute google.com # If missing, install with: sudo apt install traceroute
# DNS troubleshooting
resolvectl status # Replaces the older systemd-resolve --status
cat /etc/resolv.conf
Don't forget to check your firewall! A common mistake is troubleshooting for hours only to find UFW is blocking the connection.
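The diagnostic path above can be chained so that it stops at the first layer that fails, which points to where to look next. This is a minimal sketch under stated assumptions: check_layer and diagnose_network are hypothetical names, and 8.8.8.8 and google.com are simply the same test targets used in the commands above.

```shell
# Hypothetical helper: run a command quietly and report PASS or FAIL for a label.
check_layer() {
    local label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Walk the layers from routing to DNS, stopping at the first failure.
diagnose_network() {
    check_layer "default route present" sh -c 'ip route show default | grep -q .' || return 1
    check_layer "IP connectivity (ping 8.8.8.8)" ping -c 2 -W 2 8.8.8.8 || return 1
    check_layer "DNS resolution (google.com)" getent hosts google.com || return 1
    echo "Lower layers look healthy; check the firewall (sudo ufw status) and the application next."
}

# Example:
#   diagnose_network
```

If IP connectivity passes but DNS fails, the resolver configuration is the likely culprit; if everything passes, the firewall is a good next suspect.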
Common Network Fixes
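# Restart networking (use whichever service manages your interfaces)
sudo systemctl restart systemd-networkd # Typical on Ubuntu Server
sudo systemctl restart NetworkManager # Typical on Ubuntu Desktop
# Reset network interface (replace enp0s3 with your interface name from "ip link show")
sudo ip link set enp0s3 down
sudo ip link set enp0s3 up
# Flush DNS cache
sudo resolvectl flush-caches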
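Usually only one of the two networking services manages the interfaces on a given machine, so restarting the wrong one has no effect. A quick check such as this sketch shows which one is running. The function name active_network_service is hypothetical.

```shell
# Hypothetical helper: print the first active network management service,
# or "none" (with a non-zero exit status) if neither is running.
active_network_service() {
    local svc
    for svc in NetworkManager systemd-networkd; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc"
            return 0
        fi
    done
    echo "none"
    return 1
}

# Example:
#   svc=$(active_network_service) && sudo systemctl restart "$svc"
```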
Disk Space and Filesystem Issues
Running out of disk space can cause various system problems.
# Check disk usage
df -h # Filesystem usage
du -sh /home/* # Directory sizes
# Find large files
find /home -type f -size +100M -exec ls -lh {} \;
# Check inode usage (often overlooked)
df -i
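Both df -h and df -i produce a usage percentage per filesystem, so one filter can flag either kind of exhaustion. The sketch below reads the POSIX output format (df -P) from standard input. The function name over_threshold and the default limit of 90 percent are assumptions for illustration, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` or `df -Pi` output on stdin and print
# mount points whose usage is at or above LIMIT percent (default 90).
over_threshold() {
    local limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "95%" -> "95"
        if (use + 0 >= limit) print $6, use "%"
    }'
}

# Examples:
#   df -P  | over_threshold 90   # disk space
#   df -Pi | over_threshold 90   # inodes
```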
Cleaning Up Disk Space
# Clean package cache
sudo apt clean
# Remove old kernels (keep current and one previous)
sudo apt autoremove --purge
# Trim archived journal logs older than 7 days
sudo journalctl --vacuum-time=7d
# Find large cache files (review the list before deleting anything)
sudo find /var/cache -type f -size +10M
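Listing candidates sorted by size, without deleting anything, keeps cleanup decisions deliberate. The following is a sketch of that dry-run step. The function name list_large_files and its default threshold of 10 MB are hypothetical choices.

```shell
# Hypothetical helper: list files larger than MIN_MB megabytes under DIR,
# largest first, with sizes in KiB. Nothing is deleted.
list_large_files() {
    local dir="$1" min_mb="${2:-10}"
    find "$dir" -xdev -type f -size +"${min_mb}"M -exec du -k {} + 2>/dev/null | sort -rn
}

# Example:
#   sudo sh -c "$(declare -f list_large_files); list_large_files /var/cache 10"   # bash only
#   list_large_files "$HOME" 100
```

The -xdev option keeps find on one filesystem, which matters when the goal is to free space on a specific full partition.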
User and Permission Problems
Authentication and permission issues can prevent users from accessing resources.
# Check if user exists
getent passwd <username>
id <username>
# Verify password status
sudo passwd -S <username>
# Check group membership
groups <username>
# Test sudo access
sudo -l -U <username>
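The second field of passwd -S output is a short status code: P for a usable password, L for a locked password, and NP for no password. The sketch below turns that code into a readable message. The function name describe_passwd_status is hypothetical, and the usernames in the example are placeholders.

```shell
# Hypothetical helper: read `passwd -S` output lines on stdin and
# print a readable description of each account's password status.
describe_passwd_status() {
    awk '{
        if ($2 == "P") s = "usable password"
        else if ($2 == "L") s = "password locked"
        else if ($2 == "NP") s = "no password set"
        else s = "unknown status " $2
        print $1 ": " s
    }'
}

# Example:
#   sudo passwd -S <username> | describe_passwd_status
```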
Permission Issue Resolution
# Check current permissions
ls -la /path/to/directory
# Fix ownership
sudo chown -R username:groupname /path/to/directory
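# Fix permissions (directories need the execute bit to be traversable)
sudo find /path/to/directory -type d -exec chmod 755 {} + # Directories
sudo find /path/to/directory -type f -exec chmod 644 {} + # Regular files (restore +x on scripts afterward)
# Check AppArmor status (Ubuntu's default; SELinux is not enabled by default)
sudo aa-status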
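A "permission denied" error is often caused by a parent directory rather than the file itself, because every directory along the path must be traversable. The sketch below prints the permissions of each path component from the target up to the root, similar in spirit to namei -l from util-linux. The function name show_path_permissions is hypothetical.

```shell
# Hypothetical helper: show permissions for PATH and each of its parent
# directories, so a missing execute bit on any component is easy to spot.
show_path_permissions() {
    local path="$1"
    while :; do
        ls -ld "$path" || return 1
        case "$path" in
            /|.) break ;;          # stop at the root (or "." for relative paths)
        esac
        path=$(dirname "$path")
    done
}

# Example:
#   show_path_permissions /path/to/directory
```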
Service and Process Issues
When a service fails to start or behaves unexpectedly, work through these checks in order.
# Check service status
systemctl status <service-name>
# View service logs
journalctl -u <service-name> -f
# Restart problematic service
sudo systemctl restart <service-name>
# Reload service configuration without a full restart (only if the service supports reload)
sudo systemctl reload <service-name>
# Check service dependencies
systemctl list-dependencies <service-name>
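A restart that appears to succeed can still leave the service dead a moment later, so it is worth confirming the state and pulling the logs in one step. This is a minimal sketch: the function name restart_and_verify is hypothetical, and the two-second default wait is an arbitrary assumption that may be too short for slow services.

```shell
# Hypothetical helper: restart UNIT, wait briefly, then confirm it is active.
# On failure, print the last 20 log lines and return 1.
# Usage: restart_and_verify UNIT [WAIT_SECONDS]
restart_and_verify() {
    local unit="$1"
    sudo systemctl restart "$unit"
    sleep "${2:-2}"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit failed to start; recent log lines:"
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
}

# Example:
#   restart_and_verify <service-name>
```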
Stuck Process Resolution
# Find processes using a file or port
lsof /path/to/file
lsof -i :80
# Kill processes gracefully (SIGTERM is the default signal)
kill <PID>
kill -TERM <PID> # Equivalent to the line above
# Force kill if necessary
kill -KILL <PID>
# Find and kill by name
pkill <process-name>
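The graceful-then-forceful sequence above can be combined so that SIGKILL is sent only if the process ignores SIGTERM. The sketch below shows one way to do it. The function name stop_process and the five-second default grace period are hypothetical choices.

```shell
# Hypothetical helper: send SIGTERM to PID, wait up to GRACE seconds for it
# to exit, and fall back to SIGKILL only if it is still running.
# Usage: stop_process PID [GRACE_SECONDS]
stop_process() {
    local pid="$1" grace="${2:-5}" waited=0
    kill -TERM "$pid" 2>/dev/null || { echo "no such process: $pid"; return 1; }
    while kill -0 "$pid" 2>/dev/null; do     # kill -0 only tests that the PID exists
        if [ "$waited" -ge "$grace" ]; then
            echo "PID $pid ignored SIGTERM; sending SIGKILL"
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "PID $pid exited after SIGTERM"
}

# Example:
#   stop_process <PID> 10
```

Giving the process time to handle SIGTERM lets it flush data and release locks, which SIGKILL never allows.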
Common Pitfalls
- Skipping logs: Always check journalctl and relevant service logs first
- Overcomplicating: Start with simple solutions before complex ones
- Making multiple changes: Fix one thing at a time to understand what worked
- Ignoring disk space: Many mysterious issues are caused by full filesystems
- Forgetting backups: Always backup critical data before major changes
- Rushing solutions: Take time to understand the root cause, not just symptoms
- Network assumptions: Test connectivity at each layer (physical, IP, DNS, application)
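For the backup pitfall in particular, even a single configuration file deserves a copy before it is edited. The following sketch makes a timestamped copy next to the original. The function name backup_file and the .bak.TIMESTAMP naming scheme are hypothetical conventions, not an Ubuntu standard.

```shell
# Hypothetical helper: copy FILE to FILE.bak.YYYYMMDD-HHMMSS, preserving
# mode and timestamps, and print the path of the copy.
backup_file() {
    local src="$1" dest
    [ -f "$src" ] || { echo "not a regular file: $src" >&2; return 1; }
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (run as root for files under /etc):
#   backup_file /etc/apt/sources.list
```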
Summary
In this lesson, you've learned systematic approaches to troubleshooting common Ubuntu system issues. Remember to:
- Start with gathering information from logs and system status
- Follow a logical diagnostic path from simple to complex
- Use the specialized tools we've covered for each problem type
- Document your process and solutions for future reference
- Always have recovery options available before making major changes
Troubleshooting is as much about methodology as it is about technical knowledge. With practice, you'll develop intuition for where to look and what to try first.
Quiz
- What's the first command you should run when a service isn't working properly?
- How can you access emergency mode if your system won't boot normally?
- What command fixes broken package dependencies in APT?
- How do you check if a full disk is causing system issues?
- Why is it important to fix one issue at a time during troubleshooting?
Answers:
- systemctl status service-name and journalctl -u service-name to check service status and logs
- Edit the GRUB boot entry and add systemd.unit=emergency.target or systemd.unit=rescue.target
- sudo apt --fix-broken install
- Run df -h to check disk usage and df -i to check inode usage
- To understand which change actually resolved the problem and avoid introducing new issues