Troubleshooting Common System Issues

Welcome to the final lesson of our Ubuntu course! By now, you've built a solid foundation in Linux system administration. In this lesson, we'll put all those skills together to tackle real-world system problems. You'll learn systematic approaches to diagnose and resolve common issues that administrators face daily.

Learning Goals

  • Develop a systematic troubleshooting methodology
  • Diagnose and fix boot problems
  • Resolve package dependency and installation issues
  • Troubleshoot network connectivity problems
  • Recover from disk space and filesystem issues
  • Fix user authentication and permission problems

Systematic Troubleshooting Approach

Effective troubleshooting follows a logical process. Start with the most obvious solutions before diving deep.

Basic troubleshooting workflow
# 1. Gather information
journalctl -f # Monitor system logs in real-time
dmesg | tail -20 # Check recent kernel messages
systemctl status <service> # Check service status

# 2. Reproduce the issue
# Try to recreate the problem consistently

# 3. Isolate the cause
# Test components individually

# 4. Implement and test solutions
# Apply fixes one at a time

Tip

Always start with the system logs! The journalctl command is your best friend for understanding what's happening behind the scenes.
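
As a rough sketch of the "gather information" step, the commands above can be wrapped in a small script that records everything in one pass. The helper names `section` and `collect_snapshot` are hypothetical, and the set of commands is only an assumption about what is useful on a typical Ubuntu system; adjust it to your needs.

```shell
#!/bin/sh
# Hypothetical helper: print a header, then run the command if it is installed.
section() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s not installed)\n' "$1"
    fi
}

# Hypothetical wrapper: collect a first-look snapshot of system state.
collect_snapshot() {
    section uname -a
    section journalctl -b -p err --no-pager -n 50   # errors since this boot
    section systemctl --failed --no-pager            # units in a failed state
    section df -h                                    # filesystem usage
}
```

Running `collect_snapshot > snapshot.txt` before you change anything gives you a baseline to compare against after each fix.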

Boot Issues and Recovery

Boot problems can be stressful, but Ubuntu provides several recovery options.

GRUB Bootloader Issues

Accessing GRUB menu and recovery
# During boot, hold Shift on BIOS systems (or tap Esc on UEFI systems) to open the GRUB menu
# Select "Advanced options for Ubuntu"
# Choose recovery mode for troubleshooting options

# From recovery mode, you can:
- fsck: Check and repair filesystem
- clean: Free disk space
- dpkg: Repair broken packages
- root: Drop to root shell prompt

Emergency Mode and Root Shell

If your system won't boot normally, you might need emergency mode:

Emergency mode access
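# In GRUB, press e to edit the boot entry, append this to the line
# that starts with "linux", then press Ctrl+X or F10 to boot:
systemd.unit=emergency.target

# Or, for a fuller environment (local filesystems mounted, basic services started):
systemd.unit=rescue.target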

Package Management Problems

APT dependency issues are common but usually fixable.

Fixing broken packages
# Update package lists
sudo apt update

# Fix broken dependencies
sudo apt --fix-broken install

# Remove obsolete cached package files and dependencies that are no longer needed
sudo apt autoclean
sudo apt autoremove

# Finish configuring packages that were left unpacked or half-configured
sudo dpkg --configure -a

# As a last resort, purge and reinstall
sudo apt remove --purge <problem-package>
sudo apt install <problem-package>

Resolving held packages
# Check for held packages
apt-mark showhold

# Inspect installed and candidate versions, and any pinning
apt-cache policy <package-name>

# Release the hold and upgrade if safe
sudo apt-mark unhold <package-name>
sudo apt install <package-name>
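
To see which packages are actually in a broken state before you start fixing them, you can read the state flags in the first column of `dpkg -l`. The sketch below filters that output for packages that are half-installed, unpacked, half-configured, or waiting on triggers. The function name `list_unhealthy_packages` is hypothetical; `dpkg --audit` is the built-in command that reports similar problems.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print the state
# flags and name of every package whose status is not clean.
list_unhealthy_packages() {
    awk '
        # Package rows start with two or three state letters; header rows do not.
        $1 ~ /^[uihrp][ncHUFWti]R?$/ {
            status = substr($1, 2, 1)
            # H=half-installed, U=unpacked, F=half-configured,
            # W=awaiting triggers, t=triggers pending, trailing R=reinstall required
            if (status ~ /[HUFWt]/ || length($1) == 3) print $1, $2
        }
    '
}
```

A typical invocation is `dpkg -l | list_unhealthy_packages`. An empty result suggests the package database is consistent and the problem lies elsewhere.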

Network Troubleshooting

When network connectivity fails, follow this diagnostic path.

Network diagnostic commands
# Check interface status
ip addr show
ip link show

# Test connectivity
ping -c 4 8.8.8.8 # Test basic connectivity
ping -c 4 google.com # Test DNS resolution

# Check routing
ip route show
traceroute google.com

# DNS troubleshooting
resolvectl status # On older releases: systemd-resolve --status
cat /etc/resolv.conf

Warning

Don't forget to check your firewall! A common mistake is troubleshooting for hours only to find UFW is blocking the connection.
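
The two ping tests from the diagnostic commands already tell you which layer to look at. As a minimal sketch, the logic can be written down like this. The function names `diagnose` and `check_network` are hypothetical, and using `getent hosts` for the DNS check is an assumption (it resolves the name without depending on ICMP).

```shell
#!/bin/sh
# Hypothetical helper: map the results of the IP and DNS checks to a likely
# fault area. Arguments are exit statuses: 0 means the check succeeded.
diagnose() {
    ip_status=$1
    dns_status=$2
    if [ "$ip_status" -ne 0 ]; then
        echo "no IP connectivity: check link, address, route and firewall"
    elif [ "$dns_status" -ne 0 ]; then
        echo "IP works but name resolution fails: check DNS settings"
    else
        echo "IP and DNS both work: check the application or remote service"
    fi
}

# Hypothetical wrapper: run both checks, then report.
check_network() {
    if ping -c 2 -W 2 8.8.8.8 >/dev/null 2>&1; then ip_status=0; else ip_status=1; fi
    if getent hosts google.com >/dev/null 2>&1; then dns_status=0; else dns_status=1; fi
    diagnose "$ip_status" "$dns_status"
}
```

If `check_network` reports no IP connectivity even though the interface is up and has an address, `sudo ufw status verbose` shows whether the firewall is the cause.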

Common Network Fixes

Quick network fixes
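# Restart networking (use whichever service manages your interfaces)
sudo systemctl restart systemd-networkd # Common on servers
sudo systemctl restart NetworkManager # Common on desktops

# Reset a network interface (replace enp0s3 with a name from "ip link show")
sudo ip link set enp0s3 down
sudo ip link set enp0s3 up

# Flush DNS cache
sudo resolvectl flush-caches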

Disk Space and Filesystem Issues

A full filesystem can cause problems that look unrelated at first, such as failed logins, services that refuse to start, and broken package operations.

Disk space investigation
# Check disk usage
df -h # Filesystem usage
du -sh /home/* # Directory sizes

# Find large files
find /home -type f -size +100M -exec ls -lh {} \;

# Check inode usage (often overlooked)
df -i
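
On a machine with many mount points it helps to flag only the filesystems that are nearly full. Here is a small sketch that filters `df -P` output (the POSIX format, one line per filesystem) by a usage threshold. The function name `flag_full_filesystems` and the default of 90 percent are assumptions, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` (or `df -Pi`) output on stdin and print
# the mount point and usage of every filesystem at or above the threshold.
flag_full_filesystems() {
    threshold=${1:-90}
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header row
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}
```

Use `df -P | flag_full_filesystems 90` for disk space and `df -Pi | flag_full_filesystems 90` for inodes; both commands print the percentage in the fifth column.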

Cleaning Up Disk Space

Freeing disk space
# Clean package cache
sudo apt clean

# Remove old kernels and other unneeded packages (the running kernel is kept)
sudo apt autoremove --purge

# Delete archived journal entries older than 7 days
sudo journalctl --vacuum-time=7d

# Find large cache files (review the list before deleting anything)
sudo find /var/cache -type f -size +10M

User and Permission Problems

Authentication and permission issues can prevent users from accessing resources.

User authentication troubleshooting
# Check if user exists
getent passwd <username>
id <username>

# Verify password status
sudo passwd -S <username>

# Check group membership
groups <username>

# Test sudo access
sudo -l -U <username>
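
The second field of `passwd -S` output is a short status code that is easy to misread. The sketch below translates the codes used by Ubuntu's `passwd` (P, L, and NP) into plain language. The function name `explain_passwd_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one line of `passwd -S` output on stdin and
# explain the status code in the second field.
explain_passwd_status() {
    read -r user status _rest
    case "$status" in
        P)  echo "$user: usable password set" ;;
        L)  echo "$user: password locked" ;;
        NP) echo "$user: no password set" ;;
        *)  echo "$user: unrecognised status '$status'" ;;
    esac
}
```

For example, `sudo passwd -S <username> | explain_passwd_status` reports a locked password for an account that was disabled with `passwd -l`.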

Permission Issue Resolution

Fixing permission problems
# Check current permissions
ls -la /path/to/directory

# Fix ownership
sudo chown -R username:groupname /path/to/directory

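# Fix permissions: 755 for directories, 644 for regular files
# (a single chmod -R applies one mode to files and directories alike,
# and 644 on a directory makes it impossible to enter)
sudo find /path/to/directory -type d -exec chmod 755 {} +
sudo find /path/to/directory -type f -exec chmod 644 {} +

# Check AppArmor status (Ubuntu uses AppArmor by default, not SELinux)
sudo aa-status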
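
A "permission denied" error is often caused by a parent directory rather than by the file itself, because a user needs execute permission on every directory along the path. As a minimal sketch, this walks up from a path and prints the permissions and ownership of each component. The function name `show_path_permissions` is hypothetical, and `stat -c` assumes GNU coreutils, which Ubuntu ships.

```shell
#!/bin/sh
# Hypothetical helper: print mode, owner:group and name for a path and each
# of its parent directories, up to the root.
show_path_permissions() {
    path=$1
    while :; do
        stat -c '%A %U:%G %n' "$path"
        # Stop at the filesystem root, or at "." for relative paths.
        if [ "$path" = "/" ] || [ "$path" = "." ]; then
            break
        fi
        path=$(dirname "$path")
    done
}
```

The `namei -l /path/to/file` command from util-linux prints similar information if you prefer a ready-made tool.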

Service and Process Issues

When a service fails to start or behaves unexpectedly, check its status and logs before you restart it.

Service troubleshooting
# Check service status
systemctl status <service-name>

# View service logs
journalctl -u <service-name> -f

# Restart problematic service
sudo systemctl restart <service-name>

# Reload service configuration
sudo systemctl reload <service-name>

# Check service dependencies
systemctl list-dependencies <service-name>
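
For scripts and quick checks, `systemctl is-active` prints a single word describing the unit's state, which is easier to act on than the full status output. The sketch below turns that word into a next step. The function name `service_state` and the wording of the messages are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: summarise a unit's state and suggest where to look next.
service_state() {
    # is-active exits non-zero for anything but "active", so ignore the status.
    state=$(systemctl is-active "$1" 2>/dev/null) || true
    case "$state" in
        active)   echo "$1 is running" ;;
        failed)   echo "$1 failed: inspect 'journalctl -u $1 -b'" ;;
        inactive) echo "$1 is stopped" ;;
        *)        echo "$1 is in state '${state:-unknown}'" ;;
    esac
}
```

Calling `service_state ssh` on a healthy system prints that the service is running; a failed unit points you straight at its journal for the current boot.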

Stuck Process Resolution

Managing stuck processes
# Find processes using a file or port
lsof /path/to/file
lsof -i :80

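# Kill processes gracefully (SIGTERM is the default signal, so these two are equivalent)
kill <PID>
kill -TERM <PID>

# Force kill if necessary (SIGKILL cannot be caught, so the process gets no chance to clean up)
kill -KILL <PID>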

# Find and kill by name
pkill <process-name>
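
The "graceful first, force only if needed" sequence can be sketched as a small function: send SIGTERM, wait a few seconds for the process to exit, and fall back to SIGKILL only if it is still there. The function name `stop_process` and the five-second default are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: stop a process politely, then forcibly after a timeout.
stop_process() {
    pid=$1
    timeout=${2:-5}
    # Ask the process to exit; if the signal cannot be sent, it is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still running after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null || true
}
```

For example, `stop_process 1234 10` gives PID 1234 ten seconds to shut down cleanly. A process stuck in uninterruptible sleep (state D in `ps`) will not respond even to SIGKILL until the I/O it is waiting on completes.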

Common Pitfalls

  • Skipping logs: Always check journalctl and relevant service logs first
  • Overcomplicating: Start with simple solutions before complex ones
  • Making multiple changes: Fix one thing at a time to understand what worked
  • Ignoring disk space: Many mysterious issues are caused by full filesystems
  • Forgetting backups: Always backup critical data before major changes
  • Rushing solutions: Take time to understand the root cause, not just symptoms
  • Network assumptions: Test connectivity at each layer (physical, IP, DNS, application)

Summary

In this lesson, you've learned systematic approaches to troubleshooting common Ubuntu system issues. Remember to:

  1. Start with gathering information from logs and system status
  2. Follow a logical diagnostic path from simple to complex
  3. Use the specialized tools we've covered for each problem type
  4. Document your process and solutions for future reference
  5. Always have recovery options available before making major changes

Troubleshooting is as much about methodology as it is about technical knowledge. With practice, you'll develop intuition for where to look and what to try first.

Quiz

  1. What's the first command you should run when a service isn't working properly?
  2. How can you access emergency mode if your system won't boot normally?
  3. What command fixes broken package dependencies in APT?
  4. How do you check if a full disk is causing system issues?
  5. Why is it important to fix one issue at a time during troubleshooting?

Answers:

  1. systemctl status service-name and journalctl -u service-name to check service status and logs
  2. Edit the GRUB boot entry and add systemd.unit=emergency.target or systemd.unit=rescue.target
  3. sudo apt --fix-broken install
  4. Run df -h to check disk usage and df -i to check inode usage
  5. To understand which change actually resolved the problem and avoid introducing new issues