Shell Tutorial: Assignment

This page contains a set of practical assignments that simulate real-world tasks you might encounter in different fields.

These assignments are meant to reinforce core Bash skills taught in the Software Carpentry Shell Novice lesson. Each problem aligns with key concepts from the tutorial and is focused on scientific data processing using Unix tools.

What You'll Practice

  • Problem 1: Navigating and managing astrophysical datasets using basic commands like ls, cd, mkdir, and cp.
  • Problem 2: Filtering and transforming cosmic ray data using pipes, loops, grep, awk, and cut.
  • Problem 3: Writing reusable shell scripts to analyze log files and extract important error diagnostics using find, wc, and scripting logic.

All code, input data, and solution scripts can be found in the GitHub repository: github.com/iliomar/shell-tutorial-assignment. You are encouraged to clone the repository, follow the setup instructions, and run the assignments locally to strengthen your command-line skills.

If you're new to the shell or need a refresher, make sure to check the setup.sh file for environment setup, and review the included documentation to guide you through the workflow.

Setup Instructions

Before you begin solving the assignments, make sure your environment is ready. Follow these steps to set everything up:

  1. Clone the repository:
    git clone https://github.com/iliomar/shell-tutorial-assignment.git
    cd shell-tutorial-assignment
  2. Run the setup script (optional):
    This script ensures that all necessary directories and sample files are in place (a hypothetical sketch of what such a script might do appears after this list).
    chmod +x setup.sh
    ./setup.sh

    Note: If everything is already present (e.g., you see the Problems/ and solutions/ folders), you don't need to run setup.sh right away. However, it's helpful to use if:

    • You accidentally delete or modify files and want to reset the environment.
    • You clone the repo onto a fresh system.
    • You want to understand or regenerate the structure and mock data programmatically.
  3. Explore the repository structure:
    Here's a quick overview:
    shell-tutorial-assignment/
    ├── Problems/                          # Assignment problem statements
    │   ├── Problem1/
    │   │   ├── data/                      # Sample data files (e.g., cosmic rays)
    │   │   │   └── cosmic_flux_data.txt   # Data file
    │   │   └── cosmic_rays.md             # Problem statement
    │   ├── Problem2/
    │   │   ├── processing_cosmic_rays.md  # Problem statement
    │   │   ├── data/                      # Sample data files
    │   │   │   └── cosmic_flux_data.txt   # Data file
    │   │   └── results/                   # Output files from Problem 1
    │   │       └── cosmic_flux_backup.txt # Renamed backup of the data file
    │   └── Problem3/
    │       ├── sim_log.md                 # Problem statement
    │       └── simulation_logs/           # Log files
    ├── solutions/                         # Your Bash solutions
    └── setup.sh                           # Environment setup script, if needed
  4. Start solving the problems:
    Read each problem in the Problems/ folder. You can find the solution scripts inside solutions/.
  5. Run the solutions:
    First grant execute permission to the solution scripts:
    chmod +x Solution*.sh
    Then run each one individually from the solutions/ folder:
    ./Solution1.sh
    ./Solution2.sh
    ./Solution3.sh
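
As mentioned in step 2, setup.sh recreates the expected layout and mock data. Its exact contents aren't reproduced here; the following is a minimal hypothetical sketch of what such a script might do (the folder names mirror the layout above, but the real setup.sh may differ):

#!/bin/bash
# Hypothetical sketch -- the repository's actual setup.sh may differ.

# Recreate the expected directory layout (mkdir -p is safe to re-run)
mkdir -p Problems/Problem1/data Problems/Problem2/data Problems/Problem3/simulation_logs solutions

# Regenerate the mock dataset only if it is missing
data_file="Problems/Problem1/data/cosmic_flux_data.txt"
if [[ ! -f "$data_file" ]]; then
  printf 'Energy(GeV)\tFlux(particles/m²/s/sr/GeV)\tParticle\n' > "$data_file"
  printf '12.5\t4.2e-3\tProton\n' >> "$data_file"
fi

echo "Setup complete."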

System Notes:
✔️ AlmaLinux OS 9: This system already includes Bash and the required core utilities by default—no setup needed.
✔️ macOS: You can use the default Terminal with Bash. All commands used in these assignments are compatible with macOS.

Tip: Use a terminal with Bash support (like Linux, macOS Terminal, or Git Bash on Windows). Get comfortable with commands like cd, ls, grep, cut, awk, sort, and head.

You're all set!

Problems

Problems are located in the Problems/ folder. Each problem is designed to challenge a specific Bash concept, such as loops, conditionals, or text processing tools.

Clone the repository and try to solve each problem using Bash commands and scripts.

Problem 1: Exploring Cosmic Ray Flux Data

Welcome to your first assignment! We'll apply the basic shell skills you've learned to explore a dataset inspired by astrophysical particle physics. The data contains simulated measurements of cosmic ray particle fluxes detected by an instrument in space.

Topics Covered

  • Navigating Files and Directories
  • Working with Files and Directories

Dataset

The file is located at: data/cosmic_flux_data.txt

Energy(GeV)	Flux(particles/m²/s/sr/GeV)	Particle
12.5	4.2e-3	Proton
55.3	1.1e-3	Electron
29.9	3.7e-3	Proton
9.8	5.5e-3	Muon
...

Each row corresponds to a cosmic ray detection event. Your job is to navigate the file system and extract meaningful scientific information using shell commands.

Problem Statement

  1. Navigate to the data/ directory using cd.
  2. Use ls to verify that the file cosmic_flux_data.txt exists.
  3. Use cat or less to inspect the file content. How many columns are there? What are their headers? (One way to check is sketched just after this list.)
  4. Return to the parent directory and create a new folder called results/.
  5. Copy the dataset into the results/ folder using cp.
  6. Move into the results/ folder and rename the file to cosmic_flux_backup.txt.
  7. Count how many lines (including the header) are in the file using wc.
  8. Now return to the repo root using cd ../.. and continue with the next problem once finished.
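
As a hint for step 3: the columns are tab-separated, so awk can count them for you. This one-liner (run from inside data/) is one possible approach:

awk -F'\t' 'NR==1 {print NF " columns:"; print; exit}' cosmic_flux_data.txt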

Notes

  • Use pwd to check where you are at any moment.
  • Use mkdir, cp, and mv with care — these commands change your file system!
  • Remember: you can always use man <command> to learn more.

Create a Bash script called assignment1_solution.sh that performs all the steps above, using relative paths.

Problem 2: Processing Cosmic Ray Data with Pipes, Filters, and Loops

Building on your first assignment, now you will analyze the cosmic ray flux data more deeply using pipes, filters, and loops in Bash.

Topics Covered

  • Pipes and Filters (|, grep, awk, cut, sort, uniq)
  • Loops (for, while)
  • Basic File Navigation and Manipulation

Dataset

Recall the file results/cosmic_flux_backup.txt from Assignment 1. This file contains the cosmic ray flux measurements.

Problem Statement

  1. Navigate to the results/ directory.
  2. Using pipes and filters, find out how many unique particle types are present in the dataset.
  3. Extract and display the top 5 events with the highest energy values.
  4. Use a for or while loop to calculate and print the average flux for each particle type found in the dataset.
  5. Create a new file called high_energy_protons.txt that contains only rows for protons with energy greater than 20 GeV.

Notes

  • Remember the dataset has a header row — skip it when processing numeric data (see the example just after this list).
  • Use commands like grep, awk, sort, uniq, cut, and head.
  • Loops are useful to automate repeated tasks for each particle type.
  • Print meaningful messages for clarity when running your script.
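
For example, tail -n +2 prints a file from its second line onward, so a pipeline like this sees only the data rows:

tail -n +2 cosmic_flux_backup.txt | head -n 3    # first 3 data rows, header excluded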

Create a Bash script called assignment2_solution.sh that performs all the steps and saves the output files inside the results/ directory.

Problem 3: Analyze Simulation Logs with Shell Scripts

You have a folder simulation_logs/ containing many .log files from astrophysics simulations. Each file may have lines with ERROR or WARNING.

Tasks

  1. Use find to locate all .log files inside simulation_logs/.
  2. Count and print the total number of log files found.
  3. For each file that contains ERROR, print the filename and the number of error lines.
  4. Count how many files contain ERROR and how many contain WARNING.
  5. Extract all ERROR lines from all files into all_errors.txt.
  6. (Optional) Print the top 5 most common error messages from all_errors.txt.

Notes

  • Write a shell script analyze_logs.sh to do the above.
  • Include clear messages describing each step.
  • Use loops, grep, find, wc, sort, and uniq.

Submit the executable shell script analyze_logs.sh that performs all tasks.

Solution to Problem 1: Exploring Cosmic Ray Flux Data

Below is a step-by-step Bash script that performs the requested tasks, followed by a brief explanation.


#!/bin/bash

# Navigate to the data directory
echo "Navigating to data/ directory..."
cd data || { echo "data/ directory not found!"; exit 1; }

# List files to confirm the dataset exists
echo "Listing files in data/:"
ls

# Inspect the file content (show the first 5 lines)
echo "Displaying first 5 lines of cosmic_flux_data.txt:"
head -n 5 cosmic_flux_data.txt

# Go back to the Problem1/ folder
echo "Returning to Problem1/..."
cd ..

# Create the results/ directory if it does not exist
echo "Creating results/ directory..."
mkdir -p results

# Copy the dataset to results/
echo "Copying dataset to results/ folder..."
cp data/cosmic_flux_data.txt results/

# Move into results/ and rename the file
cd results || { echo "results/ directory not found!"; exit 1; }
echo "Renaming cosmic_flux_data.txt to cosmic_flux_backup.txt..."
mv cosmic_flux_data.txt cosmic_flux_backup.txt

# Count the lines, including the header
echo "Counting lines in cosmic_flux_backup.txt:"
wc -l cosmic_flux_backup.txt

# Return to the repo root (two levels up from results/)
echo "Returning to repo root directory..."
cd ../..

echo "Assignment 1 tasks complete."
  

Explanation:

  • cd data: changes directory to data/ folder containing the dataset.
  • ls: lists files to confirm the dataset file is present.
  • head -n 5: shows the first 5 lines so we can inspect headers and sample data.
  • mkdir -p results: creates a new folder results/, -p avoids errors if it exists.
  • cp: copies the file to the new directory.
  • mv: renames the copied file inside results/.
  • wc -l: counts lines to verify file content length.
  • cd ../..: moves back to the repository root.
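
To run this script, save it (for example as Solution1.sh, matching the naming used in the setup section; the exact filename in the repository may differ) and execute it from a directory that contains data/, such as Problems/Problem1/:

chmod +x Solution1.sh
./Solution1.sh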

Solution to Problem 2: Processing Cosmic Ray Data with Pipes, Filters, and Loops

This script performs all the tasks requested using pipes, filters, loops, and basic shell scripting.


#!/bin/bash

# Navigate to results directory
echo "Navigate to results/ directory..."
cd results || { echo "results/ directory not found!"; exit 1; }

# Find number of unique particle types
echo "Counting unique particle types..."
unique_particles=$(tail -n +2 cosmic_flux_backup.txt | cut -f3 | sort | uniq | wc -l)
echo "There are $unique_particles unique particle types."

# Extract top 5 highest-energy events
echo "Top 5 highest-energy events:"
echo "(Energy(GeV), Flux, Particle)"
tail -n +2 cosmic_flux_backup.txt | sort -k1,1nr | head -n 5

# Calculate average flux for each particle type using a loop
echo "Calculating average flux per particle type..."
particles=$(tail -n +2 cosmic_flux_backup.txt | cut -f3 | sort | uniq)

for particle in $particles; do
  avg_flux=$(awk -v p="$particle" 'BEGIN{sum=0;count=0} $3==p {sum+=$2; count++} END{if(count>0) print sum/count; else print 0}' cosmic_flux_backup.txt)
  echo "Average flux for $particle: $avg_flux"
done

# Create file with high-energy protons (>20 GeV)
echo "Creating high_energy_protons.txt for protons with Energy > 20 GeV..."
# Copy header first
head -n 1 cosmic_flux_backup.txt > high_energy_protons.txt
# Filter and append matching lines
awk 'NR>1 && $3=="Proton" && $1>20 {print}' cosmic_flux_backup.txt >> high_energy_protons.txt

echo "All tasks complete. File high_energy_protons.txt created in results/."

# Return to repo root
cd ../..
  

Explanation:

  • tail -n +2: skips header row for accurate data processing.
  • cut -f3: extracts the third column (Particle type).
  • sort | uniq | wc -l: counts unique particle types.
  • sort -k1,1nr: sorts descending by the first column (Energy).
  • head -n 5: picks the top 5 after sorting.
  • The for loop iterates over each particle type to calculate average flux using awk.
  • awk filters protons with energy greater than 20 GeV to create a new file with header preserved.
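
As a design note, the for loop re-reads the file once per particle type, which is fine for a small dataset. A single-pass alternative (just a sketch, not the script above) accumulates per-particle sums in awk arrays:

# Average flux per particle type in one pass (tab-separated input assumed)
tail -n +2 cosmic_flux_backup.txt | \
  awk -F'\t' '{ sum[$3] += $2; n[$3]++ } END { for (p in n) printf "Average flux for %s: %g\n", p, sum[p]/n[p] }'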

Solution for Problem 3: Analyze Astrophysics Simulation Logs

Below is a shell script that accomplishes all the tasks from the assignment:

#!/bin/bash

echo "Starting log analysis..."

# Find all .log files
log_files=$(find simulation_logs/ -type f -name "*.log")

if [[ -z "$log_files" ]]; then
  echo "No log files found in simulation_logs/."
  exit 1
fi

# Count total log files found
num_files=$(echo "$log_files" | wc -l)
echo "Total log files found: $num_files"

error_files=0
warning_files=0

# Loop over files to find errors and warnings
for file in $log_files; do
  error_count=$(grep -c "ERROR" "$file")
  warning_count=$(grep -c "WARNING" "$file")

  if [[ $error_count -gt 0 ]]; then
    echo "File '$file' contains $error_count ERROR lines."
    ((error_files++))
  fi
  if [[ $warning_count -gt 0 ]]; then
    ((warning_files++))
  fi
done

echo "Number of files with ERROR: $error_files"
echo "Number of files with WARNING: $warning_files"

# Extract all ERROR lines into all_errors.txt
echo "Extracting all ERROR lines into all_errors.txt..."
> all_errors.txt  # Empty the file first

for file in $log_files; do
  grep "ERROR" "$file" >> all_errors.txt
done

# (Optional): Show top 5 most common error messages
if [[ -s all_errors.txt ]]; then
  echo "Top 5 most common error messages:"
  cut -d':' -f2- all_errors.txt | sort | uniq -c | sort -nr | head -n 5
else
  echo "No ERROR lines found, skipping top error messages."
fi

echo "Log analysis completed."

Explanation

  • Searches for log files: Uses find to locate all .log files recursively.
  • Handles missing logs: If no logs are found, it prints a message and exits.
  • Counts logs: Displays the number of log files found.
  • Scans for issues: Iterates through each log file to count ERROR and WARNING messages.
  • Reports counts: Prints how many files contain errors or warnings.
  • Collects all errors: Extracts all ERROR lines into a single file called all_errors.txt.
  • Highlights top issues: If there are errors, it displays the top 5 most frequent error messages.
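
One caveat worth knowing: for file in $log_files splits on whitespace, so a log filename containing spaces would break the loop. If you need to handle such names, a common alternative (a sketch under that assumption, not part of the submitted script) feeds NUL-delimited paths from find into a while read loop:

# Robust to spaces in filenames: read NUL-delimited paths from find
while IFS= read -r -d '' file; do
  count=$(grep -c "ERROR" "$file")
  [[ $count -gt 0 ]] && echo "File '$file' contains $count ERROR lines."
done < <(find simulation_logs/ -type f -name "*.log" -print0)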

Contact

If you have any questions, please don't hesitate to reach out to any PURSUE internship facilitator or mentor via Slack or any other available platform.

Email: iliomar.rodriguez@upr.edu