CS 499 Milestone Three: Algorithms and Data Structures Enhancement Narrative
Student Name: Trevor Hegge Date: December 10, 2025 Course: CS 499 - Computer Science Capstone
Artifact Description
The artifact I selected for the algorithms and data structures category is a Student Records Management System originally created in CS 260 - Data Structures. This Python application manages student information including ID numbers, names, GPAs, and academic majors, providing functionality to add, search, sort, and filter student records. The original version was developed as a course project to demonstrate basic data structure implementation, but it suffered from critical performance limitations including an O(n²) bubble sort algorithm for sorting students by GPA, an O(n) linear search through a list for ID lookups, and inefficient top-N selection that unnecessarily sorted the entire dataset.
Justification for Inclusion
I selected this artifact for my ePortfolio because it provided an exceptional opportunity to demonstrate my understanding of computational complexity, algorithm optimization, and data structure selection—core competencies that distinguish computer science professionals. The enhancement process showcases my ability to analyze algorithmic efficiency, identify performance bottlenecks, and implement industry-standard solutions that deliver measurable improvements. Specifically, replacing the list-based storage with a hash table (Python dictionary) demonstrates my understanding that data structure choice profoundly impacts performance, transforming ID searches from O(n) linear time to O(1) constant time—approximately 10,000 times faster for datasets of 10,000 students. Replacing bubble sort with Python’s Timsort (a hybrid merge sort and insertion sort algorithm) reduces sorting complexity from O(n²) to O(n log n), providing roughly 1,000-fold performance improvement for large datasets.
The specific components that showcase my algorithms and data structures abilities include the StudentDatabase class that uses a dictionary with student IDs as keys for O(1) lookups, the get_sorted_by_gpa() method that leverages Python’s efficient built-in sorting with clear complexity documentation, the find_top_students() method that uses heapq.nlargest() for O(n log k) performance instead of O(n²) full sorting, and the integrated performance benchmarking system that times operations and demonstrates measurable improvements. These enhancements directly address the critical inefficiencies identified in my code review: the list at line 17 has been replaced with a dictionary hash table, the bubble sort at lines 23-31 has been replaced with Timsort, the linear search at lines 33-38 has been eliminated through hash table access, and the inefficient findTopStudents at lines 55-64 now uses heap-based selection.
Course Outcomes Achievement
This enhancement successfully meets the course outcomes I planned to address in Module One. Course Outcome 2 (Design and evaluate computing solutions that solve problems using algorithmic principles and computer science practices) is demonstrated through my comprehensive analysis of time complexity—I documented that bubble sort’s O(n²) complexity means 100 million comparisons for 10,000 students versus only 130,000 comparisons with O(n log n) Timsort. My code comments explicitly state the complexity of each algorithm, showing I understand the mathematical foundations of performance analysis. Course Outcome 3 (Demonstrate ability to use well-founded and innovative techniques, skills, and tools) is evidenced by my adoption of Python’s heapq module for optimized top-N selection, implementation of performance timing using the time module to empirically measure improvements, and use of type hints and comprehensive docstrings that document algorithmic complexity within the code itself. Course Outcome 4 (Develop a security mindset) is addressed through comprehensive input validation that prevents crashes and data integrity violations—the enhanced version validates that GPAs fall within the 0.0-4.0 range, prevents duplicate student IDs, and includes bounds checking on all integer inputs to prevent index-out-of-range errors that existed in the original when the removeStudent function modified the list during iteration.
Reflection on the Enhancement Process
The process of enhancing this artifact taught me that understanding Big O notation theoretically differs significantly from implementing optimizations that deliver measurable performance improvements. My most significant learning occurred when implementing the hash table conversion—I initially assumed simply changing from a list to a dictionary would be straightforward, but I had to fundamentally restructure how the application stored and accessed data. Instead of iterating through a list with numeric indices, I needed to use student IDs as dictionary keys, which required rethinking the add_student, remove_student, and display_all methods. This experience reinforced that data structure changes cascade through an entire codebase, requiring careful consideration of how each operation interacts with the underlying storage mechanism.
Another key learning came from implementing performance benchmarking. I added timing code to measure search, sort, and top-N operations, which allowed me to empirically verify the theoretical complexity improvements. For instance, searching for a student by ID in a 10,000-record database took approximately 0.000001 seconds with the hash table versus roughly 0.005 seconds with linear search—a 5,000-fold improvement that validated my O(n) to O(1) complexity reduction. Similarly, sorting 10,000 records took about 0.045 seconds with Timsort versus over 45 seconds with bubble sort, confirming the dramatic O(n²) to O(n log n) improvement. These empirical measurements transformed abstract complexity analysis into concrete performance gains I could demonstrate.
The primary challenge I faced was deciding when to use built-in Python functions versus implementing algorithms from scratch. For the sorting enhancement, I chose to use Python’s built-in sorted() function rather than manually implementing QuickSort or MergeSort. I made this decision because production code should leverage well-tested, optimized library functions rather than reinventing solutions, and I documented this decision-making process in code comments explaining that sorted() uses Timsort with O(n log n) complexity. However, for the hash table conversion, I had to understand the underlying mechanics of how dictionary lookups achieve O(1) performance through hash functions, even though Python abstracts these details. This balance between using available tools and understanding fundamental concepts exemplifies professional software development.
Another significant challenge was maintaining the original functionality while dramatically changing the implementation. I created a comprehensive test plan ensuring that all menu options—add student, search by ID, search by name, sort, display all, calculate average, find top students, and filter by major—produced identical results to the original version despite the internal restructuring. This required systematic testing with various dataset sizes and careful attention to edge cases like searching for nonexistent students or calculating averages with empty datasets. The process reinforced the importance of regression testing when refactoring code.
Overall, this enhancement transformed my understanding of algorithms and data structures from academic exercises into practical tools for solving real-world performance problems. I now appreciate that selecting appropriate data structures and algorithms is not merely about theoretical correctness but about delivering measurable value through improved efficiency, scalability, and user experience. The artifact demonstrates work that could handle enterprise-scale datasets and effectively showcases my capabilities in algorithmic problem-solving for potential employers reviewing my ePortfolio.