Initial Debugging — Hunting Down Bottlenecks
Before profiling, we had formed initial hypotheses about potential bottlenecks. The real value, however, lay in confirming those suspicions scientifically with a profiler, identifying where most of the processing time was spent, and applying targeted fixes.
In this case, profiling showed that Redis deletion took only 5 seconds out of the 4-minute total response time, indicating the need to look deeper into the application logic.
Profiling
Using the profiler, it became clear that a single method was responsible for about 90% of the time spent during processing:
- Method 1: 90% of processing time
- Method 2: 3% of processing time
- Method 3: 3% of processing time
- Method 4: 3% of processing time
The Bottleneck
The highlighted method's job was to remove, from a list of string keys, every key that started with any of the prefixes in a second collection.
For example, given the keys: a, ab, abc, abcd, bcd, bcdf, bdcf
And the prefixes to remove: "ab", "bcdf"
After filtering, "a", "bcd", and "bdcf" remain ("bcd" survives because it is shorter than "bcdf" and therefore does not start with it).
The challenge was that for each key, the code looped through every prefix, resulting in O(keys * prefixes) complexity, as sketched below. With 34,000 keys and 27,000 prefixes, this equated to 918,000,000 iterations, a heavy computational load.
Algorithm Complexity:
O(34,000 * 27,000) = 918,000,000 iterations
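In code, the naive approach looks roughly like the following sketch (Python is used purely for illustration; the function name is hypothetical, not the team's actual code):

```python
def remove_keys_with_prefixes(keys, prefixes):
    """Keep only the keys that do not start with any of the prefixes.

    Worst case: every key is compared against every prefix, i.e.
    O(len(keys) * len(prefixes)) startswith checks.
    """
    result = []
    for key in keys:
        if not any(key.startswith(prefix) for prefix in prefixes):
            result.append(key)
    return result

print(remove_keys_with_prefixes(
    ["a", "ab", "abc", "abcd", "bcd", "bcdf", "bdcf"],
    ["ab", "bcdf"],
))  # ['a', 'bcd', 'bdcf']
```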
Algorithmic Solution
Typically, challenges like removing multiple keys from a collection can be addressed with hashsets or dictionaries, which offer O(1) lookups and deletions. Unfortunately, since the problem was about prefixes rather than exact matches, this was not a straightforward solution, as the sketch below illustrates.
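A minimal sketch of the mismatch, reusing the example data from above:

```python
# Exact matches: a hashset removes keys in O(1) per lookup.
keys = {"a", "ab", "abc", "abcd", "bcd", "bcdf", "bdcf"}
to_remove = {"ab", "bcdf"}

print(sorted(keys - to_remove))
# ['a', 'abc', 'abcd', 'bcd', 'bdcf'] -- only the exact strings are gone.
# "abc" and "abcd" start with "ab" but are not equal to it, so they
# survive; a hashset cannot answer "does this key start with any of
# these prefixes?" without falling back to scanning the prefixes.
```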
After thorough analysis and whiteboarding of different approaches, our team developed an indexing strategy based on a prefix tree (a trie).
For illustration, consider a set of keys like: 202012, 202013, 202014
And the prefix to remove: 20201
We built a tree with one node per character of the keys. To apply a prefix, the algorithm traverses the tree to the node corresponding to the prefix's final character, then removes that node and its entire subtree, deleting every key that starts with the prefix in a single operation.
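A minimal sketch of this idea (again in Python; the class and function names are illustrative, not the team's actual implementation):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_key = False  # True if a full key ends at this node

def filter_keys(keys, prefixes):
    """Remove every key that starts with any of the given prefixes."""
    root = TrieNode()
    # Build the trie: one node per character, one pass over every key.
    for key in keys:
        node = root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
    # Prune: walk down to each prefix's final node and cut off the
    # whole subtree, deleting every matching key in one operation.
    for prefix in prefixes:
        parent, node = None, root
        for ch in prefix:
            if ch not in node.children:
                node = None
                break
            parent, node = node, node.children[ch]
        if node is not None and parent is not None:
            del parent.children[prefix[-1]]
    # Collect the survivors with a depth-first walk.
    survivors = []
    def walk(node, path):
        if node.is_key:
            survivors.append(path)
        for ch, child in node.children.items():
            walk(child, path + ch)
    walk(root, "")
    return survivors

print(filter_keys(["a", "ab", "abc", "abcd", "bcd", "bcdf", "bdcf"],
                  ["ab", "bcdf"]))
# ['a', 'bcd', 'bdcf']
```

Each key and each prefix is walked over exactly once, so the total cost is proportional to the combined size of the inputs, which is where the O(keys + prefixes) figure below comes from.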
Performance Improvement:
- Before: O(keys * prefixes) = 918,000,000 iterations
- After: O(keys + prefixes) = 61,000 iterations
- Improvement: from quadratic (the product of the two input sizes) to linear complexity
Results
Before the change, the whole endpoint responded in 1.8 minutes; afterwards, it responded in 6 seconds. That is a massive difference in algorithmic efficiency, achieved by converting the work from quadratic to linear:
"Testing validated the improvement: the initial approach took 12.2 seconds, while the tree-based solution ran in only 65 milliseconds."
Key Learnings:
- Use scientific profiling to identify the actual bottleneck
- Practice and familiarity with algorithms prepare you for challenges like this
- Understanding Big O notation helps quickly identify inefficient patterns
- Tree-based indexing can transform quadratic problems into linear ones
- Don't assume the obvious culprit: Redis wasn't the problem; the algorithm was
Final Results:
- Response time: 1.8 minutes → 6 seconds (a ~94% reduction)
- Algorithm execution: 12.2 seconds → 65 milliseconds (a ~99.5% reduction)
This project showcased how turning a quadratic problem into a linear one through algorithmic restructuring can deliver exceptional performance results for real-world applications.