The O(n) Club: Top K Frequent Words—How To Count, Sort, and Not Cry

The O(n) Club: Top K Frequent Words—How To Count, Sort, and Not Cry

⚡ TL;DR

Count each word. Sort the results by how often each word shows up (descending), but alphabetically for ties. If you’re allergic to bugs, do both, not one. Java snack below:

Map<String, Integer> freq = new HashMap<>();
for (String w : words) freq.put(w, freq.getOrDefault(w, 0) + 1);
List<String> result = freq.keySet().stream()
    .sorted((a, b) -> freq.get(a) != freq.get(b)
        ? freq.get(b) - freq.get(a)
        : a.compareTo(b))
    .limit(k)
    .collect(Collectors.toList());

🧠 How Devs Usually Mess This Up

Someone always forgets something. If you only sort by frequency, get ready for chaos whenever two words tie—Java’s hash order is more unpredictable than a manager’s lunch break. With heaps, if you skip customizing the comparator, you’ll get a ‘top K’ where ‘potato’ beats ‘apple’ and chaos reigns. And let’s not forget those who assume K can’t exceed the number of unique words (spoiler: it absolutely can). Pro tip: treat the tie-break as seriously as you treat your favorite coffee order.

🔍 What’s Actually Going On

Picture a spelling bee. Every word spelled gets a sticker. At the end, Ms. Lexicographical sorts the charts: most stickers on top, and for kids with the same count, she picks whoever’s name comes first alphabetically. In code, that’s counting, then two-level sorting: popularity first, then rigidity of the dictionary. Same as real search engine autocompletes or leaderboard apps—the algorithmic nerd’s version of award ceremonies.

🛠️ PseudoCode

  • Count frequencies: Use a HashMap<String, Integer> and tally each word.
  • Sort or Heap: Decide if you’re going with a classic full sort (easy, O(N log N)), or a min-heap of size K with a custom comparator (harder, O(N log K)).
    • If sorting: Put map keys into a list and sort by (frequency desc, lex asc).
    • If min-heap: Use PriorityQueue<String> and a comparator that puts least-frequent (and reverse lex for ties) on top, so you end up with the actual best K.
  • Return: After sort, grab K words; with a heap, pop and reverse because min-heap likes being backward.
// Count frequencies
Map<String, Integer> freq = new HashMap<>();
for each word in words:
    freq[word] = freq.get(word, 0) + 1
 // Sort
List words = freq.keys()
words.sort((a, b) -> freq[b] - freq[a] != 0 ? freq[b] - freq[a] : a.compareTo(b))
// or heap with custom comparator if you like pain
 // Return top K
return words.subList(0, K)

💻 The Code

import java.util.*;
 public class TopKFrequentWords {
    public List<String> topKFrequent(String[] words, int k) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : words) {
            freq.put(word, freq.getOrDefault(word, 0) + 1);
        }
        List<String> wordList = new ArrayList<>(freq.keySet());
        wordList.sort((a, b) -> {
            int countCompare = freq.get(b) - freq.get(a);
            return countCompare != 0 ? countCompare : a.compareTo(b);
        });
        return wordList.subList(0, Math.min(k, wordList.size()));
    }
}

⚠️ Pitfalls, Traps & Runtime Smacks

  • Lexicographical tie-break: Miss this, and your answer is about as reliable as a vending machine after 4pm.
  • K > unique words: Out-of-bounds? Not today. Use Math.min(k, list.size()) for that extra hygienic touch.
  • Heap confusion: For min-heap, comparator should put least-relevant word at the top—the opposite of what you want at karaoke night.
  • Complexity honesty: Sorting is fine unless your dataset is the size of the British Library. Otherwise, even interviewers just want to see if you can spell ‘comparator’.

🧠 Interviewer Brain-Tattoo

“Ties go to the dictionary—because even in code, alphabetical order is less controversial than rock-paper-scissors.”

Previous Article

The Mutex Club: Avoiding Concurrency Traffic Jams

Next Article

The Mutex Club: Master Java Thread Dumps Like a Pro