The O(n) Club: Top K Frequent Words—How To Count, Sort, and Not Cry
⚡ TL;DR
Count each word. Sort the results by how often each word shows up (descending), but alphabetically for ties. If you’re allergic to bugs, do both, not one. Java snack below:
Map<String, Integer> freq = new HashMap<>(); for (String w : words) freq.put(w, freq.getOrDefault(w, 0) + 1); List<String> result = freq.keySet().stream() .sorted((a, b) -> freq.get(a) != freq.get(b) ? freq.get(b) - freq.get(a) : a.compareTo(b)) .limit(k) .collect(Collectors.toList());
🧠 How Devs Usually Mess This Up
Someone always forgets something. If you only sort by frequency, get ready for chaos whenever two words tie—Java’s hash order is more unpredictable than a manager’s lunch break. With heaps, if you skip customizing the comparator, you’ll get a ‘top K’ where ‘potato’ beats ‘apple’ and chaos reigns. And let’s not forget those who assume K can’t exceed the number of unique words (spoiler: it absolutely can). Pro tip: treat the tie-break as seriously as you treat your favorite coffee order.
🔍 What’s Actually Going On
Picture a spelling bee. Every word spelled gets a sticker. At the end, Ms. Lexicographical sorts the charts: most stickers on top, and for kids with the same count, she picks whoever’s name comes first alphabetically. In code, that’s counting, then two-level sorting: popularity first, then rigidity of the dictionary. Same as real search engine autocompletes or leaderboard apps—the algorithmic nerd’s version of award ceremonies.
🛠️ PseudoCode
- Count frequencies: Use a
HashMap<String, Integer>
and tally each word. - Sort or Heap: Decide if you’re going with a classic full sort (easy, O(N log N)), or a min-heap of size K with a custom comparator (harder, O(N log K)).
- If sorting: Put map keys into a list and sort by (frequency desc, lex asc).
- If min-heap: Use
PriorityQueue<String>
and a comparator that puts least-frequent (and reverse lex for ties) on top, so you end up with the actual best K.
- Return: After sort, grab K words; with a heap, pop and reverse because min-heap likes being backward.
// Count frequencies
Map<String, Integer> freq = new HashMap<>();
for each word in words:
freq[word] = freq.get(word, 0) + 1
// Sort
List words = freq.keys()
words.sort((a, b) -> freq[b] - freq[a] != 0 ? freq[b] - freq[a] : a.compareTo(b))
// or heap with custom comparator if you like pain
// Return top K
return words.subList(0, K)
💻 The Code
import java.util.*;
public class TopKFrequentWords {
public List<String> topKFrequent(String[] words, int k) {
Map<String, Integer> freq = new HashMap<>();
for (String word : words) {
freq.put(word, freq.getOrDefault(word, 0) + 1);
}
List<String> wordList = new ArrayList<>(freq.keySet());
wordList.sort((a, b) -> {
int countCompare = freq.get(b) - freq.get(a);
return countCompare != 0 ? countCompare : a.compareTo(b);
});
return wordList.subList(0, Math.min(k, wordList.size()));
}
}
⚠️ Pitfalls, Traps & Runtime Smacks
- Lexicographical tie-break: Miss this, and your answer is about as reliable as a vending machine after 4pm.
- K > unique words: Out-of-bounds? Not today. Use
Math.min(k, list.size())
for that extra hygienic touch. - Heap confusion: For min-heap, comparator should put least-relevant word at the top—the opposite of what you want at karaoke night.
- Complexity honesty: Sorting is fine unless your dataset is the size of the British Library. Otherwise, even interviewers just want to see if you can spell ‘comparator’.
🧠 Interviewer Brain-Tattoo
“Ties go to the dictionary—because even in code, alphabetical order is less controversial than rock-paper-scissors.”