[LeetCode] 49. Group Anagram

Description

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Example 2:

Input: strs = [""]
Output: [[""]]

Example 3:

Input: strs = ["a"]
Output: [["a"]]

Constraints:

  • 1 <= strs.length <= 104
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Explanation

To group anagrams, you need to consider the frequency of each character in the strings. Directly comparing character frequencies would typically require using structures like arrays or maps. However, these structures are not hashable in most programming languages, meaning we cannot use them directly as keys in a hash table.

Efficient Approach: Sorting Strings

A common and efficient approach is to sort each string. Sorting ensures that anagrams produce the same string when sorted. For instance, the strings “eat” and “tea” both become “aet” when sorted. This sorted string can then be used as a key in a hash table.

Sorting a string has a time complexity of O(Nlogā”N) which is generally acceptable given that string lengths are at most 100 characters.

Below is an implementation in Python:

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        hash_map = defaultdict(list)
        for str in strs:
            sorted_str = ''.join(sorted(str))
            hash_map[sorted_str].append(str)
        
        return list(hash_map.values())

In Python, the sorted function sorts the string and returns a list, which is then converted back to a string using ''.join().

Here is the equivalent implementation in Go:

func groupAnagrams(strs []string) [][]string {
    hashMap := make(map[string][]string)
    var ret [][]string
    for _, s := range strs{
        sortedStr := sorted(s)
        if v, ok := hashMap[sortedStr]; ok{
            hashMap[sortedStr] = append(v, s)
        }else{
            hashMap[sortedStr] = []string{s}
        }
    }

    for _, v := range hashMap{
        ret = append(ret, v)
    }
    return ret
}

func sorted(s string) string{
    runes := []rune(s)

    sort.Slice(runes, func(i,j int)bool{
        return runes[i] < runes[j]
    })

    return string(runes)
}

Since sorting a string is not directly supported in Go, we need to implement a comparator to sort the slice of runes.

Strings in Go consist of runes, which are another name for int32. We need to create a comparator that handles int32. Once the string is sorted, we proceed as before.

Note that on line 9, v is a copy of hashMap[sortedStr]. Therefore, after appending, you have to reassign the result back to hashMap[sortedStr].

Dive Deeper – Hashbable Arrays in Go

In Go, arrays are hashable if the element type is hashable. This includes:

  • bool
  • int (and all its variations)
  • uint (and all its variations)
  • float32
  • float64
  • complex64
  • complex128
  • string
  • pointers
  • channels
  • interface{} (if the dynamic type is hashable)
  • struct (if all fields are hashable)
  • array (if the element type is hashable)

Instead of using a sorted string, we can use a frequency count array to group anagrams:

func groupAnagrams(strs []string) [][]string {
    hashMap := make(map[[26]int32][]string)

    for _, s := range strs{
        fc := frequencyCount(s)
        if v, ok := hashMap[fc]; ok{
            hashMap[fc] = append(v,s)
        }else{
            hashMap[fc] = []string{s}
        }
    }
    var ret [][]string
    for _,v := range hashMap{
        ret = append(ret,v)
    }
    return ret
}


func frequencyCount(s string)[26]int32{
    var a [26]int32
    for _, v := range s{
        a[v-'a']++
    }
    return a
}

As Go supports various hashable types, we have one more option instead of using sorting string!

Dive Deeper – Runes in Golang

In Go, strings are encoded with UTF-8, where each character can be between 1 and 4 bytes. The UTF-8 encoding follows these rules:

  • 1 byte: 0xxxxxxx
  • 2 byte: 110xxxxx 10xxxxxx
  • 3 byte: 1110xxxx 10xxxxxx 10xxxxxx
  • 4 byte: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Converting a string to a slice of bytes does not account for multi-byte characters. Instead, Go uses the rune type, which is an alias for int32 and represents a Unicode code point. Handling characters as runes ensures that multi-byte characters are properly managed.

When working with global applications or multi-language data, always handle strings using runes instead of bytes to accommodate all possible characters.

Conclusion

Grouping anagrams efficiently involves understanding data structures and hashing mechanisms in the chosen language. By leveraging sorted strings or frequency counts, we can effectively utilize hash maps to group anagrams. Understanding the nuances of character encoding and data types further ensures robust and scalable solutions.

This refined explanation and code examples should help you understand and implement efficient anagram grouping in both Python and Go.

Leave a Reply

Your email address will not be published. Required fields are marked *