PolarSPARC

Introduction to Trie Data Structure - Part 1


Bhaskar S 08/17/2019


👧 Hey Bob !!! What's up dude - you seem to be lost in deep thought ? All okay with you ???

👦 Hey Alice !!! Thanks for noticing and asking. Yeah, have a problem to solve and don't have an anwer yet. May be chatting with you will open up ideas.

👧 Happy to discuss and may be will learn something as well. What is the problem you trying to solve ???

👦 Here is the issue - want to store some words from the English dictionary.

👧 That is easy - why not use a Map data structure ???

👦 Here is the catch though. One could also check if a prefix of the word (from the begining) exists.

👧 Hmm. Interesting. How about we store all the prefixes of the word including the word in the Map data structure ???

👦 Hmm. That is an interesting idea. Let us check it out.

After a few mins on the white-board ...🕐...🕑...

👦 Dank. Does not work. Here is the case. Assume we added the word hunter along with its prefixes h, hu, hun, hunt, and hunte. How does one distinguish between the prefix hunt and the word hunt ??? See the dilemma ???

👧 I see. Damn - thought we figured. This is indeed an interesting problem.

👲 Hey guys !!! Looks like you are having some interesting discussion. Would you mind if me joined ???

👧 Hey Charlie !!! Of course not. Join the party.

Alice and Bob describe the problem to Charlie...

👲 What a timimg !!! Recently read about an interesting data structure called Trie (pronounced as *Try*). It is also known as a Prefix Tree or a Radix Tree.

👦 👧 Interesting - tell us more about it.

👲 Well, it a Tree like data structure and is used for reTrieval (or search) of prefixes or words (or keys) in an efficient manner. It is a type of Search Tree.

👲 Given any word, each letter from the word is represented as a Node in the Trie. The starting point of the Trie is the Root Node. For example, if we are given the words ant and art, the following illustration depicts the Nodes in the example Trie:

Trie for Ant and Art
Figure-1

👲 For our discussion, we will only assume words from the English dictionary. This means each Node in the Trie can have up to 26 children (26 letters from the English alphabet).

👲 Each Node in the Trie will contain the following members:

👲 The following is the implementation of a Node of a Trie using Python:

TrieNode.py
#
# Trie Node
#
class TrieNode(object):
    def __init__(self):
        self._ancestor = None
        self._children = {}
        self._item = None
        self._leaf = False

    def __str__(self, *args, **kwargs):
        return "item => %s, children => %s, leaf => %s" % (self._item, list(self._children.keys()), self._leaf)

    @property
    def ancestor(self):
        return self._ancestor

    @ancestor.setter
    def ancestor(self, node):
        self._ancestor = node

    @property
    def item(self):
        return self._item

    @item.setter
    def item(self, item):
        self._item = item

    @property
    def children(self):
        return self._children

    @children.setter
    def children(self, children: dict):
        self._children.update(children)

    @property
    def leaf(self):
        return self._leaf

    @leaf.setter
    def leaf(self, leaf):
        self._leaf = leaf

    def clear(self):
        self._children.clear()

    def get_node(self, letter):
        return self._children.get(letter)

    def add_node(self, letter, node):
        self._children[letter] = node

    def remove_node(self, letter):
        del self._children[letter]

    def get_children_keys(self):
        return list(self._children.keys())

    def is_children_empty(self):
        if len(self._children) == 0:
            return True
        return False

👲 Next, let me explain how to insert a word into a Trie data structure using an example. Assume the word is ant. The first letter in the word is a. Starting at the Root Node as the current Node, lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (a). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the current Node to reference the newly created letter Node.

👲 The second letter in the word is n. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (n). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the current Node to reference the newly created letter Node.

👲 The last letter in the word is t. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, create a new Node, set the value of the item to be the letter (t). Set the ancestor to be the current Node and add the newly created node as the reference to the letter in the children of the current Node. Set the leaf to be True since this is the last letter of the word. And we are done.

👲 The following is the code implementation to insert a word into a Trie data structure using Python:

insert_word()
def insert_word(self, word):
    if word and len(word) > 0:
        node = self._root
        for ch in word.lower():
            # print("insert_word: ch => {}".format(ch))
            ch_node = node.get_node(ch)
            if not ch_node:
                # print("insert_word: new node for ch => {}".format(ch))
                ch_node = TrieNode()
                ch_node.item = ch
                ch_node.ancestor = node
                node.add_node(ch, ch_node)
                # print("insert_word: node => {}".format(node))
            node = ch_node
        # Last node means - it is a word
        node.leaf = True
        print("insert_word: word => {} inserted".format(word.lower()))

👲 Assume we inserted the follwoing 10 words into the Trie data structure:

👲 The following illustration depicts the Nodes in the example Trie:

Trie for 10 Words
Figure-2

👲 Next, let me explain how to look-up a prefix of a word in a Trie data structure. Assume the prefix is ben. The first letter in the prefix is b. Starting at the Root Node as the current Node, lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, the prefix is not found and we are done.

👲 The second letter in the prefix is e. Lookup the letter in the children of the current Node. If found, set the current Node to reference the letter Node. Else, the prefix is not found and we are done.

👲 The last letter in the prefix is n. Lookup the letter in the children of the current Node. If found, the prefix is found successfully and we are done. Else, the prefix is not found and we are done.

👲 The following is the code implementation to search a prefix from a Trie data structure using Python:

is_prefix()
def is_prefix(self, prefix) -> (bool, TrieNode):
    present = False
    node = None
    if prefix and len(prefix) > 0:
        present = True
        node = self._root
        for ch in prefix.lower():
            # print("is_prefix: ch => {}".format(ch))
            ch_node = node.get_node(ch)
            if not ch_node:
                present = False
                break
            node = ch_node
    return present, node

👧 👦 WoW !!! This is awesome.

👲 Next, let me explain how to look-up a word in a Trie data structure. Assume the word is art. We use the same logic as looking up the prefix art. Except, we add one more check to see if the Node corresponding to the last letter t has the leaf flag set as True. That is it.

👲 The following is the code implementation to search a word from a Trie data structure using Python:

is_word()
def is_word(self, word) -> (bool, TrieNode):
    (present, node) = self.is_prefix(word)
    if present and node.leaf:
        return True, node
    return False, None

👧 👦 This is really COOL that we can leverage the is_prefix method to find a word.

👲 We are not done yet. Next, let me explain how to remove a word from a Trie data structure. Assume the word is ball. For this, we use the same logic as looking up the word ball. If the word exists, we also get a reference to the last letter Node (l). If there are no children in the last letter Node, we remove the reference to this last Node from its ancestor Node. We then move to the ancestor Node and perform the same step of removing the Node (only if children is empty). We continue this process till we hit the Root Node.

👲 The following illustration depicts the state of the Trie after the word ball is removed:

Trie After Remove
Figure-3

👲 The following is the code implementation to remove a word from a Trie data structure using Python:

remove_word()
def remove_word(self, word):
    (present, node) = self.is_word(word)
    if present:
        while node != self._root:
            if node.is_children_empty():
                node.ancestor.remove_node(node.item)
                node = node.ancestor
            else:
                break
        print("remove_word: word => {} deleted".format(word))

👲 The following is the source code for a basic Trie data structure using Python:

BasicTrie
#
# Name:   BasicTrie
# Author: Bhaskar S
# Site:   https://www.polarsparc.com
#

from Trie.TrieNode import TrieNode


#
# BasicTrie Data Structure
#
class BasicTrie(object):
    def __init__(self):
        self._root = TrieNode()

    def insert_word(self, word):
        if word and len(word) > 0:
            node = self._root
            for ch in word.lower():
                # print("insert_word: ch => {}".format(ch))
                ch_node = node.get_node(ch)
                if not ch_node:
                    # print("insert_word: new node for ch => {}".format(ch))
                    ch_node = TrieNode()
                    ch_node.item = ch
                    ch_node.ancestor = node
                    node.add_node(ch, ch_node)
                    # print("insert_word: node => {}".format(node))
                node = ch_node
            # Last node means - it is a word
            node.leaf = True
            print("insert_word: word => {} inserted".format(word.lower()))

    def is_prefix(self, prefix) -> (bool, TrieNode):
        present = False
        node = None
        if prefix and len(prefix) > 0:
            present = True
            node = self._root
            for ch in prefix.lower():
                # print("is_prefix: ch => {}".format(ch))
                ch_node = node.get_node(ch)
                if not ch_node:
                    present = False
                    node = None
                    break
                node = ch_node
        return present, node

    def is_word(self, word) -> (bool, TrieNode):
        (present, node) = self.is_prefix(word)
        if present and node.leaf:
            return True, node
        return False, None

    def remove_word(self, word):
        (present, node) = self.is_word(word)
        if present:
            while node != self._root:
                if node.is_children_empty():
                    node.ancestor.remove_node(node.item)
                    node = node.ancestor
                else:
                    break
            print("remove_word: word => {} deleted".format(word))

    def get_words(self) -> list:
        words = []
        node_stack = []
        prefix_stack = []
        prefix = ""
        node_stack.append(self._root)
        prefix_stack.append(prefix)
        while len(node_stack) > 0:
            node = node_stack.pop()
            prefix = prefix_stack.pop()
            keys = node.get_children_keys()
            # print("get_words: node.item => {}, keys => {}, prefix => {}".format(node.item, keys, prefix))
            if len(keys) > 0:
                for i in range(len(keys) - 1, -1, -1):
                    kn = node.get_node(keys[i])
                    pf = node.item if node.item else ""
                    # print("get_words: node.item => {}, pf => {}".format(node.item, pf))
                    node_stack.append(kn)
                    prefix_stack.append(prefix+pf)
                    if node.leaf:
                        words.append(prefix+pf)
            else:
                pf = node.item if node.item else ""
                # print("get_words: node.item => {}, pf => {}".format(node.item, pf))
                if node.leaf:
                    words.append(prefix + pf)
        return words

    def print_details(self):
        node_stack = []
        node = self._root
        print("--------------------------------------------------")
        while True:
            keys = node.get_children_keys()
            print("print_details: item = {}, children = {}, leaf = {}".format(node.item, keys, node.leaf))
            for i in range(len(keys) - 1, -1, -1):
                kn = node.get_node(keys[i])
                node_stack.append(kn)
            if len(node_stack) == 0:
                break
            node = node_stack.pop()
        print("--------------------------------------------------")


#
# Main function
#
def main():
    trie = BasicTrie()

    trie.insert_word("ant")
    trie.insert_word("art")
    trie.insert_word("ball")
    trie.insert_word("bend")
    trie.insert_word("car")
    trie.insert_word("cart")
    trie.insert_word("hunt")
    trie.insert_word("hunter")
    trie.insert_word("hunted")
    trie.insert_word("hung")

    (present, _) = trie.is_prefix("ben")
    print("Is prefix ben present => {}".format(present))
    (present, _) = trie.is_prefix("bet")
    print("Is prefix bet present => {}".format(present))
    (present, _) = trie.is_prefix("hunt")
    print("Is prefix hunt present => {}".format(present))

    (present, _) = trie.is_word("ball")
    print("Is word ball present => {}".format(present))
    (present, _) = trie.is_word("car")
    print("Is word car present => {}".format(present))
    (present, _) = trie.is_word("care")
    print("Is word care present => {}".format(present))
    (present, _) = trie.is_word("cart")
    print("Is word cart present => {}".format(present))
    (present, _) = trie.is_word("hunter")
    print("Is word hunter present => {}".format(present))

    trie.remove_word("ball")
    trie.remove_word("ben")
    trie.remove_word("hunter")

    (present, _) = trie.is_prefix("ba")
    print("Is prefix ba present => {}".format(present))
    (present, _) = trie.is_prefix("ben")
    print("Is prefix ben present => {}".format(present))

    (present, _) = trie.is_word("art")
    print("Is word art present => {}".format(present))
    (present, _) = trie.is_word("ball")
    print("Is word ball present => {}".format(present))
    (present, _) = trie.is_word("hunter")
    print("Is word hunter present => {}".format(present))

    all_words = trie.get_words()
    print("All words from the trie => {}".format(all_words))

    trie.print_details()


#
# Invoke main()
#
main()

👲 Executing the above Python code will generate the following output:

Output.1

insert_word: word => ant inserted
insert_word: word => art inserted
insert_word: word => ball inserted
insert_word: word => bend inserted
insert_word: word => car inserted
insert_word: word => cart inserted
insert_word: word => hunt inserted
insert_word: word => hunter inserted
insert_word: word => hunted inserted
insert_word: word => hung inserted
Is prefix ben present => True
Is prefix bet present => False
Is prefix hunt present => True
Is word ball present => True
Is word car present => True
Is word care present => False
Is word cart present => True
Is word hunter present => True
remove_word: word => ball deleted
remove_word: word => hunter deleted
Is prefix ba present => False
Is prefix ben present => True
Is word art present => True
Is word ball present => False
Is word hunter present => False
All words from the trie => ['ant', 'art', 'bend', 'car', 'cart', 'hunt', 'hunted', 'hung']
--------------------------------------------------
print_details: item = None, children = ['a', 'b', 'c', 'h'], leaf = False
print_details: item = a, children = ['n', 'r'], leaf = False
print_details: item = n, children = ['t'], leaf = False
print_details: item = t, children = [], leaf = True
print_details: item = r, children = ['t'], leaf = False
print_details: item = t, children = [], leaf = True
print_details: item = b, children = ['e'], leaf = False
print_details: item = e, children = ['n'], leaf = False
print_details: item = n, children = ['d'], leaf = False
print_details: item = d, children = [], leaf = True
print_details: item = c, children = ['a'], leaf = False
print_details: item = a, children = ['r'], leaf = False
print_details: item = r, children = ['t'], leaf = True
print_details: item = t, children = [], leaf = True
print_details: item = h, children = ['u'], leaf = False
print_details: item = u, children = ['n'], leaf = False
print_details: item = n, children = ['t', 'g'], leaf = False
print_details: item = t, children = ['e'], leaf = True
print_details: item = e, children = ['d'], leaf = False
print_details: item = d, children = [], leaf = True
print_details: item = g, children = [], leaf = True
--------------------------------------------------

👲 Hope was able to add value to this discussion and share what learnt recently.

👧 👦 This was really super useful and we learnt something new today. Thanks for sharing !!!

👲 You welcome. Well, the algorithm for the basic Trie data structure we just covered isn't space efficient. We should discuss that option as well. Well, am getting late for my next meeting, so let me go ...



© PolarSPARC