Trie Data Structure and Finding Patterns in a Collection of Words
- Get link
- X
- Other Apps
I faced a very hard problem recently about finding substrings based on a collection of patterns.
Example
For words = ["Apple", "Melon", "Orange", "Watermelon"]
and parts = ["a", "mel", "lon", "el", "An"]
, the output should be findSubstrings(words, parts) = ["Apple", "Me[lon]", "Or[a]nge", "Water[mel]on"]
.
While "Watermelon"
contains three substrings from the parts
array, "a"
, "mel"
, and "lon"
, "mel"
is the longest substring that appears first in the string.
Note: Number of words and number of parts can be huge! A brute force method or careless usage of Java String methods would make it impossible to handle a large number of words and parts (patterns).
Solution Approach
Whenever we want to search patterns inside a string, we should think about the Trie data structure:
https://en.wikipedia.org/wiki/Trie
In computer science, a trie, also called prefix tree, is a type of search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters.
public class TrieST<Value> {
private static final int R = 256;
private Node root;
private int n;
public static class Node {
public Object val;
public Node[] next = new Node[R];
}
public void put(String key, Value val) {
if (key == null) throw new IllegalArgumentException("first argument to put() is null");
else setRoot(put(getRoot(), key, val, 0));
}
private Node put(Node x, String key, Value val, int d) {
if (x == null) x = new Node();
if (d == key.length()) {
if (x.val == null) n++;
x.val = val;
return x;
}
char c = key.charAt(d);
x.next[c] = put(x.next[c], key, val, d+1);
return x;
}
public Node getRoot() {
return root;
}
public void setRoot(Node root) {
this.root = root;
}
}
- Get link
- X
- Other Apps
Comments