The WeakReference class, monitoring memory leak and garbage collection in a Java application

Image
 Below is a Stack implementation that uses an internal resizeable array structure.  public class MyStack< T > implements Stack< T > { private static final int CAPACITY = 100 ; private Object[] array ; private int pos = 0 ; public MyStack () { this . array = new Object[ CAPACITY ] ; } @Override public void push ( T item) { if ( pos >= array . length / 2 ) { Object[] newArray = new Object[ pos * 2 ] ; System. arraycopy ( array , 0 , newArray , 0 , array . length ) ; array = newArray ; } array [ pos ++] = item ; } @Override public T pop () { if (isEmpty()) { throw new RuntimeException( "empty stack" ) ; } @SuppressWarnings ( "unchecked" ) T item = ( T ) array [ pos - 1 ] ; pos -= 1 ; return item ; } @Override @SuppressWarnings ( "unchecked" ) public T peek...

Java Bit Manipulation For Repeated DNA Sequences


Problem description from LeetCode:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solution:
The idea is that we can use 2 bits to store a letter. 
We use a map to count occurences of 10 letter sequences.
When we get a new letter, we shift 2 bits to the left and then append the new letter.
After appending 11th character, we must remove the first char on the left.
To remove it, since we store 10 letters in 20 bits, our bit mask is: 0xFFFFF

Source Code:

public class RepeatedDNA {

    public static List findRepeatedDnaSequences(String string) {

        if (string.isEmpty()) {
            return new ArrayList<>();
        }

        if (string.length() <= 10) {
            return new ArrayList<>();
        }

        List result = new ArrayList<>();

        char[] s = string.toCharArray();

        Map map = new HashMap<>();

        int code = 0;
        for (int i = 0; i < 10; ++i) {
            code = (code << 2) | mapping(s[i]);
            //    System.out.println(code);
        }

        map.put(code, 1);

        for (int i = 10; i < s.length; ++i) {

            code = ((code << 2) & (0xFFFFF)) | mapping(s[i]);

            // System.out.println(Integer.toBinaryString(code));

            Integer count = map.get(code);

            if (count == null) {
                map.put(code, 0);
                count = 0;
            }

            if (count == 1) {
                result.add(string.substring(i - 9, i + 1));
            }

            map.put(code, count + 1);
        }
        
        return result;
    }

    static int mapping(char s) {
        if (s == 'A') return 0;
        if (s == 'C') return 1;
        if (s == 'G') return 2;
        if (s == 'T') return 3;

        return -1;
    }

    public static void main(String[] args) {

        List result = findRepeatedDnaSequences(
                "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
        );

        for (String item : result) {
            System.out.println(item);
        }
    }
}

Comments

Popular posts from this blog

Trie Data Structure and Finding Patterns in a Collection of Words

My Crappy Looking Solution to "Binary Tree Common Ancestor" Problem

A Graph Application in Java: Using WordNet to Find Outcast Words