LeetCode solution: Repeated DNA Sequences

2022-06-23 06:17:00 ruochen

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:

"AAAAACCCCC", "CCCCCAAAAA".

This problem is a review of bitmaps and bitwise operations. A, C, G, and T are encoded with the following two-bit codes:

A   00
C   01
G   10
T   11
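
As a minimal sketch of the table above, the mapping can be written as a small helper. The class name `NucleotideCode` and the method `encode` are hypothetical names chosen for illustration; rejecting any other character is also an assumption, since the problem only mentions A, C, G, and T.

```java
final class NucleotideCode {
    // Hypothetical helper: returns the 2-bit code from the table above.
    static int encode(char c) {
        switch (c) {
            case 'A': return 0; // 00
            case 'C': return 1; // 01
            case 'G': return 2; // 10
            case 'T': return 3; // 11
            default:
                // Assumption: anything other than A, C, G, T is invalid input.
                throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    public static void main(String[] args) {
        // Print each nucleotide next to its numeric code.
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " -> " + encode(c));
        }
    }
}
```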

Ten consecutive characters therefore need only 20 bits, which fit in a single int (32 bits). Define a variable hash whose low 20 bits encode the current 10-character window, with all higher bits set to 0.
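
The packing step can be sketched in isolation as follows. The names `WindowHash`, `pack`, and `roll` are hypothetical and used only for illustration. The sketch assumes the input contains only A, C, G, and T. Shifting left by 2 makes room for the next code, and the 20-bit mask drops the character that has left the window.

```java
final class WindowHash {
    static final int WINDOW = 10;                    // characters per window
    static final int MASK = (1 << (2 * WINDOW)) - 1; // low 20 bits set

    // A=0, C=1, G=2, T=3; other characters are rejected (an assumption).
    static int code(char c) {
        int v = "ACGT".indexOf(c);
        if (v < 0) {
            throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
        return v;
    }

    // Append one character to the hash and keep only the low 20 bits.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    // Pack a whole string; for 10 characters this is its exact 20-bit code.
    static int pack(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = roll(hash, window.charAt(i));
        }
        return hash;
    }

    public static void main(String[] args) {
        int h = pack("AAAAACCCCC");
        System.out.println(Integer.toBinaryString(h));
        // Sliding the window by one character gives the hash of the next window.
        System.out.println(roll(h, 'A') == pack("AAAACCCCCA"));
    }
}
```

Because each window maps to a distinct 20-bit value, two equal hashes always mean equal substrings, so no collision check is needed.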

Define a set that stores every hash seen so far. When a newly computed hash is already in the set, add the corresponding substring to the result set.

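    // Requires java.util.ArrayList, HashMap, HashSet, List, Map, and Set.
    public List<String> findRepeatedDnaSequences(String s) {
        // Fewer than 11 characters cannot contain two 10-letter windows.
        if (s == null || s.length() < 11) {
            return new ArrayList<String>();
        }
        int hash = 0;
        // Hashes of every 10-letter window seen so far.
        Set<Integer> appear = new HashSet<Integer>();
        // Repeated sequences; a set avoids reporting the same one twice.
        Set<String> set = new HashSet<String>();
        // Two-bit code for each nucleotide.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Shift in the new character, then keep only the low 20 bits.
            hash = (hash << 2) + map.get(c);
            hash &= (1 << 20) - 1;
            // From index 9 onward, hash covers a full 10-letter window.
            if (i >= 9) {
                if (appear.contains(hash)) {
                    set.add(s.substring(i - 9, i + 1));
                } else {
                    appear.add(hash);
                }
            }
        }
        return new ArrayList<String>(set);
    }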
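
As a sanity check on the bitwise solution, a simpler variant that stores the substrings themselves can be run on the example input. This is not the approach described above but a plain substring-set version, swapped in for comparison. The class name `RepeatedDnaCheck` and the method `repeatedBySubstring` are hypothetical, and the sketch assumes a non-null input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RepeatedDnaCheck {
    // Collect every 10-letter substring that occurs more than once.
    static List<String> repeatedBySubstring(String s) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new HashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // Example input from the problem statement; order of results may vary.
        System.out.println(repeatedBySubstring("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

This version uses more memory because it keeps each 10-character string instead of a single integer per window, which is the cost the 20-bit encoding avoids.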

Copyright notice
This article was written by [ruochen]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/01/202201141141224867.html