HDU 3695 Computer Virus on Planet Pandora (AC自动机)


Computer Virus on Planet Pandora

Time Limit: 6000/2000 MS (Java/Others)    Memory Limit: 256000/128000 K (Java/Others)

Total Submission(s): 2879    Accepted Submission(s): 808

Problem Description

    Aliens on planet Pandora also write computer programs like us. Their programs only consist of capital letters (‘A’ to ‘Z’) which they learned from the Earth. On
planet Pandora, hackers make computer virus, so they also have anti-virus software. Of course they learned virus scanning algorithm from the Earth. Every virus has a pattern string which consists of only capital letters. If a virus’s pattern string is a substring of a program, or the pattern string is a substring of the reverse of that program, they can say the program is infected by that virus. Give you a program and a list of virus pattern strings, please write a program to figure out how many viruses the program is infected by.
 

Input
There are multiple test cases. The first line in the input is an integer T ( T<= 10) indicating the number of test cases.

For each test case:

The first line is a integer n( 0 < n <= 250) indicating the number of virus pattern strings.

Then n lines follows, each represents a virus pattern string. Every pattern string stands for a virus. It’s guaranteed that those n pattern strings are all different so there
are n different viruses. The length of pattern string is no more than 1,000 and a pattern string at least consists of one letter.

The last line of a test case is the program. The program may be described in a compressed format. A compressed program consists of capital letters and
“compressors”. A “compressor” is in the following format:

[qx]

q is a number( 0 < q <= 5,000,000)and x is a capital letter. It means q consecutive letter xs in the original uncompressed program. For example, [6K] means
‘KKKKKK’ in the original program. So, if a compressed program is like:

AB[2D]E[7K]G

It actually is ABDDEKKKKKKKG after decompressed to original format.

The length of the program is at least 1 and at most 5,100,000, no matter in the compressed format or after it is decompressed to original format.
 

Output
For each test case, print an integer K in a line meaning that the program is infected by K viruses.
 

Sample Input
3 2 AB DCB DACB 3 ABC CDE GHI ABCCDEFIHG 4 ABB ACDEE BBB FEEE A[2B]CD[4E]F
 

Sample Output
0 3 2
Hint
In the second case in the sample input, the reverse of the program is ‘GHIFEDCCBA’, and ‘GHI’ is a substring of the reverse, so the program is infected by virus ‘GHI’.
 

Source
 

题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=3695

题目大意:求不同的单词或其倒叙在文章中出现的次数,文章可能以压缩的形式展现,[qx]表示连续‘x‘个q

题目分析:因为要考虑倒叙,能想到两种方案,一个是用单词和其倒叙构建字典树,由于这题数据量太大,这样会MLE,只能采取第二种方法,将文章正序匹配一次再倒叙匹配一次,在这里我用了一句while(i < (int)strlen(get)) {...}因为这个T了一下午,改成int l = strlen(get);
while(i < l) {...}就行了,不明白为什么

#include <cstdio>
#include <cstring>
#include <queue>
#include <algorithm>
using namespace std;
int const MAX = 5100005;
char s[1005], get[MAX], text[MAX];

struct node
{
    node *next[26];
    node *fail;
    bool end;
    node()
    {
        memset(next, NULL, sizeof(next));
        fail = NULL;
        end = false;
    }
};

void Insert(node *p, char *s)
{
    for(int i = 0; s[i] != '\0'; i++)
    {
        int idx = s[i] - 'A';
        if(p -> next[idx] == NULL)
            p -> next[idx] = new node();
        p = p -> next[idx];
    }
    p -> end = true;
}

void AC_Automation(node *root)
{
    queue <node*> q;
    q.push(root);
    while(!q.empty())
    {
        node *p = q.front();
        q.pop();
        for(int i = 0; i < 26; i++)
        {
            if(p -> next[i])
            {
                if(p == root)
                    p -> next[i] -> fail = root;
                else
                    p -> next[i] -> fail = p -> fail -> next[i];
                q.push(p -> next[i]);
            }
            else
            {
                if(p == root)
                    p -> next[i] = root;
                else
                    p -> next[i] = p -> fail -> next[i];
            }
        }
    }
}

int Query(node *root, char *s)
{
    int ans = 0;
    node *p = root;
    for(int i = 0; s[i] != '\0'; i++)
    {
        int idx = s[i] - 'A';
        while(!p -> next[idx] && p != root)
            p = p -> fail;
        p = p -> next[idx];
        if(!p)
        {
            p = root;
            continue;
        }
        node *tmp = p;
        while(tmp != root)
        {
            if(tmp -> end)
            {
                ans ++;
                tmp -> end = false;
            }
            else
                break;
            tmp = tmp -> fail;
        }
    }
    return ans;
}

void reverse(char *s)
{
    char tmp;
    int len = strlen(s) - 1;
    for(int i = 0; i < len / 2 + 1; i++)
        swap(s[i], s[len - i]);
    s[len + 1] = '\0';
}

int main()
{
    int T;
    scanf("%d", &T);
    while(T--)
    {
        int n, ans = 0;
        node *root = new node();
        memset(get, 0, sizeof(get));
        scanf("%d", &n);
        while(n--)
        {   
            scanf("%s", s);
            Insert(root, s);
        }
        AC_Automation(root);
        scanf("%s", get);
        int num = 0, i = 0, l = strlen(get);
        while(i < l)
        {
            if(get[i] <= 'Z' && get[i] >= 'A')
                text[num++] = get[i++];
            else if(get[i] == '[')
            {
                i++;
                int len = 0;
                while(get[i] >= '0' && get[i] <= '9')
                    len = 10 * len + (get[i++] - '0');
                for(int j = 0; j < len; j++)
                    text[num++] = get[i];
                i += 2;
            }
        }
        text[num] = '\0';
        ans += Query(root, text);
        reverse(text);
        ans += Query(root, text);
        printf("%d\n", ans);
    }
}


郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。