An Improved Web Page Watermarking

Zhu Ping¹, Ding Wei², Lu Ming²
¹ College of Information and Security Engineering, Wuhan, 430070
² College of Computer Science and Technology, Wuhan, 430070

Abstract: Web page watermarking is a research branch of text watermarking. Embedding a watermark into a web page is relatively difficult. Because web pages have particular characteristics, this paper proposes an effective, improved web page watermarking scheme. The scheme protects web pages from tampering and also protects the copyright, integrity and consistency of the whole web site.
The scheme can be used effectively for copyright protection. It can detect whether a web page has been tampered with and locate the tampering.

Keywords: Text Digital Watermarking, Web Page Watermarking, Fragile Watermarking, Robust Watermarking

1. Introduction

The basic idea of digital watermarking derives from early steganography: specific markers are secretly embedded into digital content. These markers are usually invisible and can be seen only with a special detector or reader. Digital watermarking can be divided by carrier type into text watermarking, image watermarking and video watermarking. Text usually consists of words, sentences, paragraphs, punctuation and other regular structures, and it carries little redundant information. This makes it hard to embed a watermark into text without attackers finding it. Common text watermarking techniques include line-shift coding, word-shift coding and feature coding. The web page watermarking in this paper is a kind of text digital watermarking.

A web page differs from an ordinary plain-text document. An HTML document is unformatted text made up of tags (labels) and web page content. Tags are case-insensitive. They control the format and display of the web page content and are divided into single tags and paired tags. A single tag is used alone, in the form <tag>;
a paired tag consists of a start tag and an end tag, in the form <tag>web page information</tag>. Existing schemes for embedding text watermarks into web pages rely on characteristics of HTML grammar and tags. They include methods based on invisible characters (for example, a web browser ignores extra Tab and Space characters in an HTML document), methods based on the case-insensitivity of HTML tags, and methods based on the order of tag attributes.

2. Word Watermarking

Word watermarking uses a hash algorithm to generate a label string for each word. It then accumulates the ASCII values of all characters in that string and uses the result as the seed of a pseudo-random function, which produces the word's 0-1 code. In a web page, all words in the body between the HTML tags are extracted and a watermark is generated for each word. The steps are as follows:

1. Each word is hashed with the SHA-1 secure hash algorithm under the key Key1. This gives a 160-bit binary sequence, that is, a 40-character hexadecimal string over 0 to F.
2. The ASCII values of the characters in this hexadecimal string are added up to give a sum.
3. The sum seeds a pseudo-random sequence algorithm, which generates a six-bit 0-1 sequence. This sequence is the word watermark.

These 0-1 sequences are later converted into "space-tab" sequences and embedded into the web page, where the browser does not display them. The generation process is shown in Figure 1 and expressed by Equation (1):

$$WW(W_i) = \mathrm{Random}\big(\mathrm{sum}(\mathrm{Hash}(W_i, \mathrm{Key1}))\big), \quad i = 1, 2, \ldots, M \qquad (1)$$

Figure 1: The word watermarking generation process of the web page (each word → SHA-1 with Key1 → hex string c1c2c3… → accumulated character codes, sum → pseudo-random algorithm Random → six-bit 0-1 word watermark)

Here $W_i$ is the i-th word in the body of the web page, $WW(W_i)$ is its watermark, Key1 is the key of the hash algorithm, M is the total number of words in the body, and sum denotes the accumulation step.
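The Java sketch below illustrates Equation (1). The paper does not say how Key1 enters SHA-1 or which generator Random is, so the sketch makes two assumptions. First, the hash input is the word followed by Key1. Second, Random is java.util.Random seeded with the sum. The class name WordWatermark is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical helper illustrating Equation (1); not the authors' code.
final class WordWatermark {
    static final int BITS = 6; // six-bit word watermark (Section 2)

    // Hash(W_i, Key1). ASSUMPTION: SHA-1 over the word followed by Key1.
    static String sha1Hex(String word, String key1) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest((word + key1).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02X", b)); // 20 bytes -> 40 hex characters
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // sum(...): add up the ASCII codes of the 40 hexadecimal characters.
    static int asciiSum(String hex) {
        int sum = 0;
        for (char c : hex.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Random(...). ASSUMPTION: java.util.Random seeded with the sum yields the six bits.
    static String generate(String word, String key1) {
        Random rng = new Random(asciiSum(sha1Hex(word, key1)));
        StringBuilder bits = new StringBuilder();
        for (int k = 0; k < BITS; k++) {
            bits.append(rng.nextBoolean() ? '1' : '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        for (String word : "An Improved Web Page Watermarking".split(" ")) {
            System.out.println(word + " -> " + generate(word, "Key1"));
        }
    }
}
```

The seed is only a sum of 40 character codes between '0' (48) and 'F' (70), so it always lies between 1920 and 2800. Many different words therefore share a seed. This follows from the design as described, not from the Java sketch.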
3. Line Watermarking

Line watermarking uses the hash algorithm to generate a character string for each line. It accumulates the ASCII values of all characters in that string and uses the sum as the seed of the pseudo-random function, which produces the line's 0-1 code. For a web page, the line watermark is generated as follows. First, all the words in the line are extracted and their word watermarks are generated by the method above. These word watermarks are then combined to give the line watermark. The generation process is shown in Figure 2 and expressed by Equation (2):

$$LW(L_i) = WW(W_{i1}) \oplus WW(W_{i2}) \oplus \cdots \oplus WW(W_{ij}) \oplus \cdots \oplus WW(W_{iN}) \qquad (2)$$

Here $L_i$ is the i-th line, $LW(L_i)$ is its line watermark, and $WW(W_{ij})$ is the word watermark of the j-th word in line i. M is the total number of lines and N is the number of words in the line.

Figure 2: The line watermarking generation process of the web page
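The Java sketch below assumes that the combining operator in Equation (2) is bitwise XOR over the six-bit word watermarks. Under that assumption, a line watermark has the same length as a word watermark. The class name LineWatermark is hypothetical.

```java
import java.util.List;

// Hypothetical helper for Equation (2). ASSUMPTION: the combining operator is XOR.
final class LineWatermark {
    // XOR the equal-length 0-1 word watermarks of one line, left to right.
    static String combine(List<String> wordMarks) {
        if (wordMarks.isEmpty()) {
            throw new IllegalArgumentException("a line needs at least one word");
        }
        char[] acc = wordMarks.get(0).toCharArray();
        for (String mark : wordMarks.subList(1, wordMarks.size())) {
            if (mark.length() != acc.length) {
                throw new IllegalArgumentException("word watermarks must have equal length");
            }
            for (int k = 0; k < acc.length; k++) {
                acc[k] = acc[k] == mark.charAt(k) ? '0' : '1'; // XOR on '0'/'1' characters
            }
        }
        return new String(acc);
    }

    public static void main(String[] args) {
        List<String> marks = List.of("101010", "110011", "000111"); // sample word watermarks
        System.out.println(combine(marks)); // prints 011110
    }
}
```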
4. Improved Watermarking Algorithm

4.1 The generation of the watermarking

Text watermarking serves different applications and purposes on different web pages. The functional requirements on the robust watermark therefore vary, and so does the scheme for embedding the watermark information into the page. The word watermark and the line watermark protect the web page from tampering. They must therefore be closely tied to the specific content of the page, and they are embedded into it at random positions. The watermark distributed across the web site's navigation pages identifies the copyright. It is therefore generated from the web page owner's information, a serial number, an icon and similar data. For convenience of description, only English web pages are considered. The scheme can easily be applied to web pages in other character encodings.

The watermark for the web page is generated as follows. In Java every character can be represented in binary, and the binary sequence is obtained with the String method getBytes. To embed proof of copyright ownership, the author information or the serial number is converted to bytes and encoded as a binary sequence. This gives the watermark sequence $W_m = \{W_i\},\ i = 1, 2, \ldots, m$. The encrypted watermark $W'_m = \{W'_i\},\ i = 1, 2, \ldots, m$ is then obtained by combining it with the key sequence $\mathrm{Key2} = \{K_j\},\ j = 1, 2, \ldots, n$, as in Equation (3):

$$W'_i = W_i \oplus K_j, \quad i = 1, 2, \ldots, m; \quad j = ((i - 1) \bmod n) + 1 \qquad (3)$$

Here m is the length of the watermark sequence and n is the length of the key. When the watermark sequence is longer than the key, the key is reused cyclically.
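The Java sketch below shows the generation step. String.getBytes turns the identification text into bytes, and each byte becomes eight bits. The sketch assumes the most significant bit comes first, because the paper does not state the bit order. Equation (3) then XORs the bits with the cyclically reused Key2. Indices are 0-based in the code, so the key index is simply i % n. The class name and the three-bit key are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for Section 4.1; the bit order (MSB first) is an assumption.
final class CopyrightWatermark {
    // W_m: the identification text as bits, eight per UTF-8 byte, most significant bit first.
    static int[] toBits(String info) {
        byte[] bytes = info.getBytes(StandardCharsets.UTF_8);
        int[] bits = new int[bytes.length * 8];
        for (int b = 0; b < bytes.length; b++) {
            for (int k = 0; k < 8; k++) {
                bits[b * 8 + k] = (bytes[b] >> (7 - k)) & 1;
            }
        }
        return bits;
    }

    // Equation (3): W'_i = W_i XOR K_j, reusing the key cyclically (0-based: j = i % n).
    static int[] xorWithKey(int[] w, int[] key2) {
        if (key2.length == 0) {
            throw new IllegalArgumentException("Key2 must not be empty");
        }
        int[] out = new int[w.length];
        for (int i = 0; i < w.length; i++) {
            out[i] = w[i] ^ key2[i % key2.length];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] key2 = {1, 0, 1}; // hypothetical three-bit key
        int[] encrypted = xorWithKey(toBits("A"), key2);
        System.out.println(Arrays.toString(encrypted)); // prints [1, 1, 1, 1, 0, 1, 1, 1]
    }
}
```

Applying xorWithKey a second time with the same key restores the original bits. The detection step in Section 4.3 relies on this property.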
4.2 Embedding of the watermarking

Watermarks serve different functions on different web pages. The embedding processes and methods, and the choice of embedding points for each kind of watermark information, differ accordingly. For every type of watermark, a "0" in the watermark sequence is written as a space in whitespace, or leaves the letter lowercase in a tag. A "1" is written as a tab in whitespace, or capitalizes the letter in a tag. The specific insertion processes are as follows.

1. The word watermarking inserting process of the web page

Suppose the body of a web page contains M words, which generate M word watermarks. In English web pages, words are separated by spaces, and the web browser automatically ignores extra spaces and tabs between words. Watermark information can therefore be embedded after each word, which gives M positions with the initial position sequence 1, 2, …, M.

To improve invisibility, the watermark of a word is not embedded directly after that word. This way, the real information cannot be obtained even if the watermark is illegally extracted. Instead, a pseudo-random number algorithm (a shuffling algorithm) controlled by Key3 generates a new position sequence $L_1, L_2, \ldots, L_M$. The word watermark $WW(W_i)$ is converted into a "space-tab" sequence and embedded at its new location $L_i$. In other words, the watermark generated by the i-th word is embedded after the $L_i$-th word.
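The Java sketch below illustrates word insertion under simplifying assumptions. The body is treated as a plain list of words rejoined with single spaces, so HTML tags between words are ignored. The Key3-controlled shuffling algorithm is replaced by Collections.shuffle with a java.util.Random seeded by Key3, which performs a Fisher-Yates shuffle. A 0 is written as a space and a 1 as a tab, matching the extraction rules in Section 4.3. WordEmbedder and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of word watermark insertion; not the authors' implementation.
final class WordEmbedder {
    // 0 -> space, 1 -> tab: the browser ignores both when it renders the page.
    static String toWhitespace(String bits) {
        StringBuilder sb = new StringBuilder();
        for (char c : bits.toCharArray()) {
            sb.append(c == '1' ? '\t' : ' ');
        }
        return sb.toString();
    }

    // New position sequence L_1..L_M (0-based here) from a Key3-seeded shuffle.
    static int[] positions(int m, long key3) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(key3));
        int[] l = new int[m];
        for (int i = 0; i < m; i++) {
            l[i] = order.get(i);
        }
        return l;
    }

    // The watermark of word i is written after word L_i; words stay separated by one space.
    static String embed(List<String> words, List<String> marks, long key3) {
        int m = words.size();
        if (marks.size() != m) {
            throw new IllegalArgumentException("one watermark per word is required");
        }
        int[] l = positions(m, key3);
        String[] after = new String[m];
        for (int i = 0; i < m; i++) {
            after[l[i]] = toWhitespace(marks.get(i));
        }
        StringBuilder out = new StringBuilder();
        for (int p = 0; p < m; p++) {
            out.append(words.get(p)).append(after[p]);
            if (p < m - 1) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("An", "Improved", "Web", "Page");
        List<String> marks = List.of("010011", "111000", "001100", "101010"); // sample bits
        String marked = embed(words, marks, 3L); // 3L: hypothetical Key3
        System.out.println(marked.replace("\t", "\\t").replace(" ", "_"));
    }
}
```

Because the shuffle is seeded only by Key3, a verifier holding the same key recomputes the same positions.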
2. The line watermarking inserting process of the web page

Suppose the body of a web page contains N lines, which generate N line watermarks under the line watermark generation scheme. As with words, the pseudo-random number algorithm controlled by Key4 generates a new position sequence, and each line watermark is inserted at its new position as follows.

If the encoded length of the line watermark is greater than the number of HTML tag characters in the line, part of the watermark is embedded into the tags by changing the case of their letters. The excess is converted into a "space-tab" sequence and embedded after the line.

If the encoded length of the line watermark is less than the number of HTML tag characters in the line, the watermark is embedded into the tags cyclically.

If the encoded length of the line watermark equals the number of HTML tag characters in the line, the whole watermark is embedded into the tags.
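The Java sketch below covers the three cases for a single source line. It assumes that the "HTML tag characters" are the letters of tag names, for example the p in <p> and </p>. It also leaves out the Key4-controlled reordering of lines. Bits are written cyclically into the tag letters. When the watermark is longer than the tag letters, the surplus bits are appended to the line as spaces and tabs. The class name LineEmbedder is hypothetical.

```java
// Hypothetical sketch of line watermark insertion for one HTML source line.
final class LineEmbedder {
    // ASSUMPTION: only letters of tag names (e.g. "p" in <p> and </p>) count as tag characters.
    static String embed(String line, String bits) {
        if (bits.isEmpty()) {
            return line;
        }
        StringBuilder out = new StringBuilder();
        int j = 0;              // number of tag-name letters seen so far
        boolean inName = false; // true while reading a tag name
        for (int p = 0; p < line.length(); p++) {
            char c = line.charAt(p);
            if (c == '<') {
                inName = true;
            } else if (inName && c == '/' && line.charAt(p - 1) == '<') {
                // slash of an end tag: the tag name follows, so keep scanning
            } else if (inName && Character.isLetter(c)) {
                // longer or equal watermark: bit j; shorter watermark: bits reused cyclically
                char bit = bits.charAt(j % bits.length());
                c = bit == '1' ? Character.toUpperCase(c) : Character.toLowerCase(c);
                j++;
            } else {
                inName = false;
            }
            out.append(c);
        }
        // Longer watermark: bits the tags could not hold go after the line as spaces/tabs.
        for (int k = j; k < bits.length(); k++) {
            out.append(bits.charAt(k) == '1' ? '\t' : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(embed("<p>Hi</p>", "10")); // prints <P>Hi</p>
    }
}
```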
3. Watermarking embedding process of the web navigation page

In UTF-8, an English letter is encoded as an 8-bit binary sequence and a Chinese character as a 24-bit sequence. Suppose there are N navigation pages and the identification information used to generate the watermark contains M English letters. The generated and encrypted watermark is then 8M bits long and is embedded evenly into the N web pages. The lengths of the partial watermarks embedded in the N pages are therefore $(\lfloor 8M/N \rfloor, \ldots, \lfloor 8M/N \rfloor, \lfloor 8M/N \rfloor + 8M \bmod N)$.
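The split into page-sized parts is integer arithmetic. The Java sketch below assumes that the parts are cut from the encrypted watermark in page order, with the remainder 8M mod N going to the last page, as the length formula indicates. The class name NavigationSplit is hypothetical. For example, M = 5 letters gives 40 bits, and with N = 3 pages the part lengths are 13, 13 and 14.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split an 8M-bit encrypted watermark over N navigation pages.
final class NavigationSplit {
    // (8M/N, ..., 8M/N, 8M/N + 8M % N) with integer division.
    static int[] partLengths(int totalBits, int pages) {
        if (pages <= 0) {
            throw new IllegalArgumentException("at least one navigation page is needed");
        }
        int[] len = new int[pages];
        Arrays.fill(len, totalBits / pages);
        len[pages - 1] += totalBits % pages; // the last page takes the remainder
        return len;
    }

    // Consecutive slices of the watermark, one per page, in page order.
    static List<int[]> split(int[] watermark, int pages) {
        List<int[]> parts = new ArrayList<>();
        int from = 0;
        for (int len : partLengths(watermark.length, pages)) {
            parts.add(Arrays.copyOfRange(watermark, from, from + len));
            from += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        // M = 5 letters -> 8M = 40 bits over N = 3 pages
        System.out.println(Arrays.toString(partLengths(40, 3))); // prints [13, 13, 14]
    }
}
```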
First, the HTML document is preprocessed. All English letters in the tags, excluding attribute values, are converted to lowercase. The tag letters between the start tag and the end tag are then counted. Denote this count by N (here N is the letter count, not the number of pages). If the watermark sequence to be embedded in this web page is longer than N, a meaningless HTML tag is appended before the end tag so that the page can hold all of its watermark information. In this way the carrier document contains the whole watermark while the visual effect of the page is hardly affected. If the watermark sequence is shorter than or equal to N, the watermark is embedded cyclically until the tags end.

Next, the preprocessed HTML document is traversed from the start tag to the end tag. Whenever the pointer reaches a letter belonging to an HTML tag, the letter's case is set according to the encrypted watermark sequence $W'_i$ (i = 1, 2, …, m) computed with Equation (3).

To improve invisibility, the pseudo-random sequence generation algorithm scrambles the embedding positions of the word watermarks and the line watermarks. The HTML document is then traversed, and the word watermarks are embedded after the scrambled positions according to the English words stored in a HashMap and the word watermark key. Next, the line watermarks stored in the array Line[Line_Num] are embedded at their scrambled positions.

Finally, the watermark information of the navigation pages is embedded. The HTML document is traversed from the start tag together with the binary watermark sequence $W'_m = \{W'_i\},\ i = 1, 2, \ldots, m$, using a pointer i. When the pointer reaches a letter that belongs to an HTML tag name or an attribute name, the letter is changed to uppercase if $W'_i = 1$ and left unchanged if $W'_i = 0$. This continues until i exceeds m.
4.3 The testing of the watermarking

Key3 and Key4 determine the embedded positions of the word watermarks and the line watermarks, so both keys are needed to check the integrity of the web page content. For an ordinary web page under test, new word watermarks and line watermarks are regenerated with the watermark generation algorithm. Meanwhile, the embedding position sequence of the watermark information is computed from the keys, and the word and line watermarks are extracted from the corresponding positions in the page. The extraction rules are as follows:

- a space after a word or line yields "0";
- a tab after a word or line yields "1";
- a lowercase letter in an HTML tag yields "0";
- an uppercase letter in an HTML tag yields "1".

Finally, the extracted word and line watermarks are compared with the newly generated ones to determine whether the content has been tampered with. If they are completely identical, the page has not been tampered with. Otherwise it has, and the words or lines whose watermarks disagree show where.
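The Java sketch below illustrates the comparison step using the extraction rules listed above. It reports the indices whose watermarks disagree, which is one assumed way of producing the tamper localization claimed in the abstract. Computing the Key3/Key4 positions is omitted, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the integrity check in Section 4.3.
final class IntegrityCheck {
    // Space after a word or line -> "0", tab -> "1"; other characters are ignored.
    static String bitsFromWhitespace(String run) {
        StringBuilder sb = new StringBuilder();
        for (char c : run.toCharArray()) {
            if (c == ' ') {
                sb.append('0');
            } else if (c == '\t') {
                sb.append('1');
            }
        }
        return sb.toString();
    }

    // Lowercase tag letter -> "0", uppercase tag letter -> "1".
    static String bitsFromTagLetters(String letters) {
        StringBuilder sb = new StringBuilder();
        for (char c : letters.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(Character.isUpperCase(c) ? '1' : '0');
            }
        }
        return sb.toString();
    }

    // Indices whose extracted watermark differs from the regenerated one (empty = intact).
    static List<Integer> tampered(List<String> regenerated, List<String> extracted) {
        List<Integer> bad = new ArrayList<>();
        int n = Math.max(regenerated.size(), extracted.size());
        for (int i = 0; i < n; i++) {
            boolean same = i < regenerated.size() && i < extracted.size()
                    && regenerated.get(i).equals(extracted.get(i));
            if (!same) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> regenerated = List.of("010011", "111000", "001100");
        List<String> extracted = List.of(bitsFromWhitespace(" \t  \t\t"), "111000", "101100");
        System.out.println(tampered(regenerated, extracted)); // prints [2]
    }
}
```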
To prove the copyright of the digital work, the web pages corresponding to the site's navigation bar are processed as follows.

1. The length of the watermark embedded in each navigation page is calculated with the method of the embedding algorithm.
2. Each HTML document is traversed character by character from the start tag to the end tag. When the pointer reaches a tag letter, a lowercase letter yields "0" and an uppercase letter yields "1". This continues until the extracted binary sequence is as long as the watermark embedded in that page.
3. The partial watermarks extracted from all the pages corresponding to the navigation bar are merged to obtain the encrypted watermark sequence.
4. The unencrypted binary sequence is recovered with Equation (3) and the key sequence $\mathrm{Key2} = \{K_j\},\ j = 1, 2, \ldots, n$.
5. The effective information used to generate the watermark, such as the author information of the web page or the serial number, is rebuilt with the constructor new String(byte[]) to prove the copyright.
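The Java sketch below follows the recovery steps. The partial sequences are concatenated in navigation-page order and XORed again with the cyclic Key2, since Equation (3) is its own inverse. The bits are then regrouped into bytes and decoded with new String(byte[], UTF_8). The sketch assumes the most significant bit comes first in each byte. The class name and the three-bit key in the example are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of copyright recovery from the navigation pages.
final class CopyrightRecovery {
    // Concatenate the partial watermarks in navigation-page order.
    static int[] merge(List<int[]> parts) {
        int total = 0;
        for (int[] part : parts) {
            total += part.length;
        }
        int[] all = new int[total];
        int pos = 0;
        for (int[] part : parts) {
            System.arraycopy(part, 0, all, pos, part.length);
            pos += part.length;
        }
        return all;
    }

    // Undo Equation (3): XOR with the cyclically reused Key2 is its own inverse.
    static int[] decrypt(int[] encrypted, int[] key2) {
        int[] out = new int[encrypted.length];
        for (int i = 0; i < encrypted.length; i++) {
            out[i] = encrypted[i] ^ key2[i % key2.length];
        }
        return out;
    }

    // Regroup the bits into bytes (MSB first) and decode them as UTF-8 text.
    static String decode(int[] bits) {
        byte[] bytes = new byte[bits.length / 8];
        for (int b = 0; b < bytes.length; b++) {
            int v = 0;
            for (int k = 0; k < 8; k++) {
                v = (v << 1) | bits[b * 8 + k];
            }
            bytes[b] = (byte) v;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "A" = 01000001 encrypted with the hypothetical Key2 = 101, split over two pages
        List<int[]> parts = List.of(new int[]{1, 1, 1, 1}, new int[]{0, 1, 1, 1});
        int[] key2 = {1, 0, 1};
        System.out.println(decode(decrypt(merge(parts), key2))); // prints A
    }
}
```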
5. Experimental Analysis

Code implementing the algorithm described above was written to verify its correctness. Figure 3 is a screenshot of the source code of the original web page. Figure 4 is a screenshot of the source code with the fragile watermark embedded. Figure 5 shows the source code of the tampered web page.

Figure 3: The source code of the original web page

Figure 4: The source code into which the fragile watermarking is embedded

Figure 5: The partial source code of the tampered web page

Web pages may suffer from three kinds of attack:

One, the content of the web page is tampered with, but the watermark is intact;
Two, the content of the web page is tampered with and the watermark is damaged;
Three, the content of the web page is tampered with and a forged watermark is embedded.

In every case, comparing the newly generated watermark with the extracted one easily shows that the page has been tampered with. The attacker does not know the keys used to generate the watermark and its embedding positions. A forged watermark will therefore, with high probability, differ from the real watermark code, and it will not be embedded at the real positions.

The robust web watermarking scheme embeds the watermark information evenly into the web site's navigation pages. When the copyright needs to be proved, the partial watermarks are extracted from these navigation pages and combined into the complete copyright information to identify ownership. Figure 6 shows the source code with the robust watermark embedded.

Figure 6: The source code embedded into the Robust watermarking

The watermark is hidden in the HTML tags, so modifying, deleting or copying the body of the web page does not damage it. Moreover, each web page carries only part of the watermark generated from the copyright information. Even if attackers learn the watermark of some pages, they cannot deduce the copyright information, let alone tamper with it.

6. Conclusion

This paper proposes an effective, improved web page watermarking scheme that applies both word watermarking and line watermarking. The scheme protects the completeness of the web page as well as its copyright, integrity and consistency. The experimental results show that the scheme can effectively detect whether a web page has been tampered with and can locate the tampering.
