
A STATISTICAL APPROACH FOR SEX IDENTIFICATION IN CHAT MEDIUMS

Cemal KÖSE, Vasif NABİYEV, Özcan ÖZYURT
Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, 61080 Trabzon, TURKEY
{ckose, vasif, oozyurt}@ktu.edu.tr

Abstract. Chat mediums are becoming an important part of human life in societies and provide quite useful information about people, such as their current interests, habits, social behaviors and tendencies. In this study, we present an identification system designed to identify the sex of a person in a Turkish chat medium. Here, sex identification is taken as a base study for information mining in chat mediums. The system acquires data from a chat medium, automatically detects the chatters' sex from the information they exchange, and compares the result with the chatters' known identities. To do this, a simple discrimination function is proposed. The system achieved over 80% accuracy in sex identification in the real chat medium.

1. Introduction

A chat medium contains a vast amount of information that is potentially relevant to a society's current interests, habits, social
behaviors, crime and other tendencies. Users may spend a large portion of their time looking for information in chat mediums, and an intelligent system may help them find the information they are interested in [1, 2, 3, 6, 8]. In a chat conversation, each chatter takes the other party's sex into account, and the course and content of the conversation may be shaped by that person's sexual identity. Therefore, an example identification system is implemented to determine the sex of chatters in chat mediums. To do this, many conversations are acquired from a purpose-built chat medium, and statistical results are then obtained from these conversations [4, 5, 7]. These results are used to determine the weighting coefficients of the proposed discrimination function. The proposed function includes several important parameters, each representing a group of words and signs such as
abbreviations, interjections, shouting, and sex- and interest-related words. Each weighting coefficient of the proposed function is determined from the usage frequency of the words in a group and the discriminative power of each word group. In this paper, we present an intelligent identification system that collects information from chat mediums and evaluates it for sex identification [5, 7]. The system, with its discrimination function, is evaluated on data acquired from a purpose-built chat system, and its performance is also measured in real chat mediums.

The rest of this paper is organized as follows. The proposed discrimination function for sex identification is presented in Section 2, together with a detailed description of the methods used in the system. The implementation and results are discussed in Section 3. Conclusions and future work are given in Section 4.

2. The Identification System

To evaluate the identification system, real information is collected and extracted from chat mediums. In addition, statistical data collected from a specially designed medium is used to evaluate the discrimination function. The most frequently used signs from the specially designed chat medium (SDCM) and from mIRC, the real Internet medium (RIM), are also used to evaluate the system.

2.1. Words and Word Groups

In a chat medium, many word groups may be defined to identify a chatter's sex in a dialogue. In this study, eight word groups are defined to cover as many sex-related concepts and subjects in a chat medium as possible. These groups are abbreviations and signs, slang and jargon words, politeness and delicacy words, interjections and shouting, sex- and age-related words, question words, particle and conjunction words, and other words.
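To make the grouping concrete, here is a minimal Python sketch that could hold such a word-group lexicon. It stores a few of the chat forms listed in Table 1 under each of the eight groups. For a given message it marks which of those forms occur (1 if present, 0 otherwise), which is the kind of word-presence input the group scores of Section 2.2 use. This is an illustration, not the authors' implementation. The group keys, the word selection and the tokenizer are assumptions. In particular, Python's `str.lower()` is not Turkish-aware: it maps "I" to "i" rather than to dotless "ı".

```python
import re

# Hypothetical lexicon: a few chat forms per group, taken from Table 1.
# The keys and the word selection are illustrative assumptions.
WORD_GROUPS: dict[str, list[str]] = {
    "abbreviations_signs": ["slm", "cvp", "nbr", "tk"],
    "slang_jargon": ["oğlum", "lan", "dayı", "defol", "tövbe"],
    "politeness_delicacy": ["güzel", "aferin", "efendim", "siz"],
    "interjections_shouting": ["yaw", "hmm", "ee", "aa"],
    "particles_conjunctions": ["öyle", "yoksa", "diye", "başka", "böyle"],
    "age_sex_related": ["yaş", "cinsiyet", "aşkım", "bayanım", "erkeğim"],
    "question_words": ["niye", "neden", "hangi", "nerde", "nerdesin"],
    "other_words": ["sen", "ben", "olsun", "seni", "bak"],
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep runs of Unicode word characters.

    Assumption: str.lower() is not Turkish-aware, so "I" becomes "i", not "ı".
    """
    return set(re.findall(r"\w+", text.lower()))


def presence(text: str) -> dict[str, dict[str, int]]:
    """For every group, mark each listed word with 1 if it occurs in text, else 0."""
    tokens = tokenize(text)
    return {
        group: {word: int(word in tokens) for word in words}
        for group, words in WORD_GROUPS.items()
    }


example = presence("Slm, nerdesin?")
# example["question_words"]["nerdesin"] is 1, example["question_words"]["niye"] is 0
```

A flat set of tokens is enough here because the text only asks whether a word occurs, not how often.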

Table 1. Some of the most frequently used words in each word group (Turkish chat forms in parentheses)

No | Abbreviations and signs | Slang and jargon words | Politeness and delicacy words | Interjections and shouting words
1 | Hi (Slm) | My son! (Oğlum) | Nice (Güzel) | Hey!/Man! (Yaw)
2 | Answer (Cvp) | Man! (Lan) | Thanks (Tk) | Hmm (Hmm)
3 | What is the news (Nbr) | Uncle! (Dayı!) | Well done (Aferin) | And, soo (Ee)
4 | You! (u) | Go away! (Defol) | Yes! (Efendim) | Oh! (Aa)
5 | Thank you (tk) | Repentance! (Tövbe!) | You (Siz) | Well (?i)

No | Particle and conjunction words | Age and sexuality related words | Question words | Other words
1 | Such/so/that (Öyle) | Age (Yaş) | What (for)? (Niye?) | You (Sen)
2 | If not/otherwise (Yoksa) | Sexuality (Cinsiyet) | Why? (Neden?) | I/me (Ben)
3 | In order to (Diye) | My love (Aşkım) | Which? (Hangi?) | If only. (Olsun)
4 | Another/Other (Başka) | My lady (Bayanım) | Where? (Nerde?) | You (Seni)
5 | Thus/so/such (Böyle) | My man/gent. (Erkeğim) | Where are you? (nerdesin?) | Look (Bak)

These groups and some important words in each group are listed
in Table 1. The weight coefficient of each word group is assigned according to the usage frequencies and the discriminative power of the words in that group.

2.2. Statistical Sex Identification

A simple discrimination function is designed to identify the sex of a person in a chat medium. This function considers each word in a conversation both separately and collectively. Therefore, statistical information about each chosen word is collected from the purpose-built and the Internet chat mediums, and a weight coefficient is determined from these statistics for each word in each group. In practice, the weight coefficient of any word is determined by Equation (1) from two inputs: the word's usage frequency among female chatters and its usage frequency among male chatters. Its outputs are the word's weight coefficient in its word group for female chatters and its weight coefficient for male chatters.
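The weighting of Equation (1), together with the group and overall scores described in the rest of this section, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code:

- **Word weight.** Equation (1) is not reproduced in this copy of the paper. The weight of a word is assumed to be its female usage share, f_F / (f_F + f_M), where f_F and f_M are its usage frequencies among female and male chatters. This lies between 0.5 and 1 for female-dominant words and between 0 and 0.5 for male-dominant ones, as the text requires.
- **Group score.** This is read as the mean weight of the group's words present in the text.
- **Overall score.** This is read as the group scores averaged with the group weights, over the groups that occur.
- **Undecided results.** A score of exactly 0.5, or a text with no listed words, is treated as "undecided". This is also an assumption: the paper reports undecided results but does not state the rule.

All frequencies and group weights in the usage lines are hypothetical.

```python
from typing import Optional


def word_weight(freq_female: float, freq_male: float) -> float:
    """Assumed form of Equation (1): the female share of a word's usage.

    Above 0.5 for female-dominant words, below 0.5 for male-dominant ones;
    a word never observed gets the neutral value 0.5.
    """
    total = freq_female + freq_male
    return 0.5 if total == 0 else freq_female / total


def group_score(weights: dict[str, float], present: set[str]) -> Optional[float]:
    """Group value g_i as read here: summed weights of the group's words present
    in the text, divided by how many of them are present. None if none occur."""
    hits = [w for word, w in weights.items() if word in present]
    return sum(hits) / len(hits) if hits else None


def discriminate(groups: dict[str, dict[str, float]],
                 group_weights: dict[str, float],
                 present: set[str]) -> Optional[float]:
    """Overall score: group values weighted by group weights and normalised by
    the summed weights of the groups that occur in the text."""
    num = 0.0
    den = 0.0
    for name, weights in groups.items():
        g = group_score(weights, present)
        if g is not None:
            num += group_weights[name] * g
            den += group_weights[name]
    return num / den if den > 0 else None


def decide(score: Optional[float]) -> str:
    """Above 0.5 -> female, below 0.5 -> male; exactly 0.5 or no evidence -> undecided (assumed)."""
    if score is None or score == 0.5:
        return "undecided"
    return "female" if score > 0.5 else "male"


# Hypothetical usage statistics and group weights, for illustration only.
groups = {
    "politeness": {"aferin": word_weight(30, 10), "efendim": word_weight(20, 20)},
    "slang": {"lan": word_weight(5, 45), "oğlum": word_weight(2, 18)},
}
group_weights = {"politeness": 1.0, "slang": 2.0}

score = discriminate(groups, group_weights, {"aferin", "lan"})
print(round(score, 3), decide(score))  # 0.317 male
```

In this example a female-dominant word ("aferin", weight 0.75) is outweighed because the slang group was given twice the weight. The paper assigns such group weights from usage frequencies and discriminative power rather than by hand.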

Each weight coefficient is normalized into the interval from 0 to 1. The contribution of each word in a group is then further normalized by the number of words from that group that occur in the conversation. If a word is female dominant, its weight coefficient lies between 0.5 and 1; if it is male dominant, its weight lies between 0 and 0.5. For each conceptually related word group, a sexual identity value is calculated by Equation (2):

$$ g_i = \frac{1}{n_i} \sum_{j=1}^{k} w_{ij}\, x_{ij} \qquad (2) $$

$$ n_i = \sum_{j=1}^{k} x_{ij} \qquad (3) $$

The symbols are defined as follows:

- g_i varies from 0 to 1 and indicates the chatter's sexual identity, female or male, for the i-th word group.
- w_{ij} is the weight coefficient of the j-th word in the i-th word group. Its contribution depends on the number of words in the text of interest.
- x_{ij} marks whether the j-th word occurs in the text of interest: x_{ij} = 1.0 if it does, otherwise x_{ij} = 0.0.
- k is the number of words in the i-th word group.
- n_i is the normalization divider, equal to the current number of words of the i-th group that occur in the text. It is calculated by
Equation (3).

As explained before, words are also classified into groups according to their conceptual relations, so that the importance of some word groups can be emphasized collectively. Several word groups are therefore defined from the words acquired from conversations in the chat mediums. A
weight coefficient is also determined for each word group. The proposed discrimination function for sex identification is then formed as Equation (4); it can be used to determine the sex of any chatter in a conversation:

$$ D = \frac{1}{N} \sum_{i} W_i\, g_i \qquad (4) $$

where D varies from 0 to 1 and indicates the chatter's sexual identity as female or male, and W_i is the weight coefficient of the i-th female or male word group. N is the normalization divider for the groups currently present in the conversation, calculated by Equation (5):

$$ N = \sum_{i} W_i \qquad (5) $$

In both equations the sums run over the word groups that occur in the conversation. The weight coefficient of each group is thus determined according to the dominant sexual identity of the group. A chatter is then identified as female when D lies between 0.5 and 1, and as male when D lies between 0.0 and 0.5. Here, the accuracy of the
result increases as the value of the discrimination function approaches 1.0 (female) or 0.0 (male).

3. Results

In this paper, we have presented a full-scale implementation of a chat system that collects information from conversations, together with a method for identifying chatters' profiles. The method shows how a discrimination function can be used for sex identification in the medium. About two hundred conversations were collected from the specially designed and real chat mediums. Forty-nine of these conversations, involving ninety-eight chatters (forty-four female and fifty-four male), were chosen as the training set and used for testing. The experimental results indicate that the proposed discrimination function has sufficient discriminative power for sex identification in chat mediums, and that the system can predict the chatters' sex quite accurately.

Table 2. General results of sex identification for the specially designed medium (SDCM) and mIRC

Measure | Male chatters, SDCM | Male chatters, mIRC | Female chatters, SDCM | Female chatters, mIRC
Number of chatters | 54 | 19 | 44 | 8
Number of correct decisions | 45 | 15 | 36 | 5
Number of wrong decisions | 6 | 4 | 8 | 3
Number of undecided results | 3 | 0 | 0 | 0
Correct decisions (%) | 83.3 | 78.9 | 81.8 | 62.5
Wrong decisions (%) | 11.1 | 21.1 | 18.2 | 37.5
Undecided results (%) | 5.6 | 0.0 | 0.0 | 0.0

Table 2 presents the sex classification results for the conversations between chatters in the two mediums. The highest accuracy, 83.3%, is reached for male chatters in the SDCM. Over all SDCM chatters, 81 of 98 (82.7%) are identified correctly, compared with 20 of 27 (74.1%) in mIRC.

4. Conclusions and Future Work

Chat mediums are becoming an important part of human life and provide quite useful information about people in a society. In this paper, a simple discrimination function is defined for sex identification. The identification system with this discrimination function achieves over 80% accuracy in sex identification in the mediums.

In future work, a neuro-fuzzy method that takes the intersection of the word groups into account could be employed to determine the weighting coefficients of the proposed discrimination function. The weighting coefficients would then be calculated more precisely, and the accuracy of the identification system could be improved.

References

1. Baumgartner R., Eiter T., Gottlob G., Herzog M., Koch C., Information extraction for the semantic, Lecture Notes in Computer Science, Reasoning Web, Vol. 3564, pp. 275-289, 2005.
2. Gao Xiaoying, Zhang Mengjie, Learning knowledge bases for information extraction from multiple text based Web sites, IEEE/WIC International Conference on Intelligent Agent Technology, pp. 119-125, 2003.
3. Iiritano S., Ruffolo M., Managing the knowledge contained in electronic documents: a clustering method for text mining, 12th International Workshop on Database and Expert Systems Applications, pp. 454-458, 2001.
4. Kaban A., Wang Xin, Context based identification of user communities from Internet chat, IEEE International Joint Conference on Neural Networks, Vol. 4, pp. 3287-3292, 2004.
5. Khan Faisal M., Fisher Todd A., Shuler Lori, Wu Tianhao, Pottenger William M., Mining Chatroom Conversations for Social and Semantic Interactions, Lehigh University Technical Report LU-CSE-02-011, 2002.
6. Pazzani M., Billsus D., Learning and revising user profiles: The identification of interesting Web sites, Machine Learning, 27(3), pp. 313-331, 1997.
7. Wu Tianhao, Khan Faisal M., Fisher Todd A., Shuler Lori A., Pottenger William M., Error-Driven Boolean-Logic-Rule-Based Learning for Mining Chat-room Conversations, Lehigh University Technical Report LU-CSE-02-008, 2002.
8. Nabiyev, V.V., Artificial Intelligence: Problems, Methods, Algorithms, Second Edition, Seçkin Publishing, Ankara, 2005, 764 pp. (in Turkish).

