English Automatic Dictionary Creation with Natural Language Processing (Turkish title: Doğal Dil İşleme ile İngilizce Otomatik Sözlük Oluşturma)


Toprak A., Turan M.

2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019, İzmir, Türkiye, 31 October - 2 November 2019

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/asyu48272.2019.8946431
  • City of Publication: İzmir
  • Country of Publication: Türkiye
  • Keywords: automatic dictionary creation, dictionary similarity scale, Helmholtz principle, WordNet
  • Affiliated with İstanbul Ticaret Üniversitesi: Yes

Abstract

Automatic dictionary creation is an active area of study in lexicography. In this study, an English document is given as the initial reference, and the meaningful words that represent it are identified by applying the Helmholtz Principle. These words form the first entries of the dictionary, which we call the seed. A search loop then queries the Azure Cognitive Services Web Search system with the meaningful words of the most recently processed document, and the Helmholtz Principle is applied to the first document in the search results in the same way as to the reference document. Unlike the seed words, the meaningful words found in each cycle are not added to the dictionary directly: to keep the dictionary from deviating from its topic, the WordNet similarity of each word to the dictionary built so far is calculated. Words whose similarity exceeds a threshold are added to the dictionary, the search cycle is repeated with these words, and the process ends when the dictionary reaches the desired number of words. The performance of the resulting dictionary is also measured with WordNet similarity. In tests with reference documents on different subjects, the method generated dictionaries with an average similarity of 38.93%.
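The abstract does not spell out how the Helmholtz Principle selects meaningful words. A minimal sketch, assuming the number-of-false-alarms formulation commonly used for keyword extraction: the document is split into N parts, a word occurring K times in the document and m times in one part gets NFA = C(K, m) / N^(m-1), and its meaning score is -(1/m) log NFA. Words scoring above zero in some part are kept. The paragraph split, the tokenizer, the zero threshold, and the function names are assumptions for illustration, not details taken from the paper.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple lowercase alphabetic tokenizer (assumption; no stemming or stop list).
    return re.findall(r"[a-z]+", text.lower())


def meaning_score(m: int, k: int, n: int) -> float:
    """-(1/m) * log(NFA), where NFA = C(k, m) / n**(m - 1).

    m: occurrences of the word in one part, k: occurrences in the whole
    document, n: number of parts.
    """
    log_nfa = math.log(math.comb(k, m)) - (m - 1) * math.log(n)
    return -log_nfa / m


def meaningful_words(document: str, threshold: float = 0.0) -> set[str]:
    # Assumption: the document's "parts" are blank-line-separated paragraphs.
    parts = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    n = len(parts)
    if n < 2:
        return set()  # with a single part no word can be "surprising"
    part_counts = [Counter(tokenize(p)) for p in parts]
    totals: Counter[str] = Counter()
    for counts in part_counts:
        totals.update(counts)
    # A word is meaningful if its score exceeds the threshold in any part.
    return {
        word
        for counts in part_counts
        for word, m in counts.items()
        if meaning_score(m, totals[word], n) > threshold
    }
```

A word seen only once in a part always gets a non-positive score (NFA = K ≥ 1), so single mentions never qualify; the score rises when a word clusters in one part far more than its overall frequency would predict.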
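The seed-and-search cycle described in the abstract can be sketched as a loop in which the keyword extractor, the web search, and the WordNet screening are passed in as functions. This is a sketch under stated assumptions: the names `build_dictionary`, `search_top_document`, `similarity`, `target_size`, `threshold`, and `max_rounds` are hypothetical, and `search_top_document` only stands in for the Azure web search call, whose real API is not shown. Trimming the last batch to hit the target size exactly and stopping when a round accepts no words are also assumptions.

```python
from __future__ import annotations

from typing import Callable


def build_dictionary(
    reference_doc: str,
    extract: Callable[[str], set[str]],
    search_top_document: Callable[[list[str]], str],
    similarity: Callable[[str, set[str]], float],
    target_size: int,
    threshold: float,
    max_rounds: int = 50,
) -> set[str]:
    """Grow a dictionary from a reference document via repeated web searches.

    extract: returns the meaningful words of a text (e.g. Helmholtz-based).
    search_top_document: hypothetical stand-in for the web search call;
        returns the text of the first search result for the query words.
    similarity: hypothetical score of a word against the current dictionary.
    """
    # Seed: the meaningful words of the reference document are added directly.
    dictionary = set(extract(reference_doc))
    query = sorted(dictionary)
    for _ in range(max_rounds):
        # Stop at the target size, or when the last round accepted nothing.
        if len(dictionary) >= target_size or not query:
            break
        document = search_top_document(query)
        candidates = set(extract(document)) - dictionary
        # Unlike the seed, candidates must clear the similarity threshold.
        accepted = sorted(w for w in candidates if similarity(w, dictionary) > threshold)
        # Assumption: trim the last batch so the dictionary hits target_size exactly.
        accepted = accepted[: target_size - len(dictionary)]
        dictionary.update(accepted)
        # The next search is driven by the words just accepted.
        query = accepted
    return dictionary
```

Passing the search and similarity steps in as functions keeps the loop testable offline: a fake search function that returns canned text can replace the web service during development, and `max_rounds` guards against a search engine that keeps returning marginally new words.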
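WordNet similarity is used twice in the method: to screen candidate words against the dictionary and to score the finished dictionary. The abstract does not say which WordNet measure is used or how word-level scores are combined, so this sketch assumes NLTK's `path_similarity` over noun senses as the word-level measure, the mean similarity to all entries for screening, and the mean pairwise similarity (as a percentage) for evaluation. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean
from typing import Callable, Iterable


def wordnet_similarity(a: str, b: str) -> float:
    """Best path similarity between any noun senses of a and b (0.0 if none).

    Needs NLTK and its WordNet data: pip install nltk; nltk.download("wordnet").
    Restricting to nouns is an assumption of this sketch.
    """
    from nltk.corpus import wordnet as wn  # lazy import: the rest runs without NLTK
    scores = [
        s1.path_similarity(s2)
        for s1 in wn.synsets(a, pos=wn.NOUN)
        for s2 in wn.synsets(b, pos=wn.NOUN)
    ]
    return max((s for s in scores if s is not None), default=0.0)


def similarity_to_dictionary(
    word: str,
    dictionary: Iterable[str],
    word_sim: Callable[[str, str], float] = wordnet_similarity,
) -> float:
    """Mean similarity of a candidate word to every dictionary entry (assumed aggregation)."""
    sims = [word_sim(word, entry) for entry in dictionary if entry != word]
    return mean(sims) if sims else 0.0


def dictionary_score(
    dictionary: Iterable[str],
    word_sim: Callable[[str, str], float] = wordnet_similarity,
) -> float:
    """Average similarity over all distinct word pairs, as a percentage (assumed metric)."""
    pairs = list(combinations(sorted(set(dictionary)), 2))
    if not pairs:
        return 0.0
    return 100 * mean(word_sim(a, b) for a, b in pairs)
```

The `word_sim` parameter lets a cheaper or different measure (for example Wu-Palmer similarity) be swapped in without touching the aggregation logic.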