A Stem-Based Classification Approach for Identifying Author Specialty
Keywords:
Classification, Vector Space Model, Cosine Similarity, Modified TF-IDF, Levenshtein Edit Distance
Abstract
Researchers and readers of scientific articles often struggle to
identify the category of an article or research paper, which in turn
makes it difficult to determine an author's specialty. Many researchers
also struggle to select a journal that is suitable for publishing their
work. Several existing approaches assist researchers in choosing an
appropriate journal. However, none addresses the problem of determining
an author's specialty from his or her article. This paper proposes a
solution that identifies the author's specialty by comparing abstracts.
It also suggests a new method for choosing an appropriate journal based
on the abstract of the article to be published. A classification model
is designed to find the correct category of a given article, and the
author's specialty is determined accordingly. The classifier also finds
the Scimago journal categories according to each journal's scope. We
built the classifier using a vector space model with a cosine
similarity measure. For term weighting we use M-TF-IDF, a modified
version of TF-IDF that we propose. After the article's category is
classified, a second classifier based on the Levenshtein algorithm
selects an appropriate journal for publishing the article. Our data are
divided into three datasets: the scopes of journals, the abstracts of
articles, and the titles of journals together with their scopes. All
datasets are drawn from the main categories of the Scimago website. The
proposed measure shows good performance.
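The two stages described in the abstract can be illustrated with a minimal
sketch. It is not the authors' implementation: it uses standard TF-IDF with
a smoothed IDF rather than the proposed M-TF-IDF (whose definition is not
given in this section), it omits stemming, and the function names, the toy
category scopes, and the journal titles are all hypothetical. Which strings
the second stage compares is also an assumption; here a query title is
matched against candidate journal titles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only; stemming is omitted here.
    return re.findall(r"[a-z]+", text.lower())


def build_idf(corpus_tokens):
    # Smoothed IDF (an assumption, not the paper's M-TF-IDF).
    n = len(corpus_tokens)
    df = Counter(t for toks in corpus_tokens for t in set(toks))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(tokens, idf):
    # Sparse TF-IDF vector as {term: weight}; unseen terms are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(abstract, scopes):
    # Stage 1: pick the category whose scope text is most similar.
    names = list(scopes)
    tokens = [tokenize(scopes[name]) for name in names]
    idf = build_idf(tokens)
    query = vectorize(tokenize(abstract), idf)
    scores = {name: cosine(query, vectorize(toks, idf))
              for name, toks in zip(names, tokens)}
    return max(scores, key=scores.get)


def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def closest_journal(query, journals):
    # Stage 2: pick the candidate with the smallest edit distance.
    return min(journals, key=lambda j: levenshtein(query.lower(), j.lower()))


# Hypothetical toy data, for illustration only.
scopes = {
    "Computer Science": "algorithms software machine learning text classification",
    "Medicine": "clinical trials patients disease treatment",
}
category = classify("A text classification model using machine learning", scopes)
journal = closest_journal("Jurnal of Text Mining",
                          ["Journal of Text Mining", "Annals of Clinical Trials"])
```

With this toy data the abstract shares terms only with the first scope, so
`category` is "Computer Science", and the misspelled query is one edit away
from the first title, so `journal` is "Journal of Text Mining".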
iJournals: International Journal of Software & Hardware Research in Engineering (IJSHRE)
ISSN-2347-4890
Volume 9 Issue 5 May 2021
© 2021, iJournals All Rights Reserved www.ijournals.in