A Stem-Based Classification Approach for Identifying Author Specialty

Sara Mohammed

Authors

Sara Mohammed, Information System Department, Faculty of Computers & Artificial Intelligence, Benha University, Benha, Egypt

Keywords:

Classification, Vector Space Model, Cosine Similarity, Modified TF-IDF, Levenshtein Edit Distance

Abstract

Researchers and readers of scientific articles have difficulty identifying the categories of articles and research papers, and hence determining authors' specialties. Many researchers also struggle to select a journal that is suitable for publishing their research paper. Several existing approaches help researchers choose an appropriate journal. However, none addresses the problem of determining an author's specialty from his or her article. This paper proposes a solution that identifies the author's specialty through abstract comparison. It also suggests a new method that finds an appropriate journal from the abstract of the article to be published. A classification model is designed to find the correct category of a given article, and the author's specialty is determined accordingly. The classifier also finds the Scimago journal categories according to each journal's scope. We built the classifier using a vector space model with a cosine similarity measure. Terms are weighted with M-TF-IDF, a modified form of TF-IDF that we propose to support the similarity measurement. After the article's category has been classified, a second classifier based on the Levenshtein algorithm selects an appropriate journal for publishing the article. Our data is divided into three datasets: journal scopes, article abstracts, and journal titles with their scopes. All datasets are taken from the main categories on the Scimago website. The proposed measure shows good performance.
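The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the M-TF-IDF formula, so standard TF-IDF with a smoothed IDF stands in for it, no stemming is applied, and the second stage assumes that the Levenshtein classifier compares the abstract with each journal's scope text. All function names, category names, and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (no stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per document.

    Assumption: standard TF-IDF with a smoothed IDF, standing in for the
    paper's M-TF-IDF, whose formula the abstract does not give.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero on empty text
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(abstract, category_scopes):
    """Stage 1: return the category whose scope text is most similar."""
    names = list(category_scopes)
    vectors = tfidf_vectors([category_scopes[n] for n in names] + [abstract])
    query = vectors[-1]
    scored = zip(names, vectors[:-1])
    return max(scored, key=lambda pair: cosine(query, pair[1]))[0]


def levenshtein(s, t):
    """Edit distance (insert, delete, substitute) using two rolling rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def pick_journal(abstract, journals):
    """Stage 2 (assumed form): journal whose scope is closest in edit distance."""
    a = abstract.lower()

    def score(title):
        scope = journals[title].lower()
        return levenshtein(a, scope) / max(len(a), len(scope), 1)

    return min(journals, key=score)


# Hypothetical data, for illustration only.
category_scopes = {
    "Computer Science": "algorithms software machine learning text classification",
    "Medicine": "clinical trials patients disease treatment surgery",
}
abstract = "We classify text documents using machine learning algorithms"
print(classify(abstract, category_scopes))
```

Character-level edit distance over whole texts is quadratic in their length, so a practical system would more likely compare shorter strings or token sequences; the sketch keeps the simplest form for clarity.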

iJournals: International Journal of Software & Hardware Research in Engineering (IJSHRE)

ISSN-2347-4890

Volume 9 Issue 5 May 2021
