Abstract
As social media platforms grow in popularity and influence, content integrity and user accountability become increasingly critical. Authorship attribution (AA), the task of determining the real author of a text, is a powerful tool for addressing these issues in online posts. This study proposes an AA approach that uses machine learning and deep learning algorithms to predict the author of unknown posts on social media platforms. It introduces Temporal Convolutional Networks (TCN) for short texts, investigates the effectiveness of combining Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNN), and explores the use of an autoencoder combined with an AdaBoost classifier. The approach was tested on a Twitter dataset across multiple experiments and scenarios, achieving 52.77% accuracy in AA.
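As a rough illustration of the building block behind a TCN, the sketch below implements a causal dilated 1D convolution in plain Python. It is not the authors' implementation: the function names, the single-channel input, and the assumption of one convolution per layer are all hypothetical choices made for this example. In a text setting, the input sequence would typically be a row of token or character embeddings rather than scalars.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated convolution over a 1D sequence.

    Computes y[t] = sum_k kernel[k] * x[t - k * dilation], treating
    positions before the start of the sequence as zero. The output at
    step t never depends on inputs later than t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal layers.

    Assumes one convolution per layer (an assumption of this sketch),
    so each layer adds (kernel_size - 1) * dilation past steps.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    seq = [1.0, 2.0, 3.0, 4.0]
    print(causal_dilated_conv1d(seq, [1.0, 1.0], dilation=1))
    print(causal_dilated_conv1d(seq, [1.0, 1.0], dilation=2))
    print(receptive_field(2, [1, 2, 4]))
```

Stacking layers with dilations that double at each level lets the receptive field grow quickly with depth, which is one reason this family of models can cover an entire short post with only a few layers.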
Data availability
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aouchiche, R.I.A., Boumahdi, F., Remmide, M.A. et al. Authorship attribution in twitter: a comparative study of machine learning and deep learning approaches. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01788-z
DOI: https://doi.org/10.1007/s41870-024-01788-z