Authorship attribution in twitter: a comparative study of machine learning and deep learning approaches

  • Original Research
International Journal of Information Technology

Abstract

As social media platforms gain popularity and influence, content integrity and user accountability become more critical. Authorship attribution (AA) is a powerful tool for tackling these issues by determining the real author of online posts. This study proposes an AA approach that uses machine learning and deep learning algorithms to predict the author of social media posts whose authorship is unknown. It introduces Temporal Convolutional Networks (TCN) for short texts, investigates the effectiveness of combining Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), and explores the use of an Autoencoder combined with an AdaBoost classifier. The approach was tested on a Twitter dataset through multiple experiments across various scenarios, achieving 52.77% accuracy in AA.
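The abstract names Temporal Convolutional Networks without describing their structure in this preview. As a rough illustration only, the sketch below shows the dilated causal convolution that TCNs are generally built from, in plain Python. It is not the authors' model: the function names, the toy input, the averaging kernels, and the choice of one convolution per level are all assumptions made for the example.

```python
def causal_dilated_conv1d(sequence, kernel, dilation=1):
    """Apply a 1-D causal convolution with the given dilation.

    The output at step t depends only on sequence[t], sequence[t - d],
    sequence[t - 2d], and so on. Positions before the start count as zero.
    """
    output = []
    for t in range(len(sequence)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                total += weight * sequence[idx]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Input steps visible to one output step, assuming one conv per level."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical per-character feature values for a short post (illustrative).
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Two stacked levels with dilations 1 and 2 (assumed, not from the paper).
layer1 = causal_dilated_conv1d(signal, kernel=[0.5, 0.5], dilation=1)
layer2 = causal_dilated_conv1d(layer1, kernel=[0.5, 0.5], dilation=2)

print(layer1)
print(layer2)
print(receptive_field(kernel_size=2, dilations=[1, 2]))
```

Doubling the dilation at each level makes the receptive field grow quickly with depth, which is one reason this kind of layer is often considered for sequences as short as a tweet. A real TCN would add learned weights, nonlinearities, and residual connections on top of this building block.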

Data availability

Not applicable.

Author information

Corresponding author

Correspondence to Rebeh Imane Ammar Aouchiche.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal participants

This article does not contain any studies with human participants or animals performed by any of the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Aouchiche, R.I.A., Boumahdi, F., Remmide, M.A. et al. Authorship attribution in twitter: a comparative study of machine learning and deep learning approaches. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01788-z

  • DOI: https://doi.org/10.1007/s41870-024-01788-z
