Robert Meyer - Analysing user comments with Doc2Vec and Machine Learning classification
Description

I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations among the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Can we determine for a particular user comment from which news site it originated?

Abstract

Doc2Vec is a nice neural network framework for text analysis. The machine learning technique computes so-called document and word embeddings, i.e. vector representations of documents and words. These representations can be used to uncover semantic relations. For instance, Doc2Vec may learn that the word "King" is similar to "Queen" but less so to "Database".

I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations among the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Accordingly, given a particular comment, can we determine from which news site it originated? Are there patterns among user comments? Can we identify stereotypical comments for different news sites? Besides presenting the results of my experiments, I will give a short introduction to Doc2Vec.

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations.
PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
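The two ideas in the abstract, comparing embeddings and feeding document embeddings to a supervised classifier, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the vectors are made-up toy numbers rather than output from a trained Doc2Vec model, the labels `site_a` and `site_b` are hypothetical, and the nearest-centroid classifier is a simple stand-in, since the abstract does not say which classifier was used in the talk.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "word embeddings" (made up, not learned by Doc2Vec).
word_vectors = {
    "King": [0.90, 0.80, 0.10],
    "Queen": [0.85, 0.90, 0.15],
    "Database": [0.10, 0.05, 0.95],
}

king_queen = cosine_similarity(word_vectors["King"], word_vectors["Queen"])
king_database = cosine_similarity(word_vectors["King"], word_vectors["Database"])
print("King vs Queen:", round(king_queen, 3))
print("King vs Database:", round(king_database, 3))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(embeddings, labels):
    """Group training embeddings by label and average each group."""
    grouped = {}
    for vec, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(centroids[label], vec))


# Toy "document embeddings" of comments, labeled by hypothetical news site.
train_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
train_labels = ["site_a", "site_a", "site_b", "site_b"]

centroids = fit_centroids(train_embeddings, train_labels)

# Embedding of a new, unlabeled comment (again a made-up vector).
new_comment = [0.7, 0.3, 0.1]
predicted_site = predict(centroids, new_comment)
print("Predicted site:", predicted_site)
```

In a real pipeline the toy vectors would be replaced by embeddings inferred from a trained model, and the classifier would typically be evaluated on held-out comments.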


