What It Does
This project implements a text classification pipeline to identify misinformation. Given the text of a
news article, the model estimates the probability that the content is legitimate or fabricated, based on
patterns learned from thousands of labelled examples.
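One detail worth noting: scikit-learn's LinearSVC does not expose predict_proba directly, so a common way to obtain the probabilities described above is to wrap it in CalibratedClassifierCV. The sketch below assumes that approach; the four-document corpus and its labels are made-up stand-ins for the real dataset:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus standing in for the labelled news dataset (hypothetical examples).
texts = [
    "Scientists publish peer-reviewed study on climate data",
    "Government confirms new infrastructure funding in official report",
    "Shocking secret cure doctors don't want you to know",
    "You won't believe this one weird trick exposed by insiders",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# LinearSVC has no predict_proba, so calibrate its decision scores
# (cv=2 only because the toy corpus is tiny).
pipeline = make_pipeline(
    TfidfVectorizer(),
    CalibratedClassifierCV(LinearSVC(), cv=2),
)
pipeline.fit(texts, labels)

probs = pipeline.predict_proba(["Miracle pill secret exposed"])[0]
print(dict(zip(pipeline.classes_, probs.round(3))))
```

An alternative is to report the raw decision_function margin instead of a calibrated probability; calibration is only needed when the output must be interpretable as a probability.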
Methodology
The system uses a combination of Natural Language Processing (NLP) techniques to transform raw text into
numerical data that the machine learning model can understand:
- Text Preprocessing: Tokenization, stop-word removal, and case normalization
using the NLTK library.
- TF-IDF Vectorization: Converts the cleaned text into a matrix of TF-IDF features,
emphasizing words that are distinctive to specific documents while downplaying common words.
- Linear Support Vector Classifier (SVC): Chosen for its strong performance on
high-dimensional, sparse text classification tasks.
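The steps above can be sketched end to end. The preprocess helper below is a dependency-free stand-in for the NLTK step (the project uses NLTK's tokenizer and English stop-word list), and the corpus is an illustrative toy in place of the real labelled dataset:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Minimal stand-in for the NLTK preprocessing step: case normalization,
# tokenization, and stop-word removal.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "on", "by"}

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase + tokenize
    return " ".join(t for t in tokens if t not in STOP_WORDS)

# Tiny hypothetical corpus in place of the real labelled dataset.
texts = [
    "Official report confirms steady economic growth this quarter",
    "Senate committee releases detailed budget findings today",
    "SHOCKING: celebrity secret the media is hiding from you",
    "Miracle cure banned by doctors, insiders reveal all",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# Preprocess -> TF-IDF features -> linear SVM, mirroring the pipeline above.
vectorizer = TfidfVectorizer(preprocessor=preprocess)
X = vectorizer.fit_transform(texts)
clf = LinearSVC().fit(X, labels)

new_article = ["Doctors reveal shocking secret cure"]
print(clf.predict(vectorizer.transform(new_article)))
```

Passing preprocess via TfidfVectorizer's preprocessor hook keeps cleaning and vectorization in one object, so the identical transformation is applied at training and prediction time.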
Key Features
- High classification accuracy on the "Fake or Real News" dataset.
- Robust handling of diverse news writing styles.
- Efficient processing pipeline suitable for real-time classification.
Stack & Requirements
- Python
- scikit-learn
- NLTK
- Pandas
- TF-IDF + Linear SVC