What It Does
This project implements a text classification pipeline to identify misinformation. Given the text of a
news article, the model estimates the probability that the content is legitimate or fabricated, based on
patterns learned from thousands of labelled examples.
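One detail worth noting: scikit-learn's LinearSVC does not expose predict_proba directly, so a common way to obtain the probabilities described above is to wrap it in CalibratedClassifierCV. The sketch below assumes that approach; the four-document corpus and its labels are made-up stand-ins for the real dataset:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus standing in for the labelled news dataset (hypothetical examples).
texts = [
    "Scientists publish peer-reviewed study on climate data",
    "Government confirms new infrastructure funding in official report",
    "Shocking secret cure doctors don't want you to know",
    "You won't believe this one weird trick exposed by insiders",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# LinearSVC has no predict_proba, so calibrate its decision scores
# (cv=2 only because the toy corpus is tiny).
pipeline = make_pipeline(
    TfidfVectorizer(),
    CalibratedClassifierCV(LinearSVC(), cv=2),
)
pipeline.fit(texts, labels)

probs = pipeline.predict_proba(["Miracle pill secret exposed"])[0]
print(dict(zip(pipeline.classes_, probs.round(3))))
```

An alternative is to report the raw decision_function margin instead of a calibrated probability; calibration is only needed when the output must be interpretable as a probability.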
Methodology
The system uses a combination of Natural Language Processing (NLP) techniques to transform raw text into
numerical data that the machine learning model can understand:
- Text Preprocessing: Tokenization, stop-word removal, and case normalization
using the NLTK library.
- TF-IDF Vectorization: Converts the cleaned text into a matrix of TF-IDF features,
emphasizing words that are distinctive to specific documents while downplaying common words.
- Linear Support Vector Classifier (SVC): Chosen for its strong performance on
high-dimensional, sparse text classification tasks.
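The steps above can be sketched end to end. The preprocess helper below is a dependency-free stand-in for the NLTK step (the project uses NLTK's tokenizer and English stop-word list), and the corpus is an illustrative toy in place of the real labelled dataset:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Minimal stand-in for the NLTK preprocessing step: case normalization,
# tokenization, and stop-word removal.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "on", "by"}

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase + tokenize
    return " ".join(t for t in tokens if t not in STOP_WORDS)

# Tiny hypothetical corpus in place of the real labelled dataset.
texts = [
    "Official report confirms steady economic growth this quarter",
    "Senate committee releases detailed budget findings today",
    "SHOCKING: celebrity secret the media is hiding from you",
    "Miracle cure banned by doctors, insiders reveal all",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# Preprocess -> TF-IDF features -> linear SVM, mirroring the pipeline above.
vectorizer = TfidfVectorizer(preprocessor=preprocess)
X = vectorizer.fit_transform(texts)
clf = LinearSVC().fit(X, labels)

new_article = ["Doctors reveal shocking secret cure"]
print(clf.predict(vectorizer.transform(new_article)))
```

Passing preprocess via TfidfVectorizer's preprocessor hook keeps cleaning and vectorization in one object, so the identical transformation is applied at training and prediction time.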
Key Features
- High classification accuracy on the "Fake or Real News" dataset.
- Robust handling of diverse news writing styles.
- Efficient processing pipeline suitable for real-time classification.
Stack & Requirements
- Python
- scikit-learn
- NLTK
- Pandas
- TF-IDF + Linear SVC