FEU Institute of Technology Undergraduate Thesis

Detecting Internet Brain Rot with Multimodal AI

Visual-Qwen pairs an EVA-CLIP vision encoder and a Q-Former with Qwen3-4B and Whisper transcripts to flag “sludge” short-form videos: stacked, multi-feed clips engineered to bypass single-modality moderators.

96.67%
Video-level accuracy on 300-video test split
97.19%
F1-score (95.58% precision, 98.86% recall)
~6,000
Multimodal samples in the open Kaggle dataset
4B
Parameters in the Qwen3 backbone, LoRA-tuned
How It Works

A frozen-projector tri-modal classifier

Visual-Qwen reads each clip across three signals at once: an EVA-CLIP frame embedding, a Q-Former cross-modal attention bottleneck, and a Whisper transcript of the audio. Qwen3-4B fuses the streams and emits a sludge / not-sludge verdict.
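The fusion step described above can be illustrated with a toy sketch. It is not the project's code: all dimensions, the random weights, and the helper names (`cross_attention_bottleneck`, `project`) are hypothetical, and a real Q-Former adds learned key/value projections, multiple heads, and several layers. The sketch only shows the general idea of a cross-attention bottleneck: a small, fixed set of query vectors attends over many frame features, and the resulting few visual tokens are projected to the language model's width and placed alongside the transcript tokens.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_bottleneck(queries, frame_feats):
    """Single-head cross-attention: each query attends over all frame features.

    Simplification (assumption): keys and values are the raw frame features.
    Returns one output vector per query, regardless of how many features come in.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frame_feats]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, frame_feats)) for d in range(dim)])
    return outputs

def project(tokens, weight):
    """Linear projection; weight has one row per output dimension."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes, chosen only for illustration (not the real model's dimensions).
NUM_PATCHES, VIS_DIM = 16, 8   # stand-in for frame features from the vision encoder
NUM_QUERIES = 4                # size of the bottleneck
LLM_DIM = 6                    # stand-in for the language model's embedding width
TRANSCRIPT_TOKENS = 5          # stand-in for embedded transcript tokens

frame_feats = rand_matrix(NUM_PATCHES, VIS_DIM)
queries = rand_matrix(NUM_QUERIES, VIS_DIM)
projector = rand_matrix(LLM_DIM, VIS_DIM)
transcript_embeds = rand_matrix(TRANSCRIPT_TOKENS, LLM_DIM)

# 16 frame features are compressed to 4 visual tokens, then mapped to LLM width.
visual_tokens = project(cross_attention_bottleneck(queries, frame_feats), projector)

# The language model would receive visual and transcript tokens as one sequence.
llm_input = visual_tokens + transcript_embeds
print(len(llm_input), len(llm_input[0]))  # 9 6
```

The point of the bottleneck is that the number of visual tokens stays fixed at `NUM_QUERIES` no matter how many frame features are fed in, which keeps the language model's input length predictable.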

What is sludge?

Multi-feed short-form clips that stack unrelated content (gameplay over reaction video over text crawl) to defeat algorithmic moderation built for single coherent scenes.

Research

What the paper shows

Three headline contributions, every number traceable to the public test split.

96.67%
Video-level test accuracy
300-video held-out split
97.19%
F1-score
precision 95.58% / recall 98.86%
+0.77 pp
Lift from frozen projector
regularization finding
~6,000
Multimodal samples
open on Kaggle
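The headline numbers can be cross-checked with simple arithmetic, since F1 is the harmonic mean of precision and recall. The sketch below is only a consistency check on the figures quoted above; the count of 290 correct videos is an inference from the rounded accuracy, not a number stated on this page.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures as reported on this page.
precision = 0.9558
recall = 0.9886

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # F1 = 97.19%

# Assumption: 96.67% accuracy on 300 videos corresponds to a whole number
# of correctly classified videos.
correct = round(0.9667 * 300)
print(correct, f"{correct / 300:.2%}")  # 290 96.67%
```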

Visual-Qwen: Augmenting Multimodal Deep Learning with Attention Mechanisms

FEU Institute of Technology, 2025. Open paper, open code, open dataset, open weights.

Try it Yourself

Upload Your Video

Run our fine-tuned multimodal model on your own clip. It looks for sludge: short-form video that stacks unrelated streams together (think Subway Surfers under Family Guy under soap-cutting) to defeat single-modality moderators.

Hosted on Hugging Face Spaces (free CPU). Inference takes about 1 to 2 minutes per video on the default settings, longer with deep analysis.

Our Team

Meet the Researchers

Four dedicated CS students and their extraordinary advisor from FEU Institute of Technology, combining academic excellence with entrepreneurial vision.

Justine Jude Pura

Project Mentor

Marc Olata

Project Manager

Alpha Romer Coma

Technical Lead

Job Isaac Ong

Data Engineer

Kristoffer Ian Sioson

Data Analyst

Our Supporters

Sponsors & Partners

We are grateful for the support of organizations that share our vision of healthier digital environments.

YouTube Researcher Program

Ethical Data Collection

TPU Research Cloud

AI Infrastructure

🤝 Become a Partner

Support cutting-edge AI research and help combat internet brain rot through strategic partnerships.

💼 Corporate
Infrastructure & funding
🔬 Research
Joint opportunities
🌟 Mission
Aligned partnerships