Online platforms are running large language models at every stage of content moderation, from generating training data to auditing their own systems for bias. Researchers at Google mapped how this is happening across what the authors call the Abuse Detection Pipeline.
Google study finds LLMs are embedded at every stage of abuse detection
Google researchers found that large language models (LLMs) are embedded across every stage of abuse detection pipelines, from training data generation to bias auditing. The practice affects major online platforms like Google, Meta, and Twitter, potentially opening new attack surfaces and amplifying bias in moderation systems. The impact spans global social media, cloud services, and AI-driven content platforms that rely on LLMs for moderation.
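To make "every stage" concrete, here is a minimal Python sketch of the three touchpoints the summary mentions: LLM-generated training data, the moderation decision itself, and a bias audit. The `call_llm` helper, prompts, and labels are hypothetical stand-ins for a generic chat-completion client, not code or prompts from the Google study.

```python
# Sketch of LLM touchpoints in an abuse-detection pipeline.
# call_llm is a hypothetical placeholder for any chat-completion
# client; all prompts and labels here are illustrative.

def call_llm(prompt: str) -> str:
    """Placeholder: wire up a real LLM client here."""
    raise NotImplementedError("connect your model endpoint")

# Stage 1: synthetic training data -- an LLM generates labeled
# examples to augment scarce human-annotated abuse data.
def generate_training_examples(policy: str, n: int = 5) -> list[dict]:
    raw = call_llm(
        f"Write {n} short comments that violate this policy: {policy}. "
        "Return one comment per line."
    )
    return [
        {"text": line, "label": "violates"}
        for line in raw.splitlines()
        if line.strip()
    ]

# Stage 2: moderation decision -- an LLM classifies live content
# against the written policy.
def classify(comment: str, policy: str) -> str:
    verdict = call_llm(
        f"Policy: {policy}\nComment: {comment}\n"
        "Answer exactly 'violates' or 'allowed'."
    )
    return verdict.strip().lower()

# Stage 3: bias audit -- an LLM rewrites a comment across group
# framings so the classifier's consistency can be checked.
def audit_for_bias(comment: str, policy: str, groups: list[str]) -> dict[str, str]:
    results = {}
    for group in groups:
        variant = call_llm(
            f"Rewrite this comment so it refers to {group}, "
            f"changing nothing else: {comment}"
        )
        results[group] = classify(variant, policy)
    # Divergent verdicts across groups flag potential bias.
    return results
```

Because the same class of model generates the data, makes the call, and audits the result, a weakness in the model, such as a prompt-injection attack surface or a systematic bias, can propagate through all three stages, which is the concern the study raises.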