ModerateAI
See how it works with an interactive demo.
Online platforms looking for safety solutions struggle to strike a balance between:
Human Moderation
Outsourcing human moderation through BPOs becomes problematic at scale, leading to higher costs, inconsistent quality, and a lack of actionable insights.
Automation
Automated moderation systems often fall short, with high error rates, underperforming models, complex maintenance needs, and slow adaptation to policy changes.
“We can’t spend millions of dollars on manual review. We need to focus on high risk content, not manually reviewing every single image.”
– Director of Trust & Safety
Marketplace, 10M Users
“Off-the-shelf classifiers are giving us bananas and saying they’re explicit. It costs us more to fine-tune and hurts user experience.”
– Safety Team Lead
Marketplace, 10M Users
ModerateAI replaces manual content moderation with a new system that combines expert human judgment with the efficiency of powerful automation.
Cost Reduction
vs. existing BPO
Increase in Quality
from today's baseline
Under 1 week
to implement
Trust & Safety teams need a smarter solution. You need Smart Content Moderation with ModerateAI:
Instead of jumping from system to system, ModerateAI’s “smart review” approach consolidates everything into one seamless process.
You handle your platform and tools; we take care of the rest.
How ModerateAI works:
1. Content is sent to ModerateAI
Your content is sent to our system via API, where we ingest it through a data exchange and translation layer.
You have the option to add a pre-filtering layer based on your existing tools, such as keyword lists and pre-filtering rules. (A sketch of this round trip appears after these steps.)
2. Smart Routing analyzes each piece of content and:
- Recognizes attributes, such as policy flags, that have been applied by your pre-filtering layer.
- Checks for similar content.
- Determines modality, language, length, and duration.
- Identifies keywords and other metadata.
- Applies custom logic and dynamically routes the content to the fastest, most accurate, and least expensive way to moderate it.
3. Content goes through Collaborative AI and Human Content Review
Unlike traditional systems, ModerateAI uses classifiers and human reviewers working in sync to evaluate the content and enhance efficiency.
Think of this as an AI co-pilot for humans, with humans double-checking automated decisions.
4. Quality & Audit Reporting is Initiated
Content is flagged for policy violations, and relevant metadata is recorded.
This data is passed into the analytics and insights dashboard to help us, and you, understand where we can improve.
5. Feedback Loop to You and ModerateAI for Continuous Improvement
Return actions via API or webhook and provide quality feedback to us, such as disagreements with labels, to improve the system.
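To make this concrete, here is a minimal sketch of what steps 1 and 5 could look like from your side. The base URL, endpoint paths, and field names below are illustrative placeholders for this example, not our published API:

```python
import requests  # pip install requests

# Hypothetical endpoint and credentials, for illustration only.
API_BASE = "https://api.moderateai.example/v1"
API_KEY = "your-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: send content, including any attributes your own pre-filtering
# layer (keyword lists, pre-filtering rules) has already applied.
submission = {
    "content_id": "listing-8675309",
    "modality": "text",
    "language": "en",
    "body": "Vintage brass lamp, lightly used...",
    "prefilter": {"keyword_hits": [], "policy_flags": []},
}
resp = requests.post(f"{API_BASE}/content", json=submission, headers=HEADERS, timeout=10)
resp.raise_for_status()
review = resp.json()  # label, rationale, etc.

# Step 5: close the feedback loop with the action you took and any
# disagreement with the label, so the system keeps improving.
feedback = {
    "content_id": submission["content_id"],
    "action_taken": "none",
    "agree_with_label": True,
}
requests.post(f"{API_BASE}/feedback", json=feedback, headers=HEADERS, timeout=10).raise_for_status()
```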
With ModerateAI, Your Trust & Safety Team Can:
Get more done while paying less
We take content moderation tasks off your hands so you can reduce operational costs and free up engineering and policy resources.
Maintain the quality of your best raters
Label quality matches that of your best human raters, ensuring consistent, accurate, and reliable content moderation.
Get insights on your platform & emerging threats
Our analytics and insights dashboard surfaces violation trends and categories on your platform, helping you stay ahead of emerging threats.
Effortless integration
While APIs provide the best performance and greatest flexibility, we can integrate with any internal or third-party moderation tool currently in use.
Spend resources on what matters most:
scaling and improving user safety.
40%+ savings with ModerateAI over 12-18 months
Smarter content moderation is almost here.
Join the Waitlist
ModerateAI FAQs
How is ModerateAI different from a traditional BPO?
Traditional BPOs use inexpensive offshore labor to manually review content and pass back a label in accordance with your policies.
Platforms often experience problems such as constant turnover, training and quality-control issues, communication challenges, and data security concerns.
This process is typically kept at arm's length and not deeply integrated into your workflow for these reasons.
ModerateAI fundamentally changes this paradigm by creating tight, closed loops between automation and human review, and it is designed to plug directly into your workflow, disrupting the outdated BPO model.
Why not just build and tune our own classifiers?
Classifiers are notoriously difficult to implement and tune: you need them to catch harmful content across many harm vectors, modalities, and languages without generating overwhelming numbers of false positives.
The ongoing effort to get classifiers to work well can cost millions of dollars and distract your engineering team, and, worse, it still does not eliminate the need for human reviewers for complex or nuanced cases.
Classifiers also require continual investment, or they quickly become out of date and miss high-risk content. ModerateAI takes this work off your plate entirely so you can reduce your investment and focus on what’s most important.
How does ModerateAI fit into our existing workflow?
ModerateAI is the single API call you’ll make when you need an external review of a piece of content.
Customers often prefer to maintain some basic filtering, such as a keyword list curated over many years that is specific to their platform; every review after that basic filtering layer is sent to ModerateAI.
After our review, we pass back a content label along with a rationale explaining why the content violated one or more of your policies. If the content is non-violative, you will see that as well.
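For illustration, a review result could look something like the following. The field names and structure here are placeholders for the sake of example; the actual response schema may differ:

```python
# Illustrative review result; field names are placeholders, not the
# actual ModerateAI response schema.
review_result = {
    "content_id": "listing-8675309",
    "violating": True,
    "labels": [
        {
            "policy_area": "adult_content",  # maps to your policy taxonomy
            "confidence": 0.97,
            "rationale": "Image contains explicit nudity as defined in "
                         "your adult-content policy.",
        }
    ],
}

# Non-violative content comes back labeled as such, e.g.:
# {"content_id": "listing-1234", "violating": False, "labels": []}
```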
What does onboarding look like?
Onboarding consists of three phases:
1. Policy intake, where we ingest your existing policies and build them into ModerateAI.
2. Policy calibration, where we run sample content and address any gaps where a policy is unclear or may need adjusting.
3. Go-live, where you begin sending us your content via our API in real time.
From policy intake to go-live typically takes two to three weeks.
How do you handle security and privacy?
TrustLab takes a security- and privacy-first approach to everything we do.
We follow the best practices of audited frameworks such as SOC 2, NIST CSF, and others.
We can share our SOC 2 Type 2 audit documentation and letter of attestation on request.
What reporting and analytics do you provide?
ModerateAI tracks operational metrics such as volumes, violation rates, violation categories, average handling times, and more, and makes them available through a dashboard that you and your team can access at any time.
What content types do you support?
ModerateAI supports the most common content modalities: text, image, and video, as well as combinations of the three. Use cases include product listings, social media posts, chat messages, profile content, and more. We currently do not support real-time streaming or live audio.
Does ModerateAI take action on violating content?
ModerateAI sends you labels; you determine the appropriate action, if any, based on your platform’s risk tolerance.
For example, you may choose to demonetize or "geoblock" rather than remove the content.
Much of this can be automated using the ModerateAI API and automation rules on your end, for example, automatically removing all high-confidence hate speech content, as sketched below.
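As a sketch, such a rule could look like the following, using the illustrative result shape from the earlier answer and two hypothetical hooks into your own systems:

```python
# A toy automation rule: auto-remove high-confidence hate speech and
# route everything else to human review. The threshold, field names,
# and the two helper hooks are hypothetical.
CONFIDENCE_THRESHOLD = 0.95  # tune to your platform's risk tolerance


def remove_content(content_id: str) -> None:
    """Hypothetical hook into your platform's takedown pipeline."""
    print(f"removing {content_id}")


def queue_for_review(content_id: str) -> None:
    """Hypothetical hook into your human-review queue."""
    print(f"queueing {content_id} for human review")


def handle_review_result(result: dict) -> None:
    for label in result.get("labels", []):
        if (
            label["policy_area"] == "hate_speech"
            and label["confidence"] >= CONFIDENCE_THRESHOLD
        ):
            remove_content(result["content_id"])
            return
    queue_for_review(result["content_id"])
```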
How do we send content to ModerateAI?
Our preferred method is for you to leverage the ModerateAI API, which allows you to securely send content to us directly. This is very similar to how you typically work with external classifiers and other labelling systems and tools.
Once we review your content, we return a label that aligns with your policy areas and specifically notes the policy area in which an infringement was found, along with the rationale and any associated evidence.
If this is too heavy of a technical lift, or if you already have a tool that you’d like ModerateAI to use, we’re happy to evaluate it.
What is ModerateAI, in a nutshell?
ModerateAI ingests your policies and, via API or through the use of your existing tools, reviews content with a combination of AI and human review.
Platforms using ModerateAI can feel confident that the right type of resource (i.e., machine, human, or both) is applied to each specific piece of content and that it is purposefully evaluated against all of their unique policies, ensuring that the associated risk is fully mitigated.