Online communities face a constant battle against hate speech, fake comments, and toxic discussions.

Manual moderation can be overwhelming and expensive, especially for publishers and forum managers handling hundreds or thousands of daily comments. That is why more organisations are looking to automate hate speech detection – saving time, reducing workload, and keeping discussions healthy.

In the final quarter of 2017 alone, Facebook removed 1.6 million pieces of content containing harassment and hate speech. By the second quarter of 2021, that number had jumped to over 31 million pieces of hate speech. And that’s just one platform during a single three-month period.

Needless to say, online hate speech and disinformation are prevalent, and they are not going away any time soon. The phenomenon appears in comments across every kind of community, from social media networks and news sites to forums and blogs.

Wherever people can openly speak their minds and exercise free speech, negative, fake, and hateful content can be found.

It is a growing global problem

In many countries, online hate speech is a growing hate crime problem, and one that creates unique challenges for media organisations. As we all know, publishers are responsible not only for their original content but also for the content their users generate on their media accounts and websites.

Imagine, for instance, that you run a news website with a comments section that updates in real-time.

You post a news story about migrants or minorities and it goes viral. Great, you now have a story that could be seen by millions. But you will also have thousands of comments to deal with. Those comments could flood your page at a rate of hundreds per hour.

Why is this a concern? Comments are an essential ingredient of a published piece. They can easily shift the original message and cause readers to misinterpret its meaning. A flood of negative comments can change the narrative through expletives and hate speech. And let’s not forget that you, as a publisher, are responsible for every comment and any online harm its content could cause.

Most often, comments are used to hijack the narrative. Human moderators can hardly read and verify every single comment in real time. This is a vulnerability that interested parties are quick to exploit.

Judging each comment against personal beliefs alone is not feasible, and neither is checking every comment manually. And even if you could do that, how would you prevent hateful comments from being posted in the first place? Doing it manually means hiring numerous people to work around the clock reviewing comments.

It’s expensive, it’s time-consuming, and it’s impractical.

The result is that your viral news story becomes riddled with hateful comments. Not only will these offend your readers, but they can also breed negative attitudes toward your brand and damage its reputation.

After all, who wants to devote themselves to a community constantly flooded with hateful comments or online trolling? Who wants to spend time on a website that fills with negativity every time a new story is published?

We are not just talking about comments that can be perceived as mildly inappropriate. Those certainly exist, but “hate speech” typically refers to vile and obscene messages filled with vitriol. No one wants to witness online hate when they’re trying to catch up with the day’s news.

Existing hate speech solutions have limitations

As every branding expert can attest, comment sections are essential for promoting a brand’s organic reach, building a strong community, and ensuring that engagement stays high. So, even though online hate thrives there, the solution isn’t simply to remove the comments section.

You just need a way of identifying online hate and dealing with it. Unfortunately, few such systems exist for Bulgarian media outlets, which means many businesses resort to ineffective methods.

Human intervention is one such option. But it’s simply not sustainable, and few social media platforms or outlets will consider it a viable use of their resources. It’s also fallible. Unless those human moderators have been briefed on all slang terms and phrases, they can’t catch everything and some hateful comments will slip through the net.

There are automated solutions built into existing forums and comment sections. However, these are designed to detect certain words and phrases. They are effective if you want to keep obscenities out of your community, but they don’t work for hate speech.

A commenter doesn’t need to resort to bad language to say something hurtful and insidious. By the same token, bad language appears in many harmless comments. The obvious solution is an approach that blends automation and human judgement.
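To see why keyword matching falls short, here is a minimal sketch of such a filter; the blocklist and example comments are purely illustrative:

```python
# Minimal keyword filter: flags any comment containing a listed word.
# It illustrates both failure modes described above: an insidious comment
# with no listed words slips through, while a harmless comment gets flagged.
BLOCKLIST = {"idiot", "trash"}  # illustrative list, not a real lexicon

def keyword_flag(comment: str) -> bool:
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return bool(words & BLOCKLIST)

# Hostile in intent, but contains no listed word:
print(keyword_flag("People like you don't belong in this country."))  # False
# Harmless self-deprecation, but trips the filter:
print(keyword_flag("I felt like an idiot for forgetting my keys."))   # True
```

Both outcomes are wrong, which is exactly why word lists alone cannot moderate hate speech.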

And that’s where we come in.

1. Choose the right AI moderation tool

  • Research solutions that support your language and platform (e.g., Identrics’ solutions, Perspective API, custom AI models).
  • Consider tools with proven accuracy and support for your community’s size.

2. Integrate the tool with your platform

  • For WordPress, Disqus, or similar, check for ready-made plugins or APIs.
  • For custom sites, use available REST APIs or SDKs for integration.
  • Ensure you comply with GDPR and local regulations.
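As a sketch of the API route above, the snippet below builds a request for Google’s Perspective API and extracts the toxicity score from its response. Authentication, quota, and error handling are omitted, and you should confirm the request shape against Google’s current documentation before relying on it:

```python
import json
from urllib import request

# Perspective API endpoint; the key is passed as a query parameter.
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key={key}")

def build_analyze_request(text: str, lang: str = "en") -> dict:
    # Request body shape per Google's Perspective API docs.
    return {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {"TOXICITY": {}},
    }

def extract_toxicity(response: dict) -> float:
    # The summary score is a probability-like value in [0, 1].
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def analyze(text: str, api_key: str) -> float:
    """Send one comment for analysis and return its toxicity score."""
    body = json.dumps(build_analyze_request(text)).encode()
    req = request.Request(API_URL.format(key=api_key), data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return extract_toxicity(json.load(resp))
```

The same pattern applies to other moderation APIs: build a JSON payload, POST it, and read a score or label back from the response.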

3. Set up moderation rules

  • Define the thresholds for automatic flagging or hiding of comments.
  • Set up human-in-the-loop review for borderline cases or flagged content.
  • Customise filters for specific keywords, phrases, and context.
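The thresholds described above can be expressed as a simple routing function; the cut-off values below are illustrative defaults, not recommendations:

```python
def route_comment(score: float, auto_hide: float = 0.9,
                  review: float = 0.5) -> str:
    """Route a comment based on its model score (0.0–1.0).

    Thresholds are illustrative; tune them for your community.
    """
    if score >= auto_hide:
        return "hidden"        # confidently hateful: hide immediately
    if score >= review:
        return "review_queue"  # borderline: send to a human moderator
    return "published"         # low risk: publish as-is
```

Lowering the `review` threshold catches more borderline content at the cost of a larger human review queue.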

4. Review and retrain

  • Routinely review flagged comments for accuracy.
  • Use moderator feedback to retrain and improve the model.
  • Update rules and lists based on new trends and emerging hate speech tactics.
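One minimal way to capture moderator feedback for retraining is to append each decision to a CSV dataset; the file layout and label names here are assumptions for illustration, not a fixed schema:

```python
import csv
from pathlib import Path

def record_feedback(path: Path, comment: str, model_flagged: bool,
                    moderator_label: str) -> None:
    """Append one moderator decision to a CSV that feeds the next
    retraining run. moderator_label might be 'hateful' or 'harmless'."""
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["comment", "model_flagged", "moderator_label"])
        writer.writerow([comment, int(model_flagged), moderator_label])

def load_feedback(path: Path) -> list:
    """Read the accumulated feedback as dict rows for retraining."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Rows where `model_flagged` disagrees with `moderator_label` are the most valuable training examples, since they show the model exactly where it went wrong.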

5. Monitor, report, and stay compliant

  • Track moderation outcomes (false positives, undetected cases).
  • Generate regular reports for transparency.
  • Ensure your process aligns with legal requirements for hate speech in your region.
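A basic report over tracked outcomes might look like the following sketch, where each outcome pairs the model’s decision with the moderator’s final verdict:

```python
def moderation_report(outcomes):
    """Summarise moderation outcomes.

    outcomes: list of (model_flagged: bool, truly_hateful: bool) pairs,
    where the second value is the moderator's final verdict.
    """
    flagged = [o for o in outcomes if o[0]]
    false_pos = sum(1 for f, h in flagged if not h)
    missed = sum(1 for f, h in outcomes if h and not f)
    return {
        "flagged": len(flagged),
        "false_positive_rate": false_pos / len(flagged) if flagged else 0.0,
        "undetected": missed,
    }
```

A rising false-positive rate suggests the flagging threshold is too aggressive; a rising `undetected` count suggests the model needs retraining on newer hate speech tactics.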

At Identrics, we use a human-in-the-loop hate speech detection model.

Tool/Method              | Automation level | Languages supported | Human-in-the-loop | Pricing
Identrics                | High             | Multilingual        | Yes               | Custom
Perspective API (Google) | Moderate         | Multilingual        | Optional          | Free
Custom Keyword Filter    | Low              | Any                 | No                | Free

Our software checks and flags the comments that may contain hate speech as they are posted. These comments are then sent for human moderation.

The human moderators are directed to the exact words that may contain hate speech, thus allowing them to make sound decisions.
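A simplified sketch of this hand-off, returning highlights that point moderators at the triggering words, could look like this; the term list is illustrative and not Identrics’ actual lexicon:

```python
import re

SUSPECT_TERMS = {"vermin", "subhuman"}  # illustrative, not a real lexicon

def flag_for_review(comment: str) -> dict:
    """Return character spans of suspect terms so a moderator can jump
    straight to the exact words that triggered the flag."""
    spans = [(m.start(), m.end(), m.group())
             for m in re.finditer(r"[\w']+", comment)
             if m.group().lower() in SUSPECT_TERMS]
    return {"needs_review": bool(spans), "highlights": spans}
```

A moderation UI can use the `(start, end)` spans to highlight the offending words inline, so the moderator reviews the comment in context rather than hunting through it.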

It means that communities can benefit from the ease, simplicity, and speed of automation while still utilising the expertise that only human interaction can bring.

And that’s not all.

If the human moderator determines that a flagged comment is perfectly harmless, they can send it back to the model. The model is constantly learning and improving: the moderator’s feedback on why the comment should not have been flagged teaches it not to flag similar comments in the future.

The longer the model remains active, the more comments it reviews and the more it learns. As it grows, it becomes more effective at making these decisions, and fewer false positives are sent for moderation.

Chapter Three of the Bulgarian Criminal Code, “Crimes against the Rights of Citizens”, addresses the prevalence of hate speech and the need to eradicate it. As more of our time is invested in digital communities and more weight is placed on human rights and the importance of keeping them safe, AI will become essential.

Our hate detection model can keep communities safe, clean, and accessible. It combats the spread of AI-generated content and hateful comments in otherwise friendly communities. Just as importantly, it can ensure that companies remain compliant with current and future laws on hate crimes.

After all, lawmakers rarely concern themselves with how the content gets there or if the site/platform can do anything about it. They want it gone at all costs, and may punish the platforms that fail to remove it. 

The Identrics hate speech detection model is easy to implement and will adapt to the community. It learns where those comments come from, what kind of content they contain, and whether or not they need to be removed.

As noted in an article on ethics and responsible AI, Bulgaria plays a major role in the evolution of AI technologies. At Identrics, we take our work very seriously and are helping businesses across the country to utilise AI in ways that can improve their efficiency and bottom line.

Contact Identrics today and discover how our hate speech detection model can help you combat the spread of hateful comments.

How can I automate hate speech detection in community forums?

Use AI-powered moderation tools that integrate with your platform. Set up custom filters, use human-in-the-loop workflows, and routinely retrain your system for best results.

What are the best AI tools for hate speech moderation?

Popular options include Identrics, Google Perspective API, and custom models built with open-source frameworks. Choose based on language support, accuracy, and integration needs.

Can hate speech detection models work in multiple languages?

Many modern tools support multiple languages. Always confirm that your chosen tool covers your main community language(s).

How do I balance automation with human moderation?

The best approach uses automation for first-level screening, with human moderators reviewing edge cases or appeals. This ensures accuracy and fairness.

Is automated hate speech detection accurate?

Accuracy depends on the model, training data, and ongoing supervision. Combining AI with human review achieves the most reliable results.