Our CEO, Vladimir Petkov, delivered a presentation during this year’s FIBEP World Media Intelligence Congress. He talked about the challenges of the new technologies in the industry. During the ‘What to do and what not to do when it comes to a technology’ breakout session, he introduced Identrics’ Multilingual Abstractive Summarisation solution. He also discussed both the benefits and some potential challenges this new technology poses to media intelligence.
In the era of big data, new text emerges every second from all kinds of sources. This rapidly growing volume of text has created the need for automatic text summarisation, an area of Natural Language Processing (NLP). With text summarisation technology, you can generate a summary while preserving the meaning of the original text. There are two basic types of automatic text summarisation: extractive and abstractive. Extractive summarisation takes sentences directly from the document to form a summary. Abstractive summarisation uses advanced natural language techniques to interpret the original text and generate a new, shorter version of it. Identrics focuses on abstractive summarisation.
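To make the extractive approach concrete, here is a minimal sketch of a frequency-based extractive summariser. It is an illustration only and not Identrics' solution: the stop-word list, the scoring rule and all function names are assumptions made for this example. Note that every sentence it returns is copied verbatim from the input, which is exactly what abstractive summarisation avoids.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for"}


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Return lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, max_sentences=1):
    """Pick the highest-scoring sentences and return them in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        # Average document-level frequency of the words in the sentence.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `extractive_summary(article_text, max_sentences=2)` returns two sentences lifted unchanged from `article_text`. An abstractive system would instead generate new sentences, typically with a trained sequence-to-sequence language model.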
Here are some key points of Vlado’s presentation:
Abstractive summarisation is one of the most important solutions in media intelligence. Generating abstractive summaries is a challenging task that involves advanced language modelling, and it is usually more complex than the extractive approach. Abstractive summarisation interprets the information in a text and generates new sentences for the summary, much as people do when they summarise long texts. This sets it apart from earlier summarisation approaches, which build shorter texts out of the input sentences. Identrics’ solution uses AI and Machine Learning to produce human-like summaries.
Generating a new text comes with some challenges, however. One such challenge is that sometimes abstractive summarisation techniques can generate false information. That’s why Identrics implements a fact-checking algorithm to solve this problem. The fact-checking algorithm ensures that sentences are factually consistent after transformation. It can also extract any inconsistent parts of the summary if they exist. After consistency is verified, our team sends the bulletproof abstractive summary.
Why abstractive summarisation?
Abstractive summarisation methods produce coherent and concise summaries. They reduce the length of sentences and thus result in less redundancy in the text. That is especially important compared to extractive summarisation, where irrelevant parts of the original text can be included.
Abstractive summaries can help businesses avoid copyright issues since they are entirely new texts. That can ultimately save cost and effort in plagiarism checking. Also, manually going through a large volume of text and summarising it can take a lot of time. Abstractive summaries help cut down the manual effort and time needed to do that.
Factual inaccuracy and hallucinations
One of the challenges to abstractive summarisation is hallucinations in AI. Abstractive summaries can be subject to hallucinations and generate text that is not supported by the original document. Factual inconsistency is challenging because it’s often difficult to detect. Factual accuracy is the consistency between the content of the source text and the generated summary. Identrics’ fact-checking algorithm aims to solve that problem.
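As a rough illustration of what a factual consistency check looks for, the sketch below flags numbers and capitalised words in a summary that never occur in the source text. This is a naive heuristic written for this article, not Identrics' fact-checking algorithm, and the function name is hypothetical. It catches only surface-level mismatches, misses contradictions that reuse the source's vocabulary, and can raise false alarms on capitalised sentence openers.

```python
import re

# Words, plus numbers with decimal or thousands separators such as "3.4".
TOKEN_PATTERN = r"\w+(?:[.,]\d+)*"


def unsupported_terms(source, summary):
    """Return numbers and capitalised words in the summary that are absent from the source."""
    source_tokens = set(re.findall(TOKEN_PATTERN, source.lower()))
    flagged = []
    for token in re.findall(TOKEN_PATTERN, summary):
        is_number = token[0].isdigit()
        is_name = token[0].isupper()
        # Only numbers and likely names are checked; ordinary words are ignored.
        if (is_number or is_name) and token.lower() not in source_tokens:
            flagged.append(token)
    return flagged
```

An empty result does not prove a summary is accurate. It only means this particular check found no name or number that the source does not mention.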
Another challenge is hate speech
With social media, hate speech spreads more easily than ever. That’s why hate speech detection is a key task when it comes to automatic abstractive summarisation. Social media makes hate speech detection difficult, as it contains many paralinguistic signals (such as emoticons or hashtags) and poorly written text. Another issue is that hate speech is sometimes ill-defined, which makes it hard for both machines and humans to detect.
Enormous resources to train the AI
In 2018, OpenAI found that the computational power used to train the largest AI models had been doubling every 3.4 months since 2012. Training AI takes enormous resources, and the demand is growing faster than before: the power needed to train AI today is doubling more than seven times faster than it did between 1959 and 2012. That also raises the question of AI’s carbon footprint, which by some estimates is as much as it takes to build and drive five cars. The financial cost of AI development is also enormous.
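The "seven times" figure can be checked with simple arithmetic. The sketch below assumes a doubling period of roughly two years for the earlier era. That value is an assumption not stated in this article, and the variable names are chosen for illustration.

```python
# Doubling period since 2012, as quoted in the text.
FAST_DOUBLING_MONTHS = 3.4
# Assumption (not stated in the article): compute doubled about every
# two years in the period before 2012.
SLOW_DOUBLING_MONTHS = 24.0

# How many times faster the doubling happens now.
speedup = SLOW_DOUBLING_MONTHS / FAST_DOUBLING_MONTHS

# Growth factor over one year when doubling every 3.4 months.
growth_per_year = 2 ** (12 / FAST_DOUBLING_MONTHS)

print(f"Doubling is about {speedup:.1f} times faster")
print(f"Compute grows about {growth_per_year:.1f}-fold per year")
```

Under that assumption the ratio comes out slightly above seven, and a 3.4-month doubling period corresponds to compute growing more than elevenfold each year.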
As mentioned previously, abstractive summaries can be subject to factual inaccuracies. Inconsistency can impact the quality of the final product. That is why fact-checking is a crucial part of the abstractive summarisation process. Identrics’ fact-checking algorithm ensures that generated text is correct before delivering the final summary to the client. The fact-checking algorithm will also ensure there are no textual contradictions in the summary.
Hallucination checking algorithm
Alongside the fact-checking algorithm, abstractive summarisation needs a hallucination checking algorithm. Hallucinations can be reduced by limiting summaries to general phrases, but such summaries would not be very informative. A hallucination checking algorithm is therefore another crucial part of providing quality abstractive summaries.
Spell check algorithm
Spell checks are a necessity for every type of summarisation. They are necessary if the original text has any misspelt words. Spelling mistakes not only reduce the overall quality of the summary but also make it difficult to understand. A spell check algorithm can reduce the number of mistakes and ultimately help produce a high-quality end product.
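A spell check step can be sketched with Python's standard `difflib` module, which finds the closest match for a word in a list of known words. The vocabulary, the similarity cutoff and the function names below are hypothetical and chosen for illustration. A production spell checker would rely on a full dictionary and on context.

```python
import difflib
import re

# Hypothetical mini-vocabulary; a real system would load a full dictionary.
VOCABULARY = ["summary", "media", "intelligence", "abstractive", "quality", "text"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if nothing is close enough."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    # Corrections are returned in lowercase; the original casing is not restored.
    return matches[0] if matches else word


def correct_text(text):
    """Apply correct_word to every alphabetic token, leaving punctuation and spacing untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)
```

For example, `correct_text("abstractve sumary of media")` returns `"abstractive summary of media"`. The cutoff is kept fairly high so that correctly spelt words missing from the vocabulary are less likely to be changed by mistake.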
FIBEP is the world’s media intelligence association, with members from more than 60 countries. The FIBEP World Media Intelligence Congress has a history of more than 65 years, which has made it one of the top events for communications, PR, and social media monitoring.