The client
A leading global provider of communications and marketing solutions.
The case
The client wanted to be able to tap into a wealth of renewable energy data, enrich it with specific brand insights, and use machine learning models to uncover new opportunities and strategies.
They needed to find a supplier who could not only handle the technical aspects of working with large data sets and machine learning frameworks but who could also provide the crucial context and insights needed to make informed decisions.
The team at Identrics was excited to take on this challenge and provide Topic Discovery and Data Labelling solutions to help the client better understand the data and the themes they were interested in.
Uncovering renewable energy trends and sentiments in media through topic discovery
Our client had a specific goal in mind – to analyse earned media and explore how certain topics related to renewable energy were being discussed. They needed to identify the Share of Voice (SoV) of various subtopics and understand the sentiment towards them.
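To make the Share of Voice metric concrete, here is a minimal sketch of how it could be computed from labelled articles, assuming each article carries one subtopic and one sentiment label. The record layout, subtopic names, and sample data are hypothetical and are not taken from the project.

```python
from collections import Counter, defaultdict


def share_of_voice(articles):
    """Return {subtopic: fraction of all articles that mention it}."""
    counts = Counter(a["subtopic"] for a in articles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def sentiment_by_subtopic(articles):
    """Return {subtopic: {sentiment: number of articles}}."""
    breakdown = defaultdict(Counter)
    for a in articles:
        breakdown[a["subtopic"]][a["sentiment"]] += 1
    return {topic: dict(c) for topic, c in breakdown.items()}


# Hypothetical sample records, for illustration only.
sample = [
    {"subtopic": "solar", "sentiment": "positive"},
    {"subtopic": "solar", "sentiment": "negative"},
    {"subtopic": "wind", "sentiment": "positive"},
    {"subtopic": "hydrogen", "sentiment": "neutral"},
]

sov = share_of_voice(sample)
sentiments = sentiment_by_subtopic(sample)
```

In this sketch SoV is simply a subtopic's share of all labelled articles; a real analysis might weight articles by reach or allow several subtopics per article.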
This is where our Topic Discovery technology came in. By using advanced algorithms and natural language processing, we were able to analyse vast amounts of data and identify key themes and patterns. Our data visualisations provided a clear and concise view of the information, enabling our client to make informed decisions and devise the right approach to engage with their intended audiences.
This solution saved the client valuable time and resources and provided the insights they needed to succeed in a competitive marketplace.
What did we do
To achieve the goals of this project, we deployed our cutting-edge ML solutions for text enrichment, in addition to incorporating various other models and frameworks tailored to the specific needs of the data.
Our arsenal of technologies and tools included a powerful data lake complete with indexed textual media content.
- Leveraging Identrics’ Database
At Identrics, we leverage various technologies and tools to ensure the highest quality results for our clients. Our Automated Data Collection Framework (ADCF) for media websites allows us to efficiently crawl websites and extract structured or semi-structured data, providing a wealth of information for data mining and information processing.
We selected a set of 779 traditional media sources containing articles about renewable energy in 5 jurisdictions and cleaned the collected content using various methods to remove irrelevant material.
- NLP Enrichments
We also employed our proprietary, state-of-the-art Topic Classification Model, which uses predefined taxonomies to tag text documents, such as tweets, with specific topics based on the wording of the text. Additionally, we used advanced Keyword Extraction and Named Entity Recognition systems to further refine and enrich the data.
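The idea of tagging text against a predefined taxonomy can be illustrated with a deliberately simple sketch. It uses plain keyword matching as a stand-in and does not represent the proprietary Topic Classification Model; the taxonomy, the terms, and the function name `tag_topics` are hypothetical.

```python
import re

# Hypothetical taxonomy: topic -> indicative terms (not the project's taxonomy).
TAXONOMY = {
    "solar": {"solar", "photovoltaic", "pv"},
    "wind": {"wind", "turbine", "offshore"},
    "policy": {"subsidy", "regulation", "tariff"},
}


def tag_topics(text, taxonomy=TAXONOMY):
    """Return the sorted list of topics whose terms appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, terms in taxonomy.items() if tokens & terms)


tags = tag_topics("New offshore wind turbine subsidy announced")
```

A trained classifier would replace the term lookup, but the input and output stay the same: a text document goes in and a set of taxonomy topics comes out.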
Data labelling solutions: Accurate and efficient annotation for machine learning models
Our client faced another significant challenge – they had a large data set of 25,000 unlabelled tweets that they needed to enrich with sentiment and topic classification using machine learning.
The accuracy of their models depended on the quality of the training data, making data labelling one of the most important yet challenging tasks.
How did we do it
We leveraged our Data Labelling technology to provide the client with the robust training data set they needed. By carefully labelling the data set, we were able to improve the accuracy of the machine learning models significantly.
But we didn’t stop there. Identrics also provided tools to validate the output and benchmark it against previous work, enabling the client to track successful adoption and drive continuous improvement.
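One common way to benchmark a newly labelled set against earlier work is to measure agreement between the two sets of labels. The sketch below computes plain accuracy and Cohen's kappa, assuming both label lists cover the same items in the same order. The function names and sample labels are hypothetical, and the source does not say which metrics the validation tools actually used.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of items on which the two label lists agree."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(labels_a, labels_b):
    """Agreement between two labellings, corrected for chance agreement."""
    observed = accuracy(labels_a, labels_b)
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both labellings were assigned independently.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # Both labellings use one identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical labels: earlier work versus the newly labelled output.
previous = ["pos", "neg", "neu", "pos"]
current = ["pos", "neg", "pos", "pos"]

acc = accuracy(previous, current)
kappa = cohen_kappa(previous, current)
```

Kappa is useful alongside accuracy because sentiment labels are often imbalanced, and a high raw agreement can partly be due to chance.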
We then applied our proprietary state-of-the-art topic classification model, which uses predefined taxonomies to tag text documents, such as tweets, with specific topics. We also used keyword extraction and Named Entity Recognition systems to further enrich the data.
At Identrics, we understand the importance of accurate sentiment analysis in domain-specific language. To address this, we’ve developed advanced techniques for zero-shot and few-shot recognition using Transformer models. This approach ensures that rare or periodic topics are properly classified, even in the absence of specific training data.
To get the most accurate results possible, we employ expert analysts who refine the data taxonomy and check model performance.
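The few-shot idea can be sketched as prototype-based classification: a handful of labelled examples per class are embedded, and a new text receives the label of the most similar class prototype. In the sketch below a toy bag-of-words vector stands in for a Transformer sentence encoder so that the example runs without any model download. All names and sample texts are hypothetical, and this is not the method described above, only an illustration of the general technique.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words vector; a stand-in for a Transformer encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_prototypes(examples):
    """Sum the vectors of the few labelled examples available per label."""
    prototypes = defaultdict(Counter)
    for text, label in examples:
        prototypes[label].update(embed(text))
    return dict(prototypes)


def classify(text, prototypes):
    """Return the label whose prototype is most similar to the text."""
    vec = embed(text)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


# Hypothetical few-shot examples, two per class.
few_shot = [
    ("great news for clean energy", "positive"),
    ("love the new solar farm", "positive"),
    ("wind farm ruins the view", "negative"),
    ("terrible delays and rising costs", "negative"),
]

prototypes = build_prototypes(few_shot)
label = classify("rising costs are terrible", prototypes)
```

Swapping `embed` for a real sentence encoder keeps the rest of the logic unchanged, which is why only a few labelled examples per class are needed.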
Our supervised and unsupervised technologies include sentiment analysis, document similarity analysis, topic classification, NER, and keyword extraction. With our expertise and cutting-edge tools, we ensure our clients receive the highest quality data analysis and insights.
The output consisted of sample data labelled with sentiment and topics and further grouped by the names and events mentioned in the tweets. Additionally, we included word clouds for the topics in PDF format and graphics in PNG format.
Conclusion
Identrics is committed to providing exceptional natural language processing (NLP) solutions and machine learning algorithms to help our clients unlock valuable insights from their data. Our team of experts combines cutting-edge technology with a deep understanding of domain-specific language to deliver tailored solutions that meet the unique needs of each client.
Whether working with our own data lake or with data provided by the client, we can apply our Automated Data Collection Framework (ADCF) for media websites, our proprietary Topic Discovery technology, and a suite of other tools for sentiment analysis, topic classification, Named Entity Recognition (NER), and keyword extraction.
Our data labelling solutions ensure the highest quality training data for machine learning models, while our expert analysis and unsupervised exploration techniques help to uncover hidden patterns and information in the data.
At Identrics, we are proud to deliver solutions that provide valuable insights and drive business success for our clients. Whether it’s uncovering renewable energy trends and sentiments or tackling other data challenges, our team is always ready to take on the next big challenge.