Published 22 September 2024 | Updated 26 September 2024

Convert Armenian Speech To Text In Seconds

Journalist

Artificial intelligence (AI) has become an essential tool for media professionals. As the technology advances, leading media organizations worldwide are increasingly incorporating AI tools into their daily work. Armenian media outlets are also showing interest in these tools, and the Armenian tech community is working to adapt and optimize widely used models for the Armenian language. The problem of converting Armenian speech into written text has piqued the interest of both technologists and media professionals.

Media.am interviewed the creators of various similar projects to determine when it might become feasible to transcribe Armenian speech into written form instantaneously.

Hayk Hovhannisyan: CivilNet, fact-checking journalist

As a fact-checking journalist, I spend a lot of my working day watching, listening to, transcribing, and monitoring videos and interviews. Artificial intelligence tools have developed rapidly in recent years, and the world's leading media organizations have started using them to create content. This sparked my interest in the direction our own technology community is taking. I have been considering building a platform that uses artificial intelligence to create, edit, and enhance Armenian content.

Various professionals from different fields came together to discuss the idea. As a journalist, I identified and presented the issues the media industry needs to address while our programming team started working on finding solutions to these problems. I believe that the strength of our team lies in our ability to complement each other professionally, which will ultimately help the platform significantly improve the efficiency of media work.

One of the main functions of our platform will be to transcribe spoken Armenian using artificial intelligence, but we have decided not to limit ourselves to that. In addition to editing the transcribed text and making the necessary punctuation and spelling corrections, the platform will convert written text into speech, a need we also encounter often in our daily work. Preparing reports will become easier and take less time.

The work is still in progress, and the platform has not been released yet. We are raising funds and have already secured some agreements. I believe we can finish the project in the next few months. I am aware that other groups have started working on similar projects during this time. I believe that competition makes the work more interesting and helps us work more efficiently.

Varuzhan Baghdasaryan, founder of Talk2edit platform

I am a postgraduate student at the Polytechnical Institute. During my three years of postgraduate education, I focused exclusively on researching natural language processing tools and adapting them to Armenian. When I started this project three years ago, there were few or no relevant studies and articles. As a result, the work was challenging and progressed rather slowly. I believe that within a short period, the leading international companies in the field will expand and improve their language coverage to such an extent that they will surpass all Armenian models in the accuracy of recognizing Armenian. In other words, our window lasts only until the big companies start modeling Armenian. Even so, I think this work is essential for developing our scientific understanding.

The Talk2edit.com platform offers an automatic Armenian speech recognition system. It is one of the first systems for transcribing Armenian speech. It can transcribe audio or video clips of up to one hour (mp3, WAV, mp4) and also accepts microphone input. YouTube videos can also be submitted for transcription.

This platform allows you to test automatic speech recognition and grammar and spelling correction. Currently, the model has an error rate of about 8.5%, meaning that 8.5 out of 100 words are predicted and transcribed incorrectly. The upcoming model that we will train soon aims to minimize these errors. Additionally, we are working on generating video captions and subtitles.

If you test the platform, you will see that the system works accurately, especially when the recording is done in a quiet environment. It works quickly and easily. We are currently working on improving our second model to address any existing flaws. In the near future, we aim to develop the system’s ability to distinguish between speakers during transcription; for instance, in a journalistic interview, it will be able to separate the words of each speaker. Alongside strengthening and improving the tool, we are also exploring potential business models and how we can monetize our idea. Our focus is on securing financing and distributing our product.
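
The article does not say how the 8.5% error rate quoted above is measured. A common metric for speech recognition is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below assumes that definition; the function name and the sample sentences are illustrative and do not come from Talk2edit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the standard WER definition; the article does not
    confirm which metric its 8.5% figure is based on.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one wrong word out of four.
reference = "the interview starts at noon"
hypothesis = "the interview starts at moon"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Because inserted words also count as errors, a WER of 8.5% corresponds only approximately to "8.5 wrong words out of 100".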

Levon Hakhoyan, founder of hispeech.ai

Transcribing and editing Armenian audio recordings, and cutting or shortening parts of audio and video clips, are currently done manually, which is quite time-consuming.

Our tool, hispeech.ai, has been in development for over a year. It will automatically transcribe and summarize any audio recording in the Armenian language. Importantly, you will be able to edit the audio or video simply by editing the transcript. The tool will also allow you to add subtitles and captions to videos.

After delving deeper into the topic, I realized that this technological solution could significantly streamline and enhance the work of media professionals such as journalists, podcasters, and commercial producers, and that it can be applied in many other fields. Examples include the customer support centers of banks and telecom operators, where phone calls have to be listened to and transcribed; the judicial system, for transcribing court sessions; education, for transcribing lectures; and business, for transcribing and summarizing webinars. Another important and exciting aspect is that such a tool would make information more accessible to people with hearing impairments.

The product is built on the latest advancements in artificial intelligence. We collected and processed a large amount of data to create the AI model. We developed unique algorithms that enable the product to achieve human-level accuracy, which is crucial for systems of this kind. After launching the tool in Armenian, we aim to introduce new languages, such as Georgian and Urdu. We also plan to incorporate additional tools and features in the future, including noise removal, smart video editing, and more. A demo version of the platform is already available. We anticipate presenting a fully polished and enhanced tool to our target audiences within the next few months.

Tatev Hovhannisyan, employee at Crisp

A few days ago, “Crisp” and the Public Television Company signed a memorandum of cooperation. Our companies will collaborate on creating a new tool that utilizes artificial intelligence. While many AI models are available, a comprehensive Armenian language database is required for this technology to be most effective in our language.

Using media content provided by the Public Television Company, Crisp will develop a new artificial intelligence tool that generates text from spoken Armenian.

We hope to develop the best working model through our collaboration. We already have a preliminary concept in place and will continue our work based on it. We plan to use Armenian media content to enhance the model. Based on preliminary data, the tool may be ready for operation in about six months. However, this timeline is approximate and subject to change. In the future, the Public Television Company will use the tool at different stages of TV production. For instance, it will be used to create captions for films and programs and to improve the efficiency and speed of current work processes.

Artificial intelligence tools can be widely used in the media industry and across many other fields. The key challenge lies in ensuring access to enough Armenian data for these tools to work as accurately and practically as they already do in English.