VKontakte will give third-party developers access to its speech recognition technology



VKontakte has presented another new feature: developers can now use its speech recognition technology in their own projects for free. The technology recognizes speech and converts it into text.

According to the company's website, the technology can be connected in a few clicks. You can try it through a web interface on a dedicated page or integrate it through the VKontakte public API.

Alexander Tobol, VKontakte's technical director, spoke at the opening of the Saint HighLoad++ conference. He noted that the neural networks behind the company's ASR (Automatic Speech Recognition) technology cope well with audio containing background noise, as well as with speech full of slang and abbreviations.

Two recognition models are available: neutral and spontaneous. The neutral model suits clearly articulated speech, such as a TV broadcast or an interview. The spontaneous model is meant for everyday conversational speech with slang. The neural networks process a file in a few seconds and can strip noise and pauses from the transcript. They also handle slurred speech and isolated sounds such as “ъ”.

Besides the web interface and the public API for speech recognition, the developer portal offers a wide range of methods for building VKontakte mini-apps or using them in third-party projects.
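To make the API route more concrete, here is a minimal Python sketch of the request a client might build once an audio file has been uploaded. It follows our reading of VK's developer documentation, which describes an upload, process, and poll flow through methods named `asr.getUploadUrl`, `asr.process` and `asr.checkStatus`. The method names, the `audio`/`model` parameters, the API version string and the shape of the status response are all assumptions to verify against the current API reference. No network calls are made. The sketch only builds the URL and reads a status payload.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.vk.com/method/"
API_VERSION = "5.131"  # assumption: placeholder version; use one that exposes asr.*

# The two recognition models described in the article.
MODELS = {"neutral", "spontaneous"}


def choose_model(conversational: bool) -> str:
    """Pick 'spontaneous' for slang-heavy everyday speech, else 'neutral'."""
    return "spontaneous" if conversational else "neutral"


def build_process_url(upload_result: str, model: str, token: str) -> str:
    """Build an asr.process request URL (parameter names are assumed)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    params = {
        "audio": upload_result,  # assumed: the string returned by the upload step
        "model": model,
        "access_token": token,   # hypothetical token value supplied by the caller
        "v": API_VERSION,
    }
    return API_BASE + "asr.process?" + urlencode(params)


def extract_text(status_response: dict) -> Optional[str]:
    """Return the transcript if an (assumed) asr.checkStatus reply is finished."""
    body = status_response.get("response", {})
    if body.get("status") == "finished":
        return body.get("text")
    return None  # still processing, or an error status
```

A client would call `build_process_url`, keep the returned task identifier, and poll `asr.checkStatus` until `extract_text` returns a string. The polling interval and error handling are left to the caller.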

The solution is aimed at indie projects, startups, and personal pet projects for learning and self-development. The free tier, which processes up to 100 minutes of audio per day, can be used for any purpose. For unlimited use, you can submit an application by email.
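A developer on the free tier may want to stop sending audio before the 100-minutes-per-day limit is reached. The sketch below is a hypothetical client-side guard that tracks seconds submitted per calendar day. It is not part of VKontakte's API. How the service itself counts minutes, and in which time zone the day resets, are unknowns here.

```python
from datetime import date
from typing import Optional

FREE_MINUTES_PER_DAY = 100  # free-tier limit quoted in the article


class DailyQuota:
    """Hypothetical local tracker for audio seconds submitted per day."""

    def __init__(self, limit_minutes: int = FREE_MINUTES_PER_DAY) -> None:
        self.limit_seconds = limit_minutes * 60
        self.day: Optional[date] = None
        self.used_seconds = 0.0

    def try_consume(self, seconds: float, today: Optional[date] = None) -> bool:
        """Reserve `seconds` of audio; return False if it would exceed the limit."""
        today = today or date.today()
        if today != self.day:  # new calendar day (local time): reset the counter
            self.day = today
            self.used_seconds = 0.0
        if self.used_seconds + seconds > self.limit_seconds:
            return False
        self.used_seconds += seconds
        return True
```

For example, after 90 minutes have been reserved on a given day, a 20-minute file is refused, but a 10-minute file still fits exactly within the 6,000-second budget.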

According to Tobol, VKontakte shares a wide range of its own in-house technologies. “Our ASR solution is one of the best in the industry for recognizing ordinary, everyday speech, which is often full of slang, borrowed words, and abbreviations,” he said.

Tobol added that users of the social network send more than 2 billion voice messages every month, which amounts to millions of hours of audio processed by neural networks.

VKontakte uses ASR to transcribe voice messages, generate video subtitles, build personal recommendations, and more. Three neural networks work together: one recognizes speech, the second selects the most suitable words, and the third adds punctuation. Each message is transcribed about 1.5 seconds after it is sent.
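The three-stage architecture can be pictured as a simple composition of functions. The Python sketch below shows only that structure. The stage signatures and the toy stand-ins are hypothetical illustrations, not VKontakte's models. The real stages are neural networks whose interfaces the article does not describe.

```python
from typing import Callable, List

# Hypothetical stage signatures for the three networks described above.
Recognizer = Callable[[bytes], List[str]]  # audio -> candidate transcripts
WordChooser = Callable[[List[str]], str]   # candidates -> best word sequence
Punctuator = Callable[[str], str]          # raw text -> punctuated text


def transcribe(audio: bytes, recognize: Recognizer,
               choose: WordChooser, punctuate: Punctuator) -> str:
    """Run recognition, word selection, and punctuation in sequence."""
    candidates = recognize(audio)
    best = choose(candidates)
    return punctuate(best)


# Toy stand-ins so the pipeline runs end to end (purely illustrative).
VOCAB = {"hello", "how", "are", "you"}


def toy_recognize(audio: bytes) -> List[str]:
    return ["hello how are you", "hello how are ewe"]


def toy_choose(candidates: List[str]) -> str:
    # Prefer the candidate with the most in-vocabulary words.
    return max(candidates, key=lambda c: sum(w in VOCAB for w in c.split()))


def toy_punctuate(text: str) -> str:
    return text[:1].upper() + text[1:] + "."
```

With these stand-ins, `transcribe(b"", toy_recognize, toy_choose, toy_punctuate)` returns `"Hello how are you."`. Keeping each stage behind its own function type lets any one model be swapped out without touching the others.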

According to Tobol, the possible uses of the technology are limited only by developers' imagination. For example, one could build a voice-controlled game or chatbot, or add speech recognition to a third-party messenger. “We hope that our ASR will help promising young developers create new and unusual startups and indie projects,” he added.
