The neural network mastered the imitation of a human voice after listening to a three-second sample

Microsoft has unveiled VALL-E, an AI algorithm capable of imitating a human voice after listening to a three-second audio recording. The program's source code has not been made freely available, but the corporation has already published a dozen examples of the algorithm in action, which give an idea of the quality of the resulting speech.


The algorithm takes a voice sample and a text, then outputs the text spoken in that voice


Many programs that synthesize human speech can be found online, but as a rule they need several minutes of original audio recordings of a voice to learn from. VALL-E stands apart: the algorithm needs to hear only three seconds of a voice, along with the text that is to be converted into speech. Its creators also claim that the program can reproduce the emotional coloring and tone of the speaker's voice, even aspects that were not heard in the original sample.
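Since VALL-E's source code is not public, the workflow described above can only be illustrated with a hypothetical sketch. None of the names below come from Microsoft's actual code; the stub merely shows the input/output contract the article describes: a short voice sample plus text in, synthesized speech out.

```python
# Hypothetical sketch of the zero-shot voice-cloning workflow described
# above. All names are invented for illustration; VALL-E's real code is
# not publicly available.
from dataclasses import dataclass


@dataclass
class SynthesisRequest:
    prompt_audio: bytes  # ~3-second voice sample (the "acoustic prompt")
    text: str            # text to be spoken in the sampled voice


def synthesize(request: SynthesisRequest) -> bytes:
    """Stand-in for the model: voice sample + text -> speech audio.

    A real system of this kind would encode the prompt into discrete
    acoustic tokens, condition a generative model on those tokens plus
    the text, and decode the result back into a waveform. Here we just
    return a placeholder buffer so the sketch is runnable.
    """
    assert len(request.prompt_audio) > 0, "need a voice sample"
    assert request.text, "need text to voice"
    return b"\x00" * 16000  # placeholder: 1 s of silent 16 kHz 8-bit audio


# Usage: a dummy 3-second sample (16 kHz, 16-bit mono -> 96,000 bytes).
sample = b"\x01" * (3 * 16000 * 2)
audio = synthesize(SynthesisRequest(prompt_audio=sample, text="Hello"))
print(len(audio))  # size of the synthesized audio in bytes
```

The point of the sketch is the asymmetry the article highlights: the voice sample is fixed at a few seconds, while the text (and hence the output) can be arbitrarily long.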

VALL-E is based on a neural network trained on 60,000 hours of spoken English. Microsoft has not said whether the algorithm will be released to the public. More information about how the algorithm works can be found in the preprint hosted on Cornell University's arXiv, and samples of the synthesized voices are available on GitHub.
