
Neural network learns to imitate a human voice after hearing a three-second sample

Microsoft has unveiled VALL-E, an AI-based algorithm capable of imitating a human voice after listening to a three-second audio recording. The program's source code has not yet been made freely available, but the corporation has already published a dozen examples of the algorithm's output, which give an idea of the quality of the synthesized speech.


The algorithm takes a voice sample and a text, then outputs synthesized speech


Many programs available online can synthesize human speech, but as a rule they need several minutes of original audio recordings of a voice to learn from. VALL-E stands apart: the algorithm needs only a three-second voice sample, plus the text to be converted into speech. The creators also claim that the program can imitate the speaker's emotional coloring and tone of voice, even aspects that were not present in the original sample.
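The interface the article describes — a short voice sample plus target text in, a synthesized waveform out — can be sketched conceptually. Everything below is a hypothetical toy illustration: the function names, the 16 kHz sample rate, and the stand-in logic are assumptions for illustration only, not VALL-E's actual (unreleased) implementation.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate for this sketch

def extract_speaker_prompt(sample: np.ndarray) -> np.ndarray:
    """Toy stand-in for the prompt-encoding step: reduce the 3-second
    sample to a fixed-size conditioning vector (here, a spectral summary)."""
    assert len(sample) == 3 * SAMPLE_RATE, "expects ~3 s of audio"
    spectrum = np.abs(np.fft.rfft(sample))
    return spectrum[:128] / (spectrum.max() + 1e-9)

def synthesize(sample: np.ndarray, text: str) -> np.ndarray:
    """Toy stand-in for the generation step: return a waveform whose
    length scales with the text, conditioned on the speaker prompt."""
    prompt = extract_speaker_prompt(sample)
    n_words = max(1, len(text.split()))
    duration = n_words * SAMPLE_RATE // 4  # assume ~0.25 s per word
    # Seed from the prompt so the same voice sample gives the same output.
    rng = np.random.default_rng(int(prompt.sum() * 1e6) % (2**32))
    return rng.standard_normal(duration).astype(np.float32)

# Usage: a 3-second sample plus target text yields a waveform.
sample = np.random.default_rng(0).standard_normal(3 * SAMPLE_RATE)
audio = synthesize(sample, "Hello from a cloned voice")
```

The point of the sketch is the data flow only: a short prompt is encoded once, and all subsequent synthesis is conditioned on it, which is what lets the system work from three seconds of audio rather than minutes of training material.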

VALL-E is based on a neural network trained on 60,000 hours of spoken English. Microsoft has not said whether the algorithm will be released to the public. More information about how it works can be found in the paper published on Cornell University's arXiv, and samples of the synthesized voices are available on GitHub.