TM-Speech Audio Samples

English Single-Speaker TTS

Dataset: LJSpeech
Checkpoint: link
Config: link
Vocoder: HiFi-GAN (LJSpeech)

long audio

FastSpeech2 Mamba-Transformer TM-Speech
Text: In this section, we first describe the design motivation behind TM-Speech and then introduce the architecture of our model. TM-Speech leverages Mamba’s exceptional capability in modeling one-dimensional sequences to enhance audio reasoning. This approach aims to simplify the training pipeline and improve speech quality for fully end-to-end text-to-waveform synthesis.

FastSpeech2 Mamba-Transformer TM-Speech
Text: In the quiet moments just before dawn, when the world holds its breath and the sky turns from deep indigo to soft gold, there’s a sense of magic in the air. The first light touches the horizon, painting the clouds in hues of pink and amber, as if nature itself is waking from a dream. Birds begin their morning songs, filling the silence with melody, and the air is cool, crisp, full of promise. It’s a reminder that each day is a new beginning, a fresh canvas upon which we can paint our hopes, dreams, and endless possibilities.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Artificial intelligence holds the potential to revolutionize nearly every aspect of human life. From healthcare to transportation, AI is unlocking new possibilities, improving efficiency, and solving complex problems with unprecedented speed. In the future, we may see AI-driven breakthroughs in medical diagnoses, climate change mitigation, and personalized education, transforming industries and enhancing quality of life. While the technology promises to create smarter cities and autonomous systems, it also challenges us to address ethical concerns about privacy, bias, and control. The future of AI is a blend of innovation and responsibility, a powerful tool shaping tomorrow’s world in ways we are just beginning to imagine.

short audio

FastSpeech2 Mamba-Transformer TM-Speech
Text: AI-generated art and music challenge traditional notions of creativity, blending human input with machine ingenuity.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Ethical concerns around bias, misinformation, and intellectual property remain key challenges for generative AI's future development.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Generative AI contributes to scientific research by predicting molecular structures or simulating complex systems.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Generative AI creates content, including text, images, and music, by learning patterns from vast data sets.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Generative AI models like GPT can write stories, articles, or code, mimicking human language fluently.

FastSpeech2 Mamba-Transformer TM-Speech
Text: In healthcare, it helps generate personalized treatment plans by analyzing patient data and medical research.

FastSpeech2 Mamba-Transformer TM-Speech
Text: It optimizes business processes by generating marketing content, emails, and product descriptions automatically.

FastSpeech2 Mamba-Transformer TM-Speech
Text: It powers innovations in art, design, and media by producing original and tailored outputs based on user prompts.

FastSpeech2 Mamba-Transformer TM-Speech
Text: Its potential in gaming allows the creation of dynamic, adaptive environments and characters that respond uniquely.

FastSpeech2 Mamba-Transformer TM-Speech
Text: This technology enhances creativity by collaborating with humans, offering new ideas and solutions in various fields.

reconstructed audio

FastSpeech2

LJ003-0066 (Ground-Truth) LJ003-0066 (Synthesized)
Text: As many various and, according to our ideas, heinous crimes came under this head.

LJ020-0058 (Ground-Truth) LJ020-0058 (Synthesized)
Text: Peep under the cloth two or three times to see whether they rise evenly, and turn the pan around once that all may be equally exposed to the heat.

LJ021-0142 (Ground-Truth) LJ021-0142 (Synthesized)
Text: Accordingly, I propose to confer within the coming month.

LJ024-0043 (Ground-Truth) LJ024-0043 (Synthesized)
Text: that I will appoint justices who will not undertake to override the judgment of the Congress on legislative policy.

LJ030-0059 (Ground-Truth) LJ030-0059 (Synthesized)
Text: was a metallic frame with four handholds that riders in the car could grip while standing in the rear seat during parades.

LJ038-0259 (Ground-Truth) LJ038-0259 (Synthesized)
Text: The oral report was negative because of the battered condition of the bullet.

Mamba-Transformer

LJ001-0074 (Ground-Truth) LJ001-0074 (Synthesized)
Text: the best, mostly French or Low-Country, was neat and clear, but without any distinction.

LJ003-0309 (Ground-Truth) LJ003-0309 (Synthesized)
Text: Proper hours for locking and unlocking prisoners should be insisted upon.

LJ005-0080 (Ground-Truth) LJ005-0080 (Synthesized)
Text: Accordingly due provision was made for the enforcement of hard labor on all prisoners sentenced to it, and for the employment of all others.

LJ007-0084 (Ground-Truth) LJ007-0084 (Synthesized)
Text: The inspectors in the following year, on examining the facts, found that some of these poor creatures had been in confinement for long periods.

LJ012-0058 (Ground-Truth) LJ012-0058 (Synthesized)
Text: When the journey was resumed, Mrs. Solomons accompanied her husband in the coach. Half-way to Newgate she was taken with a fit.

LJ021-0078 (Ground-Truth) LJ021-0078 (Synthesized)
Text: no economic panacea, which could simply revive over-night the heavy industries and the trades dependent upon them.

TM-Speech

LJ003-0172 (Ground-Truth) LJ003-0172 (Synthesized)
Text: The keeper went still further in his efforts to make money.

LJ018-0348 (Ground-Truth) LJ018-0348 (Synthesized)
Text: Henry Wainwright's impassioned denial of his crime, even after it had been brought fully home to him, has many parallels in the criminal records.

LJ027-0141 (Ground-Truth) LJ027-0141 (Synthesized)
Text: is closely reproduced in the life-history of existing deer. Or, in other words.

LJ042-0153 (Ground-Truth) LJ042-0153 (Synthesized)
Text: This was a fairly coherent description of life in that country, basically centered around the radio and television factory in which he worked.

LJ047-0051 (Ground-Truth) LJ047-0051 (Synthesized)
Text: particularly since his employment did not involve any sensitive information.

LJ050-0230 (Ground-Truth) LJ050-0230 (Synthesized)
Text: Manpower and Technical Assistance From Other Agencies.

Mandarin Multi-Speaker TTS

Dataset: AISHELL-3
Checkpoint: link
Config: link
Vocoder: HiFi-GAN (universal)

short audio

FastSpeech2 Mamba-Transformer TM-Speech
Speaker: row1-SSB0762, row2-SSB1863, row3-SSB0273, row4-SSB1161, row5-SSB0354
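Text: 日照香炉生紫烟,遥看瀑布挂前川,飞流直下三千尺,疑是银河落九天。
Translation: Sunlight on Incense Burner Peak raises a purple haze; from afar I watch the waterfall hang before the river. Its torrent plunges straight down three thousand feet, as if the Milky Way were falling from the ninth heaven.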

FastSpeech2 Mamba-Transformer TM-Speech
Speaker: row1-SSB0762, row2-SSB1863, row3-SSB0273, row4-SSB1161, row5-SSB0354
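Text: 轻轻的我走了,正如我轻轻的来,我轻轻的招手,作别西天的云彩,那河畔的金柳,是夕阳中的新娘。
Translation: Softly I am leaving, just as softly as I came; I softly wave goodbye to the clouds in the western sky. The golden willows by the riverside are brides in the setting sun.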

FastSpeech2 Mamba-Transformer TM-Speech
Speaker: row1-SSB0762, row2-SSB1863, row3-SSB0273, row4-SSB1161, row5-SSB0354
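Text: 桃树、杏树、梨树,你不让我,我不让你,都开满了花赶趟儿。红的像火,粉的像霞,白的像雪。
Translation: Peach trees, apricot trees, and pear trees, none yielding to the others, all burst into bloom as if racing one another. Red like fire, pink like rosy clouds, white like snow.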

FastSpeech2 Mamba-Transformer TM-Speech
Speaker: row1-SSB0762, row2-SSB1863, row3-SSB0273, row4-SSB1161, row5-SSB0354
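Text: 应该说今天的机器学习、人工智能在这方面是一个有力的工具,但在可预见的未来还无法替代人类。
Translation: It should be said that today's machine learning and artificial intelligence are a powerful tool in this respect, but they cannot replace humans in the foreseeable future.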

long audio

FastSpeech2 Mamba-Transformer TM-Speech
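Text: 语言模型可以分为三块:算力、数据和算法。所以语言模型也好,整个机器学习模型也好,本质上就是把数据通过算力和算法压进中间那个模型里面,使得模型有一定的能力,在面对一个新的数据时,它能够在原数据里面找到相似的东西,然后做一定的修改,输出你要的东西。
Translation: A language model can be divided into three parts: compute, data, and algorithms. So whether it is a language model or machine learning models as a whole, the essence is to compress data into the model in the middle by means of compute and algorithms, so that the model gains a certain capability: when it faces new data, it can find something similar in the original data, make some modifications, and output what you want.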