TTS AI - Demosophy.org

https://translate.google.com/?sl=auto&tl=fa&op=translate

https://www.openai.fm/

https://huggingface.co/spaces/hexgrad/Kokoro-TTS

https://huggingface.co/spaces/Xenova/kokoro-web

Voice cloning:
https://huggingface.co/spaces/mrfakename/E2-F5-TTS

https://speaches.ai/

playground

https://docs.puter.com/playground/ai-speech2txt/

Open-source AI TTS app

Out Loud is an open-source desktop application and browser extension framework designed explicitly to act as a local front end for Kokoro-82M. [1]

Developed in the light-cloud-com/out-loud GitHub repository, it provides a user-friendly layer over the model so you do not have to interact with Python scripts or command-line terminals. [1]

Key Features of Out Loud

100% Offline Processing: It bundles the Kokoro-82M model directly on your machine. Once installed, your text never leaves your computer or touches a cloud server. [1]
Desktop Interface: It integrates seamlessly as a menu-bar or system-tray utility for quick access. It includes an interactive “Talker Mode” where you can type or paste text to hear it instantly. [1]
Browser Extensions: It includes dedicated companion extensions for Google Chrome and Safari, allowing you to highlight text on any webpage and read it out loud using Kokoro in a single click. [1]
Local HTTP API: It starts a local server endpoint on port 51730. You can send text to it from your own scripts or automation tools to use Kokoro locally.
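
As a rough illustration of scripting against the local endpoint, here is a minimal Python sketch. Only the port number comes from the notes above. The route `/speak`, the JSON field names, and the voice name are assumptions made up for this example, so check the Out Loud documentation for the real API before relying on it.

```python
import json
import urllib.request

OUT_LOUD_PORT = 51730   # port mentioned in the notes above
SPEAK_PATH = "/speak"   # hypothetical route, not confirmed by the Out Loud docs


def build_speak_request(text: str, voice: str = "af_heart") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the local server."""
    # The payload field names ("text", "voice") are assumptions for illustration.
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://127.0.0.1:{OUT_LOUD_PORT}{SPEAK_PATH}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    This only works while Out Loud is running on the same machine.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

With the app running, `send(build_speak_request("Hello"))` would attempt the call. Because the address is 127.0.0.1, the request stays on the local machine.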

https://www.out-loud.io/

To compile from source on Windows:

Install the Windows LTS version of Node.js from the Node.js Downloads page.

During installation, make sure "Add to PATH" is checked.

git clone https://github.com/light-cloud-com/out-loud.git
cd out-loud
npm install
npm run electron-ui:install
npm run electron:dev
npm run electron:build:win

npm install does the following:

Reads package.json.
Downloads Electron, React, and all required sub-packages.
Creates a folder named node_modules and stores the downloaded packages there.
Creates or updates package-lock.json, which records the exact versions installed.

npm run electron:dev
executes a script defined by the project’s developer in the package.json file.
It does not have a universal meaning. Its meaning depends entirely on what the Out Loud developers put into package.json.
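
To see what a given `npm run <name>` actually does in a project, you can read the `scripts` section of its package.json. The Python sketch below does that. The function names are made up for this example, and the sample command in the usage note is a placeholder, not the real Out Loud script.

```python
import json
from pathlib import Path


def scripts_from_package_json(text: str) -> dict[str, str]:
    """Return the script names and commands defined in package.json text."""
    data = json.loads(text)
    # A package.json without a "scripts" section defines no npm run targets.
    return data.get("scripts", {})


def describe_scripts(text: str) -> list[str]:
    """Format each script as 'npm run <name>  ->  <command>'."""
    return [
        f"npm run {name}  ->  {command}"
        for name, command in scripts_from_package_json(text).items()
    ]


def describe_project(project_dir: str) -> list[str]:
    """Read package.json from a project folder and describe its scripts."""
    package_json = Path(project_dir) / "package.json"
    return describe_scripts(package_json.read_text(encoding="utf-8"))
```

Running `print("\n".join(describe_project(".")))` inside the cloned out-loud folder would list every script name next to the command it runs, which is the only reliable way to know what `electron:dev` means for this project.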

For Out Loud, npm run electron:dev launches the application in development mode.
Development mode usually means:
The app runs directly from source code.
Debugging information is shown.
You can modify files and test changes immediately.
No installer is created.

npm run electron:build:win
Compiles/packages the app.
Creates a Windows installer.
Does not usually launch the app.
Produces files you can distribute to other users.
The releases subdirectory of the current directory will contain a portable version and an installer.

Any change must be made in the .ts source files, which are then compiled to JavaScript.

After npm run electron:build:win finishes, you can simply run:

releases/windows/win-unpacked/Out Loud.exe

The pause settings are in tts-worker.ts, in the electron folder of the repository (on my machine, C:\amir\out-loud\electron).

=-=-==-=-=-=-=

https://dev.to/wonderlab/open-source-project-of-the-day-part-11-supertonic-lightning-fast-on-device-multilingual-tts-50hp

Supertonic 3 (TTS) – a Hugging Face Space by Supertone

https://huggingface.co/spaces/Supertone/supertonic-2

Multiple projects are built on Supertonic:

TLDRL: Chrome extension, free on-device TTS that can read any webpage aloud
Read Aloud: Open-source TTS browser extension supporting Chrome and Edge
PageEcho: iOS e-book reader app
VoiceChat: On-device voice-to-voice LLM chatbot in the browser
OmniAvatar: Generate talking avatar videos from photos and voice
CopiloTTS: Kotlin multiplatform TTS SDK
Voice Mixer: PyQt5 tool for mixing and modifying voice styles
Supertonic MNN: Lightweight library based on MNN (fp32/fp16/int8)
Transformers.js: Hugging Face’s JS library with Supertonic support
Pinokio: One-click local cloud for Mac, Windows, and Linux

=-=-=-=-=-=-=-=-

Supertonic 3

It needs an internet connection and a browser with WebGPU support (Chrome 113+ or Edge 113+), but it runs entirely in your browser, providing fast and private operation without sending any data to external servers.

It has voice cloning.

https://huggingface.co/spaces/Supertone/supertonic-3

Supertonic 3 (TTS) – a Hugging Face Space by Supertone

=-=-=-=-=-=-=-=-==-

Voicebox

Voicebox is a very powerful offline GUI for many models. It downloads their engines and voices, then runs them locally with many effects and tools.

It effectively clones my voice, or any voice for which I have a 30-second audio file.

https://voicebox.sh/

Only two of the models are useful:

LuxTTS (fast, CPU-friendly)
- Speed category: ultra-fast. It is optimized for local execution, reaches up to 150x real-time generation speed, and runs efficiently even on basic CPU hardware. [1]
Kokoro 82M
- Speed category: extremely fast. A lightweight 82-million-parameter model with a real-time factor (RTF) below 0.04, so it generates audio almost instantly. [1, 2]

The others take minutes just to load the model.
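
The two speed figures above use different units, so a small conversion helps compare them. The sketch below assumes the usual definition of real-time factor, RTF = generation time / audio duration, which makes "N times real time" equal to an RTF of 1/N. The function names are made up for this example.

```python
def rtf_from_speedup(speedup: float) -> float:
    """Convert 'N times real time' into a real-time factor (RTF)."""
    return 1.0 / speedup


def speedup_from_rtf(rtf: float) -> float:
    """Convert a real-time factor into 'N times real time'."""
    return 1.0 / rtf


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long it takes to generate a clip of the given length."""
    return audio_seconds * rtf


# An RTF of 0.04 corresponds to about 25x real time, so one minute of
# audio would take roughly 2.4 seconds to generate.
print(speedup_from_rtf(0.04), generation_seconds(60, 0.04))

# 150x real time corresponds to an RTF of about 0.0067.
print(rtf_from_speedup(150))
```

These numbers cover generation only. They do not include the model loading time, which is what makes the other engines slow to start.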

=-=-=-=-=-=-=-=-=-=-=-

Open-weight means you can download the model’s brain (the weights), but the creators keep the cooking recipe (datasets and code) private.

The Spectrum of AI Openness

Closed-Source: Everything is secret. You can only use it via an API (e.g., OpenAI’s ChatGPT).
Open-Weight: You download and run the full model locally. You can fine-tune it. However, you do not get the training data or the exact source code used to train it (e.g., Meta’s Llama 3, Mistral, or Kokoro-82M).
Open-Source: Everything is free and public. This includes weights, training data, training code, and evaluation sets.

Open-Weight & Open-Source TTS Alternatives

Model	Parameters	Category	Core Benefit
Supertonic	66M	Open-Weight	10x faster than Kokoro on basic CPUs
Kokoro-82M	82M	Open-Weight	High efficiency with exceptional voice quality
Piper	Very Small	Open-Source	Maximum compatibility on tiny edge hardware
Chatterbox-Turbo	350M	Open-Source	ElevenLabs-level quality and cloning
Fish Audio S2 Pro	~500M	Open-Weight	Natural language inline emotion controls
MeloTTS	Variable	Open-Source	Seamless mixed-language text synthesis

To run these models completely offline, you need to download two components onto your local machine: the model weights (the .pth, .onnx, or .safetensors file) and the inference code (the Python/C++ library). Once saved locally, they require zero internet access to synthesize speech.

How to Run Them 100% Offline

Supertonic / Kokoro-82M: Download the model files from Hugging Face once. Then point your script to the local folder path instead of the Hugging Face repository ID, so that it makes no internet requests [5].
Piper: Best for pure offline edge devices (like Raspberry Pi). It compiles into a single standalone C++ binary with .onnx voice files. It does not require Python or external dependencies to run.
ONNX Runtimes: Convert any of these models to ONNX format. This allows you to run them offline using lightweight runtimes in C++, Rust, or Go without loading heavy machine-learning frameworks.
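
Before going offline, it helps to confirm that the weight files are really on disk. The Python sketch below scans a local model folder for the file types named above and sets `HF_HUB_OFFLINE`, the environment variable that the huggingface_hub library reads to skip network calls. The folder layout and function names are assumptions for illustration, and each model's own loader may need further settings.

```python
import os
from pathlib import Path

# Weight file extensions mentioned in the notes above.
WEIGHT_SUFFIXES = {".pth", ".onnx", ".safetensors"}


def find_weight_files(model_dir: Path) -> list[Path]:
    """Return every weight file found under a local model folder."""
    return sorted(
        path
        for path in model_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in WEIGHT_SUFFIXES
    )


def enable_offline_mode() -> None:
    """Ask huggingface_hub to avoid network requests.

    Set this before importing any library that uses the Hub.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"


def check_local_model(model_dir: str) -> list[Path]:
    """Fail early if the local folder holds no weights, then go offline."""
    weights = find_weight_files(Path(model_dir))
    if not weights:
        raise FileNotFoundError(f"No model weights found in {model_dir}")
    enable_offline_mode()
    return weights
```

A call such as `check_local_model("models/kokoro")` (a hypothetical path) returns the weight files it found, and that folder path is what you would then pass to the model's loader instead of a repository ID.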

Service	Lowest Paid Plan	Free Plan	What Paid Adds Compared to Free
ElevenLabs	$6/mo	10k credits/month, basic TTS, limited projects, no commercial use	30k credits/month, commercial license, instant voice cloning, dubbing studio, 20 projects instead of 3 (ElevenLabs)
NaturalReader	$9.92/mo (annual billing)	Free reading, limited premium voices, limited exports	AI voices, larger export allowance, more voice choices, file support, OCR, higher character limits (NaturalReader)
Speechify	~$11–12/mo (annual)	Limited voice quality and features	Premium voices, faster reading, larger export allowances, offline features (The Speakr)
PlayHT	~$19/mo	Limited monthly characters and voices	More characters, premium voices, voice cloning, commercial rights (AnySpeech)
Murf AI	~$19–29/mo	Limited generation and exports	More generation minutes, commercial rights, voice editing tools (TechRadar)
Fish Audio	~$9–15/mo	Limited credits	More generation credits, cloning, commercial use (plan-dependent) (AnySpeech)
TTS Studio AI	~$10/mo	Limited monthly generation	Larger character quota and additional voices (AnySpeech)
Voice.ai	~$5/mo	Limited credits and voice access	More credits, premium voices, additional voice tools (AnySpeech)
SpeechGeneration AI	~$5/mo	Limited monthly generation	Higher character limits and additional voice options (AnySpeech)
Deepgram Aura Studio	~$19/mo	Limited usage	Higher monthly generation quotas and premium voices (AnySpeech)

https://openai.com/api/

https://openai.com/api/pricing/

Rough token estimate

For English:

1 token ≈ ¾ of a word
100 words ≈ 130–150 tokens
1 page of text ≈ 500–800 tokens

The exact count depends on the language and content.

So, in short, on the pricing page:

Input column = what you send to the model.
Output column = what the model sends back.
You are billed separately for each.
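
The rule of thumb and the separate input/output billing can be combined into a quick estimate. This Python sketch uses the ¾-word-per-token approximation from above. The prices in the example are placeholders, not real OpenAI rates, so take the actual per-million-token prices from the pricing page.

```python
WORDS_PER_TOKEN = 0.75  # rough English average from the estimate above


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count of English text from its word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request, billing input and output separately."""
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


# 100 words come out at about 133 tokens, inside the 130-150 range above.
print(estimate_tokens(100))

# Placeholder prices: 2.00 per million input tokens, 8.00 per million output tokens.
print(estimate_cost(1_000_000, 500_000, 2.0, 8.0))
```

The estimate is only a planning aid, since the exact count depends on the language and content. Persian text, for example, usually needs more tokens per word than English.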