Hello everyone! Today’s AI news is packed with exciting updates.
PrintMon Maker
This new service lets you turn text or images into 3D-printable models. It’s not just 3D graphics—it creates STL files (the type 3D printers use). We’ve come a long way! While 3D graphics and printable models might look similar, they are quite different in how they work. Converting between the two is tricky and often requires a lot of fixing.
https://forum.bambulab.com/t/introducing-makerlab-printmon-maker/95537
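PrintMon Maker’s internals aren’t public, but to see why printable models differ from ordinary 3D graphics, it helps to look at what an STL file actually contains: just triangles with normals, no colors, textures, or scene data. Here’s a minimal pure-Python sketch (a toy writer, not PrintMon’s actual pipeline) that serializes a triangle list to ASCII STL:

```python
# Toy illustration of the ASCII STL format: each facet is a unit normal
# plus three vertices. Real services output far denser meshes, but the
# file structure is exactly this.

def facet_normal(a, b, c):
    """Unit normal of triangle (a, b, c) via the cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = sum(x * x for x in n) ** 0.5 or 1.0
    return [x / length for x in n]

def write_ascii_stl(triangles, name="model"):
    """Serialize triangles (each three (x, y, z) tuples) to ASCII STL text."""
    lines = [f"solid {name}"]
    for a, b, c in triangles:
        n = facet_normal(a, b, c)
        lines.append(f"  facet normal {n[0]:.6f} {n[1]:.6f} {n[2]:.6f}")
        lines.append("    outer loop")
        for v in (a, b, c):
            lines.append(f"      vertex {v[0]:.6f} {v[1]:.6f} {v[2]:.6f}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines)

# A single triangle in the XY plane is enough to see the structure.
stl_text = write_ascii_stl([((0, 0, 0), (1, 0, 0), (0, 1, 0))])
```

Notice there is nowhere to put materials or lighting, and a printable mesh must additionally be watertight, which is exactly the kind of cleanup that makes converting graphics assets to print files tricky.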
F5-TTS
F5-TTS is the latest open-source model that turns text into speech. It’s faster than older models and produces more natural, expressive voices. Unlike earlier models, you don’t need a complex setup to use it—pretty straightforward!
https://huggingface.co/papers/2410.06885
https://github.com/SWivid/F5-TTS
https://huggingface.co/SWivid/F5-TTS
SwarmUI 0.9.3
SwarmUI is a user-friendly interface for generating images using AI models like Stable Diffusion and Flux. It’s designed for both beginners and advanced users, and soon, it will support video and audio generation too. It’s still in beta, but it’s already looking promising!
https://github.com/mcmonkeyprojects/SwarmUI
FLORA
FLORA is a platform where you can access many advanced AI models in one editor. It’s not just for still images—you can also create videos, upscale images, and design your own workflows with an intuitive interface.
Ovis1.6-Gemma2-9B
This lightweight yet powerful vision model works well with the Japanese language too. You can ask questions in Japanese, and it responds in Japanese, which is great for Japanese-speaking users.
https://huggingface.co/spaces/AIDC-AI/Ovis1.6-Gemma2-9B
https://github.com/AIDC-AI/Ovis
See2Sound
See2Sound generates spatial audio to match images, animations, or videos. This means the audio changes in direction and depth, making it sound more realistic, especially on a surround sound system.
https://github.com/see2sound/see2sound
https://huggingface.co/spaces/jadechoghari/see-2-sound
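See2Sound’s actual model is in the repo above and handles full 5.1 surround; as a much simpler sketch of what “direction” means in audio, here’s constant-power stereo panning in pure Python (illustrative only, not See2Sound’s method):

```python
import math

# Toy sketch of directional audio via constant-power stereo panning.
# pan = -1.0 is hard left, 0.0 is center, +1.0 is hard right.

def pan_gains(pan):
    """Constant-power pan law: (left, right) gains from pan in [-1, 1]."""
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] -> [0, pi/2]
    return math.cos(angle), math.sin(angle)

def pan_sample(sample, pan):
    """Apply panning to one mono sample, returning (left, right)."""
    gl, gr = pan_gains(pan)
    return sample * gl, sample * gr

# An object on the left side of the frame sends most energy to the left channel.
left, right = pan_sample(1.0, -0.5)
```

The “constant power” property (left² + right² = 1) keeps perceived loudness steady as a sound source moves across the stereo field—the same basic idea, generalized to more channels and depth cues, is what makes spatial audio feel realistic.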
FLUX.1-Turbo-Alpha
FLUX.1-Turbo-Alpha is an 8-step distillation LoRA for the FLUX.1-dev model. As a LoRA it’s small, and by cutting the number of sampling steps it makes generation much faster while keeping quality high—a win-win!
https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha
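The speedup comes from needing fewer denoising steps: each step is a full forward pass through the model. As a rough illustration (a generic linear schedule, not FLUX’s actual sampler), here’s how a timestep schedule shrinks when going from a typical 25-step run to the distilled 8-step one:

```python
# Illustrative only: diffusion samplers walk a schedule of timesteps from
# high noise (t near 1000) down toward 0. A step-distilled LoRA is trained
# so that far fewer steps still yield a clean image. The spacing below is
# a generic linear schedule, not the exact one FLUX uses.

def timestep_schedule(num_steps, t_max=1000):
    """Evenly spaced timesteps from t_max down toward 0."""
    return [round(t_max - i * t_max / num_steps) for i in range(num_steps)]

baseline = timestep_schedule(25)  # a typical full-quality sampling run
turbo = timestep_schedule(8)      # the distilled 8-step run

# Each skipped step is one fewer forward pass, so the denoising loop
# runs roughly 3x faster at 8 steps than at 25.
```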
v0 Latest Update
v0 now edits files up to five times faster by only scanning for the changes that need to be made. It’s a huge time saver!
SimpleTuner v1.1.2
SimpleTuner, the fine-tuning tool for Flux models, has just been updated. It now supports mask loss training, similar to tools like OneTrainer and Kohya. There’s also a new normalization technique, which was introduced in the Dreambooth guide.
https://github.com/bghira/SimpleTuner
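Mask loss training restricts the training loss to the subject region, so the background of your training photos doesn’t get memorized. SimpleTuner’s real implementation operates on latent tensors in PyTorch; this pure-Python toy (my sketch, not its code) just shows the core arithmetic:

```python
# Sketch of the idea behind masked loss: per-pixel squared error is
# weighted by a 0/1 mask, so only masked (subject) pixels contribute,
# and the mean is taken over masked pixels only.

def masked_mse(pred, target, mask):
    """Mean squared error over only the pixels where mask == 1."""
    assert len(pred) == len(target) == len(mask)
    weighted = [(p - t) ** 2 * m for p, t, m in zip(pred, target, mask)]
    denom = sum(mask)
    return sum(weighted) / denom if denom else 0.0

# The huge error at the last (unmasked) pixel is ignored entirely.
loss = masked_mse(pred=[1.0, 2.0, 9.0],
                  target=[1.0, 1.0, 0.0],
                  mask=[1, 1, 0])
```

Dividing by the mask sum rather than the total pixel count matters: otherwise images with small masks would contribute artificially small losses.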
Pyramidflow on ComfyUI
Pyramidflow is a high-performance video generation AI, and it can now be controlled through ComfyUI via a custom node. It does require a fair amount of VRAM (usually 24GB–40GB), though there are ways to bring that usage down.
https://github.com/AIFSH/PyramidFlow-ComfyUI
Creative AI Examples
Lastly, here’s an impressive use of AI that’s been buzzing around X (formerly Twitter). Someone took a photo on their iPhone and turned it into a video with Luma AI’s image-to-video generation technology.
That’s all for today’s AI news!