Ai News:Meta Movie Gen Benchmarks,OpenAI GPT-4o-Audio-Preview,AI Discovers 70,000 New Viruses and more

Meta’s New Movie Gen Benchmarks Simplify Video and Audio AI

Meta has released Movie Gen Bench, a set of tools to help researchers improve AI in creating videos and audio. The release includes two main parts:

Movie Gen Video Bench: The biggest benchmark yet for generating videos from text prompts.

Movie Gen Audio Bench: A way to test AI models that generate sound based on video or text+video inputs.

These benchmarks make it easier to measure how good AI models are at creating media.

As detailed in the Meta Movie Gen technical report, today we’re open sourcing Movie Gen Bench: two new media generation benchmarks that we hope will help to enable the AI research community to progress work on more capable audio and video generation models.

Movie Gen Video Bench… pic.twitter.com/66aOaIGwwR
— AI at Meta (@AIatMeta) October 17, 2024

New GPT-4o-Audio Model for Generating Voices

OpenAI introduced a GPT-4o-audio-preview model. This tool uses creative prompts to generate different voices and speaking styles. It aims to make audio production more flexible and shows how AI can now handle a wide range of audio tasks. You can find more details about it on Twitter.

I can confirm that with system prompt engineering and a high temperature, OpenAI's new gpt-4o-audio-preview model can be instructed to generate voices and any vocal style. https://t.co/Zfxc58Mb5x pic.twitter.com/qL2DgbnPri
— Max Woolf (@minimaxir) October 17, 2024

Faster AI Reasoning with Shortcut Models

Shortcut models are making AI reasoning quicker—up to 128 times faster. Unlike older models, these don’t need extra training steps, making them much easier to use. They are designed to replace more complex AI systems while improving efficiency.

*Shortcut models* are a plug-and-play replacement for diffusion models that can generate in a single step (or more). This speeds up inference by up to 128x.

Shortcut models are trained end-to-end, and do not require a separate distillation phase or learning schedules. pic.twitter.com/De3eOJsK2y
— Kevin Frans (@kvfrans) October 18, 2024

AI Finds 70,000 New Viruses

A new AI tool scanned biological data and discovered 70,000 unknown viruses. This exciting breakthrough could help us learn more about viruses and advance the study of virology. It also shows how AI can help us understand biology better

“AI scans RNA ‘dark matter’ and uncovers 70,000 new viruses”

Although not in this research data, AI will show that we misunderstood and misidentified viruses.

This landmark is the first step. https://t.co/c3F8mUf1dV
— Brian Roemmele (@BrianRoemmele) October 17, 2024

As I scan VHS tapes to train YOUR AI I find stuff that I am at a full loss to understand. pic.twitter.com/5OPWUgRjFc
— Brian Roemmele (@BrianRoemmele) October 18, 2024

Hugging Face Fixes Transformers Issue

Hugging Face developers, including Zach Mueller, fixed a big problem with gradient accumulation in the Transformers library. This update helps AI models train better by fixing how loss is calculated. The fix is now live on GitHub..

The gradient accumulation fix is now in the main branch of transformers!

Thank you to the entire @huggingface team, especially @TheZachMueller and @art_zucker for collabing with us to fix it! 🤗🦥 https://t.co/MzcDIJJwB0
— Daniel Han (@danielhanchen) October 17, 2024

Postmortem: @UnslothAI Gradient Accumulation Report, @huggingface`transformers`, and You!

First, what went wrong.

A great visualization of this issue by @shxf0072is attached, essentially calculating the loss individually without taking into account when sequence lengths… pic.twitter.com/rM8wqg5gBV
— Zach Mueller (@TheZachMueller) October 17, 2024

Meta’s Spirit LM: New Speech-Integrated Language Model

Meta launched Spirit LM, a tool that mixes speech with text, overcoming the limits of older speech recognition tools. By focusing on phonemes, pitch, and tone, this model is set to improve speech-based tasks like transcription and text-to-speech.

Meta Spirit LM: open source language model that mixes text and speech. https://t.co/gVtqE1Hf09
— Yann LeCun (@ylecun) October 18, 2024

Open Materials 2024 (OMat24)

Meta released OMat24, a dataset for predicting material properties. It’s freely available for both commercial and non-commercial use, promoting open science. This dataset aims to help researchers and companies explore new material possibilities.

Meta Open Materials 2024:
Dataset and models for material property prediction. https://t.co/Xz6Ry2twht
— Yann LeCun (@ylecun) October 18, 2024

AI Training Data Crisis

Brian Roemmele raised concerns about a loss of training material for AI due to old VHS media becoming obsolete. He warned that today’s AI models rely heavily on platforms like Reddit and Facebook, which could lead to a narrow view of human experiences.

AgentOccam: AI Automating Web Tasks

AgentOccam is a new tool that uses large language models to automate tasks on websites without training. It performs better than earlier systems, proving that AI can become more efficient in web-based tasks.

👾 Introducing AgentOccam: Automating Web Tasks with LLMs! 🌐 AgentOccam showcases the impressive power of Large Language Models (LLMs) on web tasks, without any in-context examples, new agent roles, online feedback, or search strategies. 🏄🏄🏄
🧙 Link: https://t.co/s6GPYFAEFf… pic.twitter.com/EG9syQFzDV
— Ke Yang (@EmpathYang) October 18, 2024

AI-Generated Spider Webs in Toy Story 4

Pixar used AI to create spider webs for scenes in Toy Story 4’s antique mall. This made the animation process much faster. Only webs directly interacted with by characters needed human input—everything else was automatically generated.

For #ToyStory 4, instead of manually creating cobwebs for their Antique Mall environment, @Pixar created AI spiders which would weave realistic cobwebs for them like a real spider.

You can see the red dots which are the AI spiders as they weave cobwebs in real-time. This… pic.twitter.com/MuAjHTG47g
— Rassoul Edji (@RassoulEdji) October 18, 2024

MEGA-Bench for Multimodal AI Models

MEGA-Bench introduces an evaluation system covering over 500 different AI tasks. This benchmark helps researchers assess how well multimodal models (those handling images, text, and more) perform across diverse tasks.

https://tiger-ai-lab.github.io/MEGA-Bench

https://arxiv.org/abs/2410.10563

SambaNova and Gradio Expand AI Access

SambaNova and Gradio are working together to make high-speed AI tools available to everyone. Their goal is to make advanced AI easier to use, empowering both individuals and businesses.

SambaNova and Gradio are making high-speed AI accessible to everyone—here’s how it works https://t.co/KHN9ByYy6E
— VentureBeat (@VentureBeat) October 17, 2024

NotebookLM Business Customization Tools

The NotebookLM team introduced new features that let users customize audio summaries. They also launched a business version for organizations through Google Workspace, giving teams advanced AI tools for collaboration.

https://twitter.com/omarsar0/status/1847084938803175873

OpenAI’s Residency Program Now Open

OpenAI is offering a residency program for people from non-traditional backgrounds who want to work on AI. This is a chance for curious learners to gain hands-on experience in AI development. Applications are open on OpenAI’s website.

if you're from an unconventional background and want to work on ai, consider applying to the OpenAI residency.
you should be:
– pumped about building true ai
– not afraid of large complex codebases or hard infra problems
– excited to learn fast, dive deephttps://t.co/jqtscGnG6h
— will depue (@willdepue) October 17, 2024

MultiUI: Better Visual Understanding for AI

MultiUI provides a huge dataset to help AI models improve their understanding of web interfaces and documents. It uses text along with screenshots to boost models’ ability to read and interact with different kinds of digital content.

https://arxiv.org/abs/2410.13824

https://arxiv.org/pdf/2410.13824

AGI Milestone Announcement

Yam Peleg recently hinted that Artificial General Intelligence (AGI) might have been achieved, posting a cryptic tweet with symbolic art. While the details remain unclear, this has sparked curiosity in the AI community.

|￣￣￣￣￣￣￣￣￣￣￣￣￣￣￣￣￣|
| AGI has been achieved internally |
|＿＿＿＿＿＿＿＿＿＿＿＿＿＿＿＿＿|
(•◡•) /
/
——
| |
|_ |_
— Yam Peleg (@Yampeleg) October 18, 2024

Hugging Face & GitHub: AI and Technology Innovation Simplified

Janus: This is a cool new tool that helps both understand and create things like images or text using AI. It’s more flexible because it separates how it looks at pictures and text. It’s based on a powerful AI model called DeepSeek-LLM-1.3b-base, which works with a huge collection of 500 billion text tags. This means it can do more than older models and understand a lot of different kinds of information.

https://huggingface.co/deepseek-ai/Janus-1.3B

CS-Notes: If you’re getting ready for a tech interview or just want to brush up on computer science basics, check out CS-Notes on GitHub. It’s a big collection of notes covering important topics like algorithms, operating systems, and system design. It’s a great tool for anyone wanting to get a job in tech.

https://github.com/CyC2018/CS-Notes

Papermark: Want a secure way to share documents online? Papermark is an open-source tool that lets you do that. You can use custom web addresses and get stats on who’s viewing your documents. It’s made with tools like Next.js and TypeScript, and it’s perfect for people or businesses who need to safely share files online.

https://github.com/mfts/papermark

Unkey: Managing APIs can be tricky, especially when it comes to security. Unkey is an open-source project that helps developers handle API authentication and permissions. It’s also open for the community to contribute to its development.

https://github.com/unkeyed/unkey

Reddit Discussions: Cool AI Pony Models

A Reddit post caught attention with a super realistic animated pony model. Here are some things discussed:

Video Creation: People talked about using tools like PONY and others like Kling or Runway to make these animations.

Animation Struggles: Some users had a hard time getting the animations just right and asked if they should try different tools.

Visual Issues: Some folks thought the pony’s unblinking eyes and stiff face were a little creepy, but everyone agreed it looked real.

Furry Community: There was a brief chat about how productive the furry community is when it comes to animation, but some felt the need to keep certain topics separate.

This shows how AI tools are helping create super detailed animations, but there’s still room to make them feel even more lifelike.

What is ComfyUI Outpainting?

Ever wanted to take a picture and make it bigger, like painting on a larger canvas? That’s what Outpainting does! It lets you expand a picture beyond its edges, and AI fills in the new parts. This is great for making comic panels or larger banner images.

For example, if you want to add more space around an image, Outpainting will add the new area based on what’s already in the picture, making it look like a natural extension.

Easy Steps to Use Outpainting in ComfyUI

Let’s break it down step by step:

Start your workflow: Click on “Load Default” in the menu to get the basic setup ready.

Upload your picture: Choose a picture from your computer to expand.

Add a “Pad Image” node: This lets you extend the picture. Connect the dots so that your image is ready to grow beyond its original borders.

And just like that, your picture will now have a new area based on what’s already in the image! Super helpful for creating bigger, more detailed visuals.

This guide makes it simple to try out Outpainting and add extra depth to your projects.

Realistic Pony Models are Wild
byu/Impressive_Alfalfa_6 inStableDiffusion