AI News: OpenAI o1, RunwayML on Safety, Video Enhancements, and More

Today, I’ve got some awesome updates from the AI world that you’ll definitely want to hear about. Let’s dive right in!

Sully’s Workflow with OpenAI o1

Sully has shared the workflow he uses to get the most out of OpenAI’s o1. If you’ve tried o1 and found it tricky, you’re not alone. His main point is that o1 is very capable but needs context for everything, so going back and forth with it the way you would with GPT-4o is a waste of time. Instead, build the context up front: prepare a long, detailed document, hand it to another model to tighten it up, and only then pass it to o1, where the magic really happens. Pretty smart, right?

a lot of people are finding o1 a bit tricky to use

So i made a video about the workflow i use pretty often

main lessons:
– o1 is super smart, but it needs context for everything
– Don't use it like gpt4o (back & forth) its a waste of time
– build context (voice mode), create a… pic.twitter.com/PNiSIVi5P3
— Sully (@SullyOmarr) October 7, 2024

Supervision 0.24.0 – Line Crossing Counts by Category

Supervision 0.24.0 is out, and it adds a long-requested feature: counting line crossings per class. Getting started looks quick, too. The author says his demo took barely 30 minutes to build with the library. For more details, check it out on GitHub.

supervision-0.24.0 is out! you can finally count per-class line crossings. many of you have been asking for this, now we have it!

it took me barely 30 minutes to make this demo using supervision!

link: https://t.co/xXMRaS4ejS pic.twitter.com/kMlNWmSs7H
— SkalskiP (@skalskip92) October 7, 2024

https://github.com/roboflow/supervision
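
If you’re wondering what "per-class line crossing" boils down to, here is a minimal plain-Python sketch of the idea. It does not use Supervision’s own API. The names `side_of_line` and `count_crossings`, and the toy track data, are hypothetical, and the line is treated as infinite to keep things short.

```python
from collections import defaultdict


def side_of_line(point, line_start, line_end):
    """Sign of the 2D cross product: >0 on one side of the line, <0 on the other, 0 on it."""
    (px, py), (x1, y1), (x2, y2) = point, line_start, line_end
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)


def count_crossings(frames, line_start, line_end):
    """Count line crossings per class label.

    frames: list of frames; each frame is a list of (track_id, class_name, (x, y)).
    Returns two dicts mapping class_name -> count, one per crossing direction.
    Assumption: the line is infinite, and track IDs are stable across frames.
    """
    last_side = {}                        # track_id -> last non-zero side seen
    to_positive = defaultdict(int)        # moved from the negative to the positive side
    to_negative = defaultdict(int)        # moved from the positive to the negative side
    for detections in frames:
        for track_id, class_name, center in detections:
            side = side_of_line(center, line_start, line_end)
            if side == 0:
                continue                  # exactly on the line: decide on a later frame
            previous = last_side.get(track_id)
            if previous is not None and (previous > 0) != (side > 0):
                if side > 0:
                    to_positive[class_name] += 1
                else:
                    to_negative[class_name] += 1
            last_side[track_id] = side
    return dict(to_positive), dict(to_negative)


# Toy data: a vertical line at x = 5 and three tracked objects over three frames.
frames = [
    [(1, "car", (2, 3)), (2, "person", (8, 1)), (3, "car", (1, 7))],
    [(1, "car", (4, 3)), (2, "person", (6, 1)), (3, "car", (3, 7))],
    [(1, "car", (7, 3)), (2, "person", (3, 1)), (3, "car", (6, 7))],
]
to_positive, to_negative = count_crossings(frames, line_start=(5, 0), line_end=(5, 10))
print("to positive side:", to_positive)
print("to negative side:", to_negative)
```

The key design choice is keeping one counter per class label instead of a single total, which is the part the release notes highlight. In a real pipeline the track IDs and class names would come from a detector and tracker rather than a hand-written list.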

LeLaN – Learning Navigation Strategies from Real Videos

Next up, we’ve got LeLaN, a new project from UC Berkeley and Toyota North America. This one’s all about teaching robots to understand and follow language instructions. By using videos from YouTube and pre-trained models, LeLaN allows robots to learn how to navigate in real-life situations, which is pretty cool!

Our new paper on using YouTube videos to learn language conditioned navigation is out! By leveraging pretrained models and video data mined from the web, we can get robots to better understand language instructions. https://t.co/viydd3lOg5
— Sergey Levine (@svlevine) October 7, 2024

[Video] Runway – Gen-3 Alpha Turbo Image to Video

Gen-3 Alpha Turbo now allows you to specify the first and last frames, and also supports vertical aspect ratios.

You can now provide both first and last frame inputs for Gen-3 Alpha Turbo. Available for all users on web in both horizontal and vertical aspect ratios. pic.twitter.com/JsdEqGYf3P
— Runway (@runwayml) October 8, 2024

[Video] Hailuo AI – Image to Video

Hailuo, a video generation AI, now supports Image to Video. It’s currently free to generate, so if you’re interested, give it a try.

🌟 The wait is finally over ——We are excited to announce the launch of our Image-to-Video feature! 🎬✨

What distinguishes Hailuo's Image2Video experience?
– Text-and-image joint instruction following: Hailuo seamlessly integrates both text and image command inputs, enhancing… pic.twitter.com/ke59EjtmBB
— Hailuo AI (@Hailuo_AI) October 8, 2024

[Video] HeyGen – Avatar 3.0 with Unlimited Looks

HeyGen is top class when it comes to avatars. With Unlimited Looks, you can now change the camera angle, outfit, and even the pose. At this point, I can’t even tell if it’s an AI avatar…

HeyGen just dropped Avatar 3.0 with Unlimited Looks.

Now anyone can clone themselves with AI and unlock multiple poses, outfits, and camera angles.

Here's how: pic.twitter.com/8bFSP9E2D6
— Min Choi (@minchoi) October 7, 2024

Imrat’s Top Moments from Lex Fridman’s Podcast

Imrat shared his favorite moments from Lex Fridman’s chat with the Cursor team. If you’re into AI programming, this episode dives deep into some juicy details. It’s a must-watch for anyone who uses AI tools like Claude, o1, GPT Engineer, and more. You might just pick up a new trick or two.

RunwayML on Safety in Generative Models

RunwayML is making sure that as AI tools get smarter, they’re also safe and fair to use. They’re rolling out new safeguards to help prevent misuse of generative models. Whether you’re an artist, a creator, or just someone exploring the possibilities, it’s good to see companies taking responsibility for the impact of their tools.

As we continue to build General World Models that advance human creativity, support artists, and augment media and entertainment industries, we have further deepened our sense of responsibility to build tools that have a net positive impact on the world.

Today, we are sharing…
— Runway (@runwayml) October 7, 2024

https://runwayml.com/research/foundations-for-safe-generative-media

Signal’s VideoGuide – Better Video Diffusion without Training

Signal has introduced VideoGuide, a way to improve video diffusion models. It keeps visual quality high while making generated videos more consistent, and it does this without any extra training. If you’re into text-to-video generation, this could be a game changer.

https://videoguide2025.github.io

Differential Transformer

Transformers just got an upgrade with the Diff Transformer. It computes attention as the difference between two softmax attention maps, which amplifies attention to the relevant context and cancels out noise from the irrelevant parts. According to the paper, this helps with things like long-context learning and text summarization, and it also reduces hallucinations in AI responses.

https://arxiv.org/pdf/2410.05258
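
To see why subtracting one attention map from another can cancel noise, here is a tiny plain-Python sketch for a single query. It is a simplification under stated assumptions: the scores are hand-picked instead of coming from query and key projections, `lam` is a fixed number (the paper learns it), and the multi-head structure and normalization are left out. The function name `differential_attention` is my own.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def differential_attention(scores_a, scores_b, values, lam):
    """Weights = softmax(scores_a) - lam * softmax(scores_b); output = weights applied to values."""
    map_a = softmax(scores_a)
    map_b = softmax(scores_b)
    weights = [a - lam * b for a, b in zip(map_a, map_b)]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical scores for one query over four tokens; token 2 is the relevant one.
scores_a = [1.0, 1.0, 3.0, 1.0]   # first map: signal plus background noise
scores_b = [1.0, 1.0, 1.0, 1.0]   # second map: mostly the shared background noise
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
lam = 0.3                          # fixed here; learnable in the paper

plain = softmax(scores_a)
weights, output = differential_attention(scores_a, scores_b, values, lam)
print("plain attention:       ", [round(w, 3) for w in plain])
print("differential attention:", [round(w, 3) for w in weights])
print("output:", [round(o, 3) for o in output])
```

The weight that both maps spend on the background tokens is subtracted away, so the relevant token stands out more relative to the rest than it does under a single softmax.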

OmniBooth – Multimodal Image Control

OmniBooth is giving you even more power over image creation. It lets you control the look and feel of your generated images with both text and image prompts. Want specific colors, objects, or styles? OmniBooth’s got you covered with fine-tuned control that can make your image creations really pop.

https://len-li.github.io/omnibooth-web

MathHay – Benchmarking Math Reasoning in AI

MathHay is a new benchmark that tests how well AI can handle complex math over long contexts. It’s tough, even for top models. The results show that even the best AIs have a lot of room for improvement in the math department. If you’re into pushing AI limits, MathHay is a cool tool to explore.

https://arxiv.org/pdf/2410.04698

FAN – Fourier Analysis Networks

Finally, there’s FAN, a neural network architecture built on Fourier analysis. Its layers include sine and cosine terms, so periodic structure in the data is modeled directly instead of having to be approximated. This helps models understand recurring patterns better, which is crucial for things like time series prediction. The paper pitches FAN as a replacement for standard MLP layers that needs fewer parameters.

https://arxiv.org/pdf/2410.02675
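
Here is a rough plain-Python sketch of what a Fourier-style layer could look like, based on my reading of the paper: part of the output is cos and sin of a linear projection, and the rest is an ordinary activated linear layer. Treat the details as assumptions. The function name `fan_layer`, the hand-set weights, and the use of ReLU as the activation are mine, and nothing is trained here.

```python
import math


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def fan_layer(x, w_periodic, w_plain, b_plain):
    """Return [cos(Wp x), sin(Wp x), relu(Wn x + b)] concatenated into one list."""
    projected = matvec(w_periodic, x)
    plain = [max(0.0, z + b) for z, b in zip(matvec(w_plain, x), b_plain)]
    return [math.cos(z) for z in projected] + [math.sin(z) for z in projected] + plain


# Hand-set weights (hypothetical): one periodic unit tuned to a period of 7,
# for example a weekly cycle in daily data, plus one ordinary unit.
period = 7.0
w_periodic = [[2 * math.pi / period]]
w_plain = [[0.5]]
b_plain = [0.0]

day_3 = fan_layer([3.0], w_periodic, w_plain, b_plain)
day_10 = fan_layer([10.0], w_periodic, w_plain, b_plain)   # exactly one period later
print("t = 3: ", [round(v, 4) for v in day_3])
print("t = 10:", [round(v, 4) for v in day_10])
```

The cos and sin entries match for inputs one period apart, while the ordinary unit keeps growing with the input. That built-in repetition is what lets this kind of layer extend a recurring pattern beyond the range it has seen.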

And that’s it for today! Keep exploring, and let me know which of these updates you’re most excited about.

Zamba2-2.7B-Instruct

The Zamba2-2.7B-Instruct model is an instruction-tuned version of Zamba2-2.7B, designed to follow instructions better and handle chat data. It is a hybrid that combines state-space (Mamba2) blocks with Transformer blocks. It performs really well in tests, even beating larger models, and it runs quickly with a small memory footprint, which makes it a good fit for on-device use.

https://huggingface.co/Zyphra/Zamba2-2.7B-instruct

Awesome Remote Job Resources

Check out the awesome-remote-job GitHub repository created by lukasz-madon! This collection has many helpful resources for remote workers. You’ll find articles, tools, job postings, and even communities for remote work. A standout feature is the “Remote DNA” list, which helps job seekers find companies that truly support remote work. It also shares legal and financial info about working remotely and includes a directory of useful communities and tools.

https://github.com/lukasz-madon/awesome-remote-job