NVIDIA Just Released Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence

Heard of Artificial General Intelligence (AGI)? Meet its auditory counterpart: Audio General Intelligence. With Audio Flamingo 3 (AF3), NVIDIA introduces a major leap in how machines understand and reason about sound. While past models could transcribe speech or classify audio clips, they lacked the ability to interpret audio in a context-rich, human-like way across speech, ambient sound, and music, and over long durations. AF3 changes that.

With Audio Flamingo 3, NVIDIA introduces a fully open large audio-language model (LALM) that not only hears but also understands and reasons. It is built on a five-stage training curriculum. This sets a new bar for how AI systems engage with sound, bringing us a step closer to AGI.

The Core Innovations Behind Audio Flamingo 3

  1. AF-Whisper: A Unified Audio Encoder. AF3 uses AF-Whisper, a novel encoder adapted from Whisper-v3. It processes speech, ambient sounds, and music with the same architecture, solving a major limitation of earlier LALMs, which used separate encoders and suffered from inconsistencies as a result. AF-Whisper leverages audio-caption datasets, metadata, and a 1280-dimensional embedding space to align audio with text.
  2. Chain-of-Thought for Audio: On-Demand Reasoning. Unlike static QA systems, AF3 is equipped with "thinking" skills. Using the AF-Think dataset, the model can reason step by step when prompted before it arrives at an answer.
  3. Multi-Turn, Multi-Audio Conversations. Using the AF-Chat dataset (75k conversations), AF3 can hold contextual conversations that involve multiple audio inputs. This mimics real-world interaction, where people refer back to earlier audio cues. It also introduces voice-to-voice conversations using a streaming text-to-speech module.
  4. Long Audio Reasoning. AF3 is the first fully open model that can reason over audio inputs of up to 10 minutes. Trained on LongAudio-XL (1.25M examples), the model supports tasks such as meeting summarization, podcast understanding, sarcasm detection, and temporal grounding.
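
To give a sense of scale for the 10-minute limit, here is a minimal sketch of how a long recording could be cut into fixed-size windows before encoding. The 30-second window and 16 kHz sample rate are assumptions borrowed from Whisper's public defaults, and `split_into_windows` is a hypothetical helper; none of this is taken from the AF3 code.

```python
from typing import List, Tuple

# Assumptions (Whisper's public defaults, not confirmed for AF3):
SAMPLE_RATE_HZ = 16_000
WINDOW_SECONDS = 30
MAX_AUDIO_SECONDS = 10 * 60  # the 10-minute limit cited in the article


def split_into_windows(num_samples: int,
                       sample_rate: int = SAMPLE_RATE_HZ,
                       window_seconds: int = WINDOW_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping windows."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    if num_samples > MAX_AUDIO_SECONDS * sample_rate:
        raise ValueError("audio is longer than the 10-minute limit")
    window = window_seconds * sample_rate
    # The last window is shorter when the audio is not a multiple of the window size.
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


ten_minutes = split_into_windows(10 * 60 * SAMPLE_RATE_HZ)
print(len(ten_minutes))  # 20
print(ten_minutes[-1])   # (9120000, 9600000)
```

Under these assumptions, a full 10-minute clip becomes 20 windows, which the language model would have to reason over jointly.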

State-of-the-Art Benchmarks and Real-World Capability

AF3 surpasses both open and closed models on over 20 benchmarks, including:

  • MMAU (avg): 73.14% (+2.14% over Qwen2.5-O)
  • LongAudioBench: 68.6 (GPT-4o evaluation), beating Gemini 2.5 Pro
  • LibriSpeech (ASR): 1.57% WER, outperforming Phi-4-mm
  • ClothoAQA: 91.1% (vs. 89.2% for Qwen2.5-O)

These improvements are not just marginal; they redefine what to expect from audio-language systems. AF3 also introduces benchmarking of voice chat and speech generation, reaching 5.94s generation latency (vs. 14.62s for Qwen2.5) and better scores.
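
The LibriSpeech result above is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below is a plain edit-distance implementation for illustration. It is not the evaluation code used for AF3, and it skips the text normalization (casing, punctuation) that benchmark scripts normally apply. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


ref = "the model listens to ten minutes of audio"
hyp = "the model listen to ten minutes audio"
print(f"{word_error_rate(ref, hyp):.3f}")  # 0.250
```

Here one substitution and one deletion against eight reference words give a WER of 25%; a WER of 1.57% corresponds to fewer than two errors per hundred reference words.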

The Data Pipeline: Datasets That Teach Audio Reasoning

NVIDIA did not just scale compute; it rethought the data:

  • AudioSkills-XL: 8M examples combining ambient sound, music, and speech reasoning.
  • LongAudio-XL: Covers long-form speech from audiobooks, podcasts, and meetings.
  • AF-Think: Promotes short chain-of-thought-style inference.
  • AF-Chat: Designed for multi-turn, multi-audio conversations.

Each dataset is fully open-sourced, along with the training code and recipes, enabling reproducibility and future research.
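
To make "multi-turn, multi-audio" concrete, here is a hypothetical sketch of how such a conversation could be represented in code. The class names, field names, file names, and dialogue text are all illustrative assumptions; this is not the AF-Chat schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str   # "user" or "assistant"
    text: str
    audio_paths: List[str] = field(default_factory=list)  # clips attached to this turn


@dataclass
class Conversation:
    turns: List[Turn] = field(default_factory=list)

    def add(self, role: str, text: str, audio_paths: Optional[List[str]] = None) -> None:
        self.turns.append(Turn(role, text, list(audio_paths or [])))

    def audio_in_context(self) -> List[str]:
        """Every clip referenced so far, in order of first appearance."""
        seen: List[str] = []
        for turn in self.turns:
            for path in turn.audio_paths:
                if path not in seen:
                    seen.append(path)
        return seen


chat = Conversation()
chat.add("user", "What instruments are in this clip?", ["clip_a.wav"])
chat.add("assistant", "A piano and a cello.")
chat.add("user", "How does this second clip differ from the first?", ["clip_b.wav"])
print(chat.audio_in_context())  # ['clip_a.wav', 'clip_b.wav']
```

The third turn can only be answered if the first clip is still available, which is the kind of cross-turn audio reference the article describes.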

Open source

AF3 is not just a model drop. NVIDIA has released:

  • Model weights
  • Training recipes
  • Inference code
  • Four open datasets

This transparency makes AF3 the most accessible state-of-the-art audio-language model. It opens new research directions in audio reasoning, low-latency audio agents, music understanding, and multimodal interaction.

Conclusion: Toward General Audio Intelligence

Audio Flamingo 3 shows that deep audio understanding is not just possible but reproducible and open. By combining scale, novel training strategies, and diverse data, NVIDIA delivers a model that listens, understands, and reasons in ways that earlier LALMs could not.


Check out the Paper, Code, and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easy for a wide audience to understand. The platform has more than two million monthly views, illustrating its popularity among readers.
