ShengShu Technology launches Vidar multi-view physical AI training model

The Vidar embodied AI model from ShengShu uses simulated worlds instead of physical training data. Source: Adobe Stock, Vectorhub by ice

ShengShu Technology Co. yesterday launched its multi-view physical AI training model, Vidar, which stands for “video diffusion for action reasoning.” Using Vidu’s capabilities in semantic and video understanding, Vidar uses a limited set of physical data to simulate a robot’s decision-making in real-world environments, said the company.

“Vidar offers a radically different approach to training embodied AI models,” said ShengShu Technology. “Just as Tesla focuses on vision-based training and Waymo leans into lidar, the industry is exploring divergent paths to physical AI.”

Founded in March 2023, ShengShu Technology specializes in the development of multimodal large language models (LLMs). The Beijing-based company said it delivers mobility-as-a-service (MaaS) and software-as-a-service (SaaS) products for smarter, faster, and more scalable content creation.

With its flagship video-generation platform Vidu, ShengShu said it has reached users in more than 200 countries and regions around the world, spanning fields including interactive entertainment, advertising, film, animation, cultural tourism, and more.

Vidar uses simulated training to accelerate robot development

“While some companies train physical AI by embedding models into real-world robots and collecting data through the physical interactions that their robots encounter, it’s a method that’s costly, hardware-dependent, and difficult to scale,” said ShengShu Technology. “Others rely on purely simulated training, but this often lacks the variability and edge-case data needed for real-world deployment.”

Vidar takes a different approach, the company claimed. It combines limited physical training data with generative video to make predictions and generate new hypothetical scenarios, creating a multi-view simulation that features lifelike training environments, all within a virtual space. This allows for more robust, scalable training without the time, cost, or limitations of physical-world data collection, explained ShengShu.

Built on top of the Vidu generative video model, Vidar can perform dual-arm manipulation tasks with multi-view video prediction and even respond to natural-language voice commands after fine-tuning. The model effectively serves as a digital brain for real-world action, said the company.

Using Vidu’s generative video engine, Vidar generates large-scale simulations to reduce dependency on physical data, while maintaining the complexity and richness needed to train real-world-capable AI agents. ShengShu said Vidar can extrapolate a generalized sequence of robot actions and tasks from only 20 minutes of training data. The company asserted that’s between 1/80 and 1/1,200 of the data needed to train industry-leading models including RDT and π0.5.
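To put those fractions in perspective, the short calculation below converts them into the amount of baseline data they imply. It is only arithmetic on the figures ShengShu quoted. The article does not say which ratio applies to which model, and the resulting hour totals are derived here, not reported by the company.

```python
# Back-of-the-envelope reading of the ratios quoted by ShengShu.
# The hour totals are derived values, not figures from the company.
VIDAR_MINUTES = 20  # training data ShengShu says Vidar needs


def implied_baseline_hours(vidar_minutes: float, denominator: int) -> float:
    """Hours of data implied if vidar_minutes is 1/denominator of the baseline."""
    return vidar_minutes * denominator / 60


low = implied_baseline_hours(VIDAR_MINUTES, 80)     # 20 min is 1/80 of this
high = implied_baseline_hours(VIDAR_MINUTES, 1200)  # 20 min is 1/1,200 of this
print(f"Implied baseline data: {low:.1f} to {high:.0f} hours")
```

Under that reading, the comparison models would need somewhere between roughly 27 and 400 hours of data.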

ShengShu said Vidar’s core innovation lies in its modular two-stage learning architecture. Unlike traditional methods that merge perception and control, Vidar decouples them into two distinct stages for greater flexibility and scalability.

In the upstream stage, large-scale general video data and moderate-scale embodied video data are used to train Vidu’s model for perceptual understanding.

In the second, downstream stage, a task-agnostic model called AnyPos turns that visual understanding into actionable motor commands for robots. This separation makes it significantly easier and faster to train and deploy AI across different types of robots, while lowering costs and increasing scalability.
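The decoupling described above can be illustrated with a minimal Python sketch. It is not ShengShu’s code: every name in it (`TwoStagePolicy`, `toy_video_predictor`, the two toy decoders) is hypothetical, and the placeholder functions only stand in for a video model and an action decoder. The point it shows is structural. One upstream predictor can be paired with a different downstream decoder for each robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names below are hypothetical; they do not come from ShengShu's code.
Frame = List[float]         # stand-in for one predicted video frame
MotorCommand = List[float]  # stand-in for joint targets at one time step


@dataclass
class TwoStagePolicy:
    """Keeps perception (stage 1) and control (stage 2) as separate parts."""
    predict_video: Callable[[Frame, str], Sequence[Frame]]  # upstream stage
    frame_to_action: Callable[[Frame], MotorCommand]        # downstream stage

    def act(self, observation: Frame, instruction: str) -> List[MotorCommand]:
        # Stage 1: predict future frames; stage 2: decode each into a command.
        predicted_frames = self.predict_video(observation, instruction)
        return [self.frame_to_action(frame) for frame in predicted_frames]


def toy_video_predictor(observation: Frame, instruction: str) -> List[Frame]:
    # Placeholder for a video model: drift every value upward over 3 steps.
    return [[x + 0.1 * step for x in observation] for step in range(1, 4)]


def toy_decoder_robot_a(frame: Frame) -> MotorCommand:
    # Placeholder decoder for one robot: pass the frame values through.
    return list(frame)


def toy_decoder_robot_b(frame: Frame) -> MotorCommand:
    # A second robot reuses the same predictor with its own decoder.
    return [2.0 * x for x in frame]


policy_a = TwoStagePolicy(toy_video_predictor, toy_decoder_robot_a)
policy_b = TwoStagePolicy(toy_video_predictor, toy_decoder_robot_b)
commands_a = policy_a.act([0.0, 1.0], "pick up the cup")
commands_b = policy_b.act([0.0, 1.0], "pick up the cup")
print(len(commands_a), len(commands_b))
```

Swapping `toy_decoder_robot_a` for `toy_decoder_robot_b` changes the motor commands without touching the predictor, which is the kind of reuse across robot types that the company attributes to the two-stage design.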

Vidar is designed to reduce the amount of training data needed to train AI models. Source: ShengShu Technology

Vidar is a framework for scalable embodied intelligence

Vidar follows a scalable training framework inspired by the language and image foundation models behind the past decade of AI breakthroughs. ShengShu said its three-tiered data pyramid, spanning large-scale generic video, embodied video data, and robot-specific examples, makes for a more flexible system, reducing the traditional data bottleneck.

Built on the U-ViT architecture, which explores the fusion of diffusion models and transformer architectures for a wide range of multimodal generation tasks, Vidar harnesses long-term temporal modeling and multi-angle video consistency to power physically grounded decision-making.

This design supports rapid transfer from simulation to real-world deployment, which ShengShu said is critical for robotics in dynamic environments. It also minimizes engineering complexity, according to the company.

ShengShu said Vidar can facilitate robotics adoption across multiple sectors. From home assistants and eldercare to smart manufacturing and medical robotics, the model enables fast adaptation to new environments and multi-task scenarios, all with minimal data, it added.

Vidar creates an AI-native path for robotics development that’s efficient, scalable, and cost-effective, ShengShu claimed. By transforming general video into actionable robot intelligence, the company said its model can bridge the gap between visual understanding and embodied agency.

Vidar has a modular learning architecture. Source: ShengShu Technology

ShengShu marks milestones in multimodal AI

Vidar builds on the rapid momentum of the Vidu video foundation model, said ShengShu. The company listed statistics since its debut:

Vidu reached 1 million users within one month
Surpassed 10 million users in just three months
Generated over 100 million videos by Month 4
Reference-to-video generation exceeded 100 million by Month 8
Total generated videos now top 300 million

As ShengShu continues to expand the frontiers of multimodal AI, Vidar represents the next frontier, bringing generalization, generativity, and embodiment into one unified system.

Editor’s note: RoboBusiness 2025, which will be on Oct. 15 and 16 in Santa Clara, Calif., will include tracks on physical AI and humanoid robots. Registration is now open.
