Multimodal Perception

SynTrackThinking improves multimodal multi-object tracking for autonomous driving through frequency-aware fusion and temporal contrastive learning

The alternative text for this image may have been generated using AI. It is important to clarify that SynTrackThinking fundamentally differs from scalable multimodal reasoning frameworks, such as ...

techtimes

Advancing Multimodal AI for Integrated Understanding and Generation

Abstract: Advancing Multimodal AI for Integrated Understanding and Generation explores the transformative potential of multimodal artificial intelligence (AI), which integrates diverse data types such ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

Nature

Multimodal deep learning for anomaly detection in urban infrastructure networks: improving the resilience of public management systems

The construction of new smart cities has entered a phase of deep development, wherein urban infrastructure networks have evolved into complex heterogeneous systems with multiple coordinated subsystems ...

EurekAlert!

Show inaccessible results

SynTrackThinking improves multimodal multi-object tracking for autonomous driving through frequency-aware fusion and temporal contrastive learning

Advancing Multimodal AI for Integrated Understanding and Generation

Microsoft unveils AI model that understands image content, solves visual puzzles

Multimodal deep learning for anomaly detection in urban infrastructure networks: improving the resilience of public management systems

Stretchable multimodal sensor arrays for hardness perception in bionic skin

How the Gemma 4 Vision Agent’s “Agentic Loop” Solves Complex Visual Reasoning

Omni-modal language models: Paving the way toward artificial general intelligence

Modelling Passenger Behavior in Disrupted Multimodal Transport Networks