Multimodal

Module: fundamentals

What it is

Multimodal AI can process and generate multiple types of content: not just text, but also images, audio, and video. A multimodal model might accept an uploaded image and describe it, or generate an image from a text description. In each case, a single model handles more than one modality.

Why it matters

Multimodal capabilities expand the range of tasks AI can help with. You can share a screenshot for debugging, upload a document for analysis, or describe an image you want created. Knowing which modalities a model supports tells you what inputs it can accept and what outputs it can produce.
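
One way to picture this is to treat a request as a list of parts, each tagged with its modality, and to compare those tags against the set of modalities a model accepts. The sketch below is illustrative only: the names Part, MODALITIES, and unsupported_parts are hypothetical and do not come from any real library or product, and real systems represent inputs in their own ways.

```python
from dataclasses import dataclass

# Hypothetical names for illustration; not any real library's API.
MODALITIES = {"text", "image", "audio", "video"}


@dataclass(frozen=True)
class Part:
    modality: str  # one of MODALITIES
    content: str   # the text itself, or a file name for non-text parts

    def __post_init__(self):
        # Reject tags outside the four modalities named in this entry.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality}")


def unsupported_parts(message, accepted):
    """Return the parts of a message whose modality is not in `accepted`."""
    return [part for part in message if part.modality not in accepted]


# A request that mixes text with a screenshot.
message = [
    Part("text", "Why does this page throw an error?"),
    Part("image", "screenshot.png"),
]

# Assumed capability sets for two imaginary models.
text_only = {"text"}
text_and_image = {"text", "image"}

print(unsupported_parts(message, text_only))
print(unsupported_parts(message, text_and_image))
```

With the text-only set, the screenshot part is reported as unsupported; with the text-and-image set, nothing is. The same comparison could be applied to outputs, for example to check whether a model can produce images at all.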