Modern generative models are increasingly "multimodal," meaning they can process and generate across different types of data simultaneously. For instance, a model can take a text prompt and output an image, or take an image and answer text questions about it.
- • Text-to-text: Drafting, summarizing, and translation (e.g., LLMs).
- • Text-to-image/video: Generating media from descriptions (e.g., Midjourney, Sora).
- • Text-to-code: Assisting software engineers with code generation (e.g., GitHub Copilot).