Computer vision encompasses a wide range of tasks, each with different model architectures and training requirements.
- Image classification: assigning a label to an entire image (e.g., "cat" or "invoice").
- Object detection: locating and identifying multiple objects within an image with bounding boxes.
- Semantic segmentation: labeling every pixel in an image with a class.
- Optical character recognition (OCR): extracting text from images and documents.
- Video understanding: tracking objects and events across frames over time.
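
One way to see how these tasks differ is to compare the shape of the output each one produces. The sketch below is illustrative only: the class names (`Classification`, `Detection`, `OcrLine`, `Track`), the `(x_min, y_min, x_max, y_max)` box convention, and the toy values are assumptions made for this example, not the API of any particular library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box convention for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Classification:
    label: str  # a single label for the entire image


@dataclass
class Detection:
    label: str  # what the object is
    box: Box    # where the object is


# Semantic segmentation: one class id per pixel, laid out as rows of columns
# with the same height and width as the input image.
SegmentationMask = List[List[int]]


@dataclass
class OcrLine:
    text: str  # the extracted text
    box: Box   # assumed here: the region the text was read from


@dataclass
class Track:
    track_id: int                   # identity kept stable across frames
    label: str
    boxes_by_frame: Dict[int, Box]  # frame index -> box in that frame


# Toy outputs for a 2-row by 3-column image (values are made up).
classification = Classification(label="cat")
detections = [
    Detection(label="cat", box=(0, 0, 2, 2)),
    Detection(label="dog", box=(1, 0, 3, 2)),
]
mask: SegmentationMask = [
    [1, 1, 0],
    [1, 1, 0],
]
ocr_lines = [OcrLine(text="INVOICE", box=(0, 0, 3, 1))]
track = Track(
    track_id=7,
    label="cat",
    boxes_by_frame={0: (0, 0, 2, 2), 1: (1, 0, 3, 2)},
)
```

Classification yields one value per image, detection a variable-length list, segmentation a dense grid matching the image size, OCR text (here paired with a location), and video understanding adds a time axis. These differences in output structure are a large part of why the tasks call for different architectures and training data.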