According to the Microsoft Azure AI Fundamentals (AI-900) official study guide and the Microsoft Learn module “Describe features of computer vision workloads on Azure,” the “Describe Image” feature within Azure’s Computer Vision service automatically generates textual captions that describe the contents of an image.
This feature uses a deep learning model trained on millions of labeled images to identify objects, people, and scenes, then formulates a natural language sentence summarizing what it sees. For example, given a photograph showing “two people sitting at a table with a laptop,” the service might generate the caption: “A man and a woman sitting at a desk using a laptop.”
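As a concrete illustration, the Describe operation is exposed as a REST endpoint on a Computer Vision resource. The sketch below, using only the Python standard library, shows one plausible way to call it; the `/vision/v3.2/describe` path follows the v3.2 REST API, while the endpoint, key, and image URL are placeholders you would replace with your own resource's values.

```python
import json
import urllib.request

# Assumed path for the v3.2 Describe operation; verify against your API version.
DESCRIBE_PATH = "/vision/v3.2/describe"

def build_describe_request(endpoint, key, image_url, max_candidates=1):
    """Compose the URL, headers, and JSON body for a Describe call (sketch)."""
    url = endpoint.rstrip("/") + DESCRIBE_PATH + f"?maxCandidates={max_candidates}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,   # resource key goes in this header
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": image_url}).encode("utf-8")
    return url, headers, body

def describe_image(endpoint, key, image_url, max_candidates=1):
    """POST a public image URL and return the parsed JSON response,
    which includes description.captions with text and confidence."""
    url, headers, body = build_describe_request(endpoint, key, image_url,
                                                max_candidates)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Usage (placeholders, requires a real Azure resource):
# result = describe_image("https://<resource>.cognitiveservices.azure.com",
#                         "<key>", "https://example.com/photo.jpg")
# for caption in result["description"]["captions"]:
#     print(caption["text"], caption["confidence"])
```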
Here’s how the other options differ:
A. Recognize text: Refers to Optical Character Recognition (OCR), which extracts printed or handwritten text from images rather than generating descriptive captions.
C. Identify the areas of interest: Refers to locating the most visually important region of an image (returned as a bounding box, commonly used for smart cropping of thumbnails), not descriptive captioning.
D. Detect objects: Identifies and classifies individual objects in an image (e.g., cars, chairs, people) and returns bounding-box coordinates for each, but doesn't produce a sentence or caption summarizing the scene.
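The difference between options B and D also shows up in the shape of the JSON each operation returns: Describe yields sentence captions, while Detect Objects yields a list of objects with rectangles. The sketch below uses small sample payloads shaped like the documented v3.2 responses (the payload values themselves are illustrative, not real service output).

```python
def summarize_response(payload: dict) -> str:
    """Distinguish a Describe response (caption sentences) from a
    Detect Objects response (labeled bounding boxes) by its shape."""
    if "description" in payload:
        # Describe: description.captions is a list of {text, confidence}
        captions = payload["description"]["captions"]
        return "; ".join(c["text"] for c in captions)
    if "objects" in payload:
        # Detect Objects: objects is a list of {object, rectangle, confidence}
        return ", ".join(o["object"] for o in payload["objects"])
    return "unrecognized payload"

# Illustrative payloads mimicking the two response shapes:
describe_payload = {
    "description": {
        "captions": [{"text": "a man and a woman sitting at a desk using a laptop",
                      "confidence": 0.87}]
    }
}
detect_payload = {
    "objects": [
        {"object": "person", "rectangle": {"x": 10, "y": 20, "w": 100, "h": 200},
         "confidence": 0.91},
        {"object": "laptop", "rectangle": {"x": 120, "y": 80, "w": 60, "h": 40},
         "confidence": 0.88},
    ]
}
```

Passing each payload through `summarize_response` makes the exam distinction concrete: only the Describe response contains a human-readable sentence.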
Thus, only option B, “Describe the images,” generates automatic, human-readable captions that summarize photo content, a core computer vision workload taught in AI-900.