Vision (Image Understanding)

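All models support image understanding. To send text and images together, set the message's content field to an array of parts instead of a plain string.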

Using an external image URL

json

{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/photo.jpg"
          }
        }
      ]
    }
  ]
}
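
As a minimal sketch, the request body above could be assembled in Python as follows. The helper name build_vision_request is hypothetical and only illustrates the structure; the endpoint and authentication are not covered in this section, so the example stops at serializing the JSON body.

```python
import json


def build_vision_request(model: str, prompt: str, image_url: str) -> dict:
    """Build a request body with one text part and one image part.

    Hypothetical helper: it mirrors the JSON structure shown above.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                # content is an array of parts rather than a plain string
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


body = build_vision_request(
    "gemini-2.5-flash",
    "What's in this image?",
    "https://example.com/photo.jpg",
)
print(json.dumps(body, indent=2))
```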

Using Base64 encoding

json

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/png;base64,iVBORw0KGgo..."
      }
    }
  ]
}
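
The data URL in the example above can be produced from raw image bytes with the standard library. This is a sketch under stated assumptions: image_bytes_to_data_url is a hypothetical helper name, and the 8-byte PNG signature stands in for a real image file purely for illustration.

```python
import base64


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for the image_url.url field.

    Hypothetical helper; mime_type must match the actual image format.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder input: the PNG file signature only, not a complete image.
# In practice, read the bytes from a file opened in binary mode.
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = image_bytes_to_data_url(png_signature)

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
```

Base64 output is roughly one third larger than the raw bytes, which is worth keeping in mind for large images.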

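Image limits

Supported formats: JPEG, PNG, GIF, WebP
Maximum size per image: 10 MB
Timeout when fetching an external URL: 10 seconds
For security reasons, internal network addresses (localhost, private IPs, etc.) cannot be accessed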
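
The limits above can be checked on the client before a request is sent. The following is a rough sketch, not the service's own validation logic: the function names are hypothetical, 10 MB is assumed to mean 10 × 1024 × 1024 bytes, and hostnames are not resolved, so a public-looking name that points at a private address is not detected.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: "10 MB" is interpreted as 10 MiB; the source does not specify.
MAX_IMAGE_BYTES = 10 * 1024 * 1024
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def is_probably_internal_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a literal non-public IP."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no usable host, treat as not fetchable
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # ordinary hostname; DNS is not checked in this sketch
    return addr.is_private or addr.is_loopback or addr.is_link_local


def check_image(mime_type: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if mime_type not in ALLOWED_MIME_TYPES:
        problems.append(f"unsupported format: {mime_type}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"image too large: {size_bytes} bytes")
    return problems


print(is_probably_internal_url("http://192.168.1.10/a.png"))
print(check_image("image/png", 2_000_000))
```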

Streaming Responses

Function Calling