OpenAI’s GPT-4 Vision

Learn bits

Science & Tech.

Mahesh

12/04/24 06:04 AM IST

OpenAI’s GPT-4 Vision

In News

Chat-GPT can also create images from natural language prompts, thanks to the integration of DALL-E.

GPT-4 vision

GPT-4 with Vision, also referred to as GPT-4V, allows users to instruct GPT-4 to analyse image inputs.
Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development.
GPT-4 Vision has been considered OpenAI’s step forward towards making its chatbot multimodal — an AI model with a combination of image, text, and audio as inputs.
It allows users to upload an image as input and ask a question about it.
This task is known as visual question answering (VQA). GPT-4 Vision is a Large Multimodal Model or LMM, which is essentially a model that is capable of taking information in multiple modalities like text and images or text and audio and generating responses based on it.
It is not the first and only LMM.
There are many others such as CogVLM, LLaVA, Kosmos-2, etc. LMMs are also known as Multimodal Large Language Models (MLLMs).

Benefits

GPT-4 Vision has some groundbreaking capabilities such as processing visual content including photographs, screenshots, and documents.
The latest iteration allows it to perform a slew of tasks such as identifying objects within images, and interpreting and analysing data displayed in graphs, charts, and other visualisations.
GPT-4 Vision can also interpret handwritten and printed text contained within images.
This is a significant leap in AI as it, in a way, bridges the gap between visual understanding and textual analysis.
The model is capable of taking from a design on paper and creating code for a website.
Data interpretation is another key area where the model can work wonders as the model lets one unlock insights based on visuals and graphics.

Limitations

GPT-4 Vision also continues to reinforce social bias and worldviews, according to its maker.
The model has been trained to avoid identifying specific individuals in images which OpenAI calls ‘refusal’ behaviour by design.
The company has advised against its use for tasks that require precise scientific, medical, or sensitive content analysis due to its limitations and inconsistencies.

Source- Indian Express

More Related Current Affairs View All

17 Sep

Reasons Behind the heavy rain in Uttarakhand, Himachal

'Dehradun and several other districts in Uttarakhand have experienced very heavy rainfall over the past few days, triggering landslides in multiple areas and causing rivers to swel

08 Sep

Rajasthan’s coaching centre Bill

'The Rajasthan Coaching Centres (Control and Regulation) Bill, 2025, is a significant piece of legislation passed by the Rajasthan Assembly to regulate and oversee the state's burg

28 Aug

IADT-1

'Recently, the Indian Space Research Organisation (ISRO) successfully carried out its first Integrated Air Drop Test (IADT-1), a crucial milestone in the preparation for the countr

Learn bits

Mahesh

OpenAI’s GPT-4 Vision

More Related Current Affairs View All

Reasons Behind the heavy rain in Uttarakhand, Himachal

Rajasthan’s coaching centre Bill

IADT-1

India’s First Ai-Driven Magazine Generator

Generate Your Custom Current Affairs Magazine using our AI in just 3 steps