MiniGPT-4: A free image-to-text AI tool you can try out today

ChatGPT is great, but right now, it’s limited to just text — text in, text out. GPT-4 was supposed to expand on this by adding image processing to allow it to generate text based on images.

OpenAI has yet to release this feature, however, which is where MiniGPT-4 comes in. This open source project gives us a preview of what the image processing in GPT-4 might be like — and it’s pretty neat.

What is MiniGPT-4?

MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system. Some examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites just from drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4. It was created by a group of Ph.D. students at the King Abdullah University of Science and Technology in Saudi Arabia. It’s also based on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). Vicuna is not quite as powerful as ChatGPT, but as graded by GPT-4 itself, it reaches roughly 90% of ChatGPT’s quality.
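
For a sense of how a figure like that 90% comes about, here’s a minimal sketch. It assumes the judge model gives each answer a score from 1 to 10 and that the two chatbots’ totals are then compared. The function name and the scores below are made up for illustration and are not taken from the Vicuna team’s actual evaluation.

```python
def relative_quality(candidate_scores, baseline_scores):
    """Return the candidate's total judge score as a percentage of the baseline's."""
    if len(candidate_scores) != len(baseline_scores):
        raise ValueError("score lists must cover the same questions")
    baseline_total = sum(baseline_scores)
    if baseline_total == 0:
        raise ValueError("baseline total must be non-zero")
    return 100.0 * sum(candidate_scores) / baseline_total


# Hypothetical 1-10 judge scores for five questions (invented for illustration).
vicuna_scores = [8, 9, 7, 9, 8]
chatgpt_scores = [9, 9, 8, 10, 9]

print(f"{relative_quality(vicuna_scores, chatgpt_scores):.1f}%")  # prints 91.1%
```

With these invented numbers, the candidate scores 41 points against the baseline’s 45, which works out to about 91%.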

How to use MiniGPT-4

MiniGPT-4 is just a demo and is still in its first version. For now, it can be accessed for free at the group’s official website. To use it, just drag an image in or click “Drop Image Here.” Once it’s uploaded, type your prompt into the search box.

What kinds of things should you try out? Well, asking MiniGPT-4 to describe an image is simple enough. But maybe you need some copy for an Instagram post for your company. Or maybe you want to know the ingredients needed for an interesting dish, and even a recipe for how to cook it. MiniGPT-4 can handle these tasks surprisingly well.

The coding aspects are a bit rougher around the edges. Turning a simple napkin drawing into a functioning website was a trick shown off by OpenAI when GPT-4 was first announced. But MiniGPT-4 doesn’t seem to be able to handle that quite as well just yet. ChatGPT will provide more accurate code. In fact, running whatever code MiniGPT-4 produces through ChatGPT or GPT-4 will net you better results.

One thing to note is that MiniGPT-4 uses your local system’s GPU. So, unless you have a fairly powerful discrete GPU, you may find the experience fairly slow. For context, I tried it out on an M2 Max MacBook Pro, and it took around 30 seconds to generate text based on an image I uploaded.

Limitations of MiniGPT-4

The speed of MiniGPT-4 is certainly a limitation. If you’re trying to access this without a decent GPU, it’s too slow to feel responsive. If you’re used to the speed of cloud-based ChatGPT or even Bing Image Creator, MiniGPT-4 is going to feel painfully slow.

Beyond that, MiniGPT-4 has the same limitations as ChatGPT, Google Bard, or any other AI chatbot: it can “hallucinate,” or make up information.

Luke Larsen
Luke Larsen is the senior editor of computing, managing all content covering laptops, monitors, PC hardware, Macs, and more.