DeepSeek Major Update: Finally "Opened Eyes"

I had thought this crazy AI's rapid update streak had finally come to an end, but unexpectedly, DeepSeek, which released V4 just last week, has suddenly pulled out an even bigger surprise.

Just now, DeepSeek launched an image recognition mode, labeled as currently in testing. This means that the multimodal capabilities of DeepSeek, which have been discussed for a full year, have finally arrived!

Currently, the image recognition mode appears to be enabled for testing on both the DeepSeek web version and app. APPSO quickly ran a hands-on test.

Chen Xiaokang, the researcher in charge of multimodal research at DeepSeek, posted on X: "Now, we see you," along with an image. We asked DeepSeek to interpret what this image means.

The results show that it can recognize the metaphor behind the picture. Although the image contains no words related to DeepSeek, it combines its recognition of the poster's identity with the image itself to infer that this is an update about DeepSeek's multimodal capabilities.

Finally, it offers a very fitting summary: “The whale that cannot see the world has finally opened its eyes.”

Compared to the answer itself, APPSO finds the thought process behind DeepSeek’s image recognition mode more interesting.

In the past, an AI shown that Twitter screenshot would most likely give an honest, literal description: "Two blue whales, the one on the left wearing an eye patch, the other not."

But DeepSeek immediately starts asking questions: Who is this person? Why did they post this? What does the whale logo represent? What does the XX on the eye patch imply?

This is exactly what happens in our own minds when we come across a meme. No one starts by counting how many whales there are; we care more about who is talking to whom and what the underlying message is.

And it even corrects itself back and forth.

For example, it once linked the eye patch in the image to Kamina’s glasses in “Tengen Toppa Gurren Lagann,” then self-corrected: “No, that’s too otaku-oriented.” “Wait, look carefully…” “From a different angle…”

All those inferences, associations, and self-corrections are impressive enough. But the most counterintuitive part of the whole thought process came near the end of its reasoning, when it suddenly paused and held a mini defense session for itself.

It listed three questions to ask and answer itself: first confirming the objective facts, then hypothesizing about the nature of the event, and finally offering an interpretation. DeepSeek has turned this thinking habit, one we ourselves are often unaware of, into the logical process for image recognition.

It’s like how we usually review our conclusions: “Wait, is this premise correct? Is that assumption valid? What if I misunderstood the direction?”

We also threw the classic AI test at DeepSeek: counting fingers.

It thought for a while and still answered incorrectly, even complaining: “I’m really dizzy from counting.”

But with a little further guidance, it can still arrive at the correct answer.

In another finger-counting test, after its first wrong answer we didn't reveal the correct count but simply asked it to think again, and it then responded correctly.

We also tried the classic "heart" test, which has previously stumped every AI we've tried. DeepSeek couldn't recognize it either.

Setting aside these high-difficulty tests, APPSO's initial testing shows that DeepSeek's image recognition accuracy is quite high. With the thinking mode disabled, it can return an answer in as little as half a second.

For example, it recognized this movie still, which is probably already in its knowledge base.

Its understanding of abstract images is also very accurate.

It also correctly interprets this Uniqlo product image.

However, the image recognition process likely doesn't involve web search, so it can rely only on its built-in knowledge. As a result, some newer items, such as Apple's new mascot Finder-chan, aren't recognized yet.

Additionally, the image recognition mode supports only a limited set of upload file formats; HEIF, for example, is not supported.

The launch of DeepSeek’s image recognition mode means that this whale has finally opened its eyes, but perhaps this is just the beginning.

More of DeepSeek's multimodal capabilities may be rolled out soon, filling this gap and potentially causing subtle shifts in the landscape of domestic models.

APPSO will continue sharing more experiences with DeepSeek’s image recognition mode, and we also welcome everyone to try it out and share interesting tips and details with us.
