Different AI labs have different priorities. OpenAI has historically focused on consumer users, for example, while its rival Anthropic tends to target enterprises. Elon Musk’s xAI, we recently learned, has been placing particular emphasis on video-game walkthroughs.
On Friday, Business Insider’s Grace Kay published a detailed and far-reaching report about xAI, the AI startup recently acquired by SpaceX, with particular emphasis on how Musk is making life difficult for employees. But this particular anecdote stood out:
In one instance last year, a model release was delayed for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High-level engineers were pulled from other projects to improve the responses before launch, they said.
Of course, you can imagine the frustration of any respected and experienced engineer who shows up to work thinking he’ll be tackling fundamental problems of information and machine intelligence, only to be sidetracked into helping a 54-year-old man beat his video game. But the anecdote raises an even more pressing question: Did Musk end up getting the gaming experience he wanted?
To answer that question, our resident RPG fanatic Ram Iyer put together a set of five general questions about Baldur’s Gate, which we ran against xAI’s Grok and three other major models in a kind of quasi-benchmark that I’ve decided to call “BaldurBench.”
In the interest of journalistic transparency, I’ve made all of the chat transcripts public, so you can see them here: Grok, ChatGPT, Claude, and Gemini.
First, the good news: Grok actually gives pretty good information. Its responses were a bit dense with gamer jargon (“save-scumming” instead of saving, “DPS” instead of damage), but the answers were both useful and well-informed, provided you knew what it was talking about. Grok also really loves tables and theorycraft, which is about what you’d expect.
There are plenty of Baldur’s Gate guides out there, and the models were generally drawing from the same ones, so the biggest differences were stylistic. ChatGPT prefers bulleted lists and sentence fragments, while Gemini likes to bold important words.
The biggest surprise was Claude, which was particularly concerned about giving me information that might spoil my experience of the game. When I asked about good party compositions, it closed the guidance by saying, “Don’t stress too much and just play what sounds fun to you.” Thanks, Claude!
It’s important to keep in mind that this is a subject area where we know, thanks to Business Insider’s reporting, that xAI has specifically focused on reaching parity. So we shouldn’t read too much into the fact that, after the reported sprint, Grok’s advice turned out about the same as the other models’. Still, it’s good to know xAI can make it work if it tries.

