Elon Musk's xAI launched Grok, an AI model that promises to be "witty" and less restricted than the competition. I've been testing it for two weeks through the X Premium+ subscription. Here's my honest assessment, setting aside the hype and the personalities involved.
What Grok gets right
Grok has real-time access to X (Twitter) data, and this is genuinely useful. When I asked about a breaking tech news story, it pulled relevant tweets and summarized the situation before any other AI model had the information. For current events and tech news, this is a legitimate advantage.
The "fun mode" personality is entertaining for about 20 minutes. It's sarcastic, occasionally profane, and willing to engage with topics that other models dodge. For casual conversation, it's a refreshing change from the corporate-careful tone of ChatGPT and Claude. Whether you consider this a feature or a gimmick depends on what you're using AI for.
Responses are fast, notably faster than GPT-4's. For quick questions where I just need a factual answer, the speed is appreciated.
The coding test
I ran Grok through my standard coding evaluation: generate a function, debug an error, explain complex code, write tests, and refactor a messy component. The results were mixed.
For simple function generation, Grok is competent. It produced working Python and JavaScript code for straightforward tasks. But as soon as complexity increased, the quality dropped. Its solution to a medium-difficulty algorithm problem (implementing a trie with autocomplete) contained two bugs that would have caused runtime errors. GPT-4 and Claude both handled the same prompt correctly on the first try.
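For readers who want to try the same test, here is a minimal sketch of the kind of trie-with-autocomplete implementation the task calls for. This is my own illustrative reference, not Grok's output, and my exact prompt wording differed.

```python
class TrieNode:
    """A single trie node: children keyed by character, plus a word marker."""
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down the trie, creating nodes as needed, then mark the word end.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def autocomplete(self, prefix):
        # Walk to the prefix node; an unseen prefix has no completions.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Depth-first collection of every full word below the prefix node.
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.is_word:
                results.append(word)
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return sorted(results)
```

The runtime errors in Grok's version were in exactly the kind of traversal logic shown in `autocomplete`: it's where a model that pattern-matches on trie code without tracking state tends to slip.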
Debugging was Grok's weakest area. When I pasted code with a bug and asked for help, it identified the general area of the problem but suggested fixes that introduced new issues. It felt like getting help from a junior developer who understands the concept but misses the details.
Code explanation was decent. Grok can read code and explain what it does at a high level. It struggles with the more nuanced question of why code is written a certain way: the architectural reasoning behind design decisions.
General knowledge and reasoning
Grok's general knowledge is solid but not exceptional. It handles most factual questions well, answers science and history questions accurately, and can summarize topics effectively. Where it falls short is in complex, multi-step reasoning. When I gave it logic puzzles or asked it to evaluate tradeoffs in a technical decision, the responses were more surface-level than what I get from Claude or GPT-4.
The uncensored angle is overstated. Yes, Grok will engage with edgy humor and controversial topics more freely. But for professional use, this isn't particularly valuable. I don't need my AI assistant to tell edgy jokes. I need it to write correct code and give good technical advice.
Where it ranks
In my personal ranking for coding and technical work: Claude first, GPT-4 second, Gemini third, Grok fourth. The gap between third and fourth isn't huge, but it's there. Grok is roughly where GPT-3.5 was six months ago: useful for simple tasks, unreliable for anything complex.
For casual/conversational use, Grok moves up the ranking. It's more entertaining than the alternatives, and the real-time X integration adds genuine utility for staying current on tech discussions.
Should you pay for it?
If you already have X Premium+ for other reasons, Grok is a nice bonus. If you'd be subscribing specifically for the AI, your money is better spent on Claude Pro or ChatGPT Plus. The model quality gap is too wide for Grok to be your primary coding assistant.
I'll keep checking back on Grok. xAI has the funding and talent to improve rapidly, and Grok 2 might close the gap significantly. Competition in the AI space benefits everyone. But right now, in March 2024, Grok is a fun novelty that doesn't replace the tools I depend on for real work.