I Asked Gemini, ChatGPT, Grok, and DeepSeek to Solve a Murder Mystery—Who Got It Right?

AI models have become increasingly sophisticated at analyzing logic puzzles, riddles, and ambiguous scenarios. To put them to the test, I presented Gemini, ChatGPT, Grok, and DeepSeek with a mystery:

Person A likes Person B, but Person B likes Person C. However, Person C likes Person A. During a party, one of them dies. Who committed the crime—A, B, or C?

This love triangle is a closed loop of one-sided affection: each person likes someone who likes somebody else, so everyone has a potential motive. The challenge lies in determining whether the crime was driven by jealousy, emotional turmoil, or something more unexpected. Below, I analyze how each AI model responded, evaluating its reasoning, conclusion, and overall approach to solving the riddle.

1. Gemini

Reasoning:

Gemini interprets the prompt as a riddle rather than a straightforward murder mystery.
It considers alternative explanations, such as accidental death, suicide, or even a metaphorical “crime.”
By examining the symmetry of the love triangle, Gemini suggests that there is no clear motive for murder.
Ultimately, it concludes that the “crime” may not be a crime at all and that the person who died could be the one who “disrupted” the balance.
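
The symmetry Gemini points to can be checked with a minimal Python sketch. The `likes` dictionary and the `admirers` helper are my own hypothetical encoding of the prompt, not anything Gemini produced:

```python
# Hypothetical encoding of the prompt: each key likes the mapped value.
likes = {"A": "B", "B": "C", "C": "A"}

def admirers(person, likes):
    """Return everyone who likes `person`."""
    return sorted(p for p, target in likes.items() if target == person)

# For each person: who they like, who likes them, and whether it is mutual.
profile = {
    p: {
        "likes": likes[p],
        "liked_by": admirers(p, likes),
        "mutual": likes[likes[p]] == p,
    }
    for p in likes
}

# Collapse each profile to (number of admirers, is the affection returned?).
shapes = {(len(v["liked_by"]), v["mutual"]) for v in profile.values()}
print(shapes)
```

Only one shape comes out: each person has exactly one admirer and no returned affection. On the stated facts alone, A, B, and C are interchangeable, which is why the prompt does not single out a culprit.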

Answer:

No crime was committed by A, B, or C.

Strengths:

Creative interpretation that challenges assumptions about the prompt.
Considers multiple possibilities beyond murder.

Weaknesses:

May overcomplicate the riddle by introducing unnecessary ambiguity.
Does not name a culprit, which some readers expect from a riddle.

2. ChatGPT

Reasoning:

ChatGPT focuses on emotional conflict and jealousy as the main motive.
It identifies Person B as the most likely culprit, arguing that B, caught in the middle of the love triangle, experiences the most rejection.
Since B likes C but C does not return the feeling, this emotional frustration could drive B to commit the crime.

Answer:

Person B committed the crime.

Strengths:

Provides a direct answer with clear reasoning.
Stays true to the structure of a classic riddle.

Weaknesses:

Assumes the death was a murder without considering alternative explanations.
Does not explore motives for A or C as thoroughly, even though each of them is rejected in exactly the same way as B.

3. Grok

Reasoning:

Grok systematically evaluates each character’s potential motive for murder.
Like ChatGPT, it identifies Person B as the most likely culprit, reasoning that rejection from C creates emotional instability.
It suggests that the party setting could have escalated tensions, leading to a crime of passion.

Answer:

Person B committed the crime.

Strengths:

Thorough examination of possible scenarios.
Considers the emotional impact of rejection.

Weaknesses:

Focuses primarily on Person B without deeply exploring A or C.
Assumes murder rather than considering alternative explanations.

4. DeepSeek

Reasoning:

DeepSeek also analyzes jealousy as the key factor but reaches a different conclusion.
It argues that Person C is the most likely killer, as C’s love for A is blocked by B.
By eliminating B, C removes the obstacle and could potentially gain A’s affection.
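
DeepSeek's rival logic can be sketched in a few lines of Python. This is an illustration under my own assumptions: the `likes` dictionary and the `rival_of` function are hypothetical names, and "rival" here simply means the person your love interest likes instead of you.

```python
# Hypothetical encoding of the prompt: each key likes the mapped value.
likes = {"A": "B", "B": "C", "C": "A"}

def rival_of(person, likes):
    """The rival is whoever the person's love interest likes instead."""
    return likes[likes[person]]

rivals = {p: rival_of(p, likes) for p in likes}
print(rivals)
```

Under this reading, C's rival is indeed B, as DeepSeek argues. The same rule also gives A a rival (C) and B a rival (A), so the strategic motive is not unique to C.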

Answer:

Person C committed the crime.

Strengths:

Provides a logical and strategic motive for the crime.
Direct and decisive in its conclusion.

Weaknesses:

Does not consider the possibility of a non-murder scenario.
Could offer more depth in comparing C’s motive to those of A and B.

Comparison and Conclusion

Similarities:

All four models recognize the love triangle as the central dynamic of the mystery.
ChatGPT, Grok, and DeepSeek all assume a murder occurred and name a suspect; only Gemini challenges this assumption.
ChatGPT and Grok reach the same verdict, Person B; DeepSeek names Person C, but like them it treats jealousy as the motive.

Differences:

Gemini takes a more abstract approach, questioning the premise of the crime.
ChatGPT and Grok focus on emotional volatility, while DeepSeek emphasizes logical strategy.
DeepSeek’s conclusion differs from the majority, presenting a unique perspective on the motive.

Best Answer?

Each model brings something unique to the table. Gemini’s creative interpretation challenges assumptions, but it may not satisfy those looking for a concrete answer. ChatGPT and Grok provide clear, traditional reasoning that aligns with classic crime puzzles. DeepSeek offers an alternative viewpoint that prioritizes strategic thinking over emotional reaction.

Ultimately, the “best” answer depends on how one interprets the prompt. If we assume a crime of passion, ChatGPT and Grok’s conclusion that Person B is the culprit seems plausible. If we prioritize a calculated move to remove a rival, DeepSeek’s answer that Person C is guilty makes sense. And if we question the very nature of the crime, Gemini’s approach opens the door to alternative interpretations.
