App Comparisons · 13 min read

ChatGPT vs Claude vs Gemini: 5 AI Chat Apps Ranked (2026)

By Unstar · Editorial Team

1-3 star analysis of 5 AI chat apps: ChatGPT, Claude, Gemini, Perplexity, Microsoft Copilot. Hallucinations, paywalls, login limits, and what users actually complain about in 2026.

AI chat apps moved from novelty to default utility in 2026. Most users now have at least two installed: a primary general assistant and a secondary tool for code, research, or image work. The category exploded so fast that store ratings rarely reflect what daily users complain about, and the marketing pages talk about benchmarks, context windows, and reasoning depth instead of the friction users actually run into on the phone.

We pulled 1-3 star reviews across the 5 most-installed AI chat apps on the App Store and Google Play during early 2026. The complaints repeat across apps with surprising consistency: hallucinations on questions where the user knows the answer, paywalls disguised as feature limits, message-cap surprises that hit mid-task, and refusals that read as over-cautious or bureaucratic. The differences between apps are real but smaller than the marketing suggests.

This post focuses on consumer chat apps (general assistants and search-style chat). It does not cover developer-only API tools, image-only generators (Midjourney, DALL-E web), or character roleplay apps. For an angle on using AI to analyze app reviews specifically, see our ChatGPT vs Claude for App Review Analysis comparison.

Apps Analyzed

  • ChatGPT (OpenAI): the category leader by install base, free tier with GPT-4-class model and message caps, Plus tier ($20/mo) for higher caps and image generation, Pro tier for power features, voice mode and image input across tiers
  • Claude (Anthropic): strong on long-context tasks and writing, free tier with daily message limits, Pro tier ($20/mo) for higher caps and Projects, web and iOS apps, mobile parity with web
  • Gemini (Google): integrated with Google account and Workspace, free tier on the standard Gemini model with a paid Advanced tier, Android default through Assistant integration
  • Perplexity: answer-engine framing rather than chat, citation-first output, free tier with daily Pro searches, Pro tier ($20/mo) for unlimited Pro searches and model selection
  • Microsoft Copilot: Bing-integrated chat with web search, free tier and Copilot Pro, deep integration with Microsoft 365 on iOS and Android, image generation via DALL-E

Top Complaints Across All AI Chat Apps

These percentages reflect complaint frequency in our 1-3 star sample across all 5 apps. AI chat complaints concentrate around the moments where the model gave a confidently wrong answer, the app blocked a feature behind a paywall, or the conversation lost track of context the user expected it to keep.
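For context on how numbers like these are produced, the simplest version is a keyword-bucketing pass over review text. A minimal sketch; the keywords, category names, and sample reviews below are illustrative stand-ins, not the taxonomy behind this post's percentages:

```python
from collections import Counter

# Illustrative keyword buckets (a real taxonomy would be broader).
CATEGORIES = {
    "hallucination": ["made up", "wrong answer", "invented", "hallucin"],
    "paywall": ["paywall", "upgrade", "subscription", "pro tier"],
    "login": ["sign in", "log in", "logged out", "account"],
}

def bucket(review_text: str) -> list[str]:
    """Return every category whose keywords appear in the review."""
    text = review_text.lower()
    return [cat for cat, kws in CATEGORIES.items()
            if any(kw in text for kw in kws)]

def complaint_shares(reviews: list[str]) -> dict[str, float]:
    """Percentage of reviews mentioning each category (multi-label,
    so shares can sum to more than 100%)."""
    counts = Counter(cat for r in reviews for cat in bucket(r))
    total = len(reviews) or 1
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}

sample = [
    "It invented a citation that does not exist",
    "Hit the paywall mid task, forced upgrade",
    "Logged out twice this week and lost my history",
    "Great app, no complaints",
]
print(complaint_shares(sample))
```

Keyword matching over-counts sarcasm and misses paraphrases, which is why percentages like the ones below are best read as rough rankings rather than precise measurements.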

1. Hallucinations and Confident Wrong Answers (22%)

The single most common complaint across every AI chat app is the model returning a wrong answer with full confidence. Users describe asking factual questions where they already know the answer (testing the app) and getting plausible-sounding but incorrect responses. The complaint is not that the model is sometimes wrong, it is that the model never signals uncertainty.

  • "ChatGPT confidently invented a paper that does not exist": the canonical hallucination complaint
  • "Claude told me the wrong release date for a movie I directed": confidence on facts the user can check
  • "Gemini cited a Wikipedia link that loaded a 404": invented citations are a frequent theme
  • "Perplexity citation footer linked to a real page that did not contain the claim": even citation-first apps get this

2. Paywalls and Feature Lock-Out (18%)

The free tier either blocks features users expected (image generation, model selection, file upload) or hits message caps mid-conversation. Reviews describe this as bait-and-switch: the app's pitch implies the feature works, then the actual flow stops at an upsell.

  • "ChatGPT free hit message cap mid-task, lost my project"
  • "Gemini Advanced gated the model I needed for a question"
  • "Perplexity Pro searches ran out before I finished researching"
  • "Copilot free model is not the model the marketing showed"

3. Login Limits and Anonymous Cap Friction (14%)

Free or anonymous use hits message caps fast, and the login flow is described as friction-heavy. Reviews mention being asked to sign in for the second message, having to re-authenticate every few days, and losing chat history when switching devices without an account.

  • "ChatGPT anonymous limit is one question, then forced login"
  • "Claude logged me out twice this week, lost the project context"
  • "Gemini sign-in loop on Android, would not stay signed in"
  • "Copilot wanted Microsoft account before I could use the free tier"

4. Context Loss and Memory Failures (12%)

The conversation forgets earlier messages within the same session, or the app drops context between sessions. Users describe attaching a file, asking three questions, and finding the model lost track of what was attached. Memory features that promise persistence are described as inconsistent.

  • "ChatGPT memory feature kept old wrong info but forgot the new"
  • "Claude lost the long document I uploaded after 5 messages"
  • "Gemini does not remember what I said two messages ago"
  • "Perplexity follow-up does not actually follow up, asks again"

5. Refusals and Over-Caution (11%)

Users hit refusals on questions they consider reasonable: medical research, legal hypotheticals, technical questions about security, mature creative writing. Reviews describe the refusal pattern as bureaucratic and frustrating, especially when the app refuses to answer something the user can easily find on Google.

  • "ChatGPT refused to summarize a public news article about a crime"
  • "Claude added a 4-paragraph disclaimer to a recipe question"
  • "Gemini refused to translate a paragraph that mentioned alcohol"
  • "Copilot blocked a question I could answer with a Wikipedia search"

6. App Performance and Response Latency (9%)

Response time spikes during peak hours, the app freezes mid-stream, and reconnects drop the partial response. Users describe the iOS or Android app feeling slower than the web version, especially under cellular conditions.

  • "ChatGPT app froze halfway through a code answer, lost the rest"
  • "Claude on iOS slower than the same chat in Safari"
  • "Gemini long answers stop streaming and never recover"
  • "Copilot mobile app crashes when I switch to a long thread"

7. Voice Mode Quality and Latency (7%)

Voice features are a frequent complaint. Reviews describe voice transcription that misses words, voice replies that feel robotic, latency that breaks conversational flow, and the voice mode silently dropping connection.

  • "ChatGPT voice mode pauses for 4 seconds before every reply"
  • "Claude voice on iOS misheard most of my dictation"
  • "Gemini Assistant voice replaced the old Google Assistant and got worse"
  • "Copilot voice mode requires a separate flow than text chat"

8. Privacy and Data Training Concerns (4%)

A smaller but persistent complaint is data training. Users describe being unsure whether their conversations train the model, finding the opt-out toggle hard to locate, and seeing privacy policy updates that change defaults without notice. The complaint volume is smaller than expected because most users do not check the toggle, but the users who do leave detailed reviews.

  • "ChatGPT trained on my conversation by default, opt-out was buried"
  • "Gemini privacy update changed default to share with humans"
  • "Claude privacy is opt-in for training, but the language is unclear"

9. Subscription Cancellation Friction (3%)

Reviews complain about cancellation flows that require web account access, refund policies that vary by store (App Store vs Play Store handle this differently), and re-bills that hit despite a cancel attempt.

  • "ChatGPT Plus cancel was on web only, not in iOS app"
  • "Gemini Advanced bills through Google One and the cancel is buried"

Per-App Breakdown

ChatGPT

Negative review themes (in order of frequency):

  • Message cap surprises on the free tier. The free tier hits a cap after 4-6 messages on the higher model and silently downgrades to a smaller model, and reviews describe the downgrade as not clearly disclosed
  • Hallucinations on factual queries. ChatGPT is the most-reviewed app and the most-cited offender, partly because of install base. The pattern is confident wrong answers on dates, paper citations, and product specs
  • Memory feature inconsistency. The cross-session memory feature stores some facts and forgets others, and reviews describe the model citing outdated stored memory while forgetting the recent context
  • Voice mode latency. Voice replies have noticeable thinking pauses, and reviews compare unfavorably to the web demo videos
  • Plus tier value perception. Some reviews describe the $20/mo Plus tier as not worth the difference from free, especially when the free tier covers basic use cases

ChatGPT is the right pick for users who want the broadest feature set (image generation, voice, custom GPTs, file upload) and who can tolerate the message-cap and hallucination patterns. The complaints concentrate around free-tier friction, memory inconsistency, and voice latency.

Claude

Negative review themes:

  • Daily message cap on free tier hits mid-task. Free Claude has a per-day cap that resets at a fixed time, and reviews describe planning around the cap or hitting it during long writing sessions
  • Long context drift. Claude markets long context window strength, and reviews describe the model losing track of details from earlier in the document
  • Long disclaimer prefacing. Reviews describe Claude adding multi-paragraph disclaimers, ethics framing, or hedging before answers, and finding this slowed task completion
  • iOS app feature gap vs web. Claude on iOS is described as missing Projects, file management, and some keyboard shortcuts that work on web
  • Pro tier upgrade flow inside the app. Some reviews describe the Pro upgrade flow on iOS as routing through web, which broke the in-app purchase pattern

Claude is the right pick for users who prioritize long-document work, code review, and writing where the model's writing voice matters. The complaints concentrate around free-tier message limits, disclaimer length, and iOS app feature parity.

Gemini

Negative review themes:

  • Replaced Google Assistant on Android, regressed. Gemini took over from Google Assistant on Android, and reviews describe losing routines, smart home features, and quick voice queries that the old Assistant handled
  • Sign-in loop on Android. Reviews describe being signed out repeatedly, especially after Google account switches or password updates
  • Refusal frequency on routine queries. Gemini refuses queries that other apps answer (recipes mentioning alcohol, translation of news articles, mature creative writing), and reviews describe the refusal pattern as overcautious
  • Workspace integration is uneven. Some Workspace features work, others do not, and reviews describe finding the working set through trial and error
  • Advanced tier value vs Google One. Gemini Advanced is part of the Google One AI Premium plan, and reviews describe the bundle pricing as confusing relative to standalone competitors

Gemini is the right pick for users deep in Google ecosystem (Gmail, Drive, Calendar, Workspace) who want assistant integration. The complaints concentrate around the Assistant regression, sign-in friction, and refusal pattern.

Perplexity

Negative review themes:

  • Pro search count runs out unexpectedly. Free tier has a daily Pro search limit, and reviews describe hitting the limit during research sessions and dropping to base searches
  • Citations point to pages that do not contain the claim. Citation-first framing creates expectation that citations match the claim, and reviews describe pages cited that do not actually source the statement
  • Follow-up questions lose context. Despite the chat framing, the app sometimes treats a follow-up as a brand-new search, and reviews describe re-typing context into each query
  • Spaces feature is hard to discover. Spaces (collections) feature is described as buried in the UI, and reviews mention discovering it months after install
  • Mobile keyboard issues. Reviews describe iOS keyboard quirks (autocorrect overriding queries, paste behavior) that do not happen on web

Perplexity is the right pick for users who treat AI as an answer engine and value source citations as a default. The complaints concentrate around Pro search limits, citation reliability, and chat-vs-search context.

Microsoft Copilot

Negative review themes:

  • Free model is not the marketing model. Reviews describe the free tier using a smaller model than the marketing implied, and finding the upgrade path required Copilot Pro plus a Microsoft 365 subscription for full access
  • Microsoft account requirement is friction. Reviews describe being unable to use the app without a Microsoft account, even for trial-style queries
  • Bing search integration adds noise. Copilot blends search results with chat answers, and reviews describe the blend as cluttered relative to a clean chat experience
  • Image generation has long queues. DALL-E image generation queue is described as slow during peak times, and reviews describe waiting 30-60 seconds per image
  • Outlook and Word integration is gated. Copilot in Outlook and Word requires the Copilot for Microsoft 365 license, and reviews describe expecting that integration in the consumer Copilot app and not finding it

Copilot is the right pick for users embedded in the Microsoft 365 ecosystem who want chat plus DALL-E image generation. The complaints concentrate around the model gap, account requirement, and Microsoft 365 license confusion.

AI Chat App Complaint Summary

| App | Worst-rated complaint | Best for | Avoid if |
| --- | --- | --- | --- |
| ChatGPT | Cap surprises + hallucinations on facts | Broadest feature set, image + voice + custom GPTs | You need consistent free access without caps |
| Claude | Daily cap + long disclaimers | Long-document work, writing, code review | You want short replies without prefacing |
| Gemini | Assistant regression on Android + refusals | Deep Google ecosystem integration | You miss the old Google Assistant feature set |
| Perplexity | Pro search cap + citation reliability | Answer-engine style with sources | You want pure chat without search framing |
| Copilot | Model gap + Microsoft account requirement | Microsoft 365 users wanting integrated chat | You are not in the Microsoft ecosystem |

What Each Pattern Tells You

A few patterns hold across the AI chat category and are worth flagging before you commit:

  • Hallucinations are universal. Every app gets cited for confident wrong answers. Plan for verification of any factual claim, especially dates, citations, and specs. The question is not whether the app hallucinates, it is how the app signals uncertainty
  • Free tier message caps are deliberate friction toward upgrade. All 5 apps cap free use, but the patterns differ: ChatGPT and Claude reset daily, Gemini varies by query type, Perplexity limits per-day Pro searches, Copilot limits per session. Pick the cap pattern that matches your usage shape
  • Voice quality is improving but lags the web demos. Voice mode reviews are consistently below text mode reviews. If voice is your primary input, test on your actual device before committing to a subscription
  • Refusal pattern is the brand differentiator. ChatGPT, Claude, and Gemini have different refusal calibration. Test the apps on the kind of queries you actually run (research, code, creative, mature) to see which pattern matches your needs
  • Memory and context features are marketing-ahead-of-reality. Memory across sessions is inconsistent on every app. Plan to re-state context for important tasks, do not assume the app remembers

How to Pick Your AI Chat App in 2026

Match the app to your usage shape, not to the marketing benchmarks:

  • Decide whether you want chat or an answer engine. ChatGPT, Claude, Gemini, and Copilot are chat-first; Perplexity is answer-first with a chat overlay. The choice shapes how you write queries
  • Read the most recent 1-3 star reviews on [Unstar.app](https://unstar.app) for each candidate app. Refusal pattern, latency regression, and feature changes show in reviews within days of an app update
  • Test the free tier on your real workload for one week. Count how many times you hit the cap, how often the app refuses, and how often the answer is verifiably wrong. Those three numbers decide value
  • Verify your ecosystem integration. Gemini for Google, Copilot for Microsoft 365, ChatGPT for cross-platform with most third-party integrations, Claude for standalone use, Perplexity for source-driven research
  • Plan for at least two apps installed. Most power users in 2026 use ChatGPT or Claude as primary and Perplexity as secondary for research with citations
  • Keep a verification habit. For factual queries (dates, citations, specs, code), assume the answer needs a check. The fastest way to use AI well is to combine it with one click of verification
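The one-week free-tier test above reduces to three per-session rates. A minimal sketch, assuming you log one record per chat session; the `Session` field names are illustrative, not app telemetry:

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One chat session during the one-week free-tier trial."""
    cap_hit: bool = False       # message cap interrupted the task
    refusal: bool = False       # refused a query you consider reasonable
    wrong_answer: bool = False  # answer you verified and found incorrect

def summarize(sessions: list[Session]) -> dict[str, float]:
    """Per-session rates for the three numbers that decide value."""
    n = len(sessions) or 1
    return {
        "cap_hit_rate": sum(s.cap_hit for s in sessions) / n,
        "refusal_rate": sum(s.refusal for s in sessions) / n,
        "wrong_answer_rate": sum(s.wrong_answer for s in sessions) / n,
    }

week = [Session(cap_hit=True), Session(), Session(refusal=True),
        Session(wrong_answer=True), Session()]
print(summarize(week))
# {'cap_hit_rate': 0.2, 'refusal_rate': 0.2, 'wrong_answer_rate': 0.2}
```

There is no universal threshold; the point is that comparing these three rates across two apps on your own workload beats comparing benchmark scores.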

Bottom Line

ChatGPT is the right pick for users who want the broadest feature set and the wrong pick for users who need consistent free access without cap surprises. Claude is the right pick for long-document work and writing where voice and tone matter and the wrong pick for users who want short, direct answers without long preamble. Gemini is the right pick for users deep in the Google ecosystem and the wrong pick for users who relied on Google Assistant features that did not survive the Gemini transition. Perplexity is the right pick for answer-engine research with citations and the wrong pick for users who want pure chat or hit Pro search caps. Copilot is the right pick for Microsoft 365 users and the wrong pick for users not in that ecosystem.

Before subscribing to any AI chat app, read the most recent 1-3 star reviews on Unstar.app for the specific app and your country and check for clusters around your specific use case (cap pattern, refusal frequency, voice quality, citation accuracy). Those clusters surface real failure modes weeks before they appear in store-rating averages.

Related reading: ChatGPT vs Claude for App Review Analysis compares the two leaders specifically for review-analysis workflows. AI-Powered App Review Analysis: From Hundreds of Complaints to Clear Action Items covers how to use AI on the review side. Subscription App Reviews: How to Reduce Cancellations covers the subscription mechanics that drive most of the paywall complaints in this analysis.

Methodology: All apps and review counts referenced are pulled live from App Store and Google Play APIs. Rankings update weekly. Specific reviews are direct user quotes (1-3 stars) with names masked. If you spot an error, email us.
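The sampling step can be approximated with public data: Apple exposes a per-app customer-reviews RSS feed. A minimal sketch of filtering fetched entries to 1-3 stars; the entry shape here is simplified (the real feed nests the rating under `im:rating`), and a hard-coded sample stands in for the network response:

```python
# The feed URL pattern (subject to change on Apple's side) looks like:
#   https://itunes.apple.com/us/rss/customerreviews/id=<APP_ID>/sortBy=mostRecent/json
# Entries below are a simplified stand-in for the parsed response.

def low_star_reviews(entries: list[dict]) -> list[dict]:
    """Keep entries rated 1-3 stars."""
    return [e for e in entries if int(e["rating"]) <= 3]

sample_entries = [
    {"rating": "1", "title": "Hit the cap mid-task"},
    {"rating": "5", "title": "Love it"},
    {"rating": "3", "title": "Refuses too much"},
]
print(low_star_reviews(sample_entries))
```

Google Play has no equivalent public feed, so that side of the sample requires the Play Developer API (for your own app) or a scraping library, each with its own terms of use.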

Ready to analyze your app's negative reviews?

See what users really complain about, for free.

Try Unstar.app