The 'Botsitter' Economy: What Glean's Productivity Data Actually Shows

A vendor-funded report says workers now spend nearly a full day each week supervising AI tools. The number is eye-catching, but the methodology and the incentives behind it deserve a closer read before anyone treats babysitting chatbots as the new normal.

A new report from Glean, the enterprise AI search company, claims that knowledge workers now spend close to a full working day every week supervising AI systems. The figure has been circulating under a tidy label, the "botsitter," describing employees who spend their time checking, correcting, and re-prompting the AI tools their companies bought to save them time. Business Insider's coverage frames this as a workplace trend worth watching. The trend is real enough, but the framing rewards a skeptical reading.

What's claimed

The headline number is roughly eight hours per week, a full working day, spent on what the report calls AI oversight. That covers reviewing model outputs, rewriting prompts that didn't land, fact-checking generated text, and cleaning up after tools that produced something plausible but wrong. The implication is that the productivity dividend companies expected from generative AI is being partially eaten by the labor required to keep that AI on the rails.

Glean's narrative is that this overhead exists because most enterprises bolted general-purpose chatbots onto workflows without giving the models access to internal, authoritative data. A model that can't see your company's documents will confidently invent answers, and someone has to catch those inventions. That someone is the botsitter.

What's actually new

The underlying observation is not new to anyone who has shipped an LLM feature. Verification cost is the central tax of generative AI. A model that is right 90 percent of the time still forces a human to check 100 percent of the output, because you cannot know in advance which tenth is wrong. The economics of that have been discussed since the first wave of GPT-4-era deployments in 2023 and 2024.
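The arithmetic behind that tax can be sketched in a few lines. This is an illustration, not anything drawn from Glean's report: the function name, the per-item minutes, and the volume of outputs are all invented assumptions. The point it shows is that checking time scales with every output, while fixing time scales only with the wrong ones.

```python
def weekly_verification_hours(outputs_per_week, accuracy, check_minutes, fix_minutes):
    """Expected hours per week spent checking all outputs and fixing the wrong ones.

    All parameters are hypothetical inputs chosen for illustration.
    """
    # Every output is checked, because the wrong ones are not known in advance.
    checking = outputs_per_week * check_minutes
    # Only the expected share of wrong outputs needs fixing.
    fixing = outputs_per_week * (1 - accuracy) * fix_minutes
    return (checking + fixing) / 60


# Invented example: 60 outputs a week, 90 percent accurate,
# 2 minutes to check each one, 10 minutes to fix each wrong one.
hours = weekly_verification_hours(60, 0.90, 2, 10)
print(f"{hours:.1f} hours per week")
```

With these made-up inputs, checking accounts for twice as many minutes as fixing, even though nine in ten outputs were fine.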

What is genuinely useful here is the attempt to put a number on it at the level of an individual worker's week. Most enterprise AI ROI studies measure tokens consumed, seats licensed, or self-reported time saved. Far fewer measure the time spent undoing or supervising the work. If the eight-hour figure holds up, it reframes the standard pitch. The relevant question stops being "how much time does AI save?" and becomes "how much time does AI save, net of the time spent supervising it?"
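As a minimal sketch of that reframing, assuming both quantities could actually be measured, the net figure is a simple subtraction. The function name and the ten-hour gross saving are hypothetical; only the eight-hour supervision figure comes from the report's headline claim.

```python
def net_hours_saved(gross_hours_saved, supervision_hours):
    """Weekly hours saved by an AI tool after subtracting time spent supervising it."""
    return gross_hours_saved - supervision_hours


# Hypothetical worker: the tool saves 10 hours of drafting a week (invented number),
# while supervision takes the 8 hours claimed in the report's headline.
net = net_hours_saved(gross_hours_saved=10, supervision_hours=8)
print(net)  # 2
```

A negative result would mean the tool costs more time than it saves, which the gross figure alone cannot reveal.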

That net framing matters because the two effects do not cancel cleanly. A tool can genuinely accelerate a first draft while still demanding heavy review, leaving a worker faster on output but no less occupied. Several studies through 2025, including work on developer productivity and on writing tasks, found exactly this pattern: faster completion paired with persistent verification load, and in some cases a measurable drop in the user's own retained skill.

Limitations

Start with who produced the report. Glean sells a product whose entire premise is that grounding AI in your company's internal data reduces hallucination and therefore reduces supervision overhead. A report concluding that workers waste a day a week supervising ungrounded AI is, conveniently, a report that the vendor's product is positioned to fix. That does not make the finding false, but it means the framing was chosen by a party with a commercial interest in the conclusion.

The methodology also deserves scrutiny that the coverage does not provide. "Supervising AI" is a soft category. Is rewriting a prompt supervision, or is it just using the tool? Is reading an AI-generated summary before sending it oversight, or ordinary editing that a worker would have done with any draft? Self-reported time estimates are notoriously unreliable, and a survey that primes respondents with the idea that AI creates oversight work will tend to surface oversight work. Without the sample size, the question wording, and the definition of supervision, the eight-hour figure is a talking point, not a measurement.

There is also a baseline problem. Workers have always spent time checking other people's output: reviewing junior colleagues' drafts, verifying data from another department, editing copy. Some portion of "botsitting" is verification that simply moved from a human source to a machine source, not new work created out of nothing. The honest comparison is supervision time before and after the tool, not supervision time measured in isolation.
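One way to express that comparison, as a sketch with invented numbers rather than survey data, is to count only the supervision that is new. The function and both hour figures below are hypothetical.

```python
def incremental_supervision_hours(review_hours_before, review_hours_after):
    """Weekly review time attributable to the tool: time after rollout minus the prior baseline."""
    return review_hours_after - review_hours_before


# Hypothetical worker who already spent 5 hours a week reviewing colleagues' work
# and now spends 8 hours a week reviewing a mix of human and AI output.
new_work = incremental_supervision_hours(review_hours_before=5, review_hours_after=8)
print(new_work)  # 3
```

Under these assumptions, a headline of eight hours of "botsitting" would correspond to only three hours of genuinely new work.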

What changes

For anyone deploying these tools, the practical takeaways are narrower than the headline. The cost of an AI feature includes the human verification it triggers, and that cost should be measured directly rather than assumed away. Grounding models in authoritative internal data does reduce hallucination, which is the legitimate kernel inside Glean's pitch, and retrieval-augmented systems have repeatedly outperformed bare chatbots on factual enterprise tasks. The tasks where AI pays off cleanly are the ones where verification is cheap, where a wrong answer is obvious at a glance or carries low stakes. The tasks where it disappoints are the ones where checking the output costs nearly as much as producing it would have.
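The contrast between those two kinds of task can be sketched as a per-task break-even check. This is an illustrative model under stated assumptions, not a method from the report: the function name and every minute value are invented, and it assumes a wrong output is always caught and then fixed.

```python
def minutes_saved_per_task(produce_minutes, check_minutes, accuracy, fix_minutes):
    """Expected minutes saved per task when AI drafts it, versus doing it by hand.

    Assumes every output is checked and every wrong output is caught and fixed.
    """
    expected_ai_cost = check_minutes + (1 - accuracy) * fix_minutes
    return produce_minutes - expected_ai_cost


# Cheap verification: a wrong answer is obvious at a glance (invented numbers).
cheap = minutes_saved_per_task(produce_minutes=30, check_minutes=2, accuracy=0.9, fix_minutes=10)

# Expensive verification: checking costs nearly as much as producing (invented numbers).
costly = minutes_saved_per_task(produce_minutes=30, check_minutes=25, accuracy=0.9, fix_minutes=30)

print(f"cheap to verify: {cheap:.0f} min saved; costly to verify: {costly:.0f} min saved")
```

Both tasks have the same model accuracy in this sketch. The difference in payoff comes entirely from what it costs a human to check and repair the result.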

The "botsitter" label will travel because it is catchy and it flatters a feeling many workers already have. The data underneath it is thinner than the label suggests. Treat the eight-hour number as a hypothesis worth testing inside your own organization, with your own time-tracking, rather than as an established fact imported from a vendor's marketing. The interesting work is measuring the verification tax honestly, and that is exactly the measurement most AI rollouts still skip.