A daily pipeline pulls Hacker News posts, uses Gemini to spot mentions of coding models from OpenRouter, scores sentiment per comment, and logs the results to a public Google Sheet for transparency and further analysis.
Overview
The HN SOTA project monitors how often specific large language models appear in Hacker News discussions about coding. It aggregates mentions and sentiment over a ten-day window to give a snapshot of community interest.
Data collection
Each day the pipeline uses the Hacker News API to fetch the two hundred most popular stories posted in the last twenty-four hours. It then passes their titles and comments to a Gemini model, which filters for posts that deal with LLMs or general programming topics.
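The fetch step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the public Firebase-hosted Hacker News API (`topstories` and `item/<id>`), and it assumes "most popular" means highest score, which the description above does not specify. The function names and the injectable `fetch` parameter are hypothetical.

```python
import json
import time
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one JSON document from the public Hacker News API."""
    with urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def recent_popular_stories(fetch=fetch_json, now=None, limit=200,
                           window_s=24 * 3600):
    """Return up to `limit` stories from the last `window_s` seconds,
    highest score first."""
    now = time.time() if now is None else now
    stories = []
    for story_id in fetch("topstories"):
        item = fetch(f"item/{story_id}")
        # Deleted or missing items come back as null (None); the top
        # list can also contain non-story items such as job postings.
        if not item or item.get("type") != "story":
            continue
        if now - item.get("time", 0) <= window_s:
            stories.append(item)
    # Assumption: popularity is ranked by the item's score.
    stories.sort(key=lambda s: s.get("score", 0), reverse=True)
    return stories[:limit]
```

Passing `fetch` as a parameter keeps the time filter and ranking testable without network access; calling `recent_popular_stories()` with the defaults issues one request per story ID.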
Model extraction and sentiment scoring
To isolate relevant discussions, the system keeps at most fifty of the posts that the Gemini filter flags as related. For each retained post, the title and all of its comments are sent to Gemini again, which looks for any model names from the OpenRouter list and assigns a sentiment label (positive, neutral, or negative) per comment.
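One way to picture the extraction step is shown below. The prompt wording, the JSON reply format, and the `call_llm` callable are all assumptions made for illustration; the actual Gemini request is not described here, so the model call is left as a function the caller supplies. The sketch focuses on the part that can be checked locally: building the prompt and discarding replies that name an unknown comment, model, or label.

```python
import json

SENTIMENTS = {"positive", "neutral", "negative"}


def build_prompt(title, comments, model_names):
    """Assemble one extraction prompt for a post (illustrative wording)."""
    lines = [
        "Find mentions of these models in the comments below.",
        "Models: " + ", ".join(model_names),
        "Reply with a JSON list of objects with the keys "
        "comment_id, model, and sentiment (positive, neutral, negative).",
        f"Title: {title}",
    ]
    lines += [f"[{c['id']}] {c['text']}" for c in comments]
    return "\n".join(lines)


def parse_mentions(reply, comments, model_names):
    """Keep only records that reference a known comment, model, and label."""
    try:
        records = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(records, list):
        return []
    known_ids = {c["id"] for c in comments}
    known_models = set(model_names)
    valid = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        cid, model, label = (rec.get("comment_id"), rec.get("model"),
                             rec.get("sentiment"))
        if (isinstance(cid, int) and cid in known_ids
                and isinstance(model, str) and model in known_models
                and isinstance(label, str) and label in SENTIMENTS):
            valid.append((cid, model, label))
    return valid


def extract_mentions(post, model_names, call_llm):
    """`call_llm` is a hypothetical callable: prompt text in, reply text out."""
    prompt = build_prompt(post["title"], post["comments"], model_names)
    return parse_mentions(call_llm(prompt), post["comments"], model_names)
```

Validating the reply against the known comment IDs and the model list means a hallucinated model name or a malformed reply is dropped rather than written downstream.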
Audit trail
Results are written to a public Google Sheet where every row records the comment ID, the model mentioned, and the sentiment that Gemini assigned. Clicking a comment ID opens the original discussion at https://news.ycombinator.com/item?id= followed by the ID, allowing anyone to verify the source.
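As a rough sketch of how such a row might be built, the snippet below forms the discussion link from the comment ID and wraps it in a Google Sheets `HYPERLINK` formula so the ID cell is clickable. The column order and the helper names are assumptions; how the pipeline actually writes to the sheet is not described here.

```python
HN_ITEM_URL = "https://news.ycombinator.com/item?id="


def comment_url(comment_id):
    """Link to the original Hacker News item for a comment ID."""
    return f"{HN_ITEM_URL}{int(comment_id)}"


def audit_row(comment_id, model, sentiment):
    """One sheet row: clickable comment ID, model name, sentiment label."""
    # Assumption: the sheet cell uses the Sheets HYPERLINK(url, label) formula.
    link = f'=HYPERLINK("{comment_url(comment_id)}", "{int(comment_id)}")'
    return [link, model, sentiment]
```

Converting the ID with `int()` rejects anything that is not numeric before it ends up inside a formula.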
Top ten snapshot
The aggregated top ten list shows total mentions and the net sentiment balance for each model across the ten-day period from April 23 to May 1, 2026. Scale bars are normalized to the highest mention count so readers can compare relative popularity at a glance.
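The aggregation can be illustrated with a short sketch. It assumes each input row is a `(comment_id, model, sentiment)` tuple and that the net sentiment balance is the number of positive mentions minus the number of negative ones; that definition is a guess, since the exact formula is not given. The text bar stands in for the normalized scale bar, and the function name and `bar_width` parameter are hypothetical.

```python
from collections import Counter


def top_models(rows, n=10, bar_width=20):
    """Rank models by mention count and attach net sentiment and a bar
    scaled to the highest count."""
    mentions = Counter()
    net = Counter()
    for _comment_id, model, sentiment in rows:
        mentions[model] += 1
        # Assumption: positive counts +1, negative -1, neutral 0.
        net[model] += {"positive": 1, "negative": -1}.get(sentiment, 0)
    if not mentions:
        return []
    peak = max(mentions.values())
    return [
        {
            "model": model,
            "mentions": count,
            "net_sentiment": net[model],
            "bar": "#" * round(bar_width * count / peak),
        }
        for model, count in mentions.most_common(n)
    ]
```

Because every bar is scaled against the most-mentioned model, that model always gets the full width and the others are shown as a fraction of it.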
Practical use
Developers can check which models are gaining traction in current conversations, while researchers may study how sentiment shifts after a new release or benchmark. The sheet also highlights models that receive little discussion despite strong technical performance, pointing to possible gaps between performance and community awareness.
Limitations and next steps
The approach relies on the accuracy of Gemini's filtering and sentiment classification, so occasional misclassifications may occur. Future work could incorporate additional signals such as upvote counts, or integrate alternative model directories to broaden coverage.
Explore the data
The full dataset is available in the linked Google Sheet, where users can sort by date, model, or sentiment to dig deeper into the trends. Feel free to copy the sheet for your own analysis or to build on the pipeline for other forums.