A cheap way to compare LLM knowledge
I stumbled upon this by chance.
I was trying to get subcategories for a set of categories, so I prompted several LLMs to generate them. Specifically, I asked each model for an exhaustive list of subcategories in JSON format.
The number and kind of subcategories each model returns for each category gives great insight into several things:
- how each model categorizes different topics
- how much emphasis it gives each category relative to others (as a first approximation, comparing the number of subcategories per category gives a sense of how deep its knowledge of each category runs)
- which models give rich information but fail to structure it as requested
I personally loved the AI2 list for my purpose (though it didn't give me the direct JSON output I asked for).
Here are the chats with each model I tried:
- Grok/X think
- Claude 3.7 sonnet with extended thinking
- ChatGPT o3-mini-high with search
- Perplexity with reasoning and all sources
- Gemini Pro Deep Research
- AI2 ScholarQA
This was my prompt:
I have a list of categories (shown below). Give me an exhaustive (Important: must be exhaustive!) list of subcategories for each category in json format with the following aspects:
name = name of the category
subcategories = exhaustive list of subcategories
List of categories:
Current
Artificial Intelligence
Science
Health
Career
Tech
Business
Finance
Energy
Ethics
Philosophy
Entertainment
Art
Law
Government
Sports
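Once each model returns its JSON, the subcategory counts can be tallied with a short script. This is a minimal sketch; the model names and outputs below are invented placeholders, not the actual responses from the chats linked above.

```python
import json

# Hypothetical JSON outputs in the format the prompt requests.
# Real outputs would cover all sixteen categories.
model_outputs = {
    "model_a": '[{"name": "Science", "subcategories": ["Physics", "Chemistry", "Biology"]}]',
    "model_b": '[{"name": "Science", "subcategories": ["Physics", "Astronomy"]}]',
}

def subcategory_counts(raw_json):
    """Map each category name to its number of subcategories."""
    return {c["name"]: len(c["subcategories"]) for c in json.loads(raw_json)}

# Compare how many subcategories each model produced per category.
for model, raw in model_outputs.items():
    print(model, subcategory_counts(raw))
```

Models whose output doesn't parse as JSON at all (like the AI2 response mentioned above) would need manual inspection instead.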
This technique can be used with other kinds of prompts (beyond categories) as well.