Long conversations with LLMs via language-as-code
This is a detailed response to a question by @karpathy on X:
When working with LLMs I am used to starting "New Conversation" for each request. But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still choose to use a Memory tool to write things down in between…
— Andrej Karpathy (@karpathy) March 20, 2025
The best shot at a good outcome from an LLM is to write actual code as a prompt and ask it to print the result. This can be arbitrarily long. It works best with an LLM that has been pre-trained on similar code, and even better if the model has access to a code sandbox.
def add(a, b):
    return a + b

add(1, 2)

# Act as a python interpreter
# Your output:
Most LLMs get this kind of simple thing right (ChatGPT 4o chat).
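And when a sandbox is available, we don't even have to trust the model: we can check its printed answer against a real interpreter. A minimal sketch (this exec/eval harness is my own illustration, not part of the original prompt):

# Minimal sketch: verify the model's "interpreter" answer against real Python.
def check_against_sandbox(snippet: str, final_expr: str, model_answer: str) -> bool:
    namespace: dict = {}
    exec(snippet, namespace)                     # run the definitions
    expected = str(eval(final_expr, namespace))  # evaluate the final expression
    return model_answer.strip() == expected

print(check_against_sandbox("def add(a, b):\n    return a + b", "add(1, 2)", "3"))  # True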
If we must do something complex involving natural language, the best shot is to use 'language as fake code': pseudocode or Python-like code that asks the model for the continuation or to print the result. This too can be arbitrarily long, assuming the code is well formed.
from typing import List

class TechStack():
    test_db: str # sqlite | postgres | supabase | mongodb
    test_db_in_memory: bool
    prod_db: str # sqlite | postgres | supabase | mongodb
    test_db_config: dict
    prod_db_config: dict
    migration_framework: str # alembic | supabase | ...
    test_db_env_vars: str
    remote_db_env_vars: str
    test_db_docker_config: str
    remote_db_docker_config: str
    type_system: str # pydantic.BaseModel | sqlmodel.SQLModel | ...
    orm: str # sqlalchemy | attrs | ...
    auth: str # fastapi_auth | supabase_auth | ...
    oauth: List[str] # apple | github | google | ...
    backend_framework: str # fastapi|flask|nodejs|...
    frontend_framework: str # React | Angular | Reflex | ...
    frontend_styling_framework: str # css | tailwindcss | ...
    frontend_script_language: str # typescript | javascript | ...

def get_best_tech_stack_for_web_app(environment: str) -> TechStack:
    """Get the best tech stack for a web app based on current knowledge"""
    # ...
    pass

get_best_tech_stack_for_web_app(environment="macos")

# Act as a python implementation for the get_best_tech_stack_for_web_app method and return the result with an explanation as comments.
# Your output:
For example, this is what Grok gave me for this prompt:
from typing import List

class TechStack:
    test_db: str # sqlite | postgres | supabase | mongodb
    test_db_in_memory: bool
    prod_db: str # sqlite | postgres | supabase | mongodb
    test_db_config: dict
    prod_db_config: dict
    migration_framework: str # alembic | supabase | ...
    test_db_env_vars: str
    remote_db_env_vars: str
    test_db_docker_config: str
    remote_db_docker_config: str
    type_system: str # pydantic.BaseModel | sqlmodel.SQLModel | ...
    orm: str # sqlalchemy | attrs | ...
    auth: str # fastapi_auth | supabase_auth | ...
    oauth: List[str] # apple | github | google | ...
    backend_framework: str # fastapi|flask|nodejs|...
    frontend_framework: str # React | Angular | Reflex | ...
    frontend_styling_framework: str # css | tailwindcss | ...
    frontend_script_language: str # typescript | javascript | ...

def get_best_tech_stack_for_web_app(environment: str) -> TechStack:
    """Get the best tech stack for a web app based on current knowledge for a web app"""
    tech_stack = TechStack()

    # Database choices
    # SQLite for testing due to its simplicity and in-memory capability on macOS
    tech_stack.test_db = "sqlite"
    tech_stack.test_db_in_memory = True # Faster testing with in-memory DB
    # PostgreSQL for production - robust, widely supported, great macOS compatibility
    tech_stack.prod_db = "postgres"

    # Basic config dictionaries
    tech_stack.test_db_config = {"uri": "sqlite:///:memory:"}
    tech_stack.prod_db_config = {
        "uri": "postgresql://user:password@localhost:5432/dbname",
        "pool_size": 20
    }

    # Migration framework
    # Alembic works well with SQLAlchemy and PostgreSQL, macOS-friendly
    tech_stack.migration_framework = "alembic"

    # Environment variables (simple strings for demo)
    tech_stack.test_db_env_vars = "TEST_DB_URL=sqlite:///:memory:"
    tech_stack.remote_db_env_vars = "DB_URL=postgresql://user:password@localhost:5432/dbname"

    # Docker configs as simple strings
    # SQLite doesn't need Docker for testing, using host macOS environment
    tech_stack.test_db_docker_config = ""
    tech_stack.remote_db_docker_config = """
    postgres:
      image: postgres:15
      environment:
        POSTGRES_USER: user
        POSTGRES_PASSWORD: password
        POSTGRES_DB: dbname
      ports:
        - "5432:5432"
    """

    # Type system and ORM
    # SQLModel combines Pydantic and SQLAlchemy, great for type safety on macOS dev
    tech_stack.type_system = "sqlmodel.SQLModel"
    tech_stack.orm = "sqlalchemy" # SQLModel uses SQLAlchemy under the hood

    # Authentication
    # FastAPI Users is lightweight, integrates well with SQLAlchemy on macOS
    tech_stack.auth = "fastapi_auth"
    tech_stack.oauth = ["github", "google"] # Common providers, well-supported

    # Frameworks
    # FastAPI: Modern, async, great Python ecosystem support on macOS
    tech_stack.backend_framework = "fastapi"
    # React: Widely used, great tooling support on macOS via Homebrew/Node
    tech_stack.frontend_framework = "React"

    # Frontend styling and scripting
    # TailwindCSS: Modern, utility-first, excellent macOS dev experience
    tech_stack.frontend_styling_framework = "tailwindcss"
    # TypeScript: Better tooling and type safety, strong macOS support
    tech_stack.frontend_script_language = "typescript"

    return tech_stack

# Execute and get the result
result = get_best_tech_stack_for_web_app(environment="macos")
Full Grok chat response.
I think learning to draft natural-language prompts in a code-like form is an invaluable skill.
If we want to do natural language without the code-like prompt approach above, then structured outputs (a bit of a pain to get right with some reasoning models) will help enable longer conversations, since half of the conversation will be structured.
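For example, here is a minimal sketch using Pydantic with the OpenAI Python SDK's parse helper (the model name and schema fields here are illustrative assumptions; other providers have similar features):

from pydantic import BaseModel
from openai import OpenAI

class TechStackChoice(BaseModel):  # illustrative schema, not from the original prompt
    prod_db: str
    backend_framework: str
    frontend_framework: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o",  # assumption: any model supporting structured outputs
    messages=[{"role": "user", "content": "Pick a tech stack for a macOS web app."}],
    response_format=TechStackChoice,  # constrains the reply to this schema
)
choice = completion.choices[0].message.parsed  # a validated TechStackChoice instance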
If we want natural language in both directions, we are much better off using short prompts and conversations. The instructional part of the prompt should be short. Long contexts are OK if clearly demarcated with a uniform marker, though placing long contexts within the prompt can be a pain to get right; my guess is that different models prefer different locations (top vs. bottom).
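A sketch of what I mean, with an arbitrary marker (the marker string and the bottom placement are my assumptions; tune per model):

CONTEXT_MARKER = "=== CONTEXT ==="

def build_prompt(instruction: str, context: str) -> str:
    # Short instruction first, long context clearly demarcated below it.
    return f"{instruction}\n\n{CONTEXT_MARKER}\n{context}\n{CONTEXT_MARKER}"

prompt = build_prompt(
    "Summarize the design decisions below in three bullet points.",
    "...a long document goes here...",
)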
It also depends on whether we want to automate something downstream with the response. Automating is almost always easier when the response is language-as-code.
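To illustrate why: because Grok's reply above is runnable Python, a downstream script can execute it and read the result directly (an abridged sketch; only exec code you trust, or run it in a sandbox):

grok_reply = '''
class TechStack:
    pass

def get_best_tech_stack_for_web_app(environment):
    tech_stack = TechStack()
    tech_stack.prod_db = "postgres"  # abridged from the full reply
    tech_stack.backend_framework = "fastapi"
    return tech_stack

result = get_best_tech_stack_for_web_app(environment="macos")
'''
namespace: dict = {}
exec(grok_reply, namespace)
print(namespace["result"].prod_db)  # postgres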
I suspect (though I haven't tried it) that we can hold a long conversation using language-as-code prompts.
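A sketch of how that might look: treat the conversation as one growing program, appending each user turn as code and each model reply as comments so the "program" stays well formed (ask_llm is a hypothetical stand-in for a real model call):

program: list[str] = []

def ask_llm(prompt: str) -> str:  # hypothetical stand-in for a real model call
    raise NotImplementedError

def turn(new_code: str) -> str:
    program.append(new_code)
    prompt = "\n".join(program) + "\n# Act as a python interpreter\n# Your output:"
    reply = ask_llm(prompt)
    # Record the reply as comments so the next turn sees the whole history as code.
    program.append("\n".join(f"# {line}" for line in reply.splitlines()))
    return reply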