RCA’s blog

Long conversations with LLMs via language-as-code

This is a detailed response to a question by @karpathy on X:

When working with LLMs I am used to starting "New Conversation" for each request.

But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still choose to use a Memory tool to write things down in between…

— Andrej Karpathy (@karpathy) March 20, 2025

The best shot at a good outcome from an LLM is to write actual code as a prompt and ask it to print the result. This can be arbitrarily long. It works best with an LLM that has been pre-trained on similar code, and even better if the model has access to a code sandbox.

def add(a, b):
	return a + b

add(1, 2)
# Act as a python interpreter
# Your output:

Most LLMs get this kind of simple thing right. ChatGPT 4o chat

If we must do something complex involving natural language, the best shot is to use 'language as fake code': pseudocode or Python-like code that then asks for the continuation or to print the result. This can also be arbitrarily long, assuming the code is well formed.

from typing import List

class TechStack:
	test_db: str  # sqlite | postgres | supabase | mongodb
	test_db_in_memory: bool 
	prod_db: str  # sqlite | postgres | supabase | mongodb
	test_db_config: dict
	prod_db_config: dict
	migration_framework: str  # alembic | supabase | ...
	test_db_env_vars: str
	remote_db_env_vars: str
	test_db_docker_config: str
	remote_db_docker_config: str
	
	type_system: str  # pydantic.BaseModel | sqlmodel.SQLModel | ...
	orm: str  # sqlalchemy | attrs | ...
	auth: str  # fastapi_auth | supabase_auth | ...
	oauth: List[str]  # apple | github | google | ...
	backend_framework: str  # fastapi|flask|nodejs|...
	frontend_framework: str # React | Angular | Reflex | ...
	frontend_styling_framework: str  # css | tailwindcss | ...
	frontend_script_language: str  # typescript | javascript | ...
	
def get_best_tech_stack_for_web_app(environment: str) -> TechStack:
	"""Get the best tech stack for a web app based on current knowledge for a web app"""
	# ...
	pass

get_best_tech_stack_for_web_app(environment="macos")
# Act as a python implementation for the get_best_tech_stack_for_web_app method and return the result with an explanation as comments.
# Your output:

For example, this is what Grok gave me for this prompt:

from typing import List

class TechStack:
    test_db: str  # sqlite | postgres | supabase | mongodb
    test_db_in_memory: bool 
    prod_db: str  # sqlite | postgres | supabase | mongodb
    test_db_config: dict
    prod_db_config: dict
    migration_framework: str  # alembic | supabase | ...
    test_db_env_vars: str
    remote_db_env_vars: str
    test_db_docker_config: str
    remote_db_docker_config: str
    
    type_system: str  # pydantic.BaseModel | sqlmodel.SQLModel | ...
    orm: str  # sqlalchemy | attrs | ...
    auth: str  # fastapi_auth | supabase_auth | ...
    oauth: List[str]  # apple | github | google | ...
    backend_framework: str  # fastapi|flask|nodejs|...
    frontend_framework: str # React | Angular | Reflex | ...
    frontend_styling_framework: str  # css | tailwindcss | ...
    frontend_script_language: str  # typescript | javascript | ...

def get_best_tech_stack_for_web_app(environment: str) -> TechStack:
    """Get the best tech stack for a web app based on current knowledge for a web app"""
    tech_stack = TechStack()
    
    # Database choices
    # SQLite for testing due to its simplicity and in-memory capability on macOS
    tech_stack.test_db = "sqlite"
    tech_stack.test_db_in_memory = True  # Faster testing with in-memory DB
    # PostgreSQL for production - robust, widely supported, great macOS compatibility
    tech_stack.prod_db = "postgres"
    
    # Basic config dictionaries
    tech_stack.test_db_config = {"uri": "sqlite:///:memory:"}
    tech_stack.prod_db_config = {
        "uri": "postgresql://user:password@localhost:5432/dbname",
        "pool_size": 20
    }
    
    # Migration framework
    # Alembic works well with SQLAlchemy and PostgreSQL, macOS-friendly
    tech_stack.migration_framework = "alembic"
    
    # Environment variables (simple strings for demo)
    tech_stack.test_db_env_vars = "TEST_DB_URL=sqlite:///:memory:"
    tech_stack.remote_db_env_vars = "DB_URL=postgresql://user:password@localhost:5432/dbname"
    
    # Docker configs as simple strings
    # SQLite doesn't need Docker for testing, using host macOS environment
    tech_stack.test_db_docker_config = ""
    tech_stack.remote_db_docker_config = """
    postgres:
        image: postgres:15
        environment:
            POSTGRES_USER: user
            POSTGRES_PASSWORD: password
            POSTGRES_DB: dbname
        ports:
            - "5432:5432"
    """
    
    # Type system and ORM
    # SQLModel combines Pydantic and SQLAlchemy, great for type safety on macOS dev
    tech_stack.type_system = "sqlmodel.SQLModel"
    tech_stack.orm = "sqlalchemy"  # SQLModel uses SQLAlchemy under the hood
    
    # Authentication
    # FastAPI Users is lightweight, integrates well with SQLAlchemy on macOS
    tech_stack.auth = "fastapi_auth"
    tech_stack.oauth = ["github", "google"]  # Common providers, well-supported
    
    # Frameworks
    # FastAPI: Modern, async, great Python ecosystem support on macOS
    tech_stack.backend_framework = "fastapi"
    # React: Widely used, great tooling support on macOS via Homebrew/Node
    tech_stack.frontend_framework = "React"
    
    # Frontend styling and scripting
    # TailwindCSS: Modern, utility-first, excellent macOS dev experience
    tech_stack.frontend_styling_framework = "tailwindcss"
    # TypeScript: Better tooling and type safety, strong macOS support
    tech_stack.frontend_script_language = "typescript"
    
    return tech_stack

# Execute and get the result
result = get_best_tech_stack_for_web_app(environment="macos")

Full Grok chat response.

I think learning to draft natural-language prompts in this code-like style is invaluable.
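
A small helper makes the pattern repeatable. This is a minimal sketch (the helper name and footer wording are mine, generalizing the prompts above): it takes any typed stub plus the call we want "executed" and appends the interpreter-style footer.

import inspect
import textwrap

def as_code_prompt(stub, call: str) -> str:
    """Turn a typed Python stub into a language-as-code prompt."""
    # Accept either raw source or a live function/class.
    source = stub if isinstance(stub, str) else textwrap.dedent(inspect.getsource(stub))
    footer = (
        f"{call}\n"
        "# Act as a python implementation for the code above "
        "and return the result with an explanation as comments.\n"
        "# Your output:"
    )
    return f"{source}\n{footer}"

def add(a: int, b: int) -> int:
    return a + b

print(as_code_prompt(add, "add(1, 2)"))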

If we want to do natural language without the code-like approach above, then structured outputs (a bit of a pain to get right with some reasoning models) will help enable longer conversations, since half the conversation will be structured.
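
For instance, with Pydantic the structured half of the exchange can be validated on receipt. A minimal sketch (the llm_reply string stands in for whatever your client returns, and the schema is hypothetical):

from pydantic import BaseModel

class StackChoice(BaseModel):
    backend_framework: str
    frontend_framework: str
    reasoning: str

# Stand-in for a model reply constrained to this schema.
llm_reply = '{"backend_framework": "fastapi", "frontend_framework": "React", "reasoning": "async-first, typed"}'

choice = StackChoice.model_validate_json(llm_reply)
print(choice.backend_framework)  # fastapi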

If we want natural language both ways, we are much better off using short prompts and conversations: the instructional part of the prompt should be short. Long contexts are OK if clearly demarcated with a uniform marker. Placement of long contexts within the prompt can be a pain to get right; my guess is that different models prefer different locations (top, bottom).
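
A sketch of what I mean by a uniform marker and configurable placement (the marker strings and the helper are illustrative, not any standard):

def build_prompt(instruction: str, context: str, placement: str = "bottom") -> str:
    """Keep the instruction short; fence the long context with one uniform marker."""
    block = f"<<<CONTEXT\n{context}\nCONTEXT>>>"
    if placement == "top":
        return f"{block}\n\n{instruction}"
    return f"{instruction}\n\n{block}"

print(build_prompt("Summarize the context in one line.", "...long document here...", placement="top"))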

It also depends on whether we want to automate something downstream with the response. Automating is almost always easier when the response comes back as language-as-code.
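
Concretely, if we prompt the model to reply with a single Python dict literal, downstream automation becomes a parse instead of a scrape. A sketch using ast.literal_eval so nothing in the reply gets executed (the reply string is a stand-in):

import ast

# Stand-in for a model reply we prompted to be a single dict literal.
llm_reply = "{'test_db': 'sqlite', 'prod_db': 'postgres', 'backend_framework': 'fastapi'}"

stack = ast.literal_eval(llm_reply)  # parses literals only; no code execution
assert stack["prod_db"] == "postgres"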

I suspect (I haven't tried it) that we can hold a long conversation using language-as-code prompts throughout.
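
The shape of such a conversation might be a script that only ever grows: each turn appends a call plus the interpreter footer, and the model's previous answers stay in the transcript as comments. A sketch of the bookkeeping (ask is a placeholder for a real LLM call; I haven't validated this end to end):

def ask(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

transcript = ["def add(a, b):", "    return a + b", ""]

def turn(call: str) -> str:
    transcript.append(call)
    transcript.append("# Act as a python interpreter")
    transcript.append("# Your output:")
    reply = ask("\n".join(transcript))
    # Keep the model's answer in the transcript as comments for the next turn.
    transcript.extend("# " + line for line in reply.splitlines())
    return reply

# turn("add(1, 2)")   # -> "3"
# turn("add(3, 4)")   # -> "7"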