An astrophysicist is using OpenAI's Codex to attack a decades-old black hole simulation problem

Chi-kwan Chan of the University of Arizona is using Codex to derive and test new numerical schemes for simulating plasma around black holes. The interesting part isn't that an AI wrote code. It's why this particular problem is a good fit for a tool that's wrong most of the time.

OpenAI published a profile of Chi-kwan Chan, an astrophysicist at the University of Arizona and Steward Observatory, who is using Codex to help derive algorithms for simulating the plasma that swirls around black holes. Chan is part of the Event Horizon Telescope collaboration, the group that produced the first image of a black hole's shadow in 2019 and is now working toward a video of the supermassive black hole at the center of the M87 galaxy.

The framing from OpenAI is, predictably, optimistic. The actual computational problem Chan describes is specific and worth understanding, because it explains why a code-generating model is useful here in a way it often isn't elsewhere.

What's claimed

The claim is narrow and Chan states it carefully. He is using Codex to propose candidate numerical schemes for a stiff simulation problem, then testing those candidates against known analytic solutions. He is not claiming Codex discovered new physics, and he is explicit that most of what it generates is wrong. The pitch is that the model lets him explore a larger space of mathematical reformulations than he could work through by hand.

That is a meaningfully different claim from the usual "AI accelerates science" copy. Chan is treating Codex as a generator of testable hypotheses in a domain where testing is cheap relative to derivation.

The actual computational problem

What makes this hard is a timestep problem, and it's the kind of thing anyone who has written a numerical integrator will recognize.

Most simulations of plasma near a black hole treat the plasma as a fluid. That works when electrons and ions collide frequently, because frequent collisions are what justify the fluid approximation in the first place. The standard magnetohydrodynamic equations apply, and you get tractable behavior. But near supermassive black holes, some regions of plasma get so hot and so diffuse that particles almost never collide. Chan's phrasing: "They don't really collide with each other."

Without collisions, you can't treat the plasma as a fluid. The particles instead spiral tightly around magnetic field lines, corkscrewing along them. To model that directly, a simulation has to resolve each tight gyration, and the radius and period of that gyration are tiny compared to the scale of the system you actually care about. The integrator is forced into extremely small timesteps to keep its tracking of the spiral motion stable and accurate.

This is a multiscale stiffness problem. You want to study large-scale plasma behavior over long timescales, but the numerics force you to resolve the fastest, smallest motion in the system. The result, as Chan puts it, is that even the fastest supercomputers spend most of their compute resolving minuscule particle gyrations rather than the dynamics anyone wants to observe. He says this has limited how realistically black hole plasma can be simulated "for decades," which is not hype. Gyrokinetics and gyroaveraging are well-established research areas in fusion plasma physics precisely because this problem is hard and old.
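The timestep constraint can be made concrete with a toy model. The following is a minimal sketch, not Chan's code: it assumes a uniform magnetic field, dimensionless units, and forward Euler as the simplest possible integrator, and the function names are made up for illustration. The exact motion keeps the particle's speed constant, so any growth in speed is pure numerical error.

```python
import math


def euler_gyration_speed(omega, dt, n_steps):
    """Integrate dvx/dt = omega*vy, dvy/dt = -omega*vx with forward Euler.

    The exact solution is a pure rotation, so the true speed stays at 1.
    Returns the speed the integrator reports after n_steps.
    """
    vx, vy = 1.0, 0.0
    for _ in range(n_steps):
        vx, vy = vx + dt * omega * vy, vy - dt * omega * vx
    return math.hypot(vx, vy)


def steps_required(omega, t_end, steps_per_gyration=20):
    """Steps needed to cover t_end while resolving every gyration.

    steps_per_gyration is an assumed resolution target, not a quoted figure.
    """
    period = 2.0 * math.pi / omega
    return math.ceil(t_end / (period / steps_per_gyration))


# Small steps relative to the gyration period: speed stays close to 1.
fine = euler_gyration_speed(omega=1.0, dt=0.001, n_steps=10_000)

# Steps comparable to the period: the spiral blows up within a few orbits.
coarse = euler_gyration_speed(omega=1.0, dt=0.5, n_steps=20)

# A gyrofrequency a million times faster than the timescale of interest
# already demands millions of steps per unit of simulated time.
n_needed = steps_required(omega=1.0e6, t_end=1.0)
```

The step count scales with the ratio between the gyrofrequency and the timescale of interest, which is the sense in which the fastest motion in the system sets the cost of the whole run.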

What's actually new

The approach Chan describes is to change, mathematically, how the simulation tracks particle motion, so the integrator no longer has to follow every individual spiral. This is the conceptual family that includes gyroaveraging: you average over the fast circular motion and evolve the slower guiding-center dynamics instead. The hard part is deriving a reformulation that is both mathematically valid and numerically stable, and verifying it reproduces known solutions.
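To illustrate the guiding-center idea in its simplest textbook form, here is a hedged sketch that compares a full-orbit integration with the averaged drift. It is not one of the schemes from Chan's work. It assumes uniform, perpendicular electric and magnetic fields, a charge-to-mass ratio of 1, and dimensionless units; the full orbit uses the standard Boris rotation, and the function names are illustrative only.

```python
import math


def full_orbit_x(ey, bz, dt, n_steps):
    """Follow every gyration with a Boris push (E along y, B along z).

    The particle starts at rest at x = 0. Returns its final x position.
    """
    x, vx, vy = 0.0, 0.0, 0.0
    t = 0.5 * bz * dt              # half-step rotation parameter
    s = 2.0 * t / (1.0 + t * t)
    for _ in range(n_steps):
        vy += 0.5 * ey * dt        # first half of the electric kick
        px = vx + vy * t           # v' = v- + v- x t
        py = vy - vx * t
        vx, vy = vx + py * s, vy - px * s   # v+ = v- + v' x s
        vy += 0.5 * ey * dt        # second half of the electric kick
        x += vx * dt
    return x


def guiding_center_x(ey, bz, dt, n_steps):
    """Evolve only the averaged E x B drift, ignoring the gyration itself."""
    x = 0.0
    drift = ey / bz                # x component of E x B / B^2
    for _ in range(n_steps):
        x += drift * dt
    return x


EY, BZ = 1.0, 100.0                       # gyrofrequency is 100 in these units
dt_full = (2.0 * math.pi / BZ) / 50.0     # 50 steps per gyration
n_full = 8000
t_end = n_full * dt_full

n_gc = 10                                 # 800 times fewer steps
x_full = full_orbit_x(EY, BZ, dt_full, n_full)
x_gc = guiding_center_x(EY, BZ, t_end / n_gc, n_gc)
```

In this idealized setup the two positions agree to within roughly a gyroradius, while the guiding-center run takes a tiny fraction of the steps. The difficulty described above lies in getting the same kind of saving when the fields are not uniform and the averaged equations are no longer trivial.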

What Codex contributes is breadth of search. Chan's point is that working through every candidate transformation by hand is slow, and a model that can propose and implement many numerical schemes lets him test more of them. The schemes are inspectable. His group reads the proposed code, runs it against reference solutions, and keeps what survives. This is the opposite of the black-box concern that makes scientists wary of AI: nothing is accepted because the model produced it.
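The keep-what-survives step might look something like the sketch below. It is an illustration under stated assumptions, not the group's actual pipeline: the three "candidates" are hand-written stand-ins for model-proposed schemes (one crude, one with a deliberate sign error, one that rotates the velocity exactly in magnitude), the reference problem is a simple gyration with a known analytic solution, and the tolerance is arbitrary.

```python
import math


def forward_euler(vx, vy, h):
    """First-order update; the speed grows slowly every step."""
    return vx + h * vy, vy - h * vx


def sign_error(vx, vy, h):
    """Deliberately wrong candidate: one flipped sign gives exponential growth."""
    return vx + h * vy, vy + h * vx


def cayley_rotation(vx, vy, h):
    """Boris-style rotation; preserves the speed exactly."""
    t = 0.5 * h
    s = 2.0 * t / (1.0 + t * t)
    px, py = vx + vy * t, vy - vx * t
    return vx + py * s, vy - px * s


def final_error(step, omega=1.0, dt=0.01, n_steps=1000):
    """Distance between the numerical and analytic velocity at the end time."""
    vx, vy = 1.0, 0.0
    for _ in range(n_steps):
        vx, vy = step(vx, vy, omega * dt)
    t = n_steps * dt
    # Analytic solution for v(0) = (1, 0): v(t) = (cos(omega t), -sin(omega t))
    return math.hypot(vx - math.cos(omega * t), vy + math.sin(omega * t))


def screen(candidates, tol=1e-3):
    """Keep only the candidates that reproduce the reference solution."""
    return [name for name, step in candidates.items() if final_error(step) < tol]


CANDIDATES = {
    "forward_euler": forward_euler,
    "sign_error": sign_error,
    "cayley_rotation": cayley_rotation,
}
SURVIVORS = screen(CANDIDATES)
```

Nothing in the harness cares where a candidate came from. A scheme written by a person and a scheme written by a model face the same reference solution and the same tolerance.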

Chan's own line is the most useful summary of the methodology. "We don't accept an idea because it came from Einstein, from a bright student, or from an AI model. We accept it only after repeated testing."

Limitations

A few things to keep in proportion.

Nothing here has produced a validated result yet. Chan says "if" the approaches succeed, the payoff would be simulating trillions of particles around black holes. The work is in the candidate-generation and testing phase. The headline number, trillions of particles, is the goal, not an achievement.

The reason this works as an AI use case is the verification asymmetry, and that asymmetry doesn't generalize to most science. Numerical schemes can be checked against analytic solutions and convergence tests. A wrong scheme fails a test quickly and cheaply, so a high error rate from the generator is tolerable. In domains where validating a hypothesis requires a multi-year experiment, a model that's wrong most of the time is a much worse deal. Chan is essentially arguing that fields with fast, rigorous tests are the natural fit for current models, and that's a sharper claim than "AI helps research."
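A convergence test is a good example of how cheap such a check can be. The sketch below is a generic illustration, not taken from Chan's work: it assumes the scalar test equation dy/dt = -y, whose exact solution is known, and measures the observed order of accuracy by halving the timestep. The two steppers and the function names are chosen for illustration.

```python
import math


def euler_step(y, dt):
    """Forward Euler for dy/dt = -y (expected order: 1)."""
    return y + dt * (-y)


def midpoint_step(y, dt):
    """Explicit midpoint rule for dy/dt = -y (expected order: 2)."""
    y_half = y + 0.5 * dt * (-y)
    return y + dt * (-y_half)


def error_at(step, dt, t_end=1.0):
    """Absolute error against the exact solution exp(-t) at t_end."""
    n_steps = round(t_end / dt)
    y = 1.0
    for _ in range(n_steps):
        y = step(y, dt)
    return abs(y - math.exp(-n_steps * dt))


def observed_order(step, dt=0.01):
    """Estimate the order of accuracy from the error ratio when dt is halved."""
    return math.log2(error_at(step, dt) / error_at(step, dt / 2.0))


order_euler = observed_order(euler_step)        # close to 1
order_midpoint = observed_order(midpoint_step)  # close to 2
```

A candidate that is supposed to be second order but measures as first order is rejected after a few hundred arithmetic operations. That is the cheap side of the asymmetry: the test costs almost nothing compared with the derivation it is checking.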

There is also a selection bias in any vendor-published profile. This is OpenAI describing a researcher using OpenAI's product, and the honest, qualified tone of Chan's quotes is doing a lot of work to make it credible. The signal here is the structure of the workflow, generate then verify, not the marketing around it.

Why it's a reasonable pattern

Stripped of the product framing, this is a clean example of using a code model where its weaknesses don't matter. The bottleneck Chan faces isn't compute alone. It's also the human time needed to derive and implement many candidate reformulations of a stiff integration problem. Codex compresses that derivation-and-implementation loop, and the existing scientific infrastructure of test cases and reproducibility catches the failures.

That is a more defensible use of an LLM than asking it to be right. It's being used as a fast, cheap, occasionally correct proposal engine bolted onto a verification pipeline that already exists. For the specific problem of getting around timestep stiffness in collisionless plasma simulations, that's a sensible division of labor. Whether it actually unlocks trillion-particle black hole simulations is a question the convergence tests, not the press release, will eventually answer.
