Rio-3.5-Open-397B ≈ 0.6 x Nex-N2_pro + 0.4 x Qwen

Nex-AGI claims IplanRIO's 'original' 397B model is actually a direct element-wise merge of their Nex model with Qwen, backed by weight analysis and behavioral tests showing the model identifies itself as Nex when its system prompt is removed.

When IplanRIO released Rio-3.5-Open-397B as an "original 397B model trained by IplanRIO," the open-source AI community had little reason to doubt the claim. Training a model of that scale is expensive and complex, but not unheard of. What stood out, however, was that the model's self-descriptions and behaviors didn't quite match the stated training provenance.

Nex-AGI, the team behind Nex-N2, decided to investigate. Their findings, published in a GitHub issue, paint a picture of what appears to be a straightforward weight merge disguised as independent training. The math is hard to argue with: Rio's weights are, across every layer and component, a near-constant blend of roughly 57 to 59% Nex, with Qwen making up the rest.

IMAGE:1

The Identity Test

The first piece of evidence comes from something surprisingly simple: ask the model who it is. Rio ships with a hard-coded system prompt forcing it to identify as "Rio." Nex-AGI removed that prompt and posed 120 identity questions to the underlying model.

The results were damning. With the system prompt removed, 79.2% of responses named "Nex" and 73.3% named "Nex-AGI"; the two figures overlap, so they evidently count mentions rather than mutually exclusive answers. "Rio" appeared in exactly 0% of responses. A model that is supposed to be Rio, when stripped of its forced identity layer, consistently calls itself Nex and recites Nex-AGI's organizational backstory word for word.
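A tally of this kind can be sketched in a few lines. The source does not say how Nex-AGI scored the responses, so the whole-word, case-insensitive matching below is an assumption, and `mention_rates` and the sample responses are hypothetical names and data invented for illustration. Note that "Nex" also matches inside "Nex-AGI", which is one way the two reported percentages could overlap.

```python
import re

def mention_rates(responses, names):
    """Return, for each name, the fraction of responses that mention it.

    Matching is whole-word and case-insensitive (an assumed scoring rule).
    """
    total = len(responses)
    rates = {}
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        hits = sum(1 for text in responses if pattern.search(text))
        rates[name] = hits / total if total else 0.0
    return rates

# Hypothetical responses, invented for illustration only.
sample = [
    "I am Nex, a model developed by Nex-AGI.",
    "I'm Nex.",
    "I am a large language model.",
    "My name is Nex, built by Nex-AGI.",
]

rates = mention_rates(sample, ["Nex", "Nex-AGI", "Rio"])
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Under the assumption of one response per question, the reported 79.2% and 73.3% correspond to 95 and 88 of the 120 responses.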

This isn't a case of a model picking up similar patterns from shared training data. The specific phrasing Nex-AGI trained into Nex, including references to the Shanghai Innovation Institute and their "ecosystem alliance" framing, appears verbatim in Rio's responses. Those identity markers exist in hundreds of Nex's training examples. They don't appear in any public Qwen documentation or training materials.

The implication is straightforward: Rio's weights carry Nex's trained identity, and the "You are Rio" system prompt exists specifically to suppress what the model's weights keep trying to say.

The Weight Analysis

Behavioral tests are suggestive. Weight analysis is definitive.

If Rio is truly a merge of Nex and Qwen at some ratio α, then every weight tensor in the model satisfies Rio = α × Nex + (1 - α) × Qwen. Subtracting Qwen from both sides gives (Rio - Qwen) = α × (Nex - Qwen). This is a mathematical property of linear interpolation, and it's directly verifiable: fit the mixing weight α for each tensor and measure how collinear the two difference vectors are.

Collinearity, expressed as cos_fit, measures how closely the two deviation vectors, (Rio - Qwen) and (Nex - Qwen), point in the same direction. For unrelated models in a billion-dimensional parameter space, cos_fit should be approximately zero, because the directions are essentially orthogonal. For a genuine merge, cos_fit should be approximately one.
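The check can be sketched for a single tensor as follows. This is a minimal illustration, not Nex-AGI's actual script: it assumes α is the least-squares fit of (Rio - Qwen) onto (Nex - Qwen) and that cos_fit is the cosine similarity between the two, and it uses small synthetic matrices in place of real checkpoints. The function name `fit_merge` is hypothetical.

```python
import numpy as np

def fit_merge(rio, nex, qwen):
    """Return (alpha, cos_fit) for one weight tensor.

    alpha   : least-squares coefficient of (rio - qwen) on (nex - qwen)
    cos_fit : cosine similarity between the two difference vectors
    """
    d_rio = (rio - qwen).ravel().astype(np.float64)
    d_nex = (nex - qwen).ravel().astype(np.float64)
    dot = d_rio @ d_nex
    alpha = dot / (d_nex @ d_nex)
    cos_fit = dot / (np.linalg.norm(d_rio) * np.linalg.norm(d_nex))
    return float(alpha), float(cos_fit)

# Synthetic stand-ins for real weights (assumption: toy 256x256 matrices).
rng = np.random.default_rng(0)
qwen = rng.normal(size=(256, 256))
nex = qwen + 0.1 * rng.normal(size=(256, 256))        # "fine-tuned" from qwen
merged = 0.57 * nex + 0.43 * qwen                     # a true linear merge
unrelated = qwen + 0.1 * rng.normal(size=(256, 256))  # independent update

alpha_m, cos_m = fit_merge(merged, nex, qwen)
alpha_u, cos_u = fit_merge(unrelated, nex, qwen)
print(f"merged:    alpha={alpha_m:.3f}  cos_fit={cos_m:.3f}")
print(f"unrelated: alpha={alpha_u:.3f}  cos_fit={cos_u:.3f}")
```

In this toy setup the merged tensor recovers the mixing weight it was built with and a cosine of essentially one, while the independently perturbed tensor gives a cosine close to zero.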

Nex-AGI's measurements across all 60 layers and every component of Rio's architecture:

Component	Mixing Weight (α)	Collinearity (cos_fit)
Routed experts (387B parameters, 60 layers)	0.571 ± 0.0016	0.993
lm_head (output head)	0.574	0.991
Attention (q/k/v/o, 15 full-attention layers)	~0.585	~0.986
Linear-attention projections (45 layers)	~0.586	~0.984

A collinearity of 0.98 to 0.99 across billions of parameters cannot be written off as a statistical fluke. It sits thousands of standard deviations away from what you'd expect from unrelated models. The standard deviation of α across the expert blocks, just 0.0016, is remarkably tight. This is one model poured into another at a fixed ratio.
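To get a feel for the scale of that claim, here is a rough back-of-the-envelope calculation. It rests on an assumption that is not part of Nex-AGI's write-up: that the difference vectors of unrelated models behave like independent random directions, in which case their cosine has mean zero and a standard deviation of roughly 1/sqrt(d) in d dimensions. The helper name `sigmas_from_zero` is hypothetical.

```python
import math

def sigmas_from_zero(cos_fit, dim):
    """How many null standard deviations cos_fit lies from zero.

    Assumes the cosine of two independent random directions in `dim`
    dimensions has standard deviation of about 1 / sqrt(dim).
    """
    null_std = 1.0 / math.sqrt(dim)
    return cos_fit / null_std

# Tensor sizes are illustrative, not taken from the Rio checkpoint.
for dim in (10**6, 10**8, 10**9):
    print(f"dim={dim:>13,}  cos_fit=0.98  ->  {sigmas_from_zero(0.98, dim):,.0f} sigma")
```

Even for a single tensor with a million entries, a cosine of 0.98 lands hundreds of standard deviations from zero under this null model, and the figure grows with the square root of the dimension.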

IMAGE:2

Why This Matters

Model merging is a legitimate technique in open-source AI. Libraries like mergekit make it straightforward to combine weights from different models at specified ratios. The community uses it to blend capabilities, create specialized variants, and build on top of existing work. There's nothing inherently wrong with releasing a merged model.
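For readers unfamiliar with the technique, an element-wise linear merge amounts to very little code. The sketch below is not mergekit's API; it is a plain NumPy illustration over dictionaries of arrays, and the function name `linear_merge` and the toy tensors are hypothetical.

```python
import numpy as np

def linear_merge(state_a, state_b, weight_a):
    """Blend two checkpoints tensor by tensor.

    Each output tensor is weight_a * A + (1 - weight_a) * B. Both
    checkpoints must share the same tensor names and shapes.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must have identical tensor names")
    merged = {}
    for name, a in state_a.items():
        b = state_b[name]
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for tensor {name!r}")
        merged[name] = weight_a * a + (1.0 - weight_a) * b
    return merged

# Toy checkpoints (assumption: two tiny tensors standing in for real weights).
toy_a = {"layer.weight": np.ones((2, 2)), "layer.bias": np.zeros(2)}
toy_b = {"layer.weight": np.zeros((2, 2)), "layer.bias": np.ones(2)}

blend = linear_merge(toy_a, toy_b, weight_a=0.6)
print(blend["layer.weight"])  # every entry is 0.6
print(blend["layer.bias"])    # every entry is 0.4
```

The requirement that names and shapes match is why this kind of merge only works between models that share an architecture.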

What's problematic is labeling a merge as an "original" trained model. It misrepresents the work involved, obscures the provenance of the weights, and potentially violates the terms of models with restrictive licenses. If you merge Apache-licensed and proprietary weights, you need to understand and disclose the licensing implications.

This incident also highlights a growing tension in the open-source AI space. As model weights proliferate and merging tools improve, the barrier to creating "new" models drops. But the barrier to verifying originality remains relatively high. Nex-AGI had to run detailed weight-level analysis across billions of parameters to prove what their behavioral tests suggested.

The broader ecosystem needs better norms around disclosure. When someone releases a model, there should be clear documentation about whether it's trained from scratch, fine-tuned, or merged from existing weights. Weight provenance should be as transparent as code provenance.

IMAGE:3

The Verification Problem

Nex-AGI's analysis is thorough, but it also reveals something uncomfortable about the current state of open-source AI. Detecting a model merge requires access to the original weights, statistical expertise, and computational resources to run tensor-level comparisons across billions of parameters. Most users downloading a model from Hugging Face don't have the tools or knowledge to verify provenance claims.
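The comparison itself is conceptually a loop over tensors. The sketch below shows that loop on toy data, assuming all three checkpoints are already loaded as dictionaries of NumPy arrays with matching names; loading and sharding a real 397B checkpoint is the expensive part and is left out. The names `scan_checkpoints` and `summarize` are hypothetical, and the α and cosine definitions are the same assumed least-squares fit and cosine similarity, not a confirmed reproduction of Nex-AGI's method.

```python
import numpy as np

def scan_checkpoints(suspect, parent, base):
    """Return {tensor name: (alpha, cosine)} for names present in all three."""
    rows = {}
    for name in sorted(suspect.keys() & parent.keys() & base.keys()):
        d_s = (suspect[name] - base[name]).ravel().astype(np.float64)
        d_p = (parent[name] - base[name]).ravel().astype(np.float64)
        denom = d_p @ d_p
        norm_s = np.linalg.norm(d_s)
        if denom == 0.0 or norm_s == 0.0:
            continue  # no difference to compare; alpha/cosine undefined
        dot = d_s @ d_p
        rows[name] = (float(dot / denom), float(dot / (norm_s * np.sqrt(denom))))
    return rows

def summarize(rows):
    """Mean and standard deviation of alpha, plus the lowest cosine seen."""
    alphas = np.array([a for a, _ in rows.values()])
    cosines = np.array([c for _, c in rows.values()])
    return float(alphas.mean()), float(alphas.std()), float(cosines.min())

# Toy data: four small tensors, merged at 0.57 with a little extra noise.
rng = np.random.default_rng(1)
names = [f"block{i}.weight" for i in range(4)]
base = {n: rng.normal(size=(64, 64)) for n in names}
parent = {n: base[n] + 0.1 * rng.normal(size=(64, 64)) for n in names}
suspect = {
    n: 0.57 * parent[n] + 0.43 * base[n] + 0.005 * rng.normal(size=(64, 64))
    for n in names
}

mean_alpha, std_alpha, min_cos = summarize(scan_checkpoints(suspect, parent, base))
print(f"alpha = {mean_alpha:.3f} ± {std_alpha:.4f}, min cosine = {min_cos:.3f}")
```

A tight spread in α together with cosines near one across every tensor is the signature the table above reports; the small added noise here simply keeps the toy cosine slightly below one.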

This creates an asymmetry. It's easy to merge two models and relabel the result. It's hard to prove that's what happened. The community relies heavily on trust and self-reporting, which works until it doesn't.

Some possible mitigations exist. Watermarking techniques could embed attribution information directly in model weights. Standardized reporting formats could require disclosure of training methodology. Model cards could include more detailed provenance information. But none of these are widely adopted yet.

For now, incidents like this serve as a reminder that the open-source AI ecosystem is still figuring out its norms. The tools for creating models have outpaced the tools for verifying them. That gap will need to close as the stakes get higher and the models get more influential.

The Nex-AGI team has laid out their evidence clearly. The weight analysis is reproducible. The behavioral tests are straightforward. The conclusion, that Rio is a merge rather than an independently trained model, is well-supported. What happens next, whether IplanRIO responds, whether this changes how model releases are handled going forward, remains to be seen. But the mathematical evidence speaks for itself.