Tinker Unleashes Major Platform Updates with General Availability, Advanced Reasoning, and Vision Capabilities

The AI development landscape just got a significant boost with Tinker's announcement of major platform updates, moving from a waitlist-only service to general availability while introducing powerful new capabilities that expand its versatility for developers and researchers.

General Availability Opens Doors to All AI Builders

In a move that democratizes access to advanced AI development tools, Tinker has officially removed its waitlist, making the platform available to everyone. This general availability means developers, researchers, and organizations can now access Tinker's suite of AI tools without prior approval, accelerating innovation in the field.

"Everybody can use Tinker now; sign up here to get started," the announcement states. "See the Tinker homepage for available models and pricing, and check out the Tinker cookbook for code examples."


This open access comes at a crucial time as the demand for specialized AI models continues to grow across industries. By removing barriers to entry, Tinker is positioning itself as a go-to platform for both cutting-edge research and practical application development.

Advanced Reasoning with Kimi K2 Thinking

One of the most significant updates is the introduction of Kimi K2 Thinking, a new reasoning model that represents a leap forward in complex problem-solving capabilities. With a staggering trillion parameters, Kimi K2 is now the largest model in Tinker's lineup.

"Users can now fine-tune Kimi K2 Thinking on Tinker. With a trillion parameters, Kimi K2 is the largest model in our lineup so far. It is built for long chains of reasoning and tool use," the announcement explains.

This model is specifically designed for applications that require multi-step reasoning and integration with external tools—capabilities that are increasingly important for developing sophisticated AI systems that can tackle complex, real-world problems. The ability to fine-tune such a large model opens up possibilities for creating highly specialized reasoning systems tailored to specific domains.
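
The announcement does not include a training snippet for Kimi K2 Thinking specifically, but a supervised LoRA run on Tinker generally takes the shape sketched below. Treat this as a rough outline rather than official sample code: the base_model identifier, the toy example, and the learning rate are placeholders, and method names such as create_lora_training_client, forward_backward, and optim_step follow the patterns in the Tinker cookbook, which remains the authoritative reference.

# Rough sketch of a supervised LoRA fine-tuning step on Tinker.
# Model identifier, example text, and hyperparameters are placeholders;
# see the Tinker cookbook for the exact, current API.
import tinker
from tinker import types

service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="moonshotai/Kimi-K2-Thinking",  # assumed identifier string
)
tokenizer = training_client.get_tokenizer()

# One next-token-prediction example: inputs are all tokens but the last,
# targets are all tokens but the first, with uniform per-token weights.
tokens = tokenizer.encode("Question: what is 17 * 24?\nAnswer: 408")
datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens[:-1]),
    loss_fn_inputs={"target_tokens": tokens[1:], "weights": [1.0] * (len(tokens) - 1)},
)

# One optimization step: forward/backward on the batch, then an Adam update.
training_client.forward_backward([datum], loss_fn="cross_entropy").result()
training_client.optim_step(types.AdamParams(learning_rate=1e-5)).result()

# Export the adapted weights so they can be served through a sampling client.
sampling_client = training_client.save_weights_and_get_sampling_client(name="kimi-k2-demo")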

Seamless Integration with OpenAI API Compatibility

Recognizing the prevalence of OpenAI's API in the AI ecosystem, Tinker has introduced OpenAI API-compatible scaffolding for its inference interface. This compatibility allows developers to leverage Tinker's models using familiar OpenAI API syntax, significantly reducing the learning curve for adopting the platform.

The announcement provides a clear example of how this works:

# Tinker's standard function for inference
prompt = types.ModelInput.from_ints(tokenizer.encode("The capital of France is",))
params = types.SamplingParams(max_tokens=20, temperature=0.0, stop=["\n"])
future = sampling_client.sample(prompt=prompt, sampling_params=params)

# OpenAI API-compatible sampling
response = openai_client.completions.create(
    model="tinker://0034d8c9-0a88-52a9-b2b7-bce7cb1e6fef:train:0/sampler_weights/000080",
    prompt="The capital of France is",
    max_tokens=20,
    temperature=0.0,
    stop=["
"],
)

This compatibility extends beyond simple inference—it also works with models that are still in training. "We have added OpenAI API-compatible scaffolding for quickly sampling from a model by specifying a path, even while it's still training," the announcement notes. "This also means Tinker can now plug-and-play with any OpenAI API-compatible platform."

This feature is particularly valuable for organizations looking to experiment with different models while maintaining consistent API interfaces across their infrastructure.
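
For readers wondering where openai_client comes from in the snippet above, the standard OpenAI Python SDK can be pointed at any OpenAI-compatible endpoint by overriding its base URL. The sketch below is illustrative only: the base URL and environment variable name are placeholders, and the real endpoint and authentication details are documented by Tinker.

# Illustrative setup for the OpenAI-compatible route. The base_url value and
# the TINKER_API_KEY environment variable are placeholders, not real values.
import os
from openai import OpenAI

openai_client = OpenAI(
    base_url="https://example.invalid/v1",   # placeholder; see Tinker's docs for the real endpoint
    api_key=os.environ["TINKER_API_KEY"],    # assumed environment variable name
)

# The model field takes a tinker:// checkpoint path, which can point at a run
# that is still training (path reused from the example above).
response = openai_client.completions.create(
    model="tinker://0034d8c9-0a88-52a9-b2b7-bce7cb1e6fef:train:0/sampler_weights/000080",
    prompt="The capital of France is",
    max_tokens=20,
    temperature=0.0,
    stop=["\n"],
)
print(response.choices[0].text)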

Vision Input Capabilities with Qwen3-VL

Perhaps the most groundbreaking update is the addition of vision input support through Qwen3-VL models. Tinker now offers two vision models: Qwen3-VL-30B-A3B-Instruct and Qwen3-VL-235B-A22B-Instruct, enabling developers to process and analyze images alongside text.

"To input images, just interleave together an ImageChunk – consisting of your image, saved as bytes – with text chunks," the announcement demonstrates with this code example:

model_input = tinker.ModelInput(chunks=[
  tinker.types.ImageChunk(data=image_data, format="png"),
  tinker.types.EncodedTextChunk(tokens=tokenizer.encode("What is this?")),
])

These vision capabilities aren't limited to simple image recognition—they can be integrated into various applications including supervised fine-tuning (SFT) and reinforcement learning (RL) fine-tuning. This opens up possibilities for multimodal AI systems that can understand and reason across different types of data.
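
As a concrete illustration, a multimodal input like the one above can be passed to the same sampling interface used for text. The sketch below is an assumption-laden example rather than official sample code: the Qwen3-VL base_model string and the way the tokenizer and sampling client are obtained are assumed to mirror the text-model workflow, and the Tinker cookbook should be treated as the source of truth.

# Sketch: sampling a description of a local image from a Qwen3-VL model.
# The base_model string and client setup are assumptions mirroring the
# text-model workflow; consult the Tinker cookbook for the exact API.
import tinker
from tinker import types

service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-VL-30B-A3B-Instruct",  # assumed identifier string
)
tokenizer = training_client.get_tokenizer()
sampling_client = training_client.save_weights_and_get_sampling_client(name="qwen3-vl-demo")

# Build a multimodal prompt: raw image bytes followed by an encoded question.
with open("example.png", "rb") as f:
    image_data = f.read()

model_input = tinker.ModelInput(chunks=[
    tinker.types.ImageChunk(data=image_data, format="png"),
    tinker.types.EncodedTextChunk(tokens=tokenizer.encode("What is this?")),
])

params = types.SamplingParams(max_tokens=32, temperature=0.0)
future = sampling_client.sample(prompt=model_input, sampling_params=params)
result = future.result()  # contains the sampled tokens, which can be decoded with the tokenizer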

Demonstrating Vision Capabilities: Image Classification

To showcase the power of their vision models, the Tinker team conducted experiments on image classification tasks using four classic datasets:

  • Caltech 101 (101 general object categories)
  • Stanford Cars (car makes, models, and years)
  • Oxford Flowers (flower species)
  • Oxford Pets (pet breeds)

The approach is particularly interesting because it leverages the language model aspect of Qwen3-VL. "Since Qwen3-VL is a language model, we frame classification as text generation: given an image, the model outputs the class name," the team explains.
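
A back-of-the-envelope version of this setup is easy to picture. The sketch below is not the team's actual evaluation code: the prompt wording, the exact-match metric, and the generate_text helper (standing in for a sampling call like the ones shown earlier) are all illustrative assumptions.

# Sketch of classification-as-text-generation evaluation. The prompt and the
# exact-match scoring are illustrative; generate_text is a hypothetical helper
# that wraps a sampling call and returns the model's decoded output.
from typing import Callable, Iterable, List, Tuple

def classification_accuracy(
    examples: Iterable[Tuple[bytes, str]],           # (image bytes, gold class name) pairs
    class_names: List[str],
    generate_text: Callable[[bytes, str], str],
) -> float:
    prompt = "What is shown in this image? Answer with exactly one of: " + ", ".join(class_names)
    correct, total = 0, 0
    for image_bytes, gold in examples:
        prediction = generate_text(image_bytes, prompt).strip().lower()
        correct += int(prediction == gold.strip().lower())
        total += 1
    return correct / max(total, 1)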

They compared this approach against a traditional vision baseline—DINOv2-base, a self-supervised vision transformer commonly used for computer vision tasks. For the comparison, both models were fine-tuned using LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method.

The key finding was data efficiency. "Labeled image data is scarce for many real-world use cases, so data efficiency is the primary measure we look at," the team states. Their results show that in the limited-data regime, Qwen3-VL-235B-A22B outperforms DINOv2.

This advantage comes from two factors: the larger model size and the inherent language knowledge that vision-language models possess. "Not only is it a bigger model, but as a VLM, it also comes with language knowledge out-of-the-box (i.e. what a 'golden retriever' or 'sunflower' is)," the team explains.

This result has significant implications for real-world applications where labeled data is often limited. It suggests that vision-language models may offer a more efficient path to specialized computer vision tasks compared to traditional approaches.

Implications for the AI Development Landscape

These updates collectively position Tinker as a more comprehensive platform for AI development. The general availability lowers barriers to entry, while the new features expand the types of AI systems that can be built on the platform.

The addition of Kimi K2 Thinking addresses the growing need for AI systems that can perform complex reasoning—a critical capability for applications in fields like scientific research, financial analysis, and strategic planning.

The OpenAI API compatibility recognizes the importance of interoperability in the current AI ecosystem, allowing developers to more easily adopt Tinker's models without overhauling existing infrastructure.

Perhaps most significant is the introduction of vision capabilities through Qwen3-VL. This positions Tinker at the forefront of the multimodal AI trend, where systems can process and reason across different types of data. The demonstrated data efficiency in vision tasks suggests these models could be particularly valuable for real-world applications where labeled data is scarce.

As the Tinker team notes, "Tinker exists to enable builders and researchers to train and customize state-of-the-art models. As always, we look forward to seeing what you build with Tinker."

With these updates, the platform is well-positioned to support the next wave of AI innovation, making advanced capabilities more accessible to a broader community of developers and researchers.