SiliconCloud has announced a significant upgrade to its DeepSeek-R1 and other reasoning-model APIs, aimed at developers who need long contexts and flexible parameter configuration. As part of the upgrade, the maximum context length of several reasoning models has been raised to 128K, giving the models room to reason more thoroughly and produce more complete output.


With this upgrade, well-known models such as Qwen3, QwQ, and GLM-Z1 support a maximum context length of 128K, while DeepSeek-R1 supports 96K. The added headroom provides strong support for complex reasoning tasks such as code generation, as well as for intelligent applications.

More importantly, SiliconCloud has introduced independent control over the lengths of the "reasoning chain" and the "response content", allowing developers to make more efficient use of a model's reasoning capabilities. The maximum response length (max_tokens) now limits only the content ultimately returned to the user, while the reasoning budget (thinking_budget) specifically controls the number of tokens consumed during the reasoning phase. This design lets developers tune the depth of the model's reasoning and the length of its output to the complexity of the task at hand.
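
With an OpenAI-compatible client, the two limits can be set independently in a single request. The sketch below is illustrative only: the base URL, the model ID, and passing thinking_budget as an extra top-level request field are assumptions that should be checked against the official documentation linked below.

```python
# Minimal sketch: set the response cap and the reasoning budget separately.
# Base URL, model ID, and the thinking_budget placement are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed SiliconCloud endpoint
    api_key="YOUR_API_KEY",                    # placeholder
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-14B",                    # illustrative model ID
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=512,                            # caps only the final answer
    extra_body={"thinking_budget": 4096},      # caps the reasoning-phase tokens
)
print(response.choices[0].message.content)
```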

For example, on the SiliconCloud platform, users can cap the reasoning-chain length and the response length of the Qwen3-14B series by setting thinking_budget and max_tokens. If the number of tokens generated during the reasoning phase reaches the thinking_budget, Qwen3-series reasoning models stop the reasoning chain immediately; other reasoning models may continue to produce reasoning content.
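
Continuing the request above, the two parts of the reply can then be inspected separately. This sketch assumes the reasoning text comes back in a DeepSeek-style reasoning_content field alongside the regular content; the exact field name is given in the documentation linked below.

```python
# Continuing from the request above: read the reasoning chain and the
# final answer separately (field name `reasoning_content` is an assumption).
msg = response.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)  # reasoning-phase text
if reasoning is not None:
    print(f"reasoning used ~{len(reasoning)} chars (capped by thinking_budget)")
print("final answer:", msg.content)                  # capped by max_tokens
```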


In addition, if the response length exceeds max_tokens, or the total context exceeds the context_length limit, the model's output will be truncated and the finish_reason field in the response will be set to length, indicating that generation stopped because of a length restriction.
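
Continuing the same example, detecting such truncation is a matter of inspecting finish_reason on the returned choice:

```python
# If either limit cut the output short, finish_reason is reported as
# "length" rather than "stop".
choice = response.choices[0]
if choice.finish_reason == "length":
    # Truncated: raise max_tokens, shorten the prompt, or ask the model
    # to continue from where the output stopped.
    print("Output truncated by a length limit.")
else:
    print("Finished normally:", choice.finish_reason)
```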

Users can consult SiliconCloud's official documentation for further details on API usage. As SiliconCloud continues to iterate, the user experience will keep improving and more features will follow.

https://docs.siliconflow.cn/en/userguide/capabilities/reasoning

Key Points:

🔹 Supports a maximum context length of 128K, enhancing the model's reasoning and output capabilities.  

🔹 Independently controls the length of the reasoning chain and response content, increasing developer flexibility.  

🔹 If length limits are reached, the model's output will be truncated, with reasons clearly marked.