Alibaba Qwen AI Models Developed by Chinese Researchers Can Control PCs and Phones
Alibaba's Qwen2.5-VL AI models can control PCs and phones, analyze text and images, and outperform GPT-4o on several benchmarks.


Alibaba's Qwen team has introduced the Qwen2.5-VL series, a family of AI models that handle a range of text and image analysis tasks. These models can parse files, understand videos, count objects in images, and even control PCs and mobile devices. The flagship model, Qwen2.5-VL-72B, surpasses competitors like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet in various evaluations. Qwen2.5-VL can analyze charts, extract data from scanned documents, comprehend lengthy videos, and recognize various products and media characters. It can also interact with software on PCs and mobile devices, for example by booking flights through apps. The smaller models in the series are available under a permissive license, while the flagship model's custom license requires companies with more than 100 million monthly active users to obtain permission from Alibaba before commercial deployment.
Alibaba's Qwen team has taken a significant step with the release of its latest AI models, collectively known as Qwen2.5-VL. These models are designed to perform a wide array of tasks, including text and image analysis, video comprehension, and even controlling electronic devices such as PCs and smartphones. The release positions Alibaba at the forefront of AI development, with capabilities that rival and, in some areas, surpass those of leading competitors.
Key Features of Qwen2.5-VL
Advanced Multimodal Understanding
Qwen2.5-VL models are equipped to handle complex tasks that require the integration of textual and visual information. They can parse various file types, analyze images to count objects, and comprehend videos of extended durations. This multimodal capability enables applications across diverse sectors, from document analysis to media content understanding.
Device Control Capabilities
One of the standout features of Qwen2.5-VL is its ability to interact with and control electronic devices. Demonstrations have shown the model launching applications on smartphones and performing tasks such as booking flights, highlighting its potential both as a personal assistant and as a tool for automating routine tasks.
Superior Performance Metrics
Benchmarking tests indicate that the flagship model, Qwen2.5-VL-72B, outperforms notable AI models like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet in various evaluations, including video understanding, mathematical reasoning, document analysis, and question-answering tasks. This performance underscores the model's advanced reasoning and comprehension abilities.
Versatile Application Potential
Beyond traditional text and image analysis, Qwen2.5-VL can extract data from scanned documents such as invoices and forms, comprehend multi-hour videos, and recognize a wide array of products and characters from films and TV series. These capabilities open avenues in fields like automated data entry, content summarization, and enhanced media recognition.
Licensing and Accessibility
Alibaba has adopted a tiered approach to licensing for the Qwen2.5-VL series. The smaller models, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license, encouraging widespread experimentation and development. In contrast, the flagship Qwen2.5-VL-72B model is distributed under a custom license that requires entities with more than 100 million monthly active users to seek permission from Alibaba before deploying the model commercially. This strategy balances open innovation with controlled application in large-scale commercial environments.
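The tiered rule described above can be written as a small decision function. This is only an illustrative sketch based on this article's summary of the terms: the function name, the model groupings, and the threshold constant are assumptions made for the example, not part of any Alibaba tooling, and the actual license text should be consulted before any deployment.

```python
# Illustrative only: encodes the licensing tiers as summarized in this article.
# All names here are hypothetical; this is not legal guidance.

# "more than 100 million monthly active users", per the article
MAU_THRESHOLD = 100_000_000

# Model groupings as described in the article
PERMISSIVE_MODELS = {"Qwen2.5-VL-3B", "Qwen2.5-VL-7B"}
CUSTOM_LICENSE_MODELS = {"Qwen2.5-VL-72B"}


def needs_alibaba_permission(model: str, monthly_active_users: int,
                             commercial: bool) -> bool:
    """Return True if, per the article's summary, permission must be sought."""
    if model in PERMISSIVE_MODELS:
        # Smaller models: described as permissively licensed.
        return False
    if model in CUSTOM_LICENSE_MODELS:
        # Flagship: permission needed only for commercial use above the threshold.
        return commercial and monthly_active_users > MAU_THRESHOLD
    raise ValueError(f"Unknown model: {model}")


if __name__ == "__main__":
    print(needs_alibaba_permission("Qwen2.5-VL-72B", 150_000_000, True))
    print(needs_alibaba_permission("Qwen2.5-VL-7B", 150_000_000, True))
```

Under these assumptions, only a commercial deployment of the 72B model by an entity above the user threshold would trigger the permission requirement; every other combination listed in the article would not.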
Ethical and Regulatory Considerations
As with many AI models developed in China, Qwen2.5-VL operates within certain content restrictions. For instance, when queried on sensitive political topics, the model may decline to respond, reflecting compliance with national regulations that mandate adherence to core socialist values. This aspect is crucial for developers to consider when planning applications intended for diverse user bases.
Conclusion
Alibaba's Qwen2.5-VL series represents a significant advancement in AI capabilities, particularly in integrating multimodal understanding with practical device control. Its superior performance in various benchmarks and versatile application potential make it a noteworthy development in the AI landscape. However, prospective users and developers should be mindful of the licensing terms and content restrictions associated with these models to ensure responsible and compliant deployment.
As AI continues to evolve, models like Qwen2.5-VL highlight the rapid progress being made and the expanding possibilities for integrating AI into everyday tasks and professional workflows.