Recently, Microsoft's research team conducted a comparative study of API agents and GUI agents, concluding that each has distinct strengths and that the choice between them should depend on the requirements of the task. API agents interact with software through programmatic interfaces, while GUI agents imitate a human user, completing tasks by clicking buttons and navigating menus. To schedule an event, for example, an API agent might need only one function call, whereas a GUI agent would have to open the calendar application and fill in the details step by step.
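The scheduling example can be sketched in a few lines of Python. Everything here is hypothetical and only illustrates the contrast: `Calendar` stands in for an application with a programmatic interface, `CalendarWindow` for the same application reachable only through simulated clicks and typing, and none of it comes from the study itself.

```python
from dataclasses import dataclass, field


@dataclass
class Calendar:
    """Hypothetical in-memory calendar standing in for a real application."""
    events: list = field(default_factory=list)

    def create_event(self, title, date, time):
        # API path: one programmatic entry point does all the work.
        event = {"title": title, "date": date, "time": time}
        self.events.append(event)
        return event


class CalendarWindow:
    """Hypothetical GUI front end: the calendar is reachable only via UI actions."""

    def __init__(self, calendar):
        self.calendar = calendar
        self.form = None    # None until the "New event" dialog is open
        self.actions = []   # ordered log of every UI action performed

    def click(self, button):
        self.actions.append(f"click:{button}")
        if button == "New event":
            self.form = {}
        elif button == "Save":
            if self.form is None:
                raise RuntimeError("no dialog is open")
            self.calendar.create_event(**self.form)
            self.form = None

    def type_into(self, field_name, text):
        if self.form is None:
            raise RuntimeError("no dialog is open")
        self.actions.append(f"type:{field_name}")
        self.form[field_name] = text


def schedule_via_api(calendar, title, date, time):
    # A single function call.
    return calendar.create_event(title, date, time)


def schedule_via_gui(window, title, date, time):
    # The same outcome, reached through five separate UI actions.
    window.click("New event")
    window.type_into("title", title)
    window.type_into("date", date)
    window.type_into("time", time)
    window.click("Save")
    return window.actions
```

Both functions leave an identical event in the calendar. The difference is the number of steps that can fail along the way, and the fact that the GUI path leaves a visible trail of actions.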
In the study, the Microsoft team compared the two types of agents across nine categories. One major difference is how they interact with software: API agents use function calls and generally run more reliably, with lower error rates, whereas GUI agents work from what is shown on screen, which is less efficient but more flexible. GUI agents can control almost any software that has a visible interface, including software that provides no API.
The study also points out that API agents have advantages in security and maintenance: access permissions can be restricted at the function level, and the interfaces benefit from version control. GUI agents, in contrast, are more brittle, since even slight visual changes can cause them to malfunction. They do offer higher transparency, however: users can see every operation as it happens, which makes auditing easier.
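One way to picture function-level access control is an allow-list placed in front of the functions that automation may call. The sketch below is an assumption for illustration, not the mechanism described in the study; `ToolRegistry`, `read_balance`, `delete_account`, and the sample data are all invented names.

```python
BALANCES = {"acct-1": 100}  # made-up sample data


def read_balance(account_id):
    return BALANCES[account_id]


def delete_account(account_id):
    return BALANCES.pop(account_id)


class ToolRegistry:
    """Hypothetical registry that exposes only allow-listed functions."""

    def __init__(self, allowed):
        self._allowed = set(allowed)
        self._tools = {}
        self.audit_log = []  # records every attempted call

    def register(self, name, func):
        self._tools[name] = func

    def call(self, name, *args, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown function: {name}")
        if name not in self._allowed:
            self.audit_log.append(f"denied:{name}")
            raise PermissionError(f"{name} is not permitted")
        self.audit_log.append(f"allowed:{name}")
        return self._tools[name](*args, **kwargs)
```

With `allowed={"read_balance"}`, a call to `read_balance` goes through while `delete_account` raises `PermissionError` before the function ever runs. A comparable restriction is harder to express for screen-level clicks, where the unit of control is a pixel region rather than a named function.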
Microsoft proposes three strategies for hybrid systems that combine API agents and GUI agents. The first wraps GUI operations behind an API, for example collapsing a multi-step process such as generating a financial report into a single GenerateReport() function. The second uses orchestration tools to coordinate steps across APIs and GUIs, which suits workflows involving steps such as database queries and credit checks. The third relies on low-code and no-code platforms, which let non-technical users build automations through drag-and-drop interfaces.
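The first two strategies lend themselves to a short sketch. In the Python below, `generate_report` plays the role of the GenerateReport() wrapper around a GUI-only tool, and `run_workflow` is a deliberately minimal orchestrator that runs API-style and GUI-style steps in sequence. The `ReportApp` class, the customer data, and the credit threshold of 650 are all made up for illustration.

```python
class ReportApp:
    """Hypothetical GUI-only reporting tool; each method is one UI action."""

    def __init__(self):
        self.actions = []
        self.period = None

    def open_menu(self, name):
        self.actions.append(f"open_menu:{name}")

    def select_period(self, period):
        self.actions.append(f"select_period:{period}")
        self.period = period

    def click_export(self):
        if self.period is None:
            raise RuntimeError("no reporting period selected")
        self.actions.append("click_export")
        return f"report-{self.period}.pdf"


def generate_report(app, period):
    """Strategy 1: hide the multi-step GUI sequence behind one function."""
    app.open_menu("Reports")
    app.select_period(period)
    return app.click_export()


def run_workflow(steps, context):
    """Strategy 2: run named steps in order, storing each result in a shared context."""
    for name, step in steps:
        context[name] = step(context)
    return context


CUSTOMERS = {"c-1": {"name": "Ada", "score": 720}}  # made-up sample data


def query_customer(ctx):   # API-style step: direct data lookup
    return CUSTOMERS[ctx["customer_id"]]


def credit_check(ctx):     # API-style step: illustrative threshold, not from the study
    return ctx["query_customer"]["score"] >= 650


def file_report(ctx):      # GUI-style step, reached through the wrapper
    return generate_report(ctx["app"], ctx["period"])


result = run_workflow(
    [("query_customer", query_customer),
     ("credit_check", credit_check),
     ("file_report", file_report)],
    {"customer_id": "c-1", "period": "2025-Q1", "app": ReportApp()},
)
```

The orchestrator does not need to know which steps touch a GUI. If the reporting tool later gains a real API, only the body of `generate_report` would have to change.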
The research team also gives clear guidance on choosing between the two. API agents suit performance-critical tasks, especially when the interfaces are well documented. GUI agents are the better fit for legacy systems that lack APIs and for mobile applications. Hybrid systems can adopt new APIs as they become available, which gives them greater flexibility over time.
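This guidance can be read as a rough decision rule. The function below is only one possible reading: the parameter names, the order of the checks, and the fallback to a hybrid setup when an API exists but is poorly documented are assumptions, not rules stated by the study.

```python
def choose_approach(has_api, api_documented, needs_visual_confirmation):
    """Return "api", "gui", or "hybrid" under the assumptions described above."""
    if not has_api:
        return "gui"      # legacy systems and apps that expose no API
    if needs_visual_confirmation:
        return "gui"      # the task depends on what appears on screen
    if api_documented:
        return "api"      # well-documented interface: fastest, most reliable path
    return "hybrid"       # assumption: use the API where possible, the GUI elsewhere


print(choose_approach(has_api=False, api_documented=False, needs_visual_confirmation=False))
print(choose_approach(has_api=True, api_documented=True, needs_visual_confirmation=False))
```

The two calls print `gui` and `api` respectively.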
Key Points:
🌟 API agents complete tasks quickly and reliably through function calls, and suit environments with strict security requirements.
🔄 GUI agents are highly flexible and can operate almost any software with a visible interface, which suits legacy systems and tasks that require visual confirmation.
🤝 Hybrid systems combine the strengths of both, choosing the better approach for each need and extending what can be automated.