Drive by Vision, Not APIs
Try to list the workflows in which every step is exposed through an API: the count is near zero, so the agent must see and act on the screen the way a human does.
Why Google failed to make GPT-3 + why Multimodal Agents are the path to AGI — with David Luan of Adept · Latent Space (swyx & Alessio)