Upload an image and enter a text description. The system will return its thinking process and the annotated region.