[NOTE] Vision and NLP Models, AI Agents

Topics on vision and language models, including classic DNN-based image classification, visual question answering, visual object tracking, and transformer-based LLMs, along with potential applications such as AI agents and robots.

Visual Object Tracking (VOT)

VOT Basics


Multi-Modal Models