Week 13: Conversational Robotics

Learning Objectives

During this final week, students will integrate all the knowledge from previous modules to create conversational robots that can understand natural language commands and execute complex tasks.

Topics Covered

Integrating GPT models for conversational AI in robots
- Connecting language models to robotic systems
- Context management for multi-turn conversations
- Safety and filtering for robot commands
Speech recognition and natural language understanding
- Using OpenAI Whisper for voice commands
- Natural language processing for command interpretation
- Handling ambiguous or unclear commands
Multi-modal interaction: speech, gesture, vision
- Combining multiple input modalities
- Cross-modal attention and understanding
- Coherent multi-modal responses

Key Concepts

Conversational AI Integration

Integrating large language models with robots involves several challenges:

Command grounding: Connecting language to physical actions
World modeling: Maintaining a representation of the environment
Action planning: Translating high-level commands to low-level actions
Feedback integration: Using robot sensors to inform the conversation

Voice-to-Action Pipeline

The complete pipeline from voice command to robot action:

Speech recognition: Converting speech to text (Whisper)
Natural language understanding: Interpreting the command
Task decomposition: Breaking complex commands into subtasks
Action selection: Choosing appropriate robot actions
Execution: Performing the actions using ROS 2
Feedback: Reporting results back to the user

Conversational robots can process multiple input types:

Visual input: Cameras for scene understanding
Audio input: Microphones for speech and sound
Tactile input: Force/torque sensors for manipulation feedback
Context: Robot state, location, and history

Capstone Project: The Autonomous Humanoid

The final project integrates all course concepts:

Voice Command Reception: Using Whisper to receive a command like "Clean the room"
Cognitive Planning: Using LLMs to translate the command into a sequence of actions
Path Planning: Using Nav2 to navigate obstacles
Object Recognition: Using computer vision to identify objects
Manipulation: Using robot arms to move objects
Human Interaction: Providing feedback and handling clarifications

Practical Exercises

Integrate Whisper with ROS 2 for voice commands
Connect GPT models to robot control systems
Implement natural language to action mapping
Create a multi-modal interaction system
Complete the capstone project demonstration

Assignments

Capstone: Simulated humanoid robot with conversational AI
Demonstrate voice-to-action capabilities
Complete the "Clean the room" challenge

Learning Objectives​

Topics Covered​

Key Concepts​

Conversational AI Integration​

Voice-to-Action Pipeline​

Multi-Modal Integration​

Capstone Project: The Autonomous Humanoid​

Practical Exercises​

Assignments​