Initial commit with translated description

AI_AGENT_GUIDE.md — new file, 448 lines
# AI Desktop Agent - Cognitive Automation Guide

## 🤖 What Is This?

The **AI Desktop Agent** is an intelligent layer on top of the basic desktop control that **understands** what you want and figures out how to do it autonomously.

Unlike basic automation that requires exact instructions, the AI Agent:
- **Understands natural language** ("Draw a cat in Paint")
- **Plans the steps** automatically
- **Executes autonomously**
- **Adapts** based on what it sees

---

## 🎯 What Can It Do?

### ✅ Autonomous Drawing
```python
from skills.desktop_control.ai_agent import AIDesktopAgent

agent = AIDesktopAgent()

# Just describe what you want!
agent.execute_task("Draw a circle in Paint")
agent.execute_task("Draw a star in MS Paint")
agent.execute_task("Draw a house with a sun")
```

**What it does:**
1. Opens MS Paint
2. Selects the pencil tool
3. Figures out how to draw the requested shape
4. Draws it autonomously
5. Takes a screenshot of the result
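The shape itself can be drawn by sampling points around a center and dragging through them. A minimal sketch of the point generation for a circle (the helper name `circle_points` is illustrative, not part of the agent's API):

```python
import math

def circle_points(cx, cy, radius, n=72):
    """Sample n evenly spaced points on a circle, repeating the
    first point at the end so the drawn outline closes."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n + 1)
    ]
```

The agent could then feed these points to successive small drag operations to trace the outline on the canvas.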

### ✅ Autonomous Text Entry
```python
# It figures out where to type
agent.execute_task("Type 'Hello World' in Notepad")
agent.execute_task("Write an email saying thank you")
```

**What it does:**
1. Opens Notepad (or finds the active text editor)
2. Types the text naturally
3. Formats if needed

### ✅ Autonomous Application Control
```python
# It knows how to open apps
agent.execute_task("Open Calculator")
agent.execute_task("Launch Microsoft Paint")
agent.execute_task("Open File Explorer")
```

### ✅ Autonomous Game Playing (Advanced)
```python
# It will try to play the game!
agent.execute_task("Play Solitaire for me")
agent.execute_task("Play Minesweeper")
```

**What it does:**
1. Analyzes the game screen
2. Detects game state (cards, mines, etc.)
3. Decides the best move
4. Executes the move
5. Repeats until win/lose

---

## 🏗️ How It Works

### Architecture

```
User Request ("Draw a cat")
    ↓
Natural Language Understanding
    ↓
Task Planning (step-by-step plan)
    ↓
Step Execution Loop:
    - Observe Screen (Computer Vision)
    - Decide Action (AI Reasoning)
    - Execute Action (Desktop Control)
    - Verify Result
    ↓
Task Complete!
```

### Key Components

1. **Task Planner** - Breaks down high-level tasks into steps
2. **Vision System** - Understands what's on screen (screenshots, OCR, object detection)
3. **Reasoning Engine** - Decides what to do next
4. **Action Executor** - Performs the actual mouse/keyboard actions
5. **Feedback Loop** - Verifies that actions succeeded
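The loop these components form can be sketched as follows. All method names here (`observe`, `decide`, `execute`, `verify`) are illustrative assumptions, not the actual `AIDesktopAgent` API:

```python
def run_plan(agent, plan, max_steps=50):
    """Observe-decide-act loop: run each planned step, verify it,
    and stop early if a step fails or the step budget runs out."""
    history = []
    for step in plan[:max_steps]:
        state = agent.observe()             # e.g. screenshot + active window
        action = agent.decide(step, state)  # pick a concrete action for this step
        agent.execute(action)               # mouse/keyboard call
        ok = agent.verify(step, agent.observe())
        history.append({"step": step, "success": ok})
        if not ok:                          # feedback loop: abort on failure
            break
    return history
```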

---

## 📋 Supported Tasks (Current)

### Tier 1: Fully Automated ✅

| Task Pattern | Example | Status |
|-------------|---------|--------|
| Draw shapes in Paint | "Draw a circle" | ✅ Working |
| Basic text entry | "Type Hello" | ✅ Working |
| Launch applications | "Open Paint" | ✅ Working |

### Tier 2: Partially Automated 🔨

| Task Pattern | Example | Status |
|-------------|---------|--------|
| Form filling | "Fill out this form" | 🔨 In Progress |
| File operations | "Copy these files" | 🔨 In Progress |
| Web navigation | "Find on Google" | 🔨 Planned |

### Tier 3: Experimental 🧪

| Task Pattern | Example | Status |
|-------------|---------|--------|
| Game playing | "Play Solitaire" | 🧪 Experimental |
| Image editing | "Resize this photo" | 🧪 Planned |
| Code editing | "Fix this bug" | 🧪 Research |

---

## 🎨 Example: Drawing in Paint

### Simple Request
```python
agent = AIDesktopAgent()
result = agent.execute_task("Draw a circle in Paint")

# Check the result
print(f"Status: {result['status']}")
print(f"Steps taken: {len(result['steps'])}")
```

### What Happens Behind the Scenes

**1. Planning Phase:**
```
Plan generated:
  Step 1: Launch MS Paint
  Step 2: Wait 2s for Paint to load
  Step 3: Activate the Paint window
  Step 4: Select the pencil tool (press 'P')
  Step 5: Draw a circle at the canvas center
  Step 6: Screenshot the result
```

**2. Execution Phase:**
```
[✓] Launched Paint via Win+R → mspaint
[✓] Waited 2.0s
[✓] Activated window "Paint"
[✓] Pressed 'P' to select pencil
[✓] Drew circle with 72 points
[✓] Screenshot saved: drawing_result.png
```

**3. Result:**
```python
{
    "task": "Draw a circle in Paint",
    "status": "completed",
    "success": True,
    "steps": [... 6 steps ...],
    "screenshots": [... 6 screenshots ...],
}
```

---

## 🎮 Example: Game Playing

```python
agent = AIDesktopAgent()

# Play a simple game
result = agent.execute_task("Play Solitaire for me")
```

### Game Playing Loop

```
1. Analyze screen       → Detect cards, positions
2. Identify valid moves → Find legal plays
3. Evaluate moves       → Which is best?
4. Execute move         → Click and drag card
5. Repeat until the game ends
```

### Game-Specific Intelligence

The agent can learn patterns for:
- **Solitaire**: Card stacking rules, suit matching
- **Minesweeper**: Probability calculations, safe clicks
- **2048**: Tile merging strategy
- **Chess** (if integrated with an engine): Move evaluation
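As one concrete case, the Minesweeper "probability calculations" can start from a naive per-cell estimate: spread a revealed number's remaining mines evenly over its hidden neighbors. A simplified sketch (real solvers combine constraints across many cells):

```python
def mine_probability(number, flagged_neighbors, hidden_neighbors):
    """Naive chance that any one hidden neighbor of a revealed cell
    holds a mine: remaining mines / hidden neighbors, clamped to [0, 1]."""
    if hidden_neighbors == 0:
        return 0.0
    remaining = number - flagged_neighbors
    return min(1.0, max(0.0, remaining / hidden_neighbors))
```

A "2" with one flag and four hidden neighbors gives each neighbor a 0.25 estimate; the agent would click the lowest-probability cell first.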

---

## 🧠 Enhancing the AI

### Adding Application Knowledge

```python
# In ai_agent.py, add to app_knowledge:

self.app_knowledge = {
    "photoshop": {
        "name": "Adobe Photoshop",
        "launch_command": "photoshop",
        "common_actions": {
            "new_layer": {"hotkey": ["ctrl", "shift", "n"]},
            "brush_tool": {"hotkey": ["b"]},
            "eraser": {"hotkey": ["e"]},
        }
    }
}
```

### Adding Custom Task Patterns

```python
# Add a custom planning method
def _plan_photo_edit(self, task: str) -> List[Dict]:
    """Plan for photo editing tasks."""
    return [
        {"type": "launch_app", "app": "photoshop"},
        {"type": "wait", "duration": 3.0},
        {"type": "open_file", "path": extracted_path},
        {"type": "apply_filter", "filter": extracted_filter},
        {"type": "save_file"},
    ]
```

---

## 🔥 Advanced: Vision + Reasoning

### Screen Analysis

The agent can analyze screenshots to:
- **Detect UI elements** (buttons, text fields, menus)
- **Read text** (OCR for labels, instructions)
- **Identify objects** (icons, images, game pieces)
- **Understand layout** (where things are)

```python
# Analyze what's on screen
analysis = agent._analyze_screen()

print(analysis)
# Output:
# {
#     "active_window": "Untitled - Paint",
#     "mouse_position": (640, 480),
#     "detected_elements": [...],
#     "text_found": [...],
# }
```

### Integration with OpenClaw LLM

```python
# Future: Use OpenClaw's LLM for reasoning
agent = AIDesktopAgent(llm_client=openclaw_llm)

# The agent can now:
# - Reason about complex tasks
# - Understand context better
# - Plan more sophisticated workflows
# - Learn from feedback
```

---

## 🛠️ Extending for Your Needs

### Add Support for New Apps

1. **Identify the app**
2. **Document common actions**
3. **Add to the knowledge base**
4. **Create a planning method**

Example: Adding Excel support

```python
# Step 1: Add to app_knowledge
"excel": {
    "name": "Microsoft Excel",
    "launch_command": "excel",
    "common_actions": {
        "new_sheet": {"hotkey": ["shift", "f11"]},
        "sum_formula": {"action": "type", "text": "=SUM()"},
    }
}

# Step 2: Create a planner
def _plan_excel_task(self, task: str) -> List[Dict]:
    return [
        {"type": "launch_app", "app": "excel"},
        {"type": "wait", "duration": 2.0},
        # ... specific Excel steps
    ]

# Step 3: Hook into the main planner
if "excel" in task_lower or "spreadsheet" in task_lower:
    return self._plan_excel_task(task)
```

---

## 🎯 Real-World Use Cases

### 1. Automated Form Filling
```python
agent.execute_task("Fill out the job application with my resume data")
```

### 2. Batch Image Processing
```python
agent.execute_task("Resize all images in this folder to 800x600")
```

### 3. Social Media Posting
```python
agent.execute_task("Post this image to Instagram with caption 'Beautiful sunset'")
```

### 4. Data Entry
```python
agent.execute_task("Copy data from this PDF to an Excel spreadsheet")
```

### 5. Testing
```python
agent.execute_task("Test the login form with invalid credentials")
```

---

## ⚙️ Configuration

### Enable/Disable Failsafe
```python
# Safe mode (default)
agent = AIDesktopAgent(failsafe=True)

# Fast mode (no failsafe)
agent = AIDesktopAgent(failsafe=False)
```

### Set Max Steps
```python
# Prevent infinite loops
result = agent.execute_task("Play game", max_steps=100)
```

### Access Action History
```python
# Review what the agent did
print(agent.action_history)
```

---

## 🐛 Debugging

### View Step-by-Step Execution
```python
result = agent.execute_task("Draw a star in Paint")

for i, step in enumerate(result['steps'], 1):
    print(f"Step {i}: {step['step']['description']}")
    print(f"  Success: {step['success']}")
    if 'error' in step:
        print(f"  Error: {step['error']}")
```

### View Screenshots
```python
# Each step captures before/after screenshots
for screenshot_pair in result['screenshots']:
    before = screenshot_pair['before']
    after = screenshot_pair['after']

    # Display or save for analysis
    before.save(f"step_{screenshot_pair['step']}_before.png")
    after.save(f"step_{screenshot_pair['step']}_after.png")
```

---

## 🚀 Future Enhancements

Planned features:

- [ ] **Computer Vision**: OCR, object detection, UI element recognition
- [ ] **LLM Integration**: Natural language understanding with the OpenClaw LLM
- [ ] **Learning**: Remember successful patterns, improve over time
- [ ] **Multi-App Workflows**: "Get data from Chrome and put it in Excel"
- [ ] **Voice Control**: "Alexa, draw a cat in Paint"
- [ ] **Autonomous Debugging**: Fix errors automatically
- [ ] **Game AI**: Reinforcement learning for game playing
- [ ] **Web Automation**: Full browser control with understanding

---

## 📚 Full API

### Main Methods

```python
# Execute a task
result = agent.execute_task(task: str, max_steps: int = 50)

# Analyze screen
analysis = agent._analyze_screen()

# Manual mode: execute individual steps
step = {"type": "launch_app", "app": "paint"}
result = agent._execute_step(step)
```

### Result Structure

```python
{
    "task": str,                # Original task
    "status": str,              # "completed", "failed", "error"
    "success": bool,            # Overall success
    "steps": List[Dict],        # All steps executed
    "screenshots": List[Dict],  # Before/after screenshots
    "failed_at_step": int,      # If failed, which step
    "error": str,               # Error message if failed
}
```
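Given that structure, a caller can reduce a result to a one-line log entry. A small sketch (`summarize` is not part of the API; it assumes only the keys listed above):

```python
def summarize(result):
    """One-line summary of an agent result dict."""
    if result.get("success"):
        return f"{result['task']}: completed in {len(result['steps'])} steps"
    step = result.get("failed_at_step", "?")
    return f"{result['task']}: {result.get('status', 'failed')} at step {step}"
```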

---

**🦞 Built for OpenClaw - The future of desktop automation!**

QUICK_REFERENCE.md — new file, 269 lines

# Desktop Control - Quick Reference Card

## 🚀 Instant Start

```python
from skills.desktop_control import DesktopController

dc = DesktopController()
```

## 🖱️ Mouse Control (Top 10)

```python
# 1. Move mouse
dc.move_mouse(500, 300, duration=0.5)

# 2. Click
dc.click(500, 300)  # Left click at position
dc.click()          # Click at current position

# 3. Right click
dc.right_click(500, 300)

# 4. Double click
dc.double_click(500, 300)

# 5. Drag & drop
dc.drag(100, 100, 500, 500, duration=1.0)

# 6. Scroll
dc.scroll(-5)  # Scroll down 5 clicks

# 7. Get position
x, y = dc.get_mouse_position()

# 8. Move relative
dc.move_relative(100, 50)  # Move 100px right, 50px down

# 9. Smooth movement
dc.move_mouse(1000, 500, duration=1.0, smooth=True)

# 10. Middle click
dc.middle_click()
```

## ⌨️ Keyboard Control (Top 10)

```python
# 1. Type text (instant)
dc.type_text("Hello World")

# 2. Type text (human-like, 60 WPM)
dc.type_text("Hello World", wpm=60)

# 3. Press key
dc.press('enter')
dc.press('tab')
dc.press('escape')

# 4. Hotkeys (shortcuts)
dc.hotkey('ctrl', 'c')   # Copy
dc.hotkey('ctrl', 'v')   # Paste
dc.hotkey('ctrl', 's')   # Save
dc.hotkey('win', 'r')    # Run dialog
dc.hotkey('alt', 'tab')  # Switch window

# 5. Hold & release
dc.key_down('shift')
dc.type_text("hello")  # Types "HELLO"
dc.key_up('shift')

# 6. Arrow keys
dc.press('up')
dc.press('down')
dc.press('left')
dc.press('right')

# 7. Function keys
dc.press('f5')  # Refresh

# 8. Multiple presses
dc.press('backspace', presses=5)

# 9. Special keys
dc.press('home')
dc.press('end')
dc.press('pagedown')
dc.press('delete')

# 10. Fast combo
dc.hotkey('ctrl', 'alt', 'delete')
```

## 📸 Screen Operations (Top 5)

```python
# 1. Screenshot (full screen)
img = dc.screenshot()
dc.screenshot(filename="screen.png")

# 2. Screenshot (region)
img = dc.screenshot(region=(100, 100, 800, 600))

# 3. Get pixel color
r, g, b = dc.get_pixel_color(500, 300)

# 4. Find image on screen
location = dc.find_on_screen("button.png")

# 5. Get screen size
width, height = dc.get_screen_size()
```

## 🪟 Window Management (Top 5)

```python
# 1. Get all windows
windows = dc.get_all_windows()

# 2. Activate window
dc.activate_window("Chrome")

# 3. Get active window
active = dc.get_active_window()

# 4. List windows
for title in dc.get_all_windows():
    print(title)

# 5. Switch to app
dc.activate_window("Visual Studio Code")
```

## 📋 Clipboard (Top 2)

```python
# 1. Copy to clipboard
dc.copy_to_clipboard("Hello!")

# 2. Get from clipboard
text = dc.get_from_clipboard()
```

## 🔥 Real-World Examples

### Example 1: Auto-fill Form
```python
dc.click(300, 200)  # Name field
dc.type_text("John Doe", wpm=80)
dc.press('tab')
dc.type_text("john@email.com", wpm=80)
dc.press('tab')
dc.type_text("Password123", wpm=60)
dc.press('enter')
```

### Example 2: Copy-Paste Automation
```python
# Select all
dc.hotkey('ctrl', 'a')
# Copy
dc.hotkey('ctrl', 'c')
# Wait
dc.pause(0.5)
# Switch window
dc.hotkey('alt', 'tab')
# Paste
dc.hotkey('ctrl', 'v')
```

### Example 3: File Operations
```python
# Select multiple files
dc.key_down('ctrl')
dc.click(100, 200)
dc.click(100, 250)
dc.click(100, 300)
dc.key_up('ctrl')
# Copy
dc.hotkey('ctrl', 'c')
```

### Example 4: Screenshot Workflow
```python
import time

# Take screenshot
dc.screenshot(filename=f"capture_{time.time()}.png")
# Open in Paint
dc.hotkey('win', 'r')
dc.pause(0.5)
dc.type_text('mspaint')
dc.press('enter')
```

### Example 5: Search & Replace
```python
# Open Find & Replace
dc.hotkey('ctrl', 'h')
dc.pause(0.3)
# Type find text
dc.type_text("old_text")
dc.press('tab')
# Type replace text
dc.type_text("new_text")
# Replace all
dc.hotkey('alt', 'a')
```

## ⚙️ Configuration

```python
# With failsafe (move mouse to a corner to abort)
dc = DesktopController(failsafe=True)

# With approval mode (ask before each action)
dc = DesktopController(require_approval=True)

# Maximum speed (no safety checks)
dc = DesktopController(failsafe=False)
```

## 🛡️ Safety

```python
# Check if safe to continue
if dc.is_safe():
    dc.click(500, 500)

# Pause execution
dc.pause(2.0)  # Wait 2 seconds

# Emergency abort: move the mouse to any screen corner
```

## 🎯 Pro Tips

1. **Instant typing**: `interval=0` or `wpm=None`
2. **Human typing**: `wpm=60` (60 words/min)
3. **Smooth mouse**: `duration=0.5, smooth=True`
4. **Instant mouse**: `duration=0`
5. **Wait for UI**: `dc.pause(0.5)` between actions
6. **Failsafe**: Always enable for safety
7. **Test first**: Use `demo.py` to test features
8. **Coordinates**: Use `get_mouse_position()` to find them
9. **Screenshots**: Capture before/after for verification
10. **Hotkeys > Menus**: Faster and more reliable
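Tips 1-2 hinge on how a WPM rate maps to a per-keystroke delay. Assuming the common convention of 5 characters per word (whether `type_text` uses exactly this formula is an assumption), the conversion is:

```python
def wpm_to_interval(wpm, chars_per_word=5):
    """Seconds between keystrokes for a given words-per-minute rate."""
    return 60.0 / (wpm * chars_per_word)
```

So `wpm=60` means 300 characters per minute, i.e. a 0.2 s delay per key.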

## 📦 Dependencies

```bash
pip install pyautogui pillow opencv-python pygetwindow pyperclip
```

## 🚨 Common Issues

**Mouse not moving correctly?**
- Check DPI scaling in Windows settings
- Verify coordinates with `get_mouse_position()`

**Keyboard not working?**
- Ensure the target app has focus
- Some apps block automation (games, secure apps)

**Failsafe triggering?**
- Keep the mouse away from screen corners
- Disable it if needed: `failsafe=False`

---

**Built for OpenClaw** 🦞 - Desktop automation made easy!

SKILL.md — new file, 623 lines

---
description: "Advanced desktop automation with mouse, keyboard, and screen control."
---

# Desktop Control Skill

**The most advanced desktop automation skill for OpenClaw.** Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.

## 🎯 Features

### Mouse Control
- ✅ **Absolute positioning** - Move to exact coordinates
- ✅ **Relative movement** - Move from the current position
- ✅ **Smooth movement** - Natural, human-like mouse paths
- ✅ **Click types** - Left, right, middle, double, triple clicks
- ✅ **Drag & drop** - Drag from point A to point B
- ✅ **Scroll** - Vertical and horizontal scrolling
- ✅ **Position tracking** - Get current mouse coordinates

### Keyboard Control
- ✅ **Text typing** - Fast, accurate text input
- ✅ **Hotkeys** - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
- ✅ **Special keys** - Enter, Tab, Escape, arrow keys, F-keys
- ✅ **Key combinations** - Multi-key press combinations
- ✅ **Hold & release** - Manual key state control
- ✅ **Typing speed** - Configurable WPM (instant to human-like)

### Screen Operations
- ✅ **Screenshot** - Capture the entire screen or regions
- ✅ **Image recognition** - Find elements on screen (via OpenCV)
- ✅ **Color detection** - Get pixel colors at coordinates
- ✅ **Multi-monitor** - Support for multiple displays

### Window Management
- ✅ **Window list** - Get all open windows
- ✅ **Activate window** - Bring a window to the front
- ✅ **Window info** - Get position, size, title
- ✅ **Minimize/Maximize** - Control window states

### Safety Features
- ✅ **Failsafe** - Move the mouse to a corner to abort
- ✅ **Pause control** - Emergency stop mechanism
- ✅ **Approval mode** - Require confirmation for actions
- ✅ **Bounds checking** - Prevent out-of-screen operations
- ✅ **Logging** - Track all automation actions

---

## 🚀 Quick Start

### Installation

First, install the required dependencies:

```bash
pip install pyautogui pillow opencv-python pygetwindow
```

### Basic Usage

```python
from skills.desktop_control import DesktopController

# Initialize controller
dc = DesktopController(failsafe=True)

# Mouse operations
dc.move_mouse(500, 300)             # Move to coordinates
dc.click()                          # Left click at current position
dc.click(100, 200, button="right")  # Right click at position

# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c")  # Copy
dc.press("enter")

# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
```

---

## 📋 Complete API Reference

### Mouse Functions

#### `move_mouse(x, y, duration=0, smooth=True)`
Move the mouse to absolute screen coordinates.

**Parameters:**
- `x` (int): X coordinate (pixels from left)
- `y` (int): Y coordinate (pixels from top)
- `duration` (float): Movement time in seconds (0 = instant, 0.5 = smooth)
- `smooth` (bool): Use a bezier curve for natural movement

**Example:**
```python
# Instant movement
dc.move_mouse(1000, 500)

# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
```
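The `smooth=True` path is described as a bezier curve. A minimal sketch of how such a path could be sampled (a quadratic bezier with a single control point; the skill's actual implementation may differ):

```python
def bezier_path(start, end, control, steps=20):
    """Sample points along a quadratic bezier from start to end."""
    (x0, y0), (x1, y1), (cx, cy) = start, end, control
    path = []
    for i in range(steps + 1):
        t = i / steps
        # Standard quadratic bezier interpolation
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        path.append((x, y))
    return path
```

Moving the cursor through these points with short instant moves produces the curved, human-like motion.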

#### `move_relative(x_offset, y_offset, duration=0)`
Move the mouse relative to its current position.

**Parameters:**
- `x_offset` (int): Pixels to move horizontally (positive = right)
- `y_offset` (int): Pixels to move vertically (positive = down)
- `duration` (float): Movement time in seconds

**Example:**
```python
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
```

#### `click(x=None, y=None, button='left', clicks=1, interval=0.1)`
Perform a mouse click.

**Parameters:**
- `x, y` (int, optional): Coordinates to click (None = current position)
- `button` (str): 'left', 'right', 'middle'
- `clicks` (int): Number of clicks (1 = single, 2 = double)
- `interval` (float): Delay between multiple clicks

**Example:**
```python
# Simple left click
dc.click()

# Double-click at a specific position
dc.click(500, 300, clicks=2)

# Right-click
dc.click(button='right')
```

#### `drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`
Drag-and-drop operation.

**Parameters:**
- `start_x, start_y` (int): Starting coordinates
- `end_x, end_y` (int): Ending coordinates
- `duration` (float): Drag duration
- `button` (str): Mouse button to use

**Example:**
```python
# Drag a file from the desktop to a folder
dc.drag(100, 100, 500, 500, duration=1.0)
```

#### `scroll(clicks, direction='vertical', x=None, y=None)`
Scroll the mouse wheel.

**Parameters:**
- `clicks` (int): Scroll amount (positive = up/left, negative = down/right)
- `direction` (str): 'vertical' or 'horizontal'
- `x, y` (int, optional): Position to scroll at

**Example:**
```python
# Scroll down 5 clicks
dc.scroll(-5)

# Scroll up 10 clicks
dc.scroll(10)

# Horizontal scroll
dc.scroll(5, direction='horizontal')
```

#### `get_mouse_position()`
Get the current mouse coordinates.

**Returns:** `(x, y)` tuple

**Example:**
```python
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
```

---

### Keyboard Functions

#### `type_text(text, interval=0, wpm=None)`
Type text with configurable speed.

**Parameters:**
- `text` (str): Text to type
- `interval` (float): Delay between keystrokes (0 = instant)
- `wpm` (int, optional): Words per minute (overrides `interval`)

**Example:**
```python
# Instant typing
dc.type_text("Hello World")

# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)

# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
```

#### `press(key, presses=1, interval=0.1)`
Press and release a key.

**Parameters:**
- `key` (str): Key name (see the Key Names section)
- `presses` (int): Number of times to press
- `interval` (float): Delay between presses

**Example:**
```python
# Press Enter
dc.press('enter')

# Press Space 3 times
dc.press('space', presses=3)

# Press the Down arrow
dc.press('down')
```

#### `hotkey(*keys, interval=0.05)`
Execute a keyboard shortcut.

**Parameters:**
- `*keys` (str): Keys to press together
- `interval` (float): Delay between key presses

**Example:**
```python
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')

# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')

# Open Run dialog (Win+R)
dc.hotkey('win', 'r')

# Save (Ctrl+S)
dc.hotkey('ctrl', 's')

# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
```

#### `key_down(key)` / `key_up(key)`
Manually control key state.

**Example:**
```python
# Hold Shift
dc.key_down('shift')
dc.type_text("hello")  # Types "HELLO"
dc.key_up('shift')

# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
```

---

### Screen Functions

#### `screenshot(region=None, filename=None)`
Capture the screen or a region.

**Parameters:**
- `region` (tuple, optional): (left, top, width, height) for partial capture
- `filename` (str, optional): Path to save the image

**Returns:** PIL Image object

**Example:**
```python
# Full screen
img = dc.screenshot()

# Save to file
dc.screenshot(filename="screenshot.png")

# Capture a specific region
img = dc.screenshot(region=(100, 100, 500, 300))
```

#### `get_pixel_color(x, y)`
Get the color of the pixel at the given coordinates.

**Returns:** RGB tuple `(r, g, b)`

**Example:**
```python
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
```

#### `find_on_screen(image_path, confidence=0.8)`
Find an image on screen (requires OpenCV).

**Parameters:**
- `image_path` (str): Path to the template image
- `confidence` (float): Match threshold (0-1)

**Returns:** `(x, y, width, height)` or `None`

**Example:**
```python
# Find a button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click the center of the found image
    dc.click(x + w//2, y + h//2)
```
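Under the hood, template matching slides the template over the screen and scores each position. A pure-Python sketch of the idea on small 2D grayscale grids (real code would use OpenCV's `matchTemplate` on actual screenshots, which is far faster):

```python
def find_template(screen, template, threshold=0.95):
    """Return (x, y, w, h) of the best match of template inside screen
    (both lists of rows of 0-255 values), or None if below threshold."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = -1.0, None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Mean absolute difference, normalized to a 0-1 similarity score
            diff = sum(
                abs(screen[y + j][x + i] - template[j][i])
                for j in range(th) for i in range(tw)
            )
            score = 1.0 - diff / (255.0 * th * tw)
            if score > best_score:
                best_score, best_pos = score, (x, y, tw, th)
    return best_pos if best_score >= threshold else None
```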
|
||||
|
||||
#### `get_screen_size()`
|
||||
Get screen resolution.
|
||||
|
||||
**Returns:** `(width, height)` tuple
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
width, height = dc.get_screen_size()
|
||||
print(f"Screen: {width}x{height}")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Window Functions
|
||||
|
||||
#### `get_all_windows()`
|
||||
List all open windows.
|
||||
|
||||
**Returns:** List of window titles
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
windows = dc.get_all_windows()
|
||||
for title in windows:
|
||||
print(f"Window: {title}")
|
||||
```
|
||||
|
||||
#### `activate_window(title_substring)`
|
||||
Bring window to front by title.
|
||||
|
||||
**Parameters:**
|
||||
- `title_substring` (str): Part of window title to match
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Activate Chrome
|
||||
dc.activate_window("Chrome")
|
||||
|
||||
# Activate VS Code
|
||||
dc.activate_window("Visual Studio Code")
|
||||
```
|
||||
|
||||
#### `get_active_window()`
|
||||
Get currently focused window.
|
||||
|
||||
**Returns:** Window title (str)
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
active = dc.get_active_window()
|
||||
print(f"Active window: {active}")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Clipboard Functions
|
||||
|
||||
#### `copy_to_clipboard(text)`
|
||||
Copy text to clipboard.
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
dc.copy_to_clipboard("Hello from OpenClaw!")
|
||||
```
|
||||
|
||||
#### `get_from_clipboard()`
|
||||
Get text from clipboard.
|
||||
|
||||
**Returns:** str
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
text = dc.get_from_clipboard()
|
||||
print(f"Clipboard: {text}")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ⌨️ Key Names Reference

### Alphabet Keys
`'a'` through `'z'`

### Number Keys
`'0'` through `'9'`

### Function Keys
`'f1'` through `'f24'`

### Special Keys
- `'enter'` / `'return'`
- `'esc'` / `'escape'`
- `'space'` / `'spacebar'`
- `'tab'`
- `'backspace'`
- `'delete'` / `'del'`
- `'insert'`
- `'home'`
- `'end'`
- `'pageup'` / `'pgup'`
- `'pagedown'` / `'pgdn'`

### Arrow Keys
- `'up'` / `'down'` / `'left'` / `'right'`

### Modifier Keys
- `'ctrl'` / `'control'`
- `'shift'`
- `'alt'`
- `'win'` / `'winleft'` / `'winright'`
- `'cmd'` / `'command'` (Mac)

### Lock Keys
- `'capslock'`
- `'numlock'`
- `'scrolllock'`

### Punctuation
- `'.'` / `','` / `'?'` / `'!'` / `';'` / `':'`
- `'['` / `']'` / `'{'` / `'}'`
- `'('` / `')'`
- `'+'` / `'-'` / `'*'` / `'/'` / `'='`
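When key names come from user input or config files, it can help to collapse the aliases above to one canonical form before passing them to `press()` or `hotkey()`. A minimal sketch — the `KEY_ALIASES` table and `normalize_key` helper are illustrative, not part of this skill's API:

```python
# Map the aliases listed above to one canonical key name.
# Illustrative only; extend the table as needed.
KEY_ALIASES = {
    "return": "enter",
    "escape": "esc",
    "spacebar": "space",
    "del": "delete",
    "pgup": "pageup",
    "pgdn": "pagedown",
    "control": "ctrl",
    "command": "cmd",
}

def normalize_key(name: str) -> str:
    """Lower-case a key name and collapse known aliases."""
    key = name.strip().lower()
    return KEY_ALIASES.get(key, key)

print(normalize_key("Return"))  # enter
print(normalize_key("PgDn"))    # pagedown
print(normalize_key("f5"))      # f5
```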
---

## 🛡️ Safety Features

### Failsafe Mode

Move mouse to **any corner** of the screen to abort all automation.

```python
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
```

### Pause Control

```python
# Pause all automation for 2 seconds
dc.pause(2.0)

# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)
```

### Approval Mode

Require user confirmation before actions:

```python
dc = DesktopController(require_approval=True)

# This will ask for confirmation
dc.click(500, 500)  # Prompt: "Allow click at (500, 500)? [y/n]"
```

---

## 🎨 Advanced Examples

### Example 1: Automated Form Filling

```python
dc = DesktopController()

# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)

# Tab to next field
dc.press('tab')
dc.type_text("john@example.com", wpm=80)

# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)

# Submit form
dc.press('enter')
```

### Example 2: Screenshot Region and Save

```python
# Capture specific area
region = (100, 100, 800, 600)  # left, top, width, height
img = dc.screenshot(region=region)

# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
```

### Example 3: Multi-File Selection

```python
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200)  # First file
dc.click(100, 250)  # Second file
dc.click(100, 300)  # Third file
dc.key_up('ctrl')

# Copy selected files
dc.hotkey('ctrl', 'c')
```

### Example 4: Window Automation

```python
# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)

# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)

# Take screenshot of result
dc.screenshot(filename="calculation_result.png")
```

### Example 5: Drag & Drop File

```python
# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,      # Folder location
    duration=1.0               # Smooth 1-second drag
)
```

---

## ⚡ Performance Tips

1. **Use instant movements** for speed: `duration=0`
2. **Batch operations** instead of individual calls
3. **Cache screen positions** instead of recalculating
4. **Disable failsafe** for maximum performance (use with caution)
5. **Use hotkeys** instead of menu navigation
---

## ⚠️ Important Notes

- **Screen coordinates** start at (0, 0) in the top-left corner
- **Multi-monitor setups** may have negative coordinates for secondary displays
- **Windows DPI scaling** may affect coordinate accuracy
- **Failsafe corners** are: (0, 0), (width-1, 0), (0, height-1), (width-1, height-1)
- **Some applications** may block simulated input (games, secure apps)
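The failsafe corners above follow directly from the screen size. A minimal sketch of the proximity check, assuming a hypothetical 5-pixel tolerance (mirroring the `is_safe()` logic in this skill's source):

```python
def failsafe_corners(width, height):
    """The four failsafe corner coordinates for a given screen size."""
    return [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]

def near_corner(x, y, width, height, tolerance=5):
    """True if (x, y) is within `tolerance` pixels of any failsafe corner."""
    return any(
        abs(x - cx) <= tolerance and abs(y - cy) <= tolerance
        for cx, cy in failsafe_corners(width, height)
    )

print(failsafe_corners(1920, 1080)[3])    # (1919, 1079)
print(near_corner(3, 1078, 1920, 1080))   # True — near the bottom-left corner
print(near_corner(960, 540, 1920, 1080))  # False — screen center
```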
---

## 🔧 Troubleshooting

### Mouse not moving to correct position
- Check DPI scaling settings
- Verify screen resolution matches expectations
- Use `get_screen_size()` to confirm dimensions
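When DPI scaling is active, coordinates taken from a raw screenshot may be in physical pixels while the automation layer expects logical ones (or vice versa, depending on the process's DPI awareness). A hedged sketch of the conversion, assuming a known scale factor such as 1.25 for 125% scaling:

```python
def physical_to_logical(x, y, scale):
    """Convert physical-pixel coordinates to logical ones under DPI scaling."""
    return round(x / scale), round(y / scale)

def logical_to_physical(x, y, scale):
    """Convert logical coordinates back to physical pixels."""
    return round(x * scale), round(y * scale)

# At 125% scaling, a point seen at (1000, 500) in a raw screenshot
# corresponds to logical (800, 400).
print(physical_to_logical(1000, 500, 1.25))  # (800, 400)
print(logical_to_physical(800, 400, 1.25))   # (1000, 500)
```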
### Keyboard input not working
- Ensure target application has focus
- Some apps require admin privileges
- Try increasing `interval` for reliability

### Failsafe triggering accidentally
- Increase screen border tolerance
- Move mouse away from corners during normal use
- Disable if needed: `DesktopController(failsafe=False)`

### Permission errors
- Run Python with administrator privileges for some operations
- Some secure applications block automation

---

## 📦 Dependencies

- **PyAutoGUI** - Core automation engine
- **Pillow** - Image processing
- **OpenCV** (optional) - Image recognition
- **PyGetWindow** - Window management
- **pyperclip** - Clipboard access

Install all:
```bash
pip install pyautogui pillow opencv-python pygetwindow pyperclip
```

---

**Built for OpenClaw** - The ultimate desktop automation companion 🦞
522 __init__.py Normal file
@@ -0,0 +1,522 @@
"""
|
||||
Desktop Control - Advanced Mouse, Keyboard, and Screen Automation
|
||||
The best ever possible responsive desktop control for OpenClaw
|
||||
"""
|
||||
|
||||
import pyautogui
|
||||
import time
|
||||
import sys
|
||||
from typing import Tuple, Optional, List, Union
|
||||
from pathlib import Path
|
||||
import logging
|
||||
|
||||
# Configure PyAutoGUI
|
||||
pyautogui.MINIMUM_DURATION = 0 # Allow instant movements
|
||||
pyautogui.MINIMUM_SLEEP = 0 # No forced delays
|
||||
pyautogui.PAUSE = 0 # No pause between function calls
|
||||
|
||||
# Setup logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class DesktopController:
    """
    Advanced desktop automation controller with mouse, keyboard, and screen operations.
    Designed for maximum responsiveness and reliability.
    """

    def __init__(self, failsafe: bool = True, require_approval: bool = False):
        """
        Initialize desktop controller.

        Args:
            failsafe: Enable failsafe (move mouse to corner to abort)
            require_approval: Require user confirmation for actions
        """
        self.failsafe = failsafe
        self.require_approval = require_approval
        pyautogui.FAILSAFE = failsafe

        # Get screen info
        self.screen_width, self.screen_height = pyautogui.size()
        logger.info(f"Desktop Controller initialized. Screen: {self.screen_width}x{self.screen_height}")
        logger.info(f"Failsafe: {failsafe}, Require Approval: {require_approval}")

    # ========== MOUSE OPERATIONS ==========

    def move_mouse(self, x: int, y: int, duration: float = 0, smooth: bool = True) -> None:
        """
        Move mouse to absolute screen coordinates.

        Args:
            x: X coordinate (pixels from left)
            y: Y coordinate (pixels from top)
            duration: Movement time in seconds (0 = instant)
            smooth: Use smooth movement (easeInOutQuad tween)
        """
        if self._check_approval(f"move mouse to ({x}, {y})"):
            if smooth and duration > 0:
                pyautogui.moveTo(x, y, duration=duration, tween=pyautogui.easeInOutQuad)
            else:
                pyautogui.moveTo(x, y, duration=duration)
            logger.debug(f"Moved mouse to ({x}, {y}) in {duration}s")

    def move_relative(self, x_offset: int, y_offset: int, duration: float = 0) -> None:
        """
        Move mouse relative to current position.

        Args:
            x_offset: Pixels to move horizontally (+ = right, - = left)
            y_offset: Pixels to move vertically (+ = down, - = up)
            duration: Movement time in seconds
        """
        if self._check_approval(f"move mouse relative ({x_offset}, {y_offset})"):
            pyautogui.move(x_offset, y_offset, duration=duration)
            logger.debug(f"Moved mouse relative ({x_offset}, {y_offset})")

    def click(self, x: Optional[int] = None, y: Optional[int] = None,
              button: str = 'left', clicks: int = 1, interval: float = 0.1) -> None:
        """
        Perform mouse click.

        Args:
            x, y: Coordinates to click (None = current position)
            button: 'left', 'right', 'middle'
            clicks: Number of clicks (1 = single, 2 = double, etc.)
            interval: Delay between multiple clicks
        """
        position_str = f"at ({x}, {y})" if x is not None else "at current position"
        if self._check_approval(f"{button} click {position_str}"):
            pyautogui.click(x=x, y=y, clicks=clicks, interval=interval, button=button)
            logger.info(f"{button.capitalize()} click {position_str} (x{clicks})")

    def double_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
        """Convenience method for double-click."""
        self.click(x, y, clicks=2)

    def right_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
        """Convenience method for right-click."""
        self.click(x, y, button='right')

    def middle_click(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
        """Convenience method for middle-click."""
        self.click(x, y, button='middle')

    def drag(self, start_x: int, start_y: int, end_x: int, end_y: int,
             duration: float = 0.5, button: str = 'left') -> None:
        """
        Drag and drop operation.

        Args:
            start_x, start_y: Starting coordinates
            end_x, end_y: Ending coordinates
            duration: Drag duration in seconds
            button: Mouse button to use ('left', 'right', 'middle')
        """
        if self._check_approval(f"drag from ({start_x}, {start_y}) to ({end_x}, {end_y})"):
            pyautogui.moveTo(start_x, start_y)
            time.sleep(0.05)  # Small delay to ensure position
            pyautogui.drag(end_x - start_x, end_y - start_y, duration=duration, button=button)
            logger.info(f"Dragged from ({start_x}, {start_y}) to ({end_x}, {end_y})")

    def scroll(self, clicks: int, direction: str = 'vertical',
               x: Optional[int] = None, y: Optional[int] = None) -> None:
        """
        Scroll mouse wheel.

        Args:
            clicks: Scroll amount (+ = up for vertical, right for horizontal)
            direction: 'vertical' or 'horizontal'
            x, y: Position to scroll at (None = current position)
        """
        if x is not None and y is not None:
            pyautogui.moveTo(x, y)

        if direction == 'vertical':
            pyautogui.scroll(clicks)
        else:
            pyautogui.hscroll(clicks)
        logger.debug(f"Scrolled {direction} {clicks} clicks")

    def get_mouse_position(self) -> Tuple[int, int]:
        """
        Get current mouse coordinates.

        Returns:
            (x, y) tuple
        """
        pos = pyautogui.position()
        return (pos.x, pos.y)

    # ========== KEYBOARD OPERATIONS ==========

    def type_text(self, text: str, interval: float = 0, wpm: Optional[int] = None) -> None:
        """
        Type text with configurable speed.

        Args:
            text: Text to type
            interval: Delay between keystrokes (0 = instant)
            wpm: Words per minute (overrides interval, typical human: 40-80 WPM)
        """
        if wpm is not None:
            # Convert WPM to interval (assuming avg 5 chars per word)
            chars_per_second = (wpm * 5) / 60
            interval = 1.0 / chars_per_second

        if self._check_approval(f"type text: '{text[:50]}...'"):
            pyautogui.write(text, interval=interval)
            logger.info(f"Typed text: '{text[:50]}{'...' if len(text) > 50 else ''}' (interval={interval:.3f}s)")

    def press(self, key: str, presses: int = 1, interval: float = 0.1) -> None:
        """
        Press and release a key.

        Args:
            key: Key name (e.g., 'enter', 'space', 'a', 'f1')
            presses: Number of times to press
            interval: Delay between presses
        """
        if self._check_approval(f"press '{key}' {presses}x"):
            pyautogui.press(key, presses=presses, interval=interval)
            logger.info(f"Pressed '{key}' {presses}x")

    def hotkey(self, *keys, interval: float = 0.05) -> None:
        """
        Execute keyboard shortcut (e.g., Ctrl+C, Alt+Tab).

        Args:
            *keys: Keys to press together (e.g., 'ctrl', 'c')
            interval: Delay between key presses
        """
        keys_str = '+'.join(keys)
        if self._check_approval(f"hotkey: {keys_str}"):
            pyautogui.hotkey(*keys, interval=interval)
            logger.info(f"Executed hotkey: {keys_str}")

    def key_down(self, key: str) -> None:
        """Press and hold a key without releasing."""
        pyautogui.keyDown(key)
        logger.debug(f"Key down: '{key}'")

    def key_up(self, key: str) -> None:
        """Release a held key."""
        pyautogui.keyUp(key)
        logger.debug(f"Key up: '{key}'")

    # ========== SCREEN OPERATIONS ==========

    def screenshot(self, region: Optional[Tuple[int, int, int, int]] = None,
                   filename: Optional[str] = None):
        """
        Capture screen or region.

        Args:
            region: (left, top, width, height) for partial capture
            filename: Path to save image (None = return PIL Image)

        Returns:
            PIL Image object (if filename is None)
        """
        img = pyautogui.screenshot(region=region)

        if filename:
            img.save(filename)
            logger.info(f"Screenshot saved to: {filename}")
        else:
            logger.debug(f"Screenshot captured (region={region})")
            return img

    def get_pixel_color(self, x: int, y: int) -> Tuple[int, int, int]:
        """
        Get RGB color of pixel at coordinates.

        Args:
            x, y: Screen coordinates

        Returns:
            (r, g, b) tuple
        """
        color = pyautogui.pixel(x, y)
        return color

    def find_on_screen(self, image_path: str, confidence: float = 0.8,
                       region: Optional[Tuple[int, int, int, int]] = None):
        """
        Find image on screen using template matching.
        Requires OpenCV (opencv-python).

        Args:
            image_path: Path to template image
            confidence: Match threshold 0-1 (0.8 = 80% match)
            region: Search region (left, top, width, height)

        Returns:
            (x, y, width, height) of match, or None if not found
        """
        try:
            location = pyautogui.locateOnScreen(image_path, confidence=confidence, region=region)
            if location:
                logger.info(f"Found '{image_path}' at {location}")
                return location
            else:
                logger.debug(f"'{image_path}' not found on screen")
                return None
        except Exception as e:
            logger.error(f"Error finding image: {e}")
            return None

    def get_screen_size(self) -> Tuple[int, int]:
        """
        Get screen resolution.

        Returns:
            (width, height) tuple
        """
        return (self.screen_width, self.screen_height)

    # ========== WINDOW OPERATIONS ==========

    def get_all_windows(self) -> List[str]:
        """
        Get list of all open window titles.

        Returns:
            List of window title strings
        """
        try:
            import pygetwindow as gw
            windows = gw.getAllTitles()
            # Filter out empty titles
            windows = [w for w in windows if w.strip()]
            return windows
        except ImportError:
            logger.error("pygetwindow not installed. Run: pip install pygetwindow")
            return []
        except Exception as e:
            logger.error(f"Error getting windows: {e}")
            return []

    def activate_window(self, title_substring: str) -> bool:
        """
        Bring window to front by title (partial match).

        Args:
            title_substring: Part of window title to match

        Returns:
            True if window was activated, False otherwise
        """
        try:
            import pygetwindow as gw
            windows = gw.getWindowsWithTitle(title_substring)
            if windows:
                windows[0].activate()
                logger.info(f"Activated window: '{windows[0].title}'")
                return True
            else:
                logger.warning(f"No window found with title containing: '{title_substring}'")
                return False
        except ImportError:
            logger.error("pygetwindow not installed")
            return False
        except Exception as e:
            logger.error(f"Error activating window: {e}")
            return False

    def get_active_window(self) -> Optional[str]:
        """
        Get title of currently focused window.

        Returns:
            Window title string, or None if error
        """
        try:
            import pygetwindow as gw
            active = gw.getActiveWindow()
            return active.title if active else None
        except ImportError:
            logger.error("pygetwindow not installed")
            return None
        except Exception as e:
            logger.error(f"Error getting active window: {e}")
            return None

    # ========== CLIPBOARD OPERATIONS ==========

    def copy_to_clipboard(self, text: str) -> None:
        """
        Copy text to clipboard.

        Args:
            text: Text to copy
        """
        try:
            import pyperclip
            pyperclip.copy(text)
            logger.info(f"Copied to clipboard: '{text[:50]}...'")
        except ImportError:
            logger.error("pyperclip not installed. Run: pip install pyperclip")
        except Exception as e:
            logger.error(f"Error copying to clipboard: {e}")

    def get_from_clipboard(self) -> Optional[str]:
        """
        Get text from clipboard.

        Returns:
            Clipboard text, or None if error
        """
        try:
            import pyperclip
            text = pyperclip.paste()
            logger.debug(f"Got from clipboard: '{text[:50]}...'")
            return text
        except ImportError:
            logger.error("pyperclip not installed. Run: pip install pyperclip")
            return None
        except Exception as e:
            logger.error(f"Error getting clipboard: {e}")
            return None

    # ========== UTILITY METHODS ==========

    def pause(self, seconds: float) -> None:
        """
        Pause automation for specified duration.

        Args:
            seconds: Time to pause
        """
        logger.info(f"Pausing for {seconds}s...")
        time.sleep(seconds)

    def is_safe(self) -> bool:
        """
        Check if it's safe to continue automation.
        Returns False if mouse is in a corner (failsafe position).

        Returns:
            True if safe to continue
        """
        if not self.failsafe:
            return True

        x, y = self.get_mouse_position()
        corner_tolerance = 5

        # Check corners
        corners = [
            (0, 0),                                           # Top-left
            (self.screen_width - 1, 0),                       # Top-right
            (0, self.screen_height - 1),                      # Bottom-left
            (self.screen_width - 1, self.screen_height - 1),  # Bottom-right
        ]

        for cx, cy in corners:
            if abs(x - cx) <= corner_tolerance and abs(y - cy) <= corner_tolerance:
                logger.warning(f"Mouse in corner ({x}, {y}) - FAILSAFE TRIGGERED")
                return False

        return True

    def _check_approval(self, action: str) -> bool:
        """
        Check if user approves action (if approval mode is enabled).

        Args:
            action: Description of action

        Returns:
            True if approved (or approval not required)
        """
        if not self.require_approval:
            return True

        response = input(f"Allow: {action}? [y/n]: ").strip().lower()
        approved = response in ['y', 'yes']

        if not approved:
            logger.warning(f"Action declined: {action}")

        return approved

    # ========== CONVENIENCE METHODS ==========

    def alert(self, text: str = '', title: str = 'Alert', button: str = 'OK') -> None:
        """Show alert dialog box."""
        pyautogui.alert(text=text, title=title, button=button)

    def confirm(self, text: str = '', title: str = 'Confirm', buttons: List[str] = None) -> str:
        """Show confirmation dialog with buttons."""
        if buttons is None:
            buttons = ['OK', 'Cancel']
        return pyautogui.confirm(text=text, title=title, buttons=buttons)

    def prompt(self, text: str = '', title: str = 'Input', default: str = '') -> Optional[str]:
        """Show input prompt dialog."""
        return pyautogui.prompt(text=text, title=title, default=default)


# ========== QUICK ACCESS FUNCTIONS ==========

# Global controller instance for quick access
_controller = None

def get_controller(**kwargs) -> DesktopController:
    """Get or create global controller instance."""
    global _controller
    if _controller is None:
        _controller = DesktopController(**kwargs)
    return _controller


# Convenience function exports
def move_mouse(x: int, y: int, duration: float = 0) -> None:
    """Quick mouse move."""
    get_controller().move_mouse(x, y, duration)

def click(x: Optional[int] = None, y: Optional[int] = None, button: str = 'left') -> None:
    """Quick click."""
    get_controller().click(x, y, button=button)

def type_text(text: str, wpm: Optional[int] = None) -> None:
    """Quick text typing."""
    get_controller().type_text(text, wpm=wpm)

def hotkey(*keys) -> None:
    """Quick hotkey."""
    get_controller().hotkey(*keys)

def screenshot(filename: Optional[str] = None):
    """Quick screenshot."""
    return get_controller().screenshot(filename=filename)


# ========== DEMONSTRATION ==========

if __name__ == "__main__":
    print("🖱️ Desktop Control Skill - Test Mode")
    print("=" * 50)

    # Initialize controller
    dc = DesktopController(failsafe=True)

    # Display info
    print(f"\n📺 Screen Size: {dc.get_screen_size()}")
    print(f"🖱️ Current Mouse Position: {dc.get_mouse_position()}")

    # Test window operations
    print(f"\n🪟 Active Window: {dc.get_active_window()}")

    windows = dc.get_all_windows()
    print(f"\n📋 Open Windows ({len(windows)}):")
    for i, title in enumerate(windows[:10], 1):  # Show first 10
        print(f"  {i}. {title}")

    print("\n✅ Desktop Control ready!")
    print("⚠️ Move mouse to any corner to trigger failsafe")

    # Keep running to allow testing
    print("\nController is ready. Import this module to use it in your OpenClaw skills!")
6 _meta.json Normal file
@@ -0,0 +1,6 @@
{
  "ownerId": "kn7ag28ra4hhta8bx2k2j1kpv180kqbk",
  "slug": "desktop-control",
  "version": "1.0.0",
  "publishedAt": 1770255200863
}
613 ai_agent.py Normal file
@@ -0,0 +1,613 @@
"""
|
||||
AI Desktop Agent - Cognitive Desktop Automation
|
||||
Combines vision, reasoning, and control for autonomous task execution
|
||||
"""
|
||||
|
||||
import base64
|
||||
import time
|
||||
from typing import Dict, List, Optional, Any, Callable
|
||||
from pathlib import Path
|
||||
import logging
|
||||
|
||||
from desktop_control import DesktopController
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class AIDesktopAgent:
    """
    Intelligent desktop agent that combines computer vision, LLM reasoning,
    and desktop control for autonomous task execution.

    Can understand screen content, plan actions, and execute complex workflows.
    """

    def __init__(self, llm_client=None, failsafe: bool = True):
        """
        Initialize AI Desktop Agent.

        Args:
            llm_client: OpenClaw LLM client for reasoning (optional, will try to auto-detect)
            failsafe: Enable failsafe mode
        """
        self.dc = DesktopController(failsafe=failsafe)
        self.llm_client = llm_client
        self.screen_width, self.screen_height = self.dc.get_screen_size()

        # Action history for learning
        self.action_history = []

        # Application knowledge base
        self.app_knowledge = self._load_app_knowledge()

        logger.info("AI Desktop Agent initialized")

    def _load_app_knowledge(self) -> Dict[str, Dict]:
        """
        Load application-specific knowledge.
        This can be extended with learned patterns.
        """
        return {
            "mspaint": {
                "name": "Microsoft Paint",
                "launch_command": "mspaint",
                "common_actions": {
                    "select_pencil": {"menu": "Tools", "position": "toolbar_left"},
                    "select_brush": {"menu": "Tools", "position": "toolbar"},
                    "select_color": {"menu": "Colors", "action": "click_palette"},
                    "draw_line": {"action": "drag", "tool_required": "line"},
                }
            },
            "notepad": {
                "name": "Notepad",
                "launch_command": "notepad",
                "common_actions": {
                    "type_text": {"action": "type"},
                    "save": {"hotkey": ["ctrl", "s"]},
                    "new_file": {"hotkey": ["ctrl", "n"]},
                }
            },
            "calculator": {
                "name": "Calculator",
                "launch_command": "calc",
                "common_actions": {
                    "calculate": {"action": "type_numbers"},
                }
            }
        }

    def execute_task(self, task: str, max_steps: int = 50) -> Dict[str, Any]:
        """
        Execute a high-level task autonomously.

        Args:
            task: Natural language task description
            max_steps: Maximum number of steps to attempt

        Returns:
            Execution result with status and details
        """
        logger.info(f"Executing task: {task}")

        # Initialize result
        result = {
            "task": task,
            "status": "in_progress",
            "steps": [],
            "screenshots": [],
            "success": False
        }

        try:
            # Step 1: Analyze task and plan
            plan = self._plan_task(task)
            logger.info(f"Generated plan with {len(plan)} steps")

            # Step 2: Execute plan step by step
            for step_num, step in enumerate(plan, 1):
                if step_num > max_steps:
                    logger.warning(f"Reached max steps ({max_steps})")
                    break

                logger.info(f"Step {step_num}/{len(plan)}: {step['description']}")

                # Capture screen before action
                screenshot_before = self.dc.screenshot()

                # Execute step
                step_result = self._execute_step(step)
                result["steps"].append(step_result)

                # Capture screen after action
                screenshot_after = self.dc.screenshot()
                result["screenshots"].append({
                    "step": step_num,
                    "before": screenshot_before,
                    "after": screenshot_after
                })

                # Verify step success
                if not step_result.get("success", False):
                    logger.error(f"Step {step_num} failed: {step_result.get('error')}")
                    result["status"] = "failed"
                    result["failed_at_step"] = step_num
                    return result

                # Small delay between steps
                time.sleep(0.5)

            result["status"] = "completed"
            result["success"] = True
            logger.info(f"Task completed successfully in {len(result['steps'])} steps")

        except Exception as e:
            logger.error(f"Task execution error: {e}")
            result["status"] = "error"
            result["error"] = str(e)

        return result

    def _plan_task(self, task: str) -> List[Dict[str, Any]]:
        """
        Plan task execution using LLM reasoning.

        Args:
            task: Task description

        Returns:
            List of execution steps
        """
        # For now, use rule-based planning
        # TODO: Integrate with OpenClaw LLM for intelligent planning

        # Parse task intent
        task_lower = task.lower()

        # Pattern matching for common tasks
        if "draw" in task_lower and "paint" in task_lower:
            return self._plan_paint_drawing(task)
        elif "type" in task_lower or "write" in task_lower:
            return self._plan_text_entry(task)
        elif "play" in task_lower and "game" in task_lower:
            return self._plan_game_play(task)
        elif "open" in task_lower or "launch" in task_lower:
            return self._plan_app_launch(task)
        else:
            # Generic plan - analyze and improvise
            return self._plan_generic(task)

    def _plan_paint_drawing(self, task: str) -> List[Dict]:
        """Plan for drawing in MS Paint."""
        # Extract what to draw
        drawing_subject = self._extract_subject(task)

        return [
            {
                "type": "launch_app",
                "app": "mspaint",
                "description": "Launch Microsoft Paint"
            },
            {
                "type": "wait",
                "duration": 2.0,
                "description": "Wait for Paint to load"
            },
            {
                "type": "activate_window",
                "title": "Paint",
                "description": "Ensure Paint window is active"
            },
            {
                "type": "select_tool",
                "tool": "pencil",
                "description": "Select pencil tool"
            },
            {
                "type": "draw",
                "subject": drawing_subject,
                "description": f"Draw {drawing_subject}"
            },
            {
                "type": "screenshot",
                "save_as": "drawing_result.png",
                "description": "Capture the drawing"
            }
        ]

    def _plan_text_entry(self, task: str) -> List[Dict]:
        """Plan for text entry task."""
        # Extract text to type
        text_content = self._extract_text_content(task)

        return [
            {
                "type": "launch_app",
                "app": "notepad",
                "description": "Launch Notepad"
            },
            {
                "type": "wait",
                "duration": 1.0,
                "description": "Wait for Notepad to load"
            },
            {
                "type": "type_text",
                "text": text_content,
                "wpm": 80,
                "description": f"Type: {text_content[:50]}..."
            }
        ]

    def _plan_game_play(self, task: str) -> List[Dict]:
        """Plan for playing a game."""
        game_name = self._extract_game_name(task)

        return [
            {
                "type": "analyze_screen",
                "description": "Analyze game screen"
            },
            {
                "type": "detect_game_state",
                "game": game_name,
                "description": f"Detect {game_name} state"
            },
            {
                "type": "execute_game_loop",
                "game": game_name,
                "max_iterations": 100,
                "description": f"Play {game_name}"
            }
        ]

    def _plan_app_launch(self, task: str) -> List[Dict]:
        """Plan for launching an application."""
        app_name = self._extract_app_name(task)

        return [
            {
                "type": "launch_app",
                "app": app_name,
                "description": f"Launch {app_name}"
            },
            {
                "type": "wait",
                "duration": 2.0,
                "description": f"Wait for {app_name} to load"
            }
        ]

    def _plan_generic(self, task: str) -> List[Dict]:
        """Generic planning fallback."""
        return [
            {
                "type": "analyze_screen",
                "description": "Analyze current screen state"
            },
            {
                "type": "infer_action",
                "task": task,
                "description": f"Infer action for: {task}"
            }
        ]

    def _execute_step(self, step: Dict[str, Any]) -> Dict[str, Any]:
        """
        Execute a single step.

        Args:
            step: Step definition

        Returns:
            Execution result
        """
        step_type = step.get("type")
        result = {"step": step, "success": False}

        try:
            if step_type == "launch_app":
                self._do_launch_app(step["app"])
                result["success"] = True

            elif step_type == "wait":
                time.sleep(step["duration"])
                result["success"] = True

            elif step_type == "activate_window":
                success = self.dc.activate_window(step["title"])
                result["success"] = success

            elif step_type == "select_tool":
                self._do_select_tool(step["tool"])
                result["success"] = True

            elif step_type == "draw":
                self._do_draw(step["subject"])
                result["success"] = True

            elif step_type == "type_text":
                self.dc.type_text(step["text"], wpm=step.get("wpm", 80))
                result["success"] = True

            elif step_type == "screenshot":
                filename = step.get("save_as", "screenshot.png")
                self.dc.screenshot(filename=filename)
                result["success"] = True
                result["saved_to"] = filename

            elif step_type == "analyze_screen":
                analysis = self._analyze_screen()
                result["analysis"] = analysis
                result["success"] = True

            elif step_type == "detect_game_state":
                # Emitted by _plan_game_play; real detection would go beyond
                # this basic screen analysis
                result["game_state"] = self._analyze_screen()
                result["success"] = True

            elif step_type == "infer_action":
                # Emitted by _plan_generic; placeholder that records the task
                result["inferred_for"] = step.get("task")
                result["success"] = True

            elif step_type == "execute_game_loop":
                game_result = self._execute_game_loop(step)
                result["game_result"] = game_result
                result["success"] = True

            else:
                result["error"] = f"Unknown step type: {step_type}"

        except Exception as e:
            logger.error(f"Step execution error: {e}")
            result["error"] = str(e)

        return result

    def _do_launch_app(self, app: str) -> None:
        """Launch an application."""
        # Get launch command from knowledge base
        app_info = self.app_knowledge.get(app, {})
        launch_cmd = app_info.get("launch_command", app)

        # Open Run dialog
        self.dc.hotkey('win', 'r')
        time.sleep(0.5)

        # Type and execute command
        self.dc.type_text(launch_cmd, wpm=100)
        self.dc.press('enter')

        logger.info(f"Launched: {app}")

    def _do_select_tool(self, tool: str) -> None:
        """Select a tool (e.g., in Paint)."""
        # This is simplified - in reality it would use computer vision
        # to find and click the tool button.
        # For Paint, tools are typically in the ribbon;
        # we'll use hotkeys where possible.
        if tool == "pencil":
            # In Paint, press 'P' for pencil
            self.dc.press('p')
        elif tool == "brush":
            self.dc.press('b')
        elif tool == "eraser":
            self.dc.press('e')

        logger.info(f"Selected tool: {tool}")

    def _do_draw(self, subject: str) -> None:
        """
        Draw something on screen.

        This is a simplified implementation - it would be enhanced with:
        - Image generation (use wan2gp to generate a reference)
        - Trace generation (convert the image to draw commands)
        - Drawing execution (execute the commands)
        """
        logger.info(f"Drawing: {subject}")

        # Get canvas center (simplified - would detect the canvas)
        canvas_x = self.screen_width // 2
        canvas_y = self.screen_height // 2

        # Simple dispatch on the requested shape
        if "circle" in subject.lower():
            self._draw_circle(canvas_x, canvas_y, radius=100)
        elif "square" in subject.lower():
            self._draw_square(canvas_x, canvas_y, size=200)
        elif "star" in subject.lower():
            self._draw_star(canvas_x, canvas_y, size=100)
        else:
            # Generic: draw a simple pattern
            self._draw_simple_pattern(canvas_x, canvas_y)

        logger.info(f"Completed drawing: {subject}")

    def _draw_circle(self, cx: int, cy: int, radius: int) -> None:
        """Draw a circle."""
        import math

        points = []
        for angle in range(0, 360, 5):
            rad = math.radians(angle)
            x = int(cx + radius * math.cos(rad))
            y = int(cy + radius * math.sin(rad))
            points.append((x, y))

        # Draw by connecting points
        for i in range(len(points) - 1):
            self.dc.drag(points[i][0], points[i][1],
                         points[i + 1][0], points[i + 1][1],
                         duration=0.01)
        # Close the circle
        self.dc.drag(points[-1][0], points[-1][1],
                     points[0][0], points[0][1],
                     duration=0.01)

    def _draw_square(self, cx: int, cy: int, size: int) -> None:
        """Draw a square."""
        half = size // 2
        corners = [
            (cx - half, cy - half),  # Top-left
            (cx + half, cy - half),  # Top-right
            (cx + half, cy + half),  # Bottom-right
            (cx - half, cy + half),  # Bottom-left
        ]

        # Draw sides
        for i in range(4):
            start = corners[i]
            end = corners[(i + 1) % 4]
            self.dc.drag(start[0], start[1], end[0], end[1], duration=0.2)

    def _draw_star(self, cx: int, cy: int, size: int) -> None:
        """Draw a 5-pointed star."""
        import math

        points = []
        for i in range(10):
            angle = math.radians(i * 36 - 90)
            radius = size if i % 2 == 0 else size // 2
            x = int(cx + radius * math.cos(angle))
            y = int(cy + radius * math.sin(angle))
            points.append((x, y))

        # Draw by connecting points
        for i in range(len(points)):
            start = points[i]
            end = points[(i + 1) % len(points)]
            self.dc.drag(start[0], start[1], end[0], end[1], duration=0.1)

    def _draw_simple_pattern(self, cx: int, cy: int) -> None:
        """Draw a simple decorative pattern."""
        # Draw a few parallel horizontal lines
        for offset in [-50, 0, 50]:
            self.dc.drag(cx - 100, cy + offset,
                         cx + 100, cy + offset,
                         duration=0.3)

    def _analyze_screen(self) -> Dict[str, Any]:
        """
        Analyze current screen state.

        Would use OCR and object detection in a full implementation.
        """
        screenshot = self.dc.screenshot()
        active_window = self.dc.get_active_window()
        mouse_pos = self.dc.get_mouse_position()

        analysis = {
            "active_window": active_window,
            "mouse_position": mouse_pos,
            "screen_size": (self.screen_width, self.screen_height),
            "timestamp": time.time()
        }

        # TODO: Add OCR, object detection, UI element detection

        return analysis

    def _execute_game_loop(self, step: Dict) -> Dict:
        """
        Execute game playing loop.

        Would use reinforcement learning in a full implementation.
        """
        game = step.get("game", "unknown")
        max_iter = step.get("max_iterations", 100)

        logger.info(f"Starting game loop for: {game}")

        result = {
            "game": game,
            "iterations": 0,
            "actions_taken": []
        }

        # Simple game loop - would be much more sophisticated
        for i in range(max_iter):
            # Analyze game state
            state = self._analyze_screen()

            # Decide action (simplified - would use ML model)
            action = self._decide_game_action(state, game)

            # Execute action
            self._execute_game_action(action)

            result["iterations"] += 1
            result["actions_taken"].append(action)

            # Check win/lose condition
            # (would detect from screen)

            time.sleep(0.1)

        return result

    def _decide_game_action(self, state: Dict, game: str) -> str:
        """Decide the next game action based on state."""
        # Simplified - would use game-specific AI
        return "continue"

    def _execute_game_action(self, action: str) -> None:
        """Execute a game action."""
        # Simplified - would translate to specific inputs
        pass

    # Helper methods for parsing

    def _extract_subject(self, text: str) -> str:
        """Extract subject from drawing request."""
        # Simple extraction - would use NLP
        if "draw" in text.lower():
            parts = text.lower().split("draw")
            if len(parts) > 1:
                return parts[1].strip()
        return "unknown"

    def _extract_text_content(self, text: str) -> str:
        """Extract text content from typing request."""
        # Simple extraction; match case-insensitively but keep original casing
        idx = text.lower().find("type")
        if idx != -1:
            return text[idx + len("type"):].strip().strip('"').strip("'")
        return text

    def _extract_game_name(self, text: str) -> str:
        """Extract game name from request."""
        # Would use NER for better extraction
        return "unknown_game"

    def _extract_app_name(self, text: str) -> str:
        """Extract application name from request."""
        # Simple extraction - would use NER
        for app in self.app_knowledge.keys():
            if app in text.lower():
                return app
        return "notepad"  # Default fallback


# Quick access function

def create_agent(**kwargs) -> AIDesktopAgent:
    """Create an AI Desktop Agent instance."""
    return AIDesktopAgent(**kwargs)


if __name__ == "__main__":
    print("🤖 AI Desktop Agent - Cognitive Automation")
    print("=" * 60)

    # Create agent
    agent = AIDesktopAgent(failsafe=True)

    print("\n✨ Examples of what you can ask:")
    print("  - 'Draw a circle in Paint'")
    print("  - 'Type Hello World in Notepad'")
    print("  - 'Open Calculator'")
    print("  - 'Play Solitaire for me'")

    print("\n🎯 Try it:")
    task = input("\nWhat would you like me to do? ")

    if task.strip():
        result = agent.execute_task(task)
        print(f"\n{'=' * 60}")
        print(f"Task Status: {result['status']}")
        print(f"Steps Executed: {len(result['steps'])}")
        print(f"Success: {result['success']}")

        if result.get('screenshots'):
            print(f"Screenshots captured: {len(result['screenshots'])}")
    else:
        print("\nNo task entered. Exiting.")
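The planner/executor pattern above, a list of `{"type": ...}` step dicts dispatched by `_execute_step`, can be distilled into a self-contained sketch. The names here are illustrative, not the module's API, and the actions are stubbed so it runs headlessly:

```python
import time
from typing import Any, Dict, List


def plan_text_entry(text: str) -> List[Dict[str, Any]]:
    """Build a step list in the same shape the planners above return."""
    return [
        {"type": "launch_app", "app": "notepad"},
        {"type": "wait", "duration": 0.0},
        {"type": "type_text", "text": text},
    ]


def execute_plan(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Dispatch each step on its "type" key; unknown types fail soft."""
    results = []
    for step in steps:
        result: Dict[str, Any] = {"step": step, "success": False}
        kind = step.get("type")
        if kind == "launch_app":
            result["success"] = True  # stub: real code launches the app
        elif kind == "wait":
            time.sleep(step["duration"])
            result["success"] = True
        elif kind == "type_text":
            result["typed"] = step["text"]  # stub: real code sends keystrokes
            result["success"] = True
        else:
            result["error"] = f"Unknown step type: {kind}"
        results.append(result)
    return results


results = execute_plan(plan_text_entry("Hello World"))
```

Because unknown step types produce an `"error"` entry instead of raising, a bad plan degrades one step at a time, which is the same fail-soft choice `_execute_step` makes.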
238
demo.py
Normal file
@@ -0,0 +1,238 @@
"""
Desktop Control Demo - Quick examples and tests
"""

import sys
import time
from pathlib import Path

# Add skills to path
sys.path.insert(0, str(Path(__file__).parent))

from desktop_control import DesktopController

def demo_mouse_control():
    """Demo: Mouse movement and clicking"""
    print("\n🖱️ === MOUSE CONTROL DEMO ===")

    dc = DesktopController(failsafe=True)

    print(f"Current position: {dc.get_mouse_position()}")

    # Smooth movement
    print("\n1. Moving mouse smoothly to center of screen...")
    screen_w, screen_h = dc.get_screen_size()
    center_x, center_y = screen_w // 2, screen_h // 2
    dc.move_mouse(center_x, center_y, duration=1.0)

    # Relative movement
    print("2. Moving 100px right...")
    dc.move_relative(100, 0, duration=0.5)

    print(f"Final position: {dc.get_mouse_position()}")

    print("✅ Mouse demo complete!")


def demo_keyboard_control():
    """Demo: Keyboard typing"""
    print("\n⌨️ === KEYBOARD CONTROL DEMO ===")

    dc = DesktopController()

    print("\n⚠️ In 3 seconds, I'll type 'Hello from OpenClaw!' in the active window")
    print("Switch to Notepad or any text editor NOW!")
    time.sleep(3)

    # Type with human-like speed
    dc.type_text("Hello from OpenClaw! ", wpm=60)
    dc.type_text("This is desktop automation in action. ", wpm=80)

    # Press Enter twice
    dc.press('enter')
    dc.press('enter')

    # Type instantly
    dc.type_text("This was typed instantly!", interval=0)

    print("\n✅ Keyboard demo complete!")


def demo_screen_capture():
    """Demo: Screenshot functionality"""
    print("\n📸 === SCREEN CAPTURE DEMO ===")

    dc = DesktopController()

    # Full screenshot
    print("\n1. Capturing full screen...")
    dc.screenshot(filename="demo_fullscreen.png")
    print("  Saved: demo_fullscreen.png")

    # Region screenshot (center 800x600)
    print("\n2. Capturing center region (800x600)...")
    screen_w, screen_h = dc.get_screen_size()
    region = (
        (screen_w - 800) // 2,  # left
        (screen_h - 600) // 2,  # top
        800,  # width
        600   # height
    )
    dc.screenshot(region=region, filename="demo_region.png")
    print("  Saved: demo_region.png")

    # Get pixel color
    print("\n3. Getting pixel color at center...")
    center_x, center_y = screen_w // 2, screen_h // 2
    r, g, b = dc.get_pixel_color(center_x, center_y)
    print(f"  Color at ({center_x}, {center_y}): RGB({r}, {g}, {b})")

    print("\n✅ Screen capture demo complete!")


def demo_window_management():
    """Demo: Window operations"""
    print("\n🪟 === WINDOW MANAGEMENT DEMO ===")

    dc = DesktopController()

    # Get current window
    print(f"\n1. Active window: {dc.get_active_window()}")

    # List all windows
    windows = dc.get_all_windows()
    print(f"\n2. Found {len(windows)} open windows:")
    for i, title in enumerate(windows[:15], 1):  # Show first 15
        print(f"  {i}. {title}")

    print("\n✅ Window management demo complete!")


def demo_hotkeys():
    """Demo: Keyboard shortcuts"""
    print("\n🔥 === HOTKEY DEMO ===")

    dc = DesktopController()

    print("\n⚠️ This demo will:")
    print("  1. Open Windows Run dialog (Win+R)")
    print("  2. Type 'notepad'")
    print("  3. Press Enter to open Notepad")
    print("  4. Type a message")
    print("\nPress Enter to continue...")
    input()

    # Open Run dialog
    print("\n1. Opening Run dialog...")
    dc.hotkey('win', 'r')
    time.sleep(0.5)

    # Type notepad command
    print("2. Typing 'notepad'...")
    dc.type_text('notepad', wpm=80)
    time.sleep(0.3)

    # Press Enter
    print("3. Launching Notepad...")
    dc.press('enter')
    time.sleep(1)

    # Type message
    print("4. Typing message in Notepad...")
    dc.type_text("Desktop Control Skill Test\n\n", wpm=60)
    dc.type_text("This was automated by OpenClaw!\n", wpm=60)
    dc.type_text("- Mouse control ✓\n", wpm=60)
    dc.type_text("- Keyboard control ✓\n", wpm=60)
    dc.type_text("- Hotkeys ✓\n", wpm=60)

    print("\n✅ Hotkey demo complete!")


def demo_advanced_automation():
    """Demo: Complete automation workflow"""
    print("\n🚀 === ADVANCED AUTOMATION DEMO ===")

    dc = DesktopController()

    print("\nThis demo will:")
    print("1. Get your clipboard content")
    print("2. Copy a new string to clipboard")
    print("3. Show the changes")
    print("\nPress Enter to continue...")
    input()

    # Get current clipboard
    original = dc.get_from_clipboard()
    print(f"\n1. Original clipboard: '{original}'")

    # Copy new content
    test_text = "Hello from OpenClaw Desktop Control!"
    dc.copy_to_clipboard(test_text)
    print(f"2. Copied to clipboard: '{test_text}'")

    # Verify
    new_clipboard = dc.get_from_clipboard()
    print(f"3. Verified clipboard: '{new_clipboard}'")

    # Restore original
    if original:
        dc.copy_to_clipboard(original)
        print("4. Restored original clipboard")

    print("\n✅ Advanced automation demo complete!")


def main():
    """Run all demos"""
    print("=" * 60)
    print("🎮 DESKTOP CONTROL SKILL - DEMO SUITE")
    print("=" * 60)
    print("\n⚠️ IMPORTANT:")
    print("- Failsafe is ENABLED (move mouse to corner to abort)")
    print("- Some demos will control your mouse and keyboard")
    print("- Close important applications before continuing")
    print("\n" + "=" * 60)

    demos = [
        ("Mouse Control", demo_mouse_control),
        ("Window Management", demo_window_management),
        ("Screen Capture", demo_screen_capture),
        ("Hotkeys", demo_hotkeys),
        ("Keyboard Control", demo_keyboard_control),
        ("Advanced Automation", demo_advanced_automation),
    ]

    while True:
        print("\n📋 SELECT DEMO:")
        for i, (name, _) in enumerate(demos, 1):
            print(f"  {i}. {name}")
        print(f"  {len(demos) + 1}. Run All")
        print("  0. Exit")

        choice = input("\nEnter choice: ").strip()

        if choice == '0':
            print("\n👋 Goodbye!")
            break
        elif choice == str(len(demos) + 1):
            for name, func in demos:
                print(f"\n{'=' * 60}")
                func()
                time.sleep(1)
            print(f"\n{'=' * 60}")
            print("🎉 All demos complete!")
        elif choice.isdigit() and 1 <= int(choice) <= len(demos):
            demos[int(choice) - 1][1]()
        else:
            print("❌ Invalid choice!")


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\n\n⚠️ Demo interrupted by user")
    except Exception as e:
        print(f"\n\n❌ Error: {e}")
        import traceback
        traceback.print_exc()
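The drag-based drawing in `_draw_circle` and `_draw_star` boils down to point generation, which can be checked headlessly without a `DesktopController`. This sketch reproduces only the point math (helper names are illustrative):

```python
import math
from typing import List, Tuple


def circle_points(cx: int, cy: int, radius: int, step: int = 5) -> List[Tuple[int, int]]:
    """Sample a point every `step` degrees around a circle, as _draw_circle does."""
    pts = []
    for angle in range(0, 360, step):
        rad = math.radians(angle)
        pts.append((int(cx + radius * math.cos(rad)),
                    int(cy + radius * math.sin(rad))))
    return pts


def star_points(cx: int, cy: int, size: int) -> List[Tuple[int, int]]:
    """Alternate outer/inner radii every 36 degrees for a 5-pointed star,
    starting from the top point (-90 degrees), as _draw_star does."""
    pts = []
    for i in range(10):
        angle = math.radians(i * 36 - 90)
        radius = size if i % 2 == 0 else size // 2
        pts.append((int(cx + radius * math.cos(angle)),
                    int(cy + radius * math.sin(angle))))
    return pts
```

Connecting consecutive points (and closing back to the first) is what the `drag` loops above do; testing the geometry separately keeps mouse-driven runs short.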