mirroir-mcp

Give your AI eyes, hands, and a real iPhone.

Tell your AI to send a message, test a login flow, or explore an app — it sees the screen, taps what it needs, and figures the rest out. An MCP server for macOS iPhone Mirroring, compatible with any MCP client.

How it works

Every interaction follows the same loop — observe, reason, act.

👁

Eyes

describe_screen returns every text element on the iPhone screen with exact tap coordinates.

🧠

Brain

Your LLM reads the screen, decides what to do next, and picks the right tool.

✋

Hands

tap, type_text, swipe — the AI executes the action, then loops back to observe.

You say "send Alice a message." The AI screenshots the phone, reads the screen via OCR, taps Messages, finds Alice's conversation, types your message, and hits Send. No script. No coordinates. Just intent.
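To get a feel for what the loop consumes, here is a purely illustrative describe_screen result. The field names, element labels, and coordinates below are all invented; the real output format may differ:

```json
{
  "tool": "describe_screen",
  "result": {
    "elements": [
      { "text": "Messages", "x": 98,  "y": 612 },
      { "text": "Alice",    "x": 180, "y": 238 },
      { "text": "Send",     "x": 350, "y": 710 }
    ]
  }
}
```

The LLM reads a list like this, picks the element that matches its goal, and issues a tap at its coordinates.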

Install mirroir-mcp v0.31.0

$ brew tap jfarcand/tap && brew install mirroir-mcp

or via npx:

$ npx -y mirroir-mcp install

or via shell script:

$ /bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"

Just tell your AI

Paste any of these into Claude Code, ChatGPT, Cursor, or any MCP client.

"Open Calendar, create a Dentist event next Tuesday at 2pm"

The AI launches Calendar, taps "+", fills in the title, date, and time, then saves. It handles confirmation dialogs automatically.

"Open Messages, find Alice, and send 'running 10 min late'"

The AI opens Messages, scrolls to find Alice's conversation, taps the text field, types the message, and hits Send.

"Test the login screen with test@example.com / password123"

The AI opens the app, taps Email, types the address, taps Password, types it, taps Sign In, and screenshots the result.

"Start recording, open Settings, scroll to General > About, stop recording"

The AI starts a video capture, navigates through Settings menus, then stops recording and returns the file path.

From exploration to CI

When you need deterministic, repeatable testing, mirroir provides a full pipeline. Point it at any app — it autonomously discovers every reachable screen using BFS graph traversal (screens are nodes, taps are edges), then outputs a bundle of ready-to-run SKILL.md files. Edit them, test them from the CLI, diagnose failures with --agent.

1

Generate

A single generate_skill(action: "explore") call runs autonomous BFS traversal — exploring each screen breadth-first, replaying paths to reach child screens, building a navigation graph of the entire app.

generate_skill(action: "explore", app: "Settings")
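Conceptually, the explorer's traversal is plain breadth-first search over a screen graph. Here is a minimal sketch in Python, where `get_tappables` and `tap` are stand-ins for the real OCR and input tools (the actual explorer also replays tap paths to reach child screens after resets, which this sketch omits):

```python
from collections import deque

def explore(start_screen, get_tappables, tap):
    """Breadth-first traversal: screens are nodes, taps are edges.

    get_tappables(screen) lists tappable elements on a screen;
    tap(screen, element) returns the screen that tapping leads to.
    """
    graph = {}                      # screen -> {element: child screen}
    queue = deque([start_screen])
    seen = {start_screen}
    while queue:
        screen = queue.popleft()
        graph[screen] = {}
        for element in get_tappables(screen):
            child = tap(screen, element)
            graph[screen][element] = child
            if child not in seen:   # enqueue screens we haven't visited
                seen.add(child)
                queue.append(child)
    return graph

# Toy "Settings" app: General leads to About, a leaf screen.
edges = {("Settings", "General"): "General",
         ("General", "About"): "About"}
tappables = {"Settings": ["General"], "General": ["About"], "About": []}
graph = explore("Settings",
                lambda s: tappables[s],
                lambda s, e: edges[(s, e)])
print(sorted(graph))   # ['About', 'General', 'Settings']
```

The resulting graph is what gets serialized into the SKILL.md bundle.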
2

Write

Edit the generated skill or author one from scratch — numbered steps, no coordinates.

vim login.md
3

Run

The AI reads the skill via get_skill, executes each step with MCP tools, and auto-compiles coordinates at the end.

get_skill → record_step → save_compiled
4

Test

Replay with zero OCR — pure input injection. A 10-step skill drops from 5+ seconds of OCR to under a second.

mirroir test login
5

Diagnose

When a step fails, --agent compares expected vs. actual OCR and tells you the root cause and fix.

mirroir test --agent login

Agent diagnosis runs in two tiers: deterministic OCR analysis first (free, no API key), then optionally an AI model for richer analysis. Supports Anthropic, OpenAI, local Ollama, and CLI agents.

Repeatable flows: Skills

When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like Tap "Email" use OCR, not coordinates. Share them on the community repository.
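For illustration, a hand-authored skill might look roughly like this (hypothetical content; the exact SKILL.md header and step syntax may differ):

```markdown
# Send a quick message

1. Launch "Messages"
2. Tap "Alice"
3. Tap the text field
4. Type "running 10 min late"
5. Tap "Send"
```

Each step names what the AI should see on screen, so the skill survives layout changes that would break hard-coded coordinates.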

Community skill library

The mirroir-skills repository is an open collection of ready-made SKILL.md files — login flows, cross-app workflows, settings automation, and more. ${VAR} placeholders resolve from environment variables, so the same skill works across accounts and devices.
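The placeholder semantics are the familiar shell-style expansion. A rough sketch of the idea in Python (not mirroir's actual implementation):

```python
import os
import re

def resolve_placeholders(step, env=os.environ):
    """Replace ${VAR} tokens with environment values, e.g. so
    'Type "${TEST_EMAIL}"' works across accounts and devices."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: env[m.group(1)],
                  step)

step = resolve_placeholders('Type "${TEST_EMAIL}"',
                            env={"TEST_EMAIL": "alice@example.com"})
print(step)   # Type "alice@example.com"
```

Because values come from the environment at run time, a shared skill never needs credentials or device-specific data baked in.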

Install the full library as a Claude Code plugin*:

$ claude plugin add jfarcand/mirroir-skills

* Also supported by GitHub Copilot.

What Does mirroir-mcp Do?

32 tools exposed as an MCP server.

Touch

  • tap Tap at screen coordinates
  • double_tap Double-tap for zoom or text selection
  • long_press Hold for context menus
  • swipe Quick flick between two points
  • drag Slow drag for sliders and icons

Input

  • type_text Type text via virtual keyboard
  • press_key Send special keys with modifiers
  • shake Shake gesture for undo or dev menus

Observe

  • screenshot Capture screen as PNG
  • describe_screen OCR with tap coordinates
  • start_recording Begin video recording
  • stop_recording End recording, get file path
  • get_orientation Portrait or landscape
  • status Connection and device readiness
  • check_health Full setup diagnostic
  • calibrate_component Test UI component definitions against live screen
  • list_targets List configured automation targets

Navigate

  • launch_app Open app by name via Spotlight
  • open_url Open URL in Safari
  • press_home Return to home screen
  • press_app_switcher Show recent apps
  • spotlight Open Spotlight search
  • scroll_to Scroll until element visible via OCR
  • reset_app Force-quit app via App Switcher
  • set_network Toggle airplane, Wi-Fi, cellular
  • measure Time screen transitions
  • generate_skill Autonomous BFS exploration → navigation graph → SKILL.md bundle
  • list_skills List available skills
  • get_skill Read skill with env substitution + compilation status
  • record_step Record a compiled step during execution
  • save_compiled Save compiled .json for zero-OCR replay
  • switch_target Switch active automation target

Integrations

Any MCP client that supports stdio transport — plug into your editor or build your own agent.

Security-first by design

Giving an AI access to your phone demands defense in depth. mirroir-mcp is fail-closed at every layer.

Tool permissions

Without a config file, only read-only tools (screenshot, describe_screen) are exposed. Mutating tools are hidden from the MCP client entirely — it never sees them.

App blocking

blockedApps in permissions.json prevents the AI from interacting with sensitive apps like Wallet or Banking — even if mutating tools are allowed.

No root required

Runs as a regular user process using the macOS CGEvent API. No daemons, no kernel extensions, no root privileges — just Accessibility permissions.

{
  "allow": ["tap", "swipe", "type_text", "press_key", "launch_app"],
  "deny": [],
  "blockedApps": ["Wallet", "Banking"]
}

Drop this in ~/.mirroir-mcp/permissions.json to control exactly which tools your AI agent can use. Close iPhone Mirroring to kill all input instantly.

FAQ

Is this safe? Can the AI access my banking apps?

Without a config file, only read-only tools are exposed. Mutating tools require explicit opt-in. Use blockedApps in permissions.json to deny access to sensitive apps. Closing iPhone Mirroring kills all input immediately.

Why does my cursor jump when the AI is working?

macOS routes HID input to the frontmost app. The server must activate iPhone Mirroring before each input. Put it in a separate macOS Space to keep your workspace undisturbed.

Does it work with any iPhone app?

Yes. It operates at the screen level through iPhone Mirroring — no source code, SDK, or jailbreak required. If you can see it on screen, the AI can interact with it.

Does it need any kernel extensions or root access?

No. All input (touch and keyboard) is delivered via the macOS CGEvent API, which only requires Accessibility permissions. No kernel extensions, no root privileges, no helper daemons.

Can it control macOS apps too, not just iPhone?

Experimental. You can add targets in .mirroir-mcp/targets.json pointing at macOS windows. The same tools work, but with limitations — both the MCP client and the target window must be in the same macOS Space, and iPhone-specific tools (press_home, press_app_switcher, spotlight) don't apply. See the README for details.
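The targets.json schema isn't documented on this page; as a purely hypothetical sketch of the shape such a file could take (every key and value below is an assumption, not the real format):

```json
{
  "targets": {
    "iphone": { "window": "iPhone Mirroring" },
    "notes":  { "window": "Notes" }
  }
}
```

With named targets defined, switch_target selects which window subsequent tools act on.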

Can I restrict which tools the AI can use?

Yes. Drop a permissions.json with allow and deny lists. Tools not in the allow list are hidden from the MCP client entirely.

Read the full FAQ