mirroir-mcp

Give your AI eyes, hands, and a real iPhone.

Tell your AI to send a message, test a login flow, or explore an app — it sees the screen, taps what it needs, and figures the rest out. An MCP server for macOS iPhone Mirroring, compatible with any MCP client.

How it works

Every interaction follows the same loop — observe, reason, act.

👁

Eyes

describe_screen returns every text element on the iPhone screen with exact tap coordinates.

🧠

Brain

Your LLM reads the screen, decides what to do next, and picks the right tool.

✋

Hands

tap, type_text, swipe — the AI executes the action, then loops back to observe.

You say "send Alice a message." The AI screenshots the phone, reads the screen via OCR, taps Messages, finds Alice's conversation, types your message, and hits Send. No script. No coordinates. Just intent.
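To get a feel for what the loop consumes, here is a purely illustrative describe_screen result. The field names, element labels, and coordinates below are all invented; the real output format may differ:

```json
{
  "tool": "describe_screen",
  "result": {
    "elements": [
      { "text": "Messages", "x": 98,  "y": 612 },
      { "text": "Alice",    "x": 180, "y": 238 },
      { "text": "Send",     "x": 350, "y": 710 }
    ]
  }
}
```

The LLM reads a list like this, picks the element that matches its goal, and issues a tap at its coordinates.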

Install mirroir-mcp v0.31.0

$ brew tap jfarcand/tap && brew install mirroir-mcp

or via npx:

$ npx -y mirroir-mcp install

or via shell script:

$ /bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"

Just tell your AI

Paste any of these into Claude Code, ChatGPT, Cursor, or any MCP client.

"Open Calendar, create a Dentist event next Tuesday at 2pm"

The AI launches Calendar, taps "+", fills in the title, date, and time, then saves. It handles confirmation dialogs automatically.

"Open Messages, find Alice, and send 'running 10 min late'"

The AI opens Messages, scrolls to find Alice's conversation, taps the text field, types the message, and hits Send.

"Test the login screen with test@example.com / password123"

The AI opens the app, taps Email, types the address, taps Password, types it, taps Sign In, and screenshots the result.

"Start recording, open Settings, scroll to General > About, stop recording"

The AI starts a video capture, navigates through Settings menus, then stops recording and returns the file path.

From exploration to CI

When you need deterministic, repeatable testing, mirroir provides a full pipeline. Point it at any app — it autonomously discovers every reachable screen using BFS graph traversal (screens are nodes, taps are edges), then outputs a bundle of ready-to-run SKILL.md files. Edit them, test them from the CLI, diagnose failures with --agent.

1

Generate

A single generate_skill(action: "explore") call runs autonomous BFS traversal — exploring each screen breadth-first, replaying paths to reach child screens, building a navigation graph of the entire app.

generate_skill(action: "explore", app: "Settings")
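Conceptually, the explorer's traversal is plain breadth-first search over a screen graph. Here is a minimal sketch in Python, where `get_tappables` and `tap` are stand-ins for the real OCR and input tools (the actual explorer also replays tap paths to reach child screens after resets, which this sketch omits):

```python
from collections import deque

def explore(start_screen, get_tappables, tap):
    """Breadth-first traversal: screens are nodes, taps are edges.

    get_tappables(screen) lists tappable elements on a screen;
    tap(screen, element) returns the screen that tapping leads to.
    """
    graph = {}                      # screen -> {element: child screen}
    queue = deque([start_screen])
    seen = {start_screen}
    while queue:
        screen = queue.popleft()
        graph[screen] = {}
        for element in get_tappables(screen):
            child = tap(screen, element)
            graph[screen][element] = child
            if child not in seen:   # enqueue screens we haven't visited
                seen.add(child)
                queue.append(child)
    return graph

# Toy "Settings" app: General leads to About, a leaf screen.
edges = {("Settings", "General"): "General",
         ("General", "About"): "About"}
tappables = {"Settings": ["General"], "General": ["About"], "About": []}
graph = explore("Settings",
                lambda s: tappables[s],
                lambda s, e: edges[(s, e)])
print(sorted(graph))   # ['About', 'General', 'Settings']
```

The resulting graph is what gets serialized into the SKILL.md bundle.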
2

Write

Edit the generated skill or author one from scratch — numbered steps, no coordinates.

vim login.md
3

Run

The AI reads the skill via get_skill, executes each step with MCP tools, and auto-compiles coordinates at the end.

get_skill → record_step → save_compiled
4

Test

Replay with zero OCR — pure input injection. A 10-step skill drops from 5+ seconds of OCR to under a second.

mirroir test login
5

Diagnose

When a step fails, --agent compares expected vs. actual OCR and tells you the root cause and fix.

mirroir test --agent login

Agent diagnosis runs in two tiers: deterministic OCR analysis first (free, no API key), then optionally an AI model for richer analysis. Supports Anthropic, OpenAI, local Ollama, and CLI agents.

Repeatable flows: Skills

When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like Tap "Email" use OCR, not coordinates. Share them on the community repository.
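For illustration, a hand-authored skill might look roughly like this (hypothetical content; the exact SKILL.md header and step syntax may differ):

```markdown
# Send a quick message

1. Launch "Messages"
2. Tap "Alice"
3. Tap the text field
4. Type "running 10 min late"
5. Tap "Send"
```

Each step names what the AI should see on screen, so the skill survives layout changes that would break hard-coded coordinates.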

Community skill library

The mirroir-skills repository is an open collection of ready-made SKILL.md files — login flows, cross-app workflows, settings automation, and more. ${VAR} placeholders resolve from environment variables, so the same skill works across accounts and devices.
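The placeholder semantics are the familiar shell-style expansion. A rough sketch of the idea in Python (not mirroir's actual implementation):

```python
import os
import re

def resolve_placeholders(step, env=os.environ):
    """Replace ${VAR} tokens with environment values, e.g. so
    'Type "${TEST_EMAIL}"' works across accounts and devices."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: env[m.group(1)],
                  step)

step = resolve_placeholders('Type "${TEST_EMAIL}"',
                            env={"TEST_EMAIL": "alice@example.com"})
print(step)   # Type "alice@example.com"
```

Because values come from the environment at run time, a shared skill never needs credentials or device-specific data baked in.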

Install the full library as a Claude Code plugin*:

$ claude plugin add jfarcand/mirroir-skills

* Also supported by GitHub Copilot.

What Does mirroir-mcp Do?

32 tools exposed as an MCP server.

Touch

  • tap Tap at screen coordinates
  • double_tap Double-tap for zoom or text selection
  • long_press Hold for context menus
  • swipe Quick flick between two points
  • drag Slow drag for sliders and icons

Input

  • type_text Type text via virtual keyboard
  • press_key Send special keys with modifiers
  • shake Shake gesture for undo or dev menus

Observe

  • screenshot Capture screen as PNG
  • describe_screen OCR with tap coordinates
  • start_recording Begin video recording
  • stop_recording End recording, get file path
  • get_orientation Portrait or landscape
  • status Connection and device readiness
  • check_health Full setup diagnostic
  • calibrate_component Test UI component definitions against live screen
  • list_targets List configured automation targets

Navigate

  • launch_app Open app by name via Spotlight
  • open_url Open URL in Safari
  • press_home Return to home screen
  • press_app_switcher Show recent apps
  • spotlight Open Spotlight search
  • scroll_to Scroll until element visible via OCR
  • reset_app Force-quit app via App Switcher
  • set_network Toggle airplane, Wi-Fi, cellular
  • measure Time screen transitions
  • generate_skill Autonomous BFS exploration → navigation graph → SKILL.md bundle
  • list_skills List available skills
  • get_skill Read skill with env substitution + compilation status
  • record_step Record a compiled step during execution
  • save_compiled Save compiled .json for zero-OCR replay
  • switch_target Switch active automation target

Integrations

Any MCP client that supports stdio transport — plug into your editor or build your own agent.

Security-first by design

Giving an AI access to your phone demands defense in depth. mirroir-mcp is fail-closed at every layer.

Tool permissions

Without a config file, only read-only tools (screenshot, describe_screen) are exposed. Mutating tools are hidden from the MCP client entirely — it never sees them.

App blocking

blockedApps in permissions.json prevents the AI from interacting with sensitive apps like Wallet or Banking — even if mutating tools are allowed.

No root required

Runs as a regular user process using the macOS CGEvent API. No daemons, no kernel extensions, no root privileges — just Accessibility permissions.

{
  "allow": ["tap", "swipe", "type_text", "press_key", "launch_app"],
  "deny": [],
  "blockedApps": ["Wallet", "Banking"]
}

Drop this in ~/.mirroir-mcp/permissions.json to control exactly which tools your AI agent can use. Close iPhone Mirroring to kill all input instantly.

FAQ

Is this safe? Can the AI access my banking apps?

Without a config file, only read-only tools are exposed. Mutating tools require explicit opt-in. Use blockedApps in permissions.json to deny access to sensitive apps. Closing iPhone Mirroring kills all input immediately.

Why does my cursor jump when the AI is working?

macOS routes HID input to the frontmost app. The server must activate iPhone Mirroring before each input. Put it in a separate macOS Space to keep your workspace undisturbed.

Does it work with any iPhone app?

Yes. It operates at the screen level through iPhone Mirroring — no source code, SDK, or jailbreak required. If you can see it on screen, the AI can interact with it.

Does it need any kernel extensions or root access?

No. All input (touch and keyboard) is delivered via the macOS CGEvent API, which only requires Accessibility permissions. No kernel extensions, no root privileges, no helper daemons.

Can it control macOS apps too, not just iPhone?

Experimental. You can add targets in .mirroir-mcp/targets.json pointing at macOS windows. The same tools work, but with limitations — both the MCP client and the target window must be in the same macOS Space, and iPhone-specific tools (press_home, press_app_switcher, spotlight) don't apply. See the README for details.
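The targets.json schema isn't documented on this page; as a purely hypothetical sketch of the shape such a file could take (every key and value below is an assumption, not the real format):

```json
{
  "targets": {
    "iphone": { "window": "iPhone Mirroring" },
    "notes":  { "window": "Notes" }
  }
}
```

With named targets defined, switch_target selects which window subsequent tools act on.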

Can I restrict which tools the AI can use?

Yes. Drop a permissions.json with allow and deny lists. Tools not in the allow list are hidden from the MCP client entirely.

Read the full FAQ