Most people assume that humanoid robot development requires being in the same room as the hardware. You need to see the robot move, catch it when it falls, swap out a servo when something goes wrong. That assumption is understandable, and it is mostly wrong.

At Habil, roughly 80% of our skill deployment work happens remotely. Our engineering team in Bangalore regularly deploys trained models and updates to robots sitting in labs across the United States and Europe. The robot might be in a university lab in Michigan or a research facility in Munich, but the skill was built, tested, and shipped from 8,000 kilometers away.

This is not some future aspiration. It is how we work today. Here is the exact workflow.

The Sim-First Philosophy

Remote deployment only works if you have extreme confidence in what you are shipping before it touches the hardware. That confidence comes from simulation.

Every skill we build starts its life in simulation. We never begin development on the physical robot. The sequence is always the same:

This sim-first approach does more than de-risk deployment. It makes remote work possible in the first place. When your confidence in the trained policy sits above 95% before it touches hardware, you do not need to be standing next to the robot when you press deploy.
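
To make that gate concrete, here is a minimal sketch of the kind of sim validation loop that decides whether a policy is cleared for deployment. The gymnasium-style env interface, episode count, and success threshold are illustrative assumptions, not our actual validation harness.

import numpy as np
import onnxruntime as ort

def validate_policy(env, onnx_path, episodes=200, threshold=0.95):
    """Roll out the exported policy in sim and gate on success rate."""
    session = ort.InferenceSession(onnx_path)
    input_name = session.get_inputs()[0].name
    successes = 0

    for _ in range(episodes):
        obs, _ = env.reset()
        done, info = False, {}
        while not done:
            action = session.run(
                None, {input_name: obs[None].astype(np.float32)}
            )[0][0]
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
        successes += int(info.get("success", False))

    rate = successes / episodes
    print(f"[sim-gate] success rate: {rate:.1%} over {episodes} episodes")
    return rate >= threshold

Only when this kind of gate passes does a skill move into the deployment pipeline described below.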

The Remote Deployment Pipeline

Once a skill passes simulation validation, it enters our deployment pipeline. The pipeline is built around SSH tunneling to the robot's onboard compute, which is typically an NVIDIA Jetson Orin.

Network Architecture

Every client robot we support has a standardized access setup:

SSH Configuration

Our engineers maintain an SSH config that looks roughly like this:

# ~/.ssh/config — Habil robot access

Host robot-client-alpha
    HostName 10.0.1.42
    User habil
    IdentityFile ~/.ssh/habil_deploy_ed25519
    ProxyJump vpn-gateway-alpha
    # Forwarded ports: 8501 model serving, 5555 ZMQ camera stream,
    # 9090 telemetry dashboard, 7777 E-stop interface
    LocalForward 8501 localhost:8501
    LocalForward 5555 localhost:5555
    LocalForward 9090 localhost:9090
    LocalForward 7777 localhost:7777
    ServerAliveInterval 30
    ServerAliveCountMax 3

Host vpn-gateway-alpha
    HostName gateway.client-alpha.example.com
    User habil-vpn
    IdentityFile ~/.ssh/habil_vpn_ed25519

The ProxyJump directive handles the two-hop connection through the VPN gateway and into the robot's Jetson. A single ssh robot-client-alpha command sets up all the tunnels an engineer needs.

Establishing a Session

A typical deployment session starts like this:

# Connect and establish all tunnels
ssh robot-client-alpha

# In a second terminal, verify the robot is responsive
ssh robot-client-alpha "systemctl status habil-runtime"

# Check GPU memory and running processes
ssh robot-client-alpha "nvidia-smi && ps aux | grep habil"

Camera Streaming Over ZMQ

You cannot deploy skills remotely without seeing what the robot sees. We stream live camera feeds from the robot's Intel RealSense cameras over ZeroMQ, which gives us low-latency, lightweight transport without the overhead of a full video streaming framework. With a small send high-water mark, PUB/SUB drops stale frames under backpressure instead of letting them queue, which is exactly what you want for a live monitoring feed.

Publisher (On the Robot)

The publisher runs on the Jetson Orin and captures frames from the RealSense pipeline:

import zmq
import pyrealsense2 as rs
import numpy as np
import cv2
import time

def start_camera_publisher(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUB)
    sock.setsockopt(zmq.SNDHWM, 2)  # Small send queue: drop frames rather than buffer
    sock.bind(f"tcp://0.0.0.0:{port}")

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    pipeline.start(config)

    print(f"[camera] Publishing on port {port}")

    try:
        while True:
            frames = pipeline.wait_for_frames()
            color = np.asanyarray(
                frames.get_color_frame().get_data()
            )
            depth = np.asanyarray(
                frames.get_depth_frame().get_data()
            )

            # Encode color frame as JPEG for bandwidth
            _, color_jpg = cv2.imencode(
                '.jpg', color,
                [cv2.IMWRITE_JPEG_QUALITY, 75]
            )

            ts = time.time()
            sock.send_multipart([
                b"color", color_jpg.tobytes(),
                b"depth", depth.tobytes(),
                b"timestamp", str(ts).encode()
            ])
    finally:
        pipeline.stop()

if __name__ == "__main__":
    start_camera_publisher()

Subscriber (On the Engineer's Machine)

The subscriber connects through the SSH tunnel and renders the stream locally:

import zmq
import numpy as np
import cv2

def start_viewer(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    # The high-water mark must be set before connect() to take effect
    sock.setsockopt(zmq.RCVHWM, 2)
    sock.connect(f"tcp://localhost:{port}")
    sock.subscribe(b"")

    print("[viewer] Connected, waiting for frames...")

    while True:
        parts = sock.recv_multipart()

        # Parse color frame
        color_idx = parts.index(b"color")
        color_jpg = parts[color_idx + 1]
        color = cv2.imdecode(
            np.frombuffer(color_jpg, dtype=np.uint8),
            cv2.IMREAD_COLOR
        )

        # Parse depth frame
        depth_idx = parts.index(b"depth")
        depth_raw = parts[depth_idx + 1]
        depth = np.frombuffer(
            depth_raw, dtype=np.uint16
        ).reshape(480, 640)

        # Colorize depth for visualization
        depth_color = cv2.applyColorMap(
            cv2.convertScaleAbs(depth, alpha=0.03),
            cv2.COLORMAP_JET
        )

        cv2.imshow("Robot Color", color)
        cv2.imshow("Robot Depth", depth_color)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cv2.destroyAllWindows()

if __name__ == "__main__":
    start_viewer()

With JPEG compression, the color stream sits around 2-3 Mbps, which is manageable over most client VPN connections. The raw 16-bit depth stream is far heavier, so on constrained links it is worth compressing or throttling it as well. Latency is typically 100-200 ms, which is more than sufficient for monitoring.

Remote Skill Deployment

Once tunnels are established and cameras are streaming, the actual deployment is handled by a standardized script. Every deployment follows the same sequence: stop the running service, back up the current model, push the new model and config, restart the service, run a smoke test.

#!/bin/bash
# deploy_skill.sh — Remote skill deployment
set -euo pipefail

ROBOT_HOST="robot-client-alpha"
SKILL_NAME="${1:?Usage: deploy_skill.sh <skill-name>}"
MODEL_PATH="./trained_models/${SKILL_NAME}/policy_latest.onnx"
CONFIG_PATH="./configs/${SKILL_NAME}/runtime_config.yaml"
REMOTE_DIR="/opt/habil/skills/${SKILL_NAME}"

echo "[deploy] Deploying ${SKILL_NAME} to ${ROBOT_HOST}"

# 1. Pre-flight checks
echo "[deploy] Running pre-flight checks..."
ssh "${ROBOT_HOST}" "nvidia-smi --query-gpu=memory.free \
  --format=csv,noheader" | awk '{
    if ($1 < 2000) {
      print "[FAIL] Insufficient GPU memory"; exit 1
    }
  }'

# 2. Stop the running skill service
echo "[deploy] Stopping current skill service..."
ssh "${ROBOT_HOST}" "sudo systemctl stop habil-skill@${SKILL_NAME} \
  || true"

# 3. Backup current deployment
echo "[deploy] Backing up current deployment..."
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
ssh "${ROBOT_HOST}" "
  if [ -d ${REMOTE_DIR} ]; then
    cp -r ${REMOTE_DIR} ${REMOTE_DIR}.bak.${TIMESTAMP}
  fi
"

# 4. Push new model and config
echo "[deploy] Uploading model and config..."
ssh "${ROBOT_HOST}" "mkdir -p ${REMOTE_DIR}"
scp "${MODEL_PATH}" "${ROBOT_HOST}:${REMOTE_DIR}/policy.onnx"
scp "${CONFIG_PATH}" "${ROBOT_HOST}:${REMOTE_DIR}/config.yaml"

# 5. Restart service
echo "[deploy] Starting skill service..."
ssh "${ROBOT_HOST}" "sudo systemctl start habil-skill@${SKILL_NAME}"
sleep 3

# 6. Smoke test
echo "[deploy] Running smoke test..."
# The || fallback keeps set -e from aborting before rollback if the
# health endpoint is unreachable
HEALTH=$(ssh "${ROBOT_HOST}" "curl -sf http://localhost:8501/health \
  | python3 -c 'import sys,json; \
    d=json.load(sys.stdin); \
    print(d[\"status\"])'" || echo "unreachable")

if [ "${HEALTH}" = "healthy" ]; then
  echo "[deploy] SUCCESS — ${SKILL_NAME} is running"
else
  echo "[deploy] FAIL — Rolling back..."
  ssh "${ROBOT_HOST}" "
    sudo systemctl stop habil-skill@${SKILL_NAME}
    rm -rf ${REMOTE_DIR}
    mv ${REMOTE_DIR}.bak.${TIMESTAMP} ${REMOTE_DIR}
    sudo systemctl start habil-skill@${SKILL_NAME}
  "
  echo "[deploy] Rolled back to previous version"
  exit 1
fi

The automatic rollback is critical. If a deployment fails its health check, the script restores the previous version within seconds. This gives us the confidence to deploy frequently without requiring someone on-site to catch failures.

Real-Time Monitoring and Debugging

Deployment is only half the story. Once a skill is running, we need to watch it perform and catch problems early.

Telemetry Stack

Every robot runs a lightweight telemetry agent that reports:

All telemetry feeds into a Grafana dashboard accessible through the SSH tunnel on port 9090. An engineer can watch joint trajectories, inference times, and system health in real time while the robot executes a skill.
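
The agent itself is small. Here is a rough sketch of its general shape, assuming a Prometheus-style exporter that the dashboard scrapes behind the same tunnel; the metric names, psutil sampling, and exporter port are illustrative assumptions, not our production agent.

import time
import psutil
from prometheus_client import Gauge, start_http_server

# A few representative signals, purely for illustration
cpu_util = Gauge("robot_cpu_utilization_percent", "CPU utilization")
mem_free = Gauge("robot_memory_free_mb", "Free system memory (MB)")
inference_ms = Gauge("skill_inference_latency_ms",
                     "Last policy inference latency (ms)")

def run_agent(port=9091, interval=1.0):
    start_http_server(port)  # exposes /metrics for scraping
    while True:
        cpu_util.set(psutil.cpu_percent())
        mem_free.set(psutil.virtual_memory().available / 1e6)
        # inference_ms would be updated by the skill runtime itself
        time.sleep(interval)

if __name__ == "__main__":
    run_agent()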

Emergency Stop

Every robot has a software emergency stop exposed on port 7777. It is a simple HTTP endpoint that commands the robot to freeze all joints and enter a safe resting position. The E-stop interface runs as a web page — one large red button that an engineer can hit from anywhere in the world with an active tunnel.
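
In spirit, the endpoint is about as simple as it sounds. A minimal sketch, assuming a Flask server on the Jetson; freeze_all_joints() is a hypothetical stand-in for the actual runtime hook.

from flask import Flask

app = Flask(__name__)

def freeze_all_joints():
    # Hypothetical hook into the skill runtime: command zero velocity
    # on every joint and hold a safe resting posture.
    ...

@app.route("/estop", methods=["POST"])
def estop():
    freeze_all_joints()
    return {"status": "stopped"}, 200

if __name__ == "__main__":
    # Bind to localhost only; reached through the SSH tunnel on 7777
    app.run(host="127.0.0.1", port=7777)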

We also configure hardware E-stops on every robot, but the software E-stop gives remote engineers the ability to intervene without calling someone on-site.

Structured Logging

All skill execution logs use structured JSON format and stream to a centralized logging service. When something goes wrong, an engineer can query logs by skill name, timestamp, severity, or specific joint IDs without SSHing into the robot at all.
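
As a rough illustration of the shape, here is a sketch of a JSON formatter built on Python's standard logging module; the field names are representative assumptions rather than our full schema.

import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object per line."""
    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "skill": getattr(record, "skill", None),
            "joint_ids": getattr(record, "joint_ids", None),
            "msg": record.getMessage(),
        }
        return json.dumps(entry)

logger = logging.getLogger("habil.skill")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Extra fields ride along with the record and stay queryable downstream
logger.info("grasp target acquired",
            extra={"skill": "pick_place", "joint_ids": [3, 4, 5]})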

When Remote Does Not Work

We are not dogmatic about remote work. Roughly 20% of our engagements require someone on-site, and pretending otherwise would be dishonest.

Here is what typically requires physical presence:

The key insight is that these on-site requirements are front-loaded. Once a robot is set up, calibrated, and connected, the ongoing skill development and deployment is almost entirely remote.

Results

We have been running this workflow in production for over a year. The numbers speak for themselves:

The remote-first approach is not just about cost savings. It gives us access to a global client base without needing engineers on every continent. A client in Germany gets the same engineering team and the same turnaround as a client down the street.

The best part of working remote-first is that deployment quality actually improved. When you know you cannot walk over and fix something manually, you build better automated safeguards.