Most people assume that humanoid robot development requires being in the same room as the hardware. You need to see the robot move, catch it when it falls, swap out a servo when something goes wrong. That assumption is understandable, and it is mostly wrong.
At Habil, roughly 80% of our skill deployment work happens remotely. Our engineering team in Bangalore regularly deploys trained models and updates to robots sitting in labs across the United States and Europe. The robot might be in a university lab in Michigan or a research facility in Munich, but the skill was built, tested, and shipped from 8,000 kilometers away.
This is not some future aspiration. It is how we work today. Here is the exact workflow.
The Sim-First Philosophy
Remote deployment only works if you have extreme confidence in what you are shipping before it touches the hardware. That confidence comes from simulation.
Every skill we build starts its life in simulation. We never begin development on the physical robot. The sequence is always the same:
- Prototype in MuJoCo — We use MuJoCo for rapid iteration. It is fast, lightweight, and gives us enough fidelity to test control logic and basic behaviors. An engineer can spin up a new manipulation task in MuJoCo in under an hour.
- Train in Isaac Lab — Once the behavior looks right in MuJoCo, we move to NVIDIA Isaac Lab for GPU-accelerated reinforcement learning. Isaac Lab lets us run thousands of parallel environments, so a training run that would take days on a single instance finishes in hours.
- Validate in high-fidelity sim — Before anything ships, we run the trained policy through a battery of simulated edge cases. Random perturbations, varied object geometries, different lighting conditions. If the policy breaks in simulation, it never sees the real robot.
This sim-first approach does more than de-risk deployment. It makes remote work possible in the first place. When your confidence in the trained policy sits above 95% before it touches hardware, you do not need to be standing next to the robot when you press deploy.
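The validation gate in the third step can be sketched as a simple randomized harness. This is an illustrative sketch, not our production tooling: `run_episode` stands in for a full simulator rollout, and the perturbation ranges and threshold are example values.

```python
import random

def validate_policy(run_episode, n_trials=500, threshold=0.95, seed=0):
    """Run a policy through randomized edge cases and gate on success rate.

    `run_episode` is any callable that takes a perturbation dict and
    returns True on success. In a real pipeline it would wrap a
    simulator rollout; here it is deliberately left abstract.
    """
    rng = random.Random(seed)  # fixed seed keeps validation reproducible
    successes = 0
    for _ in range(n_trials):
        # Randomize the conditions the policy must survive.
        perturbation = {
            "object_scale": rng.uniform(0.8, 1.2),   # varied geometries
            "force_push_n": rng.uniform(0.0, 10.0),  # random perturbations
            "light_lux": rng.uniform(100, 1000),     # lighting conditions
        }
        if run_episode(perturbation):
            successes += 1
    rate = successes / n_trials
    return rate, rate >= threshold
```

Deployment is gated on the boolean: a policy that fails the randomized battery never reaches hardware.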
The Remote Deployment Pipeline
Once a skill passes simulation validation, it enters our deployment pipeline. The pipeline is built around SSH tunneling to the robot's onboard compute, which is typically an NVIDIA Jetson Orin.
Network Architecture
Every client robot we support has a standardized access setup:
- Site VPN — The robot sits on the client's local network behind a VPN. We use WireGuard for its simplicity and speed.
- SSH key authentication — Password authentication is disabled. Each engineer has a unique key pair, and keys are rotated quarterly.
- Port forwarding — Specific ports are forwarded for different services: model inference, camera streams, telemetry, and the emergency stop interface.
SSH Configuration
Our engineers maintain an SSH config that looks roughly like this:
```
# ~/.ssh/config — Habil robot access
Host robot-client-alpha
    HostName 10.0.1.42
    User habil
    IdentityFile ~/.ssh/habil_deploy_ed25519
    ProxyJump vpn-gateway-alpha
    # Forwards: 8501 model serving, 5555 ZMQ camera stream,
    # 9090 telemetry dashboard, 7777 e-stop interface.
    # (ssh_config does not allow trailing comments on option lines.)
    LocalForward 8501 localhost:8501
    LocalForward 5555 localhost:5555
    LocalForward 9090 localhost:9090
    LocalForward 7777 localhost:7777
    ServerAliveInterval 30
    ServerAliveCountMax 3

Host vpn-gateway-alpha
    HostName gateway.client-alpha.example.com
    User habil-vpn
    IdentityFile ~/.ssh/habil_vpn_ed25519
```
The ProxyJump directive handles the two-hop connection through the VPN gateway and into the robot's Jetson. A single ssh robot-client-alpha command sets up all the tunnels an engineer needs.
Establishing a Session
A typical deployment session starts like this:
```bash
# Connect and establish all tunnels
ssh robot-client-alpha

# In a second terminal, verify the robot is responsive
ssh robot-client-alpha "systemctl status habil-runtime"

# Check GPU memory and running processes
ssh robot-client-alpha "nvidia-smi && ps aux | grep habil"
```
Camera Streaming Over ZMQ
You cannot deploy skills remotely without seeing what the robot sees. We stream live camera feeds from the robot's Intel RealSense cameras using ZeroMQ, which gives us low-latency, lightweight transport without the overhead of a full video streaming framework. PUB/SUB deliberately drops stale frames under backpressure rather than buffering them, which is exactly the behavior you want for a live monitoring feed.
Publisher (On the Robot)
The publisher runs on the Jetson Orin and captures frames from the RealSense pipeline:
```python
import zmq
import pyrealsense2 as rs
import numpy as np
import cv2
import time

def start_camera_publisher(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUB)
    sock.setsockopt(zmq.SNDHWM, 2)  # keep the queue short; drop frames under backpressure
    sock.bind(f"tcp://0.0.0.0:{port}")

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    pipeline.start(config)
    print(f"[camera] Publishing on port {port}")

    try:
        while True:
            frames = pipeline.wait_for_frames()
            color = np.asanyarray(frames.get_color_frame().get_data())
            depth = np.asanyarray(frames.get_depth_frame().get_data())

            # Encode color frame as JPEG for bandwidth
            _, color_jpg = cv2.imencode(
                '.jpg', color, [cv2.IMWRITE_JPEG_QUALITY, 75]
            )

            ts = time.time()
            sock.send_multipart([
                b"color", color_jpg.tobytes(),
                b"depth", depth.tobytes(),
                b"timestamp", str(ts).encode()
            ])
    finally:
        pipeline.stop()

if __name__ == "__main__":
    start_camera_publisher()
```
Subscriber (On the Engineer's Machine)
The subscriber connects through the SSH tunnel and renders the stream locally:
```python
import zmq
import numpy as np
import cv2

def start_viewer(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    sock.setsockopt(zmq.RCVHWM, 2)  # must be set before connect to take effect
    sock.connect(f"tcp://localhost:{port}")
    sock.subscribe(b"")
    print("[viewer] Connected, waiting for frames...")

    while True:
        parts = sock.recv_multipart()

        # Parse color frame
        color_idx = parts.index(b"color")
        color_jpg = parts[color_idx + 1]
        color = cv2.imdecode(
            np.frombuffer(color_jpg, dtype=np.uint8), cv2.IMREAD_COLOR
        )

        # Parse depth frame
        depth_idx = parts.index(b"depth")
        depth_raw = parts[depth_idx + 1]
        depth = np.frombuffer(depth_raw, dtype=np.uint16).reshape(480, 640)

        # Colorize depth for visualization
        depth_color = cv2.applyColorMap(
            cv2.convertScaleAbs(depth, alpha=0.03), cv2.COLORMAP_JET
        )

        cv2.imshow("Robot Color", color)
        cv2.imshow("Robot Depth", depth_color)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cv2.destroyAllWindows()

if __name__ == "__main__":
    start_viewer()
```
With JPEG compression, the color stream needs around 2-3 Mbps, which is manageable over most client VPN connections. The raw 16-bit depth stream is far heavier, so on bandwidth-constrained links it is worth compressing the depth frames (16-bit PNG works) or streaming them at a reduced rate. Latency is typically 100-200 ms, which is more than sufficient for monitoring.
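A back-of-the-envelope check of the two stream sizes makes the asymmetry concrete. The 12 KB JPEG frame size is an assumption for quality 75 at this resolution; the depth math follows directly from the stream format.

```python
# Rough per-stream bandwidth at 640x480, 30 fps.
FPS = 30
JPEG_FRAME_BYTES = 12_000              # assumed size of a quality-75 JPEG frame

color_bits = JPEG_FRAME_BYTES * 8 * FPS   # compressed color stream
depth_bits = 640 * 480 * 2 * 8 * FPS      # raw 16-bit (z16) depth stream

print(f"color: {color_bits / 1e6:.1f} Mbps")  # → color: 2.9 Mbps
print(f"depth: {depth_bits / 1e6:.1f} Mbps")  # → depth: 147.5 Mbps
```

The uncompressed depth stream dominates by two orders of magnitude, which is why it is the first thing to compress or decimate when a link is tight.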
Remote Skill Deployment
Once tunnels are established and cameras are streaming, the actual deployment is handled by a standardized script. Every deployment follows the same sequence: stop the running service, back up the current model, push the new model and config, restart the service, run a smoke test.
```bash
#!/bin/bash
# deploy_skill.sh — Remote skill deployment
set -euo pipefail

ROBOT_HOST="robot-client-alpha"
SKILL_NAME="${1:?Usage: deploy_skill.sh <skill_name>}"
MODEL_PATH="./trained_models/${SKILL_NAME}/policy_latest.onnx"
CONFIG_PATH="./configs/${SKILL_NAME}/runtime_config.yaml"
REMOTE_DIR="/opt/habil/skills/${SKILL_NAME}"

echo "[deploy] Deploying ${SKILL_NAME} to ${ROBOT_HOST}"

# 1. Pre-flight checks
echo "[deploy] Running pre-flight checks..."
ssh "${ROBOT_HOST}" "nvidia-smi --query-gpu=memory.free \
    --format=csv,noheader,nounits" | awk '{
    if ($1 < 2000) {
        print "[FAIL] Insufficient GPU memory"; exit 1
    }
}'

# 2. Stop the running skill service
echo "[deploy] Stopping current skill service..."
ssh "${ROBOT_HOST}" "sudo systemctl stop habil-skill@${SKILL_NAME} || true"

# 3. Back up the current deployment
echo "[deploy] Backing up current deployment..."
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
ssh "${ROBOT_HOST}" "
    if [ -d ${REMOTE_DIR} ]; then
        cp -r ${REMOTE_DIR} ${REMOTE_DIR}.bak.${TIMESTAMP}
    fi
"

# 4. Push new model and config
echo "[deploy] Uploading model and config..."
ssh "${ROBOT_HOST}" "mkdir -p ${REMOTE_DIR}"
scp "${MODEL_PATH}" "${ROBOT_HOST}:${REMOTE_DIR}/policy.onnx"
scp "${CONFIG_PATH}" "${ROBOT_HOST}:${REMOTE_DIR}/config.yaml"

# 5. Restart service
echo "[deploy] Starting skill service..."
ssh "${ROBOT_HOST}" "sudo systemctl start habil-skill@${SKILL_NAME}"
sleep 3

# 6. Smoke test. The trailing "|| true" keeps a failed health check from
# aborting the script (via set -e) before the rollback branch can run.
echo "[deploy] Running smoke test..."
HEALTH=$(ssh "${ROBOT_HOST}" "curl -sf http://localhost:8501/health" \
    | python3 -c 'import sys, json; print(json.load(sys.stdin)["status"])' \
    2>/dev/null || true)

if [ "${HEALTH}" = "healthy" ]; then
    echo "[deploy] SUCCESS — ${SKILL_NAME} is running"
else
    echo "[deploy] FAIL — Rolling back..."
    ssh "${ROBOT_HOST}" "
        sudo systemctl stop habil-skill@${SKILL_NAME}
        rm -rf ${REMOTE_DIR}
        mv ${REMOTE_DIR}.bak.${TIMESTAMP} ${REMOTE_DIR}
        sudo systemctl start habil-skill@${SKILL_NAME}
    "
    echo "[deploy] Rolled back to previous version"
    exit 1
fi
```
The automatic rollback is critical. If a deployment fails its health check, the script restores the previous version within seconds. This gives us the confidence to deploy frequently without requiring someone on-site to catch failures.
Real-Time Monitoring and Debugging
Deployment is only half the story. Once a skill is running, we need to watch it perform and catch problems early.
Telemetry Stack
Every robot runs a lightweight telemetry agent that reports:
- Joint positions and velocities — 50Hz sampling, streamed over ZMQ
- Model inference latency — Per-frame timing for the ONNX policy
- Contact forces — From the robot's force-torque sensors
- System metrics — CPU, GPU, memory, thermals from the Jetson
All telemetry feeds into a Grafana dashboard accessible through the SSH tunnel on port 9090. An engineer can watch joint trajectories, inference times, and system health in real time while the robot executes a skill.
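The shape of such an agent is simple to sketch. This is an illustrative stand-in, not our real agent: the field names, the `read_state` callback, and the agent's port are all assumptions.

```python
import json
import time

def make_telemetry_sample(joint_pos, joint_vel, infer_ms, contacts):
    """Pack one telemetry sample as a JSON-serializable dict.

    Field names here are illustrative, not the real schema.
    """
    return {
        "t": time.time(),
        "joints": {"pos": list(joint_pos), "vel": list(joint_vel)},
        "inference_ms": infer_ms,       # per-frame ONNX policy timing
        "contact_forces_n": list(contacts),
    }

def run_telemetry_agent(read_state, port=9091, rate_hz=50):
    """Publish samples over ZMQ PUB at a fixed rate (port is an assumption)."""
    import zmq  # imported here so the packing logic above stays stdlib-only
    sock = zmq.Context().socket(zmq.PUB)
    sock.bind(f"tcp://0.0.0.0:{port}")
    period = 1.0 / rate_hz
    while True:
        # read_state() returns (joint_pos, joint_vel, infer_ms, contacts)
        sample = make_telemetry_sample(*read_state())
        sock.send_multipart([b"telemetry", json.dumps(sample).encode()])
        time.sleep(period)
```

Keeping the payload as line-oriented JSON means the same samples can feed both the live dashboard and the structured log store.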
Emergency Stop
Every robot has a software emergency stop exposed on port 7777. It is a simple HTTP endpoint that halts all joint motion and brings the robot into a safe resting posture. The E-stop interface runs as a web page with one large red button that an engineer can hit from anywhere in the world over an active tunnel.
We also configure hardware E-stops on every robot, but the software E-stop gives remote engineers the ability to intervene without calling someone on-site.
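An endpoint of this shape needs nothing beyond the standard library. The `/estop` route and the `freeze_robot` hook below are hypothetical stand-ins for the hardware-specific stop command, not our actual interface.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_estop_handler(freeze_robot):
    """Build a handler that calls `freeze_robot` on POST /estop.

    `freeze_robot` is whatever function commands the joints to a safe
    stop; it is hardware-specific and left abstract here.
    """
    class EStopHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path == "/estop":
                freeze_robot()
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"stopped")
            else:
                self.send_response(404)
                self.end_headers()

        def log_message(self, *args):
            pass  # keep the console quiet

    return EStopHandler

def serve_estop(freeze_robot, port=7777):
    """Block forever, serving the E-stop endpoint on the given port."""
    HTTPServer(("0.0.0.0", port), make_estop_handler(freeze_robot)).serve_forever()
```

The big red button in the web UI then reduces to a single POST against the tunneled port.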
Structured Logging
All skill execution logs use structured JSON format and stream to a centralized logging service. When something goes wrong, an engineer can query logs by skill name, timestamp, severity, or specific joint IDs without SSHing into the robot at all.
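Emitting one queryable JSON object per line is straightforward with the standard `logging` module. The field names below (`skill`, `joint_ids`) are illustrative, not our actual log schema.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            # Extra fields attached via logging's `extra=` mechanism.
            "skill": getattr(record, "skill", None),
            "joint_ids": getattr(record, "joint_ids", None),
            "msg": record.getMessage(),
        }
        return json.dumps(entry)

def get_skill_logger(name="habil.skill"):
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call like `get_skill_logger().info("grasp failed", extra={"skill": "pick_place", "joint_ids": [3, 4]})` then produces a line the logging service can filter by skill, severity, or joint ID.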
When Remote Does Not Work
We are not dogmatic about remote work. Roughly 20% of our engagements require someone on-site, and pretending otherwise would be dishonest.
Here is what typically requires physical presence:
- Initial hardware setup — Unboxing, assembly, networking, and first-boot calibration need hands on the robot. We handle this during the onboarding phase.
- Sensor calibration — Camera extrinsic calibration, force-torque sensor zeroing, and IMU alignment require precise physical measurements. Getting these wrong poisons everything downstream.
- Physical hardware failures — A loose connector, a degraded motor, a damaged cable. These need a human to diagnose and fix. We maintain a network of field engineers for these situations.
- New environment mapping — When the robot operates in a new physical space, the first map generation and landmark calibration benefit from someone walking the space.
- High-stakes demonstrations — When a client has stakeholders in the room and the demo needs to be flawless, we send an engineer. Some things should not be left to a remote connection.
The key insight is that these on-site requirements are front-loaded. Once a robot is set up, calibrated, and connected, the ongoing skill development and deployment is almost entirely remote.
Results
We have been running this workflow in production for over a year. The numbers speak for themselves:
- 80% remote delivery — Four out of five skill deployments happen without anyone on-site
- Typical turnaround — A new manipulation skill goes from client request to deployed-and-tested in 2-3 weeks
- Deployment frequency — We push updates to active robots an average of 3 times per week
- Rollback rate — Less than 5% of deployments trigger an automatic rollback
- Geographic reach — We currently serve clients in 4 US states, Germany, the UK, and Singapore, all from our Bangalore office
The remote-first approach is not just about cost savings. It gives us access to a global client base without needing engineers on every continent. A client in Germany gets the same engineering team and the same turnaround as a client down the street.
The best part of working remote-first is that deployment quality has actually improved. When you know you cannot walk over and fix something manually, you build better automated safeguards.