Cloud Sync
When the Iotistica agent is connected to the Iotistica Cloud, it operates on a continuous synchronisation loop: the cloud sends down a desired state, the agent compares it against what is actually running, and automatically reconciles the difference. This happens without any manual intervention and survives network outages.
Overview
Cloud sync has two parallel tracks running at all times:
| Track | Direction | What it does |
|---|---|---|
| Target state polling | Cloud → Agent | Downloads what the cloud wants the agent to be doing |
| State reporting | Agent → Cloud | Uploads what the agent is actually doing right now |
Together they give the cloud a live view of every agent and allow it to push configuration changes — new applications, updated images, changed endpoint settings — that the agent picks up and applies automatically.
Provisioning
Before an agent can sync with the cloud it must be provisioned — a one-time registration that establishes its identity.
During provisioning the agent:
- Generates a unique API key using a cryptographic proof-of-possession protocol.
- Sends the key fingerprint to the cloud provisioning endpoint along with the provisioning token.
- Receives back a device UUID, tenant assignment, and (optionally) VPN credentials.
- Stores the UUID and API key locally in SQLite. All subsequent cloud communication is authenticated with this key.
After provisioning the agent UUID is fixed for the lifetime of the device. The provisioning token is only needed once.
Provisioning happens automatically on first boot when PROVISIONING_KEY is set in the environment. If the agent loses its database, reprovisioning will generate a new UUID and the old device entry in the cloud will need to be removed.
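The provisioning steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the agent's actual code: the `identity` table, the SHA-256 fingerprint, and the `register` callable (a stand-in for the HTTP call to the provisioning endpoint) are all hypothetical, and the proof-of-possession exchange itself is not modelled.

```python
import hashlib
import secrets
import sqlite3
import uuid


def provision(db, provisioning_token, register):
    """Run the one-time provisioning flow and return (uuid, api_key).

    `register(fingerprint, token)` is a hypothetical stand-in for the cloud
    provisioning endpoint; it must return a dict with a "uuid" key.
    """
    db.execute("CREATE TABLE IF NOT EXISTS identity (uuid TEXT, api_key TEXT)")
    row = db.execute("SELECT uuid, api_key FROM identity").fetchone()
    if row is not None:
        return row  # already provisioned: identity is reused, token not needed

    api_key = secrets.token_hex(32)  # 256-bit random key, hex encoded
    # Assumption: the fingerprint is a SHA-256 digest of the key.
    fingerprint = hashlib.sha256(api_key.encode()).hexdigest()
    response = register(fingerprint, provisioning_token)

    db.execute("INSERT INTO identity VALUES (?, ?)", (response["uuid"], api_key))
    db.commit()
    return response["uuid"], api_key


def fake_register(fingerprint, token):
    # Test double for the cloud; a real response would also carry the tenant
    # assignment and, optionally, VPN credentials.
    return {"uuid": str(uuid.uuid4()), "tenant": "demo"}


db = sqlite3.connect(":memory:")
first = provision(db, "token-123", fake_register)
second = provision(db, "token-123", fake_register)  # no second registration
```

Because the identity lives only in the local database, deleting that database makes the next call register again and receive a different UUID, which is the reprovisioning case described above.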
Target State
The target state is the cloud's description of what the agent should be doing. It contains:
- Applications — which Docker containers to run, their images, port mappings, environment variables, volume mounts, and restart policies.
- Configuration — agent-level settings such as polling intervals the cloud can push down dynamically.
- Version number — a monotonically increasing counter the agent uses to detect when the state has changed.
The cloud holds the authoritative copy. The agent treats it as the source of truth for what should be running.
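As a rough illustration of that shape, the target state could be modelled like this. The class and field names are hypothetical and chosen only to mirror the three parts listed above; the actual wire format is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    image: str
    ports: list = field(default_factory=list)         # e.g. ["8080:80"]
    environment: dict = field(default_factory=dict)
    volumes: list = field(default_factory=list)
    restart: str = "unless-stopped"


@dataclass
class TargetState:
    version: int                                 # monotonically increasing
    apps: dict = field(default_factory=dict)     # app name -> list of Service
    config: dict = field(default_factory=dict)   # e.g. polling intervals


def is_newer(applied_version, target):
    """A target state only needs applying if its version has advanced."""
    return target.version > applied_version


target = TargetState(
    version=7,
    apps={"web": [Service(name="nginx", image="nginx:1.27", ports=["8080:80"])]},
    config={"poll_interval_s": 60},
)
```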
How the Agent Receives It
The State Poller fetches the target state from the cloud REST API on a configurable interval (default: 60 seconds). It uses HTTP ETags so that if the state has not changed since the last poll, the cloud returns 304 Not Modified and no work is done.
The ETag is persisted in SQLite across restarts, so the agent does not re-download the full state every time it boots.
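The conditional request can be sketched as below. The `kv` table and the `fetch` callable are assumptions made for the example: `fetch(headers)` stands in for the HTTP GET and returns `(status, response_headers, body)`. `If-None-Match` is the standard HTTP request header that carries a stored ETag.

```python
import sqlite3


def poll_once(db, fetch):
    """Poll the target state once, sending the stored ETag if there is one.

    Returns the new body, or None when the cloud answers 304 Not Modified.
    """
    db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    row = db.execute("SELECT value FROM kv WHERE key = 'etag'").fetchone()
    headers = {"If-None-Match": row[0]} if row else {}

    status, response_headers, body = fetch(headers)
    if status == 304:
        return None  # unchanged since the last poll: nothing to do

    etag = response_headers.get("ETag")
    if etag is not None:
        # Persist the ETag so a restart does not force a full re-download.
        db.execute("INSERT OR REPLACE INTO kv VALUES ('etag', ?)", (etag,))
        db.commit()
    return body


def fake_cloud(headers):
    # Test double: serves one fixed state tagged "v7".
    if headers.get("If-None-Match") == '"v7"':
        return 304, {}, None
    return 200, {"ETag": '"v7"'}, {"version": 7}


db = sqlite3.connect(":memory:")
first = poll_once(db, fake_cloud)   # full download, ETag stored
second = poll_once(db, fake_cloud)  # 304, no body
```

A real poller would also need to handle error statuses and timeouts; the sketch covers only the 200 and 304 paths.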
The cloud can also push state changes in real time via MQTT, bypassing the poll interval entirely. When an MQTT push arrives the agent applies it immediately and the next scheduled poll is skipped.
Circuit Breaker
If the cloud API becomes unreachable, the State Poller retries with exponential backoff and jitter:
| Attempt | Wait before retry |
|---|---|
| 1st failure | 15 seconds |
| 2nd failure | ~30 seconds |
| 3rd failure | ~60 seconds |
| … | doubles each time (±30 % jitter) |
| Max | 15 minutes |
After 10 consecutive failures the circuit breaker trips and polling pauses for 5 minutes before trying again. This prevents the agent from hammering a cloud endpoint that is down.
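The retry schedule can be expressed as a small function. The constants come from the table and paragraph above; two details are assumptions because the text does not settle them: jitter is applied from the second failure onward (the table lists the first wait as exactly 15 seconds), and the 15-minute cap is enforced after jitter.

```python
import random

BASE_DELAY_S = 15
MAX_DELAY_S = 15 * 60
JITTER = 0.30
BREAKER_THRESHOLD = 10
BREAKER_PAUSE_S = 5 * 60


def retry_delay(consecutive_failures, rng=random.random):
    """Seconds to wait after the given number of consecutive failures (>= 1)."""
    if consecutive_failures >= BREAKER_THRESHOLD:
        return BREAKER_PAUSE_S  # circuit breaker tripped: fixed pause
    base = BASE_DELAY_S * 2 ** (consecutive_failures - 1)
    if consecutive_failures == 1:
        return base  # assumption: no jitter on the first retry
    # rng() is in [0, 1), so the factor is in [0.7, 1.3).
    factor = 1 + JITTER * (2 * rng() - 1)
    return min(base * factor, MAX_DELAY_S)  # assumption: cap after jitter


schedule = [retry_delay(n, rng=lambda: 0.5) for n in range(1, 11)]
```

With the jitter pinned to its midpoint, `schedule` shows the undistorted sequence: the delay doubles from 15 seconds, reaches the 15-minute cap at the seventh failure, and is replaced by the 5-minute breaker pause at the tenth.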
Current State
The current state is what the agent observes on the device right now. It is always read directly from Docker — never assumed.
For each managed container the current state records:
- Container ID
- Running / stopped / exited / error status
- Image currently in use
- Error details if the container failed to start (ErrImagePull, ImagePullBackOff, StartFailure, CrashLoopBackOff)
The agent refreshes current state from Docker every time it needs to make a reconciliation decision, so it always works from fresh data rather than a stale cache.
Reconciliation
Reconciliation is the process of moving current state towards target state. It runs automatically whenever:
- A new target state is received from the cloud.
- The agent starts up and detects a difference between what is stored and what Docker reports.
- A service action (start, stop, restart) is triggered from the admin UI.
How It Works
Target state (from cloud)
│
▼
Reconciliation Planner
│ computes diff
▼
Step list:
├── createNetwork
├── pullImage
├── createContainer
├── startContainer
├── updateContainer (zero-downtime where possible)
└── removeContainer
│
▼
Step Executor
│ runs each step against Docker
▼
Current state (updated)
The planner computes the minimal set of operations needed — it does not tear down and rebuild containers that haven't changed. If only the environment variables of one service in a three-service application change, only that container is recreated.
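A much-simplified planner illustrates the idea of a minimal diff. The step names are taken from the diagram above, but the data shapes and the decision rules are assumptions made for the example, and network creation is left out.

```python
def plan(current, target):
    """Compute the steps that move `current` towards `target`.

    Both arguments map a service name to its spec, a dict with at least an
    "image" key. Returns a list of (step, service_name) tuples.
    """
    steps = []
    for name, spec in target.items():
        running = current.get(name)
        if running is None:
            steps += [
                ("pullImage", name),
                ("createContainer", name),
                ("startContainer", name),
            ]
        elif running != spec:
            if running["image"] != spec["image"]:
                steps.append(("pullImage", name))
            steps.append(("updateContainer", name))
        # Identical spec: the container is left untouched.
    for name in current:
        if name not in target:
            steps.append(("removeContainer", name))
    return steps


current = {
    "api": {"image": "acme/api:1.0", "env": {"LOG": "info"}},
    "db": {"image": "postgres:16", "env": {}},
    "cache": {"image": "redis:7", "env": {}},
}
target = {
    "api": {"image": "acme/api:1.0", "env": {"LOG": "debug"}},
    "db": {"image": "postgres:16", "env": {}},
    "cache": {"image": "redis:7", "env": {}},
}
steps = plan(current, target)
```

Here only the environment of `api` differs, so the plan contains a single `updateContainer` step and the other two services are not touched.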
Zero-Downtime Updates
When a running container needs to be updated (image change, port change, config change), the agent uses a blue-green swap:
- Pull the new image (while the old container keeps serving traffic).
- Create a new container with a temporary name.
- Stop the old container.
- Rename the new container to the canonical name.
If the new container fails to start, the old one is not removed and can be restarted manually.
Error Tracking
If a container repeatedly fails to start, the agent classifies the failure using Kubernetes-style error types:
| Error type | Meaning |
|---|---|
| ErrImagePull | Docker could not pull the image (bad tag, registry unreachable, auth failure) |
| ImagePullBackOff | Image pull failed and the agent is backing off before retrying |
| StartFailure | Container was created but exited immediately on start |
| CrashLoopBackOff | Container starts and crashes repeatedly |
These errors appear on the Applications page as a red tag on the service row. Hover over the tag to read the full error message.
State Reporting
While the State Poller brings configuration down from the cloud, the State Reporter sends the agent's actual status up.
Reports are sent every 10 seconds by default and include:
| Field | Description |
|---|---|
| Running apps | Which containers are running, their status and image |
| Endpoints health | Status of each protocol adapter (connected, polling, error) |
| System metrics | CPU, memory, storage, temperature, uptime |
| Network interfaces | IP addresses, MAC, link state, Wi-Fi signal |
| Agent version | Software version currently running |
| OS version | Host operating system version |
| Publish health | MQTT and cloud transport connection status |
An immediate report is also triggered automatically after every reconciliation completes and after every MQTT reconnect, so the cloud gets an up-to-date picture right after any change.
System metrics (a heavier snapshot) are reported less frequently, on a separate 5-minute interval.
Diff-Only Reporting
The reporter compares each outgoing report to the previous one and only sends fields that changed. This reduces bandwidth on constrained connections. A full report is always sent after reconnect.
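A top-level field diff is enough to show the principle. This sketch assumes reports are flat dictionaries keyed by field name and does not model fields that disappear between reports; both points are simplifications.

```python
_MISSING = object()  # sentinel so a field whose value is None still counts


def diff_report(previous, current):
    """Return only the top-level fields whose value changed.

    Passing previous=None (for example after a reconnect) yields the full
    report.
    """
    if previous is None:
        return dict(current)
    return {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }


last = {"agent_version": "1.4.2", "apps": {"web": "running"}, "uptime_s": 100}
now = {"agent_version": "1.4.2", "apps": {"web": "exited"}, "uptime_s": 110}

delta = diff_report(last, now)
full = diff_report(None, now)
```

`delta` carries only `apps` and `uptime_s`, while `full` is the complete report that would follow a reconnect.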
Offline Resilience
The connection between the agent and the cloud can drop at any time. The agent handles this without losing data or getting stuck.
Publish Modes
The transport layer operates in one of three modes:
| Mode | Meaning |
|---|---|
| direct | Cloud is reachable. Reports are sent immediately. |
| buffer-only | Cloud is unreachable. Reports are written to the SQLite offline queue instead of sent. |
| recovering | MQTT just reconnected. Reports are being sent but the queue is also draining. |
The mode switches automatically based on connection health. The admin UI and API surface the current mode so you can tell at a glance whether the agent is syncing.
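One way to picture the switching is as a small transition function. The rule that `recovering` returns to `direct` once the queue is empty is an assumption; the table above does not say what ends the recovery phase.

```python
def next_mode(mode, connected, queue_depth):
    """Return the publish mode that follows `mode` given connection health."""
    if not connected:
        return "buffer-only"
    if mode == "buffer-only":
        return "recovering"  # connection just came back
    if mode == "recovering" and queue_depth > 0:
        return "recovering"  # still draining the backlog
    return "direct"          # assumption: an empty queue ends recovery


# Walk through an outage: drop, reconnect with 3 queued reports, drain.
trace = []
mode = "direct"
for connected, depth in [(False, 0), (False, 3), (True, 3), (True, 1), (True, 0)]:
    mode = next_mode(mode, connected, depth)
    trace.append(mode)
```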
Offline Queue
When the agent is in buffer-only mode, all outgoing state reports accumulate in a local SQLite queue. When the connection is restored the queue drains automatically in order, so the cloud receives a complete timeline of what happened during the outage.
You can inspect the queue depth and oldest queued entry via the /v1/buffer/status endpoint.
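The queue behaviour described above amounts to a first-in, first-out table. The sketch below uses Python's built-in `sqlite3` module; the `queue` table, the class, and the `send` callback are hypothetical names for illustration.

```python
import json
import sqlite3


class OfflineQueue:
    def __init__(self, db):
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def enqueue(self, report):
        self.db.execute(
            "INSERT INTO queue (payload) VALUES (?)", (json.dumps(report),)
        )
        self.db.commit()

    def depth(self):
        return self.db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

    def drain(self, send):
        """Send queued reports oldest first; stop at the first failure.

        `send(report)` returns True on success. A report is deleted only
        after it was sent, so a failed drain loses nothing.
        """
        sent = 0
        while True:
            row = self.db.execute(
                "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None or not send(json.loads(row[1])):
                return sent
            self.db.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            self.db.commit()
            sent += 1


queue = OfflineQueue(sqlite3.connect(":memory:"))
for seq in (1, 2, 3):
    queue.enqueue({"seq": seq})

delivered = []
sent = queue.drain(lambda report: delivered.append(report) or True)
```

Ordering by the auto-incrementing `id` is what preserves the timeline: `delivered` ends up in the same order the reports were queued.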
Connection Health States
| State | Meaning |
|---|---|
| Online | Both polls and reports succeeding |
| Degraded | Some failures but not enough to trip the circuit breaker |
| Offline | Circuit breaker tripped; polling and reporting paused |
The agent emits these transitions as internal events so other subsystems (MQTT publish mode, health arbiter) can react in real time.
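A small tracker shows how such transitions might be derived and broadcast. Defining the states by a count of consecutive failures, and reusing a breaker threshold for the Offline state, are assumptions for this sketch; the class and listener signature are hypothetical.

```python
class HealthTracker:
    """Derives online / degraded / offline from consecutive failures and
    notifies listeners on every transition."""

    def __init__(self, breaker_threshold=10):
        self.breaker_threshold = breaker_threshold
        self.failures = 0
        self.state = "online"
        self.listeners = []

    def _classify(self):
        if self.failures >= self.breaker_threshold:
            return "offline"
        return "degraded" if self.failures > 0 else "online"

    def record(self, success):
        """Record one poll or report outcome and emit a transition if any."""
        self.failures = 0 if success else self.failures + 1
        new_state = self._classify()
        if new_state != self.state:
            old, self.state = self.state, new_state
            for listener in self.listeners:
                listener(old, new_state)


events = []
tracker = HealthTracker(breaker_threshold=3)  # low threshold for the demo
tracker.listeners.append(lambda old, new: events.append((old, new)))
for ok in [False, False, False, True]:
    tracker.record(ok)
```

Listeners are called only when the state actually changes, so the second failure in the demo produces no event.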
Configuration
Cloud sync is configured via environment variables:
| Variable | Default | Description |
|---|---|---|
| IOTISTICA_API | (none) | Base URL of the Iotistica Cloud API. Required for cloud sync. |
| PROVISIONING_KEY | (none) | Token used for initial provisioning. Not needed after first boot. |
| STANDALONE | false | Set to true to disable all cloud sync and run fully offline. |
| MQTT_BROKER_URL | (none) | Cloud MQTT broker URL for real-time state push. Optional; polling works without it. |
The cloud can dynamically override the polling and reporting intervals by including them in the target state response. Agent-side environment variables set the startup defaults; the cloud values take precedence once the first poll succeeds.
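The precedence rule can be sketched as follows. The four environment variable names come from the table above; everything else is an assumption for illustration, including the interval key names, the helper functions, and the choice to treat a missing `IOTISTICA_API` as standalone because that variable is required for cloud sync.

```python
import os

# Startup defaults taken from the intervals quoted in this document.
DEFAULT_INTERVALS = {"poll_interval_s": 60, "report_interval_s": 10}


def load_sync_config(env=None):
    """Read the cloud sync settings from environment variables."""
    env = os.environ if env is None else env
    api = env.get("IOTISTICA_API")
    standalone = env.get("STANDALONE", "false").strip().lower() == "true"
    return {
        "standalone": standalone or not api,  # assumption: no API URL, no sync
        "api": api,
        "provisioning_key": env.get("PROVISIONING_KEY"),
        "mqtt_broker_url": env.get("MQTT_BROKER_URL"),
    }


def effective_intervals(startup_defaults, cloud_config):
    """Cloud-provided intervals take precedence over startup defaults."""
    merged = dict(startup_defaults)
    merged.update({k: v for k, v in cloud_config.items() if k in startup_defaults})
    return merged


config = load_sync_config({"IOTISTICA_API": "https://cloud.example.com"})
intervals = effective_intervals(DEFAULT_INTERVALS, {"poll_interval_s": 30})
```

Only keys the agent already knows are overridden, so an unexpected key in the cloud response cannot introduce a new setting in this sketch.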
Running Without Cloud
The agent can also run fully offline with no cloud connection. Set STANDALONE=true (or omit IOTISTICA_API) and the agent operates entirely in standalone mode:
- No target state is fetched; all configuration is managed locally through the admin UI.
- No state reports are sent.
- All data stays on the device.
- The offline queue is not used.
Everything else (endpoints, destinations, subscriptions, applications, discovery, anomaly detection) works identically in standalone mode. You can switch to cloud-connected mode later by setting the cloud environment variables and provisioning the agent; no local configuration needs to change.
Related Docs
- Applications — deploying containers, which cloud sync manages via target state
- Settings — agent configuration including cloud connection details
- Quick Start — first-time setup including provisioning