Cloud Sync

When the Iotistica agent is connected to the Iotistica Cloud, it operates on a continuous synchronisation loop: the cloud sends down a desired state, the agent compares it against what is actually running, and automatically reconciles the difference. This happens without any manual intervention and survives network outages.


Overview

Cloud sync has two parallel tracks running at all times:

| Track | Direction | What it does |
| --- | --- | --- |
| Target state polling | Cloud → Agent | Downloads what the cloud wants the agent to be doing |
| State reporting | Agent → Cloud | Uploads what the agent is actually doing right now |

Together they give the cloud a live view of every agent and allow it to push configuration changes — new applications, updated images, changed endpoint settings — that the agent picks up and applies automatically.


Provisioning

Before an agent can sync with the cloud it must be provisioned — a one-time registration that establishes its identity.

During provisioning the agent:

  1. Generates a unique API key using a cryptographic proof-of-possession protocol.
  2. Sends the key fingerprint to the cloud provisioning endpoint along with the provisioning token.
  3. Receives back a device UUID, tenant assignment, and (optionally) VPN credentials.
  4. Stores the UUID and API key locally in SQLite. All subsequent cloud communication is authenticated with this key.

After provisioning the agent UUID is fixed for the lifetime of the device. The provisioning token is only needed once.
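
The local side of the provisioning steps above can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the source does not specify the proof-of-possession protocol, so the random key, the SHA-256 fingerprint, and the single-row `device` table are all assumptions, and the call to the cloud provisioning endpoint is left out entirely.

```python
import hashlib
import secrets
import sqlite3


def generate_api_key() -> str:
    """Return a random 256-bit key as hex (stand-in for the real key generation)."""
    return secrets.token_hex(32)


def fingerprint(api_key: str) -> str:
    """Hash the key so that only a fingerprint, not the key itself, leaves the device.
    SHA-256 is an assumption; the real algorithm is not documented here."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def store_identity(db: sqlite3.Connection, uuid: str, api_key: str) -> None:
    """Persist the identity returned by provisioning (hypothetical table layout)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS device ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), uuid TEXT, api_key TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO device (id, uuid, api_key) VALUES (1, ?, ?)",
        (uuid, api_key),
    )
    db.commit()


def load_identity(db: sqlite3.Connection):
    """Return (uuid, api_key), or None if the device row has not been written yet."""
    return db.execute("SELECT uuid, api_key FROM device WHERE id = 1").fetchone()
```

Because the UUID and key live only in this local database, losing the database means losing the identity, which is why reprovisioning produces a new UUID.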

Tip: Provisioning happens automatically on first boot when PROVISIONING_KEY is set in the environment. If the agent loses its database, reprovisioning will generate a new UUID and the old device entry in the cloud will need to be removed.


Target State

The target state is the cloud's description of what the agent should be doing. It contains:

  • Applications — which Docker containers to run, their images, port mappings, environment variables, volume mounts, and restart policies.
  • Configuration — agent-level settings such as polling intervals the cloud can push down dynamically.
  • Version number — a monotonically increasing counter the agent uses to detect when the state has changed.

The cloud holds the authoritative copy. The agent treats it as the source of truth for what should be running.

How the Agent Receives It

The State Poller fetches the target state from the cloud REST API on a configurable interval (default: 60 seconds). It uses HTTP ETags so that if the state has not changed since the last poll, the cloud returns 304 Not Modified and no work is done.

The ETag is persisted in SQLite across restarts, so the agent does not re-download the full state every time it boots.
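
A conditional poll of this kind might look like the sketch below. The `If-None-Match` request header and the `ETag` response header are standard HTTP; everything else is assumed for illustration, including the `poll_once` name, the injected `fetch` callable that stands in for the real HTTP client, and the `fake_cloud` responder.

```python
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers, parsed JSON body or None)
Response = Tuple[int, Dict[str, str], Optional[dict]]


def poll_once(fetch: Callable[[Dict[str, str]], Response],
              etag: Optional[str]) -> Tuple[Optional[dict], Optional[str]]:
    """Do one conditional GET. Returns (new_state_or_None, etag_to_persist)."""
    headers = {"If-None-Match": etag} if etag else {}
    status, resp_headers, body = fetch(headers)
    if status == 304:
        # Nothing changed: keep the stored ETag and do no further work.
        return None, etag
    if status == 200:
        # New target state: hand it to the caller and remember its ETag.
        return body, resp_headers.get("ETag")
    raise RuntimeError(f"unexpected status {status}")


def fake_cloud(headers: Dict[str, str]) -> Response:
    """Hypothetical cloud that always serves target state version 7."""
    if headers.get("If-None-Match") == '"v7"':
        return 304, {}, None
    return 200, {"ETag": '"v7"'}, {"version": 7}


state, etag = poll_once(fake_cloud, None)      # first poll downloads the state
unchanged, etag = poll_once(fake_cloud, etag)  # second poll is answered with 304
```

Persisting the returned ETag is what allows the first poll after a restart to be answered with 304 as well.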

The cloud can also push state changes in real time via MQTT, bypassing the poll interval entirely. When an MQTT push arrives the agent applies it immediately and the next scheduled poll is skipped.

Circuit Breaker

If the cloud API becomes unreachable, the state poller uses exponential backoff with jitter:

| Attempt | Wait before retry |
| --- | --- |
| 1st failure | 15 seconds |
| 2nd failure | ~30 seconds |
| 3rd failure | ~60 seconds |
| Subsequent failures | Doubles each time (±30% jitter) |
| Max | 15 minutes |

After 10 consecutive failures the circuit breaker trips and polling pauses for 5 minutes before trying again. This prevents the agent from hammering a cloud endpoint that is down.
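
The retry schedule can be approximated with a small function. This is a sketch built only from the numbers quoted above; the constant names, the choice to leave the first retry unjittered (the table lists it as exactly 15 seconds), and the way the 5-minute pause replaces the backoff delay once the breaker trips are assumptions.

```python
import random

BASE_DELAY_S = 15          # wait after the first failure
MAX_DELAY_S = 15 * 60      # backoff ceiling
JITTER = 0.30              # +/- 30 %
BREAKER_THRESHOLD = 10     # consecutive failures before the breaker trips
BREAKER_PAUSE_S = 5 * 60   # pause once the breaker has tripped


def retry_delay(consecutive_failures: int, rng=random.random) -> float:
    """Seconds to wait after the given number of consecutive failures (>= 1)."""
    if consecutive_failures >= BREAKER_THRESHOLD:
        return BREAKER_PAUSE_S
    if consecutive_failures == 1:
        return BASE_DELAY_S
    # Double the base delay for every additional failure, up to the ceiling.
    base = min(BASE_DELAY_S * 2 ** (consecutive_failures - 1), MAX_DELAY_S)
    # rng() is in [0, 1), so the factor falls in [0.7, 1.3).
    factor = 1 + JITTER * (2 * rng() - 1)
    return min(base * factor, MAX_DELAY_S)
```

Jitter spreads retries from many agents over time, so a fleet that lost the cloud at the same moment does not reconnect in lockstep.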


Current State

The current state is what the agent observes on the device right now. It is always read directly from Docker — never assumed.

For each managed container the current state records:

  • Container ID
  • Running / stopped / exited / error status
  • Image currently in use
  • Error details if the container failed to start (ErrImagePull, ImagePullBackOff, StartFailure, CrashLoopBackOff)

The agent refreshes current state from Docker every time it needs to make a reconciliation decision, so it always works from fresh data rather than a stale cache.


Reconciliation

Reconciliation is the process of moving current state towards target state. It runs automatically whenever:

  • A new target state is received from the cloud.
  • The agent starts up and detects a difference between what is stored and what Docker reports.
  • A service action (start, stop, restart) is triggered from the admin UI.

How It Works

Target state (from cloud)
        │
        ▼
Reconciliation Planner
        │ computes diff
        ▼
Step list:
├── createNetwork
├── pullImage
├── createContainer
├── startContainer
├── updateContainer (zero-downtime where possible)
└── removeContainer
        │
        ▼
Step Executor
        │ runs each step against Docker
        ▼
Current state (updated)

The planner computes the minimal set of operations needed — it does not tear down and rebuild containers that haven't changed. If only the environment variables of one service in a three-service application change, only that container is recreated.
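
To make the "minimal set of operations" idea concrete, here is a simplified planner sketch. The step names come from the step list above, but the data shapes (`Service` as a plain dict keyed by service name), the `plan` function, and the rule for when an image pull is needed are assumptions; network steps are omitted.

```python
from typing import Dict, List, Tuple

Service = Dict[str, object]   # e.g. {"image": "nginx:1.25", "env": {...}}
Step = Tuple[str, str]        # (step name, service name)


def plan(current: Dict[str, Service], target: Dict[str, Service]) -> List[Step]:
    """Return only the steps needed to move current state to target state."""
    steps: List[Step] = []
    # Services that are running but no longer wanted are removed.
    for name in sorted(current.keys() - target.keys()):
        steps.append(("removeContainer", name))
    for name in sorted(target):
        wanted = target[name]
        running = current.get(name)
        if running is None:
            # Brand-new service: pull, create, start.
            steps += [("pullImage", name), ("createContainer", name),
                      ("startContainer", name)]
        elif running != wanted:
            # Changed service: pull only if the image itself changed.
            if running.get("image") != wanted.get("image"):
                steps.append(("pullImage", name))
            steps.append(("updateContainer", name))
        # Unchanged services produce no steps at all.
    return steps


current = {
    "web": {"image": "nginx:1.25", "env": {}},
    "api": {"image": "api:2", "env": {"LOG": "info"}},
    "db": {"image": "postgres:16", "env": {}},
}
target = dict(current, api={"image": "api:2", "env": {"LOG": "debug"}})
steps = plan(current, target)   # only the "api" service is touched
```

In this example only the environment of one service differs, so the plan contains a single `updateContainer` step and the other two containers are left alone.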

Zero-Downtime Updates

When a running container needs to be updated (image change, port change, config change), the agent uses a blue-green swap:

  1. Pull the new image (while the old container keeps serving traffic).
  2. Create a new container with a temporary name.
  3. Stop the old container.
  4. Rename the new container to the canonical name.

If the new container fails to start, the old one is not removed and can be restarted manually.

Error Tracking

If a container repeatedly fails to start, the agent tracks the error Kubernetes-style:

| Error type | Meaning |
| --- | --- |
| ErrImagePull | Docker could not pull the image (bad tag, registry unreachable, auth failure) |
| ImagePullBackOff | Image pull failed and the agent is backing off before retrying |
| StartFailure | Container was created but exited immediately on start |
| CrashLoopBackOff | Container starts and crashes repeatedly |

These are visible in the Applications page as a red tag on the service row. Hover over the tag to read the full error message.


State Reporting

While the state poller brings configuration down from the cloud, the State Reporter sends the agent's actual status up.

Reports are sent every 10 seconds by default and include:

| Field | Description |
| --- | --- |
| Running apps | Which containers are running, their status and image |
| Endpoints health | Status of each protocol adapter (connected, polling, error) |
| System metrics | CPU, memory, storage, temperature, uptime |
| Network interfaces | IP addresses, MAC, link state, Wi-Fi signal |
| Agent version | Software version currently running |
| OS version | Host operating system version |
| Publish health | MQTT and cloud transport connection status |

An immediate report is also triggered automatically after every reconciliation completes and after every MQTT reconnect, so the cloud gets an up-to-date picture right after any change.

System metrics (a heavier snapshot) are reported less frequently, on a separate 5-minute interval.

Diff-Only Reporting

The reporter compares each outgoing report to the previous one and only sends fields that changed. This reduces bandwidth on constrained connections. A full report is always sent after reconnect.
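
A field-level diff of this kind could be sketched as below. The function name and the report keys are hypothetical, and the sketch compares top-level fields only; how the real reporter handles nested or removed fields is not described here.

```python
from typing import Optional

_MISSING = object()  # sentinel so that a newly added field counts as changed


def diff_report(previous: Optional[dict], current: dict) -> dict:
    """Return only the top-level fields whose value changed since the last report.

    Passing previous=None (for example after a reconnect) yields the full report.
    """
    if previous is None:
        return dict(current)
    return {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }


last = {"agent_version": "1.4.0", "apps": {"web": "running"}, "publish_health": "direct"}
now = {"agent_version": "1.4.0", "apps": {"web": "exited"}, "publish_health": "direct"}
delta = diff_report(last, now)   # contains only the "apps" field
```

Resetting the stored previous report to `None` on reconnect is one simple way to get the full report that is always sent at that point.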


Offline Resilience

The connection between the agent and the cloud can drop at any time. The agent handles this without losing data or getting stuck.

Publish Modes

The transport layer operates in one of three modes:

| Mode | Meaning |
| --- | --- |
| direct | Cloud is reachable. Reports are sent immediately. |
| buffer-only | Cloud is unreachable. Reports are written to the SQLite offline queue instead of sent. |
| recovering | MQTT just reconnected. Reports are being sent but the queue is also draining. |

The mode switches automatically based on connection health. The admin UI and API surface the current mode so you can tell at a glance whether the agent is syncing.

Offline Queue

When the agent is in buffer-only mode, all outgoing state reports accumulate in a local SQLite queue. When the connection is restored the queue drains automatically in order, so the cloud receives a complete timeline of what happened during the outage.

You can inspect the queue depth and oldest queued entry via the /v1/buffer/status endpoint.
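
The queue behaviour and the three publish modes can be illustrated together. This is a sketch under stated assumptions: the `report_queue` table, the `OfflineQueue` class, the shape of the status dictionary, and the rule that the transport stays in `recovering` until the queue is empty are all hypothetical, and the actual `/v1/buffer/status` response format is not documented here.

```python
import json
import sqlite3


class OfflineQueue:
    """FIFO queue of state reports backed by SQLite (hypothetical schema)."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS report_queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "queued_at REAL NOT NULL, payload TEXT NOT NULL)"
        )

    def enqueue(self, report: dict, now: float) -> None:
        self.db.execute(
            "INSERT INTO report_queue (queued_at, payload) VALUES (?, ?)",
            (now, json.dumps(report)),
        )
        self.db.commit()

    def status(self) -> dict:
        """Queue depth and timestamp of the oldest entry (None when empty)."""
        depth, oldest = self.db.execute(
            "SELECT COUNT(*), MIN(queued_at) FROM report_queue"
        ).fetchone()
        return {"depth": depth, "oldest_queued_at": oldest}

    def drain(self, send) -> int:
        """Send queued reports oldest-first; stop at the first failed send."""
        rows = self.db.execute(
            "SELECT id, payload FROM report_queue ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break
            # Delete only after a successful send so nothing is lost.
            self.db.execute("DELETE FROM report_queue WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


def publish_mode(connected: bool, queue_depth: int) -> str:
    """Pick the publish mode from connection state and remaining backlog."""
    if not connected:
        return "buffer-only"
    return "recovering" if queue_depth > 0 else "direct"
```

Deleting each row only after its send succeeds means an interruption mid-drain leaves the remaining reports queued in their original order.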

Connection Health States

| State | Meaning |
| --- | --- |
| Online | Both polls and reports succeeding |
| Degraded | Some failures but not enough to trip the circuit breaker |
| Offline | Circuit breaker tripped; polling and reporting paused |

The agent emits these transitions as internal events so other subsystems (MQTT publish mode, health arbiter) can react in real time.
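
One way to picture these states and their transition events is the sketch below. It assumes that health is derived purely from a count of consecutive failures and that the breaker threshold is the 10 failures mentioned earlier; the `HealthTracker` class and its callback signature are hypothetical.

```python
from typing import Callable


def health_state(consecutive_failures: int, breaker_threshold: int = 10) -> str:
    """Map a failure count onto the three connection health states."""
    if consecutive_failures >= breaker_threshold:
        return "Offline"
    if consecutive_failures > 0:
        return "Degraded"
    return "Online"


class HealthTracker:
    """Tracks poll/report outcomes and notifies a listener on each transition."""

    def __init__(self, on_change: Callable[[str, str], None],
                 breaker_threshold: int = 10):
        self.on_change = on_change
        self.breaker_threshold = breaker_threshold
        self.failures = 0
        self.state = "Online"

    def record(self, success: bool) -> None:
        # A success resets the count; a failure extends the current streak.
        self.failures = 0 if success else self.failures + 1
        new_state = health_state(self.failures, self.breaker_threshold)
        if new_state != self.state:
            old, self.state = self.state, new_state
            self.on_change(old, new_state)   # emitted only when the state changes


events = []
tracker = HealthTracker(lambda old, new: events.append((old, new)))
for _ in range(10):
    tracker.record(False)   # ten failures in a row trip the breaker
tracker.record(True)        # one success brings the agent back online
```

Emitting only on transitions keeps listeners such as the publish-mode logic from being called on every poll.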


Configuration

Cloud sync is configured via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| IOTISTICA_API | | Base URL of the Iotistica Cloud API. Required for cloud sync. |
| PROVISIONING_KEY | | Token used for initial provisioning. Not needed after first boot. |
| STANDALONE | false | Set to true to disable all cloud sync and run fully offline. |
| MQTT_BROKER_URL | | Cloud MQTT broker URL for real-time state push. Optional; polling works without it. |

The cloud can dynamically override the polling and reporting intervals by including them in the target state response. Agent-side environment variables set the startup defaults; the cloud values take precedence once the first poll succeeds.
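
The precedence rule can be sketched as a simple merge. The environment variable names `POLL_INTERVAL_S` and `REPORT_INTERVAL_S` and the `poll_interval_s` / `report_interval_s` keys are hypothetical, since the actual names are not listed on this page; only the 60-second and 10-second defaults come from the text above.

```python
from typing import Mapping, Optional


def effective_intervals(env: Mapping[str, str],
                        cloud_config: Optional[Mapping[str, object]]) -> dict:
    """Merge startup defaults with any overrides pushed by the cloud."""
    # Startup defaults: environment variables, else the documented defaults.
    defaults = {
        "poll_interval_s": int(env.get("POLL_INTERVAL_S", 60)),
        "report_interval_s": int(env.get("REPORT_INTERVAL_S", 10)),
    }
    # Cloud values win, but only for known keys that are actually set.
    overrides = {
        key: value
        for key, value in (cloud_config or {}).items()
        if key in defaults and value is not None
    }
    return {**defaults, **overrides}


before_first_poll = effective_intervals({"POLL_INTERVAL_S": "30"}, None)
after_first_poll = effective_intervals({"POLL_INTERVAL_S": "30"},
                                       {"poll_interval_s": 120})
```

Passing `None` for the cloud configuration models the period before the first successful poll, when only the agent-side defaults apply.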


Running Without Cloud

The agent is designed to run fully offline with no cloud connection. Set STANDALONE=true (or omit IOTISTICA_API) and the agent operates entirely in standalone mode:

  • No target state is fetched; all configuration is managed locally through the admin UI.
  • No state reports are sent.
  • All data stays on the device.
  • The offline queue is not used.

Everything else — endpoints, destinations, subscriptions, applications, discovery, anomaly detection — works identically in standalone mode. You can switch to cloud-connected mode later by adding the environment variables and reprovisioning without changing any local configuration.


  • Applications — deploying containers, which cloud sync manages via target state
  • Settings — agent configuration including cloud connection details
  • Quick Start — first-time setup including provisioning