v4.0.0

Features:

  • New binary protocol support (under the hood)
  • Bulk actions support (under the hood)
  • Full TypeScript declaration files
  • Promises everywhere! Long live async/await!
  • Offline record support
{
    // Use IndexedDB to store data client side
    offlineEnabled: false,
    // Save each update as it comes in from the server
    saveUpdatesOffline: false,
    indexdb: {
        // The db version, incrementing this triggers a db upgrade
        dbVersion: 1,
        // This auto updates the IndexedDB version if the objectStore names change
        autoVersion: false,
        // The key to index records by
        primaryKey: 'id',
        // The IndexedDB database name
        storageDatabaseName: 'deepstream',
        // The default store name if not using a '/' to indicate the object store (example person/uuid)
        defaultObjectStoreName: 'records',
        // The object store names, required in advance due to how IndexedDB works
        objectStoreNames: [],
        // Things to not save, such as search results
        ignorePrefixes: [],
        // The amount of time to buffer together actions before making a request
        flushTimeout: 50
    }
}
  • Customizable offline storage support
export type offlineStoreWriteResponse = ((error: string | null, recordName: string) => void)

export interface RecordOfflineStore {
  get: (recordName: string, callback: ((recordName: string, version: number, data: RecordData) => void)) => void
  set: (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse) => void
  delete: (recordName: string, callback: offlineStoreWriteResponse) => void
}
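
As a rough illustration of the interface above, here is a minimal sketch of a custom store that keeps everything in memory. The class name `InMemoryOfflineStore`, the local `RecordData` alias and the choice to report a missing record as version -1 with null data are assumptions made for this example, not part of the SDK. How such a store is handed to the client is not covered here.

```typescript
// Assumption: RecordData stands in for the SDK's own type of the same name.
type RecordData = unknown

type offlineStoreWriteResponse = (error: string | null, recordName: string) => void

interface RecordOfflineStore {
  get: (recordName: string, callback: (recordName: string, version: number, data: RecordData) => void) => void
  set: (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse) => void
  delete: (recordName: string, callback: offlineStoreWriteResponse) => void
}

// Hypothetical store backed by a Map, handy for tests or non-browser environments.
class InMemoryOfflineStore implements RecordOfflineStore {
  private records = new Map<string, { version: number, data: RecordData }>()

  public get (recordName: string, callback: (recordName: string, version: number, data: RecordData) => void): void {
    const entry = this.records.get(recordName)
    if (entry === undefined) {
      // Assumption: a record that was never stored is reported as version -1.
      callback(recordName, -1, null)
    } else {
      callback(recordName, entry.version, entry.data)
    }
  }

  public set (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse): void {
    this.records.set(recordName, { version, data })
    callback(null, recordName)
  }

  public delete (recordName: string, callback: offlineStoreWriteResponse): void {
    this.records.delete(recordName)
    callback(null, recordName)
  }
}
```

A persistent store would follow the same shape, replacing the Map with asynchronous reads and writes and passing an error string to the callback when a write fails.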

Improvements

  • Separation of errors and warnings for clarity. Non-critical failures (such as an ack timeout) can now be handled separately or muted entirely.
  • Enhanced services to reduce timeout overhead

Backwards compatibility

  • Only works with V4 server
  • All single-response APIs now return promises when no callback is provided. This means most calls that could previously be chained will now break.
const client = deepstream()
try {
    await client.login()

    const record = client.record.getRecord(name)
    await record.whenReady()

    const data = await client.record.snapshot(name)
    const version = await client.record.head(name)
    const exists = await client.record.has(name)
    const result = await client.rpc.make(name, data)
    const users = await client.presence.getAll()
} catch (e) {
    console.log('Error occurred', e)
}
  • Listening

The listening API has been ever so slightly tweaked in order to simplify removing an active subscription.

Before, when an active provider was started, you would usually need to store it in a higher scope, for example:

const listeners = new Map()

client.record.listen('users/.*', (name, isSubscribed, { accept, reject }) => {
    if (isSubscribed) {
        const updateInterval = setInterval(updateRecord.bind(this, name), 1000)
        listeners.set(name, updateInterval)
        accept()
    } else {
        clearInterval(listeners.get(name))
        listeners.delete(name)
    }
})

Whereas now we can instead do:

const listeners = new Map()

client.record.listen('users/.*', (name, { accept, reject, onStop }) => {
    const updateInterval = setInterval(updateRecord.bind(this, name), 1000)
    accept()

    onStop(() => clearInterval(updateInterval))
})

TLDR:

Binary Protocol

The driver behind pretty much all of the V4 refactor was our move from our old text-based protocol to binary. It makes building SDKs and new features so much easier. Seriously. LIKE SO MUCH EASIER.

Okay so first things first, the structure of text vs binary messages:

V3 - Text:

TOPIC | ACTION | meta1 | meta2 | ...metaN | payload +

In this string, the parser read the initial TOPIC and ACTION to find out where to route the message, and the rest of the data was figured out within the code module that dealt with it. This gave some benefits, like only parsing a full message once it's actually required, but it also meant that the message parsing code was distributed, and adding, for example, a meta field would require lots of refactoring. Tests also had to create text-based messages even when testing internal code paths. Payload serialization also didn't use JSON, but instead used a custom form of serialization to minimize bandwidth: U for undefined, T for true, F for false, O for object, an S prefix for string and an N prefix for number.
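
To make the typed payload format concrete, here is a minimal sketch of how such a prefix-based serialization could work. The function names `toTyped` and `fromTyped` are chosen for this example, and only the prefixes listed above are handled. Null is left out because the text does not list a prefix for it, and serializing objects as JSON after the O prefix is an assumption.

```typescript
// Sketch of a prefix-based payload encoding, covering only the prefixes named in the text.
function toTyped (value: unknown): string {
  if (value === undefined) return 'U'
  if (value === true) return 'T'
  if (value === false) return 'F'
  if (typeof value === 'string') return 'S' + value
  if (typeof value === 'number') return 'N' + value.toString()
  // Assumption: objects (and arrays) are carried as JSON after the O prefix.
  if (typeof value === 'object' && value !== null) return 'O' + JSON.stringify(value)
  throw new Error('unsupported value in this sketch: ' + String(value))
}

function fromTyped (typed: string): unknown {
  const prefix = typed.charAt(0)
  const rest = typed.slice(1)
  switch (prefix) {
    case 'U': return undefined
    case 'T': return true
    case 'F': return false
    case 'S': return rest
    case 'N': return parseFloat(rest)
    case 'O': return JSON.parse(rest)
    default: throw new Error('unknown type prefix: ' + prefix)
  }
}
```

The saving over JSON comes from the single-character markers: a boolean costs one byte and a string needs no surrounding quotes or escaping.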

So the message objects in the V3 SDKs and server looked like:

{
    "topic": "R",
    "action": "S",
    "data": ["A", "recordName"]
}

V4 - Binary:

The binary protocol is implemented using protobuf. We chose protobuf because of its wide support across languages, how easy it makes defining message formats and how quickly we managed to get it implemented.

The main message is simply this:

message Message {
  TOPIC topic = 2;
  bytes message = 3;
}

Individual messages, meanwhile, use a combination of an action enum and fields.

For example, the event message looks something like this:

message EventMessage {
    required EVENT_ACTION action = 1;
    string data = 2;
    string correlationId = 3;
    bool isError = 4;
    bool isAck = 5;
    string name = 6;

    repeated string names = 7;
    string subscription = 8;

    TOPIC originalTOPIC = 10;
    EVENT_ACTION originalAction = 11;
}

An example of the representation a deepstream message gets translated into within the JS SDKs looks like this:

{
    "topic": 3,
    "action": 2,
    "isAck": true,
    "name": "event"
}

This makes writing code a lot easier. At the time of writing, the full message API that can be consumed is as follows:

export interface Message {
    topic: TOPIC
    action: ALL_ACTIONS
    name?: string

    isError?: boolean
    isAck?: boolean

    data?: string | Buffer
    parsedData?: RecordData | RPCResult | EventData | AuthData

    parseError?: false

    // listen
    subscription?: string

    originalTopic?: TOPIC | STATE_REGISTRY_TOPIC
    originalAction?: ALL_ACTIONS
    names?: Array<string>
    reason?: string

    // connection
    url?: string
    protocolVersion?: string

    // record
    isWriteAck?: boolean
    correlationId?: string
    path?: string
    version?: number
    versions?: { [index: string]: number }

    // state
    checksum?: number
    fullState?: Array<string>
    serverName?: string
    registryTopic?: TOPIC

    // cluster
    leaderScore?: number
    externalUrl?: string
    role?: string

    // lock
    locked?: boolean
}

Using this approach has made adding new features and maintaining current ones significantly easier. And given the combination of TOPICs and ACTIONs, we can pretty much ensure we'll be able to extend it without running out of space any time soon.

Cons

It wouldn't be fair to say that this overhaul has no downsides. There have been some sacrifices that we had to make along the way.

1) If you count messages in the billions, those extra bytes add up. Data bandwidth is quite expensive on cloud systems, so lack of compression isn't just a latency issue anymore. Protobuf's compact binary encoding beats JSON objects on size in most cases.

Why yet another proprietary standard?

Because deepstream offers some very specific features, and has a lot more on the way. For example, we currently have a unique concept such as listening. Trying to use an existing realtime standard (of which there aren't many) would seriously hinder development. That being said, deepstream allows swapping out protocols quite easily as long as there's an interop layer, so feel free to create compatibility protocols to work with your favourite SDKs!

Offline Storage

Offline storage is probably the biggest feature in 4.0, so I’m really happy to say it has been added. But offline storage is one of the hardest things we worked on due to the insane number of states it introduces. So it’s with a bit of regret that I say you should not use it if you want to immediately go into production! What would be extremely helpful is if you have it enabled in development in case you run into issues, and hopefully once all small glitches are resolved I’ll release a 4.1 with it being officially production ready. If you are using a data pattern where you don’t have to do updates via deepstream (only consuming messages for visual realtime updates) then ignore that, production ready it is!

So why use it at all? Because it gives you full record usage without a connection. Pretty slick!

The way offline works is as follows (this is just one path, but the most likely one):

  • User opens the app for the first time; data is requested from the server and stored on the client side.
  • User loses connection, but from an app perspective functionality remains the same.
  • User updates multiple things while offline, which marks the record as dirty and updates the value in local storage.
  • User is back online.
  • Deepstream requests the version of the record on the server. If it’s the same as the local one, it sends all the modifications as the next update; if it isn’t, it requests the data and resolves the merge conflict.

The reason why it would be production ready for read only scenarios is because the record is never marked as dirty, which means server side always wins:

  • User opens the app for the first time; data is requested from the server and stored on the client side.
  • User loses connection, but from an app perspective functionality remains the same.
  • User is back online.
  • Deepstream requests the version of the record on the server. If it’s the same as the local one, it doesn’t do anything more. If it isn’t, it requests the data and assumes it’s the latest (using a remote wins algorithm).
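
The two reconnect paths above can be sketched as a single decision function. Everything here is hypothetical and for illustration only: the `LocalRecord`, `Remote` and `MergeStrategy` types, the `resync` function, and the assumption that "the next update" means the current version plus one. The calls are synchronous to keep the sketch short, whereas the real round trips to the server are asynchronous.

```typescript
interface LocalRecord {
  version: number
  data: unknown
  dirty: boolean // true once the record was modified while offline
}

interface RemoteRecord { version: number, data: unknown }

// Hypothetical stand-in for the server round trips.
interface Remote {
  head: (name: string) => number          // version currently on the server
  read: (name: string) => RemoteRecord    // full record from the server
  update: (name: string, version: number, data: unknown) => void
}

type MergeStrategy = (local: LocalRecord, remote: RemoteRecord) => unknown

function resync (name: string, local: LocalRecord, remote: Remote, merge: MergeStrategy): LocalRecord {
  const remoteVersion = remote.head(name)

  if (remoteVersion === local.version) {
    // Nothing changed on the server while we were away.
    if (!local.dirty) return local
    const next = local.version + 1
    remote.update(name, next, local.data)
    return { version: next, data: local.data, dirty: false }
  }

  const latest = remote.read(name)
  if (!local.dirty) {
    // Read-only path: the remote copy always wins.
    return { version: latest.version, data: latest.data, dirty: false }
  }

  // Both sides changed: resolve the conflict, then write the result back.
  const merged = merge(local, latest)
  const next = latest.version + 1
  remote.update(name, next, merged)
  return { version: next, data: merged, dirty: false }
}
```

Only the last branch needs a merge strategy, which is why the read-only scenario is the simpler one: a record that is never dirty can never reach it.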

Typescript

We converted the majority of the codebase to TypeScript, for the benefit of future code maintenance as well as making it easier for people to contribute. This also means consumers of the SDK can now directly use the generated declaration files when installing deepstream rather than maintaining separate bindings.

Services

We added a few services to improve the way things work in the client.

Timer Service

We now have a timer service that all timers in the SDK are registered against, rather than using the native Node.js timeouts. This gives us two benefits. First off, it’s just generally much quicker: if you do a CPU profile of native timeouts you’ll see they take up a noticeable amount of time, whereas we use a single interval to poll the timeout queue. Secondly, it allows us to easily deal with timing slips. What this means is that in the future, rather than timeouts being fired much later due to the CPU being blocked, the timer registry can allow certain timeouts to be either ignored or reset.
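
The idea of one interval polling a queue of timeouts can be sketched roughly as follows. This is not the SDK's actual timer service: the `TimerRegistry` class, its method names and the injectable clock are assumptions made so the example stays small and can be driven without real time.

```typescript
interface RegisteredTimeout {
  id: number
  dueAt: number
  callback: () => void
}

// Hypothetical registry: many logical timeouts, one native interval.
class TimerRegistry {
  private queue: RegisteredTimeout[] = []
  private nextId = 0
  private now: () => number

  // The clock is injectable so the registry can be tested deterministically.
  constructor (now: () => number = Date.now) {
    this.now = now
  }

  public add (callback: () => void, duration: number): number {
    const id = this.nextId++
    this.queue.push({ id, dueAt: this.now() + duration, callback })
    return id
  }

  public remove (id: number): boolean {
    const index = this.queue.findIndex(timeout => timeout.id === id)
    if (index === -1) return false
    this.queue.splice(index, 1)
    return true
  }

  // Fires every timeout that is due; intended to be called from the single interval.
  public poll (): void {
    const time = this.now()
    const due = this.queue.filter(timeout => timeout.dueAt <= time)
    this.queue = this.queue.filter(timeout => timeout.dueAt > time)
    due.forEach(timeout => timeout.callback())
  }

  // Starts the one native interval and returns a function that stops it.
  public start (pollInterval: number = 10): () => void {
    const handle = setInterval(() => this.poll(), pollInterval)
    return () => clearInterval(handle)
  }
}
```

Because `poll` sees how late each timeout is relative to `dueAt`, a registry like this is also the natural place to decide whether a badly delayed timeout should still fire, be skipped or be reset.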

Bulk Subscription Service

We now register our subscriptions via a service rather than directly sending a subscription message. This allows us to generate a single subscription message for up to thousands of records, with a single ack rather than thousands.
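
A minimal sketch of the batching idea, assuming subscriptions are collected in a buffer and flushed as one message. The `BulkSubscriptionBuffer` class, the shape of `BulkSubscribeMessage` and the `'SUBSCRIBE'` action label are illustrative assumptions. Only the `names` and `correlationId` field names are borrowed from the message interface shown earlier.

```typescript
interface BulkSubscribeMessage {
  action: 'SUBSCRIBE'    // hypothetical action label
  names: string[]        // every record name in this batch
  correlationId: string  // lets a single ack be matched to the whole batch
}

class BulkSubscriptionBuffer {
  private pending = new Set<string>()
  private counter = 0
  private send: (message: BulkSubscribeMessage) => void

  constructor (send: (message: BulkSubscribeMessage) => void) {
    this.send = send
  }

  // Queues a name instead of sending a message straight away.
  public subscribe (name: string): void {
    this.pending.add(name)
  }

  // Sends everything queued so far as one message; meant to run once per flush window.
  public flush (): void {
    if (this.pending.size === 0) return
    const message: BulkSubscribeMessage = {
      action: 'SUBSCRIBE',
      names: Array.from(this.pending),
      correlationId: String(this.counter++)
    }
    this.pending.clear()
    this.send(message)
  }
}
```

Using a Set means that subscribing to the same name several times within one flush window still results in a single entry in the outgoing message.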

Connection Service

We now have a connection service that is driven by a state machine that can be consumed by any class to send messages as well as listen to any connection lifecycle change.

Current API hooks for reconnection logic are:

public onLost (callback: () => void): void
public onReestablished (callback: () => void): void
public onExitLimbo (callback: () => void): void

Those who have looked into the SDK internals before will notice the introduction of a limbo state. This means the connection was just lost, but you don’t want API calls to immediately start failing, as a reconnect is likely to happen right away. As such, feature developers can now buffer those requests until either the connection is reestablished, or the buffer timeout is exceeded and all API calls fail with a not connected error.
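
Here is a rough sketch of how a feature could buffer requests during limbo. The `LimboBuffer` class, its three state names and the `handle...` methods are hypothetical. In practice the handlers would be registered through the three hooks listed above, and the buffer timeout would be whatever triggers the exit from limbo.

```typescript
type LimboState = 'OPEN' | 'LIMBO' | 'OFFLINE'

interface BufferedRequest {
  run: () => void
  fail: (error: Error) => void
}

class LimboBuffer {
  private state: LimboState = 'OPEN'
  private buffered: BufferedRequest[] = []

  // Runs immediately when connected, buffers during limbo, fails once offline.
  public request (run: () => void, fail: (error: Error) => void): void {
    if (this.state === 'OPEN') {
      run()
    } else if (this.state === 'LIMBO') {
      this.buffered.push({ run, fail })
    } else {
      fail(new Error('not connected'))
    }
  }

  // Connection just dropped: start buffering instead of failing.
  public handleLost (): void {
    this.state = 'LIMBO'
  }

  // Reconnected in time: replay everything that was buffered.
  public handleReestablished (): void {
    this.state = 'OPEN'
    this.drain().forEach(entry => entry.run())
  }

  // Buffer timeout exceeded: fail everything that was waiting.
  public handleExitLimbo (): void {
    this.state = 'OFFLINE'
    this.drain().forEach(entry => entry.fail(new Error('not connected')))
  }

  private drain (): BufferedRequest[] {
    const entries = this.buffered
    this.buffered = []
    return entries
  }
}
```

From the caller's point of view, a short connection blip then looks like a slightly slower request rather than an error.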