v4.0.0 - JavaScript SDK

Features:

  • New binary protocol support (under the hood)
  • Bulk actions support (under the hood)
  • Full TypeScript declaration files
  • Promises everywhere! Long live async/await!
  • Offline record support
{
    offlineEnabled: false,
    saveUpdatesOffline: false,
    indexdb: {
        dbVersion: 1,
        primaryKey: 'id',
        storageDatabaseName: 'deepstream',
        defaultObjectStoreName: 'records',
        objectStoreNames: [],
        ignorePrefixes: [],
        flushTimeout: 50
    }
}
  • Customizable offline storage support
export type offlineStoreWriteResponse = ((error: string | null, recordName: string) => void)

export interface RecordOfflineStore {
  isReady: boolean,
  get: (recordName: string, callback: ((recordName: string, version: number, data: RecordData) => void)) => void
  set: (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse) => void
  delete: (recordName: string, callback: offlineStoreWriteResponse) => void
}
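To give an idea of what a custom store has to provide, here is a minimal in-memory sketch of the `RecordOfflineStore` interface above. `MemoryOfflineStore` is a hypothetical name, `RecordData` is assumed to be any JSON value, and reporting a missing record as version `-1` with `null` data is an assumption rather than documented SDK behaviour. Callbacks fire synchronously here, whereas an IndexedDB-backed store would call them asynchronously.

```typescript
// Assumption: RecordData is any JSON-serialisable value.
type RecordData = string | number | boolean | null | RecordData[] | { [key: string]: RecordData }

type offlineStoreWriteResponse = ((error: string | null, recordName: string) => void)

interface RecordOfflineStore {
  isReady: boolean,
  get: (recordName: string, callback: ((recordName: string, version: number, data: RecordData) => void)) => void
  set: (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse) => void
  delete: (recordName: string, callback: offlineStoreWriteResponse) => void
}

// Hypothetical store that keeps records in a Map for the lifetime of the page.
class MemoryOfflineStore implements RecordOfflineStore {
  public isReady = true
  private records = new Map<string, { version: number, data: RecordData }>()

  get (recordName: string, callback: (recordName: string, version: number, data: RecordData) => void): void {
    const entry = this.records.get(recordName)
    if (entry === undefined) {
      // Assumption: a missing record is reported as version -1 with null data.
      callback(recordName, -1, null)
      return
    }
    callback(recordName, entry.version, entry.data)
  }

  set (recordName: string, version: number, data: RecordData, callback: offlineStoreWriteResponse): void {
    this.records.set(recordName, { version, data })
    callback(null, recordName)
  }

  delete (recordName: string, callback: offlineStoreWriteResponse): void {
    this.records.delete(recordName)
    callback(null, recordName)
  }
}
```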

Improvements

  • Separation of errors and warnings for clarity. Non-critical failures (such as an ack timeout) can now be handled separately or muted entirely.
  • Enhanced services to reduce timeout overhead

Backwards compatibility

  • Only works with a V4 server
  • All single-response APIs now return a promise when no callback is provided. This means most code that chained these calls will now break.
// Assumes `deepstream` is the client factory from the V4 SDK; the server
// URL and the `name` value are placeholders.
async function main () {
    const client = deepstream('localhost:6020')
    await client.login()

    const name = 'users/alice'
    const record = client.record.getRecord(name)
    await record.whenReady()

    try {
        const data = await client.record.snapshot(name)
        const version = await client.record.head(name)
        const exists = await client.record.has(name)
        const result = await client.rpc.make(name, data)
        const users = await client.presence.getAll()
    } catch (e) {
        console.log('Error occurred', e)
    }
}

main()
  • Listening

The listening API has been ever so slightly tweaked in order to simplify removing an active subscription.

Previously, when an active provider was started, you would usually need to store its state in a higher scope, for example:

const listeners = new Map()

// updateRecord(name) stands for your own code that publishes fresh data
client.record.listen('users/.*', (name, isSubscribed, { accept, reject }) => {
    if (isSubscribed) {
        const updateInterval = setInterval(() => updateRecord(name), 1000)
        listeners.set(name, updateInterval)
        accept()
    } else {
        clearInterval(listeners.get(name))
        listeners.delete(name)
    }
})

Now we can instead do:

// updateRecord(name) stands for your own code that publishes fresh data
client.record.listen('users/.*', (name, { accept, reject, onStop }) => {
    const updateInterval = setInterval(() => updateRecord(name), 1000)
    accept()

    // runs once deepstream no longer needs this provider
    onStop(() => clearInterval(updateInterval))
})
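The promise change from the backwards compatibility notes follows a common convention: a single-response API calls the callback when one is passed and otherwise returns a promise. The sketch below is a hypothetical illustration of that pattern, not the SDK's code; `snapshot`, `fetchSnapshot` and the `(error, result)` callback shape are assumptions.

```typescript
type Snapshot = { [key: string]: unknown }
type Callback<T> = (error: string | null, result?: T) => void

// Hypothetical low-level operation that reports through a callback.
// Simulated asynchronous lookup: an empty name fails, anything else succeeds.
function fetchSnapshot (name: string, done: Callback<Snapshot>): void {
  Promise.resolve().then(() => {
    if (name === '') {
      done('invalid record name')
    } else {
      done(null, { name })
    }
  })
}

// With a callback nothing is returned; without one, a promise is.
function snapshot (name: string): Promise<Snapshot>
function snapshot (name: string, callback: Callback<Snapshot>): void
function snapshot (name: string, callback?: Callback<Snapshot>): Promise<Snapshot> | void {
  if (callback) {
    fetchSnapshot(name, callback)
    return
  }
  return new Promise<Snapshot>((resolve, reject) => {
    fetchSnapshot(name, (error, result) => {
      if (error !== null) {
        reject(new Error(error))
      } else {
        resolve(result as Snapshot)
      }
    })
  })
}
```

Because the return value changes from "something chainable" to a promise (or nothing), any code that chained these calls has to be rewritten with `await` or `.then`.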

TLDR:

Binary Protocol

The driver behind pretty much all of the V4 refactor was our move from the old text-based protocol to a binary one. Before you ask: while we might add actual binary data support to deepstream, the payloads inside the binary messages are currently still JSON. But it makes building SDKs and new features so much easier. Seriously. LIKE SO MUCH EASIER.

Okay, first things first: the structure of text vs binary messages.

V3 - Text:

TOPIC | ACTION | meta1 | meta2 | ...metaN | payload +

The parser read the initial TOPIC and ACTION of this string to find out where to route it, and the rest of the data was interpreted within the code module that dealt with it. This had some benefits, like only parsing a full message once it's actually required, but it also meant that the message parsing code was distributed, and adding a meta field, for example, required lots of refactoring. Tests also had to create text-based messages even when testing internal code paths. Payload serialization also didn't use JSON, but a custom form of serialization to minimize bandwidth: U for undefined, T for true, F for false, O for object, an S prefix for strings and an N prefix for numbers.

So the message objects in the V3 SDKs and server looked like this:

{
    "topic": "R",
    "action": "S",
    "data": ["A", "recordName"]
}
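To make the V3 format concrete, here is a rough sketch of splitting raw text into these message objects and decoding the typed payload values. It assumes `|` and `+` stand in for the part and message separators (on the wire these were non-printable control characters) and that an `O` prefix is followed by JSON; `parseV3` and `convertTyped` are hypothetical names.

```typescript
// Assumption: '|' and '+' stand in for the V3 part and message separators.
const PART_SEPARATOR = '|'
const MESSAGE_SEPARATOR = '+'

interface V3Message {
  topic: string
  action: string
  data: string[]
}

// Only TOPIC and ACTION are interpreted; the remaining parts are left as
// raw strings for whichever module handles the message.
function parseV3 (raw: string): V3Message[] {
  return raw
    .split(MESSAGE_SEPARATOR)
    .filter((message) => message.length > 0)
    .map((message) => {
      const [topic, action, ...data] = message.split(PART_SEPARATOR)
      return { topic, action, data }
    })
}

// Decodes the typed payload serialization described above.
function convertTyped (value: string): unknown {
  const type = value.charAt(0)
  const rest = value.slice(1)
  switch (type) {
    case 'S': return rest
    case 'N': return parseFloat(rest)
    case 'T': return true
    case 'F': return false
    case 'U': return undefined
    case 'O': return JSON.parse(rest)
    default: throw new Error(`unknown type prefix: ${type}`)
  }
}
```

Note how the payload stays an opaque string until some module calls `convertTyped` on it, which is exactly the "parse only when required" trade-off described above.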

V4 - Binary:

 /*
 *  0                   1                   2                   3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-------------+-+-------------+-------------------------------+
 * |F|  Message    |A|  Message    |             Meta              |
 * |I|   Topic     |C|  Action     |            Length             |
 * |N|    (7)      |K|   (7)       |             (24)              |
 * +-+-------------+-+-------------+-------------------------------+
 * | Meta Cont.    |              Payload Length (24)              |
 * +---------------+-----------------------------------------------+
 * :                     Meta Data (Meta Length * 8)               :
 * + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 * |                  Payload Data (Payload Length * 8)            :
 * +---------------------------------------------------------------+
 *
 * The first 8 bytes of the message are the header; the Meta Data and
 * then the Payload Data follow it.
 *
 * CONT (1 bit): The continuation bit. If this is set, the payload of
 * the following message must be appended to this one.
 * If this is not set, parsing may finish after the payload is read.
 *
 * RSV{0..3} (1 bit): Reserved for extension.
 *
 * Meta Length (24 bits, unsigned big-endian): The total length of
 *                Meta Data in bytes.
 *                Meta Data can be no longer than 16 MB.
 *
 * Payload Length (24 bits, unsigned big-endian): The total length of 
 *                Payload in bytes.
 *                If Payload is longer than 16 MB, it must be split into 
 *                chunks of less than 2^24 bytes with identical topic and
 *                action, setting the CONT bit in all but the final chunk.
 */
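Reading the diagram as 1-byte topic and action fields (each with a flag in the high bit) followed by two 24-bit big-endian lengths, the 8-byte header could be encoded and decoded roughly as below. This is a sketch based only on the diagram; the field and function names are illustrative, and the real SDK parser may differ.

```typescript
// Field layout taken from the diagram above; names are illustrative.
interface MessageHeader {
  fin: boolean
  topic: number         // 7 bits
  isAck: boolean
  action: number        // 7 bits
  metaLength: number    // 24 bits
  payloadLength: number // 24 bits
}

const HEADER_LENGTH = 8
const MAX_UINT24 = 0xffffff

function writeUint24 (bytes: Uint8Array, offset: number, value: number): void {
  if (value < 0 || value > MAX_UINT24 || Math.floor(value) !== value) {
    throw new RangeError(`value ${value} does not fit in 24 bits`)
  }
  bytes[offset] = (value >>> 16) & 0xff
  bytes[offset + 1] = (value >>> 8) & 0xff
  bytes[offset + 2] = value & 0xff
}

function readUint24 (bytes: Uint8Array, offset: number): number {
  return (bytes[offset] << 16) | (bytes[offset + 1] << 8) | bytes[offset + 2]
}

function encodeHeader (header: MessageHeader): Uint8Array {
  if (header.topic < 0 || header.topic > 0x7f || header.action < 0 || header.action > 0x7f) {
    throw new RangeError('topic and action must fit in 7 bits')
  }
  const bytes = new Uint8Array(HEADER_LENGTH)
  // High bit carries the flag, low 7 bits the topic / action id.
  bytes[0] = (header.fin ? 0x80 : 0) | header.topic
  bytes[1] = (header.isAck ? 0x80 : 0) | header.action
  writeUint24(bytes, 2, header.metaLength)
  writeUint24(bytes, 5, header.payloadLength)
  return bytes
}

function decodeHeader (bytes: Uint8Array): MessageHeader {
  if (bytes.length < HEADER_LENGTH) {
    throw new RangeError('incomplete header')
  }
  return {
    fin: (bytes[0] & 0x80) !== 0,
    topic: bytes[0] & 0x7f,
    isAck: (bytes[1] & 0x80) !== 0,
    action: bytes[1] & 0x7f,
    metaLength: readUint24(bytes, 2),
    payloadLength: readUint24(bytes, 5)
  }
}
```

Packing the ACK flag into the action byte is what lets a parser tell an acknowledgement apart from a regular message with a single bit mask.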

The binary protocol carries UTF-8 content, with some bit packing for things like ACKs to make parsing easier. The only place deepstream actually creates or sees this binary format is the parser itself, meaning that as far as the rest of the code is concerned, the wire protocol can change at any time.

The objects used within the V4 SDKs and server look like this:

{
    "topic": 3,
    "action": 2,
    "isAck": true,
    "name": "recordName"
}

This makes writing code a lot easier. At the time of writing, our full message API, which can be consumed by any SDK, is as follows:

export interface Message {
    topic: TOPIC
    action: ALL_ACTIONS
    name?: string

    isError?: boolean
    isAck?: boolean

    isBulk?: boolean
    bulkId?: number
    bulkAction?: ALL_ACTIONS

    data?: string | Buffer
    parsedData?: RecordData | RPCResult | EventData | AuthData
    payloadEncoding?: PAYLOAD_ENCODING

    parseError?: false

    raw?: string | Buffer

    originalTopic?: TOPIC
    originalAction?: ALL_ACTIONS
    subscription?: string
    names?: Array<string>
    isWriteAck?: boolean
    correlationId?: string
    path?: string
    version?: number
    reason?: string
    url?: string
    protocolVersion?: string
}

Using this approach has made adding new features and maintaining current ones significantly easier. And given the combination of TOPICs and ACTIONs (7 bits each, so up to 128 of each), we can pretty much guarantee we'll be able to extend it without running out of space any time soon.

Cons

It wouldn't be fair to say that this overhaul has no downsides. There have been some sacrifices that we had to make along the way.

1) If you count messages in the billions, those extra control bytes add up. Data bandwidth is quite expensive on cloud systems so lack of compression isn't just a latency issue anymore.

2) Our meta data is a JSON object. Its structure is predefined, meaning we can use a much more optimal parser than the built-in ones, and we minimize space by using abbreviations for the metadata names. However, it still takes longer to parse and more bandwidth to transfer. There are optimizations planned to move all of this further down into C++ land to reduce the load on the main Node thread, but for now it's a small step back in raw performance.

Why yet another proprietary protocol?

Because deepstream offers some very specific features, and has a lot more on the way. For example, we have unique concepts such as listening. We are also looking to release a monitoring topic in the 4.1 release, better OS clustering integration in 4.2 and an admin API in 4.3. Tying ourselves to another stack means we unfortunately couldn't move as quickly as we want with these features.

Offline Storage

Offline storage is probably the biggest feature in 4.0, so I’m really happy to say it has been added. But offline storage is also one of the hardest things we worked on, due to the insane number of states it introduces. So it’s with a bit of regret that I say you should not use it if you want to go into production immediately! What would be extremely helpful is if you enable it in development, in case you run into issues, and hopefully once all the small glitches are resolved I’ll release a 4.1 with it being officially production ready. If you are using a data pattern where you don’t do updates via deepstream (only consuming messages for realtime visual updates) then ignore that: production ready it is!

So why use it at all? Because it gives you full record usage without a connection. Pretty slick!

The way offline works is as follows (this is just one path, but the most likely one):

  • User opens app first time, data is requested from server and stored on client side.
  • User loses connection to app, but from an app perspective functionality remains the same
  • User updates multiple things while offline; the record is marked as dirty and the new value is written to local storage
  • User is back online
  • The client requests the record’s version from the deepstream server. If it’s the same as the local one, it sends all the modifications as the next update; if it isn’t, it requests the data and resolves the merge conflict.

The reason why it would be production ready for read only scenarios is because the record is never marked as dirty, which means server side always wins:

  • User opens app first time, data is requested from server and stored on client side.
  • User loses connection to app, but from an app perspective functionality remains the same
  • User is back online
  • The client requests the record’s version from the deepstream server. If it’s the same as the local one there is nothing to do; if it isn’t, it requests the data and simply replaces the local copy.
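Both flows boil down to one decision per record once the connection is back. The sketch below is a hypothetical model of that decision, not the SDK's implementation: `LocalRecord`, `reconcile` and the assumption that local changes go out as version `local.version + 1` are illustrative.

```typescript
// Hypothetical shape; the SDK's internal representation will differ.
interface LocalRecord {
  name: string
  version: number // last version known to match the server
  data: unknown
  dirty: boolean  // set when the record was modified while offline
}

type ReconnectAction =
  | { kind: 'send-update', version: number, data: unknown }
  | { kind: 'merge' }
  | { kind: 'up-to-date' }
  | { kind: 'replace-with-remote' }

// Decides what to do with a cached record given the version the server reports.
function reconcile (local: LocalRecord, remoteVersion: number): ReconnectAction {
  if (local.dirty) {
    // Same base version: the offline modifications go out as the next update.
    if (remoteVersion === local.version) {
      return { kind: 'send-update', version: local.version + 1, data: local.data }
    }
    // Someone else changed the record meanwhile: fetch it and merge.
    return { kind: 'merge' }
  }
  // Read-only usage: the record is never dirty, so the server always wins.
  return remoteVersion === local.version ? { kind: 'up-to-date' } : { kind: 'replace-with-remote' }
}
```

The read-only path never reaches the `merge` branch, which is why that scenario is far less risky than offline writes.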

TypeScript

We converted the majority of the codebase to TypeScript, for the benefit of future code maintenance as well as to make it easier for people to contribute. This also means consumers of the SDK can now directly use the generated declaration files when installing deepstream rather than maintaining separate bindings.

Services

We added a few services to improve the way things work in the client.

Timer Service

We now have a timer service that all timers in the SDK are registered against, rather than using native Node.js timeouts. This gives us two benefits. First off, it’s generally much quicker: if you do a CPU profile of native timeouts you’ll notice the time they take is significant, while we instead have a single interval that polls the timeout queue. Secondly, it allows us to easily deal with timing slips. This means that in the future, rather than timeouts firing much later because the CPU was blocked, the timer registry can either ignore or reset certain timeouts.
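A minimal sketch of the idea, assuming nothing about the SDK's actual timer registry API: timers are kept in one queue, a single interval calls `tick()`, and `tick()` fires everything that is due. `TimerRegistry` and its methods are hypothetical names, and the clock is injectable so the behaviour can be tested without waiting.

```typescript
interface TimerEntry {
  id: number
  deadline: number
  callback: () => void
}

// Hypothetical registry: one shared interval polls a queue of deadlines
// instead of creating a native timeout per SDK timer.
class TimerRegistry {
  private timers: TimerEntry[] = []
  private nextId = 1

  // now() is injectable so the registry can be driven deterministically.
  constructor (private now: () => number = () => Date.now()) {}

  add (durationMs: number, callback: () => void): number {
    const id = this.nextId++
    this.timers.push({ id, deadline: this.now() + durationMs, callback })
    return id
  }

  remove (id: number): boolean {
    const before = this.timers.length
    this.timers = this.timers.filter((timer) => timer.id !== id)
    return this.timers.length !== before
  }

  // Fires every timer whose deadline has passed. This is also the place where
  // a registry could detect timers that slipped and skip or reset them.
  tick (): void {
    const current = this.now()
    const due = this.timers.filter((timer) => timer.deadline <= current)
    this.timers = this.timers.filter((timer) => timer.deadline > current)
    for (const timer of due) {
      timer.callback()
    }
  }

  // Starts the single polling interval; returns a function that stops it.
  start (pollMs: number = 20): () => void {
    const handle = setInterval(() => this.tick(), pollMs)
    return () => clearInterval(handle)
  }
}
```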

Bulk Subscription Service

We now register our subscriptions via a service rather than directly sending a subscription message. This allows us to generate a single subscription message for up to thousands of records, with a single ack rather than thousands.
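The batching idea can be sketched as a service that collects subscription names and sends them as one message when flushed. `BulkSubscriptionService`, the message shape and the manual `flush()` call are assumptions for illustration; in practice a flush would presumably be triggered on a short timer.

```typescript
// Hypothetical bulk message shape, loosely modelled on the `names`
// field of the V4 Message interface.
interface BulkSubscribeMessage {
  topic: string
  action: 'SUBSCRIBE'
  names: string[]
  correlationId: number
}

class BulkSubscriptionService {
  private pending = new Set<string>()
  private nextCorrelationId = 1

  constructor (
    private topic: string,
    private send: (message: BulkSubscribeMessage) => void
  ) {}

  // Queues a subscription instead of sending it immediately; duplicates collapse.
  subscribe (name: string): void {
    this.pending.add(name)
  }

  // Sends everything queued since the last flush as one message, which the
  // server can acknowledge once. Returns the number of names sent.
  flush (): number {
    if (this.pending.size === 0) return 0
    const names = Array.from(this.pending)
    this.pending.clear()
    this.send({ topic: this.topic, action: 'SUBSCRIBE', names, correlationId: this.nextCorrelationId++ })
    return names.length
  }
}
```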

Connection Service

We now have a connection service that is driven by a state machine that can be consumed by any class to send messages as well as listen to any connection lifecycle change.

Current API hooks for reconnection logic are:

public onLost (callback: () => void): void
public onReestablished (callback: () => void): void
public onExitLimbo (callback: () => void): void

Those who have looked into the SDK internals before will notice the introduction of a limbo state. It means the connection was just lost, but you don’t want API calls to start failing straight away, since a reconnect is likely to happen almost immediately. Feature developers can therefore buffer those requests until either the connection is reestablished or the buffer timeout is exceeded, at which point all API calls fail with a not connected error.
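The limbo behaviour can be modelled as a small state machine. In this hypothetical sketch, messages sent in limbo are buffered, flushed in order on reconnect, or failed with a not connected error once the buffer timeout expires. It assumes `onExitLimbo` fires only on that timeout, and `connectionLost`, `connectionReestablished` and `limboTimeout` are illustrative hooks that a transport and timer would call; none of this is the SDK's actual code.

```typescript
type ConnectionState = 'OPEN' | 'LIMBO' | 'CLOSED'

interface BufferedMessage {
  message: string
  fail: (error: Error) => void
}

// Hypothetical connection service modelling the limbo state.
class ConnectionService {
  private state: ConnectionState = 'OPEN'
  private buffer: BufferedMessage[] = []
  private lost: Array<() => void> = []
  private reestablished: Array<() => void> = []
  private exitLimbo: Array<() => void> = []

  // transport writes a message to the socket; injected to keep the sketch testable.
  constructor (private transport: (message: string) => void) {}

  public onLost (callback: () => void): void { this.lost.push(callback) }
  public onReestablished (callback: () => void): void { this.reestablished.push(callback) }
  public onExitLimbo (callback: () => void): void { this.exitLimbo.push(callback) }

  public getState (): ConnectionState { return this.state }

  // Sends immediately when open, buffers in limbo, fails fast when closed.
  public send (message: string, fail: (error: Error) => void): void {
    if (this.state === 'OPEN') {
      this.transport(message)
    } else if (this.state === 'LIMBO') {
      this.buffer.push({ message, fail })
    } else {
      fail(new Error('not connected'))
    }
  }

  // Socket dropped: enter limbo instead of failing calls straight away.
  public connectionLost (): void {
    if (this.state !== 'OPEN') return
    this.state = 'LIMBO'
    this.lost.forEach((callback) => callback())
  }

  // Socket is back: flush anything buffered during limbo, in order.
  public connectionReestablished (): void {
    this.state = 'OPEN'
    const pending = this.buffer
    this.buffer = []
    pending.forEach(({ message }) => this.transport(message))
    this.reestablished.forEach((callback) => callback())
  }

  // Buffer timeout expired without a reconnect: fail everything buffered.
  public limboTimeout (): void {
    if (this.state !== 'LIMBO') return
    this.state = 'CLOSED'
    const pending = this.buffer
    this.buffer = []
    pending.forEach(({ fail }) => fail(new Error('not connected')))
    this.exitLimbo.forEach((callback) => callback())
  }
}
```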