3D camera streaming with JavaScript and Chrome
- Written by Omar Alan Osorio
Depth cameras such as Microsoft's Kinect or the PrimeSense-based ASUS Xtion are devices that provide environment information in different channels, mainly RGB and depth distance. To enable users to utilize these devices, developers must provide installable drivers for each user's operating system. In some cases, these drivers are bound to the vendor's frameworks and are therefore closed to modification or porting. This becomes the first challenge: how can we enable this technology on a user's personal computer in the simplest, most compatible way?
_ Google's Chrome Apps API
Chrome Apps allow web developers to securely access some operating system components that are not available through standard web browsers: the file system, network sockets, and USB connections, to mention a few. Connecting a USB 3D camera via a Chrome App will be our first approach to eliminating the user's need to install drivers per device, per operating system.
Opening a connection to a USB device through Chrome’s API can be as simple as:
# Look for a PrimeSense device by its vendor and product ids
filter = { vendorId: 0x1d27, productId: 0x0601 }
chrome.usb.getDevices filter, (chrome_devices) =>
  chrome_device = chrome_devices[0]
  chrome.usb.openDevice chrome_device, (connection) =>
    if connection
      # opened!
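Once the device is open, Chrome also requires the interface that owns the endpoints to be claimed before any transfer is made. A minimal sketch, assuming the camera exposes its endpoints on interface 0 (check chrome.usb.listInterfaces for the actual one):

# Claim the interface before issuing transfers; interface 0 is an assumption.
chrome.usb.claimInterface connection, 0, =>
  if chrome.runtime.lastError
    console.error 'Could not claim interface:', chrome.runtime.lastError.message
  else
    # ready to transfer data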
There are different ways to send data to, and receive data from, a USB device:
- Control Transfers
- Isochronous Transfers
- Bulk Transfers
Deciding which one to use is completely up to the vendor's specifications. For example, the Kinect uses control transfers to stream the RGB and depth data, while the PrimeSense-based ASUS Xtion uses bulk transfers for the same purpose.
After working with different devices it was possible to identify a common communication process:
- Device connection
- Camera information gathering
- Configuration setup
- Streaming (initialization and finalization)
So it is possible to design a framework with an abstract definition of each step, and then implement each step for every device, as sketched below. This allows future development to incorporate more devices under the same framework.
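As a sketch of what this could look like (the class and method names here are illustrative, not an actual library API), a base class declares the common steps and each device subclasses it:

# Illustrative base class; names are assumptions, not a real API.
class DepthCamera
  connect: -> throw new Error('not implemented')              # open the USB device
  read_device_info: -> throw new Error('not implemented')     # firmware version, serial, modes
  configure: (options) -> throw new Error('not implemented')  # resolution, format, fps
  start_streaming: -> throw new Error('not implemented')
  stop_streaming: -> throw new Error('not implemented')

# Each device implements the same steps with its own transfers.
class XtionCamera extends DepthCamera
  connect: ->
    # open the device with chrome.usb as shown above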
_ PrimeSense Firmware Hack > 5.0
Once we have opened the device with the Chrome USB API, let's get the firmware version and initialize the configuration. To establish clear communication with the firmware we need to implement its host protocol. In this case, USB control transfer packets will do the work with two main sequences: one for sending data, and another for requesting data back from the device.
This is an example of a control transfer using the PrimeSense firmware's host protocol; it asks the camera for the VERSION of the firmware.
# Control Transfer OUT + Host Protocol
0x40 0x00 0x00 0x00
magic 0x00 0x00 0x00 0x00 0x00 0x00

# Control Transfer IN
0xC0 0x00 0x00 0x00 0x20 0x00

# Response
magic data_size 0x00 0x00
minor major build chip fpga sysversion
Making these transfers with the Chrome API is as simple as:
control: (rtype, req, val, idx, data_or_length) ->
  new RSVP.Promise (resolve, reject) =>
    req_obj = {
      direction: ...,    # IN or OUT, masked out of rtype
      requestType: ...,  # standard, class or vendor, masked out of rtype
      recipient: ...,    # device, interface or endpoint, masked out of rtype
      request: ...,
      value: ...,
      index: ...,
      data: ...,         # only for OUT transfers
      length: ...        # only for IN transfers
    }
    chrome.usb.controlTransfer @connection, req_obj, (info) =>
      if info?.resultCode == 0 then resolve(info.data) else reject(info?.resultCode)
The direction, requestType and recipient can be masked out of a single byte, rtype; the request, value and index must be provided. Finally, if the direction is OUT then set the data attribute; otherwise, if receiving data, set the length attribute. If the resultCode is zero then we can unpack the data.
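As a usage sketch, asking for the version follows the IN sequence shown above (request 0x00, value 0, index 0, reading 0x20 bytes); the response field offsets below are assumptions, not the exact PrimeSense layout:

# Illustrative sketch; response offsets are assumptions for this example.
get_version: ->
  @control(0xC0, 0x00, 0x0000, 0x0000, 0x20).then (data) ->
    view = new DataView(data)  # data is the ArrayBuffer resolved by control()
    {
      minor: view.getUint8(4)
      major: view.getUint8(5)
      build: view.getUint16(6, true)  # little-endian
    }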
After obtaining dozens of parameters from the firmware and setting the necessary configuration on the device, we are ready to initialize the streaming using Bulk Transfers.
bulk: (ep, data_or_length) ->
  new RSVP.Promise (resolve, reject) =>
    transfer_info = {
      direction: ...,  # IN or OUT
      endpoint: ...,   # the endpoint address, ep
      data: ...,       # only for OUT transfers
      length: ...      # only for IN transfers
    }
    chrome.usb.bulkTransfer @connection, transfer_info, (info) ->
      # resolve even on error; the caller inspects info.resultCode
      resolve(info)
Requesting a bulk transfer from a camera device can be tricky. Recall that these USB operations are all made asynchronously by Chrome (as you may have noticed already, we are using RSVP promises). In order to keep up with the real-time streaming speed and keep latency low, we need to hit the device with multiple bulk requests as fast as we can and worry about the data later. Wait, do we need threads? No. Well, not necessarily. We are going to take advantage of the asynchronous calls and set up a limited number of buffers to request the data.
A simple algorithm sets up multiple buffers and continues recursively until an error occurs or streaming is stopped manually.
# attack the device with N buffers per stream.
buffers = []
i = -1
while (i += 1) < num_buffers
  buffers[i] = buffer_info = {
    buffer_id: i
    is_queued: false
    ep_handle: ep_handle
    buffer_size: buffer_size
    kill: false
  }
  @read_single_buffer(buffer_info)

read_single_buffer: (buffer_info) ->
  return if buffer_info.kill
  buffer_info.is_queued = true
  @usb.bulk(buffer_info.ep_handle.address, buffer_info.buffer_size).then (info) =>
    buffer_info.is_queued = false
    # process buffer
    # - kill the buffer if error or requested
    # Go again, and don't stop streaming
    @read_single_buffer(buffer_info)
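Stopping the stream is then just a matter of flagging the buffers; the next recursive call returns early. A minimal sketch:

# Flag every in-flight buffer; read_single_buffer will not re-queue it.
stop_streaming: ->
  buffer_info.kill = true for buffer_info in buffers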
_ Depth and Video Streaming
We have the buffers filled up with data and it's time for some heavy processing. These are some of the algorithms we need to apply to the raw data:
1. Depth and image raw data decompression. It is a good idea to set up data compression on the firmware. This way, a single thread can keep up with small bulk transfers.
2. Depth registration. We need to match each pixel position from the depth data with the image data. Building a registration table once and only shifting the frame pixels afterwards will accelerate the process.
3. YUV to RGB image conversion. The firmware can be set up with a YUV image format, and a simple per-pixel algorithm does the conversion (see the sketch after this list).
4. Frame synchronization. If we set up multiple bulk buffers per streaming channel, we should expect to receive depth and image frames in random order, but only one frame with both depth and image data should be triggered at a time.
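For step 3, this is a per-pixel conversion sketch, assuming a YUV422 (YUY2) packing where two pixels share one U and one V byte; the exact layout depends on how the firmware was configured:

# Convert a YUV422 (YUY2: Y0 U Y1 V) frame into RGBA bytes.
# The packing order is an assumption; adjust it to the firmware's actual format.
yuv422_to_rgba = (yuv, rgba) ->
  clamp = (v) -> Math.max(0, Math.min(255, v))
  j = 0
  i = 0
  while i < yuv.length
    [y0, u, y1, v] = [yuv[i], yuv[i + 1], yuv[i + 2], yuv[i + 3]]
    for y in [y0, y1]
      rgba[j]     = clamp(y + 1.402 * (v - 128))                      # R
      rgba[j + 1] = clamp(y - 0.344 * (u - 128) - 0.714 * (v - 128))  # G
      rgba[j + 2] = clamp(y + 1.772 * (u - 128))                      # B
      rgba[j + 3] = 255                                               # alpha
      j += 4
    i += 4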
_ HTML Canvas Render
Using a simple canvas we can render the RGB data after adding a fourth byte for the alpha channel, and we can also show the depth data by building an RGBA value per pixel, as in the sketch below.
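A minimal sketch of that render step, assuming rgba is the Uint8ClampedArray produced by the conversion above and width and height match the stream's frame size:

# Draw one RGBA frame onto a 2D canvas; the canvas id is illustrative.
ctx = document.getElementById('video-canvas').getContext('2d')
image_data = ctx.createImageData(width, height)
image_data.data.set(rgba)          # copy the frame's RGBA bytes
ctx.putImageData(image_data, 0, 0)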

- Depth and image channels were blended.
- Registration is perfect (depth and image pixels match).
- The 3D camera has its own limitations, such as blind spots.
_ Three.js and PointCloud
Let's take this information to the next level and build a 3D point cloud render using Three.js. This is how we first initialize our PointCloud viewer.
canvas = $('#canvasContainer')
width = window.innerWidth
height = window.innerHeight

# Setup a 3D Scene
@scene = new THREE.Scene()
@camera = new THREE.PerspectiveCamera( 5.859066080200536, width / height, 1, 100000 )
@camera.position.z = 4000

# Use the WebGL engine for rendering
@renderer = new THREE.WebGLRenderer()
@renderer.setSize( width, height )
canvas.append( @renderer.domElement )

# Setup the PointCloud geometry
@geometry = new THREE.Geometry()
material = new THREE.PointCloudMaterial( { size: 50, vertexColors: THREE.VertexColors } )
@pointcloud = new THREE.PointCloud( @geometry, material )
@scene.add( @pointcloud )
After filling the geometry vertices with the depth channel and the geometry colors with the RGB channel, we can render the 3D input frame by frame, as sketched below. Adding an orbit controller also makes navigation even better.
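A per-frame update could look like this; it is a minimal sketch where fx, fy, cx and cy stand for assumed camera intrinsics used to back-project depth pixels into 3D, and depth and rgba are the decoded frames:

# Illustrative per-frame update; the intrinsics and frame buffers are assumptions.
update_pointcloud = (depth, rgba) =>
  @geometry.vertices = []
  @geometry.colors = []
  for y in [0...height]
    for x in [0...width]
      i = y * width + x
      z = depth[i]
      continue if z == 0  # skip blind spots
      @geometry.vertices.push new THREE.Vector3( (x - cx) * z / fx, (y - cy) * z / fy, -z )
      @geometry.colors.push new THREE.Color( rgba[i * 4] / 255, rgba[i * 4 + 1] / 255, rgba[i * 4 + 2] / 255 )
  @geometry.verticesNeedUpdate = true
  @geometry.colorsNeedUpdate = true
  @renderer.render(@scene, @camera)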

Let's try live streaming, using up to three JavaScript Web Workers to process the streaming data per channel, as sketched below. There is still a visible lag in the streaming, so there is room for improvement.
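A sketch of that setup, where one worker per channel receives raw buffers, decodes them off the main thread, and posts decoded frames back (the worker script name, channel labels and handlers are illustrative):

# Illustrative worker fan-out; 'frame_worker.js' and render_frame are assumptions.
workers = (new Worker('frame_worker.js') for channel in ['depth', 'image', 'sync'])
for worker in workers
  worker.onmessage = (e) ->
    render_frame(e.data)  # decoded frame, handled back on the main thread

# Hand a raw bulk buffer to a channel's worker, transferring ownership.
dispatch = (channel_index, raw_buffer) ->
  workers[channel_index].postMessage({ buffer: raw_buffer }, [raw_buffer])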
Now think about the web applications that JavaScript developers could implement if we turn this into an open source library available to everyone. The possibilities are broad: hand and face recognition, 3D video conferencing, 3D scenery preservation, web interaction, indoor exploration, and more.
Do you have any ideas? Share with us some applications you can think of for this kind of web technology.
