08 Jun Startup KnuEdge Emerges With Another Accelerator For Deep Learning

The market for Deep Learning Accelerators is heating up with new entrants, as I explored after NVIDIA’s traction in Artificial Intelligence (AI), and Google’s recent announcement of the somewhat mysterious TPU processor. Now KnuEdge, a startup led by former NASA administrator Dan Goldin, has emerged with its own accelerator called KNUPATH LambdaFabric, coupled with a suite of software that can identify and authenticate voices for intelligence, enterprise and consumer applications. The voice software, from their KnuVerse business unit, has been quietly in the market for some time, and the KNUPATH hardware has been sampled by a number of potential customers for several months.

While the strategy of being both a software and hardware company flies in the face of the startup mantra of “Focus, Focus, Focus”, it may prove to foster a virtuous innovation cycle. In theory, their software can teach them how to build better hardware, which in turn could deliver advantages to their software business. The company went on to disclose that they have been in development for 10 years, have raised $100M from unnamed accredited investors and have garnered over $20M in revenue from unnamed customers. (If this sounds spooky, read on.)

What did KnuEdge announce?

While much of the material had previously been made public on the two company websites, www.KNUPATH.com and www.knurld.com, the company pulled it all together, announcing a two-pronged strategy to use “neurological principles” to solve “unsolvable” problems. First is their KnuVerse portfolio of voice recognition and authentication software. The company’s roots began with this software, targeting noisy environments for passive security applications when you-know-who is listening in and wants to know who is on the phone. They have extended this impressively robust technology to provide APIs for desktop and mobile application login authentication. The market should welcome secure, pervasive voice-based login authentication, instead of dealing with a myriad of password rules, each unique to a site. (What? You mean mypw12345 isn’t secure?)


The KnuVerse software portfolio provides support for passive and active voice authentication for applications. (Source: KnuEdge)

But along the way, the company realized that their voice software would require a more scalable platform for things like real-time authentication of multiple speakers, especially if it was going to be used to listen in on millions of sound streams and run more sophisticated voice algorithms. So they set out to develop a scalable processor node built from 8 “Tiny DSPs”, or Digital Signal Processors, sharing a small memory store (2.256 MB). To enable scaling, they added a Level 1 router to interconnect these 8-DSP clusters, grouped 8 clusters into a “Supercluster”, and placed 4 Superclusters on the die with very low latency between them. If you follow the arithmetic (8 x 8 x 4), this amounts to 256 DSPs per die, all interconnected by the LambdaFabric.


The Hermosa chip delivers 256 DSPs in a package, as well as the LambdaFabric on die. (Source: KnuEdge)
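That hierarchy can be sanity-checked with a quick back-of-the-envelope calculation. This is one reading of the stated structure that reproduces the company’s 256-DSPs-per-die figure; the constant names below are illustrative, not KnuEdge terminology:

```python
# Back-of-the-envelope check of the Hermosa DSP count described above.
# Names are illustrative; this is one reading of the stated hierarchy.
DSPS_PER_CLUSTER = 8           # "Tiny DSPs" sharing the small memory store
CLUSTERS_PER_SUPERCLUSTER = 8  # interconnected by the Level 1 router
SUPERCLUSTERS_PER_DIE = 4

dsps_per_die = (DSPS_PER_CLUSTER
                * CLUSTERS_PER_SUPERCLUSTER
                * SUPERCLUSTERS_PER_DIE)
print(dsps_per_die)  # 256, matching the company's per-die figure
```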

The company went on to say that the LambdaFabric can scale this architecture to 512,000 chips across multiple racks, using optical connections. While they use PCIe within a chassis for multiple Hermosas, they provide direct connectivity between cards, keeping latency down to 247 nanoseconds (ns), and then provide optical links to leap to the next chassis, rack or row, keeping latency down to an impressive 400ns across the network.


The LambdaFabric provides scalability while keeping latency very low. (Source: KnuEdge)

What is KNUPATH good for?

Since each node has a small amount of (shared) memory between the DSPs, and a low latency fabric to pass the resulting computation to the next node, this looks like a so-called data-flow architecture. Set up the initial state, and then follow the bouncing ball as far and deep as you’d like. The company believes this architecture will fit well with IoT data processing, signal processing (telco, aerospace and satellites) and Machine Learning applications. The dataflow architectural concept has adherents in the Deep Learning academic community as well as forming the basis for the highly anticipated products from startup Wave Computing.
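The dataflow idea described above can be sketched in a few lines: each “node” performs a small local computation when its input arrives, then forwards the result to its downstream neighbors. This is purely illustrative (nothing here is KnuEdge code), but it captures the “follow the bouncing ball” flavor of the architecture:

```python
# Minimal sketch of a dataflow-style computation: nodes fire when an
# input arrives, compute locally, and pass results downstream.
from collections import deque

def run_dataflow(graph, ops, initial):
    """graph: node -> list of downstream nodes; ops: node -> fn(value);
    initial: list of (node, value) pairs that seed the computation."""
    results = {}
    ready = deque(initial)
    while ready:
        node, value = ready.popleft()
        out = ops[node](value)            # local computation at this node
        results[node] = out
        for succ in graph.get(node, []):  # forward the result downstream
            ready.append((succ, out))
    return results

# Toy three-stage pipeline: scale -> offset -> square
graph = {"scale": ["offset"], "offset": ["square"]}
ops = {"scale": lambda x: 2 * x,
       "offset": lambda x: x + 1,
       "square": lambda x: x * x}
print(run_dataflow(graph, ops, [("scale", 3)]))  # {'scale': 6, 'offset': 7, 'square': 49}
```

In a real dataflow machine the “nodes” would run concurrently and communicate over the fabric, rather than being scheduled sequentially from a queue as in this toy simulation.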

Certainly this approach could provide a massively parallel architecture for people able to exploit its scalability. However, when asked about how one would actually program one of these things, the company referenced their own implementation of MPI (Message Passing Interface), which they of course call KPI. This implies that programming and distributing a problem across the massively parallel array of nodes will be left as an exercise for the C++ programmer, instead of supporting the popular Deep Learning (DNN) frameworks and optimized libraries such as NVIDIA’s cuDNN, Torch, Theano, Caffe and Google’s TensorFlow, which the search giant uses for the Tensor Processing Unit, or TPU. This could limit early adoption to those companies or institutions within the HPC community who are accustomed to such heavy lifting, such as supercomputing centers, at least until KnuEdge develops the needed libraries to improve ease of use and adoption.

Where do we go from here?

The real challenge for this and other attempts to model computing architectures on neural systems is to be able to compute more efficiently and produce more accurate results than can be achieved today with GPUs and well-understood algorithms like convolutional and recurrent deep neural networks. It will take time for early adopters to figure out how to use this hardware to mimic the neural learning behavior of a brain. In fact, some question whether it is too early to mimic the brain’s function in silicon when it is still not well understood by neuroscientists.

Fortunately, the company says they have “patient” investors who are willing to give the KnuEdge team time to experiment and “swing for the fences”. They certainly have a lot on their plates for a small company, but their innovative architecture shows that Machine Learning researchers will be able to explore many alternatives in the coming years in their quest to build learning machines that may one day approach intelligence.