Deep learning models are making great strides in research papers and industrial deployments alike, but it’s helpful to have a guide and toolkit to join this frontier. This post serves to orient researchers, engineers, and machine learning practitioners on how to incorporate deep learning into their own work. This orientation pairs an introduction to model structure and learned features for general understanding with an overview of the Caffe deep learning framework for practical know-how. References highlight recent and historical research for perspective on current progress.The framework survey points out key elements of the Caffe architecture, reference models, and worked examples. Through collaboration with NVIDIA, drop-in integration of the cuDNN library accelerates Caffe models. Follow this post to join the active deep learning community around Caffe.
Automating Perception by Deep Learning
Deep learning is a branch of machine learning that is advancing the state of the art for perceptual problems like vision and speech recognition. We can pose these tasks as mapping concrete inputs such as image pixels or audio waveforms to abstract outputs like the identity of a face or a spoken word. The “depth” of deep learning models comes from composing functions into a series of transformations from input, through intermediate representations, and on to output. The overall composition gives a deep, layered model, in which each layer encodes progress from low-level details to high-level concepts. This yields a rich, hierarchical representation of the perceptual problem. Figure 1 shows the kinds of visual features captured in the intermediate layers of the model between the pixels and the output. A simple classifier can recognize a category from these learned features while a classifier on the raw pixels has a more complex decision to make.