By Vineeta Tawney
What are Capsule Networks?
A Capsule Network, also known as a Capsule Neural Network and commonly abbreviated to CapsNet, is a type of machine learning system designed to better model hierarchical relationships.
Definition of Capsule Networks
In simpler words, a CapsNet is composed of numerous capsules. Every capsule is a group of neurons that learns to identify an object (e.g., a square) in a given region of the image.
A capsule outputs a vector (e.g., an 8-dimensional vector) whose length represents the estimated probability that the object is present, and whose orientation (e.g., in 8D space) encodes the object’s pose parameters (e.g., precise position, rotation, etc.). If the object is changed a little (e.g., shifted, rotated, resized, etc.), the capsule outputs a vector of the same length but with a slightly different orientation.
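The idea that a vector's length can act as a probability while its direction carries the pose can be sketched with the squashing nonlinearity described in the paper "Dynamic Routing Between Capsules". The snippet below is a minimal, dependency-free illustration, not a reference implementation; the function names `squash` and `length`, the `eps` term, and the example vector are assumptions chosen for this sketch.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) without changing its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm + eps)       # eps avoids division by zero
    scale = sq_norm / (1.0 + sq_norm)     # near 0 for short vectors, near 1 for long ones
    return [scale * x / norm for x in s]

def length(v):
    """Euclidean length, read here as the estimated presence probability."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical raw 8-dimensional capsule output with length 5.
raw = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
out = squash(raw)

print(length(out))        # close to 25/26, i.e. about 0.96
print(out[0] / out[2])    # direction is preserved: still about 3/4
```

A long raw vector is mapped to a length just below 1 (object very likely present) and a short one to a length near 0, while the ratios between components, which stand in for the pose, are left unchanged.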
A CapsNet is arranged in multiple layers, very much like a regular neural network. The capsules in the lowest layer are called primary capsules: each of them receives a small region of the image as input (called its receptive field) and tries to detect the presence and pose of a specific pattern, for example, a square. The capsules in higher layers are called routing capsules; they identify larger and more complex objects, such as boats.
What do Capsule Networks do?
The purpose behind Capsule Networks is to perform computer vision as inverse graphics. In computer graphics, an object is represented as a tree of parts. Each part carries a transformation (for example, a rotation) that describes the conversion from the viewpoint of the part to the viewpoint of its parent.
CapsNets are inspired by these tree-like representations and try to learn the transformations relating the parts of an object to the whole. Capsules can be viewed as parts or objects whose parents, larger parts or objects, are also capsules.
Capsule Networks Deep Learning
Deep Learning is a subfield of machine learning, which is itself a branch of artificial intelligence (AI). In simple words, Deep Learning can be thought of as a way to automate Predictive Analytics. Whereas many traditional machine learning algorithms learn a single, relatively shallow mapping from input to output, Deep Learning models stack layers in a hierarchy of increasing complexity and abstraction.
In simple terms, a CapsNet is composed of capsules, and a capsule is a group of artificial neurons that learns to detect a specific object in a given region of the image. It produces a vector whose length represents the estimated probability of the object’s presence and whose orientation encodes the object’s position, size, and rotation. If the object is transformed (for example, translated, rotated, or resized), the capsule produces a vector of the same length, but with a slightly different orientation.
Capsule Networks: Deeper Analysis
A CapsNet is organized in multiple layers. The lowest layer is composed of primary capsules, each of which receives a small portion of the input image and detects the presence and pose of a pattern, such as a square.
The higher-layer capsules, more commonly known as routing capsules, are capable of detecting larger and more complex objects. Capsules communicate through an iterative “routing-by-agreement” mechanism: a lower-level capsule prefers to send its output to higher-level capsules whose activity vectors have a large scalar product with the prediction coming from the lower-level capsule.
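The routing-by-agreement loop described above can be sketched in a few lines. This is a simplified, pure-Python illustration under stated assumptions, not the authors' implementation: the prediction vectors in `u_hat` are made-up numbers (in a real network they come from learned weight matrices), and the names `route`, `softmax`, `squash`, and the default of three iterations are choices made for this example.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm)
    norm = math.sqrt(sq_norm + eps)
    return [scale * x / norm for x in s]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction for higher capsule j."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]   # routing logits start equal
    for _ in range(iterations):
        c = [softmax(row) for row in b]             # coupling coefficients per lower capsule
        v = []
        for j in range(n_upper):
            # Weighted sum of the predictions for higher capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_lower):
            for j in range(n_upper):
                # Agreement = scalar product of prediction and higher capsule output.
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c

# Three hypothetical lower capsules, two higher capsules, 2-D vectors.
# All three agree about higher capsule 0 and disagree about higher capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
v, c = route(u_hat)
for j, v_j in enumerate(v):
    print(j, math.sqrt(sum(x * x for x in v_j)))
```

In this toy setup the higher capsule on which the predictions agree ends up with the longer output vector, and every lower capsule shifts most of its coupling weight toward it.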
“Lower level capsule can send its input to the higher-level capsule that ‘agrees’ with its input. This is the essence of the dynamic routing algorithm.” Many researchers working on Capsule Networks regard CapsNets as an improvement on convolutional neural networks (CNNs).
CapsNets attempt to solve problems caused by max pooling in deep neural networks, such as the loss of information about the relative position and orientation of features. For example, a CNN used for face recognition will extract certain facial features from the image, such as eyes, eyebrows, a mouth, and a nose. The higher-level layers (the ones deeper in the network) then combine those features and check whether all of them were found in the image, regardless of their arrangement.
The mouth and nose may have switched places and the eyes may be sideways in the picture, but the CNN can still combine the evidence and classify the image as a face. The problem worsens as the network gets deeper, because the features become more abstract and the feature maps shrink through pooling and filtering. The idea behind CapsNets is that the low-level features must also be arranged in a certain way for the object to be classified as a face.
For example, a CapsNet would learn that the nose must be between the two eyes and the mouth must be below it. Images with these features in that arrangement can then be classified as a face; everything else will be rejected.
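The loss of positional information through pooling can be seen in a toy example. The sketch below applies 2x2 max pooling with stride 2 to two small, made-up feature maps in which the same activations sit at different positions inside each pooling window; the function name `max_pool_2x2` and the numbers are assumptions for illustration only, and the example shows only the within-window part of the problem.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

# Two hypothetical feature maps: same activations, different positions.
map_a = [
    [9, 0, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
]
map_b = [
    [0, 0, 0, 7],
    [0, 9, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(map_a))  # [[9, 7], [5, 0]]
print(max_pool_2x2(map_b))  # [[9, 7], [5, 0]]
```

Both maps pool to the same output, so any later layer can no longer tell where inside each window a feature was found. A capsule, by contrast, is meant to keep that information in the orientation of its output vector.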
The publication of “Dynamic Routing Between Capsules” has led various researchers to work intensely on refining the algorithms and implementations, and advances have been published at a rapid pace.
Advantages of using CapsNets for Deep Learning
1. Good preliminary results.
2. Requires less training data.
3. Works well with overlapping objects.
4. Potentially good in crowded scenes.
5. Can detect partially visible objects.
6. Results are interpretable: the hierarchy of components can be mapped.
7. Equivariance (the capsule outputs change in step with small changes in the input).
Disadvantages of using CapsNets
1. Accuracy on large images is not yet known.
2. Slow training time (so far).
3. The nonlinear squashing function may not faithfully reflect a probability.
Future of Capsule Networks
Capsule Networks have introduced a new building block that can be used in Deep Learning to better model hierarchical relationships within the internal knowledge representation of a neural network.
To learn more about Capsule Networks in deep learning, refer to critical essays on CapsNet models, to the Capsule Networks paper for expert discussion, and to papers that discuss Hinton’s Capsule Networks.