Computer Vision Models (Vision API)
Native Lux Models
Boltz.Vision.AlexNet Type
AlexNet(; kwargs...)
Create an AlexNet model (Krizhevsky et al., 2012).
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
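As an illustrative sketch of the constructor-then-`setup` workflow described above (the 224×224 WHCN input shape is our assumption for ImageNet-style models, not stated in this entry):

```julia
using Boltz, Lux, Random

# Construct the model; no weights are allocated yet.
model = Vision.AlexNet()

# Initialize parameters and states. With pretrained=true in the
# constructor, the pretrained weights would be loaded at this point.
ps, st = LuxCore.setup(Random.default_rng(), model)

# Forward pass on a dummy batch (width × height × channels × batch).
x = rand(Float32, 224, 224, 3, 1)
y, _ = model(x, ps, st)
```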
Boltz.Vision.VGG Type
VGG(imsize; config, inchannels, batchnorm = false, nclasses, fcsize, dropout)
Create a VGG model (Simonyan, 2014).
Arguments
- `imsize`: input image width and height as a tuple.
- `config`: the configuration for the convolution layers.
- `inchannels`: number of input channels.
- `batchnorm`: set to `true` to use batch normalization after each convolution.
- `nclasses`: number of output classes.
- `fcsize`: intermediate fully connected layer size.
- `dropout`: dropout level between fully connected layers.
VGG(depth::Int; batchnorm::Bool=false, pretrained::Bool=false)
Create a VGG model (Simonyan, 2014) with the ImageNet configuration.
Arguments
- `depth::Int`: the depth of the VGG model. Choices: {11, 13, 16, 19}.
Keyword Arguments
- `batchnorm = false`: set to `true` to use batch normalization after each convolution.
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
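For example, the depth-based constructor can be used as follows (a sketch using only the arguments documented above):

```julia
using Boltz

# VGG-16 with batch normalization, using the ImageNet configuration.
model = Vision.VGG(16; batchnorm=true)
```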
Boltz.Vision.VisionTransformer Type
VisionTransformer(name::Symbol; pretrained=false)
Creates a Vision Transformer model with the specified configuration.
Arguments
- `name::Symbol`: name of the Vision Transformer model to create. The following models are available: `:tiny`, `:small`, `:base`, `:large`, `:huge`, `:giant`, `:gigantic`.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Imported from Metalhead.jl
Load Metalhead
You need to load `Metalhead` before using these models.
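A minimal sketch of the required load order (the Metalhead-backed constructors below only become usable once `Metalhead` is loaded):

```julia
using Metalhead          # load first: enables the imported constructors
using Boltz, Lux, Random

model = Vision.ResNet(18)
ps, st = LuxCore.setup(Random.default_rng(), model)
```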
Boltz.Vision.ConvMixer Function
ConvMixer(name::Symbol; pretrained::Bool=false)
Create a ConvMixer model (Trockman and Kolter, 2022).
Arguments
- `name::Symbol`: The name of the ConvMixer model. Must be one of `:base`, `:small`, or `:large`.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.DenseNet Function
DenseNet(depth::Int; pretrained::Bool=false)
Create a DenseNet model (Huang et al., 2017).
Arguments
- `depth::Int`: The depth of the DenseNet model. Must be one of 121, 161, 169, or 201.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.GoogLeNet Function
GoogLeNet(; pretrained::Bool=false)
Create a GoogLeNet model (Szegedy et al., 2015).
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.MobileNet Function
MobileNet(name::Symbol; pretrained::Bool=false)
Create a MobileNet model (Howard, 2017; Sandler et al., 2018; Howard et al., 2019).
Arguments
- `name::Symbol`: The name of the MobileNet model. Must be one of `:v1`, `:v2`, `:v3_small`, or `:v3_large`.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.ResNet Function
ResNet(depth::Int; pretrained::Bool=false)
Create a ResNet model (He et al., 2016).
Arguments
- `depth::Int`: The depth of the ResNet model. Must be one of 18, 34, 50, 101, or 152.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.ResNeXt Function
ResNeXt(depth::Int; cardinality=32, base_width=nothing, pretrained::Bool=false)
Create a ResNeXt model (Xie et al., 2017).
Arguments
- `depth::Int`: The depth of the ResNeXt model. Must be one of 50, 101, or 152.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
- `cardinality`: The cardinality of the ResNeXt model. Defaults to 32.
- `base_width`: The base width of the ResNeXt model. Defaults to 8 for depth 101 and 4 otherwise.
Boltz.Vision.SqueezeNet Function
SqueezeNet(; pretrained::Bool=false)
Create a SqueezeNet model (Iandola et al., 2016).
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Boltz.Vision.WideResNet Function
WideResNet(depth::Int; pretrained::Bool=false)
Create a WideResNet model (Zagoruyko and Komodakis, 2017).
Arguments
- `depth::Int`: The depth of the WideResNet model. Must be one of 18, 34, 50, 101, or 152.
Keyword Arguments
- `pretrained::Bool=false`: If `true`, loads pretrained weights when `LuxCore.setup` is called.
Pretrained Models
Load JLD2
You need to load `JLD2` before being able to load pretrained weights.
Load Pretrained Weights
Pass `pretrained=true` to the model constructor to load the pretrained weights.
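A sketch combining both requirements above (`JLD2` loaded, `pretrained=true` passed to the constructor):

```julia
using Boltz, JLD2, Lux, Random   # JLD2 is needed to read the weight files

model = Vision.VGG(16; batchnorm=true, pretrained=true)

# The pretrained weights are loaded here, during setup.
ps, st = LuxCore.setup(Random.default_rng(), model)
```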
| Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) |
|---|---|---|
| `AlexNet()` | 54.48 | 77.72 |
| `VGG(11)` | 67.35 | 87.91 |
| `VGG(13)` | 68.40 | 88.48 |
| `VGG(16)` | 70.24 | 89.80 |
| `VGG(19)` | 71.09 | 90.27 |
| `VGG(11; batchnorm=true)` | 69.09 | 88.94 |
| `VGG(13; batchnorm=true)` | 69.66 | 89.49 |
| `VGG(16; batchnorm=true)` | 72.11 | 91.02 |
| `VGG(19; batchnorm=true)` | 72.95 | 91.32 |
| `ResNet(18)` | - | - |
| `ResNet(34)` | - | - |
| `ResNet(50)` | - | - |
| `ResNet(101)` | - | - |
| `ResNet(152)` | - | - |
| `ResNeXt(50; cardinality=32, base_width=4)` | - | - |
| `ResNeXt(101; cardinality=32, base_width=8)` | - | - |
| `ResNeXt(101; cardinality=64, base_width=4)` | - | - |
| `SqueezeNet()` | - | - |
| `WideResNet(50)` | - | - |
| `WideResNet(101)` | - | - |
Pretrained Models from Metalhead
For models imported from Metalhead, pretrained weights can be loaded if they are available in Metalhead. Refer to the Metalhead.jl documentation for a list of available pretrained models.
Preprocessing
All the pretrained models require that the images be normalized with the parameters `mean = [0.485f0, 0.456f0, 0.406f0]` and `std = [0.229f0, 0.224f0, 0.225f0]`.
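A sketch of this per-channel normalization for a `Float32` image batch in WHCN layout (the helper name is ours, not part of Boltz):

```julia
const IMAGENET_MEAN = reshape([0.485f0, 0.456f0, 0.406f0], 1, 1, 3, 1)
const IMAGENET_STD  = reshape([0.229f0, 0.224f0, 0.225f0], 1, 1, 3, 1)

# Normalize each channel: (x - mean) / std, broadcast over W, H, and batch.
normalize_imagenet(x) = (x .- IMAGENET_MEAN) ./ IMAGENET_STD

x  = rand(Float32, 224, 224, 3, 1)   # dummy RGB batch with values in [0, 1]
xn = normalize_imagenet(x)
```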
Bibliography
He, K.; Zhang, X.; Ren, S. and Sun, J. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 770–778.
Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. and others (2019). Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision; pp. 1314–1324.
Howard, A. G. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv preprint arXiv:1704.04861.
Huang, G.; Liu, Z.; Van Der Maaten, L. and Weinberger, K. Q. (2017). Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 4700–4708.
Iandola, F. N.; Han, S.; Moskewicz, M. W.; Ashraf, K.; Dally, W. J. and Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, arXiv:1602.07360 [cs.CV].
Krizhevsky, A.; Sutskever, I. and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A. and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 4510–4520.
Simonyan, K. (2014). Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V. and Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 1–9.
Trockman, A. and Kolter, J. Z. (2022). Patches are all you need? arXiv preprint arXiv:2201.09792.
Xie, S.; Girshick, R.; Dollár, P.; Tu, Z. and He, K. (2017). Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 1492–1500.
Zagoruyko, S. and Komodakis, N. (2017). Wide Residual Networks, arXiv:1605.07146 [cs.CV].