Open-source project name: matlab-deep-learning/MATLAB-Deep-Learning-Model-Hub
Open-source project URL: https://github.com/matlab-deep-learning/MATLAB-Deep-Learning-Model-Hub
Open-source programming language: MATLAB (100.0%)
Open-source project introduction:
MATLAB Deep Learning Model Hub
Discover pretrained models for deep learning in MATLAB.
Models
Computer Vision
Natural Language Processing
Audio
Lidar
Image Classification
Pretrained image classification networks have already learned to extract powerful and informative features from natural images. Use them as a starting point to learn a new task using transfer learning.
Inputs are RGB images; the output is the predicted label and score.
These networks have been trained on more than a million images and can classify images into 1000 object categories.
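As a rough illustration of the input/output contract described above, the sketch below classifies a single image. It assumes Deep Learning Toolbox is installed. SqueezeNet and the `peppers.png` sample image are arbitrary choices made for this example and are not named on this page; any other pretrained classification network could be substituted, and newer MATLAB releases may recommend different loading functions, so check the documentation for your release.

```matlab
% Load a pretrained ImageNet classifier (SqueezeNet is an arbitrary choice).
net = squeezenet;

% Read an RGB image and resize it to the input size the network expects.
I = imread("peppers.png");
inputSize = net.Layers(1).InputSize;      % [height width channels]
I = imresize(I, inputSize(1:2));

% Predict the label and the per-class scores.
[label, scores] = classify(net, I);
fprintf("%s (score %.2f)\n", string(label), max(scores));
```

The same pattern (load, resize to the input size, classify) should carry over to other classification networks, though their input sizes differ.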
Models available in MATLAB:
Tips for selecting a model
Pretrained networks have different characteristics that matter when choosing a network to apply to your problem. The most important characteristics are network accuracy, speed, and size. Choosing a network is generally a tradeoff between these characteristics. The following figure highlights these tradeoffs:
Figure. Comparing image classification model accuracy, speed and size.
Object Detection
Object detection is a computer vision technique used for locating instances of objects in images or videos. When humans look at images or video, we can recognize and locate objects of interest within a matter of moments. The goal of object detection is to replicate this intelligence using a computer.
Inputs are RGB images; the output is the predicted label, bounding box, and score.
These networks have been trained to detect 80 object classes from the COCO dataset. These models are suitable for training a custom object detector using transfer learning.
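| Network | Backbone Networks | Size (MB) | Mean Average Precision (mAP) | Object Classes | Location |
| --- | --- | --- | --- | --- | --- |
| EfficientDet-D0 | efficientnet | 15.9 | 33.7 | 80 | GitHub |
| YOLOX | YoloX-s | 32 | 39.8 | 80 | GitHub |
| YOLOX | YoloX-m | 90.2 | 45.9 | 80 | GitHub |
| YOLOX | YoloX-l | 192.9 | 48.6 | 80 | GitHub |
| YOLO v4 | yolov4-coco | 229 | 44.2 | 80 | Doc, GitHub |
| YOLO v4 | yolov4-tiny-coco | 21.5 | 19.7 | 80 | Doc, GitHub |
| YOLO v3 | darknet53-coco | 220.4 | 34.4 | 80 | Doc |
| YOLO v3 | tiny-yolov3-coco | 31.5 | 9.3 | 80 | Doc |
| YOLO v2 | darknet19-COCO | 181 | 28.7 | 80 | GitHub |
| YOLO v2 | tiny-yolo_v2-coco | 40 | 10.5 | 80 | GitHub |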
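A minimal sketch of running one of these COCO detectors on an image might look like the following. It assumes Computer Vision Toolbox and its YOLO v4 support package are installed. The model identifier `"tiny-yolov4-coco"` is the name used by the toolbox function and differs from the repository names listed above; the sample image is an arbitrary choice.

```matlab
% Load a YOLO v4 detector pretrained on COCO (tiny variant chosen for its size).
detector = yolov4ObjectDetector("tiny-yolov4-coco");

% Run detection on a sample RGB image that ships with MATLAB.
I = imread("peppers.png");
[bboxes, scores, labels] = detect(detector, I);

% Each row of bboxes is [x y width height] in pixels.
for k = 1:numel(scores)
    fprintf("%s  score=%.2f  box=[%s]\n", ...
        string(labels(k)), scores(k), num2str(bboxes(k, :)));
end
```

The three outputs correspond to the bounding box, score, and label described for this task; the image may well contain no COCO objects, in which case the loop prints nothing.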
Tips for selecting a model
Pretrained object detectors have different characteristics that matter when choosing a network to apply to your problem. The most important characteristics are mean average precision (mAP), speed, and size. Choosing a network is generally a tradeoff between these characteristics.
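One way to make this tradeoff concrete is to filter the detectors by a size budget and then rank the survivors by mAP. The sketch below uses the size and mAP figures listed for the COCO detectors in this section; the 50 MB budget is a hypothetical value chosen for illustration, and speed is ignored because this page does not list it.

```matlab
% Size and mAP figures copied from the COCO detector list in this section.
network = {'EfficientDet-D0'; 'YoloX-s'; 'YoloX-m'; 'YoloX-l'; ...
           'yolov4-coco'; 'yolov4-tiny-coco'; 'darknet53-coco'; ...
           'tiny-yolov3-coco'; 'darknet19-COCO'; 'tiny-yolo_v2-coco'};
sizeMB  = [15.9; 32; 90.2; 192.9; 229; 21.5; 220.4; 31.5; 181; 40];
mAP     = [33.7; 39.8; 45.9; 48.6; 44.2; 19.7; 34.4; 9.3; 28.7; 10.5];
models  = table(network, sizeMB, mAP, ...
                'VariableNames', {'Network', 'SizeMB', 'mAP'});

budgetMB = 50;  % hypothetical deployment limit, not from the source

% Keep models that fit the budget, then sort by accuracy (highest first).
candidates = models(models.SizeMB <= budgetMB, :);
candidates = sortrows(candidates, 'mAP', 'descend');
best = candidates.Network{1};
fprintf('Best model under %g MB: %s (mAP %.1f)\n', ...
        budgetMB, best, candidates.mAP(1));
```

With a 50 MB budget this selects YoloX-s; raising or lowering `budgetMB` shows how the choice shifts along the size/accuracy tradeoff.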
Application Specific Object Detectors
These networks have been trained to detect specific objects for a given application.
Semantic Segmentation
Segmentation is essential for image analysis tasks. Semantic segmentation describes the process of associating each pixel of an image with a class label (such as flower, person, road, sky, ocean, or car).
Inputs are RGB images; outputs are pixel classifications (semantic maps).
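To illustrate what "associating each pixel with a class label" means, the sketch below turns a small array of per-pixel class scores into a semantic map by taking the highest-scoring class at every pixel. The scores and class names are made up for the example; a real network would produce the score array.

```matlab
% Hypothetical network output: per-pixel scores for 3 classes on a 2x2 image.
classNames = ["sky", "road", "car"];
scores = zeros(2, 2, 3);
scores(:, :, 1) = [0.8 0.7; 0.1 0.2];   % sky
scores(:, :, 2) = [0.1 0.2; 0.7 0.3];   % road
scores(:, :, 3) = [0.1 0.1; 0.2 0.5];   % car

% The semantic map assigns each pixel the class with the highest score.
[~, classIdx] = max(scores, [], 3);
semanticMap = classNames(classIdx);
disp(semanticMap)
```

Here the top row is labeled "sky", the bottom-left pixel "road", and the bottom-right pixel "car".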
This network has been trained to detect 20 object classes from the PASCAL VOC dataset:
| Network | Size (MB) | Mean Accuracy | Object Classes | Location |
| --- | --- | --- | --- | --- |
| DeepLabv3+ | 209 | 0.87 | 20 | GitHub |
Application Specific Semantic Segmentation Models
| Network | Application | Size (MB) | Location |
| --- | --- | --- | --- |
| U-net | Raw Camera Processing | 31 | Doc |
| 3-D U-net | Brain Tumor Segmentation | 56.2 | Doc |
| AdaptSeg (GAN) | Model tuning using 3-D simulation data | 54.4 | Doc |
Instance Segmentation
Instance segmentation is an enhanced type of object detection that generates a segmentation map for each detected instance of an object. Instance segmentation treats individual objects as distinct entities, regardless of the class of the objects. In contrast, semantic segmentation considers all objects of the same class as belonging to a single entity.
Inputs are RGB images; outputs are pixel classifications (semantic maps), bounding boxes, and classification labels.
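The difference between the two kinds of segmentation can be seen on a toy example. The sketch below uses two hand-made masks for objects of the same class (the class name "car" and the mask values are hypothetical): instance segmentation keeps one mask per object, while the semantic view merges them into a single region.

```matlab
% Two hypothetical instance masks for objects of the same class ("car").
carA = logical([1 1 0 0; 1 1 0 0; 0 0 0 0]);
carB = logical([0 0 0 0; 0 0 0 1; 0 0 1 1]);

% Instance segmentation keeps one mask per object (H-by-W-by-numInstances).
instanceMasks = cat(3, carA, carB);
numInstances = size(instanceMasks, 3);

% Semantic segmentation merges every pixel of the class into one region.
semanticMask = any(instanceMasks, 3);

fprintf("%d instances, %d 'car' pixels in the semantic mask\n", ...
        numInstances, nnz(semanticMask));
```

The instance representation can still answer "how many cars?", whereas the merged semantic mask only records which pixels belong to the class.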
Image Translation
Image translation is the task of transferring styles and characteristics from one image domain to another. This technique can be extended to other image-to-image learning operations, such as image enhancement, image colorization, defect generation, and medical image analysis.
Inputs are images, outputs are translated RGB images. This example workflow shows how a semantic segmentation map input translates to a synthetic image via a pretrained model (Pix2PixHD):
| Network | Application | Size (MB) | Location |
| --- | --- | --- | --- |
| Pix2PixHD (CGAN) | Synthetic Image Translation | 648 | Doc |
| UNIT (GAN) | Day-to-Dusk Dusk-to-Day Image Translation | 72.5 | Doc |
| UNIT (GAN) | Medical Image Denoising | 72.4 | Doc |
| CycleGAN | Medical Image Denoising | 75.3 | Doc |
| VDSR | Super Resolution (estimate a high-resolution image from a low-resolution image) | 2.4 | Doc |
Pose Estimation
Pose estimation is a computer vision technique for localizing the position and orientation of an object using a fixed set of keypoints.
All inputs are RGB images; outputs are heatmaps and part affinity fields (PAFs), which are post-processed to estimate the pose.
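One small piece of that post-processing, locating a keypoint from its heatmap, can be sketched as follows. The heatmap values are made up, and this covers only the peak-finding step; a full pipeline would also use the part affinity fields to group keypoints into individual poses, which is not shown here.

```matlab
% Hypothetical 5x5 heatmap for a single keypoint (for example, a wrist).
heatmap = zeros(5, 5);
heatmap(2, 4) = 0.9;   % strongest response
heatmap(2, 3) = 0.4;
heatmap(3, 4) = 0.3;

% The keypoint estimate is the location of the maximum response.
[peakScore, linearIdx] = max(heatmap(:));
[row, col] = ind2sub(size(heatmap), linearIdx);
fprintf("keypoint at row %d, col %d (score %.1f)\n", row, col, peakScore);
```

In practice the heatmap is usually smaller than the input image, so the peak location would still need to be scaled back to image coordinates.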
Video Classification
Video classification is a computer vision technique for classifying the action or content in a sequence of video frames.
Inputs are either video only or video with optical flow data; outputs are gesture classifications and scores.
| Network | Inputs | Size (MB) | Classifications (Human Actions) | Description | Location |
| --- | --- | --- | --- | --- | --- |
| SlowFast | Video | 124 | 400 | Faster convergence than Inflated-3D | Doc |
| R(2+1)D | Video | 112 | 400 | Faster convergence than Inflated-3D | Doc |
| Inflated-3D | Video & Optical Flow data | 91 | 400 | Accuracy of the classifier improves when combining optical flow and RGB data. | Doc |
Text Detection and Recognition
Text detection is a computer vision technique used for locating instances of text within images.
Inputs are RGB images; outputs are bounding boxes that identify regions of text.
| Network | Application | Size (MB) | Location |
| --- | --- | --- | --- |
| CRAFT | Trained to detect English, Korean, Italian, French, Arabic, German and Bangla (Indian). | 3.8 | Doc, GitHub |
Application Specific Text Detectors
| Network | Application | Size (MB) | Location |
| --- | --- | --- | --- |
| Seven Segment Digit Recognition | Seven segment digit recognition using deep learning and OCR. This is helpful in industrial automation applications where digital displays are often surrounded with complex background. | 3.8 | GitHub |
Transformers (Text)
Pretrained transformer models have already learned to extract powerful and informative features from text. Use them as a starting point to learn a new task using transfer learning.
Inputs are sequences of text; outputs are text feature embeddings.
| Network | Applications | Size (MB) | Location |
| --- | --- | --- | --- |
| BERT | Feature Extraction (Sentence and Word embedding), Text Classification, Token Classification, Masked Language Modeling, Question Answering | 390 | GitHub |
Application Specific Transformers
| Network | Application | Size | Location |
| --- | --- | --- | --- |
| FinBERT | The FinBERT model is a BERT model for financial sentiment analysis | 388 MB | GitHub |
| GPT-2 | The GPT-2 model is a decoder model used for text summarization. | 1.2 GB | GitHub |
Audio
Audio Embedding pretrained models have already learned to extract powerful and informative features from audio signals. Use them as a starting point to learn a new task using transfer learning.
Inputs are audio signals; outputs are audio feature embeddings.
| Network | Application | Size (MB) | Location |
| --- | --- | --- | --- |
| VGGish | Feature Embeddings | 257 | Doc |
| OpenL3 | Feature Embeddings | 200 | Doc |
Application Specific Audio Models
| Network | Application | Size (MB) | Output Classes | Location |
| --- | --- | --- | --- | --- |
| YAMNet | Sound Classification | 13.5 | 521 | Doc |
| CREPE | Pitch Estimation (Regression) | 132 | - | Doc |
Speech-to-text pretrained models take an audio input and translate it into a text output. They are useful for digitizing audio files for downstream text processing tasks such as text summarization and sentiment analysis.
Inputs are audio signals; the output is text.
Lidar
Point cloud data is acquired by a variety of sensors, such as lidar, radar, and depth cameras. Training robust classifiers with point cloud data is challenging because of the sparsity of data per object, object occlusions, and sensor noise. Deep learning techniques have been shown to address many of these challenges by learning robust feature representations directly from point cloud data.
Inputs are lidar point clouds converted to five channels; outputs are segmentation, classification, or object detection results overlaid on point clouds.
Model requests
If you'd like to request MATLAB support for additional pretrained models, please create an issue from this repo.
Alternatively, send the request to:
David Willingham
Deep Learning Product Manager
dwilling@mathworks.com
Copyright 2022, The MathWorks, Inc.