matlab-deep-learning/MATLAB-Deep-Learning-Model-Hub: Discover pretrained models ...

Open source name:

matlab-deep-learning/MATLAB-Deep-Learning-Model-Hub

Open source URL:

https://github.com/matlab-deep-learning/MATLAB-Deep-Learning-Model-Hub

Open source language:

MATLAB 100.0%

Open source introduction:

MATLAB Deep Learning Model Hub

Discover pretrained models for deep learning in MATLAB.

Models

Image Classification

Pretrained image classification networks have already learned to extract powerful and informative features from natural images. Use them as a starting point to learn a new task using transfer learning.

Inputs are RGB images; the output is the predicted label and score.

These networks have been trained on more than a million images and can classify images into 1000 object categories.
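As a minimal sketch of that input/output contract, the snippet below loads one of the networks listed in this section and classifies a sample image. It assumes Deep Learning Toolbox, the GoogLeNet support package, and Image Processing Toolbox (for `imresize`) are installed. The image `peppers.png` is a sample file that ships with MATLAB and is used here only for illustration.

```matlab
% Load a pretrained network (assumes the GoogLeNet support package is installed).
net = googlenet;

% The first layer is the image input layer; InputSize is [height width channels].
inputSize = net.Layers(1).InputSize;

% Read a sample RGB image and resize it to the spatial size the network expects.
I = imread("peppers.png");
I = imresize(I, inputSize(1:2));

% classify returns the predicted label and one score per class.
[label, scores] = classify(net, I);
fprintf("Predicted: %s (score %.2f)\n", string(label), max(scores));
```

Other networks in this section can usually be swapped in by changing the first line, provided the matching support package is installed.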

Models available in MATLAB:

Network	Size (MB)	Classes	Accuracy %	Location
googlenet	27	1000	66.25	Doc GitHub
squeezenet	5.2	1000	55.16	Doc
alexnet	227	1000	54.10	Doc
resnet18	44	1000	69.49	Doc GitHub
resnet50	96	1000	74.46	Doc GitHub
resnet101	167	1000	75.96	Doc GitHub
mobilenetv2	13	1000	70.44	Doc GitHub
vgg16	515	1000	70.29	Doc
vgg19	535	1000	70.42	Doc
inceptionv3	89	1000	77.07	Doc
inceptionresnetv2	209	1000	79.62	Doc
xception	85	1000	78.20	Doc
darknet19	78	1000	74.00	Doc
darknet53	155	1000	76.46	Doc
densenet201	77	1000	75.85	Doc
shufflenet	5.4	1000	63.73	Doc
nasnetmobile	20	1000	73.41	Doc
nasnetlarge	332	1000	81.83	Doc
efficientnetb0	20	1000	74.72	Doc
ConvMixer	7.7	10	-	GitHub
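Transfer learning with one of these networks typically means replacing the final layers so that the output matches the classes of the new task. The following is a sketch for GoogLeNet only, assuming Deep Learning Toolbox and the GoogLeNet support package. `numClasses`, `new_fc`, and `new_output` are hypothetical values chosen for illustration, and the layer names `loss3-classifier` and `output` are specific to GoogLeNet.

```matlab
numClasses = 5;   % hypothetical number of classes in the new task

% Convert the pretrained network to an editable layer graph.
net = googlenet;
lgraph = layerGraph(net);

% In GoogLeNet the last learnable layer is "loss3-classifier" and the
% classification layer is "output"; other networks use different names.
newFC = fullyConnectedLayer(numClasses, "Name", "new_fc");
lgraph = replaceLayer(lgraph, "loss3-classifier", newFC);
lgraph = replaceLayer(lgraph, "output", classificationLayer("Name", "new_output"));

% lgraph can now be trained with trainNetwork on labeled images that are
% resized to net.Layers(1).InputSize.
```

For a different network, inspect `lgraph.Layers` to find the names of its final fully connected and classification layers before replacing them.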

Tips for selecting a model

Pretrained networks have different characteristics that matter when choosing a network to apply to your problem. The most important characteristics are network accuracy, speed, and size. Choosing a network is generally a tradeoff between these characteristics. The following figure highlights these tradeoffs:

Figure. Comparing image classification model accuracy, speed and size.
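One simple way to act on this tradeoff is to fix a size budget and pick the most accurate network that fits. The sketch below does this in base MATLAB using a few rows copied from the image classification table above. The 30 MB budget is a hypothetical value, and speed is left out because the table does not list it.

```matlab
% A few rows copied from the image classification table above.
Network  = ["googlenet"; "squeezenet"; "resnet18"; "resnet50"; "mobilenetv2"; "efficientnetb0"];
SizeMB   = [27; 5.2; 44; 96; 13; 20];
Accuracy = [66.25; 55.16; 69.49; 74.46; 70.44; 74.72];
T = table(Network, SizeMB, Accuracy);

budgetMB = 30;   % hypothetical size budget for the target device

% Keep the networks that fit the budget, most accurate first.
candidates = T(T.SizeMB <= budgetMB, :);
candidates = sortrows(candidates, "Accuracy", "descend");

best = candidates.Network(1);
disp(candidates)
```

With these rows and this budget, `best` is `efficientnetb0` (20 MB, 74.72%).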

Object Detection

Object detection is a computer vision technique used for locating instances of objects in images or videos. When humans look at images or video, we can recognize and locate objects of interest within a matter of moments. The goal of object detection is to replicate this intelligence using a computer.

Inputs are RGB images; the output is the predicted label, bounding box, and score.

These networks have been trained to detect 80 object classes from the COCO dataset. These models are suitable for training a custom object detector using transfer learning.

Network	Backbone Network	Size (MB)	Mean Average Precision (mAP)	Object Classes	Location
EfficientDet-D0	efficientnet	15.9	33.7	80	GitHub
YOLOX	YoloX-s	32	39.8	80	GitHub
YOLOX	YoloX-m	90.2	45.9	80	GitHub
YOLOX	YoloX-l	192.9	48.6	80	GitHub
YOLO v4	yolov4-coco	229	44.2	80	Doc GitHub
YOLO v4	yolov4-tiny-coco	21.5	19.7	80	Doc GitHub
YOLO v3	darknet53-coco	220.4	34.4	80	Doc
YOLO v3	tiny-yolov3-coco	31.5	9.3	80	Doc
YOLO v2	darknet19-COCO	181	28.7	80	GitHub
YOLO v2	tiny-yolo_v2-coco	40	10.5	80	GitHub
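To make the label, bounding box, and score outputs concrete, here is a hedged sketch that runs a pretrained YOLO v4 detector on a sample image. It assumes Computer Vision Toolbox and its YOLO v4 support package are installed. The model name `tiny-yolov4-coco` is the name used by the toolbox function and is assumed to correspond to the tiny YOLO v4 row above. The 0.5 threshold and the sample image are arbitrary choices for illustration.

```matlab
% Load a COCO-pretrained detector (assumes the YOLO v4 support package).
detector = yolov4ObjectDetector("tiny-yolov4-coco");

% Any RGB image works; this sample ships with MATLAB.
I = imread("peppers.png");

% bboxes is M-by-4 in [x y width height] format, with one score and one
% label per detected object. The result can be empty.
[bboxes, scores, labels] = detect(detector, I, "Threshold", 0.5);

% Overlay the detections, if there are any.
if ~isempty(bboxes)
    annotations = cellstr(string(labels) + ": " + string(round(scores, 2)));
    I = insertObjectAnnotation(I, "rectangle", bboxes, annotations);
end
imshow(I)
```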

Tips for selecting a model

Pretrained object detectors have different characteristics that matter when choosing a network to apply to your problem. The most important characteristics are mean average precision (mAP), speed, and size. Choosing a network is generally a tradeoff between these characteristics.

Application Specific Object Detectors

These networks have been trained to detect specific objects for a given application.

Network	Application	Size (MB)	Location
Spatial-CNN	Lane detection	74	GitHub
Single Shot Detector (SSD)	Vehicle detection	44	Doc
Faster R-CNN	Vehicle detection	118	Doc

Semantic Segmentation

Segmentation is essential for image analysis tasks. Semantic segmentation is the process of associating each pixel of an image with a class label (such as flower, person, road, sky, ocean, or car).

Inputs are RGB images, outputs are pixel classifications (semantic maps).

This network has been trained to detect 20 object classes from the PASCAL VOC dataset:

Network	Size (MB)	Mean Accuracy	Object Classes	Location
DeepLabv3+	209	0.87	20	GitHub

Application Specific Semantic Segmentation Models

Network	Application	Size (MB)	Location
U-net	Raw Camera Processing	31	Doc
3-D U-net	Brain Tumor Segmentation	56.2	Doc
AdaptSeg (GAN)	Model tuning using 3-D simulation data	54.4	Doc

Instance Segmentation

Instance segmentation is an enhanced type of object detection that generates a segmentation map for each detected instance of an object. Instance segmentation treats individual objects as distinct entities, regardless of the class of the objects. In contrast, semantic segmentation considers all objects of the same class as belonging to a single entity.

Inputs are RGB images, outputs are pixel classifications (semantic maps), bounding boxes and classification labels.
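The difference between the two output types can be shown with a toy example in base MATLAB. The 4-by-6 scene, the class ID, and the variable names below are all made up for illustration and do not come from any of the listed models.

```matlab
% Toy 4-by-6 scene with two separate objects of the same class.
% Semantic map: one class ID per pixel (0 = background, 1 = "car").
semanticMap = [0 0 0 0 0 0
               0 1 1 0 1 1
               0 1 1 0 1 1
               0 0 0 0 0 0];

% Instance segmentation: one logical mask per object, stacked along dim 3.
instanceMasks = false(4, 6, 2);
instanceMasks(2:3, 2:3, 1) = true;   % first car
instanceMasks(2:3, 5:6, 2) = true;   % second car

numCarPixels = nnz(semanticMap == 1);    % pixels labeled "car"
numInstances = size(instanceMasks, 3);   % separate car objects

% Merging the instance masks recovers the semantic "car" region.
sameRegion = isequal(any(instanceMasks, 3), semanticMap == 1);
fprintf("%d car pixels, %d instances, same region: %d\n", ...
    numCarPixels, numInstances, sameRegion);
```

The semantic map alone cannot tell how many cars are present, while the mask stack keeps each object separate.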

Network	Object Classes	Location
Mask R-CNN	80	Doc GitHub

Image Translation

Image translation is the task of transferring styles and characteristics from one image domain to another. This technique can be extended to other image-to-image learning operations, such as image enhancement, image colorization, defect generation, and medical image analysis.

Inputs are images, outputs are translated RGB images. This example workflow shows how a semantic segmentation map input translates to a synthetic image via a pretrained model (Pix2PixHD):

Network	Application	Size (MB)	Location
Pix2PixHD (CGAN)	Synthetic Image Translation	648	Doc
UNIT (GAN)	Day-to-Dusk and Dusk-to-Day Image Translation	72.5	Doc
UNIT (GAN)	Medical Image Denoising	72.4	Doc
CycleGAN	Medical Image Denoising	75.3	Doc
VDSR	Super Resolution (estimate a high-resolution image from a low-resolution image)	2.4	Doc

Pose Estimation

Pose estimation is a computer vision technique for localizing the position and orientation of an object using a fixed set of keypoints.

Inputs are RGB images; outputs are heatmaps and part affinity fields (PAFs), which are post-processed to estimate the pose.

Network	Size (MB)	Location
OpenPose	14	Doc

Video Classification

Video classification is a computer vision technique for classifying the action or content in a sequence of video frames.

Inputs are either video only or video with optical flow data; outputs are gesture classifications and scores.

Network	Inputs	Size (MB)	Classifications (Human Actions)	Description	Location
SlowFast	Video	124	400	Faster convergence than Inflated-3D	Doc
R(2+1)D	Video	112	400	Faster convergence than Inflated-3D	Doc
Inflated-3D	Video & Optical Flow data	91	400	Accuracy of the classifier improves when combining optical flow and RGB data.	Doc

Text Detection and Recognition

Text detection is a computer vision technique used for locating instances of text within images.

Inputs are RGB images, outputs are bounding boxes that identify regions of text.

Network	Application	Size (MB)	Location
CRAFT	Trained to detect English, Korean, Italian, French, Arabic, German and Bangla (Indian).	3.8	Doc GitHub

Application Specific Text Detectors

Network	Application	Size (MB)	Location	Example Output
Seven Segment Digit Recognition	Seven segment digit recognition using deep learning and OCR. This is helpful in industrial automation applications where digital displays are often surrounded with complex background.	3.8	GitHub

Transformers (Text)

Pretrained transformer models have already learned to extract powerful and informative features from text. Use them as a starting point to learn a new task using transfer learning.

Inputs are sequences of text, outputs are text feature embeddings.

Network	Applications	Size (MB)	Location
BERT	Feature Extraction (Sentence and Word embedding), Text Classification, Token Classification, Masked Language Modeling, Question Answering	390	GitHub

Application Specific Transformers

Network	Application	Size (MB)	Location	Output Example
FinBERT	The FinBERT model is a BERT model for financial sentiment analysis	388	GitHub
GPT-2	The GPT-2 model is a decoder model used for text summarization.	1.2 GB	GitHub

Audio

Audio Embedding pretrained models have already learned to extract powerful and informative features from audio signals. Use them as a starting point to learn a new task using transfer learning.

Inputs are Audio signals, outputs are audio feature embeddings.

Network	Application	Size (MB)	Location
VGGish	Feature Embeddings	257	Doc
OpenL3	Feature Embeddings	200	Doc

Application Specific Audio Models

Network	Application	Size (MB)	Output Classes	Location	Output Example
YAMNet	Sound Classification	13.5	521	Doc
CREPE	Pitch Estimation (Regression)	132	-	Doc

Speech to Text

Pretrained speech-to-text models take an audio input and translate it into a text output. They are useful for digitizing audio files for downstream text processing tasks such as text summarization and sentiment analysis.

Inputs are audio signals; the output is text.

Network	Application	Size (MB)	Word Error Rate (WER)	Location
wav2vec	Speech to Text	236	3.2	GitHub
deepspeech	Speech to Text	167	5.97	GitHub
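Word error rate is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The values in the table above appear to be percentages. The sketch below computes WER in base MATLAB with a standard edit-distance table. `wordErrorRate` is a hypothetical helper written for this illustration, not a toolbox function, and the two sentences are made up.

```matlab
ref = "the cat sat on the mat";   % reference transcript (made up)
hyp = "the cat sit on mat";       % model output (made up)

wer = wordErrorRate(ref, hyp);
fprintf("WER = %.2f%%\n", 100 * wer);

function wer = wordErrorRate(refText, hypText)
    % Split into lowercase words; assumes words are separated by single spaces.
    ref = split(strip(lower(string(refText))));
    hyp = split(strip(lower(string(hypText))));
    n = numel(ref);
    m = numel(hyp);

    % D(i+1, j+1) is the edit distance between the first i reference
    % words and the first j hypothesis words.
    D = zeros(n + 1, m + 1);
    D(:, 1) = 0:n;
    D(1, :) = 0:m;
    for i = 1:n
        for j = 1:m
            cost = double(ref(i) ~= hyp(j));      % 0 if the words match
            D(i + 1, j + 1) = min([D(i, j + 1) + 1, ...   % deletion
                                   D(i + 1, j) + 1, ...   % insertion
                                   D(i, j) + cost]);      % substitution
        end
    end
    wer = D(end, end) / n;
end
```

In this example there is one substitution ("sat" to "sit") and one deletion ("the"), so the result is 2 errors over 6 reference words.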

Lidar

Point cloud data is acquired by a variety of sensors, such as lidar, radar, and depth cameras. Training robust classifiers with point cloud data is challenging because of the sparsity of data per object, object occlusions, and sensor noise. Deep learning techniques have been shown to address many of these challenges by learning robust feature representations directly from point cloud data.

Inputs are lidar point clouds converted to five channels; outputs are segmentation, classification, or object detection results overlaid on the point clouds.

Network	Application	Size (MB)	Object Classes	Location
PointNet	Classification	5	14	Doc
PointNet++	Segmentation	3	8	Doc
PointSeg	Segmentation	14	3	Doc
SqueezeSegV2	Segmentation	5	12	Doc
SalsaNext	Segmentation	20.9	13	GitHub
PointPillars	Object Detection	8	3	Doc
Complex YOLO v4	Object Detection	233 (complex-yolov4) 21 (tiny-complex-yolov4)	3	GitHub

Model requests

If you'd like to request MATLAB support for additional pretrained models, please create an issue in this repo.

Alternatively, send the request to:

David Willingham
Deep Learning Product Manager
dwilling@mathworks.com
