Open-source project name: humphd/have-fun-with-machine-learning
Open-source project URL: https://github.com/humphd/have-fun-with-machine-learning
Open-source language: Python 100.0%

Have Fun with Machine Learning: A Guide for Beginners

Also available in Chinese (Traditional).

Preface

This is a hands-on guide to machine learning for programmers with no background in AI. Using a neural network doesn’t require a PhD, and you don’t need to be the person who makes the next breakthrough in AI in order to use what exists today. What we have now is already breathtaking, and highly usable. I believe that more of us need to play with this stuff like we would any other open source technology, instead of treating it like a research topic.

In this guide our goal will be to write a program that uses machine learning to predict, with a high degree of certainty, whether the images in data/untrained-samples are of dolphins or seahorses using only the images themselves, and without having seen them before. Here are two example images we'll use:

To do that we’re going to train and use a Convolutional Neural Network (CNN). We’re going to approach this from the point of view of a practitioner rather than from first principles. There is so much excitement about AI right now, but much of what’s being written feels like being taught to do tricks on your bike by a physics professor at a chalkboard instead of your friends in the park.

I’ve decided to write this on GitHub rather than as a blog post because I’m sure that some of what I’ve written below is misleading, naive, or just plain wrong. I’m still learning myself, and I’ve found the lack of solid beginner documentation an obstacle. If you see me making a mistake or missing important details, please send a pull request.

With all of that out of the way, let me show you how to do some tricks on your bike!

Overview

Here’s what we’re going to explore:
This guide won’t teach you how neural networks are designed, cover much theory, or use a single mathematical expression. I don’t pretend to understand most of what I’m going to show you. Instead, we’re going to use existing things in interesting ways to solve a hard problem.
There are literally hundreds of introductions to this, from short posts to full online courses. Depending on how you like to learn, here are three options for a good starting point:
Setup

Installing the software we'll use (Caffe and DIGITS) can be frustrating, depending on your platform and OS version. By far the easiest way to do it is using Docker. Below we examine how to do it with Docker, as well as how to do it natively.

Option 1a: Installing Caffe Natively

First, we’re going to be using the Caffe deep learning framework from the Berkeley Vision and Learning Center (BSD licensed).
There are a lot of great choices available, and you should look at all the options. TensorFlow is great, and you should play with it. However, I’m using Caffe for a number of reasons:
But the number one reason I’m using Caffe is that you don’t need to write any code to work with it. You can do everything declaratively (Caffe uses structured text files to define the network architecture) and using command-line tools. Also, you can use some nice front-ends for Caffe to make training and validating your network a lot easier. We’ll be using nVidia’s DIGITS tool below for just this purpose.

Caffe can be a bit of work to get installed. There are installation instructions for various platforms, including some prebuilt Docker or AWS configurations.

NOTE: when making my walkthrough, I used the following non-released version of Caffe from their GitHub repo: https://github.com/BVLC/caffe/commit/5a201dd960840c319cefd9fa9e2a40d2c76ddd73

On a Mac it can be frustrating to get working, with version issues halting your progress at various steps in the build. It took me a couple of days of trial and error. There are a dozen guides I followed, each with slightly different problems. In the end I found this one to be the closest. I’d also recommend this post, which is quite recent and links to many of the same discussions I saw.

Getting Caffe installed is by far the hardest thing we'll do, which is pretty neat, since you’d assume the AI aspects would be harder! Don’t give up if you have issues; it’s worth the pain. If I were doing this again, I’d probably use an Ubuntu VM instead of trying to do it on Mac directly. There's also a Caffe Users group, if you need answers.
It’s true, deep neural networks require a lot of computing power and energy to train...if you’re training them from scratch and using massive datasets. We aren’t going to do that. The secret is to use a pretrained network that someone else has already invested hundreds of hours of compute time training, and then to fine tune it to your particular dataset. We’ll look at how to do this below, but suffice it to say that everything I’m going to show you, I’m doing on a year-old MacBook Pro without a fancy GPU.

As an aside, because I have an integrated Intel graphics card rather than an nVidia GPU, I decided to use the OpenCL Caffe branch, and it’s worked great on my laptop.

When you’re done installing Caffe, you should have, or be able to do, all of the following:
On my machine, with Caffe fully built, I’ve got the following basic layout in my CAFFE_ROOT dir:
At this point, we have everything we need to train, test, and program with neural networks. In the next section we’ll add a user-friendly, web-based front end to Caffe called DIGITS, which will make training and testing our networks much easier.

Option 1b: Installing DIGITS Natively

nVidia’s Deep Learning GPU Training System, or DIGITS, is a BSD-licensed Python web app for training neural networks. While it’s possible to do everything DIGITS does in Caffe at the command-line, or with code, using DIGITS makes it a lot easier to get started. I also found it more fun, due to the great visualizations, real-time charts, and other graphical features. Since you’re experimenting and trying to learn, I highly recommend beginning with DIGITS.

There are quite a few good docs at https://github.com/NVIDIA/DIGITS/tree/master/docs, including a few Installation, Configuration, and Getting Started pages. I’d recommend reading through everything there before you continue, as I’m not an expert on everything you can do with DIGITS. There's also a public DIGITS User Group if you have questions you need to ask.

There are various ways to install and run DIGITS, from Docker to pre-baked packages on Linux, or you can build it from source. I’m on a Mac, so I built it from source.

NOTE: In my walkthrough I've used the following non-released version of DIGITS from their GitHub repo: https://github.com/NVIDIA/DIGITS/commit/81be5131821ade454eb47352477015d7c09753d9

Because it’s just a bunch of Python scripts, it was fairly painless to get working.
The one thing you need to do is tell DIGITS where your Caffe build lives, by setting CAFFE_ROOT before starting the server:

    export CAFFE_ROOT=/path/to/caffe
    ./digits-devserver

NOTE: on Mac I had issues with the server scripts assuming a particular name for my Python binary.

Once the server is started, you can do everything else via your web browser at http://localhost:5000, which is what I'll do below.

Option 2: Caffe and DIGITS using Docker

Install Docker, if not already installed, then run the following command in order to pull and run a full Caffe + DIGITS container. A few things to note:
    docker run --name digits -d -p 8080:5000 -v /path/to/this/repository:/data/repo kaixhin/digits

Now that we have our container running, you can open up your web browser and reach DIGITS on host port 8080 (the host side of the -p 8080:5000 mapping above).

If you need shell access, use the following command:

    docker exec -it digits /bin/bash

Training a Neural Network

Training a neural network involves a few steps:
We’re going to do this 3 different ways, in order to show the difference between starting from scratch and using a pretrained network, and also to show how to work with two popular pretrained networks (AlexNet, GoogLeNet) that are commonly used with Caffe and DIGITS.

For our training attempts, we’ll use a small dataset of Dolphins and Seahorses. I’ve put the images I used in data/dolphins-and-seahorses. You need at least 2 categories, but could have many more (some of the networks we’ll use were trained on 1000+ image categories). Our goal is to be able to give an image to our network and have it tell us whether it’s a Dolphin or a Seahorse.

Prepare the Dataset

The easiest way to begin is to divide your images into a categorized directory layout:
Here each directory is a category we want to classify, and each image within that category dir an example we’ll use for training and validation.
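To sanity-check a categorized layout like this before handing it to DIGITS, a small script can count the images in each category directory. This is only a sketch: the function names, the set of accepted file extensions, and the throwaway demo files are my own assumptions for illustration, not anything DIGITS requires.

```python
import os
import tempfile

# Assumption: which extensions count as images; adjust for your own data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_by_category(dataset_dir):
    """Return {category_name: image_count} for a categorized directory layout."""
    counts = {}
    for entry in sorted(os.listdir(dataset_dir)):
        category_dir = os.path.join(dataset_dir, entry)
        if not os.path.isdir(category_dir):
            continue  # stray files at the top level are not categories
        counts[entry] = sum(
            1
            for name in os.listdir(category_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


def build_demo_dataset(root):
    """Create a tiny fake dataset (empty files) so the sketch runs anywhere."""
    layout = {
        "dolphin": ["a.jpg", "b.jpg", "c.png"],
        "seahorse": ["x.jpg", "y.JPG"],
    }
    for category, names in layout.items():
        os.makedirs(os.path.join(root, category))
        for name in names:
            open(os.path.join(root, category, name), "w").close()
    # A stray top-level file should not be treated as a category.
    open(os.path.join(root, "README.txt"), "w").close()


with tempfile.TemporaryDirectory() as demo_root:
    build_demo_dataset(demo_root)
    demo_counts = count_images_by_category(demo_root)
    print(demo_counts)
```

Pointing count_images_by_category at your own dataset folder instead of the demo directory gives a quick check that every category has a reasonable number of examples.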
No to both. The image sizes will be normalized before we feed them into the network. We’ll eventually want colour images of 256 x 256 pixels, but DIGITS will crop or squash (we'll squash) our images automatically in a moment. The filenames are irrelevant--it’s only important which category they are contained within.
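To make the difference between squashing and cropping concrete, here is a minimal sketch of the geometry involved. It is an illustration under my own assumptions (a centered square crop, and hypothetical function names); DIGITS' actual resize code may differ in its details.

```python
def squash_scale(width, height, target=256):
    """Per-axis scale factors when squashing to target x target.

    Squashing keeps the whole picture but distorts the aspect ratio
    whenever the two factors differ.
    """
    return target / width, target / height


def center_crop_box(width, height):
    """Largest centered square as (left, top, right, bottom).

    Cropping keeps the aspect ratio but discards the edges of the
    longer side. Assumption: the crop is centered.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# A 512 x 256 landscape image:
print(squash_scale(512, 256))     # horizontal axis shrinks twice as much
print(center_crop_box(512, 256))  # the middle 256 x 256 region is kept
```

With squashing, a wide photo of a dolphin ends up looking compressed horizontally but nothing is cut off; with cropping, the shape is preserved but a nose or tail near the edge could be lost.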
Yes. See https://github.com/NVIDIA/DIGITS/blob/digits-4.0/docs/ImageFolderFormat.md.

We want to use these images on disk to create a New Dataset, and specifically, a Classification Dataset. We’ll use the defaults DIGITS gives us, and point Training Images at the path to our data/dolphins-and-seahorses folder. DIGITS will use the categories (dolphin and seahorse) to build the dataset.

Give your Dataset a name and create it. This took only 4s on my laptop. In the end I have 92 Training images (49 dolphin, 43 seahorse) in 2 categories, with 30 Validation images (16 dolphin, 14 seahorse). It’s a really small dataset, but perfect for our experimentation and learning purposes, because it won’t take forever to train and validate a network that uses it. You can Explore the db if you want to see the images after they have been squashed.

Training: Attempt 1, from Scratch

Back in the DIGITS Home screen, we need to create a new Classification Model. We’ll start by training a model that uses our new dataset and the AlexNet architecture.

Caffe uses structured text files to define network architectures. These text files are based on Google’s Protocol Buffers. You can read the full schema Caffe uses. For the most part we’re not going to work with these, but it’s good to be aware of their existence, since we’ll have to modify them in later steps. The AlexNet prototxt file looks like this, for example: https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt.

We’ll train our network for 30 epochs, which means that it will learn from our training images (adjusting the network’s weights as it goes), then test itself using our validation images, and repeat this process 30 times. Each time it completes a cycle we’ll get info about Accuracy (0% to 100%, where higher is better) and what our Loss is (a measure of how wrong the predictions were, where lower is better). Ideally we want a network that is able to predict with high accuracy, and with few errors (small loss).

NOTE: some people have reported hitting errors in DIGITS doing this training run. For many, the problem related to available memory (the process needs a lot of memory to work). If you're using Docker, you might want to try increasing the amount of memory available to DIGITS (in Docker, preferences -> advanced -> memory).
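To make the Accuracy and Loss numbers concrete, here is a minimal sketch of how the two can be computed from a batch of predictions. I'm assuming the loss is the average cross-entropy (the quantity Caffe's SoftmaxWithLoss layer computes); the function names and the example probabilities are made up for illustration, and DIGITS' own charts may be produced differently.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability class is the true label."""
    correct = sum(
        1
        for probs, label in zip(probabilities, labels)
        if probs.index(max(probs)) == label
    )
    return correct / len(labels)


def cross_entropy_loss(probabilities, labels):
    """Average of -log(probability assigned to the true class).

    A confident wrong answer is punished much more than a hesitant one.
    """
    eps = 1e-12  # guards against log(0)
    total = sum(
        -math.log(max(probs[label], eps))
        for probs, label in zip(probabilities, labels)
    )
    return total / len(labels)


# Made-up predictions for four images; class 0 = dolphin, class 1 = seahorse.
example_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
example_labels = [0, 1, 1, 0]

print(accuracy(example_probs, example_labels))
print(cross_entropy_loss(example_probs, example_labels))
```

Under this definition, a network that always answers 50/50 between two classes has a loss of ln 2 (about 0.69), which is a handy reference point when reading the training chart.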
Initially, our network’s accuracy is a bit below 50%. This makes sense, because at first it’s just “guessing” between two categories using randomly assigned weights. Over time it’s able to achieve 87.5% accuracy, with a loss of 0.37. The entire 30 epoch run took me just under 6 minutes.

We can test our model using an image we upload or a URL to an image on the web. Let’s test it on a few examples that weren’t in our training/validation dataset:

It almost seems perfect, until we try another:

Here it falls down completely, and confuses a seahorse for a dolphin, and worse, does so with a high degree of confidence.

The reality is that our dataset is too small to be useful for training a really good neural network. We really need tens or hundreds of thousands of images, and with that, a lot of computing power to process everything.

Training: Attempt 2, Fine Tuning AlexNet

How Fine Tuning works

Designing a neural network from scratch, collecting data sufficient to train it (e.g., millions of images), and accessing GPUs for weeks to complete the training is beyond the reach of most of us. To make it practical for smaller amounts of data to be used, we employ a technique called Transfer Learning, or Fine Tuning. Fine tuning takes advantage of the layout of deep neural networks, and uses pretrained networks to do the hard work of initial object detection.

Think of using a neural network as being like looking at something far away with a pair of binoculars. You first put the binoculars to your eyes, and everything is blurry. As you adjust the focus, you start to see colours, lines, shapes, and eventually you are able to pick out the shape of a bird, then with some more adjustment you can identify the species of bird.
In a multi-layered network, the initial layers extract features (e.g., edges), with later layers using these features to detect shapes (e.g., a wheel, an eye), which are then fed into final classification layers that detect items based on accumulated characteristics from previous layers (e.g., a cat vs. a dog). A network has to be able to go from pixels to circles to eyes to two eyes placed in a particular orientation, and so on up to being able to finally conclude that an image depicts a cat.

What we’d like to do is to specialize an existing, pretrained network for classifying a new set of image classes instead of the ones on which it was initially trained. Because the network already knows how to “see” features in images, we’d like to retrain it to “see” our particular image types. We don’t need to start from scratch with the majority of the layers--we want to transfer the learning already done in these layers to our new classification task. Unlike our previous attempt, which used random weights, we’ll use the existing weights of the final network in our training. However, we’ll throw away the final classification layer(s) and retrain the network with our image dataset, fine tuning it to our image classes.

For this to work, we need a pretrained network that is similar enough to our own data that the learned weights will be useful. Luckily, the networks we’ll use below were trained on millions of natural images from ImageNet, which is useful across a broad range of classification tasks.

This technique has been used to do interesting things like screening for eye diseases from medical imagery, identifying plankton species from microscopic images collected at sea, and categorizing the artistic style of Flickr images.

Doing this perfectly, like all of machine learning, requires you to understand the data and network architecture--you have to be careful with overfitting of the data, might need to fix some of the layers, might need to insert new layers, etc.
However, my experience is that it “Just Works” much of the time, and it’s worth you simply doing an experiment to see what you can achieve using our naive approach.

Uploading Pretrained Networks

In our first attempt, we used AlexNet’s architecture, but started with random weights in the network’s layers. What we’d like to do is download and use a version of AlexNet that has already been trained on a massive dataset.

Thankfully we can do exactly this. A snapshot of AlexNet is available for download: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet.
We need the binary .caffemodel file, which contains the trained weights.

While you’re downloading pretrained models, let’s get one more at the same time. In 2014, Google won the ImageNet competition with GoogLeNet (codenamed Inception), a 22-layer neural network. A snapshot of GoogLeNet is available for download as well, see https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet. Again, we’ll need the binary .caffemodel file.

With these files downloaded, we can use the defaults DIGITS provides for both pretrained models (i.e., colour, squashed images of 256 x 256). We just need to provide the weights and the model definition. For the model definitions we can use https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/train_val.prototxt for GoogLeNet and https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt for AlexNet. We aren’t going to use the classification labels of these networks, so we’ll skip adding a labels file.

Repeat this process for both AlexNet and GoogLeNet, as we’ll use them both in the coming steps.
The Caffe Model Zoo has quite a few other pretrained networks that could be used, see https://github.com/BVLC/caffe/wiki/Model-Zoo.

Fine Tuning AlexNet for Dolphins and Seahorses

Training a network using a pretrained Caffe Model is similar to starting from scratch, though we have to make a few adjustments. First, we’ll adjust the Base Learning Rate to 0.001 from 0.01, since we don’t need to make such large jumps (i.e., we’re fine tuning). We’ll also use a Pretrained Network, and Customize it.

In the pretrained model’s definition (i.e., prototxt), we need to rename all references to the final Fully Connected Layer (where the end result classifications happen). We do this because we want the model to re-learn new categories from our dataset rather than its original training data (i.e., we want to throw away the current final layer). We have to rename the last fully connected layer from “fc8” to something else, “fc9” for example. Finally, we also need to adjust the number of categories from 1000 to 2 (the num_output value). Here are the changes we need to make:

@@ -332,8 +332,8 @@
}
layer {
- name: "fc8"
+ name: "fc9"
type: "InnerProduct"
bottom: "fc7"
- top: "fc8"
+ top: "fc9"
param {
lr_mult: 1
@@ -345,5 +345,5 @@
}
inner_product_param {
- num_output: 1000
+ num_output: 2
weight_filler {
type: "gaussian"
@@ -359,5 +359,5 @@
name: "accuracy"
type: "Accuracy"
- bottom: "fc8"
+ bottom: "fc9"
bottom: "label"
top: "accuracy"
@@ -367,5 +367,5 @@
name: "loss"
type: "SoftmaxWithLoss"
- bottom: "fc8"
+ bottom: "fc9"
bottom: "label"
top: "loss"
@@ -375,5 +375,5 @@
name: "softmax"
type: "Softmax"
- bottom: "fc8"
+ bottom: "fc9"
top: "softmax"
include { stage: "deploy" }

I’ve included the fully modified file I’m using in src/alexnet-customized.prototxt.

This time our accuracy starts at ~60% and climbs right away to 87.5%, then to 96% and all the way up to 100%, with the Loss steadily decreasing. After 5 minutes we end up with an accuracy of 100% and a loss of 0.0009.

Testing the same seahorse image our previous network got wrong, we see a complete reversal: 100% seahorse.

Even a children’s drawing of a seahorse works:

The same goes for a dolphin:

Even with images that you think might be hard, like this one that has multiple dolphins close together, and with their bodies mostly underwater, it does the right thing:

Training: Attempt 3, Fine Tuning GoogLeNet

Just as we fine tuned AlexNet, we can fine tune GoogLeNet. Modifying the network is a bit trickier, since you have to redefine three fully connected layers instead of just one.

To fine tune GoogLeNet for our use case, we need to once again create a new Classification Model. We rename all references to the three fully connected classification layers (loss1/classifier, loss2/classifier, and loss3/classifier) and set num_output to 2 for each:
@@ -917,10 +917,10 @@
exclude { stage: "deploy" }
}
layer {
- name: "loss1/classifier"
+ name: "loss1a/classifier"
type: "InnerProduct"
bottom: "loss1/fc"
- top: "loss1/classifier"
+ top: "loss1a/classifier"
param {
lr_mult: 1
decay_mult: 1
@@ -930,7 +930,7 @@
decay_mult: 0
}
inner_product_param {
- num_output: 1000
+ num_output: 2
weight_filler {
type: "xavier"
std: 0.0009765625
@@ -945,7 +945,7 @@
layer {
name: "loss1/loss"
type: "SoftmaxWithLoss"
- bottom: "loss1/classifier"
+ bottom: "loss1a/classifier"
bottom: "label"
top: "loss1/loss"
loss_weight: 0.3
@@ -954,7 +954,7 @@
layer {
name: "loss1/top-1"
type: "Accuracy"
- bottom: "loss1/classifier"
+ bottom: "loss1a/classifier"
bottom: "label"
top: "loss1/accuracy"
include { stage: "val" }
@@ -962,7 +962,7 @@
layer {
name: "loss1/top-5"
type: "Accuracy"
- bottom: "loss1/classifier"
+ bottom: "loss1a/classifier"
bottom: "label"
top: "loss1/accuracy-top5"
include { stage: "val" }
@@ -1705,10 +1705,10 @@
exclude { stage: "deploy" }
}
layer {
- name: "loss2/classifier"
+ name: "loss2a/classifier"
type: "InnerProduct"
bottom: "loss2/fc"
- top: "loss2/classifier"
+ top: "loss2a/classifier"
param {
lr_mult: 1
decay_mult: 1
@@ -1718,7 +1718,7 @@
decay_mult: 0
}
inner_product_param {
- num_output: 1000
+ num_output: 2
weight_filler {
type: "xavier"
std: 0.0009765625
@@ -1733,7 +1733,7 @@
layer {
name: "loss2/loss"
type: "SoftmaxWithLoss"
- bottom: "loss2/classifier"
+ bottom: "loss2a/classifier"
bottom: "label"
top: "loss2/loss"
loss_weight: 0.3
@@ -1742,7 +1742,7 @@
layer {
name: "loss2/top-1"
type: "Accuracy"
- bottom: "loss2/classifier"
+ bottom: "loss2a/classifier"
bottom: "label"
top: "loss2/accuracy"
include { stage: "val" }
@@ -1750,7 +1750,7 @@
layer {
name: "loss2/top-5"
type: "Accuracy"
- bottom: "loss2/classifier"
+ bottom: "loss2a/classifier"
bottom: "label"
top: "loss2/accuracy-top5"
include { stage: "val" }
@@ -2435,10 +2435,10 @@
}
}
layer {
- name: "loss3/classifier"
+ name: "loss3a/classifier"
type: "InnerProduct"
bottom: "pool5/7x7_s1"
- top: "loss3/classifier"
+ top: "loss3a/classifier"
param {
lr_mult: 1
decay_mult: 1
@@ -2448,7 +2448,7 @@
decay_mult: 0
}
inner_product_param {
- num_output: 1000
+ num_output: 2
weight_filler {
type: "xavier"
}
@@ -2461,7 +2461,7 @@
layer {
name: "loss3/loss"
type: "SoftmaxWithLoss"
- bottom: "loss3/classifier"
+ bottom: "loss3a/classifier"
bottom: "label"
top: "loss"
loss_weight: 1
@@ -2470,7 +2470,7 @@
layer {
name: "loss3/top-1"
type: "Accuracy"
- bottom: "loss3/classifier"
+ bottom: "loss3a/classifier"
bottom: "label"
top: "accuracy"
include { stage: "val" }
@@ -2478,7 +2478,7 @@
layer {
name: "loss3/top-5"
type: "Accuracy"
- bottom: "loss3/classifier"
+ bottom: "loss3a/classifier"
bottom: "label"
top: "accuracy-top5"
include { stage: "val" }
@@ -2489,7 +2489,7 @@
layer {
name: "softmax"
type: "Softmax"
- bottom: "loss3/classifier"
+ bottom: "loss3a/classifier"
top: "softmax"
include { stage: "deploy" }
}

I’ve put the complete file in src/googlenet-customized.prototxt.
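Why does renaming a layer make the network re-learn it? As I understand it, when Caffe loads a pretrained model it matches weights to layers by name: layers whose names appear in the pretrained file get those weights, and any layer it can't find keeps its fresh, randomly initialized values. The sketch below mimics that matching with plain dictionaries. The function name and the placeholder "weights" are hypothetical stand-ins, not Caffe's API.

```python
def initialize_from_pretrained(new_layer_names, pretrained_weights):
    """Split a network's layers into those that reuse pretrained weights
    and those that start fresh, matching purely on layer name."""
    copied = {}
    fresh = []
    for name in new_layer_names:
        if name in pretrained_weights:
            copied[name] = pretrained_weights[name]  # reuse learned weights
        else:
            fresh.append(name)  # no match: keep the random initialization
    return copied, fresh


# Placeholder strings stand in for real weight arrays.
pretrained = {
    "loss1/fc": "weights learned on ImageNet",
    "loss2/fc": "weights learned on ImageNet",
    "loss1/classifier": "1000-way classifier",
    "loss2/classifier": "1000-way classifier",
    "loss3/classifier": "1000-way classifier",
}

# Layer names after the renames in the diff above.
customized_layers = [
    "loss1/fc",
    "loss2/fc",
    "loss1a/classifier",
    "loss2a/classifier",
    "loss3a/classifier",
]

copied, fresh = initialize_from_pretrained(customized_layers, pretrained)
print(sorted(copied))  # layers that keep their pretrained weights
print(fresh)           # renamed layers that will be trained from scratch
```

The same idea applies to the AlexNet change from “fc8” to “fc9”: the new name has no match in the pretrained file, so only that final layer starts over while everything before it keeps what it already learned.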
Great question, and it's something I'm wondering, too. For example, I know that we can "fix" certain layers so the weights don't change. Doing other things involves understanding how the layers work, which is beyond this guide, and also beyond its author at present!

Like we did with fine tuning AlexNet, we also reduce the learning rate by a factor of 10, from 0.01 to 0.001.

Great question, and one that I wonder about as well. I only have a vague understanding of these and it’s likely that there are improvements we can make if you know how to alter these values when training. This is something that needs better documentation.

Because GoogLeNet has a more complicated architecture than AlexNet, fine tuning it requires more time. On my laptop, it takes 10 minutes to retrain GoogLeNet with our dataset, achieving 100% accuracy and a loss of 0.0070:

Just as we saw with the fine tuned version of AlexNet, our modified GoogLeNet performs amazingly well--the best so far: