YOLO: Real-Time Object Detection

You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev.

Comparison to Other Detectors

YOLOv3 is extremely fast and accurate. In mAP measured at .5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, you can easily trade off between speed and accuracy simply by changing the size of the model, no retraining required!

Performance on the COCO Dataset

| Model | Train | Test | mAP | FLOPS | FPS | Cfg | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SSD300 | COCO trainval | test-dev | 41.2 | - | 46 | - | link |
| SSD500 | COCO trainval | test-dev | 46.5 | - | 19 | - | link |
| YOLOv2 608x608 | COCO trainval | test-dev | 48.1 | 62.94 Bn | 40 | cfg | weights |
| Tiny YOLO | COCO trainval | test-dev | 23.7 | 5.41 Bn | 244 | cfg | weights |
| SSD321 | COCO trainval | test-dev | 45.4 | - | 16 | - | link |
| DSSD321 | COCO trainval | test-dev | 46.1 | - | 12 | - | link |
| R-FCN | COCO trainval | test-dev | 51.9 | - | 12 | - | link |
| SSD513 | COCO trainval | test-dev | 50.4 | - | 8 | - | link |
| DSSD513 | COCO trainval | test-dev | 53.3 | - | 6 | - | link |
| FPN FRCN | COCO trainval | test-dev | 59.1 | - | 6 | - | link |
| RetinaNet-50-500 | COCO trainval | test-dev | 50.9 | - | 14 | - | link |
| RetinaNet-101-500 | COCO trainval | test-dev | 53.1 | - | 11 | - | link |
| RetinaNet-101-800 | COCO trainval | test-dev | 57.5 | - | 5 | - | link |
| YOLOv3-320 | COCO trainval | test-dev | 51.5 | 38.97 Bn | 45 | cfg | weights |
| YOLOv3-416 | COCO trainval | test-dev | 55.3 | 65.86 Bn | 35 | cfg | weights |
| YOLOv3-608 | COCO trainval | test-dev | 57.9 | 140.69 Bn | 20 | cfg | weights |
| YOLOv3-tiny | COCO trainval | test-dev | 33.1 | 5.56 Bn | 220 | cfg | weights |
| YOLOv3-spp | COCO trainval | test-dev | 60.6 | 141.45 Bn | 20 | cfg | weights |

How It Works

Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.

We use a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
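
To make this concrete, here is a tiny, purely illustrative Python sketch (not Darknet's actual code) of the decoding step: each cell predicts a few boxes with an objectness confidence and class probabilities, the class probabilities are weighted by the objectness score, and low-scoring boxes are discarded.

import numpy as np

# Illustrative shapes only: a 13x13 grid, 3 boxes per cell, 80 COCO classes.
S, B, C = 13, 3, 80
objectness = np.random.rand(S, S, B)       # confidence that each box contains an object
class_probs = np.random.rand(S, S, B, C)   # per-box class probabilities
boxes = np.random.rand(S, S, B, 4)         # (x, y, w, h) for each predicted box

# Weight class probabilities by objectness to get final detection scores.
scores = objectness[..., None] * class_probs   # shape (S, S, B, C)

# Keep only boxes whose best class score clears a threshold (0.25 is YOLO's default).
best_class = scores.argmax(axis=-1)
best_score = scores.max(axis=-1)
keep = best_score > 0.25
detections = list(zip(boxes[keep], best_class[keep], best_score[keep]))
print(len(detections), "boxes kept out of", S * S * B)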

Our model has several advantages over classifier-based systems. It looks at the whole image at test time so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation unlike systems like R-CNN which require thousands for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. See our paper for more details on the full system.

What's New in Version 3?

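YOLOv3 uses a few tricks to improve training and increase performance, including: multi-scale predictions, a better backbone classifier, and more. The full details are in our paper!

For a 416x416 input (the default in the cfg), the multi-scale predictions come out on 13x13, 26x26, and 52x52 grids, each with 3 boxes per cell and 80 class scores plus 5 box/objectness values per box, which is where the 255-channel layers in the detection output later in this post come from. A rough shape sketch, for illustration only:

# Illustrative only: detection output shapes for a 416x416 input.
num_classes = 80
boxes_per_cell = 3
channels = boxes_per_cell * (num_classes + 5)   # 5 = 4 box coords + 1 objectness
for grid in (13, 26, 52):
    print("scale %dx%d: output shape (%d, %d, %d)" % (grid, grid, grid, grid, channels))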

Detection Using A Pre-Trained Model

This post will guide you through detecting objects with the YOLO system using a pre-trained model. If you don't already have Darknet installed, you should do that first. Or instead of reading all that just run:

git clone https://github.com/pjreddie/darknet
cd darknet
make

Easy!

You already have the config file for YOLO in the cfg/ subdirectory. You will have to download the pre-trained weight file here (237 MB). Or just run this:

wget https://pjreddie.com/media/files/yolov3.weights

Then run the detector!

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

You will see some output like this:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    .......
  105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
  106 detection
truth_thresh: Using default '1.000000'
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 0.029329 seconds.
dog: 99%
truck: 93%
bicycle: 99%

Darknet prints out the objects it detected, its confidence in each, and how long it took to find them. We didn't compile Darknet with OpenCV so it can't display the detections directly. Instead, it saves them in predictions.png. You can open that file to see the detected objects. Since we are using Darknet on the CPU it takes around 6-12 seconds per image. If we used the GPU version it would be much faster.

I've included some example images to try in case you need inspiration. Try data/eagle.jpg, data/dog.jpg, data/person.jpg, or data/horses.jpg!

The detect command is shorthand for a more general version of the command. It is equivalent to the command:

./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

You don't need to know this if all you want to do is run detection on one image but it's useful to know if you want to do other things like run on a webcam (which you will see later on).

Multiple Images

Instead of supplying an image on the command line, you can leave it blank to try multiple images in a row. You will then see a prompt once the config and weights are done loading:

./darknet detect cfg/yolov3.cfg yolov3.weights
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    .......
  104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
  105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
  106 detection
Loading weights from yolov3.weights...Done!
Enter Image Path:

Enter an image path like data/horses.jpg to have it predict boxes for that image.

Once it is done it will prompt you for more paths to try different images. Use Ctrl-C to exit the program once you are done.

Changing The Detection Threshold

By default, YOLO only displays objects detected with a confidence of .25 or higher. You can change this by passing the -thresh <val> flag to the yolo command. For example, to display all detections you can set the threshold to 0:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg -thresh 0

Which produces:

[image: every candidate detection drawn at threshold 0]

So that's obviously not super useful but you can set it to different values to control what gets thresholded by the model.

Tiny YOLOv3

We have a very small model as well for constrained environments, yolov3-tiny. To use this model, first download the weights:

wget https://pjreddie.com/media/files/yolov3-tiny.weights

Then run the detector with the tiny config file and weights:

./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg

Real-Time Detection on a Webcam

Running YOLO on test data isn't very interesting if you can't see the result. Instead of running it on a bunch of images let's run it on the input from a webcam!

To run this demo you will need to compile Darknet with CUDA and OpenCV. Then run the command:

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights

YOLO will display the current FPS and predicted classes as well as the image with bounding boxes drawn on top of it.

You will need a webcam connected to the computer that OpenCV can connect to or it won't work. If you have multiple webcams connected and want to select which one to use you can pass the flag -c <num> to pick (OpenCV uses webcam 0 by default).

You can also run it on a video file if OpenCV can read the video:

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>

That's how we made the YouTube video above.
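
If you would rather drive the same model from Python, OpenCV's dnn module can load the cfg and weights directly. This is a rough, unofficial sketch (assuming a recent OpenCV build with dnn support and the coco.names file from the Darknet repo), not part of Darknet itself:

import cv2
import numpy as np

# Load the same config and weights the Darknet commands above use.
net = cv2.dnn.readNetFromDarknet("cfg/yolov3.cfg", "yolov3.weights")
classes = open("data/coco.names").read().strip().split("\n")

cap = cv2.VideoCapture(0)   # webcam 0; pass a filename here to read a video file instead
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]

    # YOLOv3 expects a square RGB blob scaled to [0, 1]; 416x416 matches the default cfg.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for det in output:                  # det = [cx, cy, bw, bh, objectness, 80 class scores]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(det[4] * scores[class_id])
            if confidence > 0.25:           # same default threshold as the -thresh flag
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression to drop overlapping boxes, then draw what's left.
    for i in np.array(cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.4)).flatten():
        x, y, bw, bh = boxes[i]
        label = "%s: %.2f" % (classes[class_ids[i]], confidences[i])
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    cv2.imshow("YOLOv3", frame)
    if cv2.waitKey(1) == 27:                # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()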

Training YOLO on VOC

You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the Pascal VOC dataset.

Get The Pascal VOC Data

To train YOLO you will need all of the VOC data from 2007 to 2012. You can find links to the data here. To get all the data, make a directory to store it all and from that directory run:

wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar

There will now be a VOCdevkit/ subdirectory with all the VOC training data in it.

Generate Labels for VOC

Now we need to generate the label files that Darknet uses. Darknet wants a .txt file for each image with a line for each ground truth object in the image that looks like:

<object-class> <x> <y> <width> <height>

Where x and y are the center of the box and all four values are relative to the image's width and height. To generate these files we will run the voc_label.py script in Darknet's scripts/ directory. Let's just download it again because we are lazy.

wget https://pjreddie.com/media/files/voc_label.py
python voc_label.py
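
For reference, the conversion the script applies to each box is essentially the following (a minimal sketch assuming VOC-style corner coordinates in pixels; the real logic lives in voc_label.py):

def voc_box_to_yolo(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax) in pixels.
    img_w, img_h = size
    xmin, xmax, ymin, ymax = box
    x = (xmin + xmax) / 2.0 / img_w   # box center x, relative to image width
    y = (ymin + ymax) / 2.0 / img_h   # box center y, relative to image height
    w = (xmax - xmin) / img_w         # box width, relative to image width
    h = (ymax - ymin) / img_h         # box height, relative to image height
    return x, y, w, h

# A 200x150 box with its top-left corner at (50, 100) in a 500x375 image:
print(voc_box_to_yolo((500, 375), (50, 250, 100, 250)))   # -> (0.3, 0.466..., 0.4, 0.4)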

After a few minutes, this script will generate all of the requisite files. Mostly it generates a lot of label files in VOCdevkit/VOC2007/labels/ and VOCdevkit/VOC2012/labels/. In your directory you should see:

ls
2007_test.txt    VOCdevkit
2007_train.txt   voc_label.py
2007_val.txt     VOCtest_06-Nov-2007.tar
2012_train.txt   VOCtrainval_06-Nov-2007.tar
2012_val.txt     VOCtrainval_11-May-2012.tar

The text files like 2007_train.txt list the image files for that year and image set. Darknet needs one text file with all of the images you want to train on. In this example, let's train with everything except the 2007 test set so that we can test our model. Run:

cat 2007_train.txt 2007_val.txt 2012_*.txt > train.txt

Now we have all the 2007 trainval and the 2012 trainval set in one big list. That's all we have to do for data setup!

Modify Cfg for Pascal Data

Now go to your Darknet directory. We have to change the cfg/voc.data config file to point to your data:

classes = 20
train = <path-to-voc>/train.txt
valid = <path-to-voc>/2007_test.txt
names = data/voc.names
backup = backup

You should replace <path-to-voc> with the directory where you put the VOC data.

Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on Imagenet. We use weights from the darknet53 model. You can just download the weights for the convolutional layers here (76 MB).

wget https://pjreddie.com/media/files/darknet53.conv.74

Train The Model

Now we can train! Run the command:

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

Training YOLO on COCO

You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.

Get The COCO Data

To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:

cp scripts/get_coco_dataset.sh data
cd data
bash get_coco_dataset.sh

Now you should have all the data and the labels generated for Darknet.

Modify cfg for COCO

Now go to your Darknet directory. We have to change the cfg/coco.data config file to point to your data:

classes = 80
train = <path-to-coco>/trainvalno5k.txt
valid = <path-to-coco>/5k.txt
names = data/coco.names
backup = backup

You should replace <path-to-coco> with the directory where you put the COCO data.

You should also modify your model cfg for training instead of testing. cfg/yolov3.cfg should look like this:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
....

Train The Model

Now we can train! Run the command:

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74

If you want to use multiple GPUs run:

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1,2,3

If you want to stop and restart training from a checkpoint:

./darknet detector train cfg/coco.data cfg/yolov3.cfg backup/yolov3.backup -gpus 0,1,2,3

YOLOv3 on the Open Images dataset

wget https://pjreddie.com/media/files/yolov3-openimages.weights
./darknet detector test cfg/openimages.data cfg/yolov3-openimages.cfg yolov3-openimages.weights

What Happened to the Old YOLO Site?

If you are using YOLO version 2 you can still find the site here: https://pjreddie.com/darknet/yolov2/

Cite

If you use YOLOv3 in your work please cite our paper!

@article{yolov3,
  title={YOLOv3: An Incremental Improvement},
  author={Redmon, Joseph and Farhadi, Ali},
  journal={arXiv},
  year={2018}
}

FAQs

What is YOLO real-time object detection?

YOLO (You Only Look Once) is a real-time object detection algorithm developed by Joseph Redmon and Ali Farhadi in 2015. It is a single-stage object detector that uses a convolutional neural network (CNN) to predict the bounding boxes and class probabilities of objects in input images.

What is real-time object detection?

Real-time object detection is a computer vision task that involves identifying and locating objects of interest in real-time video sequences with fast inference while maintaining a base level of accuracy.

Is YOLO the best object detection algorithm?

YOLOv7 is one of the fastest and most accurate real-time object detection models for computer vision tasks.

Why is YOLO better than Faster R-CNN?

Compared to Faster R-CNN, YOLO supports a broader range of real-time applications. YOLO proves to be a cleaner and more efficient approach to object detection since it provides end-to-end training. Both algorithms are fairly accurate, but in some cases YOLO outperforms Faster R-CNN in terms of accuracy, speed, and efficiency.

What are the disadvantages of YOLO?

Limitations of YOLO:

YOLO has difficulty detecting small objects that appear in groups, and it struggles to generalize to objects with new or unusual aspect ratios, since the model learns to predict bounding boxes from the training data itself.

Does Tesla use YOLO?

Reportedly, yes: Tesla's cameras and sensors use a YOLO-based algorithm to detect objects in front of and around the vehicle.

How does YOLO work step by step?

YOLO divides the input image into a grid and, for each grid cell, predicts a fixed number of bounding boxes and class probabilities. SSD, by contrast, predicts bounding boxes and class probabilities at multiple scales from different feature maps.

What are the challenges of real-time object detection?

Challenges in real-time object detection include occlusion, scale variations, and cluttered environments. Researchers must navigate the trade-offs between accuracy and speed.

Why is YOLO so popular?

Speed: YOLO models are much faster than Faster R-CNN models. YOLO can process images in real time, at around 45 frames per second, while Faster R-CNN is considerably slower per image.

Is YOLO supervised or unsupervised?

YOLO is trained with supervised learning on images labeled with bounding boxes, though some training procedures combine supervised and unsupervised learning. YOLOv8 works by first dividing the input image into a grid of cells; for each cell, it predicts a set of bounding boxes along with the class probabilities for each bounding box.

Can YOLO detect multiple objects?

Yes. For example, a YOLO v2 multiclass object detector trained with MATLAB's trainYOLOv2ObjectDetector function can detect and identify multiple indoor objects in a single image.

Is YOLO deep learning or machine learning?

YOLO is a deep learning model. For each grid square, it predicts whether an object is present and, if so, what kind of object it is. The model is trained on a large set of labeled images, which is how it learns to identify different types of objects.

Is SSD better than YOLO for object detection?

In terms of accuracy, SSD tends to produce better results than YOLO. Despite this, YOLO is faster and often more useful in real-time applications. One should take this accuracy/speed trade-off into consideration when deciding which model is more suitable for a given application.

Which is better, TensorFlow object detection or YOLO?

YOLO is effective for real-time applications since it processes the entire image in a single forward pass, in contrast to typical object recognition techniques that rely on region proposal networks and intricate pipelines.

What is the purpose of YOLO?

YOLO (You Only Look Once) is a popular object detection algorithm that has revolutionized the field of computer vision. It is fast and efficient, making it an excellent choice for real-time object detection tasks.

What objects can YOLOv3 detect?

YOLOv3 is a real-time object detection algorithm capable of detecting specific objects in videos and images; the pre-trained model used above detects the 80 COCO object classes listed in data/coco.names. Leveraging features learned by a deep convolutional neural network, YOLOv3 swiftly identifies objects within an image.
