problems with webcams

Michael,

I too am using the BeagleBone Black, and I have actually spent a lot of time working on this. A lot of what I have discovered is in this post https://groups.google.com/forum/#!topic/beagleboard/2NO62mGcSvA
To recap, you need to reduce the framerate using v4l2-ctl to no more than 15 FPS to capture at 640x480 using the PS3 Eye. You can also do 320x240 at up to 60 FPS. It seems the PS3 Eye transfers data in bulk mode, putting a lot of data on the bus. Once the amount of data reaches a certain limit, you will get select timeouts.

The PS3Eye sends uncompressed images, but if compressed images are acceptable for your application, you might want to look into a camera that supports MJPEG compression. Allowing the camera to compress the frames as jpegs greatly reduces the amount of data sent over usb as well as the cpu usage. I am currently capturing still images at 1920x1080 with the FPS set to 30 using the Logitech C920.
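
For reference, here is roughly what asking for MJPEG and a lower frame rate looks like if you do it directly through the V4L2 ioctls instead of v4l2-ctl. This is just a minimal sketch – /dev/video0 and the 640x480 at 15 FPS numbers are assumptions, and the driver is free to adjust whatever you request:

#include <fcntl.h>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main()
{
    int fd = open("/dev/video0", O_RDWR);               // device node is an assumption
    if (fd < 0) return 1;

    struct v4l2_format fmt;
    std::memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;        // ask the camera to compress for us
    fmt.fmt.pix.field = V4L2_FIELD_ANY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) return 1;     // driver may adjust what you asked for

    struct v4l2_streamparm parm;
    std::memset(&parm, 0, sizeof(parm));
    parm.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    parm.parm.capture.timeperframe.numerator = 1;        // 1/15 s per frame = 15 FPS
    parm.parm.capture.timeperframe.denominator = 15;
    if (ioctl(fd, VIDIOC_S_PARM, &parm) < 0) return 1;

    // ... request buffers, mmap, stream on, and capture as in the usual example ...
    return 0;
}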

Today I am posting an article on thresholding colors using OpenCV on my blog at blog.lemoneerlabs.com and will document more on the webcams I have been working with as soon as I can.

As for my own project, I am giving sight to an autonomous robot. I will post more on that as well… when I find the time. :) Please do let me know how your work goes, and I will do my best to help you.

Good luck!

Matthew Witherwax

Wow! You seem to be pretty knowledgeable about all of this. You might have sold me on going out and buying a C920 ASAP! I’ve tried the Logitech C270 but couldn’t get above 15fps so I sort of gave up on trying/buying other cameras since nobody could confirm better performance with anything else.

I plan to put the camera on an autonomous RC aircraft that will be following another carrying very bright LEDs. I’m doing some simple “blob detection” and using the EPnP algorithm to estimate the pose (relative position and orientation of the leader w.r.t. the follower).

At what resolution can you get about 60 fps with the Logitech C920? I need a frame rate that is high enough to stop motion blur, but I don’t need to process every frame with OpenCV. I am limited to 10 Hz by the navigation loop of the autopilot I’m using anyway.

Do you think the C920 would work for my application?

I will do some testing with my cameras to see what frame rate I can achieve just sending the data to /dev/null. As of now, I have just confirmed what frame rates and resolutions will allow me to capture individual frames without select timeouts. I have not tried capturing a continuous stream to see what the recordable fps is.

There are several things you should note:

  1. If you are displaying the images on the BBB, it will increase cpu utilization and reduce your frame rate.

  2. If you are writing the stream to disk, the latency in writing will affect your frame rate.

  3. The Logitech C920 tops out hardware wise at 30 FPS for all resolutions if I recall correctly.

  4. Capturing in MJPG offers reduced cpu use and increased throughput due to the smaller images - but you have to decide if compression artifacts will cause you issues.

  5. There are forum posts such as this http://answers.opencv.org/question/4346/videocapture-parameters-cv_cap_prop_fourcc-or/ that suggest you cannot set the pixel format through OpenCV, so in order to capture in MJPG you may have to write your own capture code.
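
On that last point, the quickest test is to try requesting MJPG through OpenCV itself and see whether the setting sticks. A rough sketch (device 0 and 640x480 are assumptions; many report the V4L2 backend simply ignores the FOURCC, which is when custom capture code becomes necessary):

#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap(0);                                  // /dev/video0
    if (!cap.isOpened()) return 1;
    cap.set(CV_CAP_PROP_FRAME_WIDTH, 640);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, 480);
    bool ok = cap.set(CV_CAP_PROP_FOURCC, CV_FOURCC('M', 'J', 'P', 'G'));
    std::printf("FOURCC request %s\n", ok ? "accepted" : "rejected or ignored");
    return 0;
}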

I will let you know how the testing goes.

To address some of the excellent points you brought up:

  1. I have no need to display images on the BBB – the relative state of the leader w.r.t. the follower UAV (6 numbers, probably as integers) will be sent to the autopilot via serial over the BBB’s UART pins (I still need to get that set up). The only displaying I might do would be purely for debugging purposes.
  2. I may write frames to disk as part of my flight data recording – but for the exact problem you mentioned I will probably only save off a single frame every couple of seconds or so.
  3. I think that 30 fps will be okay for my application. That is the fastest I could get the PS3 Eye to operate at 640x480, even on my laptop. Faster would be better – but I have done some ground tests that give me some hope that 30 fps might work.
  4. I don’t need to do much video processing aside from detecting the bright LEDs in the red channel of my image. I am using a modified version of OpenCV’s “simple blob detector”, which mainly identifies blobs by their brightness, circularity, inertia, etc. (see the sketch after this list). Since brightness is the primary criterion here, I think I would be okay with compression as long as the LEDs appear semi-circular, but I will do more reading on it.
  5. I have actually been using custom capture code written by someone else (and modified a bit by me). I can add the v4l2 code to change the pixel format if the capability is not already in the capture code I’m using.
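
For reference, here is a rough sketch of the stock SimpleBlobDetector (OpenCV 2.4-style API) set up for bright, roughly circular blobs. The threshold values are just placeholders, not the ones from my modified detector:

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::KeyPoint> findLeds(const cv::Mat& redChannel)
{
    cv::SimpleBlobDetector::Params p;
    p.filterByColor = true;        p.blobColor = 255;          // keep bright blobs
    p.filterByCircularity = true;  p.minCircularity = 0.6f;
    p.filterByInertia = true;      p.minInertiaRatio = 0.4f;
    p.filterByArea = true;         p.minArea = 5.0f;

    cv::SimpleBlobDetector detector(p);
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(redChannel, keypoints);                    // keypoint per detected LED blob
    return keypoints;
}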

Thanks so much for your help! It’s been incredibly helpful to have found someone else wrestling with the same problems.
-Mike

Mike,

Just wanted to give you a quick update. Using capture code based on the V4L2 example, modified to capture frames in MJPEG format and throw them away, I am able to capture 640x480 at right around 30 fps using the C920. I set up the capture, take 1000 frames, and then tear it down. Inside the program I time actual runtime using calls to clock(), and the whole executable is timed with time on the command line. I did this 10 times on the BBB, and all runs were right around 33.6 seconds, which works out to about 29.76 captured frames per second. While this was running, I had another connection open to the BBB running top, and the frame capture application used about .3% of the cpu. Just so you don’t glance over it, that was point 3%.

Keep in mind, I did no processing on the frames, just grabbed them and tossed them, but with only .3% cpu in use, you probably have enough headroom to handle OpenCV. If you don’t mind me suggesting it, take a look at the code I posted on my blog today. It shows how to threshold colors, and I have used the technique to find and track both green and red lasers. Depending on your lighting conditions, and if your LEDs are bright and of a distinctive color, it may work for you and be less compute intensive.
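
The gist of the thresholding approach is something like the following (not the exact code from the blog post – the HSV bounds here are just placeholders for a bright red target; red wraps around hue 0, so two ranges are ORed together):

#include <opencv2/opencv.hpp>

cv::Mat thresholdRed(const cv::Mat& bgrFrame)
{
    cv::Mat hsv, low, high, mask;
    cv::cvtColor(bgrFrame, hsv, CV_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 100, 200),   cv::Scalar(10, 255, 255),  low);
    cv::inRange(hsv, cv::Scalar(170, 100, 200), cv::Scalar(180, 255, 255), high);
    cv::bitwise_or(low, high, mask);            // binary mask of bright, saturated red pixels
    return mask;
}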

Matthew Witherwax

Mike,

As an addendum, I will post the code and calling details to my blog shortly… within the next day or so.

I thought I’d mention that I’ve spent a lot of time playing with FFMPEG and the C920 on my BBB. If I’m capturing directly from the camera and writing to the uSD flash in an mp4 file, having FFMPEG do no transcoding, it usually runs about 3% CPU if I’m running at 1GHz, and 10% if I’m running at 300MHz. If you’ve got your frequency governor at the default of ondemand, it’ll drop down to the lower frequency if you aren’t taxing the cpu.

Wow! Thanks so much Matthew and William! It sounds like the C920 with MJPEG encoding is the way to go. I will definitely check out the blog you referenced and look over any code you have posted, Matthew.

I’ll get myself a C920 and play around to see if I can get some simple OpenCV code working.

I can’t thank you guys enough! I’m finally feeling optimistic about getting this thesis finished. :)

-Mike

Mike,

I have posted my capture code, the results of some timing tests, and my understanding of USB webcams on my blog here http://blog.lemoneerlabs.com/post/BBB-webcams

Hi Matthew,

I read through your blog – great work! I have a question that I was hoping you could help me out with:

I was previously using a piece of custom capture code written by Martin Fox (https://bitbucket.org/beldenfox/cvcapture/src/b7f279b278aa?at=default). His capture code is basically taken from the sample video4linux2 capture code available on the LinuxTV.org website (http://linuxtv.org/downloads/v4l-dvb-apis/capture-example.html). Martin’s code assumes that the camera is capturing in raw YUYV format and goes through the equations for converting YUYV to a cv::Mat object in OpenCV, which is all fairly straightforward. Not surprisingly, if I just change the format in Martin’s code to V4L2_PIX_FMT_H264, I get mostly green frames since the implied YUYV to RGB conversion is no longer valid.

Since I would, however, like to take advantage of the C920’s H.264 hardware compression, I need to make some significant modifications to Martin’s code to make it work for my application. What is the best way to go about decoding the H.264 video stream and converting it into the cv::Mat format that OpenCV understands? Do you know where I could find some sample video4linux capture code that uses the H.264 format that I can study? I see where the pixel format is set to V4L2_PIX_FMT_YUYV in Martin’s code, but am not sure how the code might need to be modified if I set this to V4L2_PIX_FMT_H264. (For example: do I need to change the buffer size, or will this somehow be handled internally when the buffers are created?) Will I need to make use of the libavcodec library to decompress the video stream? (I imagine that there has to be some library I can use so that I don’t have to reinvent the wheel.)

I’ve looked over your framegrabber.c code, but it looks like you aren’t decoding the video streams just yet. You are just writing out to a binary file – correct? (You write: “Capturing in H264 and YUYV format will also work, but you will not be able to simply open the resulting file in your favorite image editor.”)

I’ve been Google-ing like crazy, but I was hoping that you could point me towards some helpful resources.

Thanks a ton!

-Mike

Mike,

You are correct, the code I posted is just writing out raw frames right now. If the frame that is written is of MJPEG format, then it should be an actual jpeg image. If you cannot open it, then pass it through the MJPEGPatcher program to insert the missing Huffman table.

If the frame is of YUYV format, then it will have to be converted to jpeg using code like the conversion code found in v4l2grab. As I am using this on the BBB, it is more beneficial to have the camera compress and encode the frames as jpeg thus reducing the amount of data transferred and the need to do the conversion on the BBB.

For H264, things are a bit different. In both MJPEG and YUYV, all the data for the frame is present – or at least enough to reconstruct the image. With H264, frames are dependent on other frames for decoding. I have not begun working on decoding these, but it is my understanding that OpenCV uses, or can use, ffmpeg. If this is the case, then ffmpeg should be able to read the h264 frames. I would start with OpenCV’s code for capturing frames from the webcam to see if it in fact does anything using ffmpeg and go from there. I will try to start looking into decoding h264 frames this evening.

On your question about using h264 and needing to size the buffers, there is nothing you should need to change. If you have a look at framegrabber, when we receive a capture we also receive the amount of data returned. The only thing you should need to do is set up a process to decode the h264 frames.
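
To make that concrete, here is a sketch of the dequeue step as it appears in the standard capture loop. The mmap’ed buffer pointers and the process_frame() handler are assumed to be set up as in the V4L2 capture example; the key point is that buf.bytesused reports the real size of the (possibly compressed) frame, not the size of the mmap’ed buffer, so nothing needs resizing for MJPEG or H264:

#include <cstddef>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

// One dequeue/requeue iteration from the standard mmap capture loop. 'starts' holds the
// mmap'ed pointer for each buffer; process_frame is whatever handler you want to run.
int dequeue_one(int fd, void* const starts[],
                void (*process_frame)(const void* data, std::size_t len))
{
    struct v4l2_buffer buf;
    std::memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
        return -1;                                    // EAGAIN etc. handled by the caller
    process_frame(starts[buf.index], buf.bytesused);  // only the bytes actually filled
    return ioctl(fd, VIDIOC_QBUF, &buf);              // hand the buffer back to the driver
}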

Sorry for the ignorance here, but I’ve only been working with C/C++/Linux/OpenCV for about a year now. The V4L2 code you modified originally sent the framebuffers as a stream to stdout. How can I get the stream passed to OpenCV as an argument in my own code?

Would it be sensible to compile the custom V4L2 capture code as a class that writes its frame buffer to some stream (FILE*) that can be passed into a standard OpenCV function that can operate on video streams (and hopefully decode the video), such as the VideoCapture class? Or would it be better to leave the V4L2 code as its own command line program and use system() calls in my OpenCV code?

I have played some with ffmpeg in the command line but don’t want to mess with the API if I can help it since it is still a little bit beyond my understanding.

Thanks.

Mike,

I have been looking into this for a while, and there isn’t a whole lot to go on when it comes to working with h264. The current idea I am working on is modifying framegrabber to output continuously to stdout when the count is -1. This way you can send a continuous stream of captures to stdout.

With this I would pipe the output to avconv and have it set up an rtp stream with something like
./framegrabber -f h264 -H 1080 -W 1920 -c 10 -I 30 -o | avconv -re -i - -vcodec copy -f rtp rtp://xxx.xxx.x.x:5060/
replacing the xs with the ip address of my BBB.

I would then make sure OpenCV is compiled with ffmpeg support (and possibly gstreamer). If not, it will need to be recompiled. This step is the tricky part because recompiling on the bone will A) take a long time and B) possibly fail because you do not have enough free space, depending on whether you are running off a large SD card or the 2 GB of NAND. To get around this you can set up a cross compiler on a desktop install of Linux. For this see http://archlinuxarm.org/developers/distcc-cross-compiling

I actually went the route of setting up a cross compiler running on a virtual box vm, and it wasn’t terribly difficult.

After all this, OpenCV should be able to open the stream with something like
VideoCapture cap;
cap.open("rtp://xxx.xxx.x.x:5060/");

This is the approach I am going to take. I will let you know how it turns out. In the meantime you may want to try it out yourself or see if you can accomplish your goals with the MJPEG stream. I updated my blog today with some final testing. The C920 will stream 1920x1080 at 30 FPS in MJPEG.
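
For completeness, the consuming side would look roughly like this, assuming OpenCV does end up built with ffmpeg support (the address is the same placeholder as above, or 127.0.0.1 on the BBB itself):

#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap("rtp://xxx.xxx.x.x:5060/");
    if (!cap.isOpened()) {
        std::printf("failed to open stream\n");
        return 1;
    }
    cv::Mat frame;
    while (cap.read(frame)) {
        // ... blob detection / pose estimation here ...
    }
    return 0;
}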

Okay, I think I follow all of that; rtp streams are a new concept to me, though. Since I am not sending the video over a network to the BBB (it has to operate on the UAV), my initial thought was to send the video stream to the VideoCapture object using a FILE* pointer. However, the only implementations for VideoCapture::open() are:

VideoCapture::open(int devNo)

and

VideoCapture::open(const string& filename)

So your use of an rtp stream should work since the address can be passed as a string (and it looks like OpenCV does indeed accept rtp streams), but I think I will get complaints if I try to pass a FILE* pointer. Instead, should I set up the rtp stream using the loopback IP address (127.0.0.1)?

Also, after gaining a better understanding of H.264, I think you are right that MJPEG will probably be fine for what I’m doing. The C920 should be capable of delivering 640x480 MJPEG-compressed frames at up to 60 fps. In my case, I want as high a frame rate in the camera hardware as I can get to prevent motion blur, but I really don’t need to process every frame in software. Is there any way that I can set the camera to deliver one out of every n frames that it captures? I don’t see a setting for that in the v4l2 controls. I could choose to not mmap() some framebuffers in the video4linux capture code if processing becomes too costly, but by then the frame data has already come across the USB hardware. What I really want is the equivalent of a camera with a fast shutter speed that delivers new frames at 640x480 res or higher at ~10 Hz.

Really sorry for my naiveté! I never expected that I would have these sorts of difficulties with my thesis. Thanks again for your help and patience.

Mike,

You are correct, because your robot does not have a network connection, you would need to use 127.0.0.1. I did some preliminary testing with the rtp stream yesterday, and I have a couple of issues to report.

First, the stream is delayed by several seconds. No doubt this is due to avconv having to process the h264 stream from the camera and stream it out in the rtp format. Being several seconds off is probably not going to work for your application.

The second issue is the amount of compression artifacts when the scene changes dramatically. I am not sure if it is the camera or avconv, but if the camera is streaming a static scene and someone walks in to the view of the camera and moves about, it takes several seconds for the picture to stabilize. Interestingly, some of it seems to be due to the camera autofocusing. At any rate, I had hoped going the streaming route would allow us to make use of existing methods for consuming the h264 stream, but it looks like we will have to go back to figuring out how to consume it directly.

Once you open the camera, you are presented with a stream of captures. As you have said, there is no way to only capture certain frames without moving the ones you do not want out of the way. The best solution to this is probably to just continually grab frames from the camera and discard them until some signal is received. When the signal is received, process the frame(s) in whatever manner you would like, then switch back to discarding the frames until the next signal (see the sketch after the notes below). Simply grabbing and discarding frames in MJPEG format is relatively cheap. Capturing MJPEG frames at 1920x1080 with framegrabber uses just .3% of the cpu.

A couple things to note:

The select() call used to grab the frame data will suspend execution until there is data to grab: http://linuxtv.org/downloads/v4l-dvb-apis/func-select.html

You should be able to grab frames in an infinite loop without swallowing the cpu as the suspend will yield the cpu for other tasks.

Per the statement above, you might want to do your capturing on a separate thread so your main program is not dependent on the arrival of data from the camera.

Finally, you can always try to start up the camera, grab the frames you want, and shut down the camera at certain time intervals to achieve the effect you desire, but I don’t think that will work out too well.
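
To illustrate the grab-and-discard idea, here is a rough sketch using OpenCV’s grab()/retrieve() split – in the custom V4L2 code the equivalent would be dequeuing and requeuing buffers without decoding them until you want one. The 100 ms period is just a stand-in for your 10 Hz navigation loop:

#include <opencv2/opencv.hpp>
#include <time.h>

double now_seconds()
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);         // wall-clock time, unaffected by blocking
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main()
{
    cv::VideoCapture cap(0);                     // stand-in for the custom capture code
    if (!cap.isOpened()) return 1;

    double last = now_seconds();
    cv::Mat frame;
    for (;;) {
        cap.grab();                              // cheap: pull the next frame off the queue and drop it
        if (now_seconds() - last >= 0.1) {       // the "signal": roughly every 100 ms (10 Hz)
            cap.retrieve(frame);                 // decode only the frames we actually want
            // ... process frame ...
            last = now_seconds();
        }
    }
    return 0;
}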

Interesting…

I have been doing some Googling and found a few tutorials on video capture with the BeagleBone by a guy named Derek Molloy: http://derekmolloy.ie/beaglebone/beaglebone-video-capture-and-image-processing-on-embedded-linux-using-opencv/ . He has a repo on GitHub called “boneCV” with a few simple examples using the v4l2 sample capture code, OpenCV, and some command line utilities like avconv for RTP streaming. There’s not really anything you haven’t already done, but I referenced his tutorial (http://derekmolloy.ie/streaming-video-using-rtp-on-the-beaglebone-black/) for testing RTP streaming via the loopback IP; unfortunately I never got it working so that I could see the video stream in VLC.

Since I wasn’t able to view the stream, I didn’t get the chance to see the delay you mentioned for myself – did you try MJPEG format as well as h264? I’m not sure if it would make any difference in the delay, but I wonder if it might eliminate the artifacts from dramatic scene changes (knowing that h264 is an interframe method). Also, if your application allows it, you could always disable the C920’s autofocus and manually set it to “0” (out of 255 – the equivalent of “infinity focus”) with v4l2-ctl or libv4l. This is probably what I would do, since my UAVs will probably keep at least 30 feet of separation.
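
In case it is useful, doing the same thing programmatically should just be a couple of VIDIOC_S_CTRL calls – a minimal sketch, assuming /dev/video0 and that the driver exposes the standard focus controls:

#include <fcntl.h>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int set_ctrl(int fd, unsigned int id, int value)
{
    struct v4l2_control ctrl;
    std::memset(&ctrl, 0, sizeof(ctrl));
    ctrl.id = id;
    ctrl.value = value;
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}

int main()
{
    int fd = open("/dev/video0", O_RDWR);        // device node is an assumption
    if (fd < 0) return 1;
    set_ctrl(fd, V4L2_CID_FOCUS_AUTO, 0);        // disable autofocus
    set_ctrl(fd, V4L2_CID_FOCUS_ABSOLUTE, 0);    // 0 of 255 = "infinity" on the C920
    return 0;
}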

As far as the behavior of the select() calls in the V4L2 capture code goes – I don’t really care if my entire program has to suspend while it waits for data from the camera, since the BBB is only serving as the processor for my vision subsystem. All of the flight controls are done with the ArduPilot Mega. The only exception to this is if I want to implement some kind of state estimator (Kalman filter) on the BBB to provide a “guess” in between successful localization estimates.

“…It looks like we will have to go back to figuring out how to consume it directly.” – So you are now thinking that YUYV is the way to go after all? It seems to me like the ~13.2 Mbits/s maximum transfer rate over USB that you found is likely to be hardware related.

Some Googling…

This is all new-ish to me but it looks like there are three speeds of USB devices:
(http://abclife.wordpress.com/2008/01/11/usb-11-20-30-super-speed-high-speed-full-speed-low-speed/)
Low Speed: 1.5 Mbits/s
Full Speed: 12 Mbits/s <— looks close to what you estimated
High Speed: 480 Mbits/s

The BeagleBone is equipped with a USB 2.0 port (LS/FS/HS) according to the “Features” table on the product Wiki
(http://circuitco.com/support/index.php?title=BeagleBoneBlack)

I confirmed that 480 Mbit/s should be possible on the BBB:
beaglebone:~# cat /sys/bus/usb/devices/usb?/speed

480
480

Considering the fact that the PS3 Eye and C920 both support 60 fps at 640x480 in YUYV format, I would imagine that they are both High Speed USB devices and aren’t what’s limiting the data transfer. I’m not sure where that leaves us, but maybe the issue is in software after all.

Mike,

I only tested streaming with H264 but will look into MJPEG streaming when I have a moment. Concerning the autofocus, I plan to see what happens with it as well as auto white balance turned off when I have some more free time.

On “…It looks like we will have to go back to figuring out how to consume it directly.” I meant in order to process the H264 stream, we will have to figure out how to work with it directly as opposed to streaming it and allowing OpenCV to read from the stream. I still believe for your purpose (and mine) MJPEG will be the most fruitful in terms of time to implement and performance.

Both cameras are High Speed devices. It is likely there is an underlying hardware issue or low level usb driver issue that is limiting the amount of data we can push through.

For a hardware example, I have two laptops, both Dells and both with i5 processors. I created a Linux thumbdrive and booted each to test the capturing performance of the PS3 Eye. The older laptop (by one processor generation) could not capture from the PS3 Eye at 640x480 at 60 FPS while the newer one could. In both cases the ports used were USB 2.0 and I used the same Linux thumbdrive/software.

As a software example, there are numerous reports of terrible webcam performance with the Raspberry Pi due to a bad low level usb driver. http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=23544

It is a bit difficult to say which (or whether both) is the problem with the BBB, but for our immediate needs, we should probably focus on working with the relatively quick MJPEG capture we have and solve the USB throughput issue as time permits.

I’ve spent the last couple of days looking into using the Libav libraries (particularly libavcodec), which avconv is built upon, to do real-time MJPEG decoding. My hope is to pass the frame buffers from the v4l2 capture code into some kind of decoder function written using Libav, and use the raw data to build up a cv::Mat frame that OpenCV can work with (much like Martin Fox did to convert the YUV pixel format to RGB in his own capture code).

I’ve been referencing api-example.c, as there is little documentation on the Libav API. I have reached out for some help with using the libraries on the libav-api mailing list. I will keep you posted as I progress. Let me know if you have any thoughts on all of this.

On a side note, I’ve noticed some weird issues where the BBB only recognizes the C920 if it is plugged in during boot. I can unplug/re-plug my PS3 Eye and C270 and it will detect the change in “lsusb”, but once I plug in the C920, the lsusb command keeps returning the previous state. Everything works as expected on my laptop, though.

Do you have any clue what could cause this? I have tried reflashing the eMMC with the standard Angstrom distro as well as the most recent version of Ubuntu for the BBB. Are you having similar experiences?

-Mike

Mike,

You shouldn’t need to decode the jpeg yourself. Attached is a quick example I threw together using Python. You can do the same thing in C or C++; see here: http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html#imdecode

It would be something like this

CvMat cvmat = cvMat(HEIGHT, WIDTH, CV_8UC3, (void*)buffer);  // wrap the raw jpeg buffer from the capture
IplImage * img;
img = cvDecodeImage(&cvmat, 1);                              // decode the jpeg into a BGR image

Where you set the height and width to your image parameters.
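
The C++ API equivalent would be something along these lines – wrapping only the bytes the capture actually returned (the bytesused value from the dequeued buffer) and letting imdecode work out the dimensions from the jpeg header:

#include <opencv2/opencv.hpp>
#include <cstddef>

cv::Mat decodeMjpegFrame(const void* buffer, std::size_t bytesused)
{
    // Wrap the compressed bytes without copying, then let imdecode parse the jpeg header.
    cv::Mat raw(1, (int)bytesused, CV_8UC1, const_cast<void*>(buffer));
    return cv::imdecode(raw, 1);   // 1 = force a 3-channel BGR image; empty Mat on failure
}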

OpenCVjpeg.py (675 Bytes)