Here in Catalonia, every upper-secondary school student has to choose, in their first year, a topic to research over the course of a year. In Catalan this is called the “Treball de Recerca de Batxillerat”, and it carries significant weight in the final secondary school mark used for admission to the university of our choice.
I decided to merge some of my interests and came up with the idea of developing a computer vision system for MAVs (Micro Aerial Vehicles, also known as UAVs or drones), primarily aimed at obstacle avoidance. Computer vision would prevent MAVs from colliding with other objects, something that cannot be accomplished with GPS alone.
Later on, I shifted to testing on an RC car instead of a flying machine, since the computer vision group at UAB offered me the chance to work on this project with them. From then on, the work became very intensive and the project progressed quickly.
The research paper focuses on autonomous navigation using stereo cameras. It covers how camera calibration, scene reconstruction, localization, obstacle avoidance and path planning are accomplished by processing images from a pair of cameras, which is very close to how humans and animals navigate using sight.
Some of the processes presented in the paper are also implemented and tested in code (keep reading for more information).
Mobile platform and hardware
The computing hardware is composed of three elements:
- A PC motherboard: a powerful but compact ASRock Q1900DC-ITX board with an Intel quad-core processor.
- An Arduino board: an easy way of interfacing the computer with the car controls.
- A stereo camera: the DUO3D camera, although we have also tested a generic stereo webcam called Minoru.
In order to mount all the components on-board, the RC car was adapted using 3D printed pieces and aluminium tubes:
The project’s code has more than 4500 lines. It is divided into classes that accomplish different functionalities:
- Stereo camera management: stereo-frame retrieval, calibration, rectification, disparity and depth maps generation.
- Odometry: key point detection and tracking.
- Path planning: obstacle detection and avoidance.
- Graphical interface: a simple 2D path planning simulator.
- Arduino serial communication: communication over the serial port is accomplished thanks to the Arduino-serial library written by Tod E. Kurt.
The StereoPair class provides a wide set of functions to control a stereo camera. It also has visualization and image-saving options to ease the debugging and camera configuration steps.
The code can handle generic stereo cameras as well as the DUO3D camera (using the DUO SDK).
Initialization of a generic camera:
- The camera IDs are usually between 0 and 3.
- initStereoSuccess is a flag that is set to “false” if the initialization fails (the error is normally printed in the terminal window).
StereoPair stereoCam(STEREOCAM_LEFT_ID, STEREOCAM_RIGHT_ID, FRAME_WIDTH, FRAME_HEIGHT, STEREOCAM_FRAME_RATE, initStereoSuccess);
For initializing the DUO3D camera, simply set both camera IDs to 0:
StereoPair stereoCam(0, 0, FRAME_WIDTH, FRAME_HEIGHT, STEREOCAM_FRAME_RATE, initStereoSuccess);
Camera calibration is a required step for most computer vision applications using stereo cameras. Its aim is to remove any distortion caused by the lenses and to align both camera planes. Without this step, the algorithms would fail to compute a precise 3D image suitable for path planning.
Making the calibration procedure fast and easy has been one of the most challenging tasks when writing the code for the stereo camera.
Calibrate the camera using the following call (if the provided calibration file does not exist it will be created automatically):
string CALIBRATION_FILE = "<project directory>/data/stereo_calibration_parameters.xml";
string OUTPUT_FOLDER = "<project directory>/data/";
stereoCam.calibrate(CALIBRATION_FILE, OUTPUT_FOLDER);
If you don’t want to save the results, just omit OUTPUT_FOLDER.
Once rectified (see the code in the repository), the stereo images look curved at the margins. A scaling parameter (“alpha” in the function stereoRectify) lets you choose whether to crop the image to the valid region of pixels or leave it as is:
// OpenCV's algorithm
stereoRectify(cameraMatrix0, distCoeffs0, cameraMatrix1, distCoeffs1,
              imageSize, RInitial, TInitial, R1, R2, P1, P2, Q,
              CALIB_ZERO_DISPARITY, alpha, imageSize, validRoi, validRoi);
Having the stereo frames completely undistorted and aligned makes it easy to triangulate the depth of the pixels. The first step is to find matching points between the images, that is, the image points that correspond to the same physical point. Then the disparity of each pair of points is calculated, which is just the difference of their X coordinates (the Y coordinates are assumed to be equal, since the images are rectified).
Calculating the disparity of each pair of points yields what is called a dense disparity map. In OpenCV these maps are left-image based; that is, the color data of each pixel of the left image is replaced with its disparity. Because the pixel coordinates are maintained, the left image coincides with the disparity map.
Finally, the 3D coordinates of the points are computed using the following equations (I recommend reading the paper for the derivations showing how to arrive at them):
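Reconstructed here in standard notation (f is the focal length in pixels, b the baseline, d the disparity, and x, y the pixel coordinates relative to the principal point), the triangulation equations are:

```latex
Q_z = \frac{f\,b}{d}, \qquad
Q_x = \frac{x\,Q_z}{f} = \frac{x\,b}{d}, \qquad
Q_y = \frac{y\,Q_z}{f} = \frac{y\,b}{d}
```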
The units of the 3D point coordinates are the same as those of the baseline. Note that since the focal distance and the disparity are both measured in pixels, their units cancel out, so the units of Q_z are those of b. And because x and y are also in pixels, they cancel with their respective focal distances too, leaving the units of Q_z (which are b’s units) as the units of Q_x and Q_y.
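These unit cancellations can be checked in code. Here is a minimal, self-contained sketch in plain C++ (the standard triangulation formulas, not code from the project’s repository):

```cpp
#include <array>

// Triangulate a 3D point from a rectified stereo match.
// f: focal length in pixels, b: baseline (e.g. in meters),
// x, y: pixel coordinates relative to the principal point (left image),
// d: disparity in pixels (x_left - x_right).
// The result is in the units of b, since the pixel units cancel.
std::array<double, 3> triangulate(double f, double b,
                                  double x, double y, double d) {
    double Qz = f * b / d;   // depth
    double Qx = x * Qz / f;  // = x * b / d
    double Qy = y * Qz / f;  // = y * b / d
    return {Qx, Qy, Qz};
}
```

For example, with f = 500 px, b = 0.1 m and d = 10 px, the depth is 500 · 0.1 / 10 = 5 m.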
The result of computing the 3D coordinates of every image point is called a depth map. Unlike a disparity map, a depth map cannot be easily represented as a 2D image. Instead, it requires a point cloud renderer; I used the Point Cloud Library for this purpose. The following screenshots show how depth maps can be viewed from different angles, even though they are computed from just two flat images.
This tool is really powerful, not only for obstacle detection but also for many other tasks such as image segmentation, scene reconstruction, 3D scanning, human pose tracking, etc.
This part of the project is only partially implemented in the code due to its complexity and my time constraints.
Odometry is the process of determining the position of a camera in space by analysing successive images. This can be used not only to track the vehicle’s movements within a map, but also to construct the map itself out of 3D points. Performing these two actions simultaneously is known as VSLAM (Visual Simultaneous Localisation and Mapping).
The first step in tracking the camera’s motion is to find a way to compare successive images. This is accomplished by detecting salient feature points, such as corners, and matching them between frames.
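As an illustration of the matching step, here is a simplified sketch using binary descriptors compared by Hamming distance (in the spirit of descriptors like BRIEF; this is a generic example, not the project’s actual matcher):

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Brute-force nearest-neighbour matching of 32-bit binary descriptors.
// For each descriptor in `a`, returns the index of the descriptor in
// `b` with the smallest Hamming distance (number of differing bits).
// Assumes `b` is non-empty.
std::vector<std::size_t> matchDescriptors(const std::vector<std::bitset<32>> &a,
                                          const std::vector<std::bitset<32>> &b) {
    std::vector<std::size_t> matches;
    for (const auto &da : a) {
        std::size_t best = 0, bestDist = 33;  // larger than any possible distance
        for (std::size_t j = 0; j < b.size(); ++j) {
            std::size_t dist = (da ^ b[j]).count();  // Hamming distance
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        matches.push_back(best);
    }
    return matches;
}
```

A real matcher would also filter ambiguous matches (for example with a ratio test), but the core idea is this nearest-neighbour search.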
As you can observe from the directions of the lines, the camera has moved horizontally. Something less noticeable is that closer points move faster (longer lines) than points that are farther away. This is called the parallax effect, and it is used to estimate the depth of the matched points (although this would be better classified as 3D reconstruction).
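Numerically, the parallax relation can be sketched as follows (a simplified model that assumes pure sideways camera translation; the focal length, displacement and flow values are illustrative):

```cpp
// Depth from parallax under pure lateral camera translation.
// f: focal length in pixels, t: camera displacement between frames
// (e.g. in meters), flow: horizontal pixel displacement of the feature.
// Closer points produce longer flow vectors, hence the inverse relation.
double depthFromParallax(double f, double t, double flow) {
    return f * t / flow;
}
```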
For further information on how the keypoint matching works, please see the research paper.
Avoidance path planning
This has been the most creative part of the project: the path planning algorithm was designed from scratch. It is a simple approach, but it may work in simple situations.
The algorithm considers the scene as a flat surface divided into squares. Each square can be either free or occupied, in which case it is an obstacle to be avoided.
When one or more squares are occupied, a set of valid curve radius ranges is computed. These curves are the ones that would not make the vehicle collide. The algorithm then chooses the curve radius that produces the smallest deviation from the original path.
By concatenating these curves, the car should eventually return to its original path. Note that since the odometry algorithm is not finished, it is currently not possible to keep track of the deviation.
When no obstacles are detected, and if it were possible to track the deviation from the track, the algorithm would trace a smooth path back towards the original one.
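The core idea can be sketched as follows. This is a strongly simplified version that considers only the grid row directly ahead of the vehicle and picks a lateral offset instead of a curve radius; it is an illustration, not the project’s actual implementation:

```cpp
#include <cstdlib>
#include <vector>

// Pick the grid column with the smallest deviation from the current
// path (column `center`) whose square is free of obstacles.
// `row` is the occupancy of the grid row ahead: true = occupied.
// Returns the chosen column, or -1 if every square is occupied.
int chooseFreeColumn(const std::vector<bool> &row, int center) {
    int best = -1;
    for (int c = 0; c < static_cast<int>(row.size()); ++c) {
        if (row[c]) continue;  // occupied: driving there would collide
        if (best == -1 || std::abs(c - center) < std::abs(best - center))
            best = c;
    }
    return best;
}
```

The real algorithm works with curve radius ranges rather than discrete columns, but the selection criterion (smallest deviation among collision-free options) is the same.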
The graphical simulator made it easier to test and debug the algorithm. It allows you to “paint” obstacles (moving the mouse and pressing ‘a’ to add or ‘s’ to delete) and see the results (pressing the ENTER key). It is possible to set the scenario dimensions as well as the square size (in meters):
General path planning
Although this part of the research is not implemented in code, it is explained in the paper and plays an important role in autonomous navigation. It is not programmed because it requires a map, which in turn requires odometry and, as I wrote before, that objective was crossed out due to time constraints.
Path planning comprises a set of techniques to compute the optimal path to traverse a map from one point to another.
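As a sketch of the simplest case, consider a grid map with unit-cost moves; Dijkstra’s algorithm then reduces to breadth-first search. This is a generic illustration of the technique, not code from the project:

```cpp
#include <queue>
#include <utility>
#include <vector>

// Shortest path length on a 4-connected grid with unit step cost
// (breadth-first search; with uniform costs this equals Dijkstra).
// grid[y][x] == 1 marks an obstacle. Returns -1 if unreachable.
int shortestPath(const std::vector<std::vector<int>> &grid,
                 int sx, int sy, int gx, int gy) {
    int h = grid.size(), w = grid[0].size();
    std::vector<std::vector<int>> dist(h, std::vector<int>(w, -1));
    std::queue<std::pair<int, int>> q;
    dist[sy][sx] = 0;
    q.push({sx, sy});
    const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [x, y] = q.front();
        q.pop();
        if (x == gx && y == gy) return dist[y][x];
        for (int k = 0; k < 4; ++k) {
            int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            if (grid[ny][nx] == 1 || dist[ny][nx] != -1) continue;
            dist[ny][nx] = dist[y][x] + 1;  // expand to unvisited free square
            q.push({nx, ny});
        }
    }
    return -1;
}
```

A* improves on this by expanding cells in order of estimated total cost, which is what makes it practical on large maps.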
For further information and interactive simulations, I recommend reading the paper and visiting the following site:
Getting a working copy of the code
Along with the code files, a CMake file is included to make it easy to start playing with the project. Note that the code won’t work properly on Windows machines (even if it compiles) because I haven’t implemented support for handling files in the Windows filesystem; still, most of the code is reusable for projects on this OS. Also note that without a DUO3D camera the program will quit immediately by default. This behaviour can be changed by undefining the preprocessor directive DUO3D in the main.cpp file.
To start using the code (Linux and Mac):
Open a terminal window and navigate to the project’s source directory:
Build the project:
cmake .. -G Xcode
You can use other generators; the full list is shown by running cmake --help and scrolling to the end.
… or compile the project:
sudo make install
This will install files in /usr/local/var/lib/autonomousCar and will create an executable in the build directory we have just created.
- OpenCV 2.7 (versions 3.0 and newer are not supported)
- PCL 1.7.2 (I haven’t tested other versions).
- You also need to have CMake installed in order to build the project.
- Install Homebrew:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Install XCode from the AppStore.
- Agree to the XCode license for command line tools:
sudo xcrun cc
- Install CMake:
brew install cmake
- Install OpenCV:
brew tap homebrew/science
brew info opencv
brew install opencv
- Install PCL following the instructions on: http://www.pointclouds.org/documentation/tutorials/installing_homebrew.php
This research project has been a basic introduction to autonomous guidance using computer vision. As can be seen from the more than 3500 lines of code, autonomous guidance using cameras is complex, and this project, which covers an area of knowledge about which I knew almost nothing at the beginning, has turned into a massive learning process.
The initial objectives were very ambitious, considering that I knew very little about computer vision and that I was going to do everything in my free time. I accomplished half of them:
- Program the car to detect obstacles using a stereo camera and avoid them.
- Make all the code as platform-independent as possible so that it can later be ported to the aerial platform: the avoidance path planning only works in 2D.
- Map an area and later use the map to plan missions: too complex given the little time available.
I will keep updating and improving the project from time to time.
All in all, it has been a very interesting research project, from which I have gained experience in computer vision and improved my project management and C++ coding skills.
The following list gives an idea of the things that can be learned from a project like this. The difficulty of each entry is ranked easy (•), challenging (••) or very challenging (•••), as an orientation on how one should approach such a challenge.
- (•) Improve C++ coding skills: while C++ can seem difficult at first, having an objective makes you focus more on the problem itself, and overcoming coding doubts becomes an almost effortless task.
- (•) Understand images as numeric matrices: when working with pixels, an image has to be converted into a format that allows easy access to the pixel and color data. Matrices are mathematical objects that (usually) make operations on groups of numbers much easier and faster to understand and compute.
- (•) Learn the elemental camera principles: pixel size, focal length, lens distortion, resolution, color, exposure, etc. You should get familiar with all these technical concepts in order to properly understand the documentation and the math behind the algorithms that work with images.
- (••) Learn the stereo camera principles: stereo rectification and pixel depth calculation. This is basic for understanding the importance of the calibration procedure and how objects are detected with their real-world measurements. It is also a very interesting topic due to its resemblance to human depth perception.
- (••) Learn to use the OpenCV library: this is a huge collection of functions and classes mainly aimed at real-time computer vision. It also provides useful data structures, such as matrices for storing images and performing operations efficiently, and visualization utilities. The difficulty lies in getting enough background to know which functions to use or where to look for them. Fortunately, the library is very well documented on its official page, as well as in books and other sites. A quick search almost always solves any doubt!
- (••) Manage a local and remote Git repository: Git is a great tool for managing code versions and sharing them with a team. Git can seem a bit confusing at first, but with practice it becomes very easy to use and gives you the security of always having working backups of your work.
- (•••) Understand how the computer vision algorithms work: even though OpenCV provides most of the algorithms, it is very important to understand what’s going on when you run them. I try to answer these questions when investigating an algorithm:
- What kind of data does it require for input?
- What does it do with the data?
- On what is it based?
- What are the possible errors it can give?
- Is it an intensive computational task?
- How can I improve performance? (optimizing performance is one of the tasks I like most)
- What is the returned value?
- Should I expect the output data to be extremely precise?
- If not, how can I filter the outliers?
- Do I have to handle exceptions?
- (•••) Getting the camera calibration and stereo rectification toolchain to work: this has been one of the most difficult challenges. I don’t think I would have managed it without the help of the ADAS Group.
- (•••) Design a path planning algorithm: this part took me quite a while, and also required writing the simulator. During this process I had to think about trigonometry, customising sorting algorithms, handling many different cases, measurement systems (from pixels to meters and vice versa), mouse and keyboard interaction, custom data structures and long debugging sessions.
This project has been done in collaboration with the Advanced Driver Assistance Group (ADAS; www.cvc.uab.es/adas) of the Computer Vision Center at Universitat Autònoma de Barcelona. In particular, I was supported by the researchers Antonio M. López (www.cvc.uab.es/~antonio), David Vázquez (www.cvc.uab.es/~dvazquez), and Germán Ros (www.cvc.uab.es/~gros).
Moreover, the wise supervision of Jordi Campos Miralles has been, among other things, what opened the door to the collaboration with the ADAS Group.
Last but not least, thanks to the advice of co-supervisor Manel Martínez Pascual, I was able to include rich geometrical and mathematical demonstrations in the paper.
Camera calibration and image rectification:
- “Epipolar Geometry”. Wikipedia.
- “Epipolar Geometry”. Artificial Intelligence Center.
- “Stereo and 3D vision”. University of Washington Computer Science & Engineering, online course lecture presentation.
- “Camera parameters”. Computer Science, University of Nevada.
- “Extrinsic camera parameters”. Article.
- “Intrinsic camera parameters”. Article.
- “Camera calibration and 3D reconstruction”. OpenCV documentation.
- “Stereo vision: triangulation”. Luca Iocchi.
- “Visual odometry”. Wikipedia.
- Marcos Nieto, “Detection and tracking of vanishing points in dynamic environments”. PhD thesis, Universidad Politécnica de Madrid, 2010.
- Marcos Nieto, “Stereo visual odometry using OpenCV”. Personal blog.
- “FAST Algorithm for corner detection”. OpenCV documentation.
- “A tutorial on binary descriptors: The BRIEF descriptor”. Gil’s computer vision blog.
- “Introduction to A*”. Article.
- “Dijkstra’s shortest path algorithm”. Article.
- Gary Bradski & Adrian Kaehler, “Learning OpenCV”. Book.