COLMAP
COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. It offers a wide range of features for the reconstruction of ordered and unordered image collections. The software is licensed under the new BSD license. The latest source code is available on GitHub. COLMAP builds on existing work; when using specific algorithms within COLMAP, please also cite the original authors, as specified in the source code.
For convenience, the pre-built binaries for Windows contain both the graphical and command-line interface executables. To start the COLMAP GUI, double-click the COLMAP.bat batch script or run it from the Windows command shell or PowerShell. The command-line interface is also accessible through this batch script, which automatically sets the necessary library paths. To list the available COLMAP commands, run COLMAP.bat -h in the command shell (cmd.exe) or in PowerShell.
Tutorial
This tutorial covers the topic of image-based 3D reconstruction by demonstrating the individual processing steps in COLMAP. If you are interested in a more general and mathematical introduction to the topic of image-based 3D reconstruction, please also refer to the CVPR 2017 Tutorial on Large-scale 3D Modeling from Crowdsourced Data and [schoenberger_thesis].
Image-based 3D reconstruction traditionally first recovers a sparse representation of the scene and the camera poses of the input images using Structure-from-Motion. This output then serves as the input to Multi-View Stereo, which recovers a dense representation of the scene.
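The two-stage flow described above can be sketched as a sequence of COLMAP command-line calls. The Python snippet below only builds the argument lists and does not run anything. It assumes an executable named colmap on the PATH (on Windows, the COLMAP.bat script plays this role), a hypothetical dataset layout with images/, sparse/ and dense/ folders, and exhaustive matching, which is only one of the available matching modes. The function name build_pipeline is illustrative.

```python
from pathlib import Path


def build_pipeline(dataset: Path, executable: str = "colmap") -> list[list[str]]:
    """Return command lines for a sparse (SfM) and then dense (MVS) reconstruction.

    The dataset layout (images/, database.db, sparse/, dense/) is an assumption
    made for this sketch, not something COLMAP requires.
    """
    db = str(dataset / "database.db")
    images = str(dataset / "images")
    sparse = str(dataset / "sparse")
    dense = str(dataset / "dense")
    return [
        # Structure-from-Motion: features, matches, then sparse model and poses.
        [executable, "feature_extractor",
         "--database_path", db, "--image_path", images],
        [executable, "exhaustive_matcher", "--database_path", db],
        [executable, "mapper",
         "--database_path", db, "--image_path", images,
         "--output_path", sparse],
        # Multi-View Stereo: undistort, estimate depth maps, fuse a point cloud.
        [executable, "image_undistorter",
         "--image_path", images,
         "--input_path", str(Path(sparse) / "0"),
         "--output_path", dense, "--output_type", "COLMAP"],
        [executable, "patch_match_stereo",
         "--workspace_path", dense, "--workspace_format", "COLMAP"],
        [executable, "stereo_fusion",
         "--workspace_path", dense, "--workspace_format", "COLMAP",
         "--output_path", str(Path(dense) / "fused.ply")],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline(Path("my_dataset")):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) once the sparse/ and dense/ folders exist. The first three commands correspond to the Structure-from-Motion stage and the last three to Multi-View Stereo.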
Camera Models
COLMAP implements different camera models of varying complexity. If no intrinsic parameters are known a priori, it is generally best to use the simplest camera model that is complex enough to model the distortion effects:
- SIMPLE_PINHOLE, PINHOLE: Use these camera models, if your images are undistorted a priori. These use one and two focal length parameters, respectively. Note that even in the case of undistorted images, COLMAP could try to improve the intrinsics with a more complex camera model.
- SIMPLE_RADIAL, RADIAL: This should be the camera model of choice, if the intrinsics are unknown and every image has a different camera calibration, e.g., in the case of Internet photos. Both models are simplified versions of the OPENCV model only modeling radial distortion effects with one and two parameters, respectively.
- OPENCV, FULL_OPENCV: Use these camera models, if you know the calibration parameters a priori. You can also try to let COLMAP estimate the parameters, if you share the intrinsics for multiple images. Note that the automatic estimation of parameters will most likely fail, if every image has a separate set of intrinsic parameters.
- SIMPLE_RADIAL_FISHEYE, RADIAL_FISHEYE, OPENCV_FISHEYE, FOV, THIN_PRISM_FISHEYE: Use these camera models for fisheye lenses and note that all other models are not really capable of modeling the distortion effects of fisheye lenses. The FOV model is used by Google Project Tango (make sure to not initialize omega to zero).
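To make the difference between the four simplest models concrete, the following Python sketch projects a 3D point given in camera coordinates to pixel coordinates. The parameter order (f, cx, cy, followed by the distortion coefficients) follows the convention COLMAP uses as far as I am aware. The function name project is hypothetical, and the OPENCV and fisheye models are deliberately left out.

```python
def project(model: str, params: list[float],
            point: tuple[float, float, float]) -> tuple[float, float]:
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    u, v = x / z, y / z   # normalized image coordinates
    r2 = u * u + v * v    # squared radius used by the radial terms

    if model == "SIMPLE_PINHOLE":    # params: f, cx, cy
        f, cx, cy = params
        return f * u + cx, f * v + cy
    if model == "PINHOLE":           # params: fx, fy, cx, cy
        fx, fy, cx, cy = params
        return fx * u + cx, fy * v + cy
    if model == "SIMPLE_RADIAL":     # params: f, cx, cy, k
        f, cx, cy, k = params
        d = 1.0 + k * r2             # one radial distortion coefficient
        return f * d * u + cx, f * d * v + cy
    if model == "RADIAL":            # params: f, cx, cy, k1, k2
        f, cx, cy, k1, k2 = params
        d = 1.0 + k1 * r2 + k2 * r2 * r2   # two radial coefficients
        return f * d * u + cx, f * d * v + cy
    raise ValueError(f"model not covered by this sketch: {model}")
```

With all distortion coefficients set to zero, SIMPLE_RADIAL and RADIAL reduce to SIMPLE_PINHOLE, which is why they are a natural next step when a pinhole model is too simple.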
You can inspect the estimated intrinsic parameters by double-clicking specific images in the model viewer or by exporting the model and opening the cameras.txt file.
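As a small aid for inspecting an exported model, the sketch below reads a cameras.txt file into Python objects. It assumes the text export stores one camera per line as CAMERA_ID MODEL WIDTH HEIGHT PARAMS[], with comment lines starting with "#". The Camera class, the function name parse_cameras_txt, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    camera_id: int
    model: str
    width: int
    height: int
    params: list[float]


def parse_cameras_txt(text: str) -> dict[int, Camera]:
    """Parse the contents of a cameras.txt file, keyed by camera id."""
    cameras: dict[int, Camera] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        camera = Camera(
            camera_id=int(fields[0]),
            model=fields[1],
            width=int(fields[2]),
            height=int(fields[3]),
            params=[float(p) for p in fields[4:]],
        )
        cameras[camera.camera_id] = camera
    return cameras


# Illustrative file contents; the numbers are invented for this example.
EXAMPLE = """\
# CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
1 SIMPLE_RADIAL 3072 2304 2560.0 1536.0 1152.0 0.01
2 PINHOLE 1920 1080 1400.0 1410.0 960.0 540.0
"""

if __name__ == "__main__":
    for cam in parse_cameras_txt(EXAMPLE).values():
        print(cam.camera_id, cam.model, cam.params)
```

Comparing the parsed focal lengths against the image width and height is a quick way to spot grossly wrong estimates.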
To achieve optimal reconstruction results, you might have to try different camera models for your problem. Generally, when the reconstruction fails and the estimated focal length values or distortion coefficients are grossly wrong, it is a sign that the camera model is too complex. Conversely, if COLMAP uses many iterative local and global bundle adjustments, it is a sign that the camera model is too simple to fully model the distortion effects.
You can also share intrinsics between multiple images to obtain more reliable results (see Share intrinsic camera parameters) or you can fix the intrinsic parameters during the reconstruction (see Fix intrinsic camera parameters).
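On the command line, sharing and fixing intrinsics are controlled through options of the feature extraction and mapping steps. The sketch below builds such argument lists in Python without running them. The option names are given as I recall them from the command-line help, so treat them as assumptions and check them against the -h output of feature_extractor and mapper for your COLMAP version. The helper names are hypothetical.

```python
def feature_extractor_args(database: str, images: str,
                           camera_model: str = "SIMPLE_RADIAL",
                           share_intrinsics: bool = False) -> list[str]:
    """Arguments for feature extraction with a chosen camera model."""
    args = [
        "colmap", "feature_extractor",
        "--database_path", database,
        "--image_path", images,
        "--ImageReader.camera_model", camera_model,
    ]
    if share_intrinsics:
        # Assumed option: treat all images as taken by one camera.
        args += ["--ImageReader.single_camera", "1"]
    return args


def mapper_args(database: str, images: str, output: str,
                fix_intrinsics: bool = False) -> list[str]:
    """Arguments for the mapper, optionally keeping intrinsics fixed."""
    args = [
        "colmap", "mapper",
        "--database_path", database,
        "--image_path", images,
        "--output_path", output,
    ]
    if fix_intrinsics:
        # Assumed options: disable refinement of each intrinsic group
        # during bundle adjustment.
        args += [
            "--Mapper.ba_refine_focal_length", "0",
            "--Mapper.ba_refine_principal_point", "0",
            "--Mapper.ba_refine_extra_params", "0",
        ]
    return args
```

Sharing intrinsics is mainly useful when all images come from the same physical camera and zoom setting; fixing them only makes sense when the values stored in the database are already trustworthy.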
Features
COLMAP reflects the Structure-from-Motion and Multi-View Stereo stages in different modules that can be combined depending on the application. More information on Structure-from-Motion in general and the algorithms in COLMAP can be found in [schoenberger16sfm] and [schoenberger16mvs].
If you have control over the picture capture process, please follow these guidelines for optimal reconstruction results:
- Capture images with good texture. Avoid completely texture-less images (e.g., a white wall or empty desk). If the scene does not contain enough texture itself, you could place additional background objects, such as posters.
- Capture images under similar illumination conditions. Avoid high dynamic range scenes (e.g., pictures against the sun with shadows or pictures through doors/windows). Avoid specularities on shiny surfaces.
- Capture images with high visual overlap. Make sure that each object is seen in at least 3 images – the more images the better.
- Capture images from different viewpoints. Do not take images from the same location by only rotating the camera, e.g., make a few steps after each shot. At the same time, try to have enough images from a relatively similar viewpoint. Note that having more images is not necessarily better and might lead to a slow reconstruction process. If you use a video as input, consider down-sampling the frame rate.
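For video input, down-sampling the frame rate amounts to keeping every n-th frame. The following Python sketch computes which frame indices to keep for a target frame rate. The function name select_frames and the rounding choice are assumptions made for illustration; extracting the frames themselves is left to a video tool of your choice.

```python
def select_frames(num_frames: int, source_fps: float,
                  target_fps: float) -> list[int]:
    """Return the indices of the frames to keep when reducing the frame rate."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep one frame out of every `step`; never skip below the source rate.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, num_frames, step))


if __name__ == "__main__":
    # A 10 second clip at 30 fps reduced to roughly 2 fps.
    kept = select_frames(num_frames=300, source_fps=30.0, target_fps=2.0)
    print(len(kept), kept[:5])
```

A lower target rate gives fewer, more widely spaced viewpoints, which keeps the reconstruction fast as long as consecutive kept frames still overlap strongly.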