Image Quality
Understanding what constitutes a quality image and how to optimize the images used in 3VR systems is critical to developing viable solutions. Meeting the specific requirements of 3VR facial surveillance is much more challenging than in a traditional CCTV deployment. As a result, many partners and users are not accustomed to considering these issues and can easily overlook them.
By understanding and optimizing image quality, you will be able to:
Better qualify which opportunities are strong fits for 3VR
Set the appropriate level of expectations with partners and users
Design a system that accommodates real world conditions
Deliver a solution that is optimally effective
Key Elements in Optimizing Images for Facial Surveillance
This report will examine and explain the many elements that are critical to using 3VR for facial surveillance. While fundamental principles apply to images from photographic cameras as well as CCTV cameras, the application will differ. The following summarizes the key elements so that the reader may have a quick reference for future use.
Any image, whether video or photo, requires sufficient detail. Detail is determined by
the level of resolution in the image
the size of the person’s face relative to the size of the image
The person’s face must look directly towards the camera. This affects both the class of imported photos that are acceptable and how video cameras must be positioned to capture faces.
Imported images have to meet the requirements listed above. With the exception of mugshots and passport photos, most photos do not meet these requirements.
Video cameras must be positioned specifically to capture faces. The cost may be low, but the skill required is not trivial. Care must be taken to precisely position cameras to capture faces consistently.
Imaging Background
Overview
Two aspects of imaging are most important in understanding and optimizing 3VR facial surveillance:
Resolution
Field of View
Without understanding the impact of these two aspects, it will not be possible to master use of 3VR for either importing images or capturing video. As such, prior to analyzing facial surveillance or its application to 3VR, these aspects shall be explained and relevant introductory material shall be presented.
Resolution
Resolution, as applied to images used in the 3VR system, is defined as the level of visual detail in the image.
Resolution is commonly defined in two dimensions: horizontal and vertical. For instance, a frequent resolution level cited is 640 x 480 pixels. This means that there are 640 unique pixels across the image horizontally and 480 pixels down the image vertically.
The number of horizontal pixels is important for performing 3VR facial surveillance because it determines the amount of detail available for performing facial analysis. Sufficient horizontal pixels are required to perform facial recognition.
One standard metric used in describing video surveillance images is the Common Intermediate Format (CIF). This is a way to quickly cite resolution levels that are commonly used in digital video. Each CIF level defines a specific horizontal and vertical resolution.
The following table provides examples of different CIF levels.
CIF Level | Resolution | Example |
---|---|---|
Quarter CIF | 176 x 144 | |
CIF | 352 x 288 | Typical Internet streaming |
2CIF | 704 x 240 | |
4CIF | 704 x 576 | NTSC camera max resolution |
16CIF | 1408 x 1152 | 1.5 Megapixel camera |
Industry participants commonly cite different CIF levels. These CIF levels have a significant impact on whether imported images or captured video can be used for facial analysis.
For example, an image of a person recorded at 4CIF resolution may be usable by the 3VR system for facial analysis. The same image recorded at CIF resolution may not contain enough detail unless the face is extremely large (at least a quarter of the width of the field of view). See “Camera Placement for Facial Analysis” below for more information and examples on field of view for facial analysis.
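The comparison between CIF and 4CIF can be sketched numerically. The snippet below is a rough illustration, not a 3VR tool: it assumes the target of about 80 to 100 horizontal pixels across the head given in the camera placement guidance later in this document, and the function name `min_face_fraction` is hypothetical.

```python
# Horizontal resolutions taken from the CIF table above.
CIF_WIDTHS = {"CIF": 352, "4CIF": 704}


def min_face_fraction(image_width_px, head_px_required):
    """Smallest share of the image width a head must span to reach the pixel target."""
    return head_px_required / image_width_px


for name, width in CIF_WIDTHS.items():
    low = min_face_fraction(width, 80)    # lower end of the 80-100 pixel guidance
    high = min_face_fraction(width, 100)  # upper end of the 80-100 pixel guidance
    print(f"{name}: head must span roughly {low:.0%} to {high:.0%} of the image width")
```

Under these assumptions, a CIF image needs the head to fill roughly a quarter of the frame, while a 4CIF image needs only about half that share.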
Field of View
Camera Placement for Facial Analysis - 4 Factors
Field of View Determines Size and Resolution of Face
Facial analysis requires a certain minimum resolution level to be effective. This resolution level is measured in pixels.
3VR requires a minimum of 35 horizontal pixels between the eyes (or about 80 - 100 horizontal pixels across the head) to perform facial analysis
3VR performs analysis of all analog NTSC video at 4CIF (704 x 576 pixels)
Given these facts, and given that the average width between the eyes is 3” and the width of a head is approximately 6 - 7”, an NTSC camera at 4CIF resolution can capture faces in a field of view (FOV) no more than about 4.5 feet wide.
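The arithmetic behind the 4.5 foot figure can be sketched as follows. This is an illustrative calculation only; the function name `max_fov_feet` and the use of midpoints (6.5 inches, 90 pixels) for the head-width ranges are assumptions made for the example.

```python
def max_fov_feet(image_width_px, feature_width_in, feature_px_required):
    """Widest field of view (in feet) at which a feature still spans enough pixels."""
    pixels_per_inch = feature_px_required / feature_width_in
    return image_width_px / pixels_per_inch / 12  # 12 inches per foot


# Eye spacing: 3 inches must cover at least 35 pixels at 704 pixels wide.
print(round(max_fov_feet(704, 3.0, 35), 1))

# Head width: 6.5 inches (midpoint of 6-7) must cover 90 pixels (midpoint of 80-100).
print(round(max_fov_feet(704, 6.5, 90), 1))
```

The two estimates come out slightly above and slightly below 4.5 feet, which is consistent with treating 4.5 feet as the practical limit.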
Place a person standing in the foreground in optimal focus. When they hold their arms stretched out from side-to-side, you should not be able to see their hands (the image should be cut off at their wrists). If you can see their hands in the image when they are standing in focus in the foreground, the field of view is too wide for facial analysis.
If you are looking at prerecorded images from a camera already placed, measure the width of the head, and if it is smaller than 1/7 (about 15%) of the field of view, the field of view is too wide for facial analysis.
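For prerecorded images, the 1/7 rule can be applied to pixel measurements taken from a frame. The sketch below assumes you have measured the head width and image width in whole pixels; the function name `fov_is_narrow_enough` is hypothetical.

```python
from fractions import Fraction

# Minimum share of the image width that the head should occupy (from the rule above).
MIN_HEAD_FRACTION = Fraction(1, 7)


def fov_is_narrow_enough(head_width_px, image_width_px):
    """Return True if the head spans at least 1/7 of the image width.

    Both arguments must be whole numbers of pixels.
    """
    return Fraction(head_width_px, image_width_px) >= MIN_HEAD_FRACTION


print(fov_is_narrow_enough(110, 704))  # head slightly wider than 1/7 of the frame
print(fov_is_narrow_enough(78, 704))   # head about 1/9 of the frame
```

Using exact fractions avoids rounding surprises when a measurement sits right at the 1/7 boundary.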
FOV Rating | Width (Feet) | Face Size |
---|---|---|
Poor FOV | 5.5 | 1/9 of the image |
Max Acceptable FOV | 4.5 | 1/7 of the image |
Excellent FOV | 3.5 | 1/6 of the image |
Horizontal Angle
Facial analysis requires a clear image of the full face, directly facing the camera - with minimal turning of the head to the left or right (horizontal angle) relative to the camera.
The image must simultaneously show both ears of the subject. If one of the ears is not visible, the horizontal angle is too extreme.
Vertical Angle
Facial analysis requires a clear image of the full face, directly facing the camera - with minimal tilting of the head up or down (vertical angle) relative to the camera.
The middle of the nose should be higher than or at least the same level as the bottom of the earlobes. If the middle of the nose appears below the earlobes, the vertical angle is too high.
Cameras need to be mounted low enough or far enough away that the vertical slope (camera height above eye level divided by horizontal distance to the subject) does not exceed 20% when subjects are in focus in the foreground. Given an average eye height of 5 feet, a camera 10 feet away cannot be mounted more than 20% of 10 feet (2 feet) above eye height, so no higher than 5 + 2 = 7 feet. A camera 20 feet away can be mounted as high as 9 feet (20% of 20 = 4 feet; 4 + 5 = 9). See “Determining Camera Mounting Height” later in this section for more details.
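The 20% rule reduces to a one-line calculation. The following sketch reproduces the two worked examples above; the function names and default parameters are illustrative choices, not part of any 3VR tool.

```python
def max_mount_height_ft(distance_ft, eye_height_ft=5.0, max_slope=0.20):
    """Highest camera mounting height (feet) that keeps the slope within the limit."""
    return eye_height_ft + max_slope * distance_ft


def min_distance_ft(mount_height_ft, eye_height_ft=5.0, max_slope=0.20):
    """Closest subject distance (feet) at which a camera at a fixed height still works."""
    rise = max(mount_height_ft - eye_height_ft, 0.0)
    return rise / max_slope


print(max_mount_height_ft(10))  # camera 10 feet from the subject
print(max_mount_height_ft(20))  # camera 20 feet from the subject
print(round(min_distance_ft(8), 1))  # camera fixed at 8 feet high
```

The inverse function is useful when the mounting height is dictated by the site, such as a fixed ceiling, and the question becomes how far away the capture point must be.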
Lighting Level
Facial analysis requires even lighting that clearly shows the detail in a face. Lighting conditions must not produce shadows or dark areas in the face (underexposure), nor glare or washed-out areas in the face (overexposure). A face with lots of visible detail and a wide range of dark and light pixels (referred to as “dynamic range”) is required for facial analysis.
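As a rough way to reason about underexposure, overexposure, and dynamic range, the sketch below summarizes a list of 8-bit grayscale values from a face crop. It is not how 3VR evaluates lighting; the thresholds `dark_below=30` and `bright_above=225` and the function name `exposure_summary` are arbitrary assumptions chosen for illustration.

```python
def exposure_summary(gray_pixels, dark_below=30, bright_above=225):
    """Summarize exposure for a face crop given 8-bit grayscale values (0-255).

    The thresholds are illustrative assumptions, not 3VR-specified limits.
    """
    total = len(gray_pixels)
    dark = sum(1 for p in gray_pixels if p < dark_below)
    bright = sum(1 for p in gray_pixels if p > bright_above)
    return {
        "dark_fraction": dark / total,      # share of very dark pixels
        "bright_fraction": bright / total,  # share of very bright pixels
        "range": max(gray_pixels) - min(gray_pixels),  # spread of tones present
    }


# A washed-out crop: most pixels are near white and the tonal spread is narrow.
washed_out = [240] * 80 + [200] * 20
print(exposure_summary(washed_out))
```

A large bright or dark fraction combined with a small range suggests the kind of overexposed or underexposed face described above.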
Photo A – Good Lighting | Image |
---|---|
There are no areas of the face in shadow or glare. There is wide dynamic range - lots of both light and dark areas within the face, and lots of detail is visible. |
Photo B – Overexposed | Image |
---|---|
There are no areas of shadow, but there are significant areas with glare (notice the cheeks, nose and |
Photo C – Marginally acceptable | Image |
---|---|
There are some areas of the face in mild shadow and the face appears somewhat darker than desired. There is marginally acceptable dynamic range - moderate amounts of both light and dark areas within the face, and moderate amounts of detail are visible. |
Photo D – Underexposed | Image |
---|---|
There are substantial areas of shadow and/or not enough light. There is a narrower dynamic range - |
Additional Considerations for Megapixel Cameras
Field of View
With the addition of megapixel cameras to your security solution, you can use the higher resolutions available to provide a wider field of view. In essence, this allows fewer cameras to provide more coverage while still capturing face profiles. It is extremely important to understand that the same principles still apply for facial recognition, including pixels between the eyes, horizontal and vertical angles, and lighting.
The number of horizontal pixels is still the key factor in performing 3VR facial surveillance; the advantage can be seen in the table below.
Resolution | Megapixels | Width for Face (Feet) |
---|---|---|
1024 x 768 | 0.7 | 6.5 |
1280 x 1024 | 1.3 | 8.0 |
1600 x 1200 | 2 | 10.5 |
2048 x 1536 | 3 | 13.5 |
The appearance of the field of view is obviously different with a width of 8.5’.
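One way to see why wider sensors allow wider fields of view is to scale the 4CIF baseline (704 pixels, about 4.5 feet) in proportion to horizontal resolution. This is a simplifying assumption for illustration, and the function name `scaled_fov_feet` is hypothetical; the results land near, but not exactly on, the figures in the table above, so treat the table as the reference.

```python
BASELINE_WIDTH_PX = 704  # 4CIF horizontal resolution
BASELINE_FOV_FT = 4.5    # maximum FOV width cited for 4CIF


def scaled_fov_feet(horizontal_px):
    """Estimate the usable FOV width, assuming it scales linearly with pixel width."""
    return BASELINE_FOV_FT * horizontal_px / BASELINE_WIDTH_PX


for width_px in (1024, 1280, 1600, 2048):
    print(width_px, round(scaled_fov_feet(width_px), 1))
```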
Pixels Between the Eyes
The same principles that apply to lower resolution cameras apply to megapixel cameras; 3VR still requires 35 pixels between the eyes. However, analysis is conducted on the full megapixel frame of the camera’s output. For example, with a resolution of 1280 x 1024, 3VR conducts facial analysis at 1280 x 1024 versus an analog camera at 4CIF (704 x 576).
Image Use and Optimization for Facial Analysis
Overview
3VR can analyze faces from two primary sources:
Digital images such as photographs and mugshots can be imported
CCTV cameras can be connected to a 3VR and video can be continuously analyzed
Best practices for each source differ significantly. Both sources are described and discussed below.