Image Quality

Understanding what constitutes quality images and how to optimize the quality of images used in 3VR systems is critical to developing viable solutions. Meeting specific requirements for 3VR facial surveillance is much more challenging than in any traditional CCTV deployment. Accordingly, many partners and users are unaccustomed to being concerned about these issues and can easily overlook them.

By understanding and optimizing image quality, you will be able to:

  • Better qualify what opportunities are strong fits for 3VR

  • Set the appropriate level of expectations with partners and users

  • Design a system that accommodates real world conditions

  • Deliver a solution that is optimally effective

Imaging Background

Overview

Two aspects of imaging are most important in understanding and optimizing 3VR facial surveillance:

  1. Resolution

  2. Field of View

Without understanding the impact of these two aspects, it will not be possible to master use of 3VR for either importing images or capturing video. As such, prior to analyzing facial surveillance or its application to 3VR, these aspects shall be explained and relevant introductory material shall be presented.

Key Elements in Optimizing Images for Facial Surveillance

This report will examine and explain the many elements that are critical to using 3VR for facial surveillance. While fundamental principles apply to images from photographic cameras as well as CCTV cameras, the application will differ. The following summarizes the key elements so that the reader may have a quick reference for future use.

  • Any image, whether video or photo, requires sufficient detail. Detail is determined by

    • the level of resolution in the image

    • the size of the person’s face relative to the size of the image

  • The person’s face must look directly towards the camera. This affects both the class of imported photos that are acceptable and how video cameras must be positioned to capture faces.

  • Imported images have to meet the requirements listed above. With the exception of mugshots and passport photos, most photos do not meet these requirements.

  • Video cameras must be positioned specifically to capture faces. The cost may be inexpensive but the skill is not trivial. Care must be taken to precisely position cameras to capture faces consistently.

Resolution

Resolution, as applied to images used in the 3VR system, is defined as the level of visual detail in the image.

Resolution is commonly defined in two dimensions: horizontal and vertical. For instance, a frequent resolution level cited is 640 x 480 pixels. This means that there are 640 unique pixels across the image horizontally and 480 pixels down the image vertically.

The number of horizontal pixels is important for performing 3VR facial surveillance because it determines the amount of detail available for performing facial analysis. Sufficient horizontal pixels are required to perform facial recognition.

One standard metric used to describe video surveillance images is the Common Intermediate Format (CIF). CIF is shorthand for resolution levels that are commonly used in digital video; each CIF level specifies a horizontal and a vertical resolution.

The following table provides examples of different CIF levels.

CIF Level      Resolution     Example
Quarter CIF    176 x 144
CIF            352 x 288      Typical Internet streaming
2CIF           704 x 240
4CIF           704 x 576      NTSC camera max resolution
16CIF          1408 x 1152    1.5 Megapixel camera

Industry participants commonly cite different CIF levels. These CIF levels have a significant impact on whether imported images or captured video can be used for facial analysis.

For example, the 3VR system may be able to use an image of a person recorded at 4CIF resolution for facial analysis. The same image recorded at CIF resolution may not contain enough detail unless the face is extremely large (at least a quarter of the width of the field of view). See “Camera Placement for Facial Analysis” below for more information and examples on field of view for facial analysis.

Field of View

Need to Multi Excerpt content from other pages.

Camera Placement for Facial Analysis - 4 Factors

Field of View Determines Size and Resolution of Face

Facial analysis requires a certain minimum resolution level to be effective. This resolution level is measured in pixels.

  • 3VR requires a minimum of 35 horizontal pixels between the eyes (or about 80 - 100 horizontal pixels across the head) to perform facial analysis

  • 3VR performs analysis of all analog NTSC video at 4CIF (704 x 576 pixels)

Given these facts, and given that the average distance between the eyes is 3” and the width of a head is approximately 6 - 7”, an NTSC camera at 4CIF resolution can capture usable faces only in a field of view (FOV) no wider than about 4.5 feet.
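The arithmetic behind this limit can be sketched as follows. This is a minimal illustration, not 3VR's own calculation: the function name and the assumption that pixels are spread evenly across the field of view are ours, while the 35-pixel minimum, the 3” eye spacing and the 704-pixel width come from the text above.

```python
MIN_PIXELS_BETWEEN_EYES = 35  # minimum stated in this document


def pixels_between_eyes(horizontal_pixels: int, fov_width_ft: float,
                        eye_spacing_in: float = 3.0) -> float:
    """Estimate how many pixels span the eyes of a subject in the focal plane.

    Assumes pixels are spread evenly across the field of view (hypothetical
    simplification; lens distortion is ignored).
    """
    if fov_width_ft <= 0:
        raise ValueError("fov_width_ft must be positive")
    pixels_per_inch = horizontal_pixels / (fov_width_ft * 12.0)
    return pixels_per_inch * eye_spacing_in


# 4CIF is 704 pixels wide; try the three field-of-view widths discussed here.
for fov_ft in (5.5, 4.5, 3.5):
    px = pixels_between_eyes(704, fov_ft)
    verdict = "ok" if px >= MIN_PIXELS_BETWEEN_EYES else "too wide"
    print(f"{fov_ft} ft -> {px:.1f} px between the eyes ({verdict})")
```

Under these assumptions a 5.5 foot field of view falls below 35 pixels, while 4.5 feet leaves a small margin above it.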

Place a person standing in the foreground in optimal focus. When they hold their arms stretched out from side-to-side, you should not be able to see their hands (the image should be cut off at their wrists). If you can see their hands in the image when they are standing in focus in the foreground, the field of view is too wide for facial analysis.

If you are looking at prerecorded images from a camera already placed, measure the width of the head, and if it is smaller than 1/7 (about 15%) of the field of view, the field of view is too wide for facial analysis.
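The 1/7 rule for prerecorded images can be expressed as a small check. This is only a sketch: the function name is hypothetical, and the head width is assumed to have been measured in pixels by hand or by another tool.

```python
from fractions import Fraction

MIN_HEAD_FRACTION = Fraction(1, 7)  # rule of thumb from this document


def fov_too_wide(head_width_px: int, image_width_px: int) -> bool:
    """Return True when the head is narrower than 1/7 of the image width."""
    if image_width_px <= 0:
        raise ValueError("image_width_px must be positive")
    return Fraction(head_width_px, image_width_px) < MIN_HEAD_FRACTION


# A 78-pixel head in a 704-pixel frame is about 1/9 of the width.
print(fov_too_wide(78, 704))
# A 118-pixel head in the same frame is about 1/6 of the width.
print(fov_too_wide(118, 704))
```

Exact fractions are used so that a head of exactly 1/7 of the width is not rejected by floating-point rounding.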

FOV Rating            FOV Width (Feet)    Face Width
Poor FOV              5.5                 1/9 of the image
Max Acceptable FOV    4.5                 1/7 of the image
Excellent FOV         3.5                 1/6 of the image

Horizontal Angle

Facial analysis requires a clear image of the full face, directly facing the camera - with minimal turning of the head to the left or right (horizontal angle) relative to the camera.

The image must simultaneously show both ears of the subject. If one of the ears is not visible, the horizontal angle is too extreme.

Vertical Angle

Facial analysis requires a clear image of the full face, directly facing the camera - with minimal tilting of the head up or down (vertical angle) relative to the camera.

The middle of the nose should be higher than or at least the same level as the bottom of the earlobes. If the middle of the nose appears below the earlobes, the vertical angle is too high.

Cameras need to be mounted low enough or far enough away that the vertical angle (slope) does not exceed 20% above eye level when subjects are in focus in the foreground. Given an average eye height of 5 feet, a camera 10 feet away cannot be mounted more than 20% of 10 feet (2 feet) above eye height, so no higher than 5 + 2 = 7 feet. A camera 20 feet away can be mounted as high as 9 feet (20% of 20 feet = 4 feet; 4 + 5 = 9). See “Determining Camera Mounting Height” later in this section for more details.
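To check a camera that is already mounted, the slope can be computed from its height and distance. The sketch below assumes the 5 foot eye height used in this document; the function names and the small rounding tolerance are our own.

```python
def vertical_slope(mount_height_ft: float, subject_distance_ft: float,
                   eye_height_ft: float = 5.0) -> float:
    """Slope from the subject's eyes up to the camera (rise over run)."""
    if subject_distance_ft <= 0:
        raise ValueError("subject_distance_ft must be positive")
    return (mount_height_ft - eye_height_ft) / subject_distance_ft


def mount_is_acceptable(mount_height_ft: float, subject_distance_ft: float,
                        eye_height_ft: float = 5.0,
                        max_slope: float = 0.20) -> bool:
    """True when the slope stays within the 20% limit (tiny tolerance added)."""
    slope = vertical_slope(mount_height_ft, subject_distance_ft, eye_height_ft)
    return slope <= max_slope + 1e-9


print(mount_is_acceptable(7, 10))   # 2 ft rise over 10 ft
print(mount_is_acceptable(9, 20))   # 4 ft rise over 20 ft
print(mount_is_acceptable(8, 10))   # 3 ft rise over 10 ft
```

The first two calls correspond to the 7 foot and 9 foot examples above; the third shows a mount that is too steep for its distance.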

Lighting Level

Facial analysis requires even lighting that clearly shows the detail in a face. Lighting must not produce shadows or dark areas in the face (underexposure), nor glare or washed-out areas (overexposure). A face with lots of visible detail and a wide range of dark and light pixels (referred to as “dynamic range”) is required for facial analysis.

Photo A – Good Lighting

Image

There are no areas of the face in shadow or glare. There is wide dynamic range - lots of both light and dark areas within the face, and lots of detail is visible.

Photo B – Overexposed

Image

There are no areas of shadow, but there are significant areas with glare (notice the cheeks, nose and forehead). There is a narrower dynamic range - excessive washed-out areas with loss of detail.

Photo C – Marginally Acceptable

Image

There are some areas of the face in mild shadow and the face appears somewhat darker than desired. There is marginally acceptable dynamic range - moderate amounts of both light and dark areas within the face, and moderate amounts of detail are visible.

Photo D – Underexposed

Image

There are substantial areas of shadow and/or not enough light. There is a narrower dynamic range - excessive dark areas with loss of detail.
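The four photo descriptions above can be approximated numerically. The sketch below is a rough heuristic of our own, not the check 3VR performs: it takes 8-bit luminance values from the face region and reports how much of the face is very dark or very bright, plus the spread between the 5th and 95th percentiles. The function name and the thresholds (30 and 225) are hypothetical.

```python
def exposure_report(luma, dark_level=30, bright_level=225):
    """Summarize exposure from a list of 8-bit luminance values (0-255).

    dark_level and bright_level are illustrative thresholds only.
    """
    n = len(luma)
    if n == 0:
        raise ValueError("luma must not be empty")
    dark = sum(1 for v in luma if v <= dark_level) / n
    bright = sum(1 for v in luma if v >= bright_level) / n
    ordered = sorted(luma)
    # Approximate 5th and 95th percentiles by index into the sorted values.
    low = ordered[int(0.05 * (n - 1))]
    high = ordered[int(0.95 * (n - 1))]
    return {"dark_fraction": dark, "bright_fraction": bright,
            "spread": high - low}


even_face = list(range(256))             # full range of tones
washed_out = [250] * 90 + [200] * 10     # mostly glare, little tonal range
print(exposure_report(even_face))
print(exposure_report(washed_out))
```

A high bright fraction with a small spread resembles Photo B, and a high dark fraction with a small spread resembles Photo D; what counts as "high" would need to be tuned against real images.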

Additional Considerations for Megapixel Cameras

Field of View

With the addition of megapixel cameras to your security solution, you can use the higher resolutions available to provide a wider field of view. In essence, this allows fewer cameras to cover more area while still capturing face profiles. It is extremely important to understand that the same principles still apply for facial recognition: pixels between the eyes, horizontal and vertical angles, and lighting.

The number of horizontal pixels is still the key factor in performing 3VR facial surveillance; the advantage can be seen in the table below.

Resolution     Megapixels    Width for Face (Feet)
1024 x 768     0.7           6.5
1280 x 1024    1.3           8.0
1600 x 1200    2             10.5
2048 x 1536    3             13.5

The appearance of the field of view is obviously different with a width of 8.5’.

Pixels Between the Eyes

The same principles that apply to lower resolution cameras apply to megapixel cameras; 3VR still requires 35 pixels between the eyes. However, analysis is conducted on the full megapixel frame of the camera’s output. For example, with a resolution of 1280 x 1024, 3VR conducts facial analysis at 1280 x 1024 versus an analog camera at 4CIF (704 x 576).
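Because the 35-pixel requirement is unchanged, the widest usable field of view grows in proportion to the horizontal resolution. The sketch below computes a theoretical upper bound from the 35-pixel minimum and the 3” eye spacing cited earlier, and compares it with the widths recommended in this document. The function name is ours, and the even-pixel-spread assumption is a simplification.

```python
def max_fov_width_ft(horizontal_pixels: int, min_eye_pixels: int = 35,
                     eye_spacing_in: float = 3.0) -> float:
    """Widest field of view that still puts min_eye_pixels across the eyes."""
    return horizontal_pixels * eye_spacing_in / min_eye_pixels / 12.0


# Horizontal resolution -> width recommended in this document (feet).
DOCUMENT_WIDTHS_FT = {704: 4.5, 1024: 6.5, 1280: 8.0, 1600: 10.5, 2048: 13.5}

for pixels, recommended in DOCUMENT_WIDTHS_FT.items():
    limit = max_fov_width_ft(pixels)
    print(f"{pixels} px: recommended {recommended} ft, "
          f"theoretical limit {limit:.2f} ft")
```

In every case the recommended width is somewhat narrower than the theoretical limit, which suggests the published figures include a safety margin.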

Image Use and Optimization for Facial Analysis

Overview

3VR can analyze faces from two primary sources:

  1. Digital images such as photographs and mugshots can be imported

  2. CCTV cameras can be connected to a 3VR and video can be continuously analyzed

Best practices for each source differ significantly. Both sources are described in the Imported Images and CCTV Video sections below.

Imported Images

Any image that is in a supported digital format may be imported into the 3VR system. Supported formats include: BMP, JPEG and GIF. Only faces in images that meet the requirements enumerated in the previous two sections (pixels between the eyes, angles, lighting) may be successfully analyzed by the 3VR system.

Passport Photos/Mugshots

Generally, passport photos and mugshots provide high quality images for facial analysis. The angles and the lighting in these photos are generally within guidelines. Users should check that the pixels between the eyes are sufficient. 3VR uses an import photo process that provides immediate feedback if the number of pixels between the eyes is insufficient.

Images from DVRs

Images from DVRs may not provide sufficient quality for facial analysis. Users should qualify such images and set appropriate expectations. Video from DVRs is often recorded at resolutions lower than 4CIF, such as 2CIF or CIF. Because of the reduced resolution, the field of view that still provides sufficient detail narrows significantly. For example, images at CIF resolution would need to have the face in focus in a field of view no wider than 2.25 feet.

Furthermore, the field of view of cameras on traditional DVRs is often 7’ to 12’, which is significantly wider than the 4.5 foot maximum that applies even when the video is recorded at 4CIF.
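The 2.25 foot figure follows from scaling the 4CIF limit by the ratio of horizontal pixels. A minimal sketch, assuming the 4.5 foot limit at 704 pixels as the reference point (the function name is hypothetical):

```python
def max_fov_for_resolution(horizontal_pixels: int,
                           reference_pixels: int = 704,
                           reference_fov_ft: float = 4.5) -> float:
    """Scale the 4CIF field-of-view limit to another horizontal resolution."""
    return reference_fov_ft * horizontal_pixels / reference_pixels


print(max_fov_for_resolution(352))  # CIF, 352 pixels wide
print(max_fov_for_resolution(176))  # Quarter CIF, 176 pixels wide
```

Halving the horizontal resolution halves the usable field of view, so a typical 7’ to 12’ DVR camera view falls far outside the limit at any of these resolutions.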

General Photos from Digital Cameras

Images from digital cameras may provide sufficient quality for facial analysis. To the extent that these photos exhibit the same properties as passport photos, suitability will be more likely.

Photos of people in a group or performing an activity may not be suitable if the person’s head is significantly turned or tilted. This is quite common and will affect performance.

Digital cameras provide megapixel images. Given the very high resolution of these images (compared to analog photos), the field of view may be very wide (as much as 20’). Users should be careful to verify that the image has not been reduced in resolution, a common technique for reducing the file size of an image. Reducing the resolution proportionally reduces the number of pixels between the eyes of every face in the image.
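The effect of downsizing a photo can be estimated with simple proportion. The sketch below uses made-up example numbers (a 3000-pixel-wide original with 120 pixels between the eyes); the function name is hypothetical.

```python
def eye_pixels_after_resize(original_eye_px: float, original_width_px: int,
                            resized_width_px: int) -> float:
    """Pixels between the eyes after the image is scaled to a new width."""
    if original_width_px <= 0:
        raise ValueError("original_width_px must be positive")
    return original_eye_px * resized_width_px / original_width_px


# Illustrative values only: 120 px between the eyes in a 3000 px wide photo,
# then the photo is shrunk to 800 px wide to save space.
resized = eye_pixels_after_resize(120, 3000, 800)
print(resized, "meets minimum" if resized >= 35 else "below 35 px minimum")
```

In this example a face that comfortably met the requirement in the original no longer does after resizing.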

CCTV Video

Many existing camera positions are designed for alarm assessment and activity monitoring. Those cameras must be adapted for use in facial analysis. Existing infrastructure can and may be used for facial analysis but it must be optimized for use in meeting the imaging guidelines enumerated in this document.

The remainder of this section is a lengthy but important treatment of design and deployment options for CCTV video use.

Environment

First, examine the physical layout of the facility to identify natural “choke points” – areas where subjects will appear within a limited field of view and are most likely to look “straight-on” toward the camera(s). Ideal choke points include entrances and hallways. This area should be free from any obstructions that might come between the camera and subject (including transparent barriers that can create glare or reflection problems). These areas should also be free of distractions that may entice subjects to look away from the camera while passing through. Ideally, subjects will spend 3 seconds passing through a choke-point area.

The recommended width of a single-camera choke point is 4 feet, with a maximum width of 4½ feet. Areas wider than 4½ feet require multiple cameras positioned so the lines of sight slightly overlap. The degree of overlap should enable each camera to capture the full face of a subject who passes halfway between them, instead of each camera capturing only half the subject’s face.
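A rough camera count for a wide choke point can be estimated as follows. The document only says the lines of sight should "slightly overlap", so the 0.5 foot overlap used here is purely an assumption, as is the function name; the 4.5 foot per-camera maximum comes from the text above.

```python
import math

MAX_SINGLE_CAMERA_FOV_FT = 4.5  # maximum single-camera width from this document


def cameras_needed(choke_width_ft: float, per_camera_fov_ft: float = 4.5,
                   overlap_ft: float = 0.5) -> int:
    """Estimate cameras needed to span a choke point with overlapping views.

    overlap_ft is a hypothetical value; choose it to suit the site.
    """
    if per_camera_fov_ft > MAX_SINGLE_CAMERA_FOV_FT:
        raise ValueError("per-camera field of view exceeds 4.5 ft")
    if not 0 <= overlap_ft < per_camera_fov_ft:
        raise ValueError("overlap must be smaller than the field of view")
    if choke_width_ft <= per_camera_fov_ft:
        return 1
    # n cameras cover n * fov - (n - 1) * overlap feet in total.
    return math.ceil((choke_width_ft - overlap_ft)
                     / (per_camera_fov_ft - overlap_ft))


for width in (4.5, 8, 10):
    print(f"{width} ft wide -> {cameras_needed(width)} camera(s)")
```

Using the recommended 4 foot width per camera instead of the 4.5 foot maximum will increase the count for wider openings.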

Scene lighting must be sufficient to produce a clear, sharp image. Excessive background lighting, blooming or shadowing conditions must be avoided.

Camera Positioning

Cameras should be centered to increase the opportunities to obtain straight-on (perpendicular) face images. In addition, the distance from the subject should be the greatest allowable (subject to lens specifications that provide a proper field of view of less than 4.5 feet, as explained in “Field of View Determines Size and Resolution of Face” above). The greater the distance between camera and subject, the longer the depth of field – the area within which the subject will remain in focus – thus delivering more usable ‘face frames’.

Optimal full-facial recognition occurs when a camera is mounted level with, and horizontally perpendicular to, the subject’s face. Harsh camera angles are detrimental and will seriously degrade the results.

While cameras are often mounted higher than the optimal face-level height, adequate face capture can occur provided the maximum vertical angle of incidence is less than a 20% slope (see illustration below). The greater the distance between camera and subject, the higher the camera can be mounted while maintaining this threshold.

Determining Camera Mounting Height - No more than 20% slope to eyes in the face

The first step is to determine an average “face height”. This value is application-specific (e.g., an environment with young children will require a lower value). For this example, we will assume an average face height of 5 feet.

The second step is to measure the “subject distance” (from lens to subject) and determine the mounting height for the camera. To maintain a vertical slope of no more than 20%, the camera can be mounted approximately 0.2 feet above face height for every foot of subject distance, as shown below.

Simple Formula: (D x 0.2) + E = H

Multiply the subject distance (D) by 0.2 (for a 20% slope). Add the result to the eye height (E). This equals the maximum mounting height (H).

Example: assuming a 5’ eye height and a 15’ subject distance:

  • 15’ (subject distance) x 0.2 (slope) = 3’

  • 3’ + 5’ (eye height) = 8’ mounting height
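The formula translates directly into a short helper. This is a minimal sketch: the function name is ours, and the defaults are the 5 foot eye height and 20% slope used in this section.

```python
def max_mounting_height_ft(subject_distance_ft: float,
                           eye_height_ft: float = 5.0,
                           max_slope: float = 0.2) -> float:
    """Maximum mounting height H = (D x slope) + E, all values in feet."""
    if subject_distance_ft < 0:
        raise ValueError("subject_distance_ft must not be negative")
    return subject_distance_ft * max_slope + eye_height_ft


for distance in (10, 15, 20):
    height = max_mounting_height_ft(distance)
    print(f"{distance} ft away -> mount no higher than {height:.1f} ft")
```

For the 15 foot example this gives 8 feet, and for 10 and 20 feet it gives the 7 and 9 foot limits cited under Vertical Angle.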

Remember, a 20% slope is a maximum number – a decreased slope will provide better results (and mounting the camera with no slope, exactly at the average face height, will provide the best results).

Example: A potential entry point offers two appropriate mounting positions, one 12 feet away and the other 24 feet away. In this case, the farther mounting location is more appropriate.

Camera/Lens Specifications

Cameras designated for face-based recording must comply with a wide range of specifications. Primary among them is the use of high-resolution cameras (480 TVL or greater). The camera and lens must be appropriate to the scene lighting conditions: for example, a black & white low-lux or day/night camera for poorly lit areas, and a super-dynamic camera where severe back-lighting can occur.

An appropriate 4’ wide field of view (FoV) results from the focal length of the lens in relation to the distance of the object from the camera. While an appropriate “fixed focal length” lens can be used, a VariFocal Lens is strongly recommended because its selectable focal length range allows the field of view to be fine-tuned.

VariFocal Lenses are available with manual or automatic iris features. The automatic iris format is used in applications where lighting conditions may vary, for example, areas exposed to daytime sunlight.

Lens Selection

Typical VariFocal Lenses fall within the following approximate ranges: 3 to 8 millimeters (mm), 3 to 12mm, 5 to 50mm, and 20 to 100mm. The most common and least expensive are the 3 to 8mm versions, although most Face Capture applications will require the larger VariFocal Lenses.

Guidelines for Determining Lens Focal Length

A 4’ wide FoV requires 1.2mm per foot of distance from subject; a 4½’ wide FoV requires 1.1mm per foot.

The estimates above assume a camera with a 1/3” CCD (the most common size). Results will vary with other formats such as ½” or ¼”. Lens calculation tools are available from most camera manufacturers (online and pocket ‘slide-rule’ versions).
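The per-foot guidelines are consistent with the usual simple-lens approximation, in which focal length equals sensor width times subject distance divided by field-of-view width. The sketch below assumes a nominal 4.8 mm sensor width for a 1/3” CCD (and 6.4 mm for ½”); these widths, the function names and the approximation itself are our assumptions, and a manufacturer's lens calculator should be preferred for real designs.

```python
# Nominal sensor widths in millimeters (assumed standard values).
SENSOR_WIDTH_MM = {"1/3": 4.8, "1/2": 6.4}

# Typical VariFocal ranges listed under Lens Selection (mm).
VARIFOCAL_RANGES_MM = [(3, 8), (3, 12), (5, 50), (20, 100)]


def focal_length_mm(subject_distance_ft: float, fov_width_ft: float,
                    sensor_width_mm: float = 4.8) -> float:
    """Approximate focal length giving fov_width_ft at subject_distance_ft."""
    if fov_width_ft <= 0:
        raise ValueError("fov_width_ft must be positive")
    return sensor_width_mm * subject_distance_ft / fov_width_ft


def lenses_covering(focal_mm: float):
    """Return the VariFocal ranges that include the requested focal length."""
    return [(lo, hi) for lo, hi in VARIFOCAL_RANGES_MM if lo <= focal_mm <= hi]


needed = focal_length_mm(subject_distance_ft=15, fov_width_ft=4)
print(f"{needed:.1f} mm needed; candidate lenses: {lenses_covering(needed)}")
```

With a 4.8 mm sensor the formula yields 1.2 mm per foot for a 4’ field of view and about 1.1 mm per foot for 4½’, matching the guideline, and it shows why a 3 to 8mm lens is rarely long enough for face capture.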

With megapixel cameras, deployment no longer requires as long a telephoto range as with analog and non-megapixel IP cameras, although this depends on the particular application. Refer to the table in the megapixel Field of View section to determine the maximum width for each application.

Camera Face Requirement Chart in 3VR

Result

Quality Image

With a better understanding of image quality optimization, you can now:

  • Better qualify what opportunities are strong fits for 3VR

  • Set the appropriate level of expectations with partners and users

  • Design a system that accommodates real world conditions

  • Deliver a solution that is optimally effective

  • Understand the significant advantages of utilizing megapixel cameras

Remember the following principles as you install cameras for face recognition:

  1. Any image, whether video or photo, requires sufficient detail. Detail is determined by (1) the level of resolution in the image and (2) the size of the person’s face relative to the size of the image.

  2. The person’s face must look directly towards the camera. This affects both the class of imported photos that are acceptable and how video cameras must be positioned to capture faces.

  3. Imported images have to meet the requirements listed above. With the exception of mugshots and passport photos, most photos do not meet these requirements.

  4. Video cameras must be positioned specifically to capture faces. The cost may be inexpensive but the skill is not trivial. Care must be taken to precisely position cameras to capture faces consistently.

  5. Megapixel cameras in your security solution allow a significantly wider field of view, but they also require more storage and can degrade system performance if the trade-off is not properly balanced.

  6. It should be understood that determining the maximum mounting height is not affected by the increase in resolution of megapixel cameras. The same formula must be used to stay within the 20% slope specification.
