VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging

Naoki Hirabayashi1, Masakazu Iwamura2, Zheng Cheng1, Kazunori Minatani3, Koichi Kise2
1Osaka Prefecture University
2Osaka Metropolitan University
3National Center for University Entrance Examinations

Video Presentations



Abstract

Many people with visual impairments would like to take photographs. However, they often have difficulty pointing the camera at the target. In this paper, we address this problem by proposing a novel photo-taking system called VisPhoto. Unlike conventional methods, VisPhoto generates a photograph in post-production. When the shutter button is pressed, VisPhoto captures an omnidirectional camera image that contains the entire scene surrounding the camera. In post-production, the system outputs a cropped region as a "photograph" that satisfies the user's preference.

Proposed system

Overview

The procedure for generating a photograph is divided into two stages: (1) capture and (2) post-production. In (1), the user captures an omnidirectional camera image by pressing the shutter button. Unlike standard photography, the process is not complete until, in (2), the image is cropped according to the user's preference. For (2), we provide two interfaces that enable users to generate a photograph: a manual method, in which the user performs post-production on web pages, and an automatic method, in which speech recognition allows the user to skip some manual operations.
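As a rough illustration of how a speech-based interface can skip manual object selection, the sketch below matches a spoken request against object-detection labels. The function name, the `(label, box)` detection format, and the simple word-matching rule are illustrative assumptions, not the actual VisPhoto code:

```python
def select_objects_by_voice(detections, transcript):
    """Keep detected objects whose label appears in the spoken request.

    `detections` is a list of (label, bounding_box) pairs; this format is
    a simplifying assumption for this sketch.
    """
    words = set(transcript.lower().split())
    return [(label, box) for label, box in detections if label.lower() in words]

detections = [("banana", (100, 200, 180, 260)),
              ("orange", (200, 210, 260, 270)),
              ("cup", (400, 100, 450, 160))]
selected = select_objects_by_voice(detections, "Photograph the banana and orange")
print([label for label, _ in selected])  # ['banana', 'orange']
```

A real pipeline would also need to handle recognition errors and synonyms; as the failure examples below show, a misrecognized transcript leaves the wrong objects selected.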

Screen shots of post-production

Screen shot of image selection
Image selection (Step 2). The user selects one image from all the uploaded images using the VisPhoto web interface, which can be accessed from any device familiar to the user.
Screen shot of object selection
Object selection (Step 3). From the objects detected in the selected image, the user marks those that should be included.
Screen shot of downloading the generated photograph
Downloading the generated photograph (Step 4). The list of objects is generated by applying object recognition to the generated photograph. The PET bottle between the banana and the orange is recognized as packaged goods.
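Once the objects to include are chosen (Step 3), generating the "photograph" amounts to cropping a region of the omnidirectional image that covers all of them. A minimal sketch of such a crop computation, assuming axis-aligned pixel boxes and ignoring the wrap-around at the seam of an equirectangular image, might look like this (not the system's actual implementation):

```python
def select_crop(boxes, margin=40):
    """Smallest axis-aligned crop covering every selected object box,
    padded by `margin` pixels on each side.

    Boxes are (left, top, right, bottom) in pixel coordinates; handling
    of the equirectangular seam is omitted in this sketch.
    """
    if not boxes:
        raise ValueError("at least one object must be selected")
    left = min(b[0] for b in boxes) - margin
    top = min(b[1] for b in boxes) - margin
    right = max(b[2] for b in boxes) + margin
    bottom = max(b[3] for b in boxes) + margin
    return (left, top, right, bottom)

# Example: a crop covering a banana and an orange detected side by side.
print(select_crop([(100, 200, 180, 260), (200, 210, 260, 270)]))
# -> (60, 160, 300, 310)
```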

Successful examples

Successful example of apple (target within reach)
Apple
Successful example of ball (target within reach)
Ball
Successful example of bottle (target within reach)
Bottle
Successful example of cup (target within reach)
Cup
Successful example of monitor (target within reach)
Monitor
Successful example of teddy bear (target within reach)
Teddy bear

Photo-taking session 1: targets within reach

Successful example of bicycle (target out of reach)
Bicycle
Successful example of bottle (target out of reach)
Bottle
Successful example of clock (target out of reach)
Clock
Successful example of monitor (target out of reach)
Monitor

Photo-taking session 2: targets out of reach

Successful example of banana and orange (multiple targets)
Banana and orange
Successful example of keyboard and mouse (multiple targets)
Keyboard and mouse

Photo-taking session 3: multiple targets

Successful example of dog (moving target)
Dog
Successful example of giraffe (moving target)
Giraffe
Successful example of zebra (moving target)
Zebra

Photo-taking session 4: moving targets

Failure examples

Failure example of apple (target within reach)
Apple (target within reach)

Caused by speech recognition failure

Failure example of monitor (target within reach)
Monitor (target within reach)
Failure example of keyboard and mouse (multiple targets)
Keyboard and mouse (multiple targets)
Failure example of giraffe (moving target)
Giraffe (moving target)

Caused by object detection failure

Omnidirectional camera images photographed by blind participants in the real environment

Omnidirectional camera image photographed by a 26-year-old female
Photographed by a 26-year-old female
Omnidirectional camera image photographed by a 47-year-old male
Photographed by a 47-year-old male
Omnidirectional camera image photographed by a 29-year-old male
Photographed by a 29-year-old male
Omnidirectional camera image photographed by a 47-year-old male
Photographed by a 47-year-old male
Omnidirectional camera image photographed by a 23-year-old female
Photographed by a 23-year-old female
Omnidirectional camera image photographed by a 29-year-old male
Photographed by a 29-year-old male

* Bounding boxes represent object detection results

Publications

1. CHI 2020 Late-Breaking Work

Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2020. VisPhoto: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image. In Extended Abstracts of the 2020 ACM CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3334480.3382983

@InProceedings{Iwamura_CHI2020_LBW,
  author    = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minatani and Koichi Kise},
  booktitle = {Extended Abstracts of the 2020 ACM CHI Conference on Human Factors in Computing Systems},
  title     = {{VisPhoto}: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image},
  doi       = {10.1145/3334480.3382983},
  year      = {2020},
  month     = apr,
}

2. IEICE Transactions on Information and Systems (in Japanese)

Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2021. Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production. J104-D, 8, 663–677. In Japanese. Winner of a 2021 IEICE Best Paper Award. https://doi.org/10.14923/transinfj.2020JDP7069 (PDF available on our lab's webpage.)

@Article{Iwamura_IEICE2021ja,
  author       = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minatani and Koichi Kise},
  journaltitle = {IEICE Transactions on Information and Systems (Japanese Edition)},
  title        = {Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production},
  doi          = {10.14923/transinfj.2020JDP7069},
  volume       = {J104-D},
  number       = {8},
  pages        = {663--677},
  year         = {2021},
  month        = aug,
  language     = {Japanese},
}

3. Proc. ASSETS 2023

Naoki Hirabayashi, Masakazu Iwamura, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2023. VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. Best Paper Award Winner at ASSETS 2023. https://doi.org/10.1145/3597638.3608422

@InProceedings{Hirabayashi_ASSETS2023,
  author       = {Naoki Hirabayashi and Masakazu Iwamura and Zheng Cheng and Kazunori Minatani and Koichi Kise},
  booktitle    = {Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility},
  title        = {{VisPhoto}: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging},
  doi          = {10.1145/3597638.3608422},
  year         = {2023},
  month        = oct,
}

Comparison of publications

1. CHI2020 LBW
   Proposed method: YOLO object detector; no voice interface; with view finding network
   Experiment: N/A

2. IEICE2021
   Experiment 1: YOLO object detector; no voice interface; with view finding network
     Photographed by 8 blind people; evaluated by: N/A; targets: no indication
     Evaluation: interview; statistical test: No
   Experiment 2: YOLO object detector; no voice interface; with view finding network
     Photographed by 10 sighted people; evaluated by 10 sighted people; targets: no indication
     Compared with the iPhone camera app; evaluation: quality evaluation; statistical test: No
   Experiment 3: Google Cloud API object detector; no voice interface; with view finding network
     Photographed by 10 people (7 blind & 3 low vision); evaluated by an author; 10 targets
     Evaluation: photography success rate; statistical test: No

3. ASSETS2023
   Main experiment: Google Cloud API object detector; with voice interface; no view finding network
     Photographed by 24 people (15 blind & 9 low vision); evaluated by 20 sighted people; 15 targets in 4 categories
     Compared with tfCam and the iPhone camera app
     Evaluation: interview, average time to photograph, target quiz, quality evaluation; statistical test: Yes
   Demonstration only: YOLO object detector; no voice interface; with view finding network
     Photographed by 8 blind people; evaluated by: N/A; targets: no indication; statistical test: No

Code

1. Theta Plug-in

This is the source code of the Theta Plug-in, which is used in Step 1 (capture) of the proposed system.

2. VisPhoto Web App

This is the source code of the VisPhoto Web App, which is used in Steps 2–4 of the proposed system.

3. tensorflow-camera (tfCam)

This is not part of VisPhoto but a rival method used for comparison in the experiments. The app is available on the App Store.