VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging

Naoki Hirabayashi¹, Masakazu Iwamura², Zheng Cheng¹, Kazunori Minatani³, Koichi Kise²

¹Osaka Prefecture University
²Osaka Metropolitan University
³National Center for University Entrance Examinations

Best Paper Award Winner at ASSETS 2023

Paper Code tf-cam App (rival method)

Video Presentations

Abstract

Many people with visual impairment would like to take photographs. However, they often have difficulty pointing the camera at the target. In this paper, we address this problem by proposing a novel photo-taking system called VisPhoto. Unlike conventional methods, VisPhoto generates a photograph in post-production. When the shutter button is pressed, VisPhoto captures an omnidirectional camera image that contains the surrounding scene of the camera. In post-production, the system outputs a cropped region as a ``photograph'' that satisfies the user's preference.

Proposed system

The procedure that generates a photograph is divided into two stages: (1) capture and (2) post-production. In (1), the user captures an omnidirectional camera image by pressing the shutter button. Unlike standard photography, the process is not complete until the photograph is cropped according to the user's preference in (2). In (2), we create two types of interfaces to enable users to generate a photograph: a manual method in which the user manually performs post-production on web pages and an automatic post-production method in which speech recognition allows the user to skip some manual operations.

Screen shots of post-production

Image selection (Step 2). The user selects one image from all the uploaded images using the VisPhoto web interface, which can be accessed by a device familiar to the user.

Object selection (Step 3). From the objects detected in the selected image, the user marks those that should be included.

Screen shot of downloading the generated photograph

Downloading the generated photograph (Step 4). The list of objects is generated by applying object recognition to the generated photograph. The pet bottle between the banana and orange is recognized as packaged goods.

Successful examples

Apple

Ball

Bottle

Successful example of cup (target within reach)

Cup

Monitor

Teddy bear

Photo-taking session 1: targets within reach

Bicycle

Bottle

Clock

Monitor

Photo-taking session 2: targets out of reach

Banana and orange

Keyboard and mouse

Photo-taking session 3: multiple targets

Successful example of dog (moving target)

Dog

Giraffe

Zebra

Photo-taking session 4: moving targets

Failure examples

Apple (target within reach)

Caused by speech recognition failure

Monitor (target within reach)

Keyboard and mouse (multiple targets)

Giraffe (moving target)

Caused by object detection failure

Omnidirectional camera images photographed
by blind participants in the real environment

Omnidirectional camera image photographed by 26 years old female

Photographed by a 26-year-old female

Omnidirectional camera image photographed by 47 years old male

Photographed by a 47-year-old male

Omnidirectional camera image photographed by 29 years old male

Photographed by a 29-year-old male

Photographed by a 47-year-old male

Omnidirectional camera image photographed by 23 years old female

Photographed by a 23-year-old female

Photographed by a 29-year-old male

* Bonding boxes represent object detection results

Publications

1. CHI 2020 Late-Breaking Work

Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2020. VisPhoto: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image. In Extended Abstracts of the 2020 ACM CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3334480.3382983

@InProceedings{Iwamura_CHI2020_LBW,
  author    = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minatani and Koichi Kise},
  booktitle = {Extended Abstracts of 2020 ACM CHI Conference on Human Factors in Computing Systems},
  title     = {{VisPhoto}: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image},
  doi       = {10.1145/3334480.3382983},
  year      = {2020},
  month     = apr,
}

2. IEICE Transactions on Information and Systems (in Japanese)

Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minataniand, and Koichi Kise. 2021. Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production. J104-D, 8, 663–677. In Japanese. A 2021 IEICE Best Paper Award Winner. https://doi.org/10.14923/transinfj.2020JDP7069 (PDF is available at the webpage of our lab.)

@Article{Iwamura_IEICE2021ja,
  author       = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minataniand and Koichi Kise},
  journaltitle = {IEICE Transactions on Information and Systems (Japanese Edition)},
  title        = {Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production},
  doi          = {10.14923/transinfj.2020JDP7069},
  volume       = {J104-D},
  number       = {8},
  pages        = {663--677},
  year         = 2021,
  month.       = aug,
  language     = {Japanese},
}

3. Proc. ASSETS 2023

Naoki Hirabayashi, Masakazu Iwamura, Zheng Cheng, Kazunori Minataniand, and Koichi Kise. 2023. VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. Best Paper Award Winner at ASSETS 2023. https://doi.org/10.1145/3597638.3608422

@InProceedings{Hirabayashi_ASSETS2023,
  author       = {Naoki Hirabayashi and Masakazu Iwamura and Zheng Cheng and Kazunori Minatani and Koichi Kise},
  booktitle    = {Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility},
  title        = {{VisPhoto}: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging},
  doi          = {10.1145/3597638.3608422},
  year         = {2023},
  month        = oct,
}

Comparison of publications

← Horizontally scrollable →

	Proposed method			Experiment
	Object detector used	With voice interface	With view finding network	Who photographed	Who evaluated	Targets to photograph	Compared with	Evaluation	With statistical test
1. CHI2020 LBW	YOLO	No	Yes	N/A
2. IEICE2021	YOLO	No	Yes	8 blind people	N/A	No indication		Interview	No
	YOLO	No	Yes	10 sighted people	10 sighted people	No indication	iPhone camera app	Quality evaluation	No
	Google Cloud API	No	Yes	10 people (7 blind & 3 low vision)	An author	10 targets		Photography success rate	No
3. ASSETS2023	Google Cloud API	Yes	No	24 people (15 blind & 9 low vision)	20 sighted people	15 targets in 4 categories	tfCam iPhone camera app	Interview Average times to photograph Target quiz Quality evaluation	Yes
	(Only for demonstration purpose)
	YOLO	No	Yes	8 blind people	N/A	No indication			No

VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging

Video Presentations

Abstract

Proposed system

Screen shots of post-production

Successful examples

Failure examples

Omnidirectional camera images photographed
by blind participants in the real environment

Publications

1. CHI 2020 Late-Breaking Work

2. IEICE Transactions on Information and Systems (in Japanese)

3. Proc. ASSETS 2023

Comparison of publications

← Horizontally scrollable →

Code

1. Theta Plug-in

2. VisPhoto Web App

3. tensorflow-camera (tfCam)

VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging

Video Presentations

Abstract

Proposed system

Screen shots of post-production

Successful examples

Failure examples

Omnidirectional camera images photographedby blind participants in the real environment

Publications

1. CHI 2020 Late-Breaking Work

2. IEICE Transactions on Information and Systems (in Japanese)

3. Proc. ASSETS 2023

Comparison of publications

← Horizontally scrollable →

Code

1. Theta Plug-in

2. VisPhoto Web App

3. tensorflow-camera (tfCam)

Omnidirectional camera images photographed
by blind participants in the real environment