Many people with visual impairments would like to take photographs, but they often have difficulty aiming the camera at the target. In this paper, we address this problem by proposing a novel photo-taking system called VisPhoto. Unlike conventional methods, VisPhoto generates a photograph in post-production: when the shutter button is pressed, it captures an omnidirectional camera image containing the entire scene surrounding the camera. In post-production, the system outputs a cropped region that satisfies the user's preference as a "photograph."
The procedure that generates a photograph consists of two stages: (1) capture and (2) post-production. In (1), the user captures an omnidirectional camera image by pressing the shutter button. Unlike standard photography, the process is not complete until the image is cropped according to the user's preference in (2). For (2), we provide two interfaces for generating a photograph: a manual method, in which the user performs post-production on web pages, and an automatic method, in which speech recognition allows the user to skip some manual operations.
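The cropping step in (2) amounts to rendering a perspective view from the captured 360° image. The sketch below illustrates one common way to do this, assuming the omnidirectional image is stored in equirectangular format; the function name, viewing parameters, and output size are illustrative and not part of the VisPhoto code.

```python
import numpy as np

def crop_perspective(equirect, yaw_deg, pitch_deg, fov_deg, out_w=640, out_h=480):
    """Render a perspective "photograph" from an equirectangular panorama.

    equirect: H x W x 3 uint8 array (the full 360-degree image).
    yaw/pitch select the viewing direction; fov_deg is the horizontal field of view.
    """
    H, W = equirect.shape[:2]
    yaw, pitch, fov = np.radians([yaw_deg, pitch_deg, fov_deg])

    # Focal length (in pixels) that realizes the requested field of view.
    f = (out_w / 2) / np.tan(fov / 2)

    # Pixel grid of the output image, centered at the principal point.
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2,
                       np.arange(out_h) - out_h / 2)
    z = np.full_like(x, f, dtype=float)

    # Rotate the viewing rays: pitch around the x-axis, then yaw around the y-axis.
    vy = y * np.cos(pitch) - z * np.sin(pitch)
    vz = y * np.sin(pitch) + z * np.cos(pitch)
    vx = x * np.cos(yaw) + vz * np.sin(yaw)
    vz = -x * np.sin(yaw) + vz * np.cos(yaw)

    # Rays -> spherical coordinates -> panorama pixel coordinates.
    lon = np.arctan2(vx, vz)                                   # [-pi, pi]
    lat = np.arcsin(vy / np.sqrt(vx**2 + vy**2 + vz**2))       # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)

    # Nearest-neighbor sampling of the panorama (sufficient for a sketch).
    return equirect[v, u]
```

With this building block, both the manual interface (user-chosen yaw/pitch/FOV) and the automatic one (parameters derived from a detected object) reduce to choosing the viewing direction before rendering the crop.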
Photo-taking session 1: targets within reach
Photo-taking session 2: targets out of reach
Photo-taking session 3: multiple targets
Photo-taking session 4: moving targets
Caused by speech recognition failure
Caused by object detection failure
* Bounding boxes represent object detection results
Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2020. VisPhoto: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image. In Extended Abstracts of the 2020 ACM CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3334480.3382983
@InProceedings{Iwamura_CHI2020_LBW,
author = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minatani and Koichi Kise},
booktitle = {Extended Abstracts of the 2020 ACM CHI Conference on Human Factors in Computing Systems},
title = {{VisPhoto}: Photography for People with Visual Impairment as Post-Production of Omni-Directional Camera Image},
doi = {10.1145/3334480.3382983},
year = {2020},
month = apr,
}
Masakazu Iwamura, Naoki Hirabayashi, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2021. Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production. IEICE Transactions on Information and Systems (Japanese Edition), J104-D, 8, 663–677. In Japanese. A 2021 IEICE Best Paper Award Winner. https://doi.org/10.14923/transinfj.2020JDP7069 (PDF is available at the webpage of our lab.)
@Article{Iwamura_IEICE2021ja,
author = {Masakazu Iwamura and Naoki Hirabayashi and Zheng Cheng and Kazunori Minatani and Koichi Kise},
journaltitle = {IEICE Transactions on Information and Systems (Japanese Edition)},
title = {Photography for People with Visual Impairment by Photo-Taking with Omni-Directional Camera and Its Post-Production},
doi = {10.14923/transinfj.2020JDP7069},
volume = {J104-D},
number = {8},
pages = {663--677},
year = {2021},
month = aug,
language = {Japanese},
}
Naoki Hirabayashi, Masakazu Iwamura, Zheng Cheng, Kazunori Minatani, and Koichi Kise. 2023. VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. Best Paper Award Winner at ASSETS 2023. https://doi.org/10.1145/3597638.3608422
@InProceedings{Hirabayashi_ASSETS2023,
author = {Naoki Hirabayashi and Masakazu Iwamura and Zheng Cheng and Kazunori Minatani and Koichi Kise},
booktitle = {Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility},
title = {{VisPhoto}: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging},
doi = {10.1145/3597638.3608422},
year = {2023},
month = oct,
}
The first three data columns describe the proposed method; the remaining columns describe the experiment. Some cells in the "Compared with" and "Evaluation" columns were left blank in the original table.

| Paper | Object detector used | With voice interface | With view finding network | Who photographed | Who evaluated | Targets to photograph | Compared with | Evaluation | With statistical test |
|---|---|---|---|---|---|---|---|---|---|
| 1. CHI2020 LBW | YOLO | No | Yes | N/A | | | | | |
| 2. IEICE2021 | YOLO | No | Yes | 8 blind people | N/A | No indication | | | No |
| 2. IEICE2021 | YOLO | No | Yes | 10 sighted people | 10 sighted people | No indication | | | No |
| 2. IEICE2021 | Google Cloud API | No | Yes | 10 people (7 blind & 3 low vision) | An author | 10 targets | | | No |
| 3. ASSETS2023 | Google Cloud API | Yes | No | 24 people (15 blind & 9 low vision) | 20 sighted people | 15 targets in 4 categories | | | Yes |
| 3. ASSETS2023 (only for demonstration purposes) | YOLO | No | Yes | 8 blind people | N/A | No indication | | | No |
This is the source code of Theta Plug-in, which is used in Step 1 of the proposed system.
This is the source code of VisPhoto Web App, which is used in Steps 2-4 of the proposed system.
This is not a part of VisPhoto but a rival method. The app is available on the App Store.