In recent years, the number of high-rise buildings in cities has been increasing, and accidents at the edges of high-rise buildings occur frequently. Timely and effective detection and recognition of dangerous actions on high-rise buildings can help protect people's lives. However, the detection and recognition of high-rise human actions has rarely been studied, for the following reasons: (1) the background is complex and changeable (e.g., variations in illumination and weather); (2) human targets at the edges of high-rise buildings appear small in surveillance video and are often partially occluded; (3) there is no benchmark dataset specialized for spatio-temporal detection of high-rise human actions, which is the most critical factor. To address these issues, we construct a benchmark video dataset termed STD-HA, the first dataset for spatio-temporal detection of high-rise human actions. Our dataset is diverse: it contains 9 action categories captured in multiple scenes under four weather conditions.
Download link for the dataset: [Google Drive]