A Scalable Deep Neural Network Architecture for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi Fingerprinting

One of the key technologies for future large-scale location-aware services covering a complex of multi-story buildings (e.g., a big shopping mall or a university campus) is a scalable indoor localization technique. In this paper, we report the current status of our investigation on the use of deep neural networks (DNNs) for scalable building/floor classification and floor-level position estimation based on Wi-Fi fingerprinting. Exploiting the hierarchical nature of the building/floor estimation and floor-level coordinates estimation of a location, we propose a new DNN architecture consisting of a stacked autoencoder for the reduction of feature space dimension and a feed-forward classifier for multi-label classification of building/floor/location, on which the multi-building and multi-floor indoor localization system based on Wi-Fi fingerprinting is built. Experimental results for the performance of building/floor estimation and floor-level coordinates estimation of a given location demonstrate the feasibility of the proposed DNN-based indoor localization system, which can provide near state-of-the-art performance using a single DNN and thereby allows implementation with lower complexity and energy consumption on mobile devices.


Introduction
Location fingerprinting using received signal strengths (RSSs) from wireless network infrastructure is one of the most popular and promising technologies for localization in an indoor environment, where there is no line-of-sight signal from the global positioning system (GPS) available [1]: For example, a vector of pairs of a service set identifier (SSID) and an RSS for a Wi-Fi access point (AP) measured at a location can be its location fingerprint. The location of a user/device can then be estimated by finding the closest match between its RSS measurement and the fingerprints of known locations in a database [2]. Note that the location fingerprinting technique does not require the installation of any new infrastructure or the modification of existing devices; it relies only on the existing wireless infrastructure, which is its major advantage over alternative techniques.
When the indoor localization is to cover a large building complex (e.g., a big shopping mall or a university campus) where there are many multi-story buildings under the same management, the scalability of fingerprinting techniques becomes an important issue. The current state-of-the-art Wi-Fi fingerprinting techniques take a hierarchical approach to indoor localization, where the building, floor, and position (e.g., a label or coordinates) of a location are estimated in a hierarchical and sequential way using a different algorithm tailored for each task. In [3], for instance, building estimation is done as follows: Given the AP with the strongest RSS in a measured fingerprint, we first build a subset of fingerprints where the same AP has the strongest RSS; then, we count the number of fingerprints associated with each building and set the estimated building to be the most frequent one. Similar procedures are also proposed to estimate a floor inside the building. For the estimation of the coordinates of the location, we first build a subset of fingerprints belonging to the building and the floor estimated from the previous procedures. Then, we take from the subset the multiple fingerprints most similar to the measured one and compute the centroid of their coordinates as the estimated coordinates of the given location. According to the results in [3], the best building and floor hit rates achieved for the UJIIndoorLoc dataset [4] are 100% and 94%, respectively, and the mean error in coordinates estimation is 6.20 m. [1]
One of the major challenges in Wi-Fi fingerprinting is how to deal with the random fluctuation of a signal, the noise from multi-path effects, and the device and position dependency in RSS measurements. Unlike traditional solutions relying on complex filtering and time-consuming parameter tuning specific to given conditions, the popular deep neural networks (DNNs) can provide attractive solutions to Wi-Fi fingerprinting because they require less parameter tuning and adapt to a wider range of conditions with standard architectures and training algorithms [5,6,7]: In [5], a four-layer DNN generates a coarse positioning estimate, which, in turn, is refined to produce a final position estimate by a hidden Markov model (HMM)-based fine localizer. The performance of the proposed indoor localization system is evaluated in both indoor and outdoor environments which are divided into hundreds of square grids. In [6], the authors investigate the application of deep belief networks (DBNs) with two different types of Restricted Boltzmann Machines for indoor localization and evaluate the performance of their approaches using data from ray-tracing-based simulation of heterogeneous mobile radio networks. In both cases, the authors focus only on the localization in a single plane and do not consider the hierarchical nature of multi-building and multi-floor indoor localization. In [7], on the other hand, a DNN consisting of a stacked autoencoder (SAE) and a feed-forward multi-class classifier is used for building/floor classification. This work, too, does not take into account the hierarchical nature of building/floor classification, because the classification is done over flattened, one-dimensional labels of combined building and floor identifiers. Also, the floor-level location estimation is not considered at all. In this regard, to the best of our knowledge, the work presented in this paper is the first to apply DNNs for multi-building and multi-floor indoor localization, exploiting its hierarchical nature in classification.
In this paper, we report the current status of our investigation on the use of DNNs for scalable building/floor classification and floor-level location estimation. We propose a new DNN architecture consisting of an SAE for the reduction of feature space dimension and a feed-forward multi-label classifier [8,9] for a multi-building and multi-floor indoor localization system based on Wi-Fi fingerprinting and evaluate its performance using the UJIIndoorLoc dataset [4].
The rest of the paper is organized as follows: In Sec. 2, we describe the problem of indoor localization in a large building complex with its challenges resulting from the existence of multi-story buildings and propose a scalable DNN architecture for the multi-building and multi-floor indoor localization. Sec. 3 provides and discusses experimental results for the performance of the proposed scalable DNN-based multi-building and multi-floor indoor localization system. Sec. 4 concludes our work in this paper and suggests areas of further research.

A Scalable DNN Architecture for Multi-Building and Multi-Floor Indoor Localization
Location awareness is one of the enabling technologies for future smart and green cities; understanding where people spend their time and how they interact with environments is critical to realizing the vision of smart and green cities [10]. One of the key technologies for future large-scale location-aware services covering a complex of multi-story buildings (e.g., a big shopping mall or a university campus) is a scalable indoor localization technique. Regarding the scalability of the indoor localization, consider the evolution of the Xi'an Jiaotong-Liverpool University (XJTLU) campus in Suzhou, China, where the authors are currently working: As shown in Figure 1 (a), the XJTLU started with just one building in 2006. As of this writing, the XJTLU has two campuses, which are shown in Figure 1 (b), and the number of buildings over two campuses has increased to around 20; this number is still increasing as more buildings and sports facilities are being constructed. Considering all the floors within each building and the locations on each floor, the total number of distinct locations (e.g., offices, lecture rooms, and labs) is already on the order of thousands. If we adopt a grid-based representation of the localization area as in [5], the total number of locations would be even greater. The indoor localization system to cover such a large building complex, therefore, must be scalable. Figure 2 shows two alternative system architectures for large-scale DNN-based multi-building and multi-floor indoor localization. In the hierarchical architecture shown in Figure 2 (a), the task of building/floor/location classification is separated into multiple sub-tasks dedicated to the classification at each level of building, floor, and location. This architecture directly corresponds to the state-of-the-art hierarchical Wi-Fi fingerprinting methods (e.g., [3]), where DNNs replace traditional techniques for building, floor, and location estimation.
Compared to the methods based on traditional techniques, a major disadvantage of this hierarchical DNN architecture is that the DNNs in the floor and the location levels of the system need to be trained separately with multiple sub-datasets derived from a common dataset (i.e., building-specific datasets for DNNs for floor estimation and building-floor-specific datasets for DNNs for location estimation), which poses significant challenges for the management of location fingerprint databases as well as for the training of a possibly large number of DNNs. In this paper, therefore, we focus on the integrated architecture shown in Figure 2 (b), where a single DNN handles the classification of building, floor, and location in an integrated way with a common dataset. Figure 3 shows a DNN architecture for the combined estimation of building, floor, and location based on multi-class classification with flattened labels, which is a straightforward extension of the DNN system for building/floor classification proposed in [7]; after training with RSSs as both input and output data as shown in Figure 3 (a), only the gray-colored nodes are used as an encoder for feature space dimension reduction as shown in Figure 3 (b). This DNN architecture based on multi-class classification with flattened labels, however, has the scalability issue that the number of output nodes is equal to the number of locations over the building complex: In the case of the UJIIndoorLoc dataset, the number of distinct locations (also called reference points in [4]) over 3 buildings with 4 or 5 floors is 933. It also does not take into account the hierarchical nature of the building/floor/location classification problem because it calculates the loss and the accuracy over flattened building/floor/location labels [2]; the misclassification of a building, a floor, or a location incurs an equal loss during the training phase.
To reflect the hierarchical nature of the building/floor/location classification in a DNN classifier, one can use a hierarchical loss function (e.g., a loss function with different weights for building, floor, and location) with the existing multi-class classifier and flattened labels. Because the hierarchical loss function for flattened labels is quite complicated and does not provide a closed-form gradient function, however, training the DNN with the usual backpropagation procedure could be challenging.
[2] For example, we can form a flattened label "B_i-F_{i,j}-L_{i,j,k}" by combining a building, a floor, and a location label, where B_i, F_{i,j}, and L_{i,j,k} denote the ith building, the jth floor of the building, and the kth location on the floor, respectively.
To address the scalability issue of the DNN classifier based on multi-class classification and take into account the hierarchical nature of the building/floor/location classification, we propose a scalable DNN architecture based on multi-label classification [3] shown in Figure 4. The building/floor/location classification with the proposed architecture is done as follows: First, building, floor, and location identifiers are mapped to sequential numbers, the latter two of which are meaningful only in combination with higher-level numbers; those numbers are one-hot encoded independently and combined together into a vector as a categorical variable for multi-label classification as illustrated in Table 1. Then, the output vector from the multi-label classifier is split into a building, a floor, and a location vector by indexes as shown in Figure 4. Finally, we estimate the building and the floor of a location as the index of the maximum value of the corresponding vector through the arg max function. For the estimation of the location coordinates, we select the κ largest elements from the location vector (i.e., L = (L_1, ..., L_{max(..., N_L(i,j))}) in Figure 4), filter out the elements whose values are less than σ×max(L) (σ∈[0, 1]), and calculate the estimated coordinates of the location as either the normal or the weighted (with the values of the elements as weights) centroid of the remaining elements as described in detail in Figure 5.
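The label formation and output splitting described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, identifiers are assumed to be already mapped to zero-based sequential numbers, and the sizes 3, 5, and 110 in the usage lines are the UJIIndoorLoc figures quoted in this section.

```python
def one_hot(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(size)]


def encode_label(building, floor, location, n_buildings, max_floors, max_locations):
    """Concatenate three independent one-hot vectors into one multi-label target."""
    return (one_hot(building, n_buildings)
            + one_hot(floor, max_floors)
            + one_hot(location, max_locations))


def split_output(output, n_buildings, max_floors):
    """Split a classifier output into building, floor, and location vectors by index."""
    b_end = n_buildings
    f_end = n_buildings + max_floors
    return output[:b_end], output[b_end:f_end], output[f_end:]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def estimate_building_floor(output, n_buildings, max_floors):
    """Estimate building and floor as the arg max of their respective sub-vectors."""
    building_vec, floor_vec, _ = split_output(output, n_buildings, max_floors)
    return argmax(building_vec), argmax(floor_vec)


# Usage with the sizes quoted for UJIIndoorLoc (3 buildings, up to 5 floors,
# up to 110 locations per floor); the label (1, 2, 3) is an arbitrary example.
target = encode_label(1, 2, 3, n_buildings=3, max_floors=5, max_locations=110)
estimated = estimate_building_floor(target, n_buildings=3, max_floors=5)
```

Because the floor and location numbers are meaningful only in combination with the higher-level numbers, the location sub-vector has to be interpreted relative to the estimated building and floor.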
Note that there are two design parameters, κ and σ, in the location coordinates estimation procedure, the rationale for which is illustrated in Figure 6: If we use only κ as a design parameter as in [3] and set its value to 5 in Figure 6 (a), we can include the reference points quite close to the new location (i.e., those inside the dotted circle) in the estimation procedure and obtain a good estimate. In Figure 6 (b), however, the same value of κ could result in a poor estimate because the reference points 4 and 5 have to be considered during the estimation. With κ=3, on the other hand, we can expect a good estimate in Figure 6 (b) but not in Figure 6 (a) this time. If we use both κ and σ as design parameters, however, we can include only the good reference points by properly setting σ for a threshold value. The actual effects of these design parameters on the location coordinates estimation are investigated in Sec. 3.
[3] In multi-class classification (also called single-label classification), an instance is associated with only a single label from a set of disjoint labels; in multi-label classification, on the other hand, an instance can be associated with multiple labels [8].
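To make the roles of κ and σ concrete, the following plain-Python sketch selects the κ largest elements of the location vector, drops those below σ×max(L), and returns the normal or the weighted centroid of the remaining reference points. It is a sketch under stated assumptions rather than the procedure of Figure 5 itself: the function name and the `ref_coords` lookup (coordinates of each reference point on the already estimated building and floor, indexed like the location vector) are hypothetical, and the output values are assumed to be non-negative, as with sigmoid outputs.

```python
def estimate_coordinates(location_vec, ref_coords, kappa, sigma, weighted=False):
    """Estimate (x, y) from a location vector using the kappa/sigma rule.

    location_vec: non-negative classifier outputs, one per location index.
    ref_coords:   hypothetical lookup; ref_coords[i] is the (x, y) of the
                  reference point with location index i on the estimated floor.
    """
    if kappa < 1 or not 0.0 <= sigma <= 1.0:
        raise ValueError("kappa must be >= 1 and sigma must be in [0, 1]")

    # Indexes of the kappa largest elements of the location vector.
    order = sorted(range(len(location_vec)),
                   key=lambda i: location_vec[i], reverse=True)
    top = order[:kappa]

    # Filter out elements whose values are less than sigma * max(L).
    threshold = sigma * max(location_vec)
    kept = [i for i in top if location_vec[i] >= threshold]
    if not kept:  # defensive fallback; cannot occur for non-negative outputs
        kept = top[:1]

    # Normal centroid uses equal weights; weighted centroid uses output values.
    weights = [location_vec[i] if weighted else 1.0 for i in kept]
    total = sum(weights)
    if total == 0:  # all selected outputs are zero: fall back to equal weights
        weights = [1.0] * len(kept)
        total = float(len(kept))

    x = sum(w * ref_coords[i][0] for w, i in zip(weights, kept)) / total
    y = sum(w * ref_coords[i][1] for w, i in zip(weights, kept)) / total
    return x, y


# Toy example (made-up numbers): four reference points on one floor.
outputs = [0.1, 0.8, 0.4, 0.05]
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
normal = estimate_coordinates(outputs, coords, kappa=3, sigma=0.3)
weighted = estimate_coordinates(outputs, coords, kappa=3, sigma=0.3, weighted=True)
```

In the toy example, κ=3 first admits the reference points with outputs 0.8, 0.4, and 0.1, and the threshold 0.3×0.8=0.24 then removes the last one, which is the effect the second parameter is meant to provide.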
As for the scalability of the proposed DNN architecture, the number of output nodes becomes much smaller than that of the DNN architecture based on multi-class classification: The number of output nodes for multi-label building/floor/location classification is given by
$$N_B + \max_{i} N_F(i) + \max_{i,j} N_L(i,j), \qquad (1)$$
where N_B is the number of buildings, N_F(i) is the number of floors of the ith building, and N_L(i,j) is the number of locations on the jth floor of the ith building. According to (1), the number of output nodes of the proposed DNN architecture for the publicly available UJIIndoorLoc dataset at the University of California, Irvine (UCI), Machine Learning Repository [4] is 118 (i.e., the sum of the number of buildings (3), the maximum of the numbers of floors of the buildings (5), and the maximum of the numbers of locations [5] on the floors (110)), which is smaller than the number of output nodes of the DNN architecture based on multi-class classification (i.e., 905 [6]). Note that the difference could be much larger if the UJIIndoorLoc dataset covered all the buildings on the Jaume I University (UJI) campus where the data were collected. Also, due to the clear mapping between building, floor, and location identifiers and their corresponding one-hot-encoded categorical variable for the DNN-based multi-label classifier, it is easy to carry out different processing for parts of DNN outputs specifically for building, floor, and location as illustrated in Figure 4. In particular, the use of multiple elements in estimating location coordinates is a significant advantage in terms of computational complexity because trained DNNs can generate multi-dimensional output values in parallel; in traditional approaches, on the other hand, selecting the nearest locations based on Euclidean distances is complex and time consuming. This flexibility in handling DNN outputs also makes it easy to apply different weights to the cost of building, floor, and location classification errors during the training phase.
[4] https://archive.ics.uci.edu/ml/datasets/ujiindoorloc.
[5] In the UJIIndoorLoc dataset, the position of a location is uniquely determined by four identifiers, i.e., BuildingID, Floor, SpaceID, and RELATIVEPOSITION. For convenience, we combine the SpaceID and the RELATIVEPOSITION into one and refer to it as location throughout the paper so that the three identifiers for building, floor, and location uniquely determine the position of a location.
[6] There are slight differences between the statistics of the UJIIndoorLoc dataset described in [4] and those of the publicly available dataset at the UCI Machine Learning Repository.
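The output-node count discussed above (the number of buildings plus the maximum number of floors plus the maximum number of locations per floor, compared with one node per distinct location for flattened labels) can be checked with a short plain-Python sketch. The function names and the toy layout below are hypothetical and are not taken from the UJIIndoorLoc dataset.

```python
def multi_label_output_nodes(locations_per_floor):
    """Output nodes for multi-label classification.

    locations_per_floor maps each building to a list holding the number of
    locations on each of its floors.
    """
    n_buildings = len(locations_per_floor)
    max_floors = max(len(floors) for floors in locations_per_floor.values())
    max_locations = max(n for floors in locations_per_floor.values() for n in floors)
    return n_buildings + max_floors + max_locations


def flattened_output_nodes(locations_per_floor):
    """Output nodes for multi-class classification with flattened labels:
    one node per distinct building/floor/location combination."""
    return sum(sum(floors) for floors in locations_per_floor.values())


# Toy layout (made-up numbers): two buildings with 2 and 3 floors.
layout = {"A": [2, 3], "B": [4, 1, 2]}
multi_label = multi_label_output_nodes(layout)
flattened = flattened_output_nodes(layout)
```

For the toy layout the multi-label count is 2 + 3 + 4 = 9 against 12 flattened labels; with the UJIIndoorLoc figures quoted in the text the same sum gives 3 + 5 + 110 = 118. The multi-label count grows only with the largest building and floor, whereas the flattened count grows with every added location.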

Experimental Results
We carried out experiments using the UJIIndoorLoc dataset [4] to evaluate the performance of the proposed DNN-based multi-building and multi-floor indoor localization system. [7] We focus on the effects of the number of largest elements from the output location vector (i.e., κ) and the scaling factor for a threshold (i.e., σ) in the location coordinates estimation procedure described in Sec. 2. Table 2 summarizes DNN parameter values for the experiments, which are chosen experimentally and used throughout the experiments.
As indicated in [7], the publicly available UJIIndoorLoc dataset includes training and validation data, but not testing data, which were provided only to the competitors at the Evaluating Ambient Assisted Living (EvAAL) competition at the International Conference on Indoor Positioning and Indoor Navigation (IPIN) 2015 [3]. Also, unlike the training data, the validation data do not include location information (i.e., SpaceID and RELATIVEPOSITION fields) because the measurements were taken at arbitrary points as would happen in a real localization system. In this regard, we split the training data into new training and validation data with the ratio of 70:30 for DNN training and validation with building/floor/location labels for both. During the evaluation phase, the outputs from the trained DNN are post-processed as described in Sec. 2 and compared with the building, floor, and coordinates of a given location. In this way, we can compare our results of multi-building and multi-floor indoor localization with the baseline and the best results from [4] and [3]. Table 3 summarizes our experimental results, which show the effects of the number of largest elements from the output location vector (κ) and the scaling factor for a threshold (σ) on the performance of multi-building and multi-floor indoor localization. We highlight in light gray the rows with a success rate (i.e., the rate of successful estimation of both building and floor) higher than 90 % and positioning errors (both normal and weighted centroid) lower than 10 m.
[7] Source code for the implemented DNN models based on Keras [11] and TensorFlow [12] is available online: http://kyeongsoo.github.io/research/projects/indoor_localization/index.html.
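The evaluation metrics named above can be illustrated with the plain-Python sketch below. It rests on assumptions that the text does not spell out: the function name and the tuple layout are hypothetical, the floor hit rate is counted independently of the building, and the positioning error is taken as the mean Euclidean distance over all samples, which may differ from how the errors reported in Table 3 are computed.

```python
import math


def evaluate(predictions, ground_truth):
    """Compute hit rates and mean positioning error.

    Each item is a tuple (building, floor, (x, y)); the two lists must be
    aligned sample by sample and non-empty.
    """
    if not ground_truth or len(predictions) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(ground_truth)
    building_hits = floor_hits = successes = 0
    total_error = 0.0
    for (pb, pf, (px, py)), (tb, tf, (tx, ty)) in zip(predictions, ground_truth):
        building_ok = pb == tb
        floor_ok = pf == tf
        building_hits += building_ok
        floor_hits += floor_ok
        # Success: both building and floor are estimated correctly.
        successes += building_ok and floor_ok
        # Assumption: Euclidean distance averaged over all samples.
        total_error += math.hypot(px - tx, py - ty)

    return {
        "building_hit_rate": building_hits / n,
        "floor_hit_rate": floor_hits / n,
        "success_rate": successes / n,
        "positioning_error": total_error / n,
    }


# Toy example (made-up values): the second sample has the wrong floor.
preds = [(0, 1, (0.0, 0.0)), (0, 2, (3.0, 4.0)), (1, 0, (1.0, 1.0))]
truth = [(0, 1, (0.0, 0.0)), (0, 1, (0.0, 0.0)), (1, 0, (1.0, 1.0))]
metrics = evaluate(preds, truth)
```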
In general, σ in the range of 0.1-0.3 produces the best localization performance for κ≤8; once κ becomes larger than 8, however, higher values of σ (i.e., 0.4 for κ=9 and 0.5 for κ=10) generate better performance. Considering the location coordinates estimation examples shown in Figure 6 with their explanations in Sec. 2 regarding the use of two design parameters, we can explain these results as follows: With a larger value of κ (i.e., 9 and 10), there is a higher chance of including reference points relatively far from the given location as shown in Figure 6 (b). In such a case, a tighter threshold (i.e., a larger value of σ [8]) can filter out those reference points.
According to the results shown in Table 3, the best overall results are achieved when κ=8 and σ=0.2, which are highlighted in gray. These results from the proposed DNN-based multi-building and multi-floor indoor localization system (i.e., 99.82 % for building hit rate, 91.27 % for floor hit rate, 91.18 % for success rate, and 9.29 m for positioning error) are comparable to the baseline results (i.e., 89.92 % for success rate and 7.9 m for positioning error) from [4], which are based on the distance-based k-Nearest Neighbors (kNN) algorithm [13]. As discussed in Sec. 2, even though a direct comparison with the results from the EvAAL/IPIN 2015 competition is not possible due to the lack of testing samples in the public version of the UJIIndoorLoc dataset and a slightly different way of calculating the positioning error, our results are also comparable to the competition results summarized in Table 4.
Note that the results presented in this section are not optimized with respect to DNN parameters, including the number of hidden layers and the number of nodes at each layer; we investigated the feasibility of the combined building/floor/location estimation using a single DNN based on the multi-label classification framework, with a focus on the effects of the number of largest elements from the output location vector (κ) and the scaling factor for a threshold (σ) in the location coordinates estimation. This leaves much room for further optimization of the performance.

Conclusions
In this paper, we have proposed a new scalable DNN architecture for multi-building and multi-floor indoor localization based on Wi-Fi fingerprinting, which can cover a large-scale complex of many multi-story buildings under the same management. The proposed DNN architecture consists of an SAE for the reduction of feature space dimension and a feed-forward classifier for multi-label classification of building/floor/location. Reformulating the problem of building/floor/location classification based on the framework of multi-label classification, we can achieve better scalability (i.e., greatly reducing the number of DNN output nodes) and better exploit the hierarchical nature of the building/floor estimation and the floor-level location coordinates estimation through systematic label formation (i.e., providing straightforward mapping between a DNN categorical variable and building, floor, and location vectors) compared to existing DNN architectures based on multi-class classification.
[8] When σ=0, there is no filtering (i.e., including all κ reference points); when σ=1, only the reference point with the largest value is considered during the location coordinates estimation.
The experimental results using the UJIIndoorLoc dataset demonstrate the feasibility of the proposed DNN-based multi-building and multi-floor indoor localization system, which can provide near state-of-the-art performance using a single DNN in an integrated way. A DNN-based indoor localization system has the unique advantage that, once trained, it no longer needs the fingerprint database but carries the necessary information for localization in its DNN weights. Combined with this advantage, the scalable DNN architecture proposed in this paper could open the door to a future secure and energy-efficient indoor localization solution running exclusively on mobile devices without exchanging any data with the server.
Still, there are several areas for further research in multi-building and multi-floor indoor localization based on DNNs and Wi-Fi fingerprinting. First, the experimental results described in Sec. 3 are preliminary and not based on optimized DNN parameters; there is still much room for further optimization of the performance through DNN parameter tuning and the investigation of the tradeoff between performance and complexity. Second, there is no direct match between the cost function used in DNN training/validation and the actual performance in the final evaluation for building/floor detection and the location coordinates estimation due to the additional processing of DNN output (i.e., the use of the arg max function for the former and the rather complicated processing with multiple elements of the location vector for the latter). To fully take into account the actual performance of building/floor detection and location coordinates estimation during training/validation, we may consider heuristics like evolutionary algorithms (e.g., genetic algorithm (GA) [14] and particle swarm optimization (PSO) [15]), simulated annealing [16], and quantum annealing [17] for training DNN weights; due to the many tradeoffs between complexity and flexibility resulting from the use of heuristics in DNN weight training, this approach could be an interesting topic for long-term research.
Table 1: Label formation example for the multi-label classification.
Figure 6: Two examples of location coordinates estimation, where × denotes a new location whose coordinates are to be estimated and the other markers denote reference points whose coordinates are stored in the fingerprint database. The numbers over the reference points indicate their closeness to the new location with 1 being the closest.