Smart home: Keeping privacy based on Air‐Padding

Chao Yang, School of Cyber Engineering, Xidian University, Xi'an, China
Email: chaoyang@xidian.edu.cn

Abstract
With the rapid development of the IoT, smart home plays an increasingly important role in daily life. At the same time, privacy issues have gradually aroused the concern of researchers. Past research proves that there are privacy risks when people use smart home devices and some protection methods are proposed to protect people's privacy. However, attackers can still infer people's behaviour and activities through analysing the wireless traffic of smart home devices. Herein, an attack method under complicated WiFi scenarios is demonstrated, including different locations, different buildings and different networks. In addition, a new protection method, Air-Padding, is developed, which can change the traffic patterns of smart home devices through injecting constructed packets into the link between devices and routers. The results prove that Air-Padding can prevent people's behaviour and activities from being analysed and inferred.


| INTRODUCTION
IoT is becoming more commonly used around the world. With the development of IoT, the number of smart home devices has increased rapidly. People can control smart devices through mobile terminals, which greatly facilitates their daily life.
Unfortunately, smart home devices also leak a lot of sensitive information inadvertently. In recent years, there have been many discussions about privacy and security issues, including enhancing trust between IoT devices [1,2], enhancing the confidentiality and integrity of data [3], separating identity and data [4] and protecting sensitive information [5][6][7]. Smart home devices usually communicate with other network devices through wireless protocols such as WiFi and ZigBee, so all of this data is exposed in the wireless space. Attackers can therefore monitor and collect it easily, which leads to privacy issues. For example, if advertisers learn which categories of smart home devices are in a person's house, they can target ads precisely. Likewise, an attacker may infer when a person leaves the house and returns home by recording the working state of a smart door lock. Past research proves that there is a risk of privacy leakage in a smart home. Therefore, how to protect people's privacy and sensitive device information becomes a key issue. Researchers have proposed several protection methods. One is changing the traffic rate through traffic shaping [8]. Another is padding and fragmenting packets when devices send them [9].
However, there are many limitations when performing attacks in a real WiFi environment. Since the WPA2 authentication protocol encrypts the link layer data [10,11], it is hard for attackers to obtain plaintext information such as TCP, IP and DNS fields [12][13][14]. Besides, WiFi noise has a great influence on the attack, which makes past attacks [8,15] ineffective. For example, control frames and management frames reduce the accuracy of device identification. Therefore, past attacks are impractical for identifying the category and working state of smart home devices.
It is difficult to block traffic through gateways and a VPN tunnel [8,16] to effectively protect people's privacy. Traffic shaping is also flawed. Although shaped traffic makes it difficult for ISPs to infer people's behaviour, it cannot prevent attacks from WiFi eavesdroppers: traffic shaping works on routers, so it cannot protect the communication data between smart home devices and routers. Moreover, traffic shaping introduces a delay of at least one second, and device designers have to consider modifying the communication protocols in order to perform it. Another privacy protection method [9] incorporates traffic shaping directly into devices when they send packets.
However, its author notes that it cannot protect high-latency, low-bandwidth devices, and smart home devices that cannot be updated remain unprotected.
In this article, we demonstrate that in a link layer encrypted environment, the category and working state of smart home devices can be identified by analysing the network traffic after WiFi noise elimination. This sensitive information can be used to infer private details such as people's activities and behaviours. As shown in Figure 1, the communication data of smart home devices may be monitored and collected by attackers. After data pre-processing, noise elimination, feature extraction and data training, the category and working state of devices can be identified accurately. We evaluate the identification performance in different scenarios, including different locations, different networks and different machine learning algorithms. The experimental results show that, in contrast to other attacks [8,15], our method identifies the device category with more than 90% accuracy and the working state with 95% accuracy using Decision Tree and Random Forest.
We propose a privacy protection method, Air-Padding, to defend against WiFi eavesdroppers and protect people's privacy. Unlike other protection methods, Air-Padding does not need to modify the original packets or protocols. Because Air-Padding only occupies intranet bandwidth, its cost is reduced. In addition, smart devices are not affected by any delay. Experiments show that the probability of privacy leakage drops from 95% to 20% after protection, which makes it difficult for attackers to analyse smart device information and people's privacy.
Contributions: The contributions of this article include:
• We propose and prove that attackers can infer people's behaviour and privacy through analysing the traffic of smart home devices in a complicated WiFi environment.
• We propose a privacy protection method, Air-Padding, to prevent device sensitive information and people's privacy from being collected and inferred by WiFi eavesdroppers.

| THREAT MODEL
In this section, we present the attack model. As shown in Figure 1, we assume all the smart home devices are connected to a WPA2 WiFi network and the adversary is a WiFi eavesdropper. The attacker's goal is to monitor the encrypted data between devices and routers, and to analyse this encrypted data to identify the device category and working state, and thereby infer a person's behaviour and activities. To achieve this goal, the attacker deploys monitoring devices around the house, for example near a window or even on the roof. The attacker is assumed to possess the following capabilities.
No need to access the network. Attackers are not required to access the network; they can achieve their purpose through passive monitoring alone.
Monitoring all the data. Monitoring devices deployed by attackers have enough power to monitor and collect all the communication data of the home network.
Strong concealment. Attackers can deploy monitor devices easily and it is hard for the smart home owner to discover the monitor devices.

| IDENTIFICATION
In this section, we present the methodology and evaluation procedure of device category and working state identification.

| Methodology
We propose and evaluate the identification of smart home devices passively by recording the communication data between smart home devices and home routers. The identification mechanism analyses the network traffic of the smart home devices.
All the smart home devices communicate with smartphones and other controller devices over TCP and UDP. Although WPA2 encrypts this data at the link layer, some metadata (the header fields of packets) can still be monitored by attackers, such as packet length and time interval. To implement their functions, smart home devices send and receive command data, and this command data differs between devices. Figure 2 shows the traffic rate of four different smart home devices.
Tplink camera: As shown in Figure 2(a), the Tplink camera has two modes, live mode and motion detection mode. In live mode, the camera uploads the current video to the cloud, and a user can view this video in real time on a mobile application. In motion detection mode, the camera sends a warning to the application when it detects movement.
Mi sphygmomanometer: The Mi sphygmomanometer sends a blood pressure report to the application after measurement, as shown in Figure 2(b), and the application stores the blood pressure record to help users check their health condition at any time. After the measurement, the Mi sphygmomanometer disconnects from the home router. Although attackers cannot get the specific values of users' blood pressure, weight and body fat rate due to the WPA2 encryption, users' activities can still be inferred from the devices' working state.
Ezviz plug: Figure 2(c) shows the traffic rate of the Ezviz plug, which is popular in the market. Users switch the plug by clicking the on or off button in the application. Once users click the button, wireless data is sent to the plug. There are eight traffic peaks in Figure 2(c), which means users turned the plug on or off eight times.
Mi music: Mi music has only one mode, playing music. It downloads and plays music when users click the play button in the application. Once Mi music receives an order to play music, it first downloads the music from the network. The curve peak in Figure 2(d) represents a music play.
In order to identify the device category and working state based on this command data, we propose an identification method that has four major steps. The first step is to monitor the environment by recording the wireless network traffic. The recorded files contain all the communication data of smart home devices and other network devices that is essential to identify the device category and working state. The second step is pre-processing the recorded data to eliminate the WiFi noise. Specifically, management frames, control frames and retransmission packets need to be dropped to prevent the identification from being disturbed. The third step is to extract features, such as the packet length, signal strength and packet interval. Not all the features are useful; three features are chosen to perform the identification. The last step is to classify the smart home device category and working state to infer people's activities. Here, four machine learning algorithms, KNN, SVM, Decision Tree and Random Forest, are utilised to perform the analysis. These four steps are described in the sections below. Figure 3 provides a visual overview of the process.

| Data monitoring
The first step is monitoring and collecting communication data from the wireless network. In order to record the wireless network traffic, a network sniffing tool is used with a network card in monitor mode. Wireshark, a widely used network sniffing tool, is used to sniff the network traffic of smart home devices.

| Data pre-processing
Past attacks [8,15] do not work in the wireless environment, especially when the network changes. For example, identification accuracy drops sharply when the test wireless network is different from the training network. The reason is that WiFi noise, such as management frames and control frames, makes the training traffic features different from the test traffic features. Besides, since routers of different brands use different 802.11 protocols, some header information of the 802.11 packets may be different, which is another factor that influences the identification result. It is therefore vital to perform data pre-processing.
When wireless network traffic is sniffed, management frames and control frames need to be dropped. To perform the pre-processing, every packet whose type/sub-type field is neither 0x0020 nor 0x0028 is dropped (management frames and control frames never carry these two values). Dropping these packets has no impact on the final analysis, because they are unrelated to the data sent and received by smart home devices. Meanwhile, in order to eliminate the influence of different 802.11 header lengths, the header length is subtracted when calculating the packet length.
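The pre-processing rule can be sketched as a simple filter. The example below is a minimal illustration, not the authors' implementation: the `Frame` record and its field names are hypothetical, the frames are invented, and dropping retransmissions via a retry flag is an assumption based on the four-step overview in the Methodology section.

```python
from dataclasses import dataclass

# Type/sub-type values named in the text: 0x0020 (data) and 0x0028 (QoS data).
DATA_SUBTYPES = {0x0020, 0x0028}

@dataclass
class Frame:
    """Hypothetical record for one captured 802.11 frame."""
    type_subtype: int   # combined type/sub-type field
    retry: bool         # True if the frame is a retransmission
    frame_len: int      # total frame length in bytes
    header_len: int     # 802.11 header length in bytes

def preprocess(frames):
    """Drop noise frames and return header-independent packet lengths."""
    lengths = []
    for f in frames:
        if f.type_subtype not in DATA_SUBTYPES:
            continue    # management or control frame
        if f.retry:
            continue    # retransmitted packet
        lengths.append(f.frame_len - f.header_len)
    return lengths

capture = [
    Frame(0x0008, False, 250, 24),   # beacon (management): dropped
    Frame(0x001D, False, 14, 10),    # ACK (control): dropped
    Frame(0x0020, False, 124, 24),   # data frame: kept
    Frame(0x0028, True, 226, 26),    # retransmitted QoS data: dropped
    Frame(0x0028, False, 226, 26),   # QoS data: kept
]
print(preprocess(capture))
```

Subtracting `header_len` is what makes lengths comparable across routers whose 802.11 headers differ in size.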

| Features Extraction
After data pre-processing, wireless network traffic is saved as a PCAP file. In monitor mode, Wireshark can record the MAC addresses of smart home devices. Based on the source MAC address and destination MAC address fields in the packets, network traffic is divided into several streams, each of which represents a smart home device. For every traffic stream, fields such as packet interval, packet length and signal strength are extracted. Previous work [8,15] has shown that in some scenarios, the traffic rate can identify the state of smart home devices. However, some different devices have similar traffic rates, which means more features are required to distinguish them. As shown in Figure 4, by comparing the packet lengths of devices with similar traffic rates, we found that their maximum and minimum packet lengths differ. After analysis and experiment, only three features are needed for identification, and a 3-tuple [avg, max, min] is generated for training and testing, as shown in Figure 3. Here, avg is the average data length per 30 s, max is the maximum packet length per 30 s and min is the minimum packet length per 30 s.
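A minimal sketch of the feature extraction step follows. It assumes packets arrive as `(timestamp, MAC, length)` tuples (a hypothetical input format) and interprets avg as the mean packet length within each 30 s window, which is one possible reading of the definition above.

```python
from collections import defaultdict

WINDOW = 30.0  # seconds, the window size used in the text

def extract_features(packets, window=WINDOW):
    """Group packets per device and time window, then build [avg, max, min].

    packets: iterable of (timestamp_s, device_mac, length_bytes).
    Returns {(device_mac, window_index): (avg, max, min)}.
    """
    buckets = defaultdict(list)
    for ts, mac, length in packets:
        buckets[(mac, int(ts // window))].append(length)
    return {
        key: (sum(lengths) / len(lengths), max(lengths), min(lengths))
        for key, lengths in buckets.items()
    }

# Invented sample stream for two devices.
packets = [
    (0.5, "aa:aa", 100), (12.0, "aa:aa", 300), (29.9, "aa:aa", 200),
    (31.0, "aa:aa", 60),
    (5.0, "bb:bb", 1400), (6.0, "bb:bb", 1000),
]
features = extract_features(packets)
for key in sorted(features):
    print(key, features[key])
```

Each resulting tuple is one training or test sample for the classifiers discussed in the following sections.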

| Device category identification
Before attackers can identify the device working state and infer people's behaviour, they must first determine the device category. Every smart home device has its own traffic pattern when it sends and receives command data. Of course, some devices have similar features, so they need to be distinguished. Figure 4 shows the 3-tuples of six devices. Based on the differences between the 3-tuples, machine learning algorithms can be used to classify the device category. In this article, four simple machine learning algorithms (KNN, SVM, Decision Tree and Random Forest) are used.
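To illustrate classification on the 3-tuple, the sketch below implements a plain k-nearest-neighbour vote in standard Python. KNN is chosen here only because it is the shortest of the four named algorithms to write out; the training tuples and labels are invented for illustration and are not measurements from the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to `query`.

    train: list of ((avg, max, min), label); query: (avg, max, min).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled 3-tuples for two device categories.
train = [
    ((900.0, 1500, 60), "camera"),
    ((950.0, 1500, 64), "camera"),
    ((880.0, 1480, 60), "camera"),
    ((120.0, 300, 40), "plug"),
    ((110.0, 280, 40), "plug"),
    ((130.0, 310, 42), "plug"),
]

print(knn_predict(train, (920.0, 1490, 62)))
print(knn_predict(train, (115.0, 290, 41)))
```

In practice the features would likely need scaling before distance-based classification, which is one reason tree-based models can be more convenient on raw packet lengths.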

| Device working state identification
After identifying the device category, attackers can perform the last step, identifying the device working state. Figure 2 shows the traffic rate of smart home devices. The traffic rate differs between working states. For example, there is a traffic peak when a smart plug is turned on or off, and when a camera works in the live state, the traffic rate is around 100 kb/s. Therefore, the 3-tuple can also be used to identify the working state. Attackers can infer people's behaviour by analysing the device category and working state. For example, attackers can infer when people leave home by analysing the working state of a smart door lock. They can also infer whether there are monitoring cameras around people's houses. Once attackers obtain this private information, there is a high security risk.

| Evaluation procedure
In this section, we evaluate the effectiveness of our identification method. The goal of the evaluation is to understand whether the attack is effective when the training scenario differs from the test scenario. We evaluate the method by analysing the network traffic of 14 different smart home devices, collected under a variety of conditions, as described in Table 1, by varying the building and the position of the monitoring and collecting device.

| Experiment Setup
In the experiment, Wireshark was used on a ThinkPad T460 laptop with Kali 2.0 to collect the network traffic. The version of the Wireshark software installed on the laptop was 2.4.2. We tested our method in an apartment as well as in a house, with 14 smart home devices under various settings and network conditions. The detailed settings are as follows.
Home scenario: Our experimental environment was an apartment with two 50 square metre rooms and a house with two 60 square metre rooms. In both the apartment and the house, one room was used for pre-training and the other for the test. We placed 14 different smart home devices in different positions. In the apartment scenario, we deployed the monitoring devices downstairs, upstairs and in the corridor. In the house scenario, monitoring devices were placed outside the window and on the roof.
In the apartment scenario, attackers can easily deploy monitoring devices in the corridor, upstairs and downstairs. It is harder for attackers to deploy monitoring devices in the house scenario. They must hide these devices from being discovered by residents.
Device selection: Devices of the same brand use the same transport protocol. Therefore, the features of the wireless data may be similar. Meanwhile, devices of the same category are very similar in function. Therefore the features of wireless traffic, such as traffic rate, packet size are also similar. In order to prove the effectiveness of the attack method, we used multiple device categories of the same brand and multiple brands of the same category in the experiment. A total of five different device categories of the same brand, shown in Table 2, and four groups of different brands of the same category, shown in Table 3 were used. These 14 smart home devices were deployed in a room. Of course, there were some non-intelligent network devices in the experiment, including some phones and laptops.
Router brands: In order to simulate a different network environment, we used two different routers, a Huawei router and a Tenda router in our experiment.
Window size: If the window size is too large, identification takes a long time; if it is too small, the identification accuracy decreases. To determine the best window size, we collected network traffic in time windows of varying size, from 0 to 30 s.
Dataset: In this research, we collected 5 GB of network traffic over 24 h for training, and 4 GB of traffic for the identification test. In addition, we conducted a cross-test: the data collected from the Huawei router was used as the training set and the data from the Tenda router as the test set, to simulate a change in the network environment between the attacker's training and testing.

| Identification accuracy metrics
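We calculated the accuracy (ACC), true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), precision, recall and F1 score of our method. Here, with TP, FP, TN and FN denoting the numbers of true positives, false positives, true negatives and false negatives, we use the standard definitions
\[
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\mathrm{TPR} = \frac{TP}{TP + FN}, \quad
\mathrm{FPR} = \frac{FP}{FP + TN}, \quad
\mathrm{TNR} = \frac{TN}{TN + FP},
\]
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \mathrm{TPR}, \quad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]
Although the identification of device category and working state is not a binary classification, we still use binary classification metrics to evaluate the identification result. In this article, if the device category and working state are correctly identified, they are defined as positive classes. If the device category and working state are misidentified, they are defined as negative classes.
From Figure 5, we can see that as the window size increases, the TPR, precision and F1 score improve slightly, and the FPR decreases sharply. When the time window size reaches 30 s, the TPR, precision and F1 score are above 94%.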
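As a worked illustration of these binary metrics, the sketch below computes them from confusion counts using the standard textbook definitions. The counts are invented for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts."""
    tpr = tp / (tp + fn)              # true positive rate, also recall
    fpr = fp / (fp + tn)              # false positive rate
    tnr = tn / (tn + fp)              # true negative rate
    precision = tp / (tp + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {
        "ACC": acc, "TPR": tpr, "FPR": fpr, "TNR": tnr,
        "precision": precision, "recall": tpr, "F1": f1,
    }

# Hypothetical counts: 100 positive and 100 negative samples.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```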
Apartment Scenario: We evaluate the performance of device category and working state identification in the apartment scenario, where 14 smart home devices and 2 routers are deployed. The network traffic was collected at different locations, including downstairs, upstairs and the corridor. The TPR and TNR of device category identification are shown in Figure 6a. The TPR and TNR improve with increasing time window size. When the window size is 30 s, the TPR and TNR are above 93%. The TPR and TNR of device working state identification are shown in Figure 6b. When the window size is 30 s, the TPR and TNR are all above 98%. Table 4 shows the identification accuracy in different locations. The average accuracy of device category identification in these three locations is above 93%, and that of device working state identification is around 97%. This means the device category and working state can be identified with high accuracy (Table 5).
House Scenario: We also evaluate the classification performance in the house scenario. In this case, we collected network traffic outside the window and on the roof. The TPR and TNR of device category identification are shown in Figure 6c. When the window size is 30 s, the TPR and TNR are above 92%. The TPR and TNR of device working state identification are shown in Figure 6d. When the window size is 30 s, the overall TPR and TNR are more than 97%.
[...] noise had a huge impact on the experiment. Some devices have completely different traffic rates in different scenarios, such as bull plug. Acar's attack was better, but WiFi noise still had an influence. Compared with these attacks, we eliminated the influence of different 802.11 header lengths, extracted a 3-tuple of features and performed tests in complex scenarios. The results show that our method works well in different scenarios, including different locations and different networks. The identification accuracy of the three different attacks is shown in Table 6.
Overall performance: As shown in Figure 5, when the time window size is 30 s, the TPR, FPR, Precision and F1 score are the best, and in different scenarios, including home environments, locations and networks, our attack works very well, which proves that it is possible to identify the device category and working state. Figure 7 shows that Decision Tree and Random Forest perform best. And for each device, the device category identification accuracy is shown in Table 7. Table 8 shows the accuracy, precision and F1 score of each device in Decision Tree and Random Forest.

| PROTECTION: AIR-PADDING
In this section, we present the methodology and performance of Air-Padding.

| Requirements
To protect the smart home devices' category and working state from being identified, Air-Padding should satisfy the following requirements.
Usability. Air-Padding should protect the device category and the working state from being identified at all times. Moreover, Air-Padding should not be discoverable by attackers from metadata such as signal power and sequence number.
No permission to change device communication protocol. Air-Padding should be effective without permission to change the transport protocol or the firmware. Some protection methods are hard to deploy, because the IoT device designer must consider modifying the communication protocols.
Low overhead. Air-Padding is designed for the smart home network. In some cases, traffic is billed by the byte. We must ensure that Air-Padding does not cause any extra traffic cost when protection is performed.

TABLE 4 The overall accuracy in different locations
Lightweight. We envision implementing Air-Padding on a small device. Given the limited computing resources, Air-Padding must be lightweight in terms of CPU computing and memory usage.

| Methodology
We propose a method, Air-Padding, to prevent device category and working state from being identified. We design a lightweight device to perform Air-Padding. The main idea of Air-Padding is to send constructed packets to the routers and devices in order to change the network traffic features of smart home devices. Since the sending device and the receiving device do not establish a TCP or UDP connection, the receiving device drops the injected traffic, yet the observable traffic of the smart home device has changed. Attackers can only collect link layer packets, which means that, from their point of view, the dropped packets are indistinguishable from real packets. Before Air-Padding, attackers monitor and collect the network traffic $T$ of smart home devices and extract the traffic features $f$. Assume the injected traffic is $T_p$ and its features are $f_p$. Because attackers cannot distinguish the real traffic from the injected traffic, from their point of view the network traffic is now $T_n$, with traffic features $f_n$.
If the rate of $f_p$ is equal to or greater than that of $f$, then $f_n$ differs substantially from $f$, and the attacker cannot classify the device category and working state. In order to reduce the injection overhead, we classify smart home devices into two categories based on bandwidth:
• High-bandwidth devices
• Low-bandwidth devices
High-bandwidth devices, such as wireless cameras, can continuously send and receive large amounts of data in a short period of time. Low-bandwidth devices, such as smart power plugs and smart lighting, send and receive only a small amount of data. Obviously, high-bandwidth devices need many more constructed packets to change the traffic features than low-bandwidth devices.
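The effect of injection on what an eavesdropper measures can be illustrated with a toy example. The sketch assumes that the observed traffic is simply the real packets plus the injected packets, and that the features are the [avg, max, min] 3-tuple from the identification section; the packet lengths are invented.

```python
def features(lengths):
    """3-tuple [avg, max, min] of packet lengths in one window."""
    return (sum(lengths) / len(lengths), max(lengths), min(lengths))

real = [100, 120, 110, 130]       # T: packets of a low-bandwidth device
injected = [1400, 60, 900, 40]    # T_p: constructed packets (hypothetical)
observed = real + injected        # T_n: what the eavesdropper sees

f = features(real)
f_n = features(observed)
print("before:", f)
print("after: ", f_n)
```

Because the injected packets are at least as numerous as the real ones here, all three components of the tuple move away from the values a classifier was trained on.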
Air-Padding has two steps. The first step is to construct injecting packets. These packets will be injected into the communication link between smart home devices and the home router, including uplink and downlink. The second step is injecting constructed packets to change the network traffic features of smart home devices.

| Constructing packets
The injected packets have the same features as the real packets of the target devices. Because of the link layer encryption, only the 802.11 header fields and the packet length need to be modified, while the IP header and TCP header are the same as in the real packets. In Air-Padding, we modify only the MAC address fields (source and destination) and the packet length. The source MAC address is set to the target device's MAC address, and the destination MAC address is set to the home router's MAC address. The packet length is set to the packet length of the protected device by padding the payload to the required length.
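The construction step can be sketched as building a frame descriptor with the address fields and padded payload described above. This is an illustrative data structure only: `InjectedFrame` and `build_frame` are hypothetical names, the MAC addresses are placeholders, zero-byte padding is an assumption, and no radio transmission is involved.

```python
from dataclasses import dataclass

@dataclass
class InjectedFrame:
    """Hypothetical descriptor of one constructed frame."""
    src_mac: str
    dst_mac: str
    payload: bytes

def build_frame(device_mac, router_mac, target_len, uplink=True):
    """Describe a frame that mimics the protected device's traffic.

    uplink=True  -> source is the device, destination is the router.
    uplink=False -> source and destination are swapped (downlink).
    The payload is padded with zero bytes to `target_len`.
    """
    if uplink:
        src, dst = device_mac, router_mac
    else:
        src, dst = router_mac, device_mac
    return InjectedFrame(src, dst, bytes(target_len))

DEVICE = "aa:bb:cc:00:00:01"   # placeholder device address
ROUTER = "aa:bb:cc:00:00:ff"   # placeholder router address

up = build_frame(DEVICE, ROUTER, 128)
down = build_frame(DEVICE, ROUTER, 128, uplink=False)
print(up.src_mac, "->", up.dst_mac, len(up.payload))
print(down.src_mac, "->", down.dst_mac, len(down.payload))
```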

| Packet injecting
There are two injection algorithms. The first keeps the traffic rate around a certain threshold, and the other disorganises the traffic rate. For the first method, the injection rate r is equal to the traffic rate r 1 of the smart home device, and the packet lengths must be the same as those of the original packets. For example, the traffic rate of a Mi camera is 100 kb/s when it is working, so the injection rate will also be 100 kb/s when the camera is not working. For the second method, the injection rate and packet length are irregular and random.
The values are restricted to a range to reduce the overhead. Here, the linear congruential method (LCG) is used to generate the random value:
\[
r_n = (a \, r_{n-1} + c) \bmod m
\]
where $r_n$ is the random number, $a$ is the multiplier, $r_{n-1}$ is the previous random number, $c$ is the increment and $m$ is the modulus. Air-Padding continuously adjusts the injection rate as each random number is generated, so the injection rate is always equal to the random number. Both the uplink and the downlink must be injected; the difference between the two is that the source and destination MAC addresses are swapped. The uplink is the link from devices to the router. To change the uplink traffic features, the target device MAC address is set as the source address of the constructed packet and the router MAC address as its destination address. The downlink is the link from the router to devices. Similarly, the router MAC address is set as the source address of the constructed packets and the smart home device MAC address as the destination address. These constructed packets are then sent to the target device.
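A minimal sketch of LCG-driven rate selection follows. The multiplier, increment and modulus are the widely published Numerical Recipes constants, used here only as an assumption because the text does not state the parameters; scaling to a 3 kb/s ceiling follows the limit given for low-bandwidth devices.

```python
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: r_n = (a * r_{n-1} + c) mod m."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r

def injection_rates(seed, max_rate=3.0, a=1664525, c=1013904223, m=2**32):
    """Map each LCG output to an injection rate in [0, max_rate) kb/s."""
    for r in lcg(seed, a, c, m):
        yield max_rate * r / m

# Five consecutive injection rates from an arbitrary seed.
rates = list(islice(injection_rates(seed=42), 5))
print([round(r, 3) for r in rates])
```

An LCG is cheap enough for a small injecting device, although its output is predictable to anyone who learns the parameters and seed.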

| Air-Padding for high-bandwidth devices
For high-bandwidth devices, since the SN field of the packet increases successively from 0 to 4095 and a large number of packets are sent, there will be two SN streams in the wireless space when Air-Padding is performed: the SN stream of the real traffic and the SN stream of the fake packets. The SN field in real packets cannot be modified. Therefore, it is easy for attackers to detect the injected packets of a high-bandwidth device through the SN field, which reduces the effectiveness of the privacy protection.
SN: SN (Sequence Number) field is a 12-bit field in 802.11 packets. Sequence numbers are not assigned to control frames, as the Sequence Control field is not present. This field is used to eliminate duplicate received frames and to reassemble fragments.
In order to prevent the protection method from being detected, we use the first injection algorithm, keeping the traffic rate around a certain threshold, and ensure that there is only one SN stream at any time. The algorithm has four steps as follows.
• Constructed packets are injected while the smart home device is in the sleep state.
• Once the working state changes from the sleep state to the working state, we first block the device's communication and record the real SN, and keep injecting packets into the link until the SN of the injected packets is equal to the real SN.
• While the smart home device is in the working state, no packets are injected.
• Once the working state changes from the working state to the sleep state, constructed packets are injected into the link between the device and the home router, with the SN following on from the SN of the last real packet.
For example, when protecting a WiFi camera, we first collect 1 minute of traffic and record the length of each packet. Injected packets are then constructed with the same lengths as the recorded ones. Assuming the camera is working now, no packets are injected into the link until the camera stops working. When the camera stops working, Air-Padding is performed and approximately 100 kb/s of traffic is injected into the link. Therefore, when Air-Padding works, the camera's traffic rate is always around 100 kb/s regardless of its working condition, and from the attackers' point of view the camera is always in a working state.
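The sequence-number bookkeeping behind the four steps can be sketched as follows. This is one interpretation of the algorithm rather than the authors' code: it assumes the injector fills the gap so that the held real frame becomes the next number in the stream, and that injection after the device sleeps continues from the last real SN. Function names are hypothetical.

```python
SN_MODULUS = 4096  # the SN field is 12 bits wide

def next_sn(sn):
    """Sequence number that follows `sn`, wrapping from 4095 to 0."""
    return (sn + 1) % SN_MODULUS

def catch_up(last_injected_sn, pending_real_sn):
    """Sleep -> working: SNs to inject while the device is blocked,
    so that the first real frame continues a single SN stream."""
    sns = []
    sn = next_sn(last_injected_sn)
    while sn != pending_real_sn:
        sns.append(sn)
        sn = next_sn(sn)
    return sns

def resume_after(last_real_sn, count):
    """Working -> sleep: the first `count` SNs to inject, continuing
    from the last real frame."""
    sns = []
    sn = last_real_sn
    for _ in range(count):
        sn = next_sn(sn)
        sns.append(sn)
    return sns

print(catch_up(4093, 1))      # gap filled across the wrap-around
print(resume_after(4095, 3))  # injection continues after the real stream
```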

| Air-Padding for low-bandwidth devices
Low-bandwidth devices send only a small number of packets when they work, so the SN has almost no impact on Air-Padding. Therefore, Air-Padding uses the second algorithm, disorganising the traffic rate, to construct packets. To reduce the overhead, the maximum injection rate is limited to 3 kb/s; that is, the injection rate is randomly selected between 0 and 3 kb/s. Air-Padding makes the traffic rate of low-bandwidth devices irregular, so it is difficult for attackers to identify the device working state or even the device category.

| Performance evaluation
In this section, we evaluate our protection approach in terms of experimental setup, usability, delay and overhead.

| Experiment
In our experiment, the packet injection rate was set to 100 kb/s for high-bandwidth devices and to a random rate from 0 to 3 kb/s for low-bandwidth devices to protect their working state and category. The scenario is the same as described in Section 3. A laptop placed near the smart home device was used to perform Air-Padding. The iptables component was used to block the communication of smart home devices.

| Usability
We used different machine learning algorithms (KNN, SVM, Decision Tree and Random Forest) to test all the devices, including low-bandwidth and high-bandwidth devices, to evaluate the usability of the protection method. For low-bandwidth devices, the classification results show that the identification accuracy gradually decreases as the injection rate increases (Figure 8). When the injection rate reaches 3 kb/s, the identification accuracy falls below 20%. Therefore, in our experiment, the random injection rate was set to 0-3 kb/s. Figure 9 shows that after Air-Padding, the classification accuracy decreases sharply.
For high-bandwidth devices, we tested five different cameras and recorded the identification accuracy before and after packet injection. The results show that the working state of all five cameras is identified as the "Live" state.
Besides, we tested the robustness of Air-Padding. Because the injected packets are sent from another network device instead of the smart home device, attackers could in principle locate the injecting device using three antennas. In our experiment, the injecting device is deployed near the smart home device, so the signal strengths of the smart home device and the protection device are very similar, and even three antennas cannot identify the injecting device. In addition, Air-Padding makes the traffic features of smart home devices irregular, which makes attacks ineffective. Therefore, Air-Padding can resist detection of the injected packets, even when the method is known to the attacker.

| Delay
Previous protection methods, such as traffic shaping, always introduce delays when smart home devices send and receive data. This does not happen with Air-Padding, which relies on injecting constructed packets rather than modifying the original traffic rate of smart home devices, and therefore makes no changes to the original traffic. We measured the delay while Air-Padding was running. The results show that injecting the constructed packets into the link causes no delay for any device (Table 8).

| Overhead
Overhead is the decisive factor in determining whether a protection method is practical. The specific overhead is shown in Table 9. We analyse the overhead from the following aspects.
Traffic cost: Whether a privacy protection method causes additional overhead is of great significance to users. For example, some users are charged by traffic volume, so protection methods that generate additional traffic may not be adopted. Air-Padding does not generate any extra cost of this kind. After constructing the packets, the traffic injection device sends them to the smart home device or the gateway, and both drop the injected traffic. Therefore, no injected traffic is sent to the ISP, which means that users need not pay any extra fee. It should be noted that Air-Padding still consumes the router's bandwidth. As shown in Table 9, Air-Padding injects packets into both the upload link and the download link, which occupies bandwidth of the home network. Each high-bandwidth device in Table 9 consumes more than 100 KB/s of bandwidth, while each low-bandwidth device consumes less. Mi music is a special case, which needs at least 1 MB/s. This bandwidth consumption is acceptable to users, since the bandwidth of a home router is normally 100 MB/s.
Injecting device cost: Air-Padding depends on a single wireless device to inject the constructed packets. Due to the location of the injecting device can be identified by three  [16].
Different from these previous works, our identification method can be performed in an encrypted and noisy environment. As far as we know, we are the first to propose that the working state of smart home devices can be identified by wireless attackers in different noisy network scenarios, and that attackers can use this information to infer people's behaviour.
Encryption traffic analysis: Past research has shown that user privacy is not secure even in encrypted environments, such as VPN encryption [19], transport layer encryption [20] and application layer encryption [21]. An attacker can use TLS/SSL information [22,23] and TCP/IP header information [21] to identify the user's behaviour. Gonzalez et al. described how to use HTTPS information to infer the websites viewed by users [22]. Dubin et al. [24] identified video streaming titles from encrypted HTTP information. Conti et al. [25] demonstrated that user actions can be identified using machine learning. Yao et al. [23] demonstrated that mobile applications can be classified by HTTP and HTTPS header information. In our scenario, because of WPA2 authentication, this information is difficult to obtain, and only 802.11 packets can be used for the analysis.
Privacy protection: Previous work has explored how to protect users' privacy. Most solutions rely on independent link padding (ILP) or dependent link padding (DLP) [26][27][28][29]. ILP and DLP change traffic features by padding and fragmenting packets and by sending cover packets. Apthorpe et al. [8] described a traffic shaping method that changes traffic features in order to prevent the ISP from identifying smart home devices and their working state from the traffic shape. It is implemented by adding an encrypted VPN tunnel at the exit of the home router for traffic injection, and by discarding the injected traffic at the end of the VPN tunnel. However, traffic shaping cannot prevent attacks by WiFi eavesdroppers, because it takes place outside the wireless link. Moreover, simple traffic padding and fragmenting has been shown not to protect user privacy fully. Fu et al. [27] demonstrated that variable inter-packet delays can affect user activities more than constant inter-packet delays. Datta et al. [9] provided a Python library that lets IoT developers easily integrate privacy-preserving traffic shaping into their products. However, because of the high latency it introduces, it is not suitable for low-latency, high-bandwidth devices. Compared with these previous methods, Air-Padding protects smart home devices from being identified by WiFi eavesdroppers. Moreover, its bandwidth overhead does not incur any additional cost for users.

| CONCLUSION
In this article, we explored how to identify the category and working state of smart home devices from network traffic in an encrypted wireless environment. We proposed a method based on machine learning to identify the device category, manufacturer and working state. In our experiments, the device category, manufacturer and working state were identified with more than 95% accuracy. We also proposed a protection method, Air-Padding, to protect the link between smart home devices and wireless access devices. Experiments show that our protection method prevents the device category and working state from being identified, which is of great significance for people who use these smart home devices. We also investigated the overhead and other factors of our protection method. The results show that no injected traffic reaches the ISP, which means that users do not pay any extra fees to protect their sensitive information. The only overheads are the router's bandwidth consumed while protecting users' privacy and the injecting device itself. We hope that consumers will become aware of the danger of privacy leakage, and that manufacturers will take privacy protection into account, possibly even by building a new privacy protection protocol.