Identifying Users - Easy User Association (IDM 2.0 & IDM 1.0)
Choosing an appropriate user ID greatly improves the accuracy of user behavior analysis, especially for user-centric analyses such as funnels, retention, and Sessions. You should therefore decide how to identify users before integrating any data. This document introduces how Sensors Analytics identifies users and walks through several typical user identification solutions.
Note: Do not switch the data receiving address of a live page directly from one project to another, because this causes first-day flags and IDs to become abnormal. We recommend sending data to the test project during offline testing and, once everything checks out, sending the data collected online directly to the production project.
1. Basic concept
Sensors Analytics uses a unique Sensors ID (that is, the user_id in the events table and the id in the users table) to identify each user of a product. The Sensors ID is generated from the distinct_id according to certain rules. A distinct_id is typically one of the following two types:
1.1. Device ID
Note that a device ID does not necessarily identify a device uniquely. For example, cookies on the Web may be cleared (for instance by security software), and the IDFV on iOS differs between apps from different vendors. The Sensors Analytics client SDKs handle the device ID as follows:
SDK type | Rule |
---|---|
Android | In versions earlier than 1.10.5, a UUID (for example, 550e8400-e29b-41d4-a716-446655440000) is used by default. The UUID changes when the App is uninstalled and reinstalled. To keep the device ID unchanged, you can configure the SDK to use AndroidId (for example, 774d56d682e549c3). SDK versions 1.10.5 and later use AndroidId as the device ID by default and fall back to a random UUID if AndroidId cannot be obtained. |
iOS | In versions 1.10.18 and later, if the App includes the AdSupport library, the SDK uses IDFA as the anonymous ID by default. Before 1.10.18, the SDK uses IDFV first (e.g. 1E2DFA10-236A-47UD-6641-AB1FC4E6483F) and falls back to a random UUID (e.g. 550e8400-e29b-41d4-a716-446655440000) if IDFV cannot be obtained; in general IDFV is available. With IDFV or UUID, the device ID changes when the user uninstalls and reinstalls the App. You can also configure the SDK to use IDFA (for example, 1E2DFA89-496A-47FD-9941-DF1FC4E6484A). If IDFA is enabled, the SDK tries IDFA first and falls back to IDFV if IDFA cannot be obtained. Using IDFA keeps the device ID unchanged after the user reinstalls the App. |
JavaScript | cookie_id is used by default (e.g. 15ffdb0a3f898-02045d1cb7be78-31126a5d-250125-15ffdb0a3fa40a). It is generated by the JavaScript SDK and stored in the browser cookie. To ensure uniqueness, it is composed of five fields with different meanings: two timestamps, the screen width and height, a random number, and a UA value. |
WeChat Mini Program | A UUID is used by default (for example, 1558509239724-9278730-00c1875d5f63f8-41373096), but it changes when the user deletes the mini program. To keep the device ID unchanged, we recommend obtaining and using the openid (for example, oWDMZ0WHqfsjIz7A9B2XNQOWmN3E). If you use the openid, note that fetching it is asynchronous, so a cold start event may occur before it is available, in which case the distinct_id of that cold start event would be incorrect. Events that occur first are therefore stored, and sa.init() is called only after the openid has been obtained, at which point the data is sent. For details about obtaining and using the openid, see the WeChat Mini Program SDK documentation. |
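The "store first, send later" behaviour described for the openid can be sketched as follows. This is a minimal illustration in Python, not the mini program SDK itself: the class DeferredTracker, its methods, and the event names AppLaunch and PageView are hypothetical.

```python
class DeferredTracker:
    """Hypothetical helper: buffers events until a stable ID (e.g. openid) is known."""

    def __init__(self, send):
        self._send = send          # callable that receives each finished event dict
        self._distinct_id = None   # unknown until the asynchronous lookup returns
        self._pending = []         # events recorded before the ID was known

    def track(self, event, properties=None):
        record = {"event": event, "properties": dict(properties or {})}
        if self._distinct_id is None:
            self._pending.append(record)   # e.g. the cold start event
        else:
            self._send({"distinct_id": self._distinct_id, **record})

    def set_distinct_id(self, distinct_id):
        """Call once the openid arrives; flushes buffered events in order."""
        self._distinct_id = distinct_id
        for record in self._pending:
            self._send({"distinct_id": distinct_id, **record})
        self._pending.clear()


sent = []
tracker = DeferredTracker(sent.append)
tracker.track("AppLaunch")                      # occurs before the openid is known
tracker.set_distinct_id("oWDMZ0WHqfsjIz7A9B2XNQOWmN3E")
tracker.track("PageView")                       # sent immediately
```

Both events end up with the openid as their distinct_id, so the cold start event is not attributed to a temporary UUID.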
1.2. Login ID
The login ID is usually the primary key or another unique identifier in the business database, so it is comparatively more accurate and more persistent. However, users are not necessarily registered or logged in when they use the product, in which case they have no login ID.
The login ID is stored in the second_id field of the users table.
Once the login ID has been chosen, try not to change it. If you must change it, please contact the support engineer on duty.
Note in particular that a user in Sensors Analytics is the subject of an event. It is not necessarily an end user; it can also be an enterprise, a merchant, or even a car, and should be chosen flexibly according to the analysis needs.
1.3. Detailed solution
Note that the following solutions are not compatible with one another. Be sure to choose the most suitable one before going live.
1.3.1. Solution 1: Use only the device ID
1.3.1.1. Application scenario
This solution suits products with no user registration system, or products whose users rarely log in on multiple devices, such as utility apps, search engines, and some e-commerce products. It is the only solution offered by most other data analytics products.
1.3.1.2. Limitations
- The same user on different devices is treated as different users (the Sensors IDs differ because they follow the device ID), which affects subsequent analysis and statistics.
- Different users on the same device are treated as one user (the Sensors ID is the same because the device ID is the same), which also affects subsequent analysis and statistics.
- However, if cross-device use or multi-user sharing is not a common scenario of the product, the above problems can be ignored.
1.3.1.3. Implementation method
- Use the device ID generated by the client SDK as the distinct_id. No special handling is required. If you do not want to use the SDK's default device ID generation rule, you can call the identify interface directly to pass in a custom device ID; we recommend calling it immediately after SDK initialization completes.
- If the product has a user registration system, you can add a login_user_id property to all events to record the user's official ID. This also lets you filter out the behavior of a specific user, or view behavior before and after login separately.
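As a rough sketch of the second point, the event below keeps the device ID as distinct_id and carries the account only as a property. The helper build_event and the record layout (type, time, properties) are assumptions made for illustration; login_user_id is the property name suggested above.

```python
import time


def build_event(device_id, event, properties=None, login_user_id=None):
    """Hypothetical helper: build an event keyed by device ID (Solution 1)."""
    props = dict(properties or {})
    if login_user_id is not None:
        # The account is only an event property; it never replaces distinct_id.
        props["login_user_id"] = login_user_id
    return {
        "distinct_id": device_id,
        "type": "track",
        "event": event,
        "time": int(time.time() * 1000),   # milliseconds since the epoch
        "properties": props,
    }


anonymous = build_event("X", "ViewProduct")
logged_in = build_event("X", "SubmitOrder", {"amount": 30}, login_user_id="A")
```

Filtering on login_user_id then separates behavior before and after login without changing how users are counted.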
1.3.1.4. Case
Note: Both X and Y have user attributes, so even if second_id has no value, it can be written to the user table as first_id.
Detailed steps are described as follows:
1. A user installs the App on a Xiaomi phone and performs a series of operations. The SDK generates device ID X and sends X as the distinct_id, which is assigned Sensors ID 1. Sensors ID 1 and device ID X are stored in the id and first_id fields of the users table.
2. The user registers and logs in. The device does not change, so the distinct_id sent is still X and the Sensors ID is still 1.
3. After logging in, the user performs a series of operations. The distinct_id sent is still X, so the Sensors ID is still 1.
4. The user logs out and performs a series of operations. The distinct_id sent is still X, so the Sensors ID is still 1.
5. The user gives the phone to a friend, who logs in on device X with his or her own account B (registered, but not yet known to the Sensors system). The distinct_id sent is still X, so the Sensors ID is still 1.
6. After that, the friend keeps using account B for a series of operations on device X. Because the device has not changed, the Sensors ID is still 1.
7. The user switches to a new iPhone and performs a series of operations. Because the device ID changes to Y, the distinct_id sent is Y, and Sensors ID 2 is assigned. Sensors ID 2 and device ID Y are stored in the id and first_id fields of the users table.
8. The user logs in on the iPhone with account A. The distinct_id sent is still Y, so the Sensors ID is still 2.
9. As long as the device is not replaced, the user's subsequent operations after login are identified by Sensors ID 2.
In the above case, the advantage of using only the device ID to identify the user is that the logic is simple, and of course the limitations are obvious:
- After the user changes phones, the behavior before and after the change cannot be linked.
- When the user gives the phone to a friend, the friend's behavior is still attributed to the user.
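The nine steps above can be replayed with a small model. This is a sketch of the rule "one device ID, one Sensors ID" only, not the server's implementation; the class name DeviceOnlyIdentity is hypothetical.

```python
class DeviceOnlyIdentity:
    """Hypothetical model of Solution 1: the Sensors ID depends only on the device ID."""

    def __init__(self):
        self._ids = {}   # distinct_id (device ID) -> Sensors ID

    def resolve(self, device_id):
        if device_id not in self._ids:
            self._ids[device_id] = len(self._ids) + 1   # assign the next Sensors ID
        return self._ids[device_id]


model = DeviceOnlyIdentity()
# Steps 1-6 happen on device X, steps 7-9 on device Y; logins play no role.
steps = ["X"] * 6 + ["Y"] * 3
sensors_ids = [model.resolve(device_id) for device_id in steps]
```

The resulting sequence is 1 for the first six steps and 2 for the last three, regardless of which account is logged in.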
1.3.2. Solution 2: Associating Device ID with Login ID (one-to-one)
Using only the Device ID to identify users is simple, but it is not accurate enough for some use cases. Sensors Analytics therefore provides another solution that associates the Device ID with the Login ID and uses both together to track users more accurately.
1.3.2.1. Applicable Scenarios
After the Device ID has been successfully associated with the Login ID, the user's behavior under that Device ID and under that Login ID is linked and treated as one Sensors ID. It is also counted as one user in user-related analyses such as events, funnels, and retention.
Although associating the Device ID with the Login ID achieves more accurate user tracking, it also increases the complexity of data collection. Therefore, we generally recommend considering ID association only when the following conditions are met:
- There is a need to link the behavior of a user before and after registering on a device.
- There is a need to link the behavior of a registered user after logging in on different devices.
1.3.2.2. Limitations
- One Device ID can only be associated with one Login ID, but in reality, one device may be used by multiple users.
- One Login ID can only be associated with one Device ID, but in reality, one user may log in on multiple devices with the same Login ID.
- If the Sensors API calling sequence is not followed, it may result in abnormal user identification (such as during historical data import), affecting the accuracy of data statistics.
1.3.2.3. Client Integration Implementation
Client integration refers to using SDKs such as iOS/Android/JavaScript for data collection. The specific calling process is as follows:
- After the SDK initialization is complete, Sensors SDK will automatically generate a Device ID as the user identifier.
- When the user registers successfully, logs in successfully, or initializes the SDK (if the Login ID can be obtained), the client actively calls the login(Login ID) interface.
- There are several choices when the user logs out:
  - Do nothing, in which case Sensors will continue to use the previous user identifier for tracking. Unless there are special circumstances, this is generally recommended.
  - Call the logout() method, which will clear the Login ID and use the Device ID as the user identifier again. In general, there is no need to choose this method.
  - For the JavaScript SDK, you can also call the logout(true) method, which, in addition to clearing the Login ID, will also reinitialize the Device ID.
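The effect of these calls on the distinct_id that gets sent can be modelled as below. This is a sketch in Python under the assumption that the client keeps an anonymous ID and an optional login ID; the class ClientIdentity and the reset_anonymous_id parameter (standing in for logout(true)) are hypothetical names.

```python
import uuid


class ClientIdentity:
    """Hypothetical model of the ID state a client SDK keeps."""

    def __init__(self, anonymous_id=None):
        self.anonymous_id = anonymous_id or str(uuid.uuid4())   # Device ID
        self.login_id = None

    @property
    def distinct_id(self):
        # The Login ID wins whenever one is set; otherwise the Device ID is used.
        return self.login_id if self.login_id is not None else self.anonymous_id

    def login(self, login_id):
        self.login_id = login_id

    def logout(self, reset_anonymous_id=False):
        self.login_id = None
        if reset_anonymous_id:               # models logout(true) in the JavaScript SDK
            self.anonymous_id = str(uuid.uuid4())


client = ClientIdentity(anonymous_id="X")
before_login = client.distinct_id       # Device ID
client.login("A")
after_login = client.distinct_id        # Login ID
client.logout()
after_logout = client.distinct_id       # Device ID again
client.logout(reset_anonymous_id=True)
after_reset = client.distinct_id        # a freshly generated Device ID
```

The recommended "do nothing" option corresponds to never calling logout, so the distinct_id simply stays at the Login ID.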
Remark 1:
SDK Type | How to get the ID from the front-end cache |
---|---|
Android | To get the anonymous ID assigned by the Sensors Analytics SDK, use the getAnonymousId() method. String AnonymousId=SensorsDataAPI.sharedInstance().getAnonymousId(); |
iOS | To get the anonymous ID assigned by the Sensors Analytics iOS SDK, use the anonymousId method. NSString *anonymousId = [[SensorsAnalyticsSDK sharedInstance] anonymousId]; (Swift code example: let anonymousId: String = SensorsAnalyticsSDK.sharedInstance().anonymousId();). |
JavaScript | Use the sensors.quick('getAnonymousID') method to get the anonymous ID. Returns the anonymous ID (Supported from SDK version 1.13.4 and above). |
WeChat Mini Program | sensors.getAnonymousID(); |
1.3.2.4. Server Access Implementation Method
Server access includes using Java/Python/PHP SDKs, as well as tools such as BatchImporter/LogAgent/FormatImporter for importing. The specific process is as follows:
- When performing server-side tracking or historical data import, if the distinct_id passed to the track or profile_set interface is a login ID, the is_login_id parameter must be true to tell Sensors Analytics that the behavior was generated under a login ID. Taking the Java SDK as an example:
  - If it is a behavior generated by a login ID: sa.track(registerId, true, "SubmitOrderDetail", properties);
  - If it is a behavior generated by an anonymous ID: sa.track(deviceId, false, "SubmitOrderDetail", properties);
- For any login ID, once any data has been imported for it, the login ID can no longer be associated with any device ID. Therefore, when importing historical data (data generated before integrating Sensors Analytics), we recommend the following:
  - First, integrate the SDK normally and make sure that all users are associated through the login/track_signup interfaces. Import the historical data only after this has been running for a while, because by then most active users should have been successfully associated.
  - If the historical data contains the mapping between a login ID and its device ID, you can first construct track_signup requests to import this mapping, and then import the specific user behavior or user attribute data.
- Due to the possibility of data loss in client-side tracking, we recommend that developers also call the track_signup method in the server's registration interface to associate the device ID and login ID of new users to achieve more accurate user identification.
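The recommended import order (associations first, behavior afterwards) can be sketched as follows. The record layout, the $SignUp event name, and the helper functions are assumptions made for illustration rather than the importer's documented format; only track_signup, is_login_id, and the example event name come from the text above.

```python
def signup_record(login_id, device_id):
    """Hypothetical association record: binds device_id to login_id."""
    return {"type": "track_signup", "event": "$SignUp",
            "distinct_id": login_id, "original_id": device_id, "properties": {}}


def event_record(distinct_id, event, is_login_id):
    """Hypothetical behavior record; the flag says which kind of ID distinct_id is."""
    return {"type": "track", "event": event, "distinct_id": distinct_id,
            "properties": {"$is_login_id": is_login_id}}


def order_for_import(records):
    """Put association records first; sorted() is stable, so relative order is kept."""
    return sorted(records, key=lambda record: record["type"] != "track_signup")


history = [
    event_record("A", "SubmitOrderDetail", True),    # generated under a login ID
    event_record("X", "ViewProduct", False),         # generated under a device ID
    signup_record("A", "X"),                         # mapping found in the old data
]
ordered = order_for_import(history)
```

Importing the mapping first matters because, as noted above, a login ID that has already received data can no longer be associated with a device ID.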
1.3.2.5. Case Study
Note: Y has user attributes, so even if second_id has no value, it can be written to the users table as first_id.
Detailed steps are described as follows:
1. A user installs the App on a Xiaomi phone and performs a series of operations. The SDK generates device ID X and sends X as the distinct_id, which is assigned Sensors ID 1. Sensors ID 1 and device ID X are stored in the id and first_id fields of the users table.
2. The user registers and logs in with login ID A. The SDK's login (client) or track_signup interface is called, and device ID X is successfully associated with login ID A. Login ID A is stored in the second_id field of the users table, and the Sensors ID is still 1.
3. After logging in, the user performs a series of operations. The distinct_id sent is A, and the Sensors ID is 1.
4. The user logs out and performs a series of operations. The SDK calls no method, so the distinct_id sent is still A and the current user is identified by Sensors ID 1 (because login ID A is bound to Sensors ID 1).
5. The user gives the phone to a friend, who logs in on device X with his or her own account (registered, but not yet known to the Sensors system); the login ID is B. The Sensors SDK tries to associate device ID X with login ID B, but X is already associated with A, so the association fails. A new Sensors ID 2 is assigned to identify this user, and login ID B is stored in both the first_id and second_id fields of the users table. (The friend's account has not been associated with any other device, and the association with its first login device failed, so the login ID is also recorded in first_id.)
6. After that, the friend uses account B to perform a series of operations on device X. The distinct_id sent is B, and the user is identified by Sensors ID 2 (because login ID B is bound to Sensors ID 2).
7. The user switches to a new iPhone and performs a series of operations. Because the user has not logged in yet, the distinct_id sent is the new device ID Y, which is assigned Sensors ID 3. Sensors ID 3 and device ID Y are stored in the id and first_id fields of the users table.
8. When the user logs in with account A on the iPhone, Sensors Analytics tries to associate device ID Y with login ID A. Because A is already associated with X, the association fails, but the user is still switched to login ID A, whose Sensors ID is still 1.
9. After logging in, the user sends A as the distinct_id, so the user is still identified by Sensors ID 1.
In the case above, users are linked across devices to a large extent, but limitations remain:
- When a user changes phones, the behavior after logging in to the account is linked to the behavior before the phone change, but the behavior before the first login on the new device is still not linked and is still recognized as the behavior of a new user.
- After the user gives the old phone to a friend, the old phone cannot be associated with the friend's login ID, because it is already associated with the user's own login ID. Anyone who later uses the old phone without logging in is identified as the same user (the login ID with which the old phone was successfully associated).
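The one-to-one rules in this case can be replayed with a small model. It is a sketch of the behavior described in the nine steps, not the server's implementation; the class OneToOneIdentity and its method names are hypothetical.

```python
class OneToOneIdentity:
    """Hypothetical model of Solution 2: one device ID per login ID and vice versa."""

    def __init__(self):
        self._next_id = 1
        self._device_user = {}        # device ID -> Sensors ID (first_id)
        self._login_user = {}         # login ID -> Sensors ID (second_id)
        self._bound_devices = set()   # device IDs already tied to a login ID

    def _new_id(self):
        sensors_id = self._next_id
        self._next_id += 1
        return sensors_id

    def anonymous_event(self, device_id):
        if device_id not in self._device_user:
            self._device_user[device_id] = self._new_id()
        return self._device_user[device_id]

    def login(self, login_id, device_id):
        if login_id in self._login_user:
            # The login ID already has a user: no further device can be
            # associated, but events switch to that user.
            return self._login_user[login_id]
        if device_id in self._bound_devices:
            # The device belongs to another login ID: association fails and
            # the login ID gets a new Sensors ID of its own.
            self._login_user[login_id] = self._new_id()
        else:
            self._login_user[login_id] = self.anonymous_event(device_id)
            self._bound_devices.add(device_id)
        return self._login_user[login_id]

    def login_event(self, login_id):
        return self._login_user[login_id]


m = OneToOneIdentity()
sensors_ids = [
    m.anonymous_event("X"),   # 1. install and use on device X
    m.login("A", "X"),        # 2. register and log in as A
    m.login_event("A"),       # 3. use while logged in
    m.login_event("A"),       # 4. "logged out", but distinct_id is still A
    m.login("B", "X"),        # 5. friend logs in as B on X
    m.login_event("B"),       # 6. friend uses the App
    m.anonymous_event("Y"),   # 7. new iPhone, not logged in
    m.login("A", "Y"),        # 8. log in as A on Y
    m.login_event("A"),       # 9. use while logged in
]
```

Step 7 keeps Sensors ID 3 permanently in this model, which is exactly the first limitation listed above.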
1.3.3. Solution 3: Associating Device ID with Login ID (many-to-one)
Although associating the device ID with the login ID one-to-one links users across devices, it is still not accurate enough for some application scenarios. Sensors Analytics therefore provides a further association solution in which one login ID can be bound to multiple device IDs, so that users are tracked more accurately.
1.3.3.1. Application scenario
It is common for one user to log in on multiple devices, for example on both the Web and the App. When multiple device IDs are associated with one login ID, the user's behavior on all of those devices is linked and treated as belonging to a single Sensors ID.
1.3.3.2. Limitations
- A device ID can only be associated with a single login ID, when in fact a device may be used by multiple users.
- Once a device ID and a login ID have been associated, the association cannot be removed, whether manually or automatically. In practice, a dynamic relationship between device ID and login ID would be more reasonable.
1.3.3.3. Implementation method
The client and server implementation is exactly the same as in Solution 2; only the processing on the Sensors Analytics server differs:
- A login ID that is already associated with a device can still be associated with a new device ID, which is stored in the new $device_id_list field of the users table.
- Every day, a routine task reads the list of IDs that need repair from the users table, that is, $device_id_list. It then reads all events data from the past 2 days, finds the data that needs repair, and changes its user_id field to match the id field in the users table.
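The daily repair pass described above might look roughly like this. It is a sketch that follows the text literally (a 2-day window, user_id rewritten to the owning user's id); the function repair_events, the in-memory event layout, and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def repair_events(events, device_to_user, now, window=timedelta(days=2)):
    """Return corrected copies of recent events whose user_id must change."""
    cutoff = now - window
    repaired = []
    for event in events:
        target = device_to_user.get(event["distinct_id"])
        if target is None or event["time"] < cutoff or event["user_id"] == target:
            continue   # not in $device_id_list, too old, or already correct
        repaired.append({**event, "user_id": target})   # the source event is not modified
    return repaired


users = [{"id": 1, "first_id": "X", "second_id": "A", "$device_id_list": ["Y"]}]
# Invert $device_id_list into a lookup: device ID -> Sensors ID that owns it.
device_to_user = {d: u["id"] for u in users for d in u["$device_id_list"]}

now = datetime(2024, 1, 10, 12, 0)
events = [
    {"distinct_id": "Y", "user_id": 3, "time": now - timedelta(hours=5)},   # repaired
    {"distinct_id": "Y", "user_id": 3, "time": now - timedelta(days=5)},    # outside the window
    {"distinct_id": "X", "user_id": 1, "time": now - timedelta(hours=1)},   # nothing to fix
]
fixed = repair_events(events, device_to_user, now)
```

Returning new records instead of editing in place mirrors the idea of writing repaired data separately from the source data.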
1.3.3.4. Case
Detailed steps are described as follows:
1. A user installs the App on a Xiaomi phone and performs a series of operations. The SDK generates device ID X and sends X as the distinct_id, which is assigned Sensors ID 1. Sensors ID 1 and device ID X are stored in the id and first_id fields of the users table.
2. The user registers and logs in with login ID A. The SDK's login (client) or track_signup interface is called, and device ID X is successfully associated with login ID A. Login ID A is stored in the second_id field of the users table, and the Sensors ID is still 1.
3. After logging in, the user performs a series of operations. The distinct_id sent is A, and the Sensors ID is 1.
4. The user logs out and performs a series of operations. The SDK calls no method, so the distinct_id sent is still A and the current user is identified by Sensors ID 1 (because login ID A is bound to Sensors ID 1).
5. The user gives the phone to a friend, who logs in on device X with his or her own account (registered, but not yet known to the Sensors system); the login ID is B. The Sensors SDK tries to associate device ID X with login ID B, but X is already associated with A, so the association fails. A new Sensors ID 2 is assigned to identify this user, and login ID B is stored in both the first_id and second_id fields of the users table. (The friend's account has not been associated with any other device, and the association with its first login device failed, so the login ID is also recorded in first_id.)
6. After that, the friend uses account B to perform a series of operations on device X. The distinct_id sent is B, and the user is identified by Sensors ID 2 (because login ID B is bound to Sensors ID 2).
7. The user switches to a new iPhone and performs a series of operations. Because the user has not logged in yet, the distinct_id sent is the new device ID Y, which is assigned Sensors ID 3. Sensors ID 3 and device ID Y are stored in the id and first_id fields of the users table.
8. When the user logs in with account A on the iPhone, Sensors Analytics associates device ID Y with login ID A. The association succeeds, and the corresponding Sensors ID is still 1. Device ID Y is also added to the $device_id_list field of Sensors ID 1 in the users table.
9. After logging in, the user sends A as the distinct_id, so the user is still identified by Sensors ID 1.
Subsequent fixes are as follows:
- Because device Y is now associated with login ID A, the data generated on device Y before login is repaired: Sensors ID 3 -> Sensors ID 1. Note that for the data to be repaired, a new parquet file is generated with the new user_id. The original file being repaired is not modified for the time being; the index only marks which data in the source file has become invalid.
- At the same time, the user attributes of Sensors ID 3 in the users table are merged into Sensors ID 1. During merging, if an attribute of Sensors ID 1 already has a value, that value is not modified. If an attribute of Sensors ID 1 has no value and the same attribute of Sensors ID 3 has one, the value is copied to Sensors ID 1. The data of Sensors ID 3 is then deleted from the users table.
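The attribute merge rule (values already present on the surviving user win) can be sketched as below. The function merge_profiles, the in-memory users dictionary, and the attribute names are hypothetical; "no value" is modelled here as a missing key or None.

```python
def merge_profiles(users, keep_id, drop_id):
    """Merge the user drop_id into keep_id; existing values on keep_id are kept."""
    kept = users[keep_id]
    dropped = users.pop(drop_id)            # the merged-away user is deleted
    for name, value in dropped.items():
        if kept.get(name) is None and value is not None:
            kept[name] = value              # fill only attributes that had no value
    return kept


users = {
    1: {"city": "Beijing", "gender": None},
    3: {"city": "Shanghai", "gender": "female", "first_os": "iOS"},
}
merged = merge_profiles(users, keep_id=1, drop_id=3)
```

Here city stays "Beijing" because Sensors ID 1 already had a value, while gender and first_os are filled in from Sensors ID 3.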
In the case above, users are genuinely linked across devices, and the repair solves the Solution 2 problem of unlinked behavior before the first login on a new phone. Limitations remain, however:
- A device can only be associated with one login ID. After a user gives an old phone to a friend, the old phone is already associated with the user's own login ID and cannot be associated with the friend's login ID. Anyone who later uses the old phone without logging in is identified as the same user (the login ID with which the old phone was successfully associated).
- In practice, it is hard to tell who is behind later anonymous (not logged in) use of the old phone; it may be more reasonable to attribute that behavior to the user who most recently logged in before the anonymous use.
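The many-to-one rules, including the pending repair from Sensors ID 3 to Sensors ID 1, can be replayed with a small model. This is a sketch of the behavior described in this case, not the server's implementation; the class ManyToOneIdentity and its attributes are hypothetical names.

```python
class ManyToOneIdentity:
    """Hypothetical model of Solution 3: several device IDs may share one login ID."""

    def __init__(self):
        self._next_id = 1
        self._device_user = {}        # device ID -> Sensors ID
        self._login_user = {}         # login ID -> Sensors ID
        self._bound_devices = set()   # device IDs already tied to a login ID
        self.device_id_list = {}      # Sensors ID -> additional devices
        self.repairs = []             # (old Sensors ID, new Sensors ID) to fix later

    def _new_id(self):
        sensors_id = self._next_id
        self._next_id += 1
        return sensors_id

    def anonymous_event(self, device_id):
        if device_id not in self._device_user:
            self._device_user[device_id] = self._new_id()
        return self._device_user[device_id]

    def login(self, login_id, device_id):
        if device_id in self._bound_devices:
            # A device still belongs to at most one login ID.
            if login_id not in self._login_user:
                self._login_user[login_id] = self._new_id()
            return self._login_user[login_id]
        if login_id not in self._login_user:
            self._login_user[login_id] = self.anonymous_event(device_id)
        else:
            user_id = self._login_user[login_id]
            old_id = self._device_user.get(device_id)
            if old_id is not None and old_id != user_id:
                self.repairs.append((old_id, user_id))   # earlier anonymous data
            self._device_user[device_id] = user_id
            self.device_id_list.setdefault(user_id, []).append(device_id)
        self._bound_devices.add(device_id)
        return self._login_user[login_id]

    def login_event(self, login_id):
        return self._login_user[login_id]


m = ManyToOneIdentity()
sensors_ids = [
    m.anonymous_event("X"), m.login("A", "X"), m.login_event("A"),
    m.login_event("A"), m.login("B", "X"), m.login_event("B"),
    m.anonymous_event("Y"), m.login("A", "Y"), m.login_event("A"),
]
```

Step 7 is recorded as Sensors ID 3 at the time it happens; the entry in repairs is what later moves that data to Sensors ID 1.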
1.4. Scheme comparison
When the above three schemes are put together, the differences between the three schemes can be clearly seen, as shown in the following table:
Event Number | Event | Sensors ID (Solution 1) | Sensors ID (Solution 2) | Sensors ID (Solution 3) |
---|---|---|---|---|
1 | Install App | 1 | 1 | 1 |
2 | Login App | 1 | 1 | 1 |
3 | Use App | 1 | 1 | 1 |
4 | Use App | 1 | 1 | 1 |
5 | Login App | 1 | 2 | 2 |
6 | Use App | 1 | 2 | 2 |
7 | Use App | 2 | 3 | 3->1 |
8 | Login App | 2 | 1 | 1 |
9 | Use App | 2 | 1 | 1 |
- Solution 1: Use only the device ID. Regardless of who the user is, as long as the device (and therefore the device ID) remains unchanged, the Sensors ID remains unchanged.
- Solution 2: Associate the device ID with the login ID (one-to-one)
  - When a user changes their phone, the behavior after logging in is linked to the behavior before changing the phone, but the behavior before the first login on the new device still cannot be associated and is still recognized as the behavior of a new user.
  - When a user gives the old phone to a friend, since the old phone has been associated with their own login ID, it cannot be associated with the friend's login ID. Users who subsequently use this old phone and operate without logging in will all be recognized as the same user.
- Solution 3: Associate the device ID with the login ID (many-to-one)
  - When a user gives the old phone to a friend, since the old phone has been associated with their own login ID, it cannot be associated with the friend's login ID. Users who subsequently use this old phone and operate without logging in will all be recognized as the same user.
  - In fact, it is difficult to identify who is behind anonymous use of the old phone. It may be more reasonable to attribute the behavior to the user who logged in last before the anonymous use.
In fact, there is no right or wrong solution among the three. It is recommended that customers choose the appropriate solution based on the application scenario and the complexity of data collection.
Accurately identifying users is a complex problem. Sensors Analytics has always been committed to finding more reasonable and accurate methods that fit a wide range of application scenarios.
1.5. Frequently Asked Questions
1.5.1. $is_login_id Parameter Explanation
Common use cases: importing historical data such as events or users data, and the track or profile interfaces of the server-side SDKs.
Meaning of true: the value of the distinct_id field in this record is a real ID (such as the customer's business ID). If this real ID has not been associated with a device ID before the record is stored, the real ID is self-associated. (To bind the two beforehand: on the front end, use the login method, for example with the JavaScript SDK call sensors.login(userid); after a successful login or registration; on the server side or when importing historical data, use the track_signup interface, for example in the Java SDK.) There are two cases:
- If many-to-one association is enabled, a new device ID can still be associated with this self-associated real ID.
- If many-to-one association is not enabled, the real ID can no longer be associated with any device ID.
Meaning of false: the value of the distinct_id field in this record is a device ID (the ID that identifies the user before registration or login), and this device ID can still be associated with a real ID in the future.
Note: The content of this document is a technical document that provides details on how to use the Sensors product and does not include sales terms; the specific content of enterprise procurement products and technical services shall be subject to the commercial procurement contract.