The content described in this document belongs to the advanced use of Sensors Analytics, involving more technical details and is suitable for reference by experienced users of relevant functions. If you have any doubts about the content of the document, please consult the Sensors Analytics team for one-on-one assistance.

Starting from version 1.11, Sensors Analytics has supported the virtual property function, and also supports the use of third-party dimension tables to further expand the already accessed events and properties. This function can greatly enhance the ability of Sensors Analytics to handle complex business needs.
Before version 1.17, creating virtual properties and dimension tables requires operations on the server where Sensors Analytics is deployed. Starting from version 1.17, virtual properties can be created within the metadata management.

Please ssh to any machine that has deployed Sensors Analytics, and execute the command to create virtual properties or dimension tables under the sa_cluster account.

1. Virtual Properties

The so-called virtual properties refer to the secondary processing of existing event and user properties through SQL expressions after data is stored, resulting in a new property value.

Note:

  1. The time and event in the events table are special fields. Currently, the secondary processing of the time field is supported. For specific examples, please refer to scenario 5. The secondary processing of the event field to extract virtual properties is not supported.
  2. The SQL expression for creating virtual properties cannot exceed 1024 characters.
  3. The virtual property name cannot be the same as the existing event property/user property in the metadata.

1.1. Scenario 1: Property Extraction

For example, we currently have an event property $url and we want to extract the attribute q= from $url as search_keyword for analysis. We can create a virtual property in the following way:

scaadmin external_view external_property add \ -p default \ -n search_keyword \ -c '搜索关键词' \ -e "parse_url(events.\$url, 'QUERY', 'q')" \ -t STRING
BASH

Here, -p is the project name of Sensors Analytics, -n is the English name of the external property in Sensors Analytics, -c is the Chinese name, -e is the corresponding SQL expression, and -t represents the data type.

We use the parse_url SQL function in the -e parameter to extract the parameter. In this way, the search_keyword property can be used for related analysis requirements.

1.2. Scenario 2: Property Merge

Suppose we have two properties item_id and item_id_1 in the event tracking, but in fact, they have the same meaning and need to be merged when used. We can also use the virtual property function to define it:

scaadmin external_view external_property add \ -p default \ -n new_item_id \ -c 'Item Id' \ -e "coalesce(events.item_id, events.item_id_1)" \ -t STRING
BASH

Here, coalesce is a standard SQL function that returns the first non-NULL value among all parameters. In this way, when used, we only need to use new_item_id for analysis to achieve the purpose of property merging.

1.3. Scenario 3: High-precision Decimal

By default, Sensors Analytics' NUMBER type only supports 3 decimal places after the decimal point. If you need to support high-precision types, you can use extended properties. The specific steps are as follows:

  • To support high-precision content, send the content that needs to support high precision as a string type to Sensors Analytics to avoid precision loss. Taking Java SDK as an example:
Map<String, Object> properties = new HashMap<String, Object>(); properties.put("big_number", "123.12312345"); sa.track(distinctId, true, "TestBigNumber", properties);
JAVA
  • Create a high-precision type of virtual property
# 这里假设传入的原始字符串属性为 big_number,创建的高精度属性为 big_number_decimal # decimal 参数中的 38 表示总的数据位数,16 表示小数点之后的位数 scaadmin external_view external_property add \ -p default \ -n big_number_decimal \ -c '高精度数字' \ -e 'cast(events.big_number as decimal(38,16))' \ -t number
BASH
  • Analyze using the big_number_decimal property

1.4. Scenario 4: Union Deduplication

In Sensors Analytics, we support calculating the deduplication number for a single property, but do not directly support deduplicating two or more properties. If there is such a need, it can be achieved by defining a new property whose value is the combination of the multiple properties that need to be deduplicated.

For example, if we want to calculate the deduplication count of different users viewing different products (i.e., the same user viewing the same product is not counted, but viewing different products needs to be counted), we can define a virtual property as follows:

scaadmin external_view external_property add \ -p default \ -n user_and_product \ -c '用户ID + 商品ID' \ -e "concat(cast(events.user_id as string), events.product_id)" \ -t STRING
BASH

Then in Sensors Analytics, you can view the deduplication count of this property to obtain the corresponding metric.

1.5. Scenario 5: Time Aggregation

By default, Sensors Analytics supports time aggregation methods such as day, week, and month. If you want other time aggregation methods, you can also use virtual properties. For example, if you want to analyze data according to "Week X", you can extract a day_of_week property from the time property:

scaadmin external_view external_property add \ -p default \ -n day_of_week \ -e 'dayofweek(time)' \ -t NUMBER
BASH

The value obtained by the dayofweek function is 1 for Sunday, 7 for Saturday, and 2-6 for Monday to Friday. Similarly, you can use the expression extract(hour from time) to extract the hour part from the time.

1.6. Scenario 6: User Attributes

Sensors Analytics supports extracting virtual properties based on user attributes in the users table. For example, to obtain the user's age attribute based on the user attribute Birthday, refer to the following example:

scaadmin external_view external_property add \ -p default \ -n user_age \ -u true \ -c "用户年龄" \ -e "cast (extract(year FROM now()) -extract(year FROM EPOCH_TO_TIMESTAMP(users.Birthday)) as int)" \ -t NUMBER
BASH

Note:

  • When creating a virtual property using user attributes, you need to add the -u parameter and indicate that the user attributes used in the expression are indeed user attributes.
  • Using user attributes in the user table and joining them with dimension tables is not supported.

1.7. More Applications

Since the -e parameter of the external_property add command supports any Impala SQL expression, virtual properties can be created flexibly based on actual business needs.

2. Dimension Tables

In addition to creating virtual properties directly based on tracked properties, we can also use third-party dimension tables to create more complex virtual properties.

Assuming we have a pay_order event in Sensing Analytics, and this event has properties such as product_id and product_name. Now, we want to use more dimensions of the product for analysis (such as product_manufacturer), but these dimensions have not been recorded in Sensing Analytics when the event was tracked. In this case, we can introduce dimension tables to meet this requirement.

2.1. Use the items table as the dimension table

Note: This feature is only supported in Sensory Analysis 1.14 and later versions.

If you want to enable the "items" table, you first need to report Item information through the itemSet interface provided by the SDK. Take Java SDK as an example:

Map<String, Object> properties = new LinkedHashMap<>(); properties.put("product_name", "iPhone 8"); properties.put("product_manufacturer", "Apple"); properties.put("price", 998.88); // 参数分别为 item_type、item_id、自定义属性 sensorsAnalytics.itemSet("product", "T12345", properties);
JAVA

Note: item_type can be used to distinguish different item types, such as movie, music, etc. item_id differentiates different items under the same item type.

Next, we can use this table in the Custom Query feature:

SELECT * FROM items LIMIT 10
SQL

Next, we establish the association between the "items" and "events" tables:

scaadmin external_view external_dimension_table add \ -p default \ -t items \ -e "events.product_id = items.item_id AND items.item_type = 'product'"
BASH

Note:

  1. Here, we use the product_id field for association and limit the item_type. If there are multiple item_type, then multiple associations need to be established. Please refer to Using a dimension table with different associations.
  2. The field associated with the dimension table must be a primary key. If it is not a primary key and there are duplicate data in the dimension table, it will cause an increase in the data queried in Sensory Analysis.

  3. When associating the "items" table, both item_id and item_type must be used for association. If only item_id is specified without specifying item_type and there are duplicate item_id, it will cause an increase in the data queried in Sensors Analytics.

Finally, use the product_manufacturer in the "items" table to create virtual properties:

scaadmin external_view external_property add \ -p default \ -n product_manufacturer \ -c '产品制造商' \ -e items.product_manufacturer \ -t STRING
BASH

Now, we can use the pay_order for analysis in all the analysis features of Sensory Systems, and see the product_manufacturer property and use this property for any analysis work.

If you need to delete, update, or perform other management operations on dimension tables and virtual properties, you can directly execute the command with no parameters to view the relevant help:

scaadmin external_view external_property 
BASH

2.2. Custom Dimension Table

In addition to referencing the "items" table, we can also manually create dimension tables.

1. Creating a new dimension table, execute the command::

scaadmin dimension_table create -p {项目英文名} -t {维度表名} -k {维度表主键} -c {维度表字段及类型} //sp 1.18+,sca 0.3.2+,低于该版本请切换左上方版本号到 2.2 以下版本
CODE

You can use the following command to view the version number

spadmin upgrader version
CODE


This command will create a new dimension table in the dimension table database. The writing format is as follows:

ParametersParameter DetailsExample
-p Project Name

-p default

Where "default" is the project name in English. When you are unable to determine the project name for creating a dimension table, refer to the screenshot below. The project name is the value after "project=" in the SensData URL.

-t

Dimension Table Name

Please customize the table name, which only supports English letters, underscores, numbers, and is recommended to start with an English letter. The total length should not exceed 100 characters.

-t product_info
-k

Dimension Table Primary Key

If there are multiple primary keys, separate them with commas.

-k product_id


-c

Dimension Table Fields and Field Types

Note to add quotes. The format is field_name field_type, separated by commas

-c 'product_id STRING, product_manufacturer STRING '

The supported data types are BIGINT, BINARY, BOOLEAN, INT, DATE, DECIMAL, REAL, FLOAT, INTEGER, SMALLINT, STRING, TINYINT, VARCHAR, TIMESTAMP

Execution instance reference:

scaadmin dimension_table create \ -p default \ -t product_info \ -k product_id \ -c "product_id STRING, product_manufacturer STRING"
CODE

After the execution is completed, a dimension table (kudu table) will be created. The name of the dimension table can be obtained from the return value, for example: dimension_table_default.product_info. If the execution fails, it may be because the current version does not support this feature. Please contact Sensertec technical support for confirmation.

Note: The dimension table here must be stored in the Kudu or HDFS Parquet file format in order to support all features.

To delete a dimension table, use:

scaadmin  dimension_table drop -t <表名> -p <项目名> 例如:scaadmin  dimension_table drop -t product_info -p default //默认如果存在表关联关系则不支持删除,如果需要强行删除,可加 --skip_check 参数,加上 --skip_check 参数之后,即使表存在关联关系,也仍然强制执行删除。
CODE

To list the dimension tables, use:

scaadmin  dimension_table list -p <项目名> 
CODE


2. After successfully creating the dimension table, such as dimension_table_default.product_info (dimension_table_default is the database name, product_table is the table name), you can view and add data using the following methods:

Use imapla-shell

impala-shell
CODE

View the created dimension table

use dimension_table_default; show tables;
CODE

You can see the product_info table we have created. Then, we need to prepare the data for this dimension table, which should usually be imported from other business databases or data warehouses. You can import SQL files using impala-shell or JDBC, and use various methods. For example, we can directly insert a few pieces of data:

INSERT INTO dimension_table_default.product_info VALUES ('124', 'Xiaomi'), ('123', 'Apple');
SQL

If you need to insert a large amount of data, it is recommended to use batch file import. First, create a text format table and specify the delimiter, for example:

CREATE TABLE dimensions.raw_csv_product_info ( 	product_id STRING, 	product_manufacturer STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED by ',' LOCATION '/tmp/raw_csv_product_info/';
SQL

Then upload the comma-separated text data file that has been prepared:

hdfs dfs -put data.csv /tmp/raw_csv_product_info/
BASH

Refresh the CSV table and execute INSERT to import the data into Kudu.

REFRESH dimensions.raw_csv_product_info; INSERT INTO dimension_table_default.product_info SELECT * FROM dimensions.raw_csv_product_info;
SQL

3. After preparing the data for the dimension table, add the dimension table to the Sensors system

scaadmin external_view external_dimension_table add \ -p default \ -t dimension_table_default.product_info \ -e 'events.product_id = dimension_table_default.product_info.product_id'
BASH

Where -p is the project name of the Sensertec system, -t parameter is the full name of the dimension table, and -e parameter represents the relationship between the dimension table and the event table (events), that is, the JOIN condition in SQL.

4. After defining the dimension table, we can add the specific fields of the dimension table (product_info) as virtual attributes to the Sensertec system

scaadmin external_view external_property add \ -p default \ -n product_manufacturer \ -c '产品制造商' \ -e dimension_table_default.product_info.product_manufacturer \ -t STRING
BASH

By doing this, when analyzing using pay_order in any analysis feature of the Sensertec system, the product_manufacturer attribute can be seen and used for any analysis work.

2.3. Using different association conditions for a dimension table

If the same dimension table needs to use different association conditions, aliases need to be used when adding a new dimension table. The specific way is to add #1, #2, #a, #b, etc. after the original table name. Note that the alias separator should use the number sign (#). For example:

scaadmin external_view external_dimension_table add \ -p default \ -t dimension_table_default.product_info#1 \ -e 'events.item_id = dimension_table_default.product_info#1.item_id'
BASH

After adding, you also need to use dimension_table_default.product_info#1 when referencing this table in other commands.

3. Limitations and Constraints

3.1. Query Performance

Due to the use of JOIN for associated dimension tables, although the query engine of Sensors Data has optimized this type of JOIN to a certain extent, there is still a significant performance decrease compared to directly using the original event properties. The specific performance depends on the size of the dimension table, JOIN conditions, and other factors. Therefore, we recommend not using dimension tables when the requirements can be met with event properties alone. At the same time, the number of rows in the dimension table should be within millions to minimize the additional performance impact of JOIN.

3.2. Cache Consistency

Up to now, the cache mechanism of Sensors Data is still based on the changes in event data, and this mechanism does not consider the impact of changes in the dimension table data. Therefore, if the data in the dimension table changes (such as Update or Insert), the query result may still use the old cached data. In this case, a forced refresh is required to obtain the correct result.

4. Common Issues

4.1. Invisible New Virtual Properties on the Page

After adding virtual properties, you may find that you cannot see the corresponding virtual properties in the attribute selection interface of certain events, but you can see them in custom queries. This is because by default, virtual properties are only bound to events that include the referenced properties.

For example, in Scenario 2, the virtual property "new_item_id" references "item_id" and "item_id_1", so the "new_item_id" property can only be seen in events that contain both of these properties.

Starting from version 1.14, you can specify a separate associative property when adding virtual properties, such as "item_name". In this way, the new "new_item_id" property can be seen in any event that contains "item_name". For example:

scaadmin external_view external_property add \ -p default \ -n new_item_id \ -c 'Item Id' \ -e "coalesce(events.item_id, events.item_id_1)" \ -r "item_name" \ -t STRING
BASH

Where -r is not filled in by default (In this case, virtual properties will only be bound to events that contain all referenced properties), when giving a specific attribute value after -r (Setting the associated attribute -r means that the property will be added to events that contain this attribute), when -r is empty (""); in this case, the property can be seen in all events.

4.2. Hiding Original Properties After Adding New Virtual Properties

For example, in the scenario of property merging, we merged "item_id" and "item_id_1" into "new_item_id". In this case, if you want to hide the old "item_id" and "item_id_1", you only need to hide them directly in metadata management. Hiding the referenced properties will not affect the normal operation of virtual properties.

4.3. How to Create a Date (DATETIME) Type of Virtual Property

When creating a virtual property with a time type, the expression (-e) used to create the virtual property should result in a bigint type timestamp, and then set the virtual property type (-t) as DATETIME type. For example, if there is an original numeric type attribute "oldtimestamp" that stores a timestamp, you can create a new datetime type attribute using virtual property for easy date comparison analysis. If the virtual property is set after uploading the dimension table, then the uploaded dimension table should already contain the timestamp.

scaadmin external_view external_property add \ -p default \ -n newdate \ -c 'newdate' \ -e "cast(events.oldtimestamp as bigint)" \ -t DATETIME
BASH

4.4. How to Create a Boolean (BOOL) Type of Virtual Property

When creating a virtual property with a Boolean type, the expression (-e) used to create the virtual property should result in a bigint type with a value of 0 or 1, and then set the virtual property type (-t) as BOOL type. For example, for all payment records, you can directly distinguish between "valid payments" and "invalid payments". "Invalid payments" refers to orders with a payment amount of 0 or null, while others are considered "valid payments".

scaadmin external_view external_property add \ -p default \ -n is_valueable_paid \ -c '是否有效支付' \ -e "cast((case when (events.ActualPaidAmount is not null or events.ActualPaidAmount=0) then '1' else '0' end) as bigint)" \ -t BOOL
BASH

4.5. How to Create a List Type of Virtual Property

The attribute type in the dimension table corresponds to the STRING type, and the virtual attribute is set to the LIST type. Example for creating virtual attributes of the LIST type:

Create dimension table, set list_name as STRING type:

CREATE DATABASE dimensions; CREATE TABLE dimensions.listceshi ( 	list_id STRING NOT NULL, 	list_name STRING NULL, 	... 	PRIMARY KEY (list_id) ) PARTITION BY HASH (product_id) PARTITIONS 3 STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='${kudu_master_host}:7051');
SQL

When passing values, use line break \n to separate multiple values of the LIST:

INSERT INTO dimensions.listceshi VALUES ('124', 'apple\nbanana\npeer'), ('123', 'apple\nbanana\nbeer')
SQL

After exiting Impala, establish the association between the dimension table and the events table using tools, and then create a LIST type virtual attribute:

scaadmin external_view external_property add \ -p default \ -n list_name \ -c '列表项目' \ -e dimensions.listceshi.list_name \ -t LIST
BASH

4.6. Test the association condition of the dimension table

Note: Available after version 1.14

For complex association conditions, it is recommended to first test them using impala-shell to ensure the accuracy of the results. JOIN syntax can be used directly, for example:

/*sa(test_project)*/ SELECT dimensions.product_info.product_manufacturer FROM events LEFT JOIN dimensions.product_info ON events.product_id = dimensions.product_info.product_id WHERE date = CURRENT_DATE() LIMIT 100
SQL

4.7. How to update virtual attributes/dimension tables

Update virtual attributes

scaadmin external_view external_property update \ -p default \ -n new_item_id \ -c 'Item Id' \ -e "coalesce(events.item_id, events.item_id_1)" \ -r "item_name" \ -t STRING
BASH

Where -p is the project name in the Sensors Analytics system, -n is the English name of the external attribute in the Sensors Analytics system, -c is the Chinese name, -e is the corresponding SQL expression, and -t indicates the data type.

Update dimension tables

scaadmin external_view external_dimension_table update \ -p default \ -t dimensions.product_info \ -e 'events.product_id = dimensions.product_info.product_id'
BASH

Where -p is the project name in the Sensors Analytics system, -t parameter is the complete name of the dimension table, -e parameter represents the association relationship between the dimension table and the event table (events), that is, the condition for JOIN in SQL.

4.8. How to delete virtual attributes/dimension tables

Delete virtual attributes

scaadmin external_view external_property remove \ -p default \ -n new_item_id 
BASH

Where -p is the project name in the Sensors Analytics system, -n is the English name of the external attribute in the Sensors Analytics system.

Delete dimension tables and the association relationship with the events table

Note

Before deleting the association between the dimension table and the events table, please delete the virtual attributes created using the dimension table.

You can use the following expression to directly view the virtual attributes of the associated dimension table in the server: scaadmin external_view external_property list -p default

scaadmin external_view external_dimension_table remove \ -p default \ -t dimensions.product_info
BASH

Where -p is the project name in the Sensors Analytics system, -t parameter is the complete name of the dimension table.