Recent years have seen the emergence of self-service and data visualization vendors such as Tableau, Power BI and Qlik Sense that has forced enterprise BI players such as MicroStrategy to improvise their tools to compete with these vendors. I have been a solutions architect in the field of BI been using MicroStrategy for more than 10 years. From my perspective as an architect, I appreciate the power of the underlying SQL engine in MicroStrategy and do not foresee self-service vendors replacing MicroStrategy as an enterprise BI tool in the coming years. What thrills me most as a Solutions Architect in Microstrategy is implementing workaround solutions to meet complex reporting requirements. In other words, I am building additional logic to change the default behavior of the MicroStrategy SQL engine. One example that comes to mind is the effective use of Cartesian joins of dimension tables in certain advanced reporting scenarios. As per conventional best practices, it is not recommended to use Cartesian joins in MicroStrategy. However, there are certain business cases where the execution of Cartesian joins is required. In this blog, I have illustrated the use of Cartesian joins of dimension tables using two scenarios.
In this scenario, I used a public data set of construction spending data across educational and religious sectors for the year 2016. I massaged this data set further by deleting construction spending data of religious sector for the month May 2016. Shown below is a simple bar chart visualization displaying this scenario.
As seen in this graph, since there is no data in religious sector for May 2016, the bars are more spread apart in comparison with the educational sector data. Moreover, reference line calculations such mean, median ignore the fact that there was no construction spending data in the religious sector for May 2016, thereby misleading the audience. This issue becomes evident if the bar chart is converted to a line graph.
Using this graph, we may wrongly conclude that construction spending was consistently high during the middle of the year and slows down during the start and end of year.
This issue can be resolved by entering dummy data in the fact table to ensure all months appear across any other dimension during time series analysis. Nevertheless, not all organizations provide flexibility to manually enter fictitious data in the data warehouse to satisfy certain reporting scenarios. In such cases we may have to implement workaround solutions within the reporting layer to handle these complex issues.
In order to resolve this issue using MicroStrategy, I enabled the VLDB property- ‘Cartesian Join Warning’ to ‘Execute’. I also created an additional dummy metric that serves the same purpose as introduction of dummy data within the data warehouse. The sql statement for this metric is shown below.
CREATE TEMPORARY TABLE ZZMD00 as
select a12.month_key month_key,
a11.sector_key sector_key,
sum(a11.spend_amt) WJXBFS1
from f_construction_spending a11
join d_date a12
on (a11.calendar_date = a12.calendar_date)
group by a12.month_key,
a11.sector_key
CREATE TEMPORARY TABLE ZZMD01 as
select a11.month_key month_key,
a12.sector_key sector_key,
max(a11.month_key) WJXBFS1
from d_month a11
cross join d_sector a12
where a11.month_key between 201601 and 201612
group by a11.month_key,
a12.sector_key
select coalesce(pa11.month_key, pa12.month_key) month_key,
max(a14.month_short_desc) month_short_desc,
coalesce(pa11.sector_key, pa12.sector_key) sector_key,
max(a13.category) category,
max(pa11.WJXBFS1) WJXBFS1,
max(pa12.WJXBFS1) WJXBFS2
from ZZMD00 pa11
full outer join ZZMD01 pa12
on (pa11.sector_key = pa12.sector_key and
pa11.month_key = pa12.month_key)
join d_sector a13
on (coalesce(pa11.sector_key, pa12.sector_key) = a13.sector_key)
join d_month a14
on (coalesce(pa11.month_key, pa12.month_key) = a14.month_key)
group by coalesce(pa11.month_key, pa12.month_key),
coalesce(pa11.sector_key, pa12.sector_key)
drop table ZZMD00
drop table ZZMD01
This highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sector with an additional filter restricting data to 2016. This sql pass thereby ensures there is a row for all months of 2016 for all sectors irrespective of the actual construction spending data.
The revised bar chart and line chart with the introduction of this metric is shown below.
As seen in the graphs, the revised average line considers the month of May 2016, dropping the average from 311 to 285.
For this scenario, let us consider a hypothetical example of ‘Sales Commissions data’. A sales man ‘TOM’ belongs to the ‘CENTRAL’ region. On Jan 15, 2017, he moved from ‘CENTRAL’ region to ‘SOUTH EAST’ region. The dimension data corresponding to ‘Tom’ is shown below.
As seen above, Type 2 strategy has been used to maintain history for the ‘Sales Rep’ dimension.
The Commissions fact data for this sales rep is shown below
A simple report to show distribution of commissions across regions has been shown below.
Let us make this scenario a bit more complicated. What if the requirement for the month-end report is to allocate all commissions for each month to the region where the sales rep was assigned at the end of the month? To satisfy this requirement, I had to implement a workaround solution in which execution of Cartesian Joins was enabled. In this solution, the month-end region had to be created as a metric. An alias table for the Type 2 dimension was also created. The sql is shown below.
CREATE TEMPORARY TABLE ZZMD00 as
select a11.sls_rep_id sls_rep_id,
a12.month_key month_key,
sum(a11.comm_amt) WJXBFS1
from f_commissions a11
join d_date a12
on (a11.earnings_date = a12.calendar_date)
where a12.month_key = 201701
group by a11.sls_rep_id,
a12.month_key
CREATE TEMPORARY TABLE ZZMD01 as
select a11.sls_rep_id sls_rep_id,
a12.month_key month_key,
max(a11.sls_rep_nm) WJXBFS1,
max(a11.sls_rep_region) WJXBFS2
from d_salesrep a11
cross join d_month a12
where (((a11.sls_rep_key)
in (select r12.sls_rep_key
from d_month r11
cross join d_salesrep r12
where r11.month_end_date between r12.effective_start_dttm and r12.effective_end_dttm
group by r12.sls_rep_key))
and a12.month_key = 201701)
group by a11.sls_rep_id,
a12.month_key
select coalesce(pa11.sls_rep_id, pa12.sls_rep_id) sls_rep_id,
coalesce(pa11.month_key, pa12.month_key) month_key,
a14.month_short_desc month_short_desc,
pa11.WJXBFS1 WJXBFS1,
pa12.WJXBFS1 WJXBFS2,
pa12.WJXBFS2 WJXBFS3
from ZZMD00 pa11
full outer join ZZMD01 pa12
on (pa11.sls_rep_id = pa12.sls_rep_id and
pa11.month_key = pa12.month_key)
join d_month a14
on (coalesce(pa11.month_key, pa12.month_key) = a14.month_key)
drop table ZZMD00
drop table ZZMD01
The highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sales Rep identifying all sales rep versions that are active for the date- ‘201701’.
The revised report results is shown below.
As seen in both the scenarios, execution of Cartesian joins can prove to be very useful to meet complex reporting scenarios. What was used in both these scenarios was essentially a non-equi join with additional where clauses. It is recommended though that enabling execution of Cartesian joins be enabled only for certain complex reports. Enabling this setting at an overall project level can result in incorrect results and performance degradation especially when the Cartesian joins are implemented without any filters.