DAX Engines – The Formula Engine and The Storage Engine

0
57
dax-engines
It is not that difficult as it seems!

In this blog, we are going to discuss DAX engines. But first, let us gain some knowledge about the DAX.

Data Analysis Expressions is a formula expression language used in Power Pivot in Excel, Power BI, and Analysis Services. It performs advanced calculations and queries on data by including functions, operators, and values.

DAX Engines

The DAX is powered by two internal engines. The formula engine and the storage engine. By collaborating, these two engines evaluate DAX queries and combine and compress raw data.

It is critical to understand the distinction between these two engines. Each of them is designed for different purposes. Appropriately using the engine and your knowledge will provide you with faster and more accurate results. Let’s study and compare these engines based on their different qualities.

The following table contains all the differences between the formula engine and the storage engine on a different basis.

Formula Engine vs Storage Engine

Basis The Formula EngineThe Storage Engine
DefinitionThe higher-level execution unit of the query engine is known as the formula engine.The engine that is in charge of data retrieval is known as the storage engine.
NameThe formula engine is called DirectQuery.The storage engine is called VertiPaq (also called xVelocity)
Expression TypeThe formula engine is a complex expression.The storage engine is a simple expression.
Data UsedIt uses an uncompressed data cache from SE.It scans compressed data in the memory.
MultithreadedNo. It is single-threaded.Yes. It contains one thread per segment, per column.
Cached Results No, it does not give any cached results.Yes, it does give cached results.
OptimizationThe formula engine is optimized for complexity.The storage engine is optimized for speed.
SpeedThe formula engine is slow.The storage engine is fast.

 

From the table presented above, the following points can be concluded.

  1. The storage engine handles simple requests. As a result, it is faster than the formula engine. Also, a storage engine is capable of doing many tasks by itself. For example, solving simple queries that require SUM, MIN, or MAX.
  1. The formula engine can handle very complex functions. The number of formula engines in your expression depends on the complexity of your query. You will have a large number of formula engines in your expression if you are dealing with a very complex query.
  1. The storage engine uses a compressed data cache. On the other hand, the formula engine uses an uncompressed data cache from the storage engine. Compressed data is easy to manage, whereas uncompressed data takes a lot of time and memory.
  1. The storage engine is faster than the formula engine. The reason behind this is that the storage engine uses more than one core at a time. In other words, the storage engine is a multi-threaded engine. It is capable of scanning the table segments based on the number of cores one has. Moreover, the formula engine can only do one core at a time.
  1. The answer produced by your measure using the storage engine gets stored in the memory. This is not the case with the formula engine. The answer produced by your measure using the formula engine does not get stored in the memory.
  1. The formula engine processes the request for a query plan and then generates and executes one. On the other hand, the storage engine answers the request made by the formula engine by retrieving data from the tabular model.
  1. The storage engine periodically refreshes the copy of data present in the memory from the data source. On the other hand, the formula engine directly forwards every requested query to the original data source.

This was all about the differences between the formula engine and the storage engine. Let us now discuss their tasks individually.

Tasks of the formula engine

The formula engine is known as DirectQuery. It basically converts the DAX query into a query plan. For this, it has to execute many physical steps. The Formula engine executes specific operations in such a way that they all have a corresponding step in the query plan.

Filtering with complex conditions, joins between the tables, aggregation, and lookups are some typical operators of the formula engine. These operators sometimes require data from the columns in the data model. In situations like these, the formula engine sends a request to the storage engine. The storage engine further provides an answer by returning a data cache.

Additional information

The storage engine creates a temporary storage area known as the datacache. They contain the result of the storage engine query. Datacaches are not compressed. They are plain in-memory tables. Regardless of the storage engine that they come from, datacaches are stored in an uncompressed format.

As mentioned in the above table, the formula engine is single-threaded. Hence, regardless of the number of cores available, any operation in the formula engine only uses a single thread or a single core. Also, the formula engine only sends a single query at a time to the storage engine. There is a certain degree of parallelism available for each request made to the storage engine. It can take advantage of multiple cores at a time as it has a different architecture.

Tasks of the storage engine

The storage engine is known as VertiPaq. It is also referred to as xVelocity in the Power Pivot. However, VertiPaq is the commonly used name for the storage engine.

There are two main goals of the storage engine. First, it is subjected to a scan of the database. Second, it has to produce datacaches that are required by the formula engine.

Surprisingly, the storage engine is independent of DAX. The storage engine allows its own set of operators to execute queries exclusively. The set of operators in the storage engine might range from very limited to very rich. However, it completely depends on the kind of storage engine used.

Every table in a model can have a different storage engine technology depending upon the mode of datasets.

  • Import: The content of the table is stored by the VertiPaq engine.
  • DirectQuery: The content of the table is read from the data source at query time, and it is not stored in memory during data refresh.
  • Dual: During data refresh, the table is loaded in memory by the VertiPaq engine. But at query time, the table may also be read in DirectQuery mode with the most up-to-date information.

The storage engine might also feature a parallel implementation. As discussed in one of the above paragraphs, the storage engine receives a request from the formula engine. It receives sequential requests and sends them synchronously. As a result, the formula engine has to wait for the storage engine to finish the query before sending another one. Thus, the parallelism in the storage engine might tend to reduce due to a lack of parallelism in the formula engine.

Conclusion

This was all about the storage engine and the formula engine. We don’t think that we need to answer which one is better. It’s quite clear from the above information. What are your thoughts? Let us know in the comment section.

Looking For DAX Training for you or your employees? Kindly Connect at partner@edu4sure.com or call us at +91-9555115533.

Previous articleHow To Create Quality Content?
Next articleTop 21 Differences Between Measure & Calculated Column
A learner, dreamer, and passionate about modern tools and technology like Excel 365, Power BI, SQL, HR Analytics, Six Sigma, WordPress and Visuals to name a few. Total Experience 15+ years including more than a decade of experience in S&P Global and ongoing 4+ years of experience with fast-growing startups & few MNCs. Clients served: Startups, MSMEs to a few of the Big 4 Consulting firms. Trained more than 2500 professionals in the last 4 years on various skills (Analytics, Content, WordPress, Finance, Entrepreneurship, etc). Last but the most important - mother of an awesome kid!

LEAVE A REPLY

Please enter your comment!
Please enter your name here