Central Services Portal is in Beta. We need your feedback to ensure the best possible user experience. Any feedback you provide will help us help you! Please use the feedback tool at the bottom of any page.

Query Fabric

What is it?

Query Fabric is a query engine that allows you to access data across multiple systems in a fast, uniform, and SQL-compliant manner. It is a distributed and highly parallel platform that enables querying at scale. Query Fabric also simplifies data formats across multiple technologies to easily analyze data with ANSI SQL.

The platform is built to support the data analyst community running ad-hoc queries within the dx Data Foundry. Specifically, Query Fabric is connected to data stored in the NDW, the dx Cloud Data Lake (AWS S3), and the dx On-Prem Data Lake (MinIO), and it supports other Comcast data technologies such as SQL Server, MySQL, PostgreSQL, and MongoDB.

Query Fabric is powered by Trino. Trino is the underlying query engine (driver) that allows users to query data where it lives. A single query can combine data from multiple sources, allowing for analytics across the entire organization. Trino does not store data; it makes accessing data stored elsewhere faster and easier. It is designed to handle data warehousing and analytics: data analysis, aggregating substantial amounts of data, and producing reports. Trino’s speed is a result of in-memory cluster computing. When running a query, Trino automatically breaks it into several smaller jobs, sends those smaller pieces to workers, and then retrieves and reassembles the results without storing the data on a disk.
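The split, distribute, and reassemble pattern described above can be illustrated with a toy sketch. This is not Trino's actual implementation; the function names, the thread pool standing in for workers, and the choice of an average as the aggregate are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_splits):
    """Divide rows into roughly equal chunks, one per hypothetical worker."""
    return [rows[i::n_splits] for i in range(n_splits)]


def partial_sum_count(chunk):
    """Work done by one worker: a partial aggregate over its own chunk."""
    return sum(chunk), len(chunk)


def distributed_average(rows, n_workers=4):
    """Scatter chunks to workers, then merge the partial results in memory."""
    chunks = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # Nothing is written to disk: partial results are combined directly.
    return total / count if count else None


values = list(range(1, 101))
result = distributed_average(values)
```

Each worker only sees its own chunk and returns a small partial result, which is why this style of execution scales well for aggregations but depends on the intermediate data fitting in memory.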

Who is it for?

Query Fabric offers a simplified, single endpoint to access and join data across multiple systems. It connects easily to tools like DBeaver, Tableau, and Python. The platform removes the need to copy data across systems for analysis. Additionally, Query Fabric uses ANSI SQL syntax, which reduces the need to know the multiple flavors of SQL specific to each underlying technology. Finally, Query Fabric is backed by 24/7 operational support if issues are encountered.
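As a rough sketch of what a cross-system join from Python might look like, the example below uses the open-source `trino` Python client (`pip install trino`). The host, user, catalog names (`mysql_catalog`, `datalake_catalog`), schemas, and tables are all hypothetical placeholders, and a real Query Fabric connection will most likely need authentication settings that are not shown here.

```python
def build_cross_source_query(limit=10):
    """Build a single ANSI SQL query joining two hypothetical catalogs."""
    return (
        "SELECT c.customer_id, c.region, COUNT(*) AS order_count\n"
        "FROM mysql_catalog.sales.orders AS o\n"          # hypothetical MySQL source
        "JOIN datalake_catalog.crm.customers AS c\n"      # hypothetical data lake source
        "  ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.region\n"
        "ORDER BY order_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql, host, user, port=443):
    """Run a query through the Trino DB-API client and return all rows."""
    # Imported here so the query builder works without the client installed.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user, http_scheme="https")
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


query = build_cross_source_query(limit=5)
# rows = run_query(query, host="query-fabric.example.com", user="analyst")  # placeholder endpoint
```

The point of the sketch is that both sources are addressed in one statement through fully qualified `catalog.schema.table` names, so no data has to be copied between systems before joining.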

How to use it?

Query Fabric is intended for low-to-medium complexity ad-hoc queries that run in a late stage of a data pipeline. All data within the dx Data Foundry are automatically available via Query Fabric.

When shouldn’t I use Query Fabric?

You should not use Query Fabric as an ETL tool or with overly complex queries. If unsure, please get in touch with the Query Fabric team.

Considerations When Using Query Fabric

Query Fabric is not a general-purpose relational database. It does not store data. Query Fabric does not replace databases like MySQL, PostgreSQL, or Oracle.

Query Fabric is designed for analytical queries, not OLTP (Online Transaction Processing) type use cases. It is intended for analytical use cases where queries are run in series or in small batches and where a response time of several seconds to minutes is acceptable.

Queries that involve large tables or datasets (e.g., several hundred GB) may fail with an “out of memory” error, because Query Fabric keeps data in memory to maximize performance. To avoid this, bucket or partition the underlying tables, and filter on the partition columns, so that each query only has to read a fraction of the data.
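A minimal sketch of the partitioning and bucketing idea follows, assuming the table lives in a data lake exposed through Trino's Hive connector. The catalog, schema, table, and column names are hypothetical; `partitioned_by`, `bucketed_by`, and `bucket_count` are Hive connector table properties, and whether they apply depends on how the catalog is configured.

```python
from datetime import date

# Hypothetical table definition: partitioned by day, bucketed by account.
# In the Hive connector, partition columns must come last in the column list.
CREATE_TABLE_SQL = """
CREATE TABLE datalake_catalog.analytics.events (
    account_id BIGINT,
    event_type VARCHAR,
    event_date DATE
)
WITH (
    format = 'PARQUET',
    partitioned_by = ARRAY['event_date'],
    bucketed_by = ARRAY['account_id'],
    bucket_count = 32
)
"""


def daily_event_counts(start_date, end_date):
    """Build a query restricted to a date range on the partition column."""
    # Validate the inputs as ISO dates; raises ValueError on bad input.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return (
        "SELECT event_date, COUNT(*) AS events\n"
        "FROM datalake_catalog.analytics.events\n"
        f"WHERE event_date BETWEEN DATE '{start}' AND DATE '{end}'\n"
        "GROUP BY event_date"
    )


pruned_query = daily_event_counts("2024-01-01", "2024-01-07")
```

Filtering on `event_date` lets the engine skip partitions outside the requested range, which keeps the amount of data held in memory far below the full table size.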

If the target system sits on a slow or high-latency network, Query Fabric may overwhelm that target with data. This could cause a buffer overflow, network congestion, incomplete results, or other undesirable effects.

For queries that join several large tables, it is recommended to run them directly on the source system if possible.

It is recommended that Hive views be queried directly when possible. If necessary, a Query Fabric view can be created that mimics the behavior of a Hive view. For more information on creating Query Fabric views, visit Presto View Creation.

Query Fabric can access data from systems primarily intended for OLTP use cases for the sake of reporting and other use cases that are less sensitive to response times. For more information, visit this link.
