What is Microsoft Fabric? A big tech stack for big data

Martin Heller is a contributing editor and reviewer for InfoWorld. Formerly a web and Windows programming consultant, he developed databases, software, and websites from 1986 to 2010. More recently, he has served as VP of technology and education at Alpha Software and chairman and CEO at Tubifi.

The Fabric Real-Time Analytics sample gallery currently offers half a dozen examples, with data sizes ranging from 60 MB for weather analytics to almost 1 GB for New York taxi rides. Real-Time Analytics and Azure Data Explorer use Kusto Query Language (KQL) databases and queries. Querying data in Kusto is much faster than querying a transactional RDBMS such as SQL Server, especially when the data grows to billions of rows.

The total run time for the fitting and predictions was 147 seconds, not quite three minutes. Most of what I discussed in the OneLake section above actually falls under data engineering. Data Engineering in Microsoft Fabric includes the lakehouse, Apache Spark job definitions, notebooks (in Python, R, Scala, and SQL), and data pipelines (discussed in the Data Factory section above). Getting from there to having tables in the lakehouse can (currently) be more work than you might expect. You would think that the Load to Tables pop-up menu item would do the job, but it failed for my initial tests.


I eventually discovered, with help from Microsoft Support, that the Load to Tables function doesn’t (as of this writing) know how to handle column titles with embedded spaces. All the competing lakehouses handle that without a hitch, but Fabric is still in preview. I am assured that this capability will be added in the released product. Microsoft Fabric is an end-to-end, software-as-a-service (SaaS) platform for data analytics. It is built around a data lake called OneLake, and brings together new and existing components from Microsoft Power BI, Azure Synapse, and Azure Data Factory into a single integrated environment.
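One way to work around the embedded-spaces limitation described above is to rewrite the header row of each CSV file before uploading it. The following is a minimal sketch using only the Python standard library. The function names (`clean_column_name`, `clean_csv_header`), the underscore convention, and the sample columns are assumptions made for illustration; they are not part of Fabric or of the workflow Microsoft Support recommended.

```python
import csv
import io
import re


def clean_column_name(name: str) -> str:
    """Trim the name and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


def clean_csv_header(src, dst) -> list[str]:
    """Copy CSV rows from src to dst, rewriting only the header row.

    src and dst are text-mode file-like objects. Returns the new header.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = [clean_column_name(col) for col in next(reader)]
    writer.writerow(header)
    writer.writerows(reader)  # data rows pass through unchanged
    return header


# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO("Trip Distance,Fare Amount,Pickup Zone\n1.2,7.5,Midtown\n")
cleaned = io.StringIO()
new_header = clean_csv_header(sample, cleaned)
print(new_header)
```

For files on disk, the same function can be called with two handles opened via `open(path, newline="")`, which is how the `csv` module expects text files to be opened.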


Note the messages about the automatically created Power BI dataset at the top. Spark isn’t the only way to run SQL queries against the lakehouse tables. You can access any Delta-format table on OneLake via a SQL endpoint, which is created automatically when you deploy the lakehouse.

  • When you select a file, you get a three-dot menu for performing operations on that file, for example loading it into a table.
  • Data Science in Microsoft Fabric includes machine learning models, experiments, and notebooks.
  • Microsoft Fabric Data Engineering combines Apache Spark with Data Factory, allowing notebooks and Spark jobs to be scheduled and orchestrated.
  • That turns out to be another taxi trip dataset (from a different year), but this time factored into warehouse tables.
  • Here we are using Spark SQL to display the contents of a OneLake lakehouse table.

As you’ll see later, OneLake can host Synapse Data Warehouses as well as lakehouses. Data warehouses are best for users with T-SQL skills, although Spark users can also read data in warehouses. You can create shortcuts in OneLake so that lakehouses and data warehouses can access tables without duplicating data. Microsoft Fabric encompasses data movement, data storage, data engineering, data integration, data science, real-time analytics, and business intelligence, along with data security, governance, and compliance. In many ways, Fabric is Microsoft’s answer to Google Cloud Dataplex. I created a new warehouse and loaded it with Microsoft-provided sample data.


As a temporary workaround, you can copy your on-prem data to the cloud and load it from there. Here we are using Spark SQL to display the contents of a OneLake lakehouse table. I did get that conversion to work with cleaned-up CSV files. I was also able to run a Spark SQL query in a notebook against a new table. Unfortunately, Photoshop has the well-earned reputation of not only having a lot of power, but also being a bear to learn. Whether Fabric will develop a similar reputation remains to be seen.

OneLake is built on Azure Data Lake Storage (ADLS) Gen2 and can support any type of file. Whether the data was generated by Spark or SQL, it still goes into a single data lake in Delta format. Data Science in Microsoft Fabric includes machine learning models, experiments, and notebooks. I chose to run the time series forecasting model sample, which uses Python, the Prophet library (from Facebook), MLflow, and the Fabric Autologging feature.
