Here are 150 public repositories matching the topic "datalake"
Repository Created on January 19, 2019, 6:38 am
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Last updated on February 7, 2023, 4:20 pm
Repository Created on June 9, 2021, 7:11 am
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and Data Lake.
Last updated on February 7, 2023, 7:35 am
Repository Created on December 14, 2016, 3:53 pm
Upserts, Deletes And Incremental Processing on Big Data.
Last updated on February 7, 2023, 2:24 am
Repository Created on September 12, 2019, 11:46 am
lakeFS - Data version control for your data lake | Git for data
Last updated on February 7, 2023, 12:31 pm
Repository Created on July 14, 2022, 1:54 am
Arctic is a streaming lake warehouse service open sourced by NetEase
Last updated on February 7, 2023, 12:02 pm
Repository Created on August 9, 2019, 6:17 am
Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai
Last updated on February 7, 2023, 10:52 am
Repository Created on October 8, 2020, 6:49 pm
A Data Platform built for AWS, powered by Kubernetes.
Last updated on January 10, 2023, 4:14 pm
Repository Created on November 2, 2022, 8:59 pm
Repository for uploading code challenges and notes throughout the bootcamp
Last updated on February 6, 2023, 12:22 pm
Repository Created on September 13, 2022, 8:18 am
Poor man's simple Python API for creating a local or remote datalake based on several (pyarrow) datasets using DuckDB
Last updated on November 28, 2022, 7:55 pm
Repository Created on May 19, 2020, 3:54 am
Ballerina Websub module.
Last updated on January 31, 2023, 6:12 pm
Repository Created on December 28, 2021, 8:53 am
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Last updated on February 7, 2023, 3:29 am
Repository Created on November 17, 2022, 1:48 pm
Second data bootcamp of the Blue Edtech course
Last updated on January 27, 2023, 4:42 pm
Repository Created on January 29, 2023, 7:16 pm
Companion to my LinkedIn Learning 'Serverless Architecture' course
Last updated on February 5, 2023, 1:36 am
Repository Created on October 24, 2018, 2:12 pm
Dynamic Conformance Engine
Last updated on February 1, 2023, 7:25 pm
Repository Created on August 3, 2022, 2:17 am
Self-managed third-party dependencies for Apache Doris
Last updated on February 2, 2023, 3:17 am
Repository Created on August 17, 2021, 4:01 am
On-demand database deployment of various kinds; adding more as I use them!
Last updated on November 18, 2022, 2:49 pm
Repository Created on September 27, 2019, 11:13 am
A free-to-use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Last updated on February 3, 2023, 1:32 pm
Repository Created on August 25, 2021, 10:15 am
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Last updated on February 4, 2023, 8:38 am
Repository Created on October 7, 2022, 1:10 pm
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Last updated on December 13, 2022, 4:47 pm
Repository Created on December 28, 2022, 5:14 pm
Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.
Last updated on January 20, 2023, 7:12 pm
Repository Created on December 25, 2022, 9:48 am
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
Last updated on January 4, 2023, 3:40 pm
Repository Created on October 11, 2022, 1:59 pm
Collection of data on Formula One Racing
Last updated on December 22, 2022, 6:26 am
Repository Created on March 21, 2022, 9:47 pm
Data Pipeline from the Global Historical Climatology Network DataSet
Last updated on January 31, 2023, 4:07 pm
Repository Created on March 25, 2021, 9:23 am
Streaming application development and management system, based on Linkis and DSS, which plans to provide workflow-like graphical drag-and-drop development capability.
Last updated on January 31, 2023, 1:16 am
Repository Created on April 21, 2022, 7:45 pm
The project aims to develop a traveller insight dashboard that can help Swiss Online Travel Agencies (OTAs) improve conversion across the traveller's digital decision-making process
Last updated on October 19, 2022, 7:11 pm