Here are 150 public repositories matching the topic "datalake"
Repository
Created on January 19, 2019, 6:38 am
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Last updated on February 7, 2023, 4:20 pm
Repository
Created on June 9, 2021, 7:11 am
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and Data Lake systems.
Last updated on February 7, 2023, 7:35 am
Repository
Created on December 14, 2016, 3:53 pm
hudi
apachehudi
datalake
bigdata
apachespark
incremental-processing
stream-processing
data-integration
apacheflink
Upserts, Deletes And Incremental Processing on Big Data.
Last updated on February 7, 2023, 2:24 am
Repository
Created on September 21, 2018, 3:24 pm
Apache Doris Website
Last updated on January 31, 2023, 5:05 pm
Repository
Created on September 12, 2019, 11:46 am
data-engineering
data-versioning
go
object-storage
data-lake
aws-s3
data-quality
azure-blob-storage
google-cloud-storage
golang
lakeFS - Data version control for your data lake | Git for data
Last updated on February 7, 2023, 12:31 pm
Repository
Created on July 14, 2022, 1:54 am
Arctic is a streaming lake warehouse service open sourced by NetEase
Last updated on February 7, 2023, 12:02 pm
Repository
Created on August 9, 2019, 6:17 am
datasets
deep-learning
machine-learning
data-science
pytorch
tensorflow
data-version-control
python
ai
ml
Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai
Last updated on February 7, 2023, 10:52 am
Repository
Created on October 8, 2020, 6:49 pm
A Data Platform built for AWS, powered by Kubernetes.
Last updated on January 10, 2023, 4:14 pm
Repository
Created on November 2, 2022, 8:59 pm
Repository for uploading code challenges and notes over the course of the bootcamp
Last updated on February 6, 2023, 12:22 pm
Repository
Created on September 13, 2022, 8:18 am
Poor man's simple Python API for creating a local or remote data lake based on several (pyarrow) datasets using DuckDB
Last updated on November 28, 2022, 7:55 pm
Repository
Created on May 19, 2020, 3:54 am
Ballerina Websub module.
Last updated on January 31, 2023, 6:12 pm
Repository
Created on March 8, 2021, 3:59 am
Awesome list for data pipelines
Last updated on February 6, 2023, 2:40 am
Repository
Created on December 28, 2021, 8:53 am
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Last updated on February 7, 2023, 3:29 am
Repository
Created on December 11, 2019, 2:19 am
hudi
apachehudi
apache
hudi-resources
datalake
bigdata
stream-processing
incremental-processing
data-integration
A collection of Apache Hudi resources
Last updated on February 7, 2023, 8:22 am
Repository
Created on November 17, 2022, 1:48 pm
airflow
arima-model
data-science
datalake
pipeline
prophet
pyspark
sentiment-analysis
twitter-api
coronavirus
Second data bootcamp of the Blue Edtech course
Last updated on January 27, 2023, 4:42 pm
Repository
Created on January 29, 2023, 7:16 pm
Companion to my LinkedIn Learning 'Serverless Architecture' course
Last updated on February 5, 2023, 1:36 am
Repository
Created on October 24, 2018, 2:12 pm
Dynamic Conformance Engine
Last updated on February 1, 2023, 7:25 pm
Repository
Created on August 3, 2022, 2:17 am
Self-managed third-party dependencies for Apache Doris
Last updated on February 2, 2023, 3:17 am
Repository
Created on December 12, 2022, 9:41 am
Last updated on December 14, 2022, 2:14 pm
Repository
Created on August 17, 2021, 4:01 am
On-demand database deployment of various kinds; adding more as I use them!
Last updated on November 18, 2022, 2:49 pm
Repository
Created on September 27, 2019, 11:13 am
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Last updated on February 3, 2023, 1:32 pm
Repository
Created on August 25, 2021, 10:15 am
fuzzymatch
fuzzy-matching
deduplication
dedupe
masterdata
dataengineering
data-transformation
analytics-engineering
entity-resolution
identity-resolution
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Last updated on February 4, 2023, 8:38 am
Repository
Created on October 7, 2022, 1:10 pm
bigdata
bioinformatics
datalake
delta-lake
genomics-data
lakehouse
parquet
spark
spark-sql
datawarehousing
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Last updated on December 13, 2022, 4:47 pm
Repository
Created on June 8, 2020, 11:53 am
datalake
open-source
openstack-swift
apache
apache-airflow
database
software-development
software-design
architecture-design
datamanagement
Datalake
Last updated on February 3, 2023, 2:23 pm
Repository
Created on December 28, 2022, 5:14 pm
Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.
Last updated on January 20, 2023, 7:12 pm
Repository
Created on December 25, 2022, 9:48 am
apachespark
bigdata
bigdataproject
capstone-project
datacleaning
dataengineering
datalake
datamodeling
datapipeline
dataprocessing
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
Last updated on January 4, 2023, 3:40 pm
Repository
Created on October 11, 2022, 1:59 pm
Collection of data on Formula One Racing
Last updated on December 22, 2022, 6:26 am
Repository
Created on March 21, 2022, 9:47 pm
Data Pipeline from the Global Historical Climatology Network DataSet
Last updated on January 31, 2023, 4:07 pm
Repository
Created on March 25, 2021, 9:23 am
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Last updated on January 31, 2023, 1:16 am
Repository
Created on April 21, 2022, 7:45 pm
customer-insights
database
datalake
datawarehousing
python
travelling
instagram-api
data-visualization
The project aims to develop a traveller insight dashboard that can help Swiss Online Travel Agencies (OTAs) improve conversion across the traveller’s digital decision-making process
Last updated on October 19, 2022, 7:11 pm