Here are 7,532 public repositories matching the topic "spark"
Repository Created on December 5, 2022, 4:32 pm
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Last updated on October 2, 2023, 4:38 am
Repository Created on May 14, 2020, 10:56 pm
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Last updated on October 1, 2023, 10:03 am
Repository Created on September 1, 2022, 10:40 am
Gateway into the John Snow Labs Ecosystem
Last updated on September 20, 2023, 7:28 am
Repository Created on March 3, 2014, 4:08 pm
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Last updated on October 2, 2023, 12:55 pm
Repository Created on December 2, 2014, 8:50 pm
Official code repository for GATK versions 4 and up
Last updated on October 2, 2023, 2:12 pm
Repository Created on January 22, 2023, 4:33 pm
This repo contains the Data Engineering exercises I completed on DataCamp.
Last updated on May 10, 2023, 4:56 pm
Repository Created on June 2, 2016, 8:23 am
Some personal code snippets for learning new programming skills
Last updated on August 1, 2023, 3:00 am
Repository Created on July 20, 2015, 10:15 pm
Compile-time Language Integrated Queries for Scala
Last updated on October 2, 2023, 9:02 am
Repository Created on February 25, 2014, 8:00 am
Apache Spark - A unified analytics engine for large-scale data processing
Last updated on October 2, 2023, 12:45 pm
Repository Created on August 28, 2015, 8:09 pm
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Last updated on September 26, 2023, 11:08 pm
Repository Created on March 1, 2021, 5:37 pm
Quill for Scala 3
Last updated on September 25, 2023, 2:05 pm
Repository Created on August 28, 2023, 2:01 pm
Data Lake project using AWS Services for a Data Engineering bootcamp
Last updated on October 2, 2023, 11:29 am
Repository Created on September 18, 2021, 5:20 am
Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing
Last updated on September 15, 2023, 5:00 pm
Repository Created on March 3, 2016, 4:01 pm
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Last updated on September 29, 2023, 3:40 pm
Repository Created on August 15, 2022, 3:26 pm
spark - unified analytics engine for large-scale data processing
Last updated on October 1, 2023, 12:22 pm
Repository Created on May 16, 2022, 10:11 pm
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Last updated on October 2, 2023, 10:16 am
Repository Created on September 22, 2023, 7:17 am
Spark vs. MongoDB Atlas
Last updated on October 2, 2023, 9:55 am
Repository Created on December 15, 2017, 3:45 am
Spark data source for Cognite Data Fusion
Last updated on February 3, 2022, 12:41 am
Repository Created on April 9, 2020, 6:39 pm
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Last updated on September 27, 2023, 2:22 pm
Repository Created on August 10, 2017, 12:13 pm
Apache Doris is an easy-to-use, high performance and unified analytics database.
Last updated on October 2, 2023, 1:13 pm
Repository Created on October 13, 2021, 7:10 am
REST API for Apache Spark on K8S or YARN
Last updated on September 6, 2023, 10:06 am
Repository Created on April 22, 2019, 6:56 pm
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Last updated on October 2, 2023, 2:49 pm
Repository Created on September 21, 2023, 3:00 am
A real-time cryptocurrency data streaming pipeline.
Last updated on September 26, 2023, 3:54 pm
Repository Created on November 30, 2019, 12:02 pm
Big data study: learn big data from scratch, with learning videos and interview materials for each stage of big data study
Last updated on October 2, 2023, 7:05 am
Repository Created on October 11, 2019, 4:14 pm
Vijayaraaghavan Manoharan Website
Last updated on July 22, 2023, 3:21 pm
Repository Created on August 4, 2015, 6:32 am
Code Library for My Blog
Last updated on September 26, 2023, 7:51 am
Repository Created on December 6, 2022, 5:41 pm
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Last updated on September 8, 2023, 7:11 pm