Here are 7,532 public repositories matching the topic "spark".
Repository
Created on December 5, 2022, 4:32 pm
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Last updated on October 2, 2023, 4:38 am
Repository
Created on May 14, 2020, 10:56 pm
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Last updated on October 1, 2023, 10:03 am
Repository
Created on September 1, 2022, 10:40 am
Gateway into the John Snow Labs Ecosystem
Last updated on September 20, 2023, 7:28 am
Repository
Created on March 3, 2014, 4:08 pm
Topics: h2o, machine-learning, data-science, deep-learning, big-data, ensemble-learning, gbm, random-forest, naive-bayes, pca
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Last updated on October 2, 2023, 12:55 pm
Repository
Created on December 2, 2014, 8:50 pm
Official code repository for GATK versions 4 and up
Last updated on October 2, 2023, 2:12 pm
Repository
Created on January 22, 2023, 4:33 pm
This repo contains the Data Engineering exercises I completed on DataCamp.
Last updated on May 10, 2023, 4:56 pm
Repository
Created on June 2, 2016, 8:23 am
Some personal code snippets for learning new programming skills
Last updated on August 1, 2023, 3:00 am
Repository
Created on August 28, 2015, 8:09 pm
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Last updated on September 26, 2023, 11:08 pm
Repository
Created on July 11, 2018, 9:55 pm
Lecture: Big Data
Last updated on August 16, 2023, 3:39 pm
Repository
Created on March 1, 2021, 5:37 pm
Quill for Scala 3
Last updated on September 25, 2023, 2:05 pm
Repository
Created on September 24, 2017, 7:36 pm
Topics: nlp, natural-language-processing, spark, spark-ml, pyspark, named-entity-recognition, sentiment-analysis, lemmatizer, spell-checker, entity-extraction
State of the Art Natural Language Processing
Last updated on October 2, 2023, 1:56 pm
Repository
Created on August 28, 2023, 2:01 pm
Data Lake project using AWS Services for a Data Engineering bootcamp
Last updated on October 2, 2023, 11:29 am
Repository
Created on September 18, 2021, 5:20 am
Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing
Last updated on September 15, 2023, 5:00 pm
Repository
Created on March 3, 2016, 4:01 pm
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Last updated on September 29, 2023, 3:40 pm
Repository
Created on August 15, 2022, 3:26 pm
spark - unified analytics engine for large-scale data processing
Last updated on October 1, 2023, 12:22 pm
Repository
Created on May 16, 2022, 10:11 pm
Topics: machine-learning, artificial-intelligence, data, data-engineering, data-science, python, elt, etl, pipelines, data-pipelines
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Last updated on October 2, 2023, 10:16 am
Repository
Created on September 22, 2023, 7:17 am
Spark vs. MongoDB Atlas
Last updated on October 2, 2023, 9:55 am
Repository
Created on December 15, 2017, 3:45 am
Spark data source for Cognite Data Fusion
Last updated on February 3, 2022, 12:41 am
Repository
Created on April 9, 2020, 6:39 pm
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Last updated on September 27, 2023, 2:22 pm
Repository
Created on October 13, 2021, 7:10 am
REST API for Apache Spark on K8S or YARN
Last updated on September 6, 2023, 10:06 am
Repository
Created on April 22, 2019, 6:56 pm
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Last updated on October 2, 2023, 2:49 pm
Repository
Created on September 21, 2023, 3:00 am
A real-time cryptocurrency data streaming pipeline.
Last updated on September 26, 2023, 3:54 pm
Repository
Created on October 11, 2019, 4:14 pm
Vijayaraaghavan Manoharan Website
Last updated on July 22, 2023, 3:21 pm
Repository
Created on August 4, 2015, 6:32 am
Code Library for My Blog
Last updated on September 26, 2023, 7:51 am
Repository
Created on December 23, 2020, 1:25 pm
🏆 Spark4You Design patterns
Last updated on July 18, 2023, 9:13 am
Repository
Created on December 6, 2022, 5:41 pm
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Last updated on September 8, 2023, 7:11 pm