mcdd-big-data-study

Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)

MIT License


Features ✨

Supported Technologies:

  • Hadoop 3.3.6 (with JDK 8.0.352-zulu, Maven 3.6.3)
  • Zookeeper 3.9.2
  • Kafka 3.7.1 (Scala 2.12 build)

Installation 📦

  1. Clone the repository:
    git clone https://github.com/mcddhub/mcdd-big-data-study.git --depth=1 && cd mcdd-big-data-study

  2. Build the Docker image, replacing x.x.x with the appropriate version number:
    cd docker
    docker build -t caobaoqi1029/big-data-study:x.x.x .

  3. Start the containers:
    docker compose up -d
    
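The x.x.x placeholder in the build step is easy to leave in by accident, so a small helper can build the tag from a version string and reject anything else. The following is a minimal sketch, not part of the repository: the function name image_tag and the three-part numeric version format are assumptions made for illustration; only the image name comes from the build command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): builds the image tag
# passed to `docker build -t` from a version string.
# Assumption: versions have the form MAJOR.MINOR.PATCH, e.g. 1.0.0.

IMAGE_NAME="caobaoqi1029/big-data-study"

image_tag() {
  # Reject anything that is not three dot-separated numbers,
  # including the literal placeholder x.x.x and an empty argument.
  if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "invalid version: '$1' (expected e.g. 1.0.0)" >&2
    return 1
  fi
  echo "${IMAGE_NAME}:$1"
}
```

With this helper loaded, the build step could read as follows, where 1.0.0 is only an example version:

    docker build -t "$(image_tag 1.0.0)" .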

Configuration 🛠

  1. Connect to the remote server via VS Code and attach to a running container.
  2. Install the Java Dev extension in VS Code.
  3. Restart the extension host to apply changes.
  4. Open a shell in the master container and format the NameNode (first start only; formatting erases existing HDFS metadata):
    docker exec -it master bash
    hdfs namenode -format

  5. Start Hadoop services:
    start-all.sh

  6. Create an input file, upload it to HDFS, and confirm that it is there:
    vim input.txt
    hdfs dfs -put -f ./input.txt /
    hdfs dfs -ls /

  7. Build and run the Hadoop job:
    mvn clean package
    cd target/
    hadoop jar big-data.jar

    Tip: To run Java classes directly, add the directory that contains them to CLASSPATH:
      export CLASSPATH=$CLASSPATH:/tmp/
      # Add this to .bashrc for persistence.

  8. View the output:
    hdfs dfs -ls /output
    hdfs dfs -cat /output/part-r-00000
    
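Running the job a second time usually fails if /output already exists, because the standard MapReduce file output formats refuse to write into an existing output directory. The sketch below shows one way to clear the old output before a run and to check for the _SUCCESS marker that MapReduce writes by default when a job completes. It is not part of the repository: the function names clear_output and job_succeeded are hypothetical, and it assumes the job writes to /output through a standard FileOutputFormat, as the part-r-00000 file name suggests.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repository) for re-running the job.
# Assumption: the job writes to /output via a standard MapReduce
# FileOutputFormat, as the part-r-00000 file name suggests.

OUTPUT_DIR="/output"

# Remove a previous run's output so the next `hadoop jar` can recreate it.
# -f suppresses the error when the directory does not exist yet.
clear_output() {
  hdfs dfs -rm -r -f "$OUTPUT_DIR"
}

# Return 0 if the job left its _SUCCESS marker in the output directory.
job_succeeded() {
  hdfs dfs -test -e "$OUTPUT_DIR/_SUCCESS"
}
```

Inside the master container, from the target/ directory, a re-run could then look like this:

    clear_output && hadoop jar big-data.jar && job_succeeded && hdfs dfs -cat /output/part-r-00000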

Contributing 🤝

We welcome contributions! Feel free to submit a pull request. For more details, see the Contribution Guide.


License 📄

This project is licensed under the MIT License. See the LICENSE file for details.


Support 💖

If you find this project helpful, consider giving it a ⭐️ on GitHub!

