5. Development Distribution¶
5.1. Preparing the development environment¶
To prepare the development environment you have to install:
and clone the Parrot Stream CSD GitHub repository with:
git clone https://github.com/parrot-stream/parrot.git
5.2. Full Environment¶
To start a Full Development Environment you have just to run:
./parrot start
At the first time all needed Docker images will be downloaded to prepare your development environment and this will take a while so be patient and go grab a beverage!
The Full Development Environment includes the minimum components needed for running Parrot Stream which are:
- Apache ZooKeeper
- Apache Kafka
plus all the components needed to the Parrot Sink connectors, such as:
- Apache Hadoop (needed for running Hive and Impala)
- PostgreSQL (needed for configuring the Hive metastore)
- Apache Hive (for running the Parrot HBase/Kudu connect with Hive external table option)
- Apache Impala (for running the Parrot HBase/Kudu connect with Impala external table option)
- Apache HBase (for running the Parrot HBase Sink connector)
- Apache Kudu (for running the Parrot Kudu Sink connector)
plus all the components needed to test a full data streaming pipeline with Debezium such as:
- MongoDB
- MySQL
plus useful tools to explore the data and the Parrot Stream configurations, such as:
- Hue
- Kafka Topics UI
- Kafka Connect UI
- Schema Registry UI
This is a lot of stuff which requires around 7 GiB of free memory (but 10 GiB are recomended) so take this into account before trying to start this kind of environment.
5.3. Minimum Environment¶
To start a Minimum Development Environment you have just to start ZooKeeper, Kafka and Parrot Stream itself.
Start ZooKeeper and Kafka:
./parrot start -s=zookeeper,kafkaCheck if ZooKeeper is up with:
curl http://localhost:8080/commands/configuration
You should get a json respone like the following:
{ "client_port" : 2181, "data_dir" : "/tmp/zookeeper/version-2", "data_log_dir" : "/tmp/zookeeper/version-2", "tick_time" : 2000, "max_client_cnxns" : 60, "min_session_timeout" : 4000, "max_session_timeout" : 40000, "server_id" : 0, "command" : "configuration", "error" : null }
Check if Kafka is up with:
curl http://localhost:8082/topics
You should get a json empty array:
[]because there are still no Kafka topics in the cluster.
Start Parrot:
./parrot start
Check if Parrot is up with:
curl http://localhost:8083/connectors
You should get a json empty array:
[]because there are still no Parrot connectors defined.
Check that all Kafka Connect topics have been correctly created running:
curl http://localhost:8082/topics
you should now get a json array which lists all the the Kafka topics created by Parrot Stream:
[ "__consumer_offsets", "_schemas", "connect-configs", "connect-offsets", "connect-statuses" ]