Building and Running the Project
================================

These instructions work with the files at the following commit, so check it out first:

    git checkout 800b1f59edaa20a9b65f32a815605307e1102baa

First, download the small sample of the Stack Overflow data from:

https://drive.google.com/open?id=0B0uip08Km2LPVTFTRFhrdHF2WW8

Put it in a directory called `stackoverflow_dataset` at the root of the project (`./stackoverflow_dataset`).
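
To confirm the data is in place before building, a small check like the one below may help. It is a minimal sketch: `check_dataset` is a hypothetical helper, and because the name of the file inside the download is not given here, it only checks that the directory exists and is not empty.

```shell
# Hypothetical helper: confirm the sample data directory exists and is not empty.
# It does not know the expected file name, only that something is there.
check_dataset() {
  data_dir="${1:-./stackoverflow_dataset}"
  if [ ! -d "$data_dir" ]; then
    echo "missing directory: $data_dir" >&2
    return 1
  fi
  # ls -A lists everything except . and .., so empty output means an empty directory
  if [ -z "$(ls -A "$data_dir")" ]; then
    echo "directory is empty: $data_dir" >&2
    return 1
  fi
  echo "ok: $data_dir"
}
```

From the project root, run `mkdir -p stackoverflow_dataset`, move the downloaded sample into it, and then run `check_dataset`.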

Next, install the following programs (Homebrew was used for easy installation on OS X):

Spark:

    brew install apache-spark

Scala:

    brew install scala

Maven:

    brew install maven
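
To check that the installed tools are actually on your `PATH`, something like the following can be used. `check_tools` is a hypothetical helper; the command names in the usage line (`spark-shell`, `spark-submit`, `scala`, `mvn`) are the ones these instructions rely on.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_tools spark-shell spark-submit scala mvn`. If anything is reported missing, open a new terminal (so the updated `PATH` is picked up) or re-run the corresponding `brew install`.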

To build and run the project locally, the versions in `pom.xml` must match those of the programs installed on your system. The `_2.11` suffix on the `spark-core` artifact is the Scala binary version (major.minor), so it has to agree with the Scala version that your Spark installation reports. Update the lines marked `<!-- update -->` in `pom.xml`:

    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version> <!-- update -->
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId> <!-- update -->
        <version>2.0.2</version> <!-- update -->
      </dependency>
    </dependencies>
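
The three values follow mechanically from two version numbers, as a small sketch shows. `pom_values` is a hypothetical helper: it takes the full Scala version and the Spark version and prints what goes in each marked line, deriving the artifact suffix by keeping only the major and minor parts of the Scala version.

```shell
# Hypothetical helper: print the three pom.xml values for a given
# Scala version (as reported by Spark) and Spark version.
pom_values() {
  scala_full="$1"   # e.g. 2.11.8
  spark="$2"        # e.g. 2.0.2
  # The artifact suffix uses only major.minor (the Scala binary version)
  scala_binary=$(printf '%s\n' "$scala_full" | cut -d. -f1,2)
  printf 'scala-library <version>:   %s\n' "$scala_full"
  printf 'spark-core <artifactId>:   spark-core_%s\n' "$scala_binary"
  printf 'spark-core <version>:      %s\n' "$spark"
}
```

For example, `pom_values 2.11.8 2.0.2` prints `2.11.8`, `spark-core_2.11` and `2.0.2`, which are the values already in the snippet above.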

To find the versions to use, run:

    spark-shell

The startup banner reports your Spark and Scala versions (type `:quit` to leave the shell afterwards). It should look similar to this:

    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
    Type in expressions to have them evaluated.
    Type :help for more information.
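
If you would rather not read the versions off the screen, they can be extracted from the banner text. This is a sketch assuming the banner has the same shape as the example above; `spark_version` and `scala_version` are hypothetical helpers that read the banner from standard input.

```shell
# Hypothetical helpers: pull version numbers out of Spark's startup banner
# (read from stdin), assuming it looks like the example banner.
spark_version() {
  # The Spark version is on the logo line ("... version 2.0.2");
  # drop the "Using Scala version" line so it is not matched instead.
  grep -v 'Scala version' | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

scala_version() {
  sed -n 's/.*Using Scala version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

As a non-interactive alternative to `spark-shell`, `spark-submit --version 2>&1 | spark_version` should work too, on the assumption that `spark-submit --version` prints a similar banner (to standard error, hence the `2>&1`).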

After editing `pom.xml`, run the following from the project root to compile:

    mvn clean package

The build should succeed. The first run will take a while, because Maven downloads the project's dependencies and plugins.
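
A quick way to confirm the build produced something runnable is to check for the jar. `check_jar` is a hypothetical helper; its default path is an assumption taken from the jar name used in the `spark-submit` command below, relative to the project root.

```shell
# Hypothetical helper: confirm that `mvn clean package` produced the jar
# used in the spark-submit command (path relative to the project root).
check_jar() {
  jar_path="${1:-target/KMeans-0.0.1.jar}"
  if [ -f "$jar_path" ]; then
    echo "built: $jar_path"
  else
    echo "not found: $jar_path (check the Maven output for errors)" >&2
    return 1
  fi
}
```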

To run the compiled application:

    cd target
    spark-submit --class ClusterSOData.Main --master local KMeans-0.0.1.jar
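
Spark's `saveAsTextFile` refuses to write into an output directory that already exists, so running the job a second time without rebuilding will likely fail. Assuming the job writes its results that way (the `part-00000` file name suggests so, but this is not confirmed here), a hypothetical wrapper `run_kmeans` could clear the previous output before submitting. The `SPARK_SUBMIT` variable is an assumption added only to allow dry runs.

```shell
# Hypothetical wrapper: run the job from the project root, clearing the
# previous run's output first. Assumes the job writes ./output relative to
# target/ and fails if that directory already exists.
run_kmeans() {
  submit="${SPARK_SUBMIT:-spark-submit}"   # override, e.g. SPARK_SUBMIT=echo for a dry run
  (
    cd target || exit 1
    rm -rf output
    "$submit" --class ClusterSOData.Main --master local KMeans-0.0.1.jar
  )
}
```

The subshell `( ... )` keeps the `cd target` from changing your own working directory.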

The job should finish without errors and create an `output` folder. From the same directory, check that something was generated:

    cat output/part-00000
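
Spark writes one `part-NNNNN` file per partition, so depending on how the input is split there may be more files than `part-00000`. A sketch of a broader check follows; `check_output` is a hypothetical helper that counts the part files and their lines, and treats empty output as a failure.

```shell
# Hypothetical helper: summarise the job's output directory.
# Spark writes one part-NNNNN file per partition, so there may be more
# than part-00000; empty output counts as a failure here.
check_output() {
  out_dir="${1:-output}"
  set -- "$out_dir"/part-*
  # If the glob matched nothing, "$1" is the literal pattern
  if [ ! -e "$1" ]; then
    echo "no part files in $out_dir" >&2
    return 1
  fi
  line_count=$(cat "$@" | wc -l | tr -d ' ')
  echo "$# part file(s), $line_count line(s)"
  [ "$line_count" -gt 0 ]
}
```

Run it from `target` as `check_output`, or pass another directory as the first argument.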