AWS - Otopy Intelligent Search Quick Start Guide

Welcome to the Otopy Dashboard!

Detailed Documentation

Overview

Otopy Intelligent Search matches on concepts in a way other search engines do not, and significantly outperforms them on long search phrases. Purchase this AMI through Amazon and the dashboard will boot up as a web app ready to ingest your data. It works with any text documents you have collected in an S3 folder. Once your data is ready for search, you may search it with the web app or by API request. Either way you’ll get conceptually related terms, with their frequency and relation.

Once you purchase, an EC2 instance will be created in your account. When it’s ready you’ll be able to create a Dataset and select your data source from S3. Otopy will then ingest your data and prepare it for search. Please note that this process uses AWS EMR: it will automatically boot up a cluster of 3 instances and shut it down again. The cluster should only be up for a few minutes to a few hours, though the full ingestion process can take much longer depending on the size of your dataset and the power of the machine you choose when purchasing.

Features

  1. Create Dataset - After initial account creation, most users will choose to register their dataset with the system by creating a new dataset.
  2. Ingest Dataset - Once you’ve created the dataset, Otopy Intelligent Search goes to work preparing your data for search.
  3. Search Dataset - When the ingestion process is complete, you’ll see a search bar and may search right through the web app.

Security

  1. Initial Key Pair - When purchasing you will be asked to provide a Key Pair, or create a new one. This is the only way you will be able to SSH into the primary instance.
  2. EMR New Key Pair - When creating a dataset you will be asked to provide the name of a new Key Pair. This will be created for you and used to access the EMR machines. You shouldn’t ever need it yourself; the primary instance will use it on your behalf to collect your index before shutting down the EMR machines.
  3. Self Signed TLS - All web traffic to and from the Dashboard, including the API, is secured using self-signed TLS. This is important because it keeps your data safe, especially keys and passwords, but also your searches themselves. However, since the certificate is not validated by an outside authority, the browser will give a warning when you visit the page. Please proceed anyway. The connection is still encrypted, and there is no way to get a validated certificate for such a situation.

Support

Please contact support@otopy.com if you run into any problems.

Walk Through

Prepare Data

Purchase Flow

When purchasing you will be asked to provide the following information:

  1. Stack name -- This may be any name you want, this is just a way for you to identify the resources created by Otopy.
  2. AccessCIDR -- This is the block of IP addresses that may access your primary Otopy instance. To allow all IP addresses to access the instance, enter 0.0.0.0/0.
  3. InstanceType -- This is the type of AWS EC2 instance for the primary Otopy instance. The more data you wish to process, and the heavier your general usage, the more powerful a machine you’ll want to select. We recommend t2.medium at a minimum.
  4. KeyPair -- This is the AWS Key Pair you wish to use with your primary instance. You must have access to the private key if you wish to SSH into your instance.
  5. myS3Bucket -- This is the S3 Bucket that contains both your input and output folders. Your input folder must have your source data to be ingested and searched, and your output folder must be empty and ready to receive the index during ingestion.
  6. MySubnet -- This is simply a subnet in your AWS account that is present in the VPC you will select below.
  7. VpcId -- This is simply a VPC in your AWS account that has appropriate access. Make sure it has your subnet in it.

[Screenshot: stack name and parameter values form]
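Before entering a value for AccessCIDR, it can help to check what the block actually covers. The following is a minimal sketch using only the Python standard library; the function name and the example address ranges are illustrative and are not part of Otopy.

```python
import ipaddress


def describe_access_cidr(cidr: str) -> str:
    """Summarize which addresses a candidate AccessCIDR value would admit."""
    # strict=True rejects blocks with host bits set, e.g. 203.0.113.5/24
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen == 0:
        return f"{network} allows every address"
    return f"{network} allows {network.num_addresses} address(es)"


print(describe_access_cidr("0.0.0.0/0"))        # 0.0.0.0/0 allows every address
print(describe_access_cidr("203.0.113.0/24"))   # 203.0.113.0/24 allows 256 address(es)
print(describe_access_cidr("198.51.100.7/32"))  # 198.51.100.7/32 allows 1 address(es)
```

A /32 block limits access to a single address, which is usually a safer choice than 0.0.0.0/0 when only one machine needs to reach the dashboard.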

After you select these details, Amazon will ask if you want any tags associated with these resources. It’s a good idea to provide a name you’ll remember, perhaps just the stack name (for example, Key: Name, Value: OtopyIntelligentSearch). That will make it easier to identify the primary instance if you have many instances in your account.

Finally, AWS will warn you that this CloudFormation stack comes with IAM roles. These are necessary to give the instance permission to access S3 and create an EMR cluster on your behalf. In consultation with AWS, we have trimmed the permissions down to just what is required.

After you’ve accepted the warning and continued to the next page, Otopy Intelligent Search will begin setting itself up on your AWS account. Once the stack creation is complete, you will have a new EC2 instance which is the primary instance for the Otopy software. Once it is fully booted up, meaning all status checks are reporting OK, you may access Otopy through a normal browser window.

First find the public IP address of your new instance and copy it. Then, in a new browser tab, type “https://”, paste the IP address, and hit enter. You will likely see a security warning telling you the site is unsafe. Do not be concerned; this is expected. The reason is that we encrypt the site (that’s why it’s HTTPS instead of HTTP), but the SSL / TLS certificate is self-signed and cannot be verified by an outside authority, so your browser is concerned it might be fake.

Simply open the advanced options and choose to proceed anyway, even though your browser warns it might be unsafe. Once in, you will be ready to create a new Dataset.

Create Dataset

Once you’ve arrived at the Otopy web app, simply click Create Dataset and fill out the required information. Once you’ve submitted this form Otopy will begin ingesting your data and preparing it for you to search. Here are the form fields you need to fill out:

  1. Name -- You may name this however you’d like, just pick something that will make sense to you and your users.
  2. Description -- This is optional, just a place for you to provide additional information your users may want to see while searching the dataset.
  3. Source Data S3 URI (data to be searched) -- This is the S3 Bucket and folder that contains the data you wish to search. It must be in the same S3 Bucket you entered when purchasing.
  4. Output S3 URI (to store the index) -- This is where Otopy will store the index of your dataset. You shouldn’t need to do anything with it; it just needs to be there during the ingestion process. Create an empty folder for this purpose and enter it here. It must also be in the same S3 Bucket you entered when purchasing.
  5. New Private Key Name -- Just provide any name you want for the Key Pair that will be created by Otopy to access the EMR Cluster it boots up on your behalf. You should never need it, and can enter anything you like as long as it follows the requirements in the note.
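The bucket constraints on the two S3 fields can be checked before submitting the form. The sketch below assumes the form accepts URIs of the usual s3://bucket/folder shape (the guide does not show the exact format), and the function names and bucket names are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_dataset_uris(bucket: str, source_uri: str, output_uri: str) -> list[str]:
    """Return a list of problems; an empty list means the URIs look consistent."""
    problems = []
    source_bucket, source_key = split_s3_uri(source_uri)
    output_bucket, output_key = split_s3_uri(output_uri)
    # Both folders must live in the bucket entered when purchasing.
    if source_bucket != bucket:
        problems.append(f"source data is in {source_bucket!r}, expected {bucket!r}")
    if output_bucket != bucket:
        problems.append(f"output is in {output_bucket!r}, expected {bucket!r}")
    # The output folder must be empty, so it cannot be the input folder.
    if source_key.rstrip("/") == output_key.rstrip("/"):
        problems.append("source and output folders must be different")
    return problems


print(check_dataset_uris("my-bucket", "s3://my-bucket/input/", "s3://my-bucket/index/"))
print(check_dataset_uris("my-bucket", "s3://other-bucket/input/", "s3://my-bucket/index/"))
```

This only inspects the URI strings. It does not confirm that the folders exist or that the output folder is actually empty.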

Ingest Dataset

Once you’ve created the dataset, Otopy Intelligent Search goes to work preparing your data for search.

Wait for it to finish, refreshing the page periodically until it says it’s ready to search. Wait times will vary dramatically depending on data size and the power of the instance type you chose when purchasing. You should expect it to take a few hours to a few days.

While Otopy is busy ingesting your data you should see something like this:

[Screenshot: dataset page while ingestion is in progress]

Search Dataset

Once the dataset has finished ingesting, you may search from the web interface, or by JSON API Endpoint. To search by API use this URL with a GET request:

https://[myIP]/api/dataset/[datasetID]/search?query=what%20im%20looking%20for

Be sure to use the IP address of the instance you launched, the datasetID (which you can get from the URL bar after ingestion), and of course the query you are actually interested in.
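As a small illustration, the search URL can be assembled in Python so that spaces are percent-encoded as %20. The IP address and dataset ID below are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import quote


def build_search_url(ip: str, dataset_id: str, query: str) -> str:
    """Build the search endpoint URL for a dataset."""
    # quote() encodes a space as %20; quote_plus() would emit "+" instead.
    encoded_query = quote(query, safe="")
    return f"https://{ip}/api/dataset/{dataset_id}/search?query={encoded_query}"


print(build_search_url("203.0.113.10", "42", "what im looking for"))
# https://203.0.113.10/api/dataset/42/search?query=what%20im%20looking%20for
```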

IMPORTANT NOTES:

  1. In order to hit the API you must add the TLS certificate to the trusted root store.
  2. Specify %20 for a space, as shown in the GET request example.
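If you call the API from a script, an alternative to changing the system-wide trusted root store is to trust the instance's certificate for that script only. The following is a rough sketch with the Python standard library, not an official Otopy client. The function names and file path are hypothetical, and it assumes the self-signed certificate may not list the instance's IP address, which is why hostname checking is off by default.

```python
import json
import ssl
import urllib.request


def save_server_certificate(host: str, pem_path: str, port: int = 443) -> None:
    """Download the server's certificate and save it as PEM.

    This trusts whatever server answers at that address, so run it once
    from a network you trust and keep the saved file.
    """
    pem = ssl.get_server_certificate((host, port))
    with open(pem_path, "w") as pem_file:
        pem_file.write(pem)


def make_context(pem_path: str, check_hostname: bool = False) -> ssl.SSLContext:
    """Create a TLS context that trusts only the saved certificate."""
    # Passing cafile means the system default roots are not loaded.
    context = ssl.create_default_context(cafile=pem_path)
    # Assumption: the certificate may not name the instance's IP address.
    context.check_hostname = check_hostname
    return context


def search(url: str, context: ssl.SSLContext) -> object:
    """Send a GET request to the search endpoint and decode the JSON reply."""
    with urllib.request.urlopen(url, context=context, timeout=30) as response:
        return json.load(response)
```

A typical session would call save_server_certificate once with the instance's public IP address, build a context from the saved file with make_context, and then pass that context to search along with the full search URL. The structure of the returned JSON is not described in this guide, so the sketch returns it undecoded beyond parsing.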

© 2012-2017 Otopy, Inc. All rights reserved.