redis-developer/redis-arXiv-search

This repository is the official codebase for the arXiv paper search app hosted at: https://docsearch.redisvl.com

Redis is a highly performant, production-ready vector database, which can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in the technical blog post published by our partners, Data Science Dojo.

The arXiv papers dataset was sourced from Kaggle. arXiv is commonly used for scientific research in a variety of fields. Exposing a semantic search layer lets users discover relevant papers with natural language queries.

This app was built as a Single Page Application (SPA) with the following components:

Some inspiration was taken from tiangolo/full-stack-fastapi-template, adapted so that the backend serves the SPA instead of running a separate front-end server.

/backend
    /arxivsearch
        /api
            /routes
                papers.py # primary paper search logic lives here
        /db
            load.py # seeds Redis DB
            redis_helpers.py # redis util
        /schema
            # pydantic models for serialization/validation from API
        /tests
        /utils
        config.py
        spa.py # logic for serving compiled react project
        main.py # entrypoint
/frontend
    /public
        # index, manifest, logos, etc.
    /src
        /config
        /styles
        /views
            # primary components live here

        api.ts # logic for connecting with BE
        App.tsx # project entry
        Routes.tsx # route definitions
        ...
/data
    # folder mounted as volume in Docker
    # load script auto populates initial data from S3
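
The layout above notes that spa.py serves the compiled React project. A common way to do this for a SPA is to serve real static files when they exist and fall back to index.html for every other path, so the client-side router can take over. The following is a minimal sketch of that idea under stated assumptions, not the repository's actual implementation; the function name `resolve_spa_path` and the build directory layout are hypothetical.

```python
from pathlib import Path


def resolve_spa_path(build_dir: Path, request_path: str) -> Path:
    """Pick the file to serve for request_path from a compiled SPA build.

    Hypothetical helper: existing static files are served as-is, and any
    other path falls back to index.html for the client-side router.
    """
    root = build_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse paths that escape the build directory (e.g. "../secrets").
    inside_build = candidate == root or root in candidate.parents
    if inside_build and candidate.is_file():
        return candidate
    return root / "index.html"
```

A web framework handler could call this helper for any request that does not match an API route and return the resulting file.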

Embeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports HuggingFace, OpenAI, and Cohere embeddings out of the box.

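| Provider    | Embedding Model                         | Required? |
|-------------|-----------------------------------------|-----------|
| HuggingFace | sentence-transformers/all-mpnet-base-v2 | Yes       |
| OpenAI      | text-embedding-ada-002                  | Yes       |
| Cohere      | embed-multilingual-v3.0                 | Yes       |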
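
To make the idea of vector similarity search concrete, here is a small, self-contained sketch that ranks a few papers against a query by cosine similarity. The vectors are made-up three-dimensional toys (real embedding models produce far longer vectors), the helper names are hypothetical, and cosine similarity is an assumed metric for illustration. In the app itself the search runs inside Redis rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_papers(query_vector, paper_vectors, k=3):
    """Return the k (title, score) pairs most similar to the query."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in paper_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; not output from any real model.
papers = {
    "Attention mechanisms in transformers": [0.9, 0.1, 0.0],
    "Galaxy formation at high redshift": [0.0, 0.2, 0.9],
    "Pretraining language models": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
top = rank_papers(query, papers, k=2)
for title, score in top:
    print(f"{score:.3f}  {title}")
```

The two language-model papers rank above the astronomy paper because their vectors point in nearly the same direction as the query.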

Interested in a different embedding provider? Feel free to open a PR and make a suggested addition.

Want to use a different model than the one listed? Set the following environment variables in your .env file (see below) to change:

  • SENTENCE_TRANSFORMER_MODEL
  • OPENAI_EMBEDDING_MODEL
  • COHERE_EMBEDDING_MODEL
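
As a rough illustration of how these variables could be consumed, the sketch below reads each one from the environment and falls back to the model listed in the provider table. The function name `embedding_model_settings` is hypothetical, and the pairing of each variable with a provider is inferred from the variable names; the repository's config.py may handle this differently.

```python
import os

# Defaults copied from the provider table; the variable-to-provider
# pairing is an assumption based on the variable names.
DEFAULT_MODELS = {
    "SENTENCE_TRANSFORMER_MODEL": "sentence-transformers/all-mpnet-base-v2",
    "OPENAI_EMBEDDING_MODEL": "text-embedding-ada-002",
    "COHERE_EMBEDDING_MODEL": "embed-multilingual-v3.0",
}


def embedding_model_settings(env=None):
    """Return the model name per variable, preferring values set in env."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    return {name: env.get(name) or default for name, default in DEFAULT_MODELS.items()}
```

With this approach, adding a line such as OPENAI_EMBEDDING_MODEL=text-embedding-3-small to the .env file would override only the OpenAI model and leave the other two at their defaults.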

To run the app with Docker:

  1. Before running the app, install Docker Desktop.
  2. Clone (and optionally fork) this repo to your machine.
    $ git clone https://github.com/redis-developer/redis-arXiv-search
  3. Make a copy of the .env.template file:
    $ cd redis-arXiv-search/
    $ cp .env.template .env
  4. Deploy the app:
    $ make deploy

To start a standalone Redis container:

    $ docker run -d --name redis -p 6379:6379 -p 8001:8001 redis:8.0-M03

To run the backend locally:

  1. cd backend
  2. poetry install
  3. poetry run start-app

poetry run start-app runs the initial DB load script and launches the API.
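
Because the load script needs a reachable Redis instance, it can help to check connectivity before starting the backend. The sketch below sends a Redis PING over a plain socket using only the standard library. The function name `redis_is_reachable` is hypothetical, the default host is an assumption, and the default port matches the 6379 mapping in the docker run command; a server that requires authentication will answer with an error, which this check reports as unreachable.

```python
import socket


def redis_is_reachable(host="localhost", port=6379, timeout=2.0):
    """Send a Redis PING and report whether the server answered with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Inline command form of the Redis protocol.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
    return reply.startswith(b"+PONG")
```

Calling redis_is_reachable() with no arguments checks localhost on port 6379 and returns True or False.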

It's typically easier to build the front end in an interactive environment, testing changes in real time.

  1. Deploy the app using steps above.
  2. Install packages
    $ cd frontend/
    $ npm install
  3. Use npm to serve the application from your machine
    $ npm run start
  4. Navigate to http://localhost:3000 in a browser.

Changes to your frontend code will be reflected in the browser in near real time.

Every once in a while you need to clear out some Docker cached artifacts. Run docker system prune, restart Docker Desktop, and try again.

This project is maintained by Redis on a good-faith basis. Please open an issue on GitHub and we will try to respond.

About

Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.
