LazyLLM is an open-source, low-code large-model application development tool built by SenseTime’s LazyAGI team. It provides one-stop tooling covering application building, data preparation, model deployment, fine-tuning, and evaluation, so you can rapidly build AI applications at very low cost and continuously iterate to improve results.

API Signup & Environment Setup

1. Create an account

2. Environment setup

See docs: Quick Start – LazyLLM

API Usage Test

0. Set Environment Variables

You can use the following command to set the corresponding environment variable, or explicitly pass it in code:
export LAZYLLM_SILICONFLOW_API_KEY=<your API key>
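You can also set the variable from Python before constructing any online modules; a minimal sketch, equivalent to the export above:
    import os
    # Equivalent to the shell export above; set this before creating any
    # siliconflow modules so LazyLLM can pick up the key from the environment.
    os.environ["LAZYLLM_SILICONFLOW_API_KEY"] = "<your API key>"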

1. Implement chat and image recognition

Text Q&A demo

After filling in the api_key, run the code below to quickly call the model and generate a Q&A-style front-end interface:
    import lazyllm
    from lazyllm import OnlineChatModule, WebModule
    api_key = 'sk-'  # replace with your issued API key
    # Test the chat module
    llm = OnlineChatModule(source='siliconflow', api_key=api_key, stream=False)
    w = WebModule(llm, port=8846, title="siliconflow")
    w.start().wait()
We ask “What is LazyLLM?”, and the result is as follows:
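If you only need the reply in the terminal rather than a web interface, the same chat module can also be called directly, just like the multimodal example below; a minimal sketch:
    import lazyllm
    from lazyllm import OnlineChatModule
    api_key = 'sk-'  # replace with your issued API key
    llm = OnlineChatModule(source='siliconflow', api_key=api_key, stream=False)
    # Calling the module returns the model's reply as text.
    print(llm("What is LazyLLM?"))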

Multimodal Q&A demo

Pass an image to the model through the lazyllm_files parameter and ask about the image content to achieve multimodal question answering.
    import lazyllm
    from lazyllm import OnlineChatModule
    api_key = 'sk-'  # replace with your issued API key
    llm = OnlineChatModule(source='siliconflow', api_key=api_key,
                           model='Qwen/Qwen2.5-VL-72B-Instruct')
    print(llm('Hello, what is this?', lazyllm_files=['your_picture.png']))
Here we use this image to test multimodal Q&A:
Output in the command line:

2. Implement text-to-image and text-to-speech

Use OnlineMultiModalModule for text-to-image and text-to-speech. After running, it will output the path of the generated files.
    import lazyllm
    from lazyllm import OnlineMultiModalModule
    api_key = 'sk-xxx'
    # Test text-to-image: function='text2image'
    llm = OnlineMultiModalModule(source='siliconflow', api_key=api_key, function='text2image')
    print(llm("Generate a cute puppy"))
    # Test text-to-speech: function='tts'
    llm = OnlineMultiModalModule(source='siliconflow', api_key=api_key, function='tts')
    print(llm("Hello, what is your name?", voice='fnlp/MOSS-TTSD-v0.5:anna'))
Run result:
The generated audio is as follows:
tmpck44zfds.mp3 (55.13 KB, 2025-10-27 23:13)
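Since these calls return the location of the generated file rather than raw data, you may want to copy the result out of the temporary directory. A minimal sketch continuing from the tts module above, assuming the call returns one or more local file paths as in the run result:
    import shutil
    result = llm("Hello, what is your name?", voice='fnlp/MOSS-TTSD-v0.5:anna')
    # Assumption: the module returns a file path (or a list of paths); normalize to a list.
    paths = result if isinstance(result, list) else [result]
    for p in paths:
        shutil.copy(p, '.')  # keep a copy of the generated audio in the working directory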

3. Knowledge-base Q&A in 10+ lines of code

Implement Embed and Rerank functions

Run the code below to perform vector embeddings with OnlineEmbeddingModule; set type='rerank' to call a reranking model.
    import lazyllm
    from lazyllm import OnlineEmbeddingModule
    api_key = 'sk-'
    
    # Test the embed module
    llm = OnlineEmbeddingModule(source='siliconflow', api_key=api_key)
    print(llm("apple"))
    
    # Test the rerank module
    llm = OnlineEmbeddingModule(source='siliconflow', api_key=api_key, type='rerank')
    print(llm(["apple", ['apple','banana','orange']]))
The vectorized result is as follows:
    [-0.0024823144, -0.0075530247, -0.013154144, -0.031351723, -0.024489744, 0.009692847, 0.008086464, -0.037946977, 0.013251133, -0.046675995, -0.011390155, -0.011111312, 0.016779112, 0.054168403, 0.04849454, 0.014742341, 0.02341074, -0.015542501, 0.059939254, -0.024223024, 0.0065467632, -0.041244607, -0.022925794, -0.024804957, 0.006752865, -0.047548898, -0.03685585, 0.0513557...., -0.070656545, -0.01997975, 0.023398615, 0.008735079]
The rerank relevance scores are as follows:
    [{'index': 0, 'relevance_score': 0.9946065545082092}, {'index': 2, 'relevance_score': 0.014802767895162106}, {'index': 1, 'relevance_score': 0.0004139931406825781}]
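Since the rerank output only carries indices into the candidate list (as shown above), here is a minimal sketch for mapping the scores back to the original strings:
    from lazyllm import OnlineEmbeddingModule
    api_key = 'sk-'
    reranker = OnlineEmbeddingModule(source='siliconflow', api_key=api_key, type='rerank')
    docs = ['apple', 'banana', 'orange']
    scores = reranker(["apple", docs])
    # Each entry carries an 'index' into docs and a 'relevance_score'; pair them up.
    ranked = [(docs[s['index']], s['relevance_score']) for s in scores]
    print(ranked)  # e.g. [('apple', 0.99...), ('orange', 0.014...), ('banana', 0.0004...)]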

Knowledge-base import

We use Chinese classical texts as an example knowledge base. After downloading, place them in the database folder. Sample dataset download link: Sample Dataset Download
First define the embed model, then use LazyLLM’s Document component to create a document management module for importing the knowledge base.
    import lazyllm
    api_key = 'sk-'
    embed_model = lazyllm.OnlineEmbeddingModule(source="siliconflow", api_key=api_key)
    documents = lazyllm.Document(
        dataset_path="database",
        embed=embed_model
    )
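Before building the Document module, it is worth confirming that the downloaded sample texts are actually in the database folder; a minimal illustrative check (not part of LazyLLM):
    import os
    # Purely a sanity check: the knowledge base files downloaded above
    # should already be sitting in ./database.
    assert os.path.isdir("database") and os.listdir("database"), \
        "download the sample dataset and place it in ./database first"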

Knowledge-base retrieval

Now that we have an external knowledge base, we can use the Retriever component in LazyLLM to retrieve the knowledge base and recall relevant content. Usage example:
    import lazyllm
    from lazyllm.tools import Retriever, Document, SentenceSplitter
    api_key = 'sk-'
    embed_model = lazyllm.OnlineEmbeddingModule(source="siliconflow", api_key=api_key)

    documents = Document(dataset_path='database', embed=embed_model, manager=False)
    rm = Retriever(documents, group_name='CoarseChunk', similarity='bm25', similarity_cut_off=0.01, topk=6)
    rm.start()
    print(rm("user query"))

Knowledge-base Q&A

Combining the above model, document management, and retrieval modules, we can build a complete dataflow using LazyLLM’s built-in Flow component. The full code is as follows:
    import lazyllm
    from lazyllm import (
        OnlineEmbeddingModule, OnlineChatModule, Document, SentenceSplitter,
        Retriever, Reranker, ChatPrompter, pipeline, bind
    )
    # Initialize api key and prompt
    api_key = 'sk-'
    prompt = """
    You will play the role of an AI Q&A assistant and complete a dialogue task.
    In this task, you need to provide your answer based on the given context and question.
    """
    # Initialize models
    embed_model = OnlineEmbeddingModule(source="siliconflow", api_key=api_key)
    rerank_model = OnlineEmbeddingModule(source="siliconflow", api_key=api_key, type="rerank")
    llm = OnlineChatModule(source="siliconflow", api_key=api_key)
    # Define the document management module and create node groups
    doc = Document(dataset_path="/home/xxx/database", manager=False, embed=embed_model)
    doc.create_node_group(name="block", transform=SentenceSplitter, chunk_size=1024, chunk_overlap=100)
    doc.create_node_group(name="line", transform=SentenceSplitter, chunk_size=128, chunk_overlap=20, parent="block")
    # Build the RAG pipeline (multi-route retrieval → rerank → prompt formatting → LLM answering)
    with pipeline() as ppl:
        with lazyllm.parallel().sum as ppl.prl:
            prl.r1 = Retriever(doc, group_name='line', similarity="cosine", topk=6, target='block')
            prl.r2 = Retriever(doc, group_name='block', similarity="cosine", topk=6)
        ppl.reranker = Reranker('ModuleReranker', model=rerank_model, output_format='content',
                                join=True) | bind(query=ppl.input)
        ppl.formatter = (lambda context, query: dict(context_str=str(context), query=query)) | bind(query=ppl.input)
        ppl.llm = llm.prompt(ChatPrompter(prompt, extra_keys=["context_str"]))
    ppl.start()

    query = "What is the Way of Heaven?"
    print(ppl(query))
You can see that the RAG component successfully retrieves content related to the “Way of Heaven” from the Tao Te Ching and other sources, and passes it to the LLM for answering.
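To serve the same pipeline behind the web interface from the first demo, you can wrap it in WebModule; a minimal sketch reusing the pattern shown earlier (the port and title here are arbitrary):
    from lazyllm import WebModule
    # Wrap the whole RAG pipeline in the same web front-end used for the chat demo.
    WebModule(ppl, port=8847, title="siliconflow-rag").start().wait()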