
Multimodal embeddings at scale: An AI data lake for media and entertainment workloads

This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using Amazon Nova models and Amazon OpenSearch Service. You will learn how to move beyond manual tagging and keyword-based search to semantic search that captures the full richness of video content.

We demonstrate this at scale by processing 792,270 videos from two AWS Open Data Registry datasets: Multimedia Commons (787,479 videos, averaging 37 seconds) and MEVA (4,791 videos, averaging 5 minutes). Processing 8,480 hours of video content (30.5M seconds) took 41 hours. Total first-year cost was $27,328 (with on-demand OpenSearch Service) or $23,632 (with OpenSearch Service Reserved Instances). That total comprises one-time ingestion ($18,088) plus one year of Amazon OpenSearch Service ($9,240 on-demand or $5,544 Reserved).

The ingestion cost breaks down as follows:

  • Amazon Elastic Compute Cloud (Amazon EC2) compute (4× c7i.48xlarge at $2.57/hour × 41 hours): $421
  • Amazon Bedrock Nova Multimodal Embeddings (30.5M seconds × $0.00056/second batch rate): $17,096
  • Nova Pro tagging (792K videos × 600 tokens on average): $571
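The figures above can be cross-checked with a few lines of arithmetic. The sketch below only recombines numbers quoted in this post; the tagging cost is taken as reported rather than recomputed, because the per-token price is not stated here.

```python
# Figures quoted in the cost breakdown above
EC2_INSTANCES = 4
EC2_HOURLY_RATE = 2.57            # USD per instance-hour
PROCESSING_HOURS = 41
VIDEO_SECONDS = 8_480 * 3600      # 8,480 hours of content = 30,528,000 seconds
EMBEDDING_RATE = 0.00056          # USD per second of video (batch rate)
TAGGING_COST = 571                # USD, as reported (not recomputed)

ec2_cost = EC2_INSTANCES * EC2_HOURLY_RATE * PROCESSING_HOURS
embedding_cost = VIDEO_SECONDS * EMBEDDING_RATE
ingestion_cost = round(ec2_cost) + round(embedding_cost) + TAGGING_COST

OPENSEARCH_ON_DEMAND = 9_240      # USD per year
OPENSEARCH_RESERVED = 5_544       # USD per year
first_year_on_demand = ingestion_cost + OPENSEARCH_ON_DEMAND
first_year_reserved = ingestion_cost + OPENSEARCH_RESERVED

print(f"Ingestion: ${ingestion_cost:,}")
print(f"First year (on-demand): ${first_year_on_demand:,}")
print(f"First year (Reserved):  ${first_year_reserved:,}")
```

Embedding generation accounts for roughly 95% of the one-time ingestion cost, so total video duration is the figure to estimate first when sizing a similar project.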

The solution generates audio-visual embeddings using AUDIO_VIDEO_COMBINED mode (see the Nova Multimodal Embeddings API schema), stores them in OpenSearch Service, and supports text-to-video, video-to-video, and hybrid search.

Solution overview

The architecture consists of two main workflows, ingestion and search, that work together to enable multimodal video search at scale:

Video ingestion pipeline:

The ingestion pipeline uses four Amazon EC2 c7i.48xlarge instances with 600 parallel workers to process about 19,400 videos per hour. The async API has a concurrency quota of 30 concurrent jobs per account (see Amazon Bedrock quotas), so the pipeline uses a job queue with polling. Workers submit jobs up to the quota limit, poll for completion, and submit new jobs as slots become available. Amazon Nova Multimodal Embeddings processes each video asynchronously, splitting it into 15-second segments (chosen to capture scene changes while keeping the number of embeddings manageable) and generating 1024-dimensional embeddings. We chose 1024 dimensions over 3072 for a 3x saving in storage with minimal impact on accuracy. The cost of generating embeddings does not depend on the embedding dimension. Amazon Nova Pro adds 10-15 descriptive tags per video from a predefined taxonomy.
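To make the 3x storage claim concrete, here is a rough estimate of the raw vector footprint. It assumes 4-byte float values and treats the whole corpus as one continuous stream of 15-second segments; both are simplifying assumptions, and the result is a lower bound that ignores per-video rounding, HNSW graph overhead, replicas, and other index structures.

```python
import math

def raw_vector_bytes(total_seconds, segment_seconds, dimension, bytes_per_value=4):
    """Lower-bound size of the stored vectors alone (no graph, no replicas)."""
    segments = math.ceil(total_seconds / segment_seconds)
    return segments * dimension * bytes_per_value

TOTAL_SECONDS = 8_480 * 3600  # 30,528,000 seconds of video, as reported above

size_1024 = raw_vector_bytes(TOTAL_SECONDS, 15, 1024)
size_3072 = raw_vector_bytes(TOTAL_SECONDS, 15, 3072)

print(f"1024-d: {size_1024 / 1e9:.1f} GB raw vectors")
print(f"3072-d: {size_3072 / 1e9:.1f} GB raw vectors")
print(f"Ratio:  {size_3072 / size_1024:.0f}x")
```

Because the vector payload scales linearly with dimension, the 3x ratio holds regardless of how many segments the corpus produces.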

Note: Amazon Nova 2 Lite offers improved accuracy at lower cost for tagging tasks. We recommend that you consider it for new deployments. The system stores embeddings in an OpenSearch k-NN index for semantic search and metadata tags in a separate text index for keyword matching. At search time, you can query videos in three ways: convert natural language into an embedding for text-to-video search, compare video embeddings directly for video-to-video search, or combine both approaches in a hybrid search.

Search types enabled by this solution:

  1. Text-to-video search – Natural language queries are converted into embeddings for semantic similarity matching
  2. Video-to-video search – Find similar content by comparing video embeddings directly
  3. Hybrid search – Combines vector similarity (70% weight) with keyword matching (30% weight) for the highest accuracy

Video ingestion pipeline

The following diagram shows the video ingestion and processing pipeline:

Figure 1: Video ingestion pipeline showing the flow from S3 video storage through Nova Multimodal Embeddings and Nova Pro to two OpenSearch indexes

The video processing workflow is as follows:

  1. Upload videos to Amazon Simple Storage Service (Amazon S3).
  2. Process the videos using the Nova Multimodal Embeddings async API, which automatically segments each video and generates embeddings. The orchestrator polls for job completion (the async API has a limit of 30 concurrent jobs per account; see Amazon Bedrock quotas) and retrieves the results from Amazon S3.
  3. Generate descriptive tags using Nova Pro (or Nova 2 Lite for better accuracy at lower cost) from a predefined taxonomy to enhance search capabilities.
  4. Index the embeddings in the OpenSearch k-NN index and the tags in the text index.

Video search architecture

The following diagram shows the complete search architecture:

Figure 2: Video search architecture showing three search modes – text-to-video, video-to-video, and hybrid search combining k-NN and BM25

The search architecture enables three modes:

  1. Text-to-video – Natural language queries
  2. Video-to-video – Similar content discovery
  3. Hybrid – Combined semantic and keyword matching

Prerequisites

Before you begin, you need:

  1. An AWS account with access to Amazon Bedrock in us-east-1 (Nova models are enabled by default given the appropriate IAM permissions)
  2. Python 3.9 or later installed
  3. The AWS Command Line Interface (AWS CLI) configured with appropriate credentials
  4. An Amazon OpenSearch Service domain (r6g.large or larger recommended)
  5. An Amazon S3 bucket for video storage and embedding output
  6. AWS Identity and Access Management (IAM) permissions for Amazon Bedrock, OpenSearch Service, and Amazon S3

The solution uses:

  1. Amazon Bedrock with Nova Multimodal Embeddings (amazon.nova-2-multimodal-embeddings-v1:0)
  2. Amazon Bedrock with Nova Pro (us.amazon.nova-pro-v1:0) or Nova 2 Lite (us.amazon.nova-2-lite-v1:0) for tagging
  3. Amazon OpenSearch Service 2.11 or later with the k-NN plugin
  4. Amazon S3 for video and embedding storage

Walkthrough

Step 1: Create IAM roles and policies

Create an IAM role with permissions to invoke Amazon Bedrock models, write to OpenSearch indexes, and read and write S3 objects.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:StartAsyncInvoke",
        "bedrock:GetAsyncInvoke",
        "bedrock:ListAsyncInvokes"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-multimodal-embeddings-v1:0",
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.amazon.nova-pro-v1:0",
        "arn:aws:bedrock:*::foundation-model/amazon.nova-pro-v1:0"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpPost",
        "es:ESHttpPut",
        "es:ESHttpGet"
      ],
      "Resource": "arn:aws:es:us-east-1:ACCOUNT_ID:domain/DOMAIN_NAME/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-video-bucket/*",
        "arn:aws:s3:::amzn-s3-demo-embedding-bucket/*"
      ]
    }
  ]
}

Step 2: Set up the OpenSearch Service indexes

Create two OpenSearch Service indexes: one for vector embeddings (k-NN) and one for text metadata. This layout supports both semantic search and hybrid queries.

from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

session = boto3.Session()
credentials = session.get_credentials()
awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    session.region_name,
    'es',
    session_token=credentials.token
)

opensearch_client = OpenSearch(
    hosts=[{'host': 'YOUR_OPENSEARCH_ENDPOINT', 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

# Create k-Nearest Neighbors (k-NN) index for embeddings
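knn_index_body = {
    "settings": {
        "index.knn": True,
        "number_of_shards": 2,
        "number_of_replicas": 1
    },
    "mappings": {
        "properties": {
            "video_id": {"type": "keyword"},
            "segment_index": {"type": "integer"},
            "timestamp": {"type": "float"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # Must match embeddingDimension used at ingestion
                "method": {
                    "name": "hnsw",
                    # "cosinesimil" is the k-NN plugin's name for cosine similarity.
                    # The faiss engine supports it from OpenSearch 2.19; on earlier
                    # versions, use the "lucene" engine instead.
                    "space_type": "cosinesimil",
                    "engine": "faiss"
                }
            },
            "s3_uri": {"type": "keyword"}
        }
    }
}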

opensearch_client.indices.create(
    index="video-embeddings-knn",
    body=knn_index_body
)

# Create text index for metadata
text_index_body = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    },
    "mappings": {
        "properties": {
            "video_id": {"type": "keyword"},
            "segment_index": {"type": "integer"},
            "tags": {"type": "text", "analyzer": "standard"}
        }
    }
}

opensearch_client.indices.create(
    index="video-embeddings-text",
    body=text_index_body
)

Step 3: Process videos with Nova Multimodal Embeddings

The Amazon Bedrock async API processes the videos and generates embeddings. It splits each video into 15-second segments and combines audio and visual information.

import boto3
import json
import time

bedrock = boto3.client('bedrock-runtime', region_name="us-east-1")

def generate_video_embeddings(video_s3_uri, output_s3_uri):
    """Generate embeddings for a video using Nova MME async API."""
    
    # Start async job
    response = bedrock.start_async_invoke(
        modelId="amazon.nova-2-multimodal-embeddings-v1:0",
        modelInput={
            "taskType": "SEGMENTED_EMBEDDING",
            "segmentedEmbeddingParams": {
                "embeddingPurpose": "GENERIC_INDEX",
                "embeddingDimension": 1024,
                "video": {
                    "format": "mp4",
                    "embeddingMode": "AUDIO_VIDEO_COMBINED",
                    "source": {"s3Location": {"uri": video_s3_uri}},
                    "segmentationConfig": {"durationSeconds": 15}
                }
            }
        },
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": output_s3_uri}}
    )
    
    # Poll for completion
    invocation_arn = response["invocationArn"]
    while True:
        job = bedrock.get_async_invoke(invocationArn=invocation_arn)
        if job["status"] == "Completed":
            return read_embeddings_from_s3(job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"])
        elif job["status"] in ["Failed", "Expired"]:
            raise RuntimeError(f"Job failed: {job.get('failureMessage')}")
        time.sleep(10)

def manage_concurrent_jobs(bedrock_client, video_queue, max_concurrent=30):
    """Manage 30 concurrent async jobs within quota limits."""
    active_jobs = {}
    
    while video_queue or active_jobs:
        # Submit new jobs up to limit (uses same start_async_invoke call as above)
        while len(active_jobs) < max_concurrent and video_queue:
            video_info = video_queue.pop(0)
            response = bedrock_client.start_async_invoke(
                modelId="amazon.nova-2-multimodal-embeddings-v1:0",
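                # Same modelInput structure as generate_video_embeddings().
                # 'video_uri' is an assumed key holding the source video's S3 URI.
                modelInput={
                    "taskType": "SEGMENTED_EMBEDDING",
                    "segmentedEmbeddingParams": {
                        "embeddingPurpose": "GENERIC_INDEX",
                        "embeddingDimension": 1024,
                        "video": {
                            "format": "mp4",
                            "embeddingMode": "AUDIO_VIDEO_COMBINED",
                            "source": {"s3Location": {"uri": video_info['video_uri']}},
                            "segmentationConfig": {"durationSeconds": 15}
                        }
                    }
                },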
                outputDataConfig={"s3OutputDataConfig": {"s3Uri": video_info['output_uri']}}
            )
            active_jobs[response["invocationArn"]] = video_info
        
        # Poll all active jobs
        for arn in list(active_jobs.keys()):
            job = bedrock_client.get_async_invoke(invocationArn=arn)
            if job["status"] == "Completed":
                video_info = active_jobs.pop(arn)
                embeddings = read_embeddings_from_s3(job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"])
                # Process embeddings...
            elif job["status"] in ["Failed", "Expired"]:
                active_jobs.pop(arn)
        
        if active_jobs:
            time.sleep(10)

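def read_embeddings_from_s3(s3_uri):
    """Read JSONL embeddings from S3. Returns list of {startTime, endTime, embedding} dicts."""
    # Assumption: the job writes one or more .jsonl files under the output prefix,
    # one JSON object per line. Adjust the filter if your output layout differs.
    s3 = boto3.client("s3")
    bucket, _, prefix = s3_uri.removeprefix("s3://").partition("/")
    records = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".jsonl"):
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
                records.extend(json.loads(line) for line in body.splitlines() if line.strip())
    return records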
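The submit-and-poll loop can be exercised without calling Amazon Bedrock. The sketch below replaces the client with a stub to check that the loop never exceeds the concurrency limit. `FakeAsyncClient` and `run_queue` are hypothetical names introduced here for illustration; the stub simply marks every job as completed on its second poll, and the loop omits the `time.sleep` that a real pipeline would need between polling rounds.

```python
import itertools

class FakeAsyncClient:
    """Hypothetical stand-in for the bedrock-runtime client (illustration only)."""

    def __init__(self):
        self._ids = itertools.count()
        self._polls = {}
        self.in_flight = 0
        self.peak_in_flight = 0

    def start_async_invoke(self, **kwargs):
        arn = f"arn:fake:async-invoke/{next(self._ids)}"
        self._polls[arn] = 0
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        return {"invocationArn": arn}

    def get_async_invoke(self, invocationArn):
        self._polls[invocationArn] += 1
        if self._polls[invocationArn] >= 2:  # Each fake job finishes on its second poll
            self.in_flight -= 1
            return {"status": "Completed"}
        return {"status": "InProgress"}


def run_queue(client, videos, max_concurrent=30):
    """Submit jobs up to max_concurrent, poll, and refill freed slots."""
    pending = list(videos)
    active = {}
    completed = []
    while pending or active:
        # Fill any free slots
        while pending and len(active) < max_concurrent:
            video = pending.pop(0)
            response = client.start_async_invoke(
                modelId="fake-model", modelInput={}, outputDataConfig={}
            )
            active[response["invocationArn"]] = video
        # Poll every active job once
        for arn in list(active):
            if client.get_async_invoke(invocationArn=arn)["status"] == "Completed":
                completed.append(active.pop(arn))
    return completed


client = FakeAsyncClient()
done = run_queue(client, [f"video-{i}" for i in range(100)])
print(len(done), "jobs completed; peak concurrency:", client.peak_in_flight)
```

A stub like this makes it cheap to test queue changes, such as retries for failed jobs, before spending money on real invocations.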

Step 4: Generate metadata tags with Nova Pro or Nova 2 Lite

Generate descriptive tags for each video using Nova Pro (or Nova 2 Lite for better accuracy at lower cost) to enable hybrid search that combines semantic and keyword matching.

VALID_TAGS = [
    "person", "vehicle", "animal", "building", "nature", "indoor", "outdoor",
    "walking", "running", "sitting", "standing", "talking", "driving",
    "day", "night", "sunny", "cloudy", "urban", "rural", "beach", "forest",
    "sports", "music", "food", "technology", "crowd", "solo"
]

def generate_tags(video_s3_uri, sample_frame_count=3):
    """Generate descriptive tags using Nova Pro or Nova Lite."""
    
    prompt = f"""Analyze this video and select 10-15 tags from this predefined list that best describe the content:
{', '.join(VALID_TAGS)}

Only return tags from this list as a comma-separated list. Do not invent new tags."""
    
    response = bedrock.converse(
        modelId="us.amazon.nova-pro-v1:0",  # Or use us.amazon.nova-2-lite-v1:0
        messages=[{
            "role": "user",
            "content": [{
                "video": {
                    "format": "mp4",
                    "source": {"s3Location": {"uri": video_s3_uri}}
                }
            }, {
                "text": prompt
            }]
        }]
    )
    
    # Parse tags from response and validate against taxonomy
    tags_text = response['output']['message']['content'][0]['text']
    tags = [tag.strip().lower() for tag in tags_text.split(',')]
    
    # Filter to only valid tags from our taxonomy
    valid_tags = [tag for tag in tags if tag in VALID_TAGS]
    
    return valid_tags
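Model output does not always arrive as a clean comma-separated line. As a sketch of slightly more defensive parsing, the hypothetical helper below (`parse_tags` is not part of the original solution) also splits on newlines and semicolons, strips common list punctuation, and removes duplicates while preserving order.

```python
import re

def parse_tags(raw_text, valid_tags):
    """Extract unique, in-taxonomy tags from free-form model output."""
    allowed = set(valid_tags)
    result = []
    for token in re.split(r"[,\n;]+", raw_text):
        # Drop whitespace plus bullet, quote, and trailing-period characters
        tag = token.strip().strip(".*-\"' ").lower()
        if tag in allowed and tag not in result:
            result.append(tag)
    return result


sample_output = "Person, beach,\n- Walking; sunset, person."
print(parse_tags(sample_output, ["person", "beach", "walking", "outdoor"]))
```

In this example "sunset" is dropped because it is not in the taxonomy passed in, and the repeated "person" is kept only once.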

Step 5: Index embeddings and tags in OpenSearch Service

Store the embeddings and tags in OpenSearch Service using bulk indexing for efficiency.

from opensearchpy import helpers

def index_video_data(video_id, s3_uri, embeddings, tags):
    """Index embeddings and tags in OpenSearch."""
    
    # Prepare bulk actions for k-NN index
    knn_actions = []
    for idx, emb in enumerate(embeddings):
        doc_id = f"{video_id}_{idx}"
        knn_actions.append({
            "_index": "video-embeddings-knn",
            "_id": doc_id,
            "_source": {
                "video_id": video_id,
                "segment_index": idx,
                # Key name follows the read_embeddings_from_s3() docstring
                "timestamp": emb['startTime'],
                "embedding": emb['embedding'],
                "s3_uri": s3_uri
            }
        })
    
    # Bulk index embeddings
    helpers.bulk(opensearch_client, knn_actions)
    
    # Prepare bulk actions for text index
    text_actions = []
    for idx in range(len(embeddings)):
        doc_id = f"{video_id}_{idx}"
        text_actions.append({
            "_index": "video-embeddings-text",
            "_id": doc_id,
            "_source": {
                "video_id": video_id,
                "segment_index": idx,
                "tags": " ".join(tags)
            }
        })
    
    # Bulk index tags
    helpers.bulk(opensearch_client, text_actions)
    
    print(f"Indexed {len(embeddings)} segments for video {video_id}")
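For long videos, building the full action list in memory before indexing can be avoided by yielding actions lazily. The sketch below is one way to do that; `iter_knn_actions` and its `timestamp_key` parameter are hypothetical names, and the default key `startTime` is an assumption based on the `read_embeddings_from_s3()` docstring.

```python
def iter_knn_actions(video_id, s3_uri, embeddings,
                     index_name="video-embeddings-knn", timestamp_key="startTime"):
    """Yield one bulk action per segment instead of materializing a list."""
    for idx, emb in enumerate(embeddings):
        yield {
            "_index": index_name,
            "_id": f"{video_id}_{idx}",
            "_source": {
                "video_id": video_id,
                "segment_index": idx,
                "timestamp": emb.get(timestamp_key, 0.0),
                "embedding": emb["embedding"],
                "s3_uri": s3_uri,
            },
        }


segments = [
    {"startTime": 0.0, "embedding": [0.1, 0.2]},
    {"startTime": 15.0, "embedding": [0.3, 0.4]},
]
actions = list(iter_knn_actions("vid1", "s3://amzn-s3-demo-video-bucket/vid1.mp4", segments))
print([a["_id"] for a in actions])

# With a live cluster, the generator can be passed straight to the bulk helper:
# helpers.bulk(opensearch_client, iter_knn_actions(video_id, s3_uri, embeddings), chunk_size=500)
```

The bulk helper consumes any iterable in chunks, so only one chunk of actions needs to be held in memory at a time.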

Step 6: Implement search functionality

After ingestion completes, you can search the indexed videos in three ways. The implementation is geared toward low-latency queries.

Initialize the OpenSearch Service client for search

First, create an OpenSearch Service client for search operations:

from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

def create_opensearch_client():
    """Create OpenSearch client with AWS authentication."""
    session = boto3.Session(region_name="us-east-1")
    credentials = session.get_credentials()
    awsauth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        'us-east-1',
        'es',
        session_token=credentials.token
    )
    
    return OpenSearch(
        hosts=[{'host': 'YOUR_OPENSEARCH_ENDPOINT', 'port': 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        timeout=30
    )

# Create client
opensearch_client = create_opensearch_client()

Text-to-video search

Convert the natural language query into an embedding using the synchronous API, then run a k-NN similarity search:

def search_text_to_video(query_text, opensearch_client, k=10):
    """Search videos using natural language query converted to embedding."""
    
    bedrock_client = boto3.client('bedrock-runtime', region_name="us-east-1")
    
    # Use SINGLE_EMBEDDING task type for text-to-embedding conversion
    # VIDEO_RETRIEVAL purpose optimizes embeddings for searching video content
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "VIDEO_RETRIEVAL",
            "embeddingDimension": 1024,
            "text": {
                "truncationMode": "END",
                "value": query_text
            }
        }
    }
    
    response = bedrock_client.invoke_model(
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        body=json.dumps(request_body),
        accept="application/json",
        contentType="application/json"
    )
    
    response_body = json.loads(response['body'].read())
    # Response structure: {"embeddings": [{"embeddingType": "TEXT", "embedding": [...]}]}
    query_embedding = response_body['embeddings'][0]['embedding']
    
    # Perform k-NN search against video embeddings
    search_body = {
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        },
        "size": k,
        "_source": ["video_id", "segment_index", "timestamp", "s3_uri"]
    }
    
    response = opensearch_client.search(
        index="video-embeddings-knn",
        body=search_body
    )
    
    # Extract results
    return [{'score': hit['_score'], 
             'video_id': hit['_source']['video_id'],
             'segment_index': hit['_source']['segment_index'],
             'timestamp': hit['_source'].get('timestamp', 0)} 
            for hit in response['hits']['hits']]
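Because the k-NN index stores one document per 15-second segment, several hits can belong to the same video. If the application should list each video once, the segment-level results can be collapsed afterwards. The sketch below assumes the list-of-dicts format returned by `search_text_to_video`; `best_segment_per_video` is a hypothetical helper, not part of the original solution.

```python
def best_segment_per_video(hits):
    """Keep only the highest-scoring segment for each video, sorted by score."""
    best = {}
    for hit in hits:
        current = best.get(hit["video_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["video_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


segment_hits = [
    {"score": 0.91, "video_id": "a", "segment_index": 2, "timestamp": 30.0},
    {"score": 0.88, "video_id": "b", "segment_index": 0, "timestamp": 0.0},
    {"score": 0.85, "video_id": "a", "segment_index": 3, "timestamp": 45.0},
]
for hit in best_segment_per_video(segment_hits):
    print(hit["video_id"], hit["segment_index"], hit["score"])
```

The retained `timestamp` lets a player jump straight to the best-matching moment in each video.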

Text search with BM25 (keyword matching)

Use OpenSearch BM25 scoring to match keywords against tags without generating an embedding:

def search_text_bm25(search_term, opensearch_client, k=10):
    """Search videos using BM25 keyword matching on tags field."""
    
    # Search text index using match query on tags
    search_body = {
        "query": {
            "match": {
                "tags": search_term
            }
        },
        "size": k,
        "_source": ["video_id", "segment_index", "tags"]
    }
    
    response = opensearch_client.search(
        index="video-embeddings-text",
        body=search_body
    )
    
    return response['hits']['hits']  # Extract results (same pattern as above)

Video-to-video search

Retrieve an existing video embedding from OpenSearch Service and search for similar content; no Amazon Bedrock API call is required:

def search_video_to_video(query_video_id, query_segment_index, opensearch_client, k=10):
    """Find similar videos using a reference video segment."""
    
    # Get the embedding from the reference video segment
    sample_query = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"video_id": query_video_id}},
                    {"term": {"segment_index": query_segment_index}}
                ]
            }
        },
        "_source": ["video_id", "segment_index", "embedding"]
    }
    
    sample_response = opensearch_client.search(
        index="video-embeddings-knn",
        body=sample_query
    )
    
    if not sample_response['hits']['hits']:
        return []
    
    sample_doc = sample_response['hits']['hits'][0]['_source']
    query_embedding = sample_doc.get('embedding')
    
    # Perform k-NN search with the embedding
    search_body = {
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        },
        "size": k,
        "_source": ["video_id", "segment_index", "timestamp"]
    }
    
    response = opensearch_client.search(
        index="video-embeddings-knn",
        body=search_body
    )
    
    return response['hits']['hits']  # Extract results as needed

Hybrid search

Combine semantic k-NN search with BM25 keyword matching by retrieving results from both indexes and merging them with weighted scores:

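def generate_query_embedding(query_text, dimension=1024):
    """Embed a text query using the same invoke_model request as search_text_to_video."""
    bedrock_client = boto3.client('bedrock-runtime', region_name="us-east-1")
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "VIDEO_RETRIEVAL",
            "embeddingDimension": dimension,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }
    response = bedrock_client.invoke_model(
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        body=json.dumps(request_body),
        accept="application/json",
        contentType="application/json"
    )
    return json.loads(response['body'].read())['embeddings'][0]['embedding']


def search_hybrid(query_text, opensearch_client, k=10, vector_weight=0.7, text_weight=0.3):
    """Hybrid search combining k-NN semantic search and BM25 text matching."""
    
    # Generate the query embedding with the helper defined above
    query_embedding = generate_query_embedding(query_text)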
    
    # Get k-NN results (same query as search_text_to_video)
    knn_response = opensearch_client.search(
        index="video-embeddings-knn",
        body={"query": {"knn": {"embedding": {"vector": query_embedding, "k": 20}}}, "size": 20}
    )
    
    # Get BM25 text results (same query as search_text_bm25)
    text_response = opensearch_client.search(
        index="video-embeddings-text",
        body={"query": {"match": {"tags": query_text}}, "size": 20}
    )
    
    # Combine results with weighted scoring
    knn_hits = knn_response['hits']['hits']
    text_hits = text_response['hits']['hits']
    
    combined = {}
    
    for hit in knn_hits:
        vid = hit['_source']['video_id']
        seg = hit['_source']['segment_index']
        key = f"{vid}_{seg}"
        combined[key] = {
            'video_id': vid,
            'segment_index': seg,
            'tags': hit['_source'].get('tags', ''),
            'vector_score': hit['_score'],
            'text_score': 0,
            'combined_score': hit['_score'] * vector_weight
        }
    
    for hit in text_hits:
        vid = hit['_source']['video_id']
        seg = hit['_source']['segment_index']
        key = f"{vid}_{seg}"
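        if key in combined:
            # Tags live only in the text index, so fill them in here
            combined[key]['tags'] = hit['_source'].get('tags', '')
            combined[key]['text_score'] = hit['_score']
            combined[key]['combined_score'] += hit['_score'] * text_weight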
        else:
            combined[key] = {
                'video_id': vid,
                'segment_index': seg,
                'tags': hit['_source'].get('tags', ''),
                'vector_score': 0,
                'text_score': hit['_score'],
                'combined_score': hit['_score'] * text_weight
            }
    
    # Sort by combined score and return top k
    sorted_results = sorted(combined.values(), key=lambda x: x['combined_score'], reverse=True)[:k]
    
    return sorted_results

# Usage example - search with natural language query
query = "person walking on beach at sunset"
hybrid_results = search_hybrid(query, opensearch_client, k=10)

for r in hybrid_results:
    print(f"Combined: {r['combined_score']:.4f} (Vector: {r['vector_score']:.4f}, Text: {r['text_score']:.4f})")
    print(f"  Video: {r['video_id']}, Segment: {r['segment_index']}")
    print(f"  Tags: {r['tags']}\n")

Search performance at scale

After indexing all 792,218 videos, we measured search performance across all three modes.

Average query latencies across 792,218 videos were as follows:

  • Semantic k-NN search: ~76 ms (benefiting from HNSW's logarithmic scaling)
  • BM25 text search: ~30 ms
  • Hybrid search: ~106 ms

After indexing all 792,218 videos and their embeddings, the storage requirements were as follows:

  • k-NN index: 28.8 GB for 792K videos
  • Text index: 1.0 GB for 792K videos
  • Total: 29.8 GB (manageable on modern OpenSearch clusters)

The Hierarchical Navigable Small World (HNSW) algorithm used for k-NN search has logarithmic time complexity, which means that search times grow slowly as the dataset grows. All three search modes keep response times under 200 ms even at the 792K-video scale, which meets the production requirements of interactive search applications.
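The hybrid latency reported above is close to the sum of the k-NN and BM25 latencies, which is what you would expect from `search_hybrid` issuing its two OpenSearch queries one after the other. If that matters for your application, one option is to run the two queries concurrently. The sketch below shows the pattern with stub functions that only sleep for the reported latencies; `run_in_parallel`, `fake_knn_query`, and `fake_text_query` are hypothetical names, and real gains depend on your cluster and client configuration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(knn_query, text_query):
    """Run two zero-argument callables concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        knn_future = pool.submit(knn_query)
        text_future = pool.submit(text_query)
        return knn_future.result(), text_future.result()


# Stubs that stand in for the two opensearch_client.search(...) calls
def fake_knn_query():
    time.sleep(0.076)
    return ["knn-hit"]

def fake_text_query():
    time.sleep(0.030)
    return ["text-hit"]


start = time.perf_counter()
knn_hits, text_hits = run_in_parallel(fake_knn_query, fake_text_query)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Both queries finished in about {elapsed_ms:.0f} ms")
```

With real queries, each callable would wrap one `opensearch_client.search(...)` call, for example using `functools.partial` or a lambda.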

Things to know

Performance and cost considerations

Video processing time depends on video length. In our tests, a 45-second video took about 70 seconds to process using the async API. Processing includes automatic segmentation, embedding generation for each segment, and writing the output to Amazon S3. Search operations scale well: our tests show that even at 792K videos, semantic search stays under 80 ms, text search takes around 30 ms, and hybrid search stays under 110 ms. Use 1024-dimensional embeddings instead of 3072 to reduce storage costs while maintaining accuracy. Nova Multimodal Embeddings is priced per second of video input ($0.00056/second batch rate), so video duration, not embedding dimension or segmentation, determines processing cost. The async API is more cost-effective than processing frames individually. For OpenSearch Service, r6g instances offer better price-performance than previous-generation instance types, and you can use storage tiering to move cold data to Amazon S3 for additional savings.

Scaling to production

For production deployments with large video libraries, consider using AWS Batch to process videos in parallel across multiple compute instances. You can partition your video dataset and assign the subsets to different workers. Monitor OpenSearch Service cluster health and scale the data nodes as your index grows. The dual-index architecture scales well because k-NN search and text search can be optimized independently.
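As a minimal illustration of the partitioning idea, the sketch below assigns videos to workers round-robin. The `partition` helper is hypothetical; a production setup might instead balance by total video duration, since processing time and cost follow duration rather than video count.

```python
def partition(items, num_workers):
    """Split items into num_workers round-robin subsets of near-equal size."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [items[i::num_workers] for i in range(num_workers)]


video_keys = [f"videos/{i:04d}.mp4" for i in range(10)]
for worker_id, subset in enumerate(partition(video_keys, 4)):
    print(worker_id, subset)
```

Each worker would then run the ingestion loop over its own subset, keeping in mind that the async job quota applies per account rather than per worker.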

Tuning search accuracy

Tune the hybrid search weights for your use case. The default 0.7/0.3 split (vector/text) favors semantic similarity, which suits most scenarios. If you have high-quality metadata tags, increasing the text weight to 0.5 can improve results. We recommend testing different configurations against your own content to find the right balance.
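One thing to keep in mind when tuning weights: BM25 scores are unbounded, while k-NN similarity scores typically fall in a narrow bounded range, so weights applied to raw scores do not act on a common scale. A common remedy is to normalize each result set before weighting. The sketch below uses min-max normalization; it is an illustrative alternative, not the scoring used in `search_hybrid` above, and `min_max_normalize` and `fuse_scores` are hypothetical helpers.

```python
def min_max_normalize(scores):
    """Rescale a {key: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def fuse_scores(vector_scores, text_scores, vector_weight=0.7, text_weight=0.3):
    """Weighted sum of normalized scores; a key missing from one side scores 0 there."""
    v = min_max_normalize(vector_scores)
    t = min_max_normalize(text_scores)
    fused = {
        key: vector_weight * v.get(key, 0.0) + text_weight * t.get(key, 0.0)
        for key in set(v) | set(t)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


ranked = fuse_scores(
    {"vid1_0": 0.9, "vid2_3": 0.5},     # k-NN scores
    {"vid2_3": 12.0, "vid3_1": 3.0},    # BM25 scores
)
for key, score in ranked:
    print(key, round(score, 3))
```

Min-max normalization is sensitive to outliers within a result set, so it is worth comparing against your own relevance judgments before adopting it.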

Clean up

To avoid ongoing charges, delete the resources you created:

  1. Delete the OpenSearch Service domain from the Amazon OpenSearch Service console
  2. Empty and delete the S3 buckets used for videos and embeddings
  3. Delete any IAM roles created for this solution

Note that Amazon Bedrock charges are usage-based, so no cleanup is required for the Amazon Bedrock models themselves.

Conclusion

This walkthrough covered how to build a multimodal video search system that supports natural language queries across video content. The solution uses Amazon Bedrock Nova models to generate embeddings. These embeddings capture both audio and visual information, are stored efficiently in OpenSearch Service using a dual-index architecture, and support three search modes for different use cases. The async processing approach scales to large video libraries, and the hybrid search capability combines semantic and keyword matching for greater accuracy. You can extend this foundation by adding features such as video-to-video similarity search, caching for frequently run queries, or integration with AWS Batch for parallel processing of large datasets.

To learn more about the technologies used in this solution, see Amazon Nova Multimodal Embeddings and Hybrid Search with Amazon OpenSearch Service.


About the authors

Hammad Ausaf

Hammad is a Principal Solutions Architect in Media and Entertainment. He is a passionate builder and strives to provide the best solutions to AWS customers.

Rajat Jain

Rajat is a Technical Account Manager in Media and Entertainment. He is a GenAI/ML enthusiast and enjoys building new solutions.
