🧩 Building an MLOps-Based Quant AI Pipeline – MLflow + Docker + Streamlit Monitoring

So far, the AI quant model we have built covers
data collection → prediction (Transformer) → autonomous trading (RL).

์ด์ œ ๋‚จ์€ ํ•œ ๋‹จ๊ณ„๋Š”, ์ด ๋ชจ๋“  ๋ชจ๋ธ์„
**“์ง€์†์ ์œผ๋กœ ํ•™์Šตํ•˜๊ณ , ํ‰๊ฐ€ํ•˜๊ณ , ๋ฐฐํฌํ•˜๋Š” ์ž๋™ํ™”๋œ MLOps ์‹œ์Šคํ…œ”**์œผ๋กœ ๋งŒ๋“œ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.


๐ŸŽฏ ์ด๋ฒˆ ๊ธ€์˜ ๋ชฉํ‘œ

Use MLflow to connect model experiments, performance validation, version management, and automated deployment into one flow.


โš™๏ธ 1๏ธโƒฃ ํ•ต์‹ฌ ๊ตฌ์„ฑ๋„

📊 PostgreSQL    →  Data storage  
🤖 MLflow        →  Model training and version management  
🐳 Docker        →  Unified model-serving environment  
📈 Streamlit     →  Real-time monitoring dashboard  

With this, the project evolves from simple “experimental code”
into an AI investment platform that can actually be operated.


๐Ÿงฑ 2๏ธโƒฃ MLflow ์„ค์น˜ ๋ฐ ์„œ๋ฒ„ ์‹คํ–‰

pip install mlflow psycopg2-binary

MLflow stores **model experiments + metrics + artifacts (model files)**.

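mlflow server \
    --backend-store-uri postgresql://quant_user:quant_pass@localhost:5432/mlflow \
    --artifacts-destination ./mlruns \
    --host 0.0.0.0 --port 5001

You can now open the MLflow UI at http://localhost:5001 (the `mlflow` database must already exist in PostgreSQL).
Because no `--default-artifact-root` is given, the server (MLflow 2.x) proxies artifact uploads and downloads itself and keeps the files under `./mlruns`, so clients such as the Docker container in step 5 never need direct access to that directory.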
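Typing `quant_pass` directly into the command leaves the password in your shell history, and a password containing `@` or `/` would break the URI. A minimal sketch, assuming you keep the credentials in environment variables (the names `QUANT_DB_USER` and `QUANT_DB_PASSWORD` are hypothetical, not something MLflow reads), builds the same `--backend-store-uri` with the credentials percent-encoded:

```python
import os
from urllib.parse import quote_plus


def build_backend_store_uri(user: str, password: str, host: str = "localhost",
                            port: int = 5432, database: str = "mlflow") -> str:
    """Build the PostgreSQL URI passed to `mlflow server --backend-store-uri`."""
    # Percent-encode the credentials so characters such as '@', ':' or '/' cannot break the URI
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Hypothetical environment variable names; fall back to the example credentials above
uri = build_backend_store_uri(
    os.environ.get("QUANT_DB_USER", "quant_user"),
    os.environ.get("QUANT_DB_PASSWORD", "quant_pass"),
)
print(uri)
```

Saved as a hypothetical `make_uri.py`, it can feed the server directly: `mlflow server --backend-store-uri "$(python make_uri.py)" ...`.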


๐Ÿงฉ 3๏ธโƒฃ MLflow ํŠธ๋ž˜ํ‚น ์ฝ”๋“œ ์ถ”๊ฐ€

Transformer ๋˜๋Š” RL ๋ชจ๋ธ ํ•™์Šต ์ฝ”๋“œ์— ์•„๋ž˜์ฒ˜๋Ÿผ ์ถ”๊ฐ€ํ•˜๋ฉด
๊ฐ ์‹คํ—˜์ด MLflow์— ์ž๋™์œผ๋กœ ๊ธฐ๋ก๋ฉ๋‹ˆ๋‹ค.

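import mlflow
import mlflow.pytorch
import torch
import torch.nn as nn

# FactorTransformer, X and y are defined in the earlier Transformer post of this series
mlflow.set_tracking_uri("http://localhost:5001")
mlflow.set_experiment("quant_ai_experiments")

with mlflow.start_run() as run:
    lr, epochs = 0.001, 50
    # Parameters and tags are only stored if you log them explicitly
    mlflow.log_params({"lr": lr, "epochs": epochs, "input_dim": 4})
    mlflow.set_tag("model_type", "transformer")

    model = FactorTransformer(input_dim=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    for epoch in range(epochs):
        pred = model(X)
        loss = criterion(pred, y.mean(dim=1, keepdim=True))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Log the loss every epoch so the MLflow UI can plot the learning curve
        mlflow.log_metric("loss", loss.item(), step=epoch)

    mlflow.pytorch.log_model(model, "model")  # stored under the run's "model" artifact path
    print("✅ Model saved:", run.info.run_id)  # run_uuid is deprecated; use run_id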

๐Ÿงฎ 4๏ธโƒฃ ๋ชจ๋ธ ๋ฒ„์ „ ๊ด€๋ฆฌ

๋ฐ˜์‘ํ˜•

For every training run, MLflow stores whatever your code logs, grouped as follows:

| Item | Description |
|---|---|
| Parameters | Training hyperparameters (lr, epoch, etc.) |
| Metrics | Performance metrics (loss, R², Sharpe, etc.) |
| Artifacts | Trained model files (.pt, .pkl, etc.) |
| Tags | Version information (e.g. model_type=transformer, phase=prod) |

๐Ÿ‘‰ ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๊ณผ๊ฑฐ ์‹คํ—˜์„ ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์˜ค๊ฑฐ๋‚˜,
“์„ฑ๊ณตํ•œ ๋ชจ๋ธ๋งŒ ํ”„๋กœ๋•์…˜์— ๋ฐฐํฌ”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
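Picking the “successful” runs comes down to filtering on metrics and tags. Against a live server, `mlflow.search_runs(filter_string=..., order_by=["metrics.loss ASC"])` does this server-side; the stdlib-only sketch below shows the same selection logic over run records shaped like the table above. The dictionaries, the sample values, and the `0.002` cutoff are made-up illustrations, not MLflow's own data structures.

```python
from typing import Optional


def select_best_run(runs: list[dict], metric: str = "loss", max_value: float = 0.002,
                    required_tags: Optional[dict] = None) -> Optional[dict]:
    """Return the run with the lowest `metric` below `max_value` whose tags all match."""
    required_tags = required_tags or {}
    candidates = [
        r for r in runs
        if metric in r["metrics"]
        and r["metrics"][metric] < max_value
        and all(r["tags"].get(k) == v for k, v in required_tags.items())
    ]
    # Lower loss is better; None means no run qualifies for production
    return min(candidates, key=lambda r: r["metrics"][metric], default=None)


# Made-up sample records shaped like Parameters / Metrics / Tags
runs = [
    {"run_id": "a1", "params": {"lr": 0.001}, "metrics": {"loss": 0.0031}, "tags": {"model_type": "transformer"}},
    {"run_id": "b2", "params": {"lr": 0.0005}, "metrics": {"loss": 0.0018}, "tags": {"model_type": "transformer"}},
    {"run_id": "c3", "params": {"lr": 0.001}, "metrics": {"loss": 0.0012}, "tags": {"model_type": "rl"}},
]
best = select_best_run(runs, required_tags={"model_type": "transformer"})
print(best["run_id"])  # b2
```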


๐Ÿš€ 5๏ธโƒฃ Docker๋ฅผ ์ด์šฉํ•œ ๋ชจ๋ธ ์„œ๋น™

๐Ÿณ Dockerfile

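FROM python:3.11-slim
WORKDIR /app
COPY . .
# The scoring server's dependencies ship with the mlflow package, so flask needs no separate install
RUN pip install --no-cache-dir mlflow torch
# The registry lives on the tracking server from step 2; host.docker.internal reaches the host on
# Docker Desktop (on Linux, start the container with --add-host=host.docker.internal:host-gateway)
ENV MLFLOW_TRACKING_URI=http://host.docker.internal:5001
# Serve the promoted version; --env-manager local reuses this image's packages instead of building a new env
CMD ["mlflow", "models", "serve", "-m", "models:/quant_ai/Production", "-h", "0.0.0.0", "-p", "6000", "--env-manager", "local"]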

๐Ÿš€ ๋ชจ๋ธ ๋ฐฐํฌ

mlflow models serve -m models:/quant_ai/Production -p 6000

API ์—”๋“œํฌ์ธํŠธ ์˜ˆ์‹œ:

curl -X POST http://localhost:6000/invocations \
    -H "Content-Type: application/json" \
    -d '{"inputs": [[0.02, 0.01, -0.03, 0.04]]}'

→ The prediction is returned as JSON (MLflow 2.x wraps it as {"predictions": ...}).
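The same call from Python, as a minimal stdlib-only sketch: it assumes the scoring server above is listening on port 6000 and that, as in MLflow 2.x, predictions come back under a `"predictions"` key. The helper names `build_payload`, `parse_predictions`, and `predict` are ours, not part of MLflow.

```python
import json
import urllib.request

API_URL = "http://localhost:6000/invocations"  # the mlflow models serve endpoint from above


def build_payload(vectors: list[list[float]]) -> bytes:
    """Encode a batch of factor vectors in the {"inputs": ...} format used by the curl call."""
    return json.dumps({"inputs": vectors}).encode("utf-8")


def parse_predictions(body: bytes):
    """Unwrap the response; MLflow 2.x returns {"predictions": ...}, older servers may return the bare result."""
    data = json.loads(body)
    if isinstance(data, dict) and "predictions" in data:
        return data["predictions"]
    return data


def predict(vectors: list[list[float]], url: str = API_URL, timeout: float = 5.0):
    """POST the vectors to the scoring server and return the parsed predictions."""
    request = urllib.request.Request(
        url,
        data=build_payload(vectors),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())
```

With the server running, `predict([[0.02, 0.01, -0.03, 0.04]])` returns the values that appear under `"predictions"` in the curl response.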


๐Ÿงฉ 6๏ธโƒฃ Streamlit ์‹ค์‹œ๊ฐ„ ๋ชจ๋‹ˆํ„ฐ๋ง

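# monitor.py
import mlflow
import requests
import streamlit as st

API_URL = "http://localhost:6000/invocations"
mlflow.set_tracking_uri("http://localhost:5001")

st.title("📊 Quant AI Monitoring Dashboard")

inputs = st.text_input("Enter factor vector (comma-separated):", "0.02,0.01,-0.03,0.04")
x = [[float(i) for i in inputs.split(",")]]

if st.button("Predict"):
    response = requests.post(API_URL, json={"inputs": x}, timeout=10)
    response.raise_for_status()
    st.json(response.json())

# Show recent training runs. With a PostgreSQL backend there is no local meta.yaml to read,
# so query the tracking server instead; search_runs returns a pandas DataFrame
runs = mlflow.search_runs(
    experiment_names=["quant_ai_experiments"],
    order_by=["attributes.start_time DESC"],
    max_results=5,
)
st.subheader("Recent Training Runs")
# Keep only the columns that exist (params/metrics/tags columns appear only once logged)
cols = [c for c in ["run_id", "start_time", "params.lr", "metrics.loss", "tags.model_type"] if c in runs.columns]
st.dataframe(runs[cols])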

This Streamlit dashboard shows live inference results together with recent training-run metadata from MLflow.
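The one-line parser `x = [[float(i) for i in inputs.split(",")]]` in `monitor.py` raises a `ValueError` on a typo or an empty field, which crashes the page. A slightly more defensive sketch, assuming the model expects exactly four factors (matching `input_dim=4` in step 3; `parse_factor_vector` is a hypothetical helper), returns an error message instead:

```python
from typing import Optional


def parse_factor_vector(text: str, dim: int = 4) -> tuple[Optional[list[list[float]]], Optional[str]]:
    """Parse 'a,b,c,d' into [[a, b, c, d]]; return (vector, None) or (None, error message)."""
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if len(parts) != dim:
        return None, f"Expected {dim} comma-separated numbers, got {len(parts)}."
    try:
        values = [float(p) for p in parts]
    except ValueError as exc:
        return None, f"Not a number: {exc}"
    # Wrap in an outer list: the scoring server expects a batch of rows
    return [values], None
```

In `monitor.py` you would call `x, err = parse_factor_vector(inputs)` and show `st.error(err)` when `err` is set, sending the request only otherwise.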


๐Ÿง  7๏ธโƒฃ MLOps ์›Œํฌํ”Œ๋กœ์šฐ ์ž๋™ํ™”

๋‹จ๊ณ„ ๋„๊ตฌ ์„ค๋ช…

๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ Airflow / APScheduler ์ •๊ธฐ ์ˆ˜์ง‘
๋ชจ๋ธ ํ•™์Šต MLflow ์‹คํ—˜ ๋ฐ ๋ฒ„์ „ ๊ด€๋ฆฌ
๊ฒ€์ฆ pytest + MLflow metrics ์ž๋™ ์„ฑ๋Šฅ ํ…Œ์ŠคํŠธ
๋ฐฐํฌ Docker + MLflow Serve API ๋ฐฐํฌ
๋ชจ๋‹ˆํ„ฐ๋ง Streamlit + Slack ์‹ค์‹œ๊ฐ„ ๊ฐ์‹œ ๋ฐ ์•Œ๋ฆผ

๐Ÿงฐ 8๏ธโƒฃ ์šด์˜ ์ž๋™ํ™” (์˜ˆ์‹œ ์ฝ”๋“œ)

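# Retrain once a day at 06:00 and log the run to MLflow (/path/to/project is a placeholder)
0 6 * * * cd /path/to/project && MLFLOW_TRACKING_URI=http://localhost:5001 python train_model.py >> train.log 2>&1
# Refresh the deployment at 07:00: restarting the serving container makes it reload models:/quant_ai/Production
# (assumes it was started with --name quant-ai-serve; a second `mlflow models serve` on port 6000 would fail to bind)
0 7 * * * docker restart quant-ai-serve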

cron ์Šค์ผ€์ค„์„ ๊ฑธ๋ฉด, ๋งค์ผ ์ƒˆ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•˜๊ณ 
์„ฑ๋Šฅ์ด ์ผ์ • ๊ธฐ์ค€ ์ด์ƒ์ด๋ฉด ์ž๋™์œผ๋กœ “Production” ํƒœ๊ทธ๋กœ ์Šน๊ฒฉ๋ฉ๋‹ˆ๋‹ค.


๐Ÿงฉ 9๏ธโƒฃ MLflow → Slack ์•Œ๋ฆผ ์—ฐ๊ฒฐ

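import os
import requests
import mlflow
from mlflow.tracking import MlflowClient

def notify_slack(msg):
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": f"🚀 {msg}"}, timeout=10)

if loss.item() < 0.002:  # run inside the `with mlflow.start_run():` block from step 3
    mv = mlflow.register_model(f"runs:/{mlflow.active_run().info.run_id}/model", "quant_ai")
    MlflowClient().transition_model_version_stage("quant_ai", mv.version, "Production", archive_existing_versions=True)
    notify_slack(f"New model quant_ai v{mv.version} promoted to Production.")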

This way, only models whose loss falls below the threshold are promoted automatically,
and a deployment message reaches Slack right away. The serving process picks up the new version the next time it restarts.
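The rule above uses a fixed `loss < 0.002` threshold, so a new model qualifies even when the current Production model is better. A stricter gate, sketched below in plain Python (`should_promote`, `slack_message`, and the `min_improvement` margin are hypothetical), also requires beating the Production loss. In practice that loss could come from `MlflowClient().get_latest_versions("quant_ai", stages=["Production"])` and that version's run via `MlflowClient().get_run(run_id).data.metrics`, and the message would be posted with `notify_slack`.

```python
from typing import Optional


def should_promote(candidate_loss: float, production_loss: Optional[float],
                   threshold: float = 0.002, min_improvement: float = 0.0) -> bool:
    """Promote only if the candidate passes the absolute threshold AND beats Production."""
    if candidate_loss >= threshold:
        return False
    if production_loss is None:  # nothing in Production yet
        return True
    return candidate_loss < production_loss - min_improvement


def slack_message(model_name: str, version: int, candidate_loss: float,
                  production_loss: Optional[float]) -> dict:
    """Slack Incoming Webhook payload: a JSON object with a "text" field."""
    prev = "none" if production_loss is None else f"{production_loss:.4f}"
    return {"text": f"🚀 {model_name} v{version} promoted to Production (loss {candidate_loss:.4f}, previous {prev})"}
```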


📌 Summary

| Step | Function | Tool |
|---|---|---|
| 1 | Model experiments and version management | MLflow |
| 2 | Automated model deployment | Docker + MLflow Serve |
| 3 | Performance monitoring | Streamlit |
| 4 | Automated alerts | Slack |
| 5 | Continuous training | Cron / Airflow |

๐Ÿ“˜ ๋‹ค์Œ ๊ธ€ ์˜ˆ๊ณ 

๋‹ค์Œ ํŽธ์—์„œ๋Š” **“AI ํ€€ํŠธ ์šด์˜ ์ž๋™ํ™” ์™„์„ฑ – Airflow DAG์œผ๋กœ ์ „์ฒด ํŒŒ์ดํ”„๋ผ์ธ ํ†ตํ•ฉ”**์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค.
์ฆ‰, ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ → ํ•™์Šต → ๊ฒ€์ฆ → ๋ฆฌํฌํŠธ → ๋ฐฐํฌ๊นŒ์ง€ ์ „ ๊ณผ์ •์„
ํ•˜๋‚˜์˜ DAG(Directed Acyclic Graph) ๋กœ ์ž๋™ํ™”ํ•ฉ๋‹ˆ๋‹ค.


 

MLflow, Quant AI, model version management, Python MLOps, automated deployment, Streamlit, PostgreSQL, Docker model serving, deep learning operations, AI investing

※ This post is part of the Coupang Partners program, and I receive a commission as a result.