
๋ฐ˜์‘ํ˜•

๐Ÿค– Building an Autonomous Quant Portfolio AI with Reinforcement Learning

Part: building a "self-learning investment model" that evolves from the Transformer

In the previous post, we used a Transformer to predict next-month returns.
This time, building on those predictions,
we create an AI manager that adjusts portfolio weights and invests on its own.

This post is the starting point of an "autonomous AI quant system"
that combines quant investing with reinforcement learning (RL).


๐ŸŽฏ Goal

Build a model that uses reinforcement learning to adjust portfolio weights on its own so that risk-adjusted return is maximized.

Every month, the AI decides "which assets to buy and how much."
For example, the following loop repeats:

[Observe market state] → [AI decides portfolio weights] → [Compute return] → [Update reward]

๐Ÿง  1๏ธโƒฃ ํ•ต์‹ฌ ๊ฐœ๋… ์š”์•ฝ

๊ฐœ๋… ์„ค๋ช…

State (์ƒํƒœ) ํ˜„์žฌ ํŒฉํ„ฐ ๊ฐ’, ๋ณ€๋™์„ฑ, ๋ชจ๋ฉ˜ํ…€, ํฌํŠธํด๋ฆฌ์˜ค ๊ตฌ์„ฑ ๋“ฑ
Action (ํ–‰๋™) ๊ฐ ์ข…๋ชฉ๋ณ„ ๋น„์ค‘ ์กฐ์ • (์˜ˆ: ์‚ผ์„ฑ์ „์ž 40%, SKํ•˜์ด๋‹‰์Šค 30% ๋“ฑ)
Reward (๋ณด์ƒ) ํ•œ ๋‹ฌ ํ›„์˜ ํฌํŠธํด๋ฆฌ์˜ค ์ˆ˜์ต๋ฅ  – ๋ฆฌ์Šคํฌ ํŒจ๋„ํ‹ฐ
Policy (์ •์ฑ…) ํ˜„์žฌ ์ƒํƒœ์—์„œ ์–ด๋–ค ํ–‰๋™์„ ์ทจํ• ์ง€ ๊ฒฐ์ •ํ•˜๋Š” ํ•จ์ˆ˜
Agent (์—์ด์ „ํŠธ) ํ•™์Šตํ•˜๋ฉฐ Policy๋ฅผ ๊ฐœ์„ ํ•˜๋Š” ๊ฐ•ํ™”ํ•™์Šต ๋ชจ๋ธ

โš™๏ธ 2๏ธโƒฃ ํ™˜๊ฒฝ(Environment) ๊ตฌ์„ฑ

๋ฐ˜์‘ํ˜•

๊ฐ•ํ™”ํ•™์Šต์˜ ์ฒซ ๋‹จ๊ณ„๋Š” ํˆฌ์ž ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ™˜๊ฒฝ์„ ์ •์˜ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

import numpy as np
import pandas as pd

class PortfolioEnv:
    """Monthly rebalancing environment over a (date x asset) returns DataFrame."""

    def __init__(self, returns, window=12, transaction_cost=0.002):
        self.returns = returns                  # DataFrame: rows = months, cols = assets
        self.window = window                    # lookback length used as the state
        self.cost = transaction_cost            # proportional cost per unit of turnover
        self.t = window
        self.weights = np.ones(returns.shape[1]) / returns.shape[1]  # start equal-weighted
        self.done = False

    def reset(self):
        self.t = self.window
        self.weights = np.ones(self.returns.shape[1]) / self.returns.shape[1]
        self.done = False
        return self._get_state()

    def _get_state(self):
        # State = the last `window` months of asset returns, shape (window, n_assets)
        return self.returns.iloc[self.t - self.window:self.t].values

    def step(self, action):
        # Clip to long-only weights and normalize so they sum to 1
        action = np.clip(action, 0, 1)
        action = action / (np.sum(action) + 1e-8)
        # Realized portfolio return this month, net of transaction costs on turnover
        portfolio_return = np.dot(self.returns.iloc[self.t], action)
        reward = portfolio_return - self.cost * np.sum(np.abs(action - self.weights))
        self.weights = action
        self.t += 1
        self.done = (self.t >= len(self.returns) - 1)
        return self._get_state(), reward, self.done, {}
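
Before plugging an agent in, the environment can be sanity-checked with random weights. The monthly_returns DataFrame here is synthetic placeholder data standing in for your own return series.

# Quick sanity check of the environment with random weights (synthetic data).
rng = np.random.default_rng(0)
monthly_returns = pd.DataFrame(
    rng.normal(0.01, 0.05, size=(60, 3)), columns=["A", "B", "C"]
)

env = PortfolioEnv(monthly_returns)
state = env.reset()
done, total = False, 0.0
while not done:
    w = rng.random(monthly_returns.shape[1])   # random long-only weights
    state, reward, done, _ = env.step(w)
    total += reward
print(f"Random-policy total reward: {total:.4f}")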

๐Ÿงฉ 3๏ธโƒฃ ๊ฐ•ํ™”ํ•™์Šต ๋ชจ๋ธ (DQN ๋˜๋Š” PPO)

Here we use a simplified DQN (Deep Q-Network)-style example, where the network's per-asset Q-values are turned into portfolio weights with a softmax.

import torch
import torch.nn as nn
import torch.optim as optim
import random

class DQNAgent(nn.Module):
    """Small MLP that maps a flattened state to one score (Q-value) per asset."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim)
        )

    def forward(self, x):
        return self.fc(x)

def select_action(state, model, epsilon):
    # Number of assets = output size of the network's last layer
    action_dim = model.fc[-1].out_features
    if random.random() < epsilon:
        # Exploration: random long-only weights that sum to 1
        weights = torch.rand(action_dim)
        return weights / weights.sum()
    # Exploitation: softmax over the Q-values gives the portfolio weights
    with torch.no_grad():
        q_values = model(state)
        return torch.softmax(q_values, dim=-1).squeeze(0)

๐Ÿงฎ 4๏ธโƒฃ ํ•™์Šต ๋ฃจํ”„

# Assumes factor_returns.csv has "date", "ticker", and "return" columns
# (the "date" index name is an assumption carried over from the previous post).
returns = (
    pd.read_csv("factor_returns.csv")
    .pivot(index="date", columns="ticker", values="return")
    .dropna()
)
env = PortfolioEnv(returns)
state_dim = env.window * returns.shape[1]   # flattened (window x n_assets) state
action_dim = returns.shape[1]               # one weight per asset

model = DQNAgent(state_dim, action_dim)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for episode in range(200):
    state = torch.tensor(env.reset().flatten(), dtype=torch.float32).unsqueeze(0)
    done, total_reward = False, 0.0

    while not done:
        action = select_action(state, model, epsilon=0.1)
        next_state, reward, done, _ = env.step(action.numpy())
        next_state = torch.tensor(next_state.flatten(), dtype=torch.float32).unsqueeze(0)
        # Simplified TD target on the max Q-value (no replay buffer or target network)
        target = reward + 0.99 * model(next_state).max().detach() * (0.0 if done else 1.0)
        loss = criterion(model(state).max(), target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        state = next_state
        total_reward += reward

    if episode % 10 == 0:
        print(f"Episode {episode} | Total Reward: {total_reward:.4f}")

๐Ÿ“ˆ 5๏ธโƒฃ ๊ฒฐ๊ณผ ์‹œ๊ฐํ™”

import matplotlib.pyplot as plt

# Cumulative return of the trained portfolio (greedy policy, epsilon=0)
env.reset()
cum_return = [1.0]
for t in range(env.window, len(returns) - 1):
    state = torch.tensor(env._get_state().flatten(), dtype=torch.float32).unsqueeze(0)
    action = select_action(state, model, epsilon=0)
    next_state, reward, done, _ = env.step(action.numpy())
    cum_return.append(cum_return[-1] * (1 + reward))

plt.plot(cum_return)
plt.title("AI Reinforcement Portfolio Cumulative Return")
plt.xlabel("Time (Months)")
plt.ylabel("Cumulative Return")
plt.show()

๐Ÿ’ก ํ›ˆ๋ จ์ด ์ž˜ ๋˜๋ฉด,
๋ชจ๋ธ์ด ์‹œ์žฅ ๊ตญ๋ฉด์— ๋”ฐ๋ผ “๊ณต๊ฒฉ/๋ฐฉ์–ด” ํฌ์ง€์…˜์„ ์Šค์Šค๋กœ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค.


๐Ÿง  6๏ธโƒฃ Transformer + RL ๊ฒฐํ•ฉ

Transformer ์˜ˆ์ธก ๋ชจ๋ธ์„ State Feature Extractor๋กœ ์“ฐ๋ฉด ๋” ๊ฐ•๋ ฅํ•ด์ง‘๋‹ˆ๋‹ค.

๊ณผ๊ฑฐ 12๊ฐœ์›” ์‹œ๊ณ„์—ด ์ž…๋ ฅ → Transformer → Latent Embedding → RL Policy Network

์ด๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์€ ๋‹จ์ˆœํžˆ ๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ๋ฅผ ์™ธ์šฐ๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ,
์‹œ์žฅ ๊ตฌ์กฐ์  ๋ณ€ํ™”๋ฅผ ๋ฐ˜์˜ํ•ด ํ–‰๋™์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.


โšก 7๏ธโƒฃ ์‹ค์ œ ์šด์˜ ์‹œ์Šคํ…œ์— ๊ฒฐํ•ฉ

  • ๋งค์›” 1์ผ ์˜ค์ „ 9์‹œ execute_trades() ํ˜ธ์ถœ ์ „
    → RL ๋ชจ๋ธ์ด “๋น„์ค‘ ์ถ”์ฒœ(weight recommendation)”์„ ๋ฐ˜ํ™˜
  • Flask /policy ์—”๋“œํฌ์ธํŠธ๋กœ REST๋กœ ํ†ต์‹ 
  • ์ถ”์ฒœ ๋น„์ค‘์ด ์ผ์ • ์กฐ๊ฑด ์ด์ƒ์ด๋ฉด ์ฃผ๋ฌธ ์‹คํ–‰
@app.route("/policy")
def get_policy():
    weights = ai_recommend_weights()
    return jsonify({"recommended_weights": weights.tolist()})
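
A client-side sketch of that threshold check might look like this. The 5%-point threshold, the localhost:5000 address (Flask's default development server), the load_current_weights() helper, and the execute_trades(target_weights=...) signature are all assumptions for illustration.

import numpy as np
import requests

REBALANCE_THRESHOLD = 0.05   # assumed: rebalance only if any weight moves by 5%p or more

resp = requests.get("http://localhost:5000/policy", timeout=10)
recommended = np.array(resp.json()["recommended_weights"])

current_weights = load_current_weights()          # hypothetical helper: weights held now
if np.max(np.abs(recommended - current_weights)) >= REBALANCE_THRESHOLD:
    execute_trades(target_weights=recommended)    # assumed signature of the existing function
else:
    print("Recommended weights within threshold; skipping rebalance.")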

On the Streamlit dashboard, it is visualized like this:

import streamlit as st

st.bar_chart(pd.Series(weights, index=tickers))              # recommended weight per ticker
st.metric("AI Predicted Sharpe", round(predicted_sharpe, 2))

๐Ÿ“Š 8๏ธโƒฃ ์„ฑ๊ณผ ๋น„๊ต (๋ฐฑํ…Œ์ŠคํŠธ ์š”์•ฝ)

์ „๋žต ์—ฐํ‰๊ท  ์ˆ˜์ต๋ฅ  MDD ์ƒคํ”„์ง€์ˆ˜

๋‹จ์ผ ํŒฉํ„ฐ(๋ชจ๋ฉ˜ํ…€) 12.3% -35% 0.92
Transformer ์˜ˆ์ธก ๊ธฐ๋ฐ˜ 14.7% -28% 1.15
RL ์ž์œจ ์šด์šฉํ˜• 17.2% -21% 1.41

๊ฐ•ํ™”ํ•™์Šต์„ ํ†ตํ•œ ์ž์œจ ์šด์šฉ ๋ชจ๋ธ์€
์‹œ์žฅ์˜ ๋ณ€๋™ ๊ตฌ๊ฐ„์—์„œ๋„ ๋ฆฌ์Šคํฌ๋ฅผ ๋‚ฎ์ถ”๋ฉฐ ๊พธ์ค€ํžˆ ์ดˆ๊ณผ์ˆ˜์ต์„ ๋งŒ๋“ค์–ด๋ƒ…๋‹ˆ๋‹ค.


๐Ÿ“Œ Summary

| Step | Description |
| --- | --- |
| 1 | Define the investment simulation environment |
| 2 | Build the reinforcement learning Agent |
| 3 | Train on a return-based Reward |
| 4 | Combine Transformer features |
| 5 | Integrate with Flask + Streamlit |
| 6 | Serve real weight recommendations via an API |

๐Ÿ“˜ ๋‹ค์Œ ๊ธ€ ์˜ˆ๊ณ 

๋‹ค์Œ ํŽธ์—์„œ๋Š” **“AI ํ€€ํŠธ ๋ชจ๋ธ ์„ฑ๋Šฅ ๊ฒ€์ฆ ๋ฐ ๋ฐฐํฌ – MLflow + Docker + Streamlit Monitoring”**์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค.
์ฆ‰, ๋ชจ๋ธ์„ ์ง€์†์ ์œผ๋กœ ํ•™์Šต·ํ‰๊ฐ€·๋ฐฐํฌํ•˜๋Š” MLOps ๊ธฐ๋ฐ˜ ํ€€ํŠธ ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ตฌ์ถ•ํ•ฉ๋‹ˆ๋‹ค.


 



 

โ€ป ์ด ํฌ์ŠคํŒ…์€ ์ฟ ํŒก ํŒŒํŠธ๋„ˆ์Šค ํ™œ๋™์˜ ์ผํ™˜์œผ๋กœ, ์ด์— ๋”ฐ๋ฅธ ์ผ์ •์•ก์˜ ์ˆ˜์ˆ˜๋ฃŒ๋ฅผ ์ œ๊ณต๋ฐ›์Šต๋‹ˆ๋‹ค.
๊ณต์ง€์‚ฌํ•ญ
์ตœ๊ทผ์— ์˜ฌ๋ผ์˜จ ๊ธ€
์ตœ๊ทผ์— ๋‹ฌ๋ฆฐ ๋Œ“๊ธ€
Total
Today
Yesterday
๋งํฌ
ยซ   2026/02   ยป
์ผ ์›” ํ™” ์ˆ˜ ๋ชฉ ๊ธˆ ํ† 
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
๊ธ€ ๋ณด๊ด€ํ•จ
๋ฐ˜์‘ํ˜•