23 posts tagged with "Python"

Posts related to Python programming

Top 5 Python libraries to know: Pandas, NumPy, Matplotlib, yfinance, TA-Lib

July 1, 2025 · 4 min read

Fullstack

Python is one of the most popular programming languages today, especially in data analysis and data science. Below are five Python libraries that every data analyst should know.

1. Pandas


Pandas is a library for manipulating and analyzing data. Its two core data structures are the DataFrame (a labeled two-dimensional table) and the Series (a labeled one-dimensional array).

Key features:

Working with tabular data (DataFrame)
Reading and writing many file formats (CSV, Excel, SQL, etc.)
Filtering and transforming data
Handling missing data
Basic statistical analysis

Example code:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['An', 'Bình', 'Cường'],
    'Age': [25, 30, 35],
    'Salary': [1000, 2000, 3000]
})

# Show basic summary statistics for the numeric columns
print(df.describe())
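
The feature list mentions missing-data handling and filtering, which the example above does not exercise. The following is a minimal sketch of both, using a small hypothetical table (the rows and the 2000 salary threshold are assumptions made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing values (NaN)
df = pd.DataFrame({
    "Name": ["An", "Bình", "Cường", "Dung"],
    "Age": [25, np.nan, 35, 40],
    "Salary": [1000.0, 2000.0, np.nan, 4000.0],
})

# Count missing values per column
missing_counts = df.isna().sum()

# Fill missing ages with the column median, then drop rows that still lack a salary
cleaned = df.fillna({"Age": df["Age"].median()}).dropna(subset=["Salary"])

# Filter: keep rows with a salary of at least 2000
high_earners = cleaned[cleaned["Salary"] >= 2000]

print(missing_counts)
print(high_earners)
```

Whether to fill or drop missing values depends on the analysis; the median fill here is only one reasonable default.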

2. NumPy


NumPy is the foundational library for numerical computing in Python. It provides multi-dimensional arrays and fast mathematical functions that operate on them.

Key features:

Multi-dimensional arrays (ndarray)
Vectorized computation
Linear algebra
Building blocks for signal processing, such as Fourier transforms (numpy.fft)
Integration with other libraries

Example code:

import numpy as np

# Create an array
arr = np.array([1, 2, 3, 4, 5])

# Vectorized computation
print(arr * 2)  # Multiply every element by 2
print(np.mean(arr))  # Compute the mean
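
The linear algebra feature from the list above can be illustrated with a small worked example. This sketch solves a hypothetical 2x2 system (the coefficients are made up for illustration) with `np.linalg.solve` and checks the result:

```python
import numpy as np

# Hypothetical system:  2x + 1y = 3
#                       1x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve A @ x = b without forming the inverse of A explicitly
x = np.linalg.solve(A, b)

# The residual should be (numerically) zero if the solution is correct
residual = A @ x - b

print(x)
print(residual)
```

Using `solve` rather than computing `np.linalg.inv(A) @ b` is generally the more numerically stable choice.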

3. Matplotlib


Matplotlib is one of the most widely used plotting libraries in Python. It can create static, animated, and interactive charts.

Key features:

2D and 3D plotting
Customizable chart appearance
Export to many output formats
Integration with Jupyter Notebook
Compatibility with many other libraries

Example code:

import matplotlib.pyplot as plt
import numpy as np

# Create the data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Draw the plot
plt.plot(x, y)
plt.title('Sine function')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()

4. yfinance

yfinance is an open-source library that downloads financial data from Yahoo Finance. It is not an official Yahoo product.

Key features:

Downloading stock data
Retrieving company information
Historical price data
Dividend information
Market data

Example code:

import yfinance as yf

# Download stock data
msft = yf.Ticker("MSFT")
hist = msft.history(period="1mo")

# Show the first rows
print(hist.head())

5. TA-Lib


TA-Lib is a library for technical analysis of financial markets. The Python package is a wrapper around a C library.

Key features:

Technical indicators (RSI, MACD, Bollinger Bands)
Candlestick pattern recognition
Trend analysis
Fast execution thanks to the underlying C implementation
Works on NumPy arrays, so it fits into pandas workflows

Example code:

import talib
import numpy as np

# Compute RSI
close_prices = np.array([...])  # Closing prices (placeholder: supply a float64 array of real data)
rsi = talib.RSI(close_prices)

# Compute MACD
macd, macd_signal, macd_hist = talib.MACD(close_prices)

Conclusion

These five libraries form the foundation for data and financial analysis in Python. Each has its own strength:

Pandas: data processing and analysis
NumPy: numerical computing
Matplotlib: data visualization
yfinance: retrieving financial data
TA-Lib: technical analysis

Combining them lets you build complete analysis workflows, from data collection to indicators and charts.

SQLAlchemy with SQL Server

July 1, 2025 · 11 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack


SQLAlchemy is a SQL toolkit and ORM (Object Relational Mapper) for Python. It works with relational databases such as SQL Server, MySQL, PostgreSQL, and many others. This article shows how to use SQLAlchemy with Microsoft SQL Server, from setting up the initial connection to performing CRUD (Create, Read, Update, Delete) operations.

Introduction to SQLAlchemy


SQLAlchemy has two main components:

Core: A SQL abstraction toolkit that lets you build and execute SQL queries as Python expressions instead of writing SQL strings by hand.
ORM (Object Relational Mapper): Maps database tables to Python classes so you can work with rows as ordinary Python objects.

SQLAlchemy offers several benefits when working with SQL Server:

Portability: Code can move between different databases with few changes
Security: Queries use bound parameters, which protects against SQL injection as long as you do not build SQL by concatenating strings
Performance: Built-in connection pooling and efficient query generation
Abstraction: Work with data as objects instead of writing raw SQL
Transaction support: Simple, explicit transaction management

Installing the required components

To work with SQL Server through SQLAlchemy, install the following packages:

# Install SQLAlchemy
pip install sqlalchemy

# Install pyodbc, the Python DBAPI driver that SQLAlchemy uses to talk to ODBC.
# The Microsoft ODBC Driver for SQL Server itself is a separate, OS-level install.
pip install pyodbc

# No extra dialect package is needed: the mssql+pyodbc dialect ships with SQLAlchemy

Alternatively:

# Use pymssql as the driver instead of pyodbc
pip install pymssql

# Or install SQLAlchemy together with its default SQL Server driver (pyodbc)
pip install "sqlalchemy[mssql]"

Setting up the connection to SQL Server

1. Create the connection URL

SQLAlchemy uses a URL to connect to the database. Here are three ways to build one for SQL Server:

from sqlalchemy import create_engine, URL
import urllib.parse

# Option 1: Write the URL directly.
# Special characters in the username or password must be URL-encoded.
connection_string = "mssql+pyodbc://username:password@server_name/database_name?driver=ODBC+Driver+17+for+SQL+Server"

# Option 2: Use URL.create with a query dictionary (no manual encoding needed)
connection_url = URL.create(
    "mssql+pyodbc",
    username="username",
    password="password",
    host="server_name",
    database="database_name",
    query={
        "driver": "ODBC Driver 17 for SQL Server",
        "TrustServerCertificate": "yes",
        "encrypt": "yes",
    },
)

# Option 3: Pass a raw pyodbc connection string
params = urllib.parse.quote_plus(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=server_name;"
    "DATABASE=database_name;"
    "UID=username;"
    "PWD=password;"
    "TrustServerCertificate=yes;"
)

connection_url = f"mssql+pyodbc:///?odbc_connect={params}"
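
When the URL is written by hand (the first option), characters such as `@`, `/`, or `:` in the password would otherwise be parsed as URL delimiters. The sketch below shows one way to encode them with the standard library; the credentials are hypothetical placeholders:

```python
from urllib.parse import quote_plus

# Hypothetical credentials; the password contains characters that are special in URLs
username = "app_user"
password = "p@ss/word:1"
server = "server_name"
database = "database_name"

# Percent-encode the password so it cannot be confused with URL delimiters
encoded_password = quote_plus(password)

connection_string = (
    f"mssql+pyodbc://{username}:{encoded_password}@{server}/{database}"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
print(connection_string)
```

Building the URL with `URL.create` avoids this step, because it takes the password as a separate argument and handles escaping itself.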

2. Create the Engine

The Engine is the central object in SQLAlchemy. It manages the connection pool and the database dialect:

from sqlalchemy import create_engine

# Create the engine from the connection URL
engine = create_engine(connection_url, echo=True)

The echo=True parameter logs every generated SQL statement, which is useful while debugging.

3. Test the connection

from sqlalchemy import text

# Test the connection
try:
    with engine.connect() as connection:
        # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
        result = connection.execute(text("SELECT @@VERSION"))
        print(f"Connected! SQL Server version: {result.scalar()}")
except Exception as e:
    print(f"Connection error: {e}")

Creating the data model (ORM)

1. Define Base and Metadata

from sqlalchemy import Column, Integer, String, Float, DateTime, ForeignKey
from sqlalchemy.orm import relationship
import datetime

# SQLAlchemy 2.0+: create the base class for all models
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass

# SQLAlchemy 1.4 alternative (use it instead of the class above, not in addition):
# from sqlalchemy.orm import declarative_base
# Base = declarative_base()

2. Define the models

# Định nghĩa model cho bảng Customer
class Customer(Base):
    __tablename__ = 'customers'

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    email = Column(String(100), unique=True)
    phone = Column(String(20))
    created_at = Column(DateTime, default=datetime.datetime.utcnow)
    
    # Relationship với Order
    orders = relationship("Order", back_populates="customer")
    
    def __repr__(self):
        return f"<Customer(id={self.id}, name='{self.name}', email='{self.email}')>"

# Định nghĩa model cho bảng Order
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.id'))
    order_date = Column(DateTime, default=datetime.datetime.utcnow)
    total_amount = Column(Float, nullable=False)
    status = Column(String(20), default='pending')
    
    # Relationship với Customer
    customer = relationship("Customer", back_populates="orders")
    # Relationship với OrderItem
    items = relationship("OrderItem", back_populates="order")
    
    def __repr__(self):
        return f"<Order(id={self.id}, customer_id={self.customer_id}, total_amount={self.total_amount})>"

# Định nghĩa model cho bảng OrderItem
class OrderItem(Base):
    __tablename__ = 'order_items'
    
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'))
    product_name = Column(String(100), nullable=False)
    quantity = Column(Integer, nullable=False)
    unit_price = Column(Float, nullable=False)
    
    # Relationship với Order
    order = relationship("Order", back_populates="items")
    
    def __repr__(self):
        return f"<OrderItem(id={self.id}, order_id={self.order_id}, product_name='{self.product_name}')>"

3. Create the tables in the database

# Create every table defined by the models (tables that already exist are skipped)
Base.metadata.create_all(engine)

CRUD operations with the SQLAlchemy ORM

1. Set up a Session

The Session is how SQLAlchemy manages database operations and tracks the objects you load or change:

from sqlalchemy.orm import sessionmaker

# Create a Session factory bound to the engine
Session = sessionmaker(bind=engine)

# Create a Session instance
session = Session()

In SQLAlchemy 1.4 and later, you can also use the Session class directly as a context manager:

from sqlalchemy.orm import Session

# Use a context manager so the session is closed automatically
with Session(engine) as session:
    # Work with the session here
    pass

2. Insert data (Create)

# Tạo một khách hàng mới
new_customer = Customer(
    name="Nguyễn Văn A",
    email="nguyenvana@example.com",
    phone="0123456789"
)

# Thêm khách hàng vào session
session.add(new_customer)

# Hoặc thêm nhiều đối tượng cùng lúc
session.add_all([
    Customer(name="Trần Thị B", email="tranthib@example.com", phone="0987654321"),
    Customer(name="Lê Văn C", email="levanc@example.com", phone="0369874125")
])

# Lưu các thay đổi vào cơ sở dữ liệu
session.commit()

3. Query data (Read)

# Truy vấn tất cả khách hàng
all_customers = session.query(Customer).all()
for customer in all_customers:
    print(customer)

# Truy vấn với điều kiện
customer = session.query(Customer).filter(Customer.email == "nguyenvana@example.com").first()
print(f"Tìm thấy khách hàng: {customer.name}")

# Sử dụng các toán tử lọc phức tạp
from sqlalchemy import or_, and_

customers = session.query(Customer).filter(
    or_(
        Customer.name.like("Nguyễn%"),
        and_(
            Customer.email.like("%@example.com"),
            Customer.phone.startswith("01")
        )
    )
).all()

# Truy vấn với join
orders_with_customers = session.query(Order, Customer).join(Customer).all()
for order, customer in orders_with_customers:
    print(f"Đơn hàng {order.id} thuộc về khách hàng {customer.name}")

# Truy vấn với aggregation
from sqlalchemy import func
total_orders = session.query(func.count(Order.id)).scalar()
print(f"Tổng số đơn hàng: {total_orders}")

# Tính tổng doanh thu theo khách hàng
revenue_by_customer = session.query(
    Customer.name,
    func.sum(Order.total_amount).label('total_revenue')
).join(Order).group_by(Customer.name).order_by(func.sum(Order.total_amount).desc()).all()

for name, revenue in revenue_by_customer:
    print(f"Khách hàng: {name}, Tổng doanh thu: {revenue}")

4. Update data (Update)

# Cập nhật thông tin một khách hàng
customer = session.query(Customer).filter(Customer.email == "nguyenvana@example.com").first()
if customer:
    customer.phone = "0123123123"
    session.commit()
    print(f"Đã cập nhật số điện thoại của khách hàng {customer.name}")

# Cập nhật hàng loạt
affected_rows = session.query(Order).filter(Order.status == "pending").update(
    {"status": "processing"},
    synchronize_session=False
)
session.commit()
print(f"Đã cập nhật {affected_rows} đơn hàng từ 'pending' sang 'processing'")

5. Delete data (Delete)

# Xóa một đơn hàng cụ thể
order = session.query(Order).filter(Order.id == 1).first()
if order:
    session.delete(order)
    session.commit()
    print("Đã xóa đơn hàng")

# Xóa hàng loạt
deleted_count = session.query(OrderItem).filter(OrderItem.unit_price < 10000).delete(
    synchronize_session=False
)
session.commit()
print(f"Đã xóa {deleted_count} sản phẩm có giá dưới 10.000")

Transactions and error handling

1. Using a transaction

# Use a context manager to manage the session
from sqlalchemy.orm import Session

try:
    with Session(engine) as session:
        # Add a new customer
        new_customer = Customer(name="New customer", email="khachhang@example.com")
        session.add(new_customer)
        
        # Add an order for this customer
        new_order = Order(customer=new_customer, total_amount=150000, status="new")
        session.add(new_order)
        
        # Add the items in the order
        session.add_all([
            OrderItem(order=new_order, product_name="Product A", quantity=2, unit_price=50000),
            OrderItem(order=new_order, product_name="Product B", quantity=1, unit_price=50000)
        ])
        
        # Persist all changes. Leaving the "with Session(...)" block only closes
        # the session; it does not commit, so this explicit call is required.
        session.commit()
        print("Customer and order added successfully")
except Exception as e:
    # If an error occurs before commit, closing the session discards the pending changes
    print(f"Error: {e}")
    # No session.rollback() call is needed here because the session is already closed

2. Manual commit and rollback

# Manage the transaction manually
session = Session(bind=engine)
try:
    # Create a new customer
    new_customer = Customer(name="Manual customer", email="manual@example.com")
    session.add(new_customer)
    
    # Add an order
    new_order = Order(customer=new_customer, total_amount=200000)
    session.add(new_order)
    
    # Commit the transaction
    session.commit()
    print("Transaction succeeded")
    
except Exception as e:
    # Roll back on error
    session.rollback()
    print(f"Transaction failed: {e}")
    
finally:
    # Close the session
    session.close()

Advanced SQLAlchemy features

1. Using SQLAlchemy Core (Expression Language)

Besides the ORM, you can use SQLAlchemy Core, whose expression syntax stays closer to SQL:

from sqlalchemy import Table, MetaData, Column, Integer, String, Float, ForeignKey, select, join

# Tạo metadata
metadata = MetaData()

# Định nghĩa bảng theo cách thủ công
customers = Table('customers', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(100)),
    Column('email', String(100)),
    Column('phone', String(20))
)

orders = Table('orders', metadata,
    Column('id', Integer, primary_key=True),
    Column('customer_id', Integer, ForeignKey('customers.id')),
    Column('total_amount', Float),
    Column('status', String(20))
)

# Tạo một truy vấn select
query = select(customers.c.name, orders.c.total_amount).select_from(
    join(customers, orders, customers.c.id == orders.c.customer_id)
).where(
    orders.c.status == 'completed'
)

# Thực thi truy vấn
with engine.connect() as conn:
    result = conn.execute(query)
    for row in result:
        print(f"Khách hàng: {row.name}, Giá trị đơn hàng: {row.total_amount}")

2. Creating indexes and constraints

from sqlalchemy import Index, UniqueConstraint

# Thêm Index và UniqueConstraint vào model
class Product(Base):
    __tablename__ = 'products'
    
    id = Column(Integer, primary_key=True)
    sku = Column(String(50), nullable=False)
    name = Column(String(100), nullable=False)
    price = Column(Float, nullable=False)
    category = Column(String(50))
    
    # Thêm uniqueness constraint
    __table_args__ = (
        UniqueConstraint('sku', name='uq_product_sku'),
        # Thêm index cho tìm kiếm nhanh theo tên sản phẩm
        Index('idx_product_name', 'name'),
        # Thêm index cho category và price
        Index('idx_product_category_price', 'category', 'price')
    )

3. Using lazy loading and eager loading

# Lazy loading - mặc định
customer = session.query(Customer).filter(Customer.id == 1).first()
# Các đơn hàng sẽ được load khi truy cập
for order in customer.orders:  # Thêm truy vấn SQL được thực hiện ở đây
    print(order)

# Eager loading với joinedload
from sqlalchemy.orm import joinedload

# Load khách hàng và đơn hàng cùng lúc
customer = session.query(Customer).options(
    joinedload(Customer.orders)
).filter(Customer.id == 1).first()

# Không có thêm truy vấn SQL khi truy cập
for order in customer.orders:
    print(order)

# Eager loading với nesting (lồng nhau)
customer = session.query(Customer).options(
    joinedload(Customer.orders).joinedload(Order.items)
).filter(Customer.id == 1).first()

# Tất cả dữ liệu đã được load, không cần thêm truy vấn
for order in customer.orders:
    for item in order.items:
        print(item)

4. Using events

from sqlalchemy import event

# Event trước khi insert
@event.listens_for(Customer, 'before_insert')
def before_customer_insert(mapper, connection, customer):
    print(f"Chuẩn bị thêm khách hàng: {customer.name}")
    # Có thể thêm logic xử lý ở đây, ví dụ: chuẩn hóa email
    customer.email = customer.email.lower()

# Event sau khi insert
@event.listens_for(Customer, 'after_insert')
def after_customer_insert(mapper, connection, customer):
    print(f"Đã thêm khách hàng: {customer.name} với ID = {customer.id}")

Best practices and tips for SQLAlchemy with SQL Server

1. Use connection pooling

# Configure the pool when creating the engine
engine = create_engine(
    connection_url,
    pool_size=10,  # Number of connections kept open in the pool
    max_overflow=20,  # Extra connections allowed beyond pool_size under load
    pool_timeout=30,  # Seconds to wait for a free connection before raising an error
    pool_recycle=1800  # Replace connections that are older than this many seconds
)

2. Use bulk operations for large inserts

# Insert a large batch of rows efficiently
products = [
    Product(sku=f"PRD-{i}", name=f"Product {i}", price=10000 + i * 1000, category="Electronics")
    for i in range(1, 1001)
]

# bulk_save_objects skips most of the ORM's per-object bookkeeping.
# SQLAlchemy 2.0 marks it as a legacy API; there, add_all() also batches inserts.
session.bulk_save_objects(products)
session.commit()

3. Manage migrations with Alembic

Alembic is a migration tool written by the author of SQLAlchemy:

# Install Alembic
pip install alembic

# Initialize Alembic in a directory named "migrations"
alembic init migrations

In Alembic's env.py file, configure the metadata:

from sqlalchemy import engine_from_config, pool
from models import Base  # Import Base from your own models module

# Set the target metadata so autogenerate can compare models with the database
target_metadata = Base.metadata

Create a migration and apply it:

# Generate a migration script
alembic revision --autogenerate -m "Create initial tables"

# Apply the migration
alembic upgrade head

4. Calling stored procedures

from sqlalchemy import text

# Call a stored procedure with a bound parameter
with engine.connect() as conn:
    result = conn.execute(
        text("EXEC GetCustomerOrders :customer_id"),
        {"customer_id": 1}
    )
    for row in result:
        print(row)

Conclusion

SQLAlchemy provides a flexible way to work with SQL Server from Python. With the ORM, you can focus on your application's business logic instead of writing SQL statements by hand, and you can still drop down to raw SQL when you need it.

SQLAlchemy also offers advanced features such as relationships, eager loading, connection pooling, and event listeners, which help you build applications that perform well and are easy to maintain.

When working with SQL Server, configure the connection parameters (ODBC driver name, encryption, pooling) to match your environment, and use a supported driver such as pyodbc or pymssql.

Have you used SQLAlchemy before? How has this ORM helped in your projects? Share your experience in the comments!

Portfolio analysis with Python: data, performance, allocation

July 1, 2025 · 5 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Portfolio analysis is an important part of financial management. This article shows how to use Python to analyze an investment portfolio, from collecting data to evaluating performance and optimizing the allocation.

1. Collecting data

Install the required libraries

pip install yfinance pandas numpy matplotlib seaborn scipy

Use yfinance to fetch stock data

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import minimize

# Định nghĩa danh mục đầu tư
portfolio = {
    'AAPL': 0.3,  # Apple
    'MSFT': 0.3,  # Microsoft
    'GOOGL': 0.2, # Google
    'AMZN': 0.2   # Amazon
}

# Lấy dữ liệu lịch sử
data = pd.DataFrame()
for ticker in portfolio.keys():
    stock = yf.Ticker(ticker)
    hist = stock.history(period='1y')
    data[ticker] = hist['Close']

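# Compute daily returns. The first row is NaN (no previous close), so drop it;
# otherwise the NaN propagates into the portfolio returns used for VaR later on.
returns = data.pct_change().dropna()

# Show the data
print("Closing prices:")
print(data.head())
print("\nDaily returns:")
print(returns.head())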

2. Performance analysis

Compute the key metrics

# Lợi nhuận trung bình hàng năm
annual_returns = returns.mean() * 252

# Độ lệch chuẩn (rủi ro)
volatility = returns.std() * np.sqrt(252)

# Tỷ lệ Sharpe (giả sử lãi suất phi rủi ro là 0.02)
risk_free_rate = 0.02
sharpe_ratio = (annual_returns - risk_free_rate) / volatility

# Tạo bảng tổng hợp
performance = pd.DataFrame({
    'Lợi nhuận hàng năm': annual_returns,
    'Độ biến động': volatility,
    'Tỷ lệ Sharpe': sharpe_ratio
})

# Hiển thị kết quả
print("\nPhân tích hiệu suất:")
print(performance)

# Vẽ biểu đồ so sánh
plt.figure(figsize=(12, 6))
performance['Lợi nhuận hàng năm'].plot(kind='bar')
plt.title('Lợi nhuận hàng năm của các tài sản')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
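
The metrics above are computed per asset, while the `portfolio` dictionary defines weights that have not been used yet. The sketch below shows one way to get portfolio-level figures from those weights. It is a self-contained illustration: the synthetic, seeded returns are an assumption standing in for the downloaded data.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic daily returns stand in for the downloaded price data
rng = np.random.default_rng(42)
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN"]
returns = pd.DataFrame(rng.normal(0.0005, 0.01, size=(252, 4)), columns=tickers)

# Same weights as the portfolio dictionary defined earlier
weights = pd.Series({"AAPL": 0.3, "MSFT": 0.3, "GOOGL": 0.2, "AMZN": 0.2})
risk_free_rate = 0.02

# Annualized portfolio return: weighted sum of the assets' mean daily returns
portfolio_return = float(returns.mean().dot(weights) * 252)

# Annualized portfolio volatility: sqrt(w^T * Sigma * w), with Sigma the annualized covariance
cov_annual = returns.cov() * 252
portfolio_vol = float(np.sqrt(weights.values @ cov_annual.values @ weights.values))

# Portfolio Sharpe ratio
portfolio_sharpe = (portfolio_return - risk_free_rate) / portfolio_vol

print(f"Return: {portfolio_return:.2%}, volatility: {portfolio_vol:.2%}, Sharpe: {portfolio_sharpe:.2f}")
```

Because the assets are not perfectly correlated, the portfolio volatility computed this way is typically lower than the weighted average of the individual volatilities.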

3. Correlation analysis

Examine the relationships between assets

# Ma trận tương quan
correlation_matrix = returns.corr()

# Vẽ biểu đồ nhiệt
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Ma trận tương quan giữa các tài sản')
plt.tight_layout()
plt.show()

# Vẽ biểu đồ phân tán
sns.pairplot(returns)
plt.suptitle('Phân tích phân tán giữa các tài sản', y=1.02)
plt.show()

4. Portfolio optimization

Using Modern Portfolio Theory (MPT)

def portfolio_volatility(weights, returns):
    return np.sqrt(np.dot(weights.T, np.dot(returns.cov() * 252, weights)))

def negative_sharpe(weights, returns, risk_free_rate):
    returns_array = returns.mean() * 252
    volatility = portfolio_volatility(weights, returns)
    return -(returns_array.dot(weights) - risk_free_rate) / volatility

# Tối ưu hóa danh mục
n_assets = len(portfolio)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
bounds = tuple((0, 1) for _ in range(n_assets))
initial_weights = np.array([1/n_assets] * n_assets)

optimal_weights = minimize(
    negative_sharpe,
    initial_weights,
    args=(returns, risk_free_rate),
    method='SLSQP',
    bounds=bounds,
    constraints=constraints
)

# Hiển thị kết quả tối ưu
print("\nPhân bổ tài sản tối ưu:")
for ticker, weight in zip(portfolio.keys(), optimal_weights.x):
    print(f"{ticker}: {weight:.2%}")

# Vẽ biểu đồ phân bổ
plt.figure(figsize=(10, 6))
plt.pie(optimal_weights.x, labels=portfolio.keys(), autopct='%1.1f%%')
plt.title('Phân bổ tài sản tối ưu')
plt.show()

5. Risk analysis

Assessing portfolio risk

# Value at Risk (VaR), historical method
def calculate_var(returns, weights, confidence_level=0.95):
    # Drop NaN rows (such as the first row produced by pct_change)
    portfolio_returns = returns.dot(weights).dropna()
    return np.percentile(portfolio_returns, (1 - confidence_level) * 100)

# Expected Shortfall (ES): the average return in the tail at or below VaR
def calculate_es(returns, weights, confidence_level=0.95):
    portfolio_returns = returns.dot(weights).dropna()
    var = calculate_var(returns, weights, confidence_level)
    return portfolio_returns[portfolio_returns <= var].mean()

# Compute the risk metrics
var_95 = calculate_var(returns, optimal_weights.x)
es_95 = calculate_es(returns, optimal_weights.x)

print(f"\nValue at Risk (95%): {var_95:.2%}")
print(f"Expected Shortfall (95%): {es_95:.2%}")

# Plot the distribution of portfolio returns
portfolio_returns = returns.dot(optimal_weights.x).dropna()
plt.figure(figsize=(10, 6))
sns.histplot(portfolio_returns, kde=True)
plt.axvline(var_95, color='r', linestyle='--', label=f'VaR (95%): {var_95:.2%}')
plt.axvline(es_95, color='g', linestyle='--', label=f'ES (95%): {es_95:.2%}')
plt.title('Distribution of portfolio returns')
plt.xlabel('Return')
plt.ylabel('Frequency')
plt.legend()
plt.show()
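
To see what historical VaR and Expected Shortfall measure, it can help to run them on data where the answer is easy to verify by hand. The sketch below uses hypothetical, evenly spaced returns (an assumption chosen only because the quantiles are obvious) and standalone helper functions named for this example:

```python
import numpy as np


def historical_var(portfolio_returns, confidence_level=0.95):
    """Return the (1 - confidence_level) quantile of the return distribution."""
    return np.percentile(portfolio_returns, (1 - confidence_level) * 100)


def expected_shortfall(portfolio_returns, confidence_level=0.95):
    """Return the mean of all returns at or below the VaR threshold."""
    var = historical_var(portfolio_returns, confidence_level)
    return portfolio_returns[portfolio_returns <= var].mean()


# Hypothetical returns: 201 evenly spaced values from -10% to +10%
sample_returns = np.linspace(-0.10, 0.10, 201)

var_95 = historical_var(sample_returns)
es_95 = expected_shortfall(sample_returns)

print(f"VaR (95%): {var_95:.2%}, ES (95%): {es_95:.2%}")
```

On this data the 5th percentile is about -9%, and the average of the returns at or below it is about -9.5%. Expected Shortfall is always at least as severe as VaR because it averages over the tail beyond the threshold.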

6. Visualizing the results

Plot performance and allocation

# Biểu đồ hiệu suất tích lũy
cumulative_returns = (1 + returns).cumprod()
plt.figure(figsize=(12, 6))
for column in cumulative_returns.columns:
    plt.plot(cumulative_returns.index, cumulative_returns[column], label=column)
plt.title('Hiệu suất tích lũy của danh mục')
plt.xlabel('Ngày')
plt.ylabel('Hiệu suất tích lũy')
plt.legend()
plt.grid(True)
plt.show()

# Biểu đồ phân bổ tài sản
plt.figure(figsize=(10, 6))
plt.pie(optimal_weights.x, labels=portfolio.keys(), autopct='%1.1f%%')
plt.title('Phân bổ tài sản tối ưu')
plt.show()

# Biểu đồ so sánh hiệu suất
plt.figure(figsize=(12, 6))
performance['Lợi nhuận hàng năm'].plot(kind='bar')
plt.title('So sánh lợi nhuận hàng năm')
plt.xlabel('Tài sản')
plt.ylabel('Lợi nhuận hàng năm')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

Conclusion

Portfolio analysis with Python gives us tools to:

Collect and process market data
Evaluate performance and risk
Optimize asset allocation
Visualize the results

Combining libraries such as pandas, numpy, yfinance, and matplotlib lets us carry out each of these steps in a few lines of readable code.

How to fetch stock data from Yahoo Finance with Python

July 1, 2025 · 4 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Yahoo Finance is a rich, free source of financial data. With Python's yfinance library, we can access and analyze market data in a few lines of code. This article shows how to use yfinance to fetch and process stock data.

1. Installation and setup

Install yfinance and the supporting libraries

pip install yfinance pandas numpy matplotlib seaborn

Import the required libraries

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

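2. Fetching basic data

Getting stock information

# Create a Ticker object
aapl = yf.Ticker("AAPL")

# Get basic information. The keys returned by Yahoo Finance can vary by
# ticker and over time, so .get() avoids a KeyError when one is missing.
info = aapl.info
print("Basic information:")
print(f"Company name: {info.get('longName')}")
print(f"Industry: {info.get('industry')}")
print(f"Current price: ${info.get('currentPrice')}")
market_cap = info.get('marketCap')
if market_cap is not None:
    print(f"Market capitalization: ${market_cap:,.2f}")

Getting historical data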

# Lấy dữ liệu 1 năm gần nhất
hist = aapl.history(period="1y")
print("\nDữ liệu lịch sử:")
print(hist.head())

# Vẽ biểu đồ giá đóng cửa
plt.figure(figsize=(12, 6))
plt.plot(hist.index, hist['Close'])
plt.title('Giá đóng cửa AAPL trong 1 năm')
plt.xlabel('Ngày')
plt.ylabel('Giá ($)')
plt.grid(True)
plt.show()

3. Fetching more advanced data

Fetching data for multiple stocks

# Định nghĩa danh sách cổ phiếu
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']

# Lấy dữ liệu cho nhiều cổ phiếu
data = pd.DataFrame()
for ticker in tickers:
    stock = yf.Ticker(ticker)
    hist = stock.history(period='1y')
    data[ticker] = hist['Close']

# Tính toán lợi nhuận hàng ngày
returns = data.pct_change()

# Vẽ biểu đồ so sánh
plt.figure(figsize=(12, 6))
for column in data.columns:
    plt.plot(data.index, data[column], label=column)
plt.title('So sánh giá đóng cửa')
plt.xlabel('Ngày')
plt.ylabel('Giá ($)')
plt.legend()
plt.grid(True)
plt.show()

Fetching data for a custom date range

# Define the date range
start_date = '2020-01-01'
end_date = '2023-12-31'

# Fetch data for the date range (the end date itself is excluded)
hist = aapl.history(start=start_date, end=end_date)

# Tính toán các chỉ số
hist['Daily_Return'] = hist['Close'].pct_change()
hist['Cumulative_Return'] = (1 + hist['Daily_Return']).cumprod()

# Vẽ biểu đồ lợi nhuận tích lũy
plt.figure(figsize=(12, 6))
plt.plot(hist.index, hist['Cumulative_Return'])
plt.title('Lợi nhuận tích lũy AAPL')
plt.xlabel('Ngày')
plt.ylabel('Lợi nhuận tích lũy')
plt.grid(True)
plt.show()
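
The cumulative return column is a compounding factor: each value is the price on that day divided by the first price in the range. The sketch below checks this on a few hypothetical closing prices (made-up numbers, no network access needed):

```python
import pandas as pd

# Hypothetical closing prices for five consecutive business days
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 110.0],
    index=pd.bdate_range("2024-01-01", periods=5),
    name="Close",
)

# Same calculation as in the article
daily_return = close.pct_change()
cumulative_return = (1 + daily_return).cumprod()

# From the second row onward, the cumulative factor equals close / first close.
# The first value is NaN because there is no previous close to compare against.
expected = close / close.iloc[0]

print(pd.DataFrame({"cumulative_return": cumulative_return, "close / first": expected}))
```

A final value of 1.10 therefore means the price rose 10% over the period; subtract 1 if you want the return as a percentage change rather than a growth factor.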

4. Analyzing the data

Volatility analysis

# Compute summary statistics.
# A Series is used because every value is a scalar; pd.DataFrame would raise
# "If using all scalar values, you must pass an index".
stats = pd.Series({
    'Mean price': hist['Close'].mean(),
    'Standard deviation': hist['Close'].std(),
    'Highest price': hist['Close'].max(),
    'Lowest price': hist['Close'].min(),
    'Annualized volatility': hist['Daily_Return'].std() * np.sqrt(252)
})

print("\nSummary statistics:")
print(stats)

# Vẽ biểu đồ phân phối lợi nhuận
plt.figure(figsize=(12, 6))
sns.histplot(hist['Daily_Return'].dropna(), kde=True)
plt.title('Phân phối lợi nhuận hàng ngày')
plt.xlabel('Lợi nhuận')
plt.ylabel('Tần suất')
plt.show()

Correlation analysis

# Tính toán ma trận tương quan
correlation = returns.corr()

# Vẽ biểu đồ nhiệt
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0)
plt.title('Ma trận tương quan giữa các cổ phiếu')
plt.show()

5. Fetching additional data

Fetching financial statements

# Get the annual financial statements
# (aapl.quarterly_financials returns the quarterly figures instead)
financials = aapl.financials
balance_sheet = aapl.balance_sheet
cash_flow = aapl.cashflow

print("\nFinancial statements:")
print(financials.head())

# Plot revenue. The column labels are converted to strings so that each
# fiscal year gets a full-width bar instead of a thin bar on a date axis.
plt.figure(figsize=(12, 6))
plt.bar(financials.columns.astype(str), financials.loc['Total Revenue'])
plt.title('Annual revenue')
plt.xlabel('Fiscal year end')
plt.ylabel('Revenue ($)')
plt.xticks(rotation=45)
plt.show()

Fetching dividend data

# Get the dividend history
dividends = aapl.dividends

# Plot the dividends. On a date axis the bar width is measured in days,
# so a wider bar keeps each payment visible across a long history.
plt.figure(figsize=(12, 6))
plt.bar(dividends.index, dividends, width=20)
plt.title('Dividend history')
plt.xlabel('Date')
plt.ylabel('Dividend ($)')
plt.grid(True)
plt.show()

6. Working with intraday data

Fetching intraday data

# Fetch today's 1-minute bars. Yahoo Finance quotes may be delayed,
# so treat this as near real time rather than a true live feed.
ticker = yf.Ticker("AAPL")
realtime = ticker.history(period="1d", interval="1m")

# Vẽ biểu đồ giá trong ngày
plt.figure(figsize=(12, 6))
plt.plot(realtime.index, realtime['Close'])
plt.title('Giá AAPL trong ngày')
plt.xlabel('Thời gian')
plt.ylabel('Giá ($)')
plt.grid(True)
plt.show()

Conclusion

The yfinance library offers a simple way to access financial data from Yahoo Finance. With Python, we can:

Get basic information about a stock
Access historical data
Analyze volatility and correlation
View financial statements
Track intraday price data

Trading strategies with the Ichimoku Cloud in Python

July 1, 2025 · 4 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

The Ichimoku Cloud is a multi-part technical indicator introduced by Goichi Hosoda in the 1960s; the cloud itself is called the Kumo. It combines several components to identify the trend, support and resistance levels, and trading signals. In this article, we look at how to implement an Ichimoku Cloud trading strategy in Python.

1. Components of the Ichimoku Cloud

The Ichimoku Cloud has five main components:

Tenkan-sen (Conversion Line): The average of the highest high and the lowest low over the last 9 periods.
Kijun-sen (Base Line): The average of the highest high and the lowest low over the last 26 periods.
Senkou Span A (Leading Span A): The average of the Tenkan-sen and the Kijun-sen, shifted 26 periods forward.
Senkou Span B (Leading Span B): The average of the highest high and the lowest low over the last 52 periods, shifted 26 periods forward.
Chikou Span (Lagging Span): The closing price, shifted 26 periods back.

2. Implementing the Ichimoku Cloud in Python

First, import the required libraries:

import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt

A function that computes the Ichimoku Cloud components:

def calculate_ichimoku(df, tenkan_period=9, kijun_period=26, senkou_span_b_period=52, displacement=26):
    # Tenkan-sen (Conversion Line)
    tenkan_sen_high = df['High'].rolling(window=tenkan_period).max()
    tenkan_sen_low = df['Low'].rolling(window=tenkan_period).min()
    df['tenkan_sen'] = (tenkan_sen_high + tenkan_sen_low) / 2

    # Kijun-sen (Base Line)
    kijun_sen_high = df['High'].rolling(window=kijun_period).max()
    kijun_sen_low = df['Low'].rolling(window=kijun_period).min()
    df['kijun_sen'] = (kijun_sen_high + kijun_sen_low) / 2

    # Senkou Span A (Leading Span A)
    df['senkou_span_a'] = ((df['tenkan_sen'] + df['kijun_sen']) / 2).shift(displacement)

    # Senkou Span B (Leading Span B)
    senkou_span_b_high = df['High'].rolling(window=senkou_span_b_period).max()
    senkou_span_b_low = df['Low'].rolling(window=senkou_span_b_period).min()
    df['senkou_span_b'] = ((senkou_span_b_high + senkou_span_b_low) / 2).shift(displacement)

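    # Chikou Span (Lagging Span): the close plotted `displacement` bars back.
    # Row t therefore holds the close from t + displacement. Use it for plotting only;
    # feeding it into trading signals would leak future prices into the backtest.
    df['chikou_span'] = df['Close'].shift(-displacement)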

    return df

3. Trading strategies

Several common trading strategies are built on the Ichimoku Cloud:

3.1. Kumo Breakout strategy

Buy signal: the price breaks out above the Kumo (the cloud)
Sell signal: the price breaks down below the Kumo

3.2. TK Cross strategy

Buy signal: the Tenkan-sen crosses above the Kijun-sen
Sell signal: the Tenkan-sen crosses below the Kijun-sen
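The TK Cross rule above refers to the bar on which the two lines actually cross, which is different from the lines merely being above or below each other. A minimal sketch of detecting those crossing bars follows; the function name `tk_cross_events` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

def tk_cross_events(tenkan: pd.Series, kijun: pd.Series) -> pd.Series:
    """Return +1 on the bar where Tenkan-sen crosses above Kijun-sen,
    -1 where it crosses below, and 0 everywhere else."""
    # +1 while Tenkan is above Kijun, -1 while below.
    # Bars where the lines are equal inherit the previous side.
    state = np.sign(tenkan - kijun).replace(0, np.nan).ffill()
    prev = state.shift(1)
    crossed_up = (state == 1) & (prev == -1)
    crossed_down = (state == -1) & (prev == 1)
    return crossed_up.astype(int) - crossed_down.astype(int)

# Illustrative values only
tenkan = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0])
kijun = pd.Series([2.0, 2.5, 2.5, 2.5, 2.5])
print(tk_cross_events(tenkan, kijun).tolist())
```

Trading only on crossing bars produces far fewer signals than holding a position for as long as one line stays above the other, so the two variants can backtest quite differently.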

4. Implementing the trading strategy

def generate_signals(df):
    signals = pd.DataFrame(index=df.index)
    signals['signal'] = 0

    # Kumo Breakout Strategy
    signals['kumo_breakout'] = 0
    signals.loc[df['Close'] > df[['senkou_span_a', 'senkou_span_b']].max(axis=1), 'kumo_breakout'] = 1
    signals.loc[df['Close'] < df[['senkou_span_a', 'senkou_span_b']].min(axis=1), 'kumo_breakout'] = -1

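    # TK Cross Strategy: +1 while Tenkan-sen is above Kijun-sen, -1 while below
    # (this marks the state on every bar, not only the bar where the lines cross)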
    signals['tk_cross'] = 0
    signals.loc[df['tenkan_sen'] > df['kijun_sen'], 'tk_cross'] = 1
    signals.loc[df['tenkan_sen'] < df['kijun_sen'], 'tk_cross'] = -1

    # Combined Strategy
    signals['signal'] = signals['kumo_breakout'] + signals['tk_cross']
    signals['signal'] = signals['signal'].apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))

    return signals

5. Backtesting the strategy

def backtest_strategy(df, signals):
    # Calculate returns
    df['returns'] = df['Close'].pct_change()
    df['strategy_returns'] = df['returns'] * signals['signal'].shift(1)
    
    # Calculate cumulative returns
    df['cumulative_returns'] = (1 + df['returns']).cumprod()
    df['strategy_cumulative_returns'] = (1 + df['strategy_returns']).cumprod()
    
    return df

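6. A practical example

Below is an example that applies the Ichimoku Cloud strategy to AAPL stock:

# Download data
symbol = 'AAPL'
df = yf.download(symbol, start='2020-01-01', end='2023-12-31')

# Recent yfinance releases may return MultiIndex columns (field, ticker)
# even for a single symbol; flatten them so df['High'] is a Series.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)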

# Calculate Ichimoku
df = calculate_ichimoku(df)

# Generate signals
signals = generate_signals(df)

# Backtest
results = backtest_strategy(df, signals)

# Plot results
plt.figure(figsize=(15, 10))
plt.plot(results.index, results['cumulative_returns'], label='Buy and Hold')
plt.plot(results.index, results['strategy_cumulative_returns'], label='Ichimoku Strategy')
plt.title(f'Ichimoku Cloud Strategy - {symbol}')
plt.legend()
plt.show()

Conclusion

The Ichimoku Cloud is a versatile technical analysis tool that can serve as the basis for trading strategies. With Python and its data analysis libraries, we can implement and backtest Ichimoku-based strategies in a few short functions.

FastAPI: a modern Python framework for API development

July 1, 2025 · 3 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+. It is built on standard Python type hints and offers a modern approach to API development.

Why choose FastAPI?

1. High performance

FastAPI is one of the fastest Python frameworks available:

Built on Starlette and Pydantic
Supports async/await
Performance described by the project as on par with NodeJS and Go

2. Type safety

FastAPI uses Python type hints to:

Validate data automatically
Generate API documentation automatically
Catch errors during development

3. Automatic documentation

FastAPI generates API documentation automatically:

Swagger UI (/docs)
ReDoc (/redoc)
OpenAPI specification

Installation and getting started

1. Installation

pip install fastapi uvicorn

2. Creating your first application

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

3. Running the server

uvicorn main:app --reload

Key features

1. Path Parameters

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    return {"item_id": item_id}

2. Query Parameters

@app.get("/items/")
async def read_items(skip: int = 0, limit: int = 10):
    return {"skip": skip, "limit": limit}

3. Request Body

from typing import Optional

from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float
    is_offer: Optional[bool] = None  # optional field; None when omitted

@app.post("/items/")
async def create_item(item: Item):
    return item

4. Form Data

from fastapi import Form

@app.post("/login/")
async def login(username: str = Form(...), password: str = Form(...)):
    return {"username": username}

Dependency Injection

FastAPI includes a dependency injection system:

from typing import Optional

from fastapi import Depends

async def common_parameters(q: Optional[str] = None, skip: int = 0, limit: int = 100):
    return {"q": q, "skip": skip, "limit": limit}

@app.get("/items/")
async def read_items(commons: dict = Depends(common_parameters)):
    return commons

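Security

1. OAuth2 with JWT

from fastapi import Depends
from fastapi.security import OAuth2PasswordBearer

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

@app.get("/users/me")
async def read_users_me(token: str = Depends(oauth2_scheme)):
    # The dependency only extracts the bearer token from the Authorization header.
    # Decoding and verifying the JWT is a separate step that is not shown here.
    return {"token": token}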

2. CORS

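from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    # With allow_credentials=True, wildcards should not be used: list the
    # allowed origins, methods and headers explicitly (example values below).
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)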

Testing

FastAPI ships a TestClient for testing (it requires the httpx package):

from fastapi.testclient import TestClient

# app is the FastAPI instance created earlier
client = TestClient(app)

def test_read_main():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Hello World"}

Deployment

1. Uvicorn

uvicorn main:app --host 0.0.0.0 --port 8000

2. Docker

FROM python:3.9
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
COPY ./app /code/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Conclusion

FastAPI is a modern, capable, and easy-to-use framework for building APIs with Python. With high performance, type safety, and automatic documentation, it is well worth considering for a new API project.

How to evaluate the performance of a quantitative trading model

July 1, 2025 · 6 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Performance evaluation is an important part of developing a quantitative trading model. This article walks through the methods and metrics for evaluating a trading model thoroughly.

Key performance metrics

1. Basic metrics

Return

Return is the most basic metric for evaluating a model. There are two main kinds:

Absolute return: the total return of the portfolio
Relative return: the return compared with a benchmark
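The difference between the two kinds of return can be made concrete with a small sketch. The benchmark values below are hypothetical and exist only for illustration, and relative return is taken here as the simple difference between the two total returns, which is one common convention among several.

```python
import pandas as pd

def total_return(values: pd.Series) -> float:
    """Total return over the whole period: last value / first value - 1."""
    return float(values.iloc[-1] / values.iloc[0] - 1)

portfolio = pd.Series([100, 105, 110, 108, 115])
benchmark = pd.Series([100, 102, 104, 103, 106])  # hypothetical benchmark values

absolute_return = total_return(portfolio)
relative_return = absolute_return - total_return(benchmark)

print(f"Absolute return: {absolute_return:.2%}")
print(f"Relative return vs benchmark: {relative_return:.2%}")
```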

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Tính toán lợi nhuận
def calculate_returns(portfolio_values):
    returns = portfolio_values.pct_change()
    return returns

# Ví dụ
portfolio_values = pd.Series([100, 105, 110, 108, 115])
returns = calculate_returns(portfolio_values)
print("Lợi nhuận hàng ngày:")
print(returns)

Volatility

Volatility measures how much returns fluctuate. It is a key metric for assessing the risk of a model.

def calculate_volatility(returns, annualization_factor=252):
    volatility = returns.std() * np.sqrt(annualization_factor)
    return volatility

# Ví dụ
volatility = calculate_volatility(returns)
print(f"\nĐộ biến động hàng năm: {volatility:.2%}")

2. Advanced metrics

Sharpe ratio

The Sharpe ratio measures risk-adjusted return. Formula:

Sharpe Ratio = (R - Rf) / σ

Where:

R: Return of the portfolio
Rf: Risk-free rate of return
σ: Standard deviation of returns

def calculate_sharpe_ratio(returns, risk_free_rate=0.02, annualization_factor=252):
    excess_returns = returns - risk_free_rate/annualization_factor
    sharpe_ratio = np.sqrt(annualization_factor) * excess_returns.mean() / returns.std()
    return sharpe_ratio

# Ví dụ
sharpe = calculate_sharpe_ratio(returns)
print(f"\nTỷ số Sharpe: {sharpe:.2f}")

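Sortino ratio

The Sortino ratio is similar to the Sharpe ratio but only penalizes downside risk. Formula:

Sortino Ratio = (R - Rf) / σd

Where:

σd: Downside deviation, the root mean square of the returns that fall below the target (here the risk-free rate)

def calculate_sortino_ratio(returns, risk_free_rate=0.02, annualization_factor=252):
    excess_returns = (returns - risk_free_rate/annualization_factor).dropna()
    # Downside deviation: root mean square of the negative excess returns,
    # with non-negative periods counted as zero. Taking the std of only the
    # negative returns breaks down (NaN) when there is a single losing period.
    downside = excess_returns.clip(upper=0)
    downside_deviation = np.sqrt((downside ** 2).mean())
    sortino_ratio = np.sqrt(annualization_factor) * excess_returns.mean() / downside_deviation
    return sortino_ratio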

# Ví dụ
sortino = calculate_sortino_ratio(returns)
print(f"\nTỷ số Sortino: {sortino:.2f}")

3. Risk analysis

Drawdown

Drawdown measures how far the portfolio has fallen from its previous peak. The largest such decline, the maximum drawdown, is a key measure of worst-case loss.

def calculate_drawdown(portfolio_values):
    rolling_max = portfolio_values.expanding().max()
    drawdown = (portfolio_values - rolling_max) / rolling_max
    return drawdown

# Ví dụ
drawdown = calculate_drawdown(portfolio_values)
print("\nDrawdown:")
print(drawdown)

Value at Risk (VaR)

VaR estimates a loss threshold that should not be exceeded at a given confidence level. For example, a 95% VaR is the loss that is expected to be exceeded in only 5% of periods.
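To make that interpretation concrete, here is a minimal sketch of historical VaR on an artificial series. The evenly spaced returns are illustrative data rather than market data, and the helper name `historical_var` is hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data: 100 evenly spaced daily returns from -5.0% to +4.9%
sample_returns = pd.Series(np.arange(-50, 50) / 1000.0)

def historical_var(returns: pd.Series, confidence_level: float = 0.95) -> float:
    """Return the (1 - confidence_level) percentile of the observed returns."""
    return float(np.percentile(returns.dropna(), (1 - confidence_level) * 100))

var_95 = historical_var(sample_returns)
# Fraction of periods whose return was worse than the VaR threshold
share_worse = float((sample_returns < var_95).mean())

print(f"VaR 95%: {var_95:.3%}")
print(f"Share of periods with a worse return: {share_worse:.0%}")
```

By construction, about 5% of the observations fall below the 95% VaR. The figure says nothing about how large the losses beyond that threshold are.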

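def calculate_var(returns, confidence_level=0.95):
    # Historical VaR: the (1 - confidence_level) percentile of observed returns.
    # Drop NaNs first (pct_change leaves one in the first row); otherwise the result is NaN.
    var = np.percentile(returns.dropna(), (1 - confidence_level) * 100)
    return var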

# Ví dụ
var_95 = calculate_var(returns)
print(f"\nVaR 95%: {var_95:.2%}")

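4. Performance analysis

Time-based analysis

Analyzing performance over different time frames helps assess the stability of the model.

def analyze_performance_by_time(returns):
    # Requires a DatetimeIndex. Period returns are compounded from the
    # per-period returns rather than averaged.
    compound = lambda r: (1 + r).prod() - 1
    # By month
    monthly_returns = returns.groupby(returns.index.to_period('M')).apply(compound)
    # By quarter
    quarterly_returns = returns.groupby(returns.index.to_period('Q')).apply(compound)
    # By year
    yearly_returns = returns.groupby(returns.index.to_period('Y')).apply(compound)

    return monthly_returns, quarterly_returns, yearly_returns

# Example: the sample series has no dates, so business-day dates are attached first
dated_returns = returns.copy()
dated_returns.index = pd.date_range('2024-01-01', periods=len(returns), freq='B')
monthly, quarterly, yearly = analyze_performance_by_time(dated_returns.dropna())
print("\nMonthly returns:")
print(monthly)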

Correlation analysis

Correlation analysis helps assess how strongly the model depends on the market.

def analyze_correlation(returns, benchmark_returns):
    correlation = returns.corr(benchmark_returns)
    return correlation

# Ví dụ
benchmark_returns = pd.Series([0.01, 0.02, -0.01, 0.03, 0.01])
correlation = analyze_correlation(returns, benchmark_returns)
print(f"\nTương quan với benchmark: {correlation:.2f}")
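Correlation shows how closely the strategy moves with the benchmark but not how much of its return the benchmark explains. A common complement is a beta/alpha decomposition. The sketch below is one simple way to compute it, assuming per-period returns, ignoring the risk-free rate, and using illustrative numbers; the function name `beta_alpha` is hypothetical.

```python
import pandas as pd

def beta_alpha(returns: pd.Series, benchmark_returns: pd.Series,
               annualization_factor: int = 252):
    """Estimate beta (market sensitivity) and annualized alpha (the part of the
    mean return not explained by the benchmark). The risk-free rate is ignored."""
    aligned = pd.concat(
        [returns, benchmark_returns], axis=1, keys=["strategy", "benchmark"]
    ).dropna()
    beta = aligned["strategy"].cov(aligned["benchmark"]) / aligned["benchmark"].var()
    alpha_per_period = aligned["strategy"].mean() - beta * aligned["benchmark"].mean()
    return float(beta), float(alpha_per_period * annualization_factor)

# Illustrative data: a strategy that moves twice as much as the benchmark
# and earns an extra 0.1% per period on top
bench = pd.Series([0.01, 0.02, -0.01, 0.03, 0.01])
strategy = 2 * bench + 0.001

beta, alpha = beta_alpha(strategy, bench)
print(f"Beta: {beta:.2f}, annualized alpha: {alpha:.2%}")
```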

5. Overall evaluation

Performance report

Build a summary report of the performance metrics to get a complete picture.

def generate_performance_report(returns, portfolio_values):
    report = {
        # .iloc selects by position; portfolio_values[-1] fails on a default integer index
        'Total return': (portfolio_values.iloc[-1] / portfolio_values.iloc[0] - 1),
        'Mean return': returns.mean(),
        'Volatility': returns.std(),
        'Sharpe ratio': calculate_sharpe_ratio(returns),
        'Sortino ratio': calculate_sortino_ratio(returns),
        'VaR 95%': calculate_var(returns),
        'Max drawdown': calculate_drawdown(portfolio_values).min()
    }
    return report

# Example
report = generate_performance_report(returns, portfolio_values)
print("\nPerformance report:")
for metric, value in report.items():
    # Ratios are plain numbers; the other metrics are percentages
    if 'ratio' in metric:
        print(f"{metric}: {value:.2f}")
    else:
        print(f"{metric}: {value:.2%}")

6. Visualization

Performance chart

A performance chart shows how the model's results evolve over time.

def plot_performance(portfolio_values, benchmark_values=None):
    plt.figure(figsize=(12, 6))
    plt.plot(portfolio_values.index, portfolio_values, label='Portfolio')
    if benchmark_values is not None:
        plt.plot(benchmark_values.index, benchmark_values, label='Benchmark')
    plt.title('Hiệu suất danh mục')
    plt.xlabel('Thời gian')
    plt.ylabel('Giá trị')
    plt.legend()
    plt.grid(True)
    plt.show()

# Ví dụ
plot_performance(portfolio_values)

Return distribution chart

A return distribution chart reveals the shape of the returns, such as skew and fat tails.

def plot_returns_distribution(returns):
    plt.figure(figsize=(12, 6))
    sns.histplot(returns, kde=True)
    plt.title('Phân phối lợi nhuận')
    plt.xlabel('Lợi nhuận')
    plt.ylabel('Tần suất')
    plt.show()

# Ví dụ
plot_returns_distribution(returns)

Conclusion

Evaluating a quantitative trading model requires looking at several aspects:

Return and risk
- Return
- Volatility
- Drawdown
- VaR
Risk-adjusted performance metrics
- Sharpe ratio
- Sortino ratio
Time-based analysis
- Performance by month/quarter/year
- Stability of the model
Correlation with the benchmark
- Degree of dependence on the market
- Ability to generate alpha

Using these metrics together gives a complete picture of a trading model's performance and supports better investment decisions.

Analyzing cryptocurrency price differences between exchanges with Python

July 1, 2025 · 5 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Introduction

Arbitrage, which profits from price differences for the same asset, is a common trading strategy in the cryptocurrency market. In this article, we use Python and CCXT to analyze price differences between exchanges.

1. Installation and configuration

1.1. Installing the libraries

pip install ccxt pandas numpy plotly

1.2. Connecting to the exchanges

import ccxt
import pandas as pd
import numpy as np
from datetime import datetime

# Khởi tạo các sàn giao dịch
exchanges = {
    'binance': ccxt.binance(),
    'coinbase': ccxt.coinbase(),
    'kraken': ccxt.kraken(),
    'kucoin': ccxt.kucoin()
}

# Cấu hình chung
for exchange in exchanges.values():
    exchange.enableRateLimit = True

2. Fetching prices from multiple exchanges

2.1. Fetching current prices

def get_current_prices(symbol, exchanges):
    """
    Lấy giá hiện tại của một cặp giao dịch từ nhiều sàn
    
    Parameters:
    - symbol: Cặp giao dịch (ví dụ: 'BTC/USDT')
    - exchanges: Dictionary chứa các exchange objects
    """
    prices = {}
    for name, exchange in exchanges.items():
        try:
            ticker = exchange.fetch_ticker(symbol)
            prices[name] = {
                'bid': ticker['bid'],
                'ask': ticker['ask'],
                'last': ticker['last'],
                'timestamp': datetime.fromtimestamp(ticker['timestamp']/1000)
            }
        except Exception as e:
            print(f"Error fetching {symbol} from {name}: {e}")
    return prices

# Ví dụ sử dụng
symbol = 'BTC/USDT'
prices = get_current_prices(symbol, exchanges)

2.2. Computing the price spread

def calculate_arbitrage_opportunities(prices):
    """
    Compute arbitrage opportunities between exchanges
    """
    opportunities = []
    
    # Compare every pair of exchanges
    exchanges = list(prices.keys())
    for i in range(len(exchanges)):
        for j in range(i+1, len(exchanges)):
            exchange1 = exchanges[i]
            exchange2 = exchanges[j]
            
            # Gross profit per coin: buy at one exchange's ask, sell at the other's bid.
            # spread1: buy on exchange1, sell on exchange2; spread2: the reverse.
            # A positive value means the trade is profitable before fees.
            spread1 = prices[exchange2]['bid'] - prices[exchange1]['ask']
            spread2 = prices[exchange1]['bid'] - prices[exchange2]['ask']
            
            # Spread as a percentage of the purchase price
            spread1_pct = (spread1 / prices[exchange1]['ask']) * 100
            spread2_pct = (spread2 / prices[exchange2]['ask']) * 100
            
            opportunities.append({
                'exchange1': exchange1,
                'exchange2': exchange2,
                'spread1': spread1,
                'spread2': spread2,
                'spread1_pct': spread1_pct,
                'spread2_pct': spread2_pct,
                'timestamp': datetime.now()
            })
    
    return pd.DataFrame(opportunities)

# Tính toán cơ hội arbitrage
arbitrage_df = calculate_arbitrage_opportunities(prices)

3. Analysis and visualization

3.1. Analyzing the spread

def analyze_arbitrage(arbitrage_df, min_spread_pct=0.5):
    """
    Phân tích cơ hội arbitrage
    
    Parameters:
    - arbitrage_df: DataFrame chứa dữ liệu chênh lệch
    - min_spread_pct: Phần trăm chênh lệch tối thiểu để xem xét
    """
    # Lọc các cơ hội có chênh lệch đáng kể
    significant_opportunities = arbitrage_df[
        (arbitrage_df['spread1_pct'] > min_spread_pct) |
        (arbitrage_df['spread2_pct'] > min_spread_pct)
    ]
    
    # Sắp xếp theo chênh lệch
    significant_opportunities = significant_opportunities.sort_values(
        by=['spread1_pct', 'spread2_pct'],
        ascending=False
    )
    
    return significant_opportunities

# Phân tích cơ hội
opportunities = analyze_arbitrage(arbitrage_df)
print(opportunities)

3.2. Visualizing the spread

def plot_arbitrage_opportunities(arbitrage_df):
    """
    Vẽ biểu đồ chênh lệch giá
    """
    import plotly.graph_objects as go
    
    # Tạo biểu đồ
    fig = go.Figure()
    
    # Thêm các cột cho spread1 và spread2
    fig.add_trace(go.Bar(
        name='Spread 1',
        x=arbitrage_df['exchange1'] + ' vs ' + arbitrage_df['exchange2'],
        y=arbitrage_df['spread1_pct'],
        text=arbitrage_df['spread1_pct'].round(2),
        textposition='auto',
    ))
    
    fig.add_trace(go.Bar(
        name='Spread 2',
        x=arbitrage_df['exchange1'] + ' vs ' + arbitrage_df['exchange2'],
        y=arbitrage_df['spread2_pct'],
        text=arbitrage_df['spread2_pct'].round(2),
        textposition='auto',
    ))
    
    # Cập nhật layout
    fig.update_layout(
        title='Arbitrage Opportunities Between Exchanges',
        xaxis_title='Exchange Pairs',
        yaxis_title='Spread Percentage (%)',
        barmode='group',
        template='plotly_dark'
    )
    
    return fig

# Vẽ biểu đồ
fig = plot_arbitrage_opportunities(arbitrage_df)
fig.show()

4. Monitoring the spread in real time

def monitor_arbitrage(symbol, exchanges, interval=60, duration=3600):
    """
    Theo dõi chênh lệch giá theo thời gian thực
    
    Parameters:
    - symbol: Cặp giao dịch
    - exchanges: Dictionary chứa các exchange objects
    - interval: Khoảng thời gian giữa các lần kiểm tra (giây)
    - duration: Thời gian theo dõi (giây)
    """
    import time
    from datetime import datetime, timedelta
    
    end_time = datetime.now() + timedelta(seconds=duration)
    opportunities_history = []
    
    while datetime.now() < end_time:
        try:
            # Lấy giá hiện tại
            prices = get_current_prices(symbol, exchanges)
            
            # Tính toán cơ hội arbitrage
            arbitrage_df = calculate_arbitrage_opportunities(prices)
            
            # Phân tích cơ hội
            opportunities = analyze_arbitrage(arbitrage_df)
            
            # Lưu vào lịch sử
            opportunities_history.append({
                'timestamp': datetime.now(),
                'opportunities': opportunities
            })
            
            # In thông tin
            print(f"\nTime: {datetime.now()}")
            print(opportunities)
            
            # Đợi đến lần kiểm tra tiếp theo
            time.sleep(interval)
            
        except Exception as e:
            print(f"Error in monitoring: {e}")
            time.sleep(interval)
    
    return pd.DataFrame(opportunities_history)

# Bắt đầu theo dõi
# monitor_arbitrage('BTC/USDT', exchanges)

5. Estimating potential profit

def calculate_potential_profit(opportunity, price, amount=1.0, fee_rate=0.001):
    """
    Estimate the potential profit from an arbitrage opportunity
    
    Parameters:
    - opportunity: Dictionary with the arbitrage opportunity data
    - price: Approximate price of one coin in the quote currency, used to size the fees
    - amount: Number of coins traded
    - fee_rate: Estimated trading fee per order (0.001 = 0.1%)
    """
    # Gross profit in both directions (quote currency)
    profit1 = amount * opportunity['spread1']
    profit2 = amount * opportunity['spread2']
    
    # Trading fees (estimate): charged on the traded value, once for the buy
    # and once for the sell, so they are in the same currency as the profit
    fees = amount * price * fee_rate * 2
    
    # Net profit
    net_profit1 = profit1 - fees
    net_profit2 = profit2 - fees
    
    return {
        'gross_profit1': profit1,
        'gross_profit2': profit2,
        'fees': fees,
        'net_profit1': net_profit1,
        'net_profit2': net_profit2
    }

Conclusion

In this article, we covered how to:

Connect to multiple exchanges through CCXT
Fetch and compare prices from different exchanges
Compute arbitrage opportunities
Visualize price spreads
Monitor spreads in real time

Important notes:

Account for trading fees and withdrawal fees
Consider the time it takes to execute and settle trades
Check each exchange's trading limits
Make sure you hold sufficient balances on each exchange
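The first note can be quantified: fees set a minimum spread below which an apparent opportunity loses money. The sketch below assumes the same taker fee rate on both exchanges and a flat withdrawal fee expressed in the quote currency. All names and numbers are hypothetical, so check each exchange's actual fee schedule.

```python
def net_arbitrage_profit(buy_price: float, sell_price: float, amount: float,
                         taker_fee: float = 0.001,
                         withdrawal_fee_quote: float = 0.0) -> float:
    """Net profit in the quote currency after trading and withdrawal fees."""
    cost = buy_price * amount * (1 + taker_fee)        # paid on the buy side
    proceeds = sell_price * amount * (1 - taker_fee)   # received on the sell side
    return proceeds - cost - withdrawal_fee_quote

def break_even_spread_pct(taker_fee: float = 0.001) -> float:
    """Smallest (sell / buy - 1) in percent that covers the trading fees on both legs."""
    return ((1 + taker_fee) / (1 - taker_fee) - 1) * 100

# Illustrative numbers: buy at 100, sell at 101, one coin, 0.1% fee per trade
profit = net_arbitrage_profit(100.0, 101.0, 1.0)
threshold = break_even_spread_pct(0.001)

print(f"Net profit: {profit:.3f}")
print(f"Break-even spread: {threshold:.4f}%")
```

With a 0.1% fee on each leg, the spread has to exceed roughly 0.2% before the trade earns anything, and withdrawal fees raise that threshold further.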

Contact

If you have questions or need further help, get in touch:

Email: support@huongnghiepdulieu.com
GitHub: huongnghiepdulieu

Automatically fetching and visualizing cryptocurrency price data from Binance with Python

July 1, 2025 · 5 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Introduction

In this article, we use Python and the CCXT library to fetch cryptocurrency price data from Binance, then analyze and visualize it. This is a core skill for traders and analysts working in the cryptocurrency market.

1. Installation and configuration

1.1. Installing the required libraries

pip install ccxt pandas numpy plotly openpyxl

1.2. Connecting to Binance through CCXT

import ccxt
import pandas as pd
import plotly.graph_objects as go
from datetime import datetime

# Khởi tạo exchange
exchange = ccxt.binance({
    'enableRateLimit': True,  # Tự động xử lý rate limit
    'options': {
        'defaultType': 'spot'  # Sử dụng spot trading
    }
})

# Check the connection. exchange.markets stays empty until load_markets() is called.
markets = exchange.load_markets()
print(f"Exchange: {exchange.name}")
print(f"Markets: {len(markets)}")

2. Fetching OHLCV (candlestick) data

2.1. Fetching data by timeframe

def fetch_ohlcv(symbol, timeframe='1h', limit=1000):
    """
    Lấy dữ liệu OHLCV từ Binance
    
    Parameters:
    - symbol: Cặp giao dịch (ví dụ: 'BTC/USDT')
    - timeframe: Khung thời gian ('1m', '5m', '1h', '4h', '1d')
    - limit: Số lượng nến muốn lấy (tối đa 1000)
    """
    try:
        ohlcv = exchange.fetch_ohlcv(symbol, timeframe, limit=limit)
        df = pd.DataFrame(ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
        df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
        return df
    except Exception as e:
        print(f"Error fetching data: {e}")
        return None

# Ví dụ sử dụng
btc_data = fetch_ohlcv('BTC/USDT', '1h', 1000)
print(btc_data.head())

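2.2. Fetching more than 1000 candles

def fetch_multiple_ohlcv(symbol, timeframe='1h', since=None, limit=1000):
    """
    Fetch more than 1000 candles by paging with the since parameter
    (since is a start time in milliseconds; without it only the most recent candles are returned)
    """
    all_ohlcv = []
    while True:
        try:
            ohlcv = exchange.fetch_ohlcv(symbol, timeframe, since=since, limit=limit)
            if len(ohlcv) == 0:
                break
            all_ohlcv.extend(ohlcv)
            # A short batch means the most recent candle has been reached
            if len(ohlcv) < limit:
                break
            # Continue from just after the last candle received
            since = ohlcv[-1][0] + 1
        except Exception as e:
            print(f"Error: {e}")
            break
    return pd.DataFrame(all_ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])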

3. Processing and storing the data

3.1. Processing the data with Pandas

def process_ohlcv_data(df):
    """
    Process OHLCV data
    """
    # Convert the timestamp from milliseconds unless it is already a datetime column
    if not pd.api.types.is_datetime64_any_dtype(df['timestamp']):
        df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
    
    # Compute returns and rolling volatility
    df['returns'] = df['close'].pct_change()
    df['volatility'] = df['returns'].rolling(window=20).std()
    
    # Compute simple moving averages
    df['SMA20'] = df['close'].rolling(window=20).mean()
    df['SMA50'] = df['close'].rolling(window=50).mean()
    
    return df

# Process the data
btc_data = process_ohlcv_data(btc_data)

3.2. Storing the data

def save_data(df, filename, format='csv'):
    """
    Lưu dữ liệu ra file
    """
    if format == 'csv':
        df.to_csv(f"{filename}.csv", index=False)
    elif format == 'excel':
        df.to_excel(f"{filename}.xlsx", index=False)
    elif format == 'html':
        df.to_html(f"{filename}.html", index=False)
    else:
        print("Unsupported format")

# Ví dụ lưu dữ liệu
save_data(btc_data, 'btc_data', 'csv')
save_data(btc_data, 'btc_data', 'excel')

4. Visualizing the data with Plotly

4.1. Plotting a candlestick chart

def plot_candlestick(df, title='BTC/USDT Price'):
    """
    Vẽ biểu đồ nến với Plotly
    """
    fig = go.Figure(data=[go.Candlestick(
        x=df['timestamp'],
        open=df['open'],
        high=df['high'],
        low=df['low'],
        close=df['close']
    )])
    
    # Thêm SMA
    fig.add_trace(go.Scatter(
        x=df['timestamp'],
        y=df['SMA20'],
        name='SMA20',
        line=dict(color='blue')
    ))
    
    fig.add_trace(go.Scatter(
        x=df['timestamp'],
        y=df['SMA50'],
        name='SMA50',
        line=dict(color='red')
    ))
    
    # Cập nhật layout
    fig.update_layout(
        title=title,
        yaxis_title='Price (USDT)',
        xaxis_title='Date',
        template='plotly_dark'
    )
    
    return fig

# Vẽ và hiển thị biểu đồ
fig = plot_candlestick(btc_data)
fig.show()

4.2. Plotting a volume chart

def plot_volume(df, title='BTC/USDT Volume'):
    """
    Vẽ biểu đồ volume
    """
    fig = go.Figure(data=[go.Bar(
        x=df['timestamp'],
        y=df['volume'],
        name='Volume'
    )])
    
    fig.update_layout(
        title=title,
        yaxis_title='Volume',
        xaxis_title='Date',
        template='plotly_dark'
    )
    
    return fig

# Vẽ và hiển thị biểu đồ volume
volume_fig = plot_volume(btc_data)
volume_fig.show()

5. Fetching the current price (ticker)

def get_current_price(symbol):
    """
    Lấy giá hiện tại của một cặp giao dịch
    """
    try:
        ticker = exchange.fetch_ticker(symbol)
        return {
            'symbol': symbol,
            'last': ticker['last'],
            'bid': ticker['bid'],
            'ask': ticker['ask'],
            'volume': ticker['baseVolume'],
            'timestamp': datetime.fromtimestamp(ticker['timestamp']/1000)
        }
    except Exception as e:
        print(f"Error fetching ticker: {e}")
        return None

# Ví dụ lấy giá BTC/USDT
btc_ticker = get_current_price('BTC/USDT')
print(btc_ticker)

6. Going further: advanced features

6.1. Fetching data for multiple trading pairs

def fetch_multiple_symbols(symbols, timeframe='1h', limit=1000):
    """
    Lấy dữ liệu từ nhiều cặp giao dịch
    """
    data = {}
    for symbol in symbols:
        data[symbol] = fetch_ohlcv(symbol, timeframe, limit)
    return data

# Ví dụ lấy dữ liệu nhiều cặp
symbols = ['BTC/USDT', 'ETH/USDT', 'BNB/USDT']
multi_data = fetch_multiple_symbols(symbols)

6.2. Computing the correlation between pairs

def calculate_correlation(data_dict):
    """
    Tính toán tương quan giữa các cặp giao dịch
    """
    # Tạo DataFrame với giá đóng cửa của các cặp
    closes = pd.DataFrame()
    for symbol, df in data_dict.items():
        closes[symbol] = df['close']
    
    # Tính toán ma trận tương quan
    correlation = closes.corr()
    return correlation

# Tính và hiển thị tương quan
correlation = calculate_correlation(multi_data)
print(correlation)
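Correlating raw closing prices tends to overstate the relationship between assets that simply trend in the same direction, so correlation is often computed on returns instead. A minimal sketch follows, assuming each DataFrame has the `timestamp` and `close` columns produced earlier. The function name `returns_correlation` and the sample data are hypothetical.

```python
import pandas as pd

def returns_correlation(data_dict: dict) -> pd.DataFrame:
    """Correlation matrix of per-candle returns, aligned on timestamp."""
    closes = pd.DataFrame({
        symbol: df.set_index('timestamp')['close']
        for symbol, df in data_dict.items()
        if df is not None  # skip symbols whose download failed
    })
    return closes.pct_change().dropna().corr()

# Illustrative data: B moves exactly like A, only at twice the price level
timestamps = pd.date_range('2024-01-01', periods=5, freq='D')
sample = {
    'A/USDT': pd.DataFrame({'timestamp': timestamps,
                            'close': [100.0, 101.0, 102.0, 101.0, 103.0]}),
    'B/USDT': pd.DataFrame({'timestamp': timestamps,
                            'close': [200.0, 202.0, 204.0, 202.0, 206.0]}),
}
returns_corr = returns_correlation(sample)
print(returns_corr)
```

Aligning on the timestamp rather than the row position also keeps the comparison valid when one pair has missing candles.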

Conclusion

In this article, we covered how to:

Connect to Binance through CCXT
Fetch and process OHLCV data
Store the data in several formats
Visualize the data with Plotly
Run more advanced analyses

This is a foundation for automating cryptocurrency data analysis. You can extend it by:

Adding technical indicators
Building automated trading strategies
Analyzing sentiment from social media
Integrating other data sources


Using Machine Learning to predict stock prices

July 1, 2025 · 9 min read

admin Hướng Nghiệp Dữ Liệu

Fullstack

Introduction

Predicting stock prices is one of the hardest problems in finance, and it attracts both individual and institutional investors. Advances in machine learning and artificial intelligence (AI) have made it more practical to model stock price movements. This article shows how to use machine learning in Python to predict stock prices.

Collecting the data

The first step is to collect historical data. Python offers several useful libraries for financial data, such as yfinance and pandas-datareader, as well as APIs provided by exchanges.

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta

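# Define the date range
end_date = datetime.now()
start_date = end_date - timedelta(days=365*5)  # 5 years of data

# Download the stock data
ticker = "AAPL"  # Apple Inc.
data = yf.download(ticker, start=start_date, end=end_date)

# Recent yfinance releases may return MultiIndex columns (field, ticker);
# flatten them so data['Close'] is a Series.
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

# Inspect the data
print(data.head())

The downloaded data typically includes the open, high, low, and close prices (Open, High, Low, Close) and the trading volume (Volume). Depending on the yfinance version and the auto_adjust setting, an adjusted close column (Adj Close) may also be present.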

Preprocessing the data

Before applying machine learning algorithms, we need to preprocess the data: handle missing values, scale the data, and create new features.

# Xử lý giá trị thiếu
data = data.dropna()

# Thêm các chỉ báo kỹ thuật
# 1. Trung bình động (Moving Average)
data['MA20'] = data['Close'].rolling(window=20).mean()
data['MA50'] = data['Close'].rolling(window=50).mean()

# 2. MACD (Moving Average Convergence Divergence)
def calculate_macd(data, fast=12, slow=26, signal=9):
    data['EMA_fast'] = data['Close'].ewm(span=fast, adjust=False).mean()
    data['EMA_slow'] = data['Close'].ewm(span=slow, adjust=False).mean()
    data['MACD'] = data['EMA_fast'] - data['EMA_slow']
    data['MACD_signal'] = data['MACD'].ewm(span=signal, adjust=False).mean()
    data['MACD_histogram'] = data['MACD'] - data['MACD_signal']
    return data

data = calculate_macd(data)

# 3. RSI (Relative Strength Index)
def calculate_rsi(data, period=14):
    delta = data['Close'].diff()
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    
    avg_gain = gain.rolling(window=period).mean()
    avg_loss = loss.rolling(window=period).mean()
    
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    
    data['RSI'] = rsi
    return data

data = calculate_rsi(data)

# 4. Độ biến động (Volatility)
data['Volatility'] = data['Close'].pct_change().rolling(window=20).std() * np.sqrt(20)

# Loại bỏ các dòng chứa giá trị NaN sau khi tính toán
data = data.dropna()

print(data.head())

Chuẩn bị dữ liệu cho mô hình

Tiếp theo, chúng ta cần chia dữ liệu thành tập huấn luyện (training set) và tập kiểm tra (test set), đồng thời chuẩn hóa dữ liệu để tăng hiệu suất của mô hình.

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Features and target
features = ['Open', 'High', 'Low', 'Volume', 'MA20', 'MA50', 'MACD', 'RSI', 'Volatility']
X = data[features]
y = data['Close']

# Split chronologically (80% training, 20% testing); shuffle=False keeps the time order
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Normalize: fit the scaler on the training set only, so test-set statistics do not leak into training
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

print(f"Training set size: {X_train.shape}")
print(f"Test set size: {X_test.shape}")
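A single chronological split gives only one estimate of test error. One way to get several is walk-forward cross-validation, where each fold trains on an initial segment and tests on the rows that follow it. The sketch below uses scikit-learn's `TimeSeriesSplit` with a pipeline so that the scaler is refit on each fold's training rows; the synthetic `X_demo` and `y_demo` arrays are stand-ins for the real features and target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix and target (illustration only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

# The pipeline refits the scaler inside every fold, on that fold's training rows only
pipeline = make_pipeline(MinMaxScaler(), LinearRegression())

# Each split trains on an initial segment and tests on the rows that follow it
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(pipeline, X_demo, y_demo, cv=tscv,
                         scoring='neg_root_mean_squared_error')

for fold, (train_idx, test_idx) in enumerate(tscv.split(X_demo), start=1):
    print(f"Fold {fold}: train rows 0-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]}, RMSE {-scores[fold - 1]:.3f}")
```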

Building and training the model

Several algorithms can be used to predict stock prices. Below are some common models:

1. Linear regression model

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")

# Show the fitted coefficients (on the scaled features) as a rough indicator of each feature's influence,
# ordered by absolute size because a large negative coefficient matters as much as a large positive one
importance = pd.DataFrame({'Feature': features, 'Coefficient': model.coef_})
importance = importance.sort_values('Coefficient', key=lambda s: s.abs(), ascending=False)
print("\nFeature coefficients:")
print(importance)
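Error metrics such as RMSE are easier to judge next to a trivial baseline. Since closing prices usually change little from one day to the next, one reasonable reference point is a "persistence" forecast that simply repeats the previous close. A minimal sketch follows; the helper name `naive_baseline_rmse` and the toy prices are hypothetical.

```python
import numpy as np

def naive_baseline_rmse(close):
    """RMSE of predicting each close with the previous day's close."""
    close = np.asarray(close, dtype=float)
    predictions = close[:-1]   # yesterday's close
    actual = close[1:]         # today's close
    return float(np.sqrt(np.mean((actual - predictions) ** 2)))

# Toy closing prices, for illustration only
toy_close = [1.0, 2.0, 4.0, 7.0]
print(f"Naive baseline RMSE: {naive_baseline_rmse(toy_close):.3f}")  # 2.160
```

Applied to the closing prices of the test period, this gives a number that the RMSE of each model can be compared against.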

2. Random Forest model

from sklearn.ensemble import RandomForestRegressor

# Initialize the model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Predict on the test set
rf_y_pred = rf_model.predict(X_test)

# Evaluate the model
rf_mse = mean_squared_error(y_test, rf_y_pred)
rf_rmse = np.sqrt(rf_mse)
rf_r2 = r2_score(y_test, rf_y_pred)

print(f"Random Forest - MSE: {rf_mse:.2f}")
print(f"Random Forest - RMSE: {rf_rmse:.2f}")
print(f"Random Forest - R²: {rf_r2:.2f}")
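A fitted random forest also exposes impurity-based feature importances through its `feature_importances_` attribute, which can help show which inputs the trees rely on. The sketch below demonstrates this on synthetic data in which only one column drives the target; the column names and the data are illustrative assumptions, not the stock features used above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only 'signal' drives the target; the other two columns are irrelevant
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    'signal': rng.normal(size=500),
    'noise_a': rng.normal(size=500),
    'noise_b': rng.normal(size=500),
})
y_demo = 3.0 * X_demo['signal'] + rng.normal(scale=0.1, size=500)

rf_demo = RandomForestRegressor(n_estimators=50, random_state=42)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances are non-negative and sum to 1
rf_importance = pd.Series(rf_demo.feature_importances_, index=X_demo.columns)
print(rf_importance.sort_values(ascending=False))
```

With the real data, the same two lines after `fit` would be `pd.Series(rf_model.feature_importances_, index=features)`.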

3. Neural network model

Figure: Neural network model for stock price prediction

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler

# Reshape the data into fixed-length windows for the LSTM
def create_sequences(X, y, time_steps=10):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X[i:(i + time_steps)])
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)

# Scale features and target; fit the scalers on the first 80% of rows only,
# so that the test period does not influence the scaling
n_fit = int(len(data) * 0.8)
scaler_X = MinMaxScaler().fit(data[features].iloc[:n_fit])
scaler_y = MinMaxScaler().fit(data[['Close']].iloc[:n_fit])

X_scaled = scaler_X.transform(data[features])
y_scaled = scaler_y.transform(data[['Close']])

# Build the time-series windows
time_steps = 10
X_seq, y_seq = create_sequences(X_scaled, y_scaled, time_steps)

# Split chronologically into training and test sets
train_size = int(len(X_seq) * 0.8)
X_train_seq = X_seq[:train_size]
y_train_seq = y_seq[:train_size]
X_test_seq = X_seq[train_size:]
y_test_seq = y_seq[train_size:]

# Build the LSTM model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train_seq.shape[1], X_train_seq.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

# Train the model
history = model.fit(
    X_train_seq, y_train_seq,
    epochs=100,
    batch_size=32,
    validation_split=0.1,
    verbose=1
)

# Predict
y_pred_seq = model.predict(X_test_seq)

# Convert back to the original price scale
y_test_inv = scaler_y.inverse_transform(y_test_seq)
y_pred_inv = scaler_y.inverse_transform(y_pred_seq)

# Evaluate the model
lstm_mse = mean_squared_error(y_test_inv, y_pred_inv)
lstm_rmse = np.sqrt(lstm_mse)

print(f"LSTM - MSE: {lstm_mse:.2f}")
print(f"LSTM - RMSE: {lstm_rmse:.2f}")
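The windowing step is the part of the LSTM pipeline that most often goes wrong, so it can be worth checking in isolation. The sketch below runs the same `create_sequences` function on a tiny toy array (the arrays `X_toy` and `y_toy` are made up for illustration) to show the resulting shapes and how each window lines up with its target.

```python
import numpy as np

def create_sequences(X, y, time_steps=10):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X[i:(i + time_steps)])
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)

# Toy data: 6 rows, 2 features; the target is simply the row number
X_toy = np.arange(12).reshape(6, 2)
y_toy = np.arange(6).reshape(-1, 1)

X_win, y_win = create_sequences(X_toy, y_toy, time_steps=3)

print(X_win.shape)  # (3, 3, 2): samples, time steps, features
print(y_win.shape)  # (3, 1)
print(X_win[0])     # rows 0, 1 and 2 of X_toy
print(y_win[0])     # [3]: the target that follows the first window
```

Each window covers rows `i` to `i + time_steps - 1`, and its target is the row immediately after, so the model never sees the value it is asked to predict.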

Predicting future stock prices

Once the model is trained, we can use it to predict future stock prices:

def predict_future_prices(model, data, features, scaler, days=30):
    # Take the most recent window of features (time_steps and scaler_y are the globals defined above)
    last_data = data[features].iloc[-time_steps:]
    last_data_scaled = scaler.transform(last_data)

    # List that collects the predictions
    future_predictions = []

    # Starting window, shaped (1, time_steps, n_features)
    current_batch = last_data_scaled.reshape(1, time_steps, len(features))

    for _ in range(days):
        # Predict the next closing price (in scaled units)
        future_price = model.predict(current_batch, verbose=0)[0]
        future_predictions.append(future_price)

        # Build the input row for the next prediction
        # (A simple approach for illustration; in practice every feature would need to be recomputed)
        new_data_point = current_batch[0][-1:].copy()
        new_data_point[0][0] = future_price[0]  # Overwrite the first feature ('Open') with the scaled prediction as a rough proxy

        # Slide the window: drop the oldest step and append the new row
        current_batch = np.append(current_batch[:, 1:, :], [new_data_point], axis=1)

    # Convert back to the original price scale
    future_predictions = scaler_y.inverse_transform(np.array(future_predictions))

    return future_predictions

# Predict prices for the next 30 trading days
future_prices = predict_future_prices(model, data, features, scaler_X, days=30)

# Show the results (business-day frequency skips weekends)
last_date = data.index[-1]
future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=30, freq='B')

future_df = pd.DataFrame({
    'Date': future_dates,
    'Predicted_Close': future_prices.flatten()
})

print(future_df)
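The loop above is a form of recursive forecasting: each prediction is fed back in as input for the next step. The idea is easier to see on a univariate model where the whole window consists of past prices, so the fed-back value is exactly what the model expects. The sketch below assumes a plain linear regression on lagged closes and a synthetic trending series; the helper names `make_lagged` and `recursive_forecast` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def make_lagged(series, n_lags):
    """Rows of the n_lags previous values, paired with the value that follows them."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def recursive_forecast(model, history, n_lags, days):
    """Forecast `days` steps ahead, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(days):
        next_value = model.predict(np.array(window).reshape(1, -1))[0]
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # slide the window forward by one step
    return np.array(forecasts)

# Toy series with a steady upward trend (illustration only)
prices = 100.0 + 0.5 * np.arange(60)

n_lags = 3
X_lag, y_lag = make_lagged(prices, n_lags)
ar_model = LinearRegression().fit(X_lag, y_lag)

forecast = recursive_forecast(ar_model, prices, n_lags, days=5)
print(forecast)
```

On this noiseless series the forecast simply continues the trend. On real prices, errors compound from step to step, which is one reason long recursive horizons tend to be unreliable.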

Visualizing the predictions

Finally, we can visualize the predictions with the matplotlib library:

plt.figure(figsize=(14, 7))

# Plot the historical closing prices
plt.plot(data.index[-100:], data['Close'].iloc[-100:], label='Historical price', color='blue')

# Plot the predicted prices
plt.plot(future_df['Date'], future_df['Predicted_Close'], label='Predicted price', color='red', linestyle='--')

# Add an uncertainty band (illustrative only; a real interval has to be estimated from the model's errors)
band_width = 0.1  # fixed ±10% around each prediction
upper_bound = future_df['Predicted_Close'] * (1 + band_width)
lower_bound = future_df['Predicted_Close'] * (1 - band_width)

plt.fill_between(future_df['Date'], lower_bound, upper_bound, color='red', alpha=0.2, label='±10% band (illustrative)')

plt.title(f'Stock price prediction for {ticker}')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('stock_prediction_result.png')
plt.show()
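The fixed ±10% band in the plot is a placeholder. One simple alternative is to derive the band from the model's own errors on the test set, by shifting each prediction by the empirical quantiles of the residuals. The sketch below assumes residuals defined as actual minus predicted; the helper name `empirical_band` and the toy numbers are hypothetical.

```python
import numpy as np

def empirical_band(predictions, residuals, coverage=0.90):
    """Shift predictions by the residual quantiles that bracket the requested coverage."""
    tail = (1.0 - coverage) / 2.0
    low_q, high_q = np.quantile(residuals, [tail, 1.0 - tail])
    predictions = np.asarray(predictions, dtype=float)
    return predictions + low_q, predictions + high_q

# Toy residuals (actual minus predicted) and forecasts, for illustration only
residuals = np.arange(-10.0, 11.0)            # 21 evenly spaced errors from -10 to 10
predictions = np.array([100.0, 101.0, 102.0])

lower, upper = empirical_band(predictions, residuals)
print(lower)
print(upper)
```

Residuals from one-step test predictions likely understate the uncertainty of a 30-day recursive forecast, so a band built this way is best read as a lower bound on the real spread.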

Evaluating and improving the model

To get more accurate predictions, we can improve the model in several ways:

Add more features: Include other technical indicators, as well as data from sentiment analysis of news and social media.
Tune hyperparameters: Use grid search or random search to find the best hyperparameters.
Use more advanced models: Experiment with Transformer, GRU, or hybrid CNN-LSTM architectures.
Combine multiple models: Use ensemble methods to combine the predictions of several different models.

from sklearn.ensemble import VotingRegressor
from sklearn.svm import SVR

# Combine several models; VotingRegressor fits a fresh copy of each estimator and averages their predictions
ensemble_model = VotingRegressor([
    ('linear', LinearRegression()),
    ('random_forest', RandomForestRegressor(n_estimators=100, random_state=42)),
    ('svr', SVR(kernel='rbf', C=100, gamma=0.1, epsilon=.1))
])

# Train the ensemble
ensemble_model.fit(X_train, y_train)

# Predict
ensemble_y_pred = ensemble_model.predict(X_test)

# Evaluate
ensemble_mse = mean_squared_error(y_test, ensemble_y_pred)
ensemble_rmse = np.sqrt(ensemble_mse)
ensemble_r2 = r2_score(y_test, ensemble_y_pred)

print(f"Ensemble - MSE: {ensemble_mse:.2f}")
print(f"Ensemble - RMSE: {ensemble_rmse:.2f}")
print(f"Ensemble - R²: {ensemble_r2:.2f}")
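The hyperparameter tuning suggestion in the list above can be sketched with scikit-learn's `GridSearchCV`. For time-ordered data it is usually paired with `TimeSeriesSplit`, so that every validation fold comes after its training rows. The data and the parameter grid below are small, arbitrary assumptions chosen to keep the example fast, not recommended values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the training data (illustration only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# A deliberately small, arbitrary grid
param_grid = {
    'n_estimators': [25, 50],
    'max_depth': [3, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),      # folds respect time order
    scoring='neg_mean_squared_error',
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV RMSE: {np.sqrt(-search.best_score_):.3f}")
```

`RandomizedSearchCV` takes the same arguments plus `n_iter` and samples from the grid instead of trying every combination, which scales better to large search spaces.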

Conclusion

Predicting stock prices with machine learning is an interesting but challenging problem. No model can predict prices with 100% accuracy, because financial markets are complex and unpredictable, but machine learning techniques can still provide useful insight and support investment decisions.

Note that these predictions should not be treated as investment advice. They should serve only as a supplementary tool within an overall investment strategy, combined with fundamental analysis, technical analysis, and an understanding of macroeconomic factors.
