
Telegram Channel Parser: Export Messages to CSV

The prompt generates a ready-to-use Python script for parsing Telegram channels, saving messages, dates, and media links to a CSV file.

Ready-made prompt

Role: You are an experienced Python developer specializing in automation and working with the Telegram API.

Task: Write a complete Python script for parsing a public Telegram channel [channel_name] and exporting data to a CSV file.

Script requirements:
1. Library: Telethon (asyncio-based). Python version: 3.10+.
2. Authorization through environment variables: API_ID, API_HASH, PHONE (load from .env via python-dotenv).
3. Parse the last [number_of_messages] messages from the channel.
4. Extract for each message:
 - message id
 - date and time (UTC, ISO 8601)
 - message text (cleaned of HTML tags)
 - media type (photo / video / document / none)
 - number of views
 - number of reactions (sum of all)
 - link to the message (https://t.me/channel/id)
5. Implement pagination through iter_messages with a limit.
6. Date filter: export only messages starting from [start_date] (format YYYY-MM-DD). If empty, no filter.
7. Save results to [file_name].csv with ";" as the delimiter and UTF-8-BOM encoding (so the file opens correctly in Excel).
8. Add a progress bar via tqdm.
9. Log errors through the logging module (level WARNING and above) to parser.log.
10. Handle exceptions: FloodWaitError, ChannelPrivateError, UsernameNotOccupiedError.

Response format:
- First, output the full script code in one block.
- After the code, a "Dependencies" block with pip install command for all required libraries.
- Then, a ".env Setup" block with a sample .env file.
- Lastly, a "Run" block with the launch command and a sample terminal output.

Constraints:
- Do not use the Bot API - only the User API via Telethon.
- Do not use third-party parsers like snscrape.
- All code must be covered with type hints.
- Add a docstring to each function.
- Do not hardcode tokens and credentials in the code.

How to use

  1. Fill in the placeholders: Specify the channel's username (without @), desired number of messages, start date (or leave empty), and output file name.
  2. Insert the prompt into ChatGPT (GPT-4o) or Claude: Copy the completed prompt and send it to the chat. The model will generate the complete script with instructions.
  3. Create a .env file: Copy the example from the ".env Setup" block and fill it with your API_ID, API_HASH, and phone number. Keys can be obtained from my.telegram.org.
  4. Install dependencies: Execute the pip install command from the "Dependencies" block within your virtual environment.
  5. Run the script: On the first run, Telethon will request a confirmation code from Telegram. After authorization, the script will create a .session file and start parsing with a progress bar.
  6. Open the CSV: The file will be saved with UTF-8-BOM encoding, allowing Excel to open it correctly without additional settings.
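
Step 1 can also be done programmatically. The following is a minimal sketch, assuming the placeholders use the bracketed names from the prompt; `fill_prompt` and the shortened `template` string are hypothetical and exist only for illustration.

```python
import re

# Matches bracketed placeholders such as [channel_name] (lowercase letters and underscores).
PLACEHOLDER = re.compile(r"\[([a-z_]+)\]")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [name] placeholder with its value; unknown names stay untouched."""
    def _sub(match: re.Match[str]) -> str:
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(_sub, template)


# Shortened stand-in for the full prompt text.
template = (
    "Parse the public Telegram channel [channel_name], "
    "take the last [number_of_messages] messages starting from [start_date], "
    "and save them to [file_name].csv."
)

filled = fill_prompt(template, {
    "channel_name": "bbcrussian",
    "number_of_messages": "500",
    "start_date": "2024-01-01",
    "file_name": "telegram_export",
})
print(filled)
```

Leaving unknown placeholders untouched makes a missed variable easy to spot in the final prompt.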

Example output

import asyncio
import csv
import logging
import os
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Optional

from dotenv import load_dotenv
from telethon import TelegramClient
from telethon.errors import ChannelPrivateError, FloodWaitError, UsernameNotOccupiedError
from telethon.tl.types import Message
from tqdm.asyncio import tqdm

load_dotenv()
logging.basicConfig(filename="parser.log", level=logging.WARNING)
logger = logging.getLogger(__name__)

API_ID: int = int(os.environ["API_ID"])
API_HASH: str = os.environ["API_HASH"]
PHONE: str = os.environ["PHONE"]

class _HTMLStripper(HTMLParser):
    """Strips HTML tags from a string."""
    def __init__(self) -> None:
        super().__init__()
        self._parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self._parts.append(data)

    def get_text(self) -> str:
        return "".join(self._parts)

def strip_html(text: str) -> str:
    """Remove HTML tags from message text."""
    s = _HTMLStripper()
    s.feed(text)
    return s.get_text()

def get_media_type(msg: Message) -> str:
    """Determine media type of a Telegram message."""
    if msg.photo:
        return "photo"
    if msg.video:
        return "video"
    if msg.document:
        return "document"
    return "none"

async def parse_channel(
    channel: str,
    limit: int,
    output_file: str,
    date_from: Optional[datetime] = None,
) -> None:
    """Parse Telegram channel and export messages to CSV."""
    client = TelegramClient("session", API_ID, API_HASH)
    # start() returns the client; passing the phone here avoids an interactive phone prompt
    async with await client.start(phone=PHONE):
        try:
            entity = await client.get_entity(channel)
        except (ChannelPrivateError, UsernameNotOccupiedError) as e:
            logger.warning("Cannot access channel %s: %s", channel, e)
            return

        rows: list[dict] = []
        try:
            async for msg in tqdm(client.iter_messages(entity, limit=limit), total=limit):
                if not isinstance(msg, Message):
                    continue
                if date_from and msg.date.replace(tzinfo=timezone.utc) < date_from:
                    break
                reactions_count = 0
                if msg.reactions:
                    reactions_count = sum(r.count for r in msg.reactions.results)
                rows.append({
                    "id": msg.id,
                    "date": msg.date.isoformat(),
                    # raw_text is the unformatted text; msg.text would re-add Markdown markup
                    "text": strip_html(msg.raw_text or ""),
                    "media_type": get_media_type(msg),
                    "views": msg.views or 0,
                    "reactions": reactions_count,
                    "link": f"https://t.me/{channel}/{msg.id}",
                })
        except FloodWaitError as e:
            logger.warning("FloodWait: sleeping %s seconds", e.seconds)
            await asyncio.sleep(e.seconds)

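        # Fixed header list so the export also works when no messages were collected
        fieldnames = ["id", "date", "text", "media_type", "views", "reactions", "link"]
        with open(f"{output_file}.csv", "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=";")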
            writer.writeheader()
            writer.writerows(rows)
        print(f"✅ Saved {len(rows)} messages → {output_file}.csv")

if __name__ == "__main__":
    asyncio.run(parse_channel(
        channel="bbcrussian",
        limit=500,
        output_file="telegram_export",
        date_from=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ))

Dependencies: pip install telethon python-dotenv tqdm

.env:
API_ID=12345678
API_HASH=abcdef...
PHONE=+79001234567
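
The example hardcodes `date_from`, while the prompt asks for a [start_date] string that may be empty. The sketch below shows one possible way to turn that string into the optional filter value; `parse_start_date` and `is_on_or_after` are hypothetical helper names, not part of the generated script.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_start_date(raw: str) -> Optional[datetime]:
    """Convert 'YYYY-MM-DD' to a UTC-aware datetime; empty input means no filter."""
    raw = raw.strip()
    if not raw:
        return None
    return datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def is_on_or_after(message_date: datetime, date_from: Optional[datetime]) -> bool:
    """Return True when the message passes the filter (always True without a filter)."""
    return date_from is None or message_date >= date_from


date_from = parse_start_date("2024-01-01")
print(date_from.isoformat())  # 2024-01-01T00:00:00+00:00
```

Making the parsed value timezone-aware matters because comparing it with an aware message date would otherwise raise a TypeError.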

Tips and variations

  • Increase parsing speed: Add the parameter wait_time=0 to iter_messages for channels with large archives, but be cautious of FloodWait from Telegram.
  • Parsing multiple channels: Ask AI to wrap the function call in a loop over a list of channels and add a delay asyncio.sleep(2) between channels to avoid bans.
  • Keyword filtering: Add a requirement in the prompt to filter messages by a list of keywords [keywords] - AI will add a check any(kw in text for kw in keywords).
  • Export directly to Excel: Ask to replace CSV with openpyxl to generate .xlsx files with automatic column formatting and first-row freezing.
  • Add completion notifications: Extend the prompt with a request to send final statistics (number of messages, date range) to a Telegram bot via the Bot API after parsing is complete.
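
The multi-channel and keyword tips above could look roughly like this. It is a sketch under assumptions: `matches_keywords` and `parse_many` are hypothetical names, the keyword check is made case-insensitive (a small deviation from the `any(kw in text ...)` form in the tip), and `parse_one` stands in for a call such as `parse_channel` from the example.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


def matches_keywords(text: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive check: True if any keyword occurs in the text."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


async def parse_many(
    channels: list[str],
    parse_one: Callable[[str], Awaitable[None]],
    delay: float = 2.0,
) -> None:
    """Run parse_one for each channel in turn, pausing between channels."""
    for index, channel in enumerate(channels):
        if index > 0:
            await asyncio.sleep(delay)  # pause only between channels, not before the first
        await parse_one(channel)


async def fake_parse(channel: str) -> None:
    """Stand-in for the real parser so the sketch runs without Telegram access."""
    print(f"parsing {channel}")


asyncio.run(parse_many(["channel_a", "channel_b"], fake_parse, delay=0.0))
```

Passing the parser in as a callable keeps the loop independent of Telethon, which makes it easy to try out with a stub first.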

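Alternative approaches: telethon.sync (a synchronous wrapper around the same client) and Pyrogram change the calling style or the library, but both still use the User API and require an authorized session, so neither enables parsing without authorization. For data visualization, extend the prompt to request activity graphs built with matplotlib.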
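
Before plotting activity, the exported rows need to be aggregated. The sketch below counts messages per day from CSV text in the export format (";" delimiter, ISO 8601 `date` column); `messages_per_day` and the sample rows are made up for illustration, and the plotting itself is left out.

```python
import csv
import io
from collections import Counter


def messages_per_day(csv_text: str) -> Counter[str]:
    """Count rows per calendar day using the first 10 characters of the ISO 8601 date."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return Counter(row["date"][:10] for row in reader)


# Invented sample rows in the same layout as the export.
sample = (
    "id;date;text\n"
    "1;2024-01-01T10:00:00+00:00;first\n"
    "2;2024-01-01T12:30:00+00:00;second\n"
    "3;2024-01-02T09:15:00+00:00;third\n"
)

counts = messages_per_day(sample)
print(sorted(counts.items()))  # [('2024-01-01', 2), ('2024-01-02', 1)]
```

When reading the real export file, open it with encoding="utf-8-sig" so the BOM does not end up in the first column name.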