[Tool/Utility] Chat Log Image transcription

Aldaz108 · January 23

What is this?

This is a Python script which, I admit (I'm no coder), I cooked up with the help of our new A.I. overlords, and it's massively helpful for creating cleaner screenshots. Essentially, it uses Tesseract, an OCR (optical character recognition) engine that reads the text in images and transcribes it, through the pytesseract module, along with a few other minor modules to help the process along. It took a lot of fine-tuning and testing with images, but I believe it's now in a suitable state to share with the community.

TL;DR: it grabs the text from a screenshot as plain text, so you can paste it as a text object in Photoshop or another editing program of your choosing. This lets you correct typos and produce cleaner text with ease.

Do note: I crop my images before scanning them, so they mostly contain just the chat.

If you encounter issues, either crop the images manually or use a PowerShell script to resize them in batches (which is what I do; I'll release mine if lots of people have issues).
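Since the PowerShell script hasn't been posted, here is a rough Python alternative using Pillow, which the main script already needs. It's a sketch under assumptions: the function names crop_box and prepare_folder, the default crop box (top-left 50% by 40%), the 2× upscale and the optional inversion are all hypothetical choices, so adjust the box to wherever the chat sits in your screenshots. Setting invert=True turns white-on-black text into black-on-white, which Tesseract's documentation recommends for 4.x and later; whether it helps with your screenshots is worth testing.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def crop_box(width, height, box=(0.0, 0.0, 0.5, 0.4)):
    """Convert a fractional (left, top, right, bottom) box into pixels for Image.crop.

    The default (top-left, 50% wide, 40% tall) is only a placeholder; change it
    to match where the chat sits in your screenshots.
    """
    left, top, right, bottom = box
    return (int(width * left), int(height * top), int(width * right), int(height * bottom))


def prepare_folder(src_dir, dst_dir, box=(0.0, 0.0, 0.5, 0.4), scale=2, invert=False):
    """Crop, upscale and greyscale every screenshot in src_dir, saving PNGs to dst_dir."""
    # Imported here so crop_box() can be used without Pillow installed.
    from PIL import Image, ImageOps

    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue
        with Image.open(os.path.join(src_dir, name)) as img:
            chat = img.crop(crop_box(img.width, img.height, box))
            # Enlarging small chat text tends to help Tesseract.
            chat = chat.resize((chat.width * scale, chat.height * scale), Image.LANCZOS)
            chat = chat.convert('L')
            if invert:
                # White text on black becomes black text on white.
                chat = ImageOps.invert(chat)
            chat.save(os.path.join(dst_dir, os.path.splitext(name)[0] + '.png'))
```

Usage, with hypothetical folders: `prepare_folder(r'C:\ChatLogs\Raw', r'C:\ChatLogs\Images')`, then point images_directory in the main script at the second folder.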

Requirements
Python
Tesseract OCR (UB Mannheim build)
The pytesseract and Pillow Python packages (pip install pytesseract pillow)
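If the script stops with an import error, a small check like this shows which of the Python packages is missing. It's a sketch; the name check_requirements is made up, and it only looks for the Python packages. The Tesseract program itself is found through the path you set in the script.

```python
import importlib.util
import sys


def check_requirements():
    """Map each Python package the script imports to whether it is installed."""
    return {
        'Pillow (PIL)': importlib.util.find_spec('PIL') is not None,
        'pytesseract': importlib.util.find_spec('pytesseract') is not None,
    }


if __name__ == '__main__':
    print('Python', sys.version.split()[0])
    for name, installed in check_requirements().items():
        print(f"{name}: {'OK' if installed else 'missing, run: pip install pytesseract pillow'}")
```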

Steps to install and use

Open Notepad or, ideally, Notepad++.

Paste in the code below.
For the Tesseract path, enter the full path to tesseract.exe in the folder where you installed Tesseract.
The images directory is the folder where you keep your chat log images.
Note: make sure to use /bf so the text is captured on a black background, otherwise this will not work.

The output directory is where you want the extracted text to be saved.

Once done, save the file wherever you like, making sure it's saved as a Python file, for example "ScanImages.py".
Make sure the chat log images are in the folder to be scanned, then run ScanImages.py (for example with python ScanImages.py from a command prompt in that folder).

You should now have one text file per image, whose contents can simply be copied into your image editing software for clean screenshot text.

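```python
import os
import re
from PIL import Image
import pytesseract

# Full path to the Tesseract executable (e.g. tesseract.exe). Keep the r'...' quotes.
pytesseract.pytesseract.tesseract_cmd = r'Your_Tesseract_Executable_Path'

# Folder holding the chat log screenshots, and folder for the .txt output.
images_directory = r'Your_Images_Directory_Path'
output_directory = r'Your_Output_Directory_Path'

# Create the output folder if it doesn't exist yet.
os.makedirs(output_directory, exist_ok=True)

def correct_ocr_errors(text):
    """Fix common OCR misreads in chat log text."""
    corrections = {
        # Strip [HH:MM:SS] timestamps.
        r'\[\d{2}:\d{2}:\d{2}\]\s*': '',
        # "(( text })" -> "((text))": closing brackets misread as "})".
        r'\(\(\s*(.*?)\s*\}\)': r'((\1))',
        # A "|" before a word is usually a misread capital I; keep the spacing.
        r'\|(?=\s*\w)': 'I',
        # "*action" -> "* action".
        r'\*\s*(\w)': r'* \1',
    }

    for pattern, replacement in corrections.items():
        text = re.sub(pattern, replacement, text)

    # Close "((<number> text" lines that lost their "))". MULTILINE makes $
    # match at the end of every line, not only at the end of the whole text.
    text = re.sub(r'\(\(\s*(\d+)\s*([^)\n]+)$', r'((\1 \2))', text, flags=re.MULTILINE)

    return text

def remove_empty_lines(text):
    """Drop blank lines from the OCR output."""
    lines = text.split('\n')
    non_empty_lines = [line for line in lines if line.strip()]
    return '\n'.join(non_empty_lines)

for image_filename in os.listdir(images_directory):
    # lower() so .PNG/.JPG files are picked up too.
    if image_filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join(images_directory, image_filename)
        output_text_path = os.path.join(output_directory, os.path.splitext(image_filename)[0] + '.txt')

        with Image.open(image_path) as img:
            text = pytesseract.image_to_string(img)

        corrected_text = remove_empty_lines(correct_ocr_errors(text))

        # UTF-8 avoids encoding errors on Windows for unusual characters.
        with open(output_text_path, 'w', encoding='utf-8') as file:
            file.write(corrected_text)

print("Transcription and correction completed.")
```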
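To see what the correction rules actually do, here they are in a standalone function run on a few made-up chat lines (the sample text is invented for illustration). This version uses a lookahead for the "|" rule, so "| think" becomes "I think" rather than "Ithink", and re.MULTILINE for the "((" rule, so every unclosed line gets its "))", not only the last line of the file.

```python
import re


def correct_ocr_errors(text):
    """Apply simple regex fixes for common OCR misreads in chat logs."""
    corrections = [
        (r'\[\d{2}:\d{2}:\d{2}\]\s*', ''),      # strip [HH:MM:SS] timestamps
        (r'\(\(\s*(.*?)\s*\}\)', r'((\1))'),    # "(( text })" -> "((text))"
        (r'\|(?=\s*\w)', 'I'),                  # "|" before a word -> "I"
        (r'\*\s*(\w)', r'* \1'),                # "*action" -> "* action"
    ]
    for pattern, replacement in corrections:
        text = re.sub(pattern, replacement, text)
    # Close "((<number> text" lines that lost their "))"; MULTILINE makes $
    # match at the end of each line instead of only at the end of the text.
    return re.sub(r'\(\(\s*(\d+)\s*([^)\n]+)$', r'((\1 \2))', text, flags=re.MULTILINE)


sample = "\n".join([
    "[12:34:56] John Smith says: | think so",
    "*John waves",
    "(( brb })",
    "((5 Jane: on my way",
    "((6 Bob: same",
])
print(correct_ocr_errors(sample))
# John Smith says: I think so
# * John waves
# ((brb))
# ((5 Jane: on my way))
# ((6 Bob: same))
```

Without re.MULTILINE, `$` only matches at the very end of the text, and since `[^)]` also matches newlines, the first "((5" line would swallow everything after it and only one "))" would be added at the end.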

Example of result

As you can see below, the results are pretty accurate. I'm working on fixing the line spacing in the output.

The top line won't decode because the text is cut off, hence some gibberish, e.g. "wep" and "sp".

End result once the text is pasted in, the blank lines removed, and the colours formatted correctly.
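For the gibberish from a cut-off top line, one possible approach, assuming the garbage lands on its own line or lines, is to drop blank lines and then skip leading lines with too few letters. The name drop_leading_noise and the 4-letter threshold are hypothetical; tune them on your own logs. Only lines at the top are checked, so short but genuine replies further down (like "ok") survive.

```python
def drop_leading_noise(text, min_letters=4):
    """Remove blank lines, then drop top lines with fewer than min_letters letters."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    start = 0
    # Skip leading lines that look like fragments of cut-off text.
    while start < len(lines) and sum(ch.isalpha() for ch in lines[start]) < min_letters:
        start += 1
    return '\n'.join(lines[start:])


print(drop_leading_noise("wep\nsp\n\nJohn Smith says: hello\nok\n* John waves"))
# John Smith says: hello
# ok
# * John waves
```

In the main script it could be applied to the corrected text just before the file is written.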

Edited January 25 by Aldaz108

Aldaz108 · January 24

FYI, the lines at the top of the script are where you configure your file paths:

```python
pytesseract.pytesseract.tesseract_cmd = r'Your_Tesseract_Executable_Path'
images_directory = r'Your_Images_Directory_Path'
output_directory = r'Your_Output_Directory_Path'
```

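Keep the ' marks, and the r in front of them, in place. The r makes it a raw string, so Python doesn't treat the backslashes in Windows paths as escape sequences.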
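Since those three lines are the only part you edit, a quick check can catch a wrong path or a dropped r before a run. This is a minimal sketch; check_config and the example paths (a typical Tesseract install location and made-up C:\ChatLogs folders) are hypothetical. A tab or newline inside a path is treated as a sign that the r was dropped, because without it Python turns "\t" in "\tesseract.exe" into a tab character.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def check_config(tesseract_cmd, images_directory, output_directory):
    """Return a list of problems with the configured paths; empty means OK."""
    problems = []
    for label, path in (('Tesseract path', tesseract_cmd),
                        ('Images directory', images_directory),
                        ('Output directory', output_directory)):
        # A tab or newline usually means a missing r prefix turned "\t" or "\n"
        # in the path into an escape character.
        if '\t' in path or '\n' in path:
            problems.append(f"{label} contains an escape character, use r'...': {path!r}")
    if not os.path.isfile(tesseract_cmd):
        problems.append(f"Tesseract executable not found: {tesseract_cmd}")
    if not os.path.isdir(images_directory):
        problems.append(f"Images directory not found: {images_directory}")
    elif not any(name.lower().endswith(IMAGE_EXTENSIONS) for name in os.listdir(images_directory)):
        problems.append(f"No .png/.jpg/.jpeg files in: {images_directory}")
    if os.path.exists(output_directory) and not os.path.isdir(output_directory):
        problems.append(f"Output path exists but is not a folder: {output_directory}")
    return problems


if __name__ == '__main__':
    # Example values only: paste in the same three paths you use in the script.
    issues = check_config(
        r'C:\Program Files\Tesseract-OCR\tesseract.exe',
        r'C:\ChatLogs\Images',
        r'C:\ChatLogs\Text',
    )
    print('\n'.join(issues) if issues else 'Paths look OK.')
```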

Edited January 24 by Aldaz108
