
[Tool/Utility] Chat Log Image transcription



What is this?

This is a Python script. I admit I'm no coder, but with the help of our new A.I. overlords I cooked up something that is massively helpful for creating cleaner screenshots. Essentially, it uses Tesseract, an OCR engine that reads the text in images and transcribes it, through the pytesseract module, along with a few other minor modules to aid the process. It took a lot of fine-tuning and testing with images, but I believe it's now in a suitable state to share with the community.

 

TL;DR: it lets you grab the text from a screenshot as actual text, which you can paste as a text object in Photoshop or another editor of your choosing. This lets you correct typos and produce cleaner text with ease.

 

Do note: I crop my images before scanning them, so they mainly contain the chat.

If you encounter issues, either crop manually or use a PowerShell script to resize in batches (which is what I do, and I will release it if lots of people have issues).
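
For anyone who would rather stay in Python than use PowerShell, batch cropping could look roughly like the sketch below. This is not the PowerShell script mentioned above. It assumes the chat sits in the top-left corner of the screenshot, and the function names and the default fractions are made-up placeholders to adjust for your own layout. It relies on Pillow, which the main script already imports.

```python
import os


def chat_crop_box(width, height, width_fraction=0.45, height_fraction=0.35):
    """Return a (left, upper, right, lower) box covering the top-left corner.

    The default fractions are placeholder guesses, not values from the post.
    """
    return (0, 0, int(width * width_fraction), int(height * height_fraction))


def crop_folder(source_directory, cropped_directory):
    """Crop every PNG/JPEG in source_directory and save it to cropped_directory."""
    from PIL import Image  # imported here so the box helper works without Pillow

    os.makedirs(cropped_directory, exist_ok=True)
    for filename in os.listdir(source_directory):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            with Image.open(os.path.join(source_directory, filename)) as img:
                box = chat_crop_box(*img.size)  # img.size is (width, height)
                img.crop(box).save(os.path.join(cropped_directory, filename))
```

Calling crop_folder with your screenshot folder and a separate folder for the cropped copies leaves the originals untouched, so you can retry with different fractions.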

 

Requirements
Python
Tesseract (UB Mannheim build)
The pytesseract and Pillow Python packages (the script imports both)
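
A quick way to see whether the requirements are in place is a check like the one below. It is only a sketch: check_requirements is a made-up helper name, and the PATH check will report "missing" if Tesseract is installed but its folder is not on PATH, which is fine as long as you give the script the full path to the executable. If either Python package is missing, it can be installed with pip install pytesseract pillow.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which of the script's dependencies can be found on this machine."""
    return {
        # Python packages imported by the script ("PIL" is provided by Pillow).
        'pytesseract': importlib.util.find_spec('pytesseract') is not None,
        'Pillow': importlib.util.find_spec('PIL') is not None,
        # Only finds Tesseract if its folder is on PATH; otherwise the full
        # path configured in the script is what matters.
        'tesseract on PATH': shutil.which('tesseract') is not None,
    }


for name, found in check_requirements().items():
    print(f"{name}: {'found' if found else 'missing'}")
```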

 

 

Steps on how to use/install

Open Notepad, or ideally Notepad++.

Paste the code below.
Set the three paths at the top of the script:
Tesseract path: the Tesseract executable in the folder where you installed it.
Images directory: the folder containing your chat log images.
Note: Make sure to use /bf to capture the text against a black screen, otherwise this will not work.
Output directory: where you want the extracted text to be saved.

Once done, save the file in a location of your choosing and make sure it's saved as a Python file, for example "ScanImages.py".
Make sure your chat log images are in the folder to be scanned, then run ScanImages.py.
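
If the script fails straight away, the paths are the first thing to check. The helper below is a hypothetical addition (find_path_problems is not part of the original script) that assumes you entered a full path to the Tesseract executable, as described in the steps above.

```python
import os


def find_path_problems(tesseract_cmd, images_directory):
    """Return a list of human-readable problems with the configured paths."""
    problems = []
    # Assumes tesseract_cmd is a full path to the executable, not a bare name.
    if not os.path.isfile(tesseract_cmd):
        problems.append(f"Tesseract executable not found: {tesseract_cmd}")
    if not os.path.isdir(images_directory):
        problems.append(f"Images directory not found: {images_directory}")
    return problems
```

You could call it with the values you set for pytesseract.pytesseract.tesseract_cmd and images_directory and print whatever it returns before running the scan. The output directory is not checked because the script creates it when it is missing.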

 

 

 

You should now have a .txt file for each image containing the extracted text, which can simply be copied into your image editing software of choice for clean screenshot text.

 

 

 

import os
import re
from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'Your_Tesseract_Executable_Path'

images_directory = r'Your_Images_Directory_Path'
output_directory = r'Your_Output_Directory_Path'

# Create the output directory if it does not exist yet.
os.makedirs(output_directory, exist_ok=True)

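def correct_ocr_errors(text):
    # Regex fixes for common OCR misreads in chat logs (pattern -> replacement).
    corrections = {
        r'\[\d{2}:\d{2}:\d{2}\]\s*': '',    # strip [HH:MM:SS] timestamps
        r'\(\(\s*(.*?)\s*\}\)': r'((\1))',  # "(( text })" -> "((text))"
        r'\|\s*(\w)': r'I\1',               # "|" misread in place of a capital "I"
        r'\*\s*(\w)': r'* \1',              # exactly one space after "*"
    }

    for pattern, replacement in corrections.items():
        text = re.sub(pattern, replacement, text)

    # Close a "((<number> ..." block that is still unclosed at the end of the text.
    text = re.sub(r'\(\(\s*(\d+)\s*([^)]+)$', r'((\1 \2))', text)

    return text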
    
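def remove_empty_lines(text):
    # Drop lines that are empty or contain only whitespace.
    # Note: this helper is defined but not called by the loop below.
    lines = text.split('\n')
    non_empty_lines = [line for line in lines if line.strip()]
    return '\n'.join(non_empty_lines)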
    
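# Process every PNG/JPEG image in the images directory.
for image_filename in os.listdir(images_directory):
    # Compare in lower case so files such as "chat.PNG" are not skipped.
    if image_filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join(images_directory, image_filename)
        output_text_path = os.path.join(output_directory, os.path.splitext(image_filename)[0] + '.txt')

        with Image.open(image_path) as img:
            # Run Tesseract OCR on the image.
            text = pytesseract.image_to_string(img)

        corrected_text = correct_ocr_errors(text)

        # Write as UTF-8 so non-ASCII characters in the OCR text do not raise an encoding error.
        with open(output_text_path, 'w', encoding='utf-8') as file:
            file.write(corrected_text)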

print("Transcription and correction completed.")
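
To see what the correction rules actually do, here is a copy of the script's correct_ocr_errors function run on a few invented chat lines. The sample lines are made up for illustration and are not taken from real OCR output.

```python
import re


def correct_ocr_errors(text):
    # Copy of the correction rules from the script.
    corrections = {
        r'\[\d{2}:\d{2}:\d{2}\]\s*': '',
        r'\(\(\s*(.*?)\s*\}\)': r'((\1))',
        r'\|\s*(\w)': r'I\1',
        r'\*\s*(\w)': r'* \1',
    }
    for pattern, replacement in corrections.items():
        text = re.sub(pattern, replacement, text)
    text = re.sub(r'\(\(\s*(\d+)\s*([^)]+)$', r'((\1 \2))', text)
    return text


# Invented sample lines: a timestamp plus a misread "I", a misread closing
# bracket, and an action line with no space after the asterisk.
samples = [
    '[21:04:17] John Doe says: |t works',
    '(( out of character note })',
    '*waves',
]
for line in samples:
    print(repr(line), '->', repr(correct_ocr_errors(line)))
```

One thing to be aware of: the pipe rule also swallows any space after the pipe, so '| am' comes out as 'Iam'. If that is not what you want, a lookahead version such as re.sub(r'\|(?=\s*\w)', 'I', text) keeps the space.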

 

 

 

 

Example of result

 

As you can see below, the results are pretty accurate. I'm working on fixing the line spacing in the output. 
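
On the line spacing: the script already defines remove_empty_lines but never calls it. One possible fix, untested against real screenshots, is to pass the corrected text through it before writing the file, i.e. file.write(remove_empty_lines(corrected_text)). The sketch below shows the helper on an invented piece of OCR output.

```python
def remove_empty_lines(text):
    # Keep only lines that contain something other than whitespace.
    lines = text.split('\n')
    non_empty_lines = [line for line in lines if line.strip()]
    return '\n'.join(non_empty_lines)


# Invented sample with the kind of blank lines OCR output tends to contain.
ocr_output = 'John Doe says: Hello.\n\n   \nJane Doe says: Hi.\n'
print(remove_empty_lines(ocr_output))
```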

The top line won't decode properly because the text is cut off, hence some gibberish such as "wep" and "sp".

 

g9vQOyd.png

 

 

End result once the text is pasted in, the extra lines are removed and the colours are formatted correctly.

4gEBLVM.png

Edited by Aldaz108

FYI, the top section of the script is where you configure your file paths.

 

pytesseract.pytesseract.tesseract_cmd = r'Your_Tesseract_Executable_Path'

 

images_directory = r'Your_Images_Directory_Path'

output_directory = r'Your_Output_Directory_Path'

 

Keep the ' marks in.
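
The r in front of the opening ' matters as well. It makes the path a raw string, so the backslashes in a Windows path are kept as typed instead of being read as escape codes. The folder name below is hypothetical and was picked because it contains \t and \n.

```python
# Hypothetical folder name, chosen because it contains \t and \n.
raw_path = r'C:\tools\new_images'
plain_path = 'C:\tools\new_images'

print(raw_path)          # backslashes kept as typed
print(repr(plain_path))  # \t and \n were turned into a tab and a newline
```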
