字符图片合并

在ocr识别当中,单字识别效果一般不够准确,另外,对于一个ocr功能而言,最耗时的莫过于每次执行ocr识别了,假设现今有几百张字符图片,为了快速生成{图片名:字符}的映射关系,需要耗时更少的方案

这个时候图片合并就需要安排上了,将几百次的字符图片识别转换为识别若干次单行或多行文字图片,减少ocr识别次数,就能有效减少耗时了,另外,经过合并后的文字图片在保留合理的字符间隔后,其识别准度也会有所提升,可谓一箭双雕


合并成单行文字图片

实现思路上还是比较简单的,大致拆分成以下步骤:

    1. 设定单行文字图片里的文字数量
    1. 遍历文件夹下的字符图片,存储即将进行图片合并的图片路径列表
    1. 计算每个字符图片的尺寸大小
    1. 根据单行文字数量和尺寸大小来计算出最终合并后的图片尺寸,并在内存里创建一张该尺寸的纯色背景图

    注意,如果字符图片的尺寸过小,为了让各个字符之间有足够的间隔,最好增加一个参数来拓展字符图片的边界

    1. 计算每张字符图片的混合贴图的位置信息,将各个字符图片平铺到这个背景图片上
    1. 保存图片到磁盘上
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
# 合并成单行文字图片
def combine_image_with_lite(paths, merge_times, is_debug = False):
    count = len(paths)
    src_image = Image.open(paths[0])
    image_width, image_height = src_image.size 
    x = image_width + extra_bound_as_single_line
    y = image_height + extra_bound_as_single_line

    # 计算合并之后的大小
    full_x = x * count
    full_y = y
    full_size = (full_x, full_y)
    full_image = Image.new('RGBA', full_size, (255,255,255))
    full_image.format = "PNG"
    # print("full rect : {%d,%d,%d,%d}" % (0, 0, full_x, full_y))

    # 将图片块平铺到这个大图片上
    start_x = 0
    start_y = 0
    end_x = x
    end_y = y
    index = 1
    
    draw = ImageDraw.Draw(full_image)
    for path in paths:
        if is_debug == True:
            # 绘制每个填充区域的边界
            print("{%d,%d,%d,%d}" % (start_x, start_y, end_x, end_y))
            draw.rectangle((start_x+1, start_y+1, end_x-1, end_y-1), outline = get_random_color())

            # 居中绘制填充的图片索引
            index_str = "%d"%(index)
            font_size_x, font_size_y = draw.textsize(index_str)
            draw.text((start_x + (x-font_size_x)/2, start_y + (y-font_size_y)/2), index_str, fill='black')
        else:
            # 由于字符图片边界过小,如果直接平铺的话,就会显得太挤,OCR识别效果很差
            # 为此拓展字符图片的填充区域,并将图片在此中间叠加上去
            src_image = Image.open(path)
            start_x0 = start_x + (x-image_width)//2
            start_y0 = start_y + (y-image_height)//2
            end_x0 = start_x0 + image_width
            end_y0 = start_y0 + image_height
            full_image.paste(src_image, (start_x0, start_y0, end_x0, end_y0), src_image)

        index = index + 1
        if end_x >= full_x:
            # 从左往右填充,遇到边界之后切换到下一行
            start_x = 0
            end_x = start_x + x

            start_y = start_y + y
            end_y = start_y + y
        else:
            # 继续从左往右平铺
            start_x = start_x + x
            end_x = end_x + x

    current_dir = os.getcwd()
    output_path = '%s/../single_line_%d.png' % (current_dir, merge_times)
    full_image.save(output_path)

合并成多行文字图片

实现思路上同上述差不多,但是需要注意计算好图片的行列数,尽可能减少空白区域,步骤如下:

    1. 遍历文件夹下的所有字符图片得到待合并的图片路径列表
    1. 计算每个字符图片的尺寸大小
    1. 根据合并的图片数量,开根号后计算行列数
    1. 根据行列数来计算出最终合并后的图片尺寸,并在内存里创建一张该尺寸的纯色背景图

    注意,如果字符图片的尺寸过小,为了让各个字符之间有足够的间隔,最好增加一个参数来拓展字符图片的边界

    1. 计算每张字符图片的混合贴图的位置信息,将各个字符图片平铺到这个背景图片上,如果当前行已经平铺满了,就切换到下一行继续平铺

    平铺顺序:从左到右 从上到下

    1. 保存图片到磁盘上
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# 合并成多行文字图片
def combine_image_with_multi(paths, merge_times, is_debug = False):
    count = len(paths)
    src_image = Image.open(paths[0])
    image_width, image_height = src_image.size 
    x = image_width + extra_bound_as_multi_line
    y = image_height + extra_bound_as_multi_line

    # 根据待合并的图片总数来计算平铺的行列数
    sqrt2 = math.sqrt(count)
    row_num = round(sqrt2)
    column_num = row_num
    if sqrt2 > column_num:
        column_num = column_num + 1 
    print("总数:%d 开根号:%f 行:%d 列:%d" % (count, sqrt2, row_num, column_num))

    # 计算合并之后的图片尺寸
    full_x = x * column_num
    full_y = y * row_num
    full_size = (full_x, full_y)
    full_image = Image.new('RGBA', full_size, (255,255,255))
    print("full rect : {%d,%d,%d,%d}" % (0, 0, full_x, full_y))

    # 将图片块平铺到这个大图片上
    start_x = 0
    start_y = 0
    end_x = x
    end_y = y
    index = 1
    
    draw = ImageDraw.Draw(full_image)
    for path in paths:
        if is_debug == True:
            # 绘制每个填充区域的边界
            print("{%d,%d,%d,%d}" % (start_x, start_y, end_x, end_y))
            draw.rectangle((start_x+1, start_y+1, end_x-1, end_y-1), outline = get_random_color())

            # 居中绘制填充的图片索引
            index_str = "%d"%(index)
            font_size_x, font_size_y = draw.textsize(index_str)
            draw.text((start_x + (x-font_size_x)/2, start_y + (y-font_size_y)/2), index_str, fill='black')
        else:
            src_image = Image.open(path)
            start_x0 = start_x + (x-image_width)//2
            start_y0 = start_y + (y-image_height)//2
            end_x0 = start_x0 + image_width
            end_y0 = start_y0 + image_height
            full_image.paste(src_image, (start_x0, start_y0, end_x0, end_y0), src_image)

        index = index + 1
        if end_x >= full_x:
            # 从左往右填充,遇到边界之后切换到下一行
            start_x = 0
            end_x = start_x + x

            start_y = start_y + y
            end_y = start_y + y
        else:
            # 继续从左往右平铺
            start_x = start_x + x
            end_x = end_x + x

    current_dir = os.getcwd()
    output_path = '%s/../multi_line_%d.png' % (current_dir, merge_times)
    full_image.save(output_path)

附件:char_image_merge_tool.zip

0%