https://www.codeproject.com/articles/7056/code-to-extract-plain-text-from-a-pdf-file
EDIT: some streams need a bigger buffer:
size_t outsize = (streamend - streamstart)*105;
Hi TimoVJL,

Great method. A Go program processes the file the same way. Here is an example of a table I want to extract:

This is the IMUL table from the Intel manual:
Opcode            Instruction             Op/En  64-Bit Mode  Compat/Leg Mode  Description
F6 /5             IMUL r/m8*              M      Valid        Valid            AX ← AL * r/m byte.
F7 /5             IMUL r/m16              M      Valid        Valid            DX:AX ← AX * r/m word.
F7 /5             IMUL r/m32              M      Valid        Valid            EDX:EAX ← EAX * r/m32.
REX.W + F7 /5     IMUL r/m64              M      Valid        N.E.             RDX:RAX ← RAX * r/m64.
0F AF /r          IMUL r16, r/m16         RM     Valid        Valid            word register ← word register * r/m16.
0F AF /r          IMUL r32, r/m32         RM     Valid        Valid            doubleword register ← doubleword register * r/m32.
REX.W + 0F AF /r  IMUL r64, r/m64         RM     Valid        N.E.             Quadword register ← Quadword register * r/m64.
6B /r ib          IMUL r16, r/m16, imm8   RMI    Valid        Valid            word register ← r/m16 * sign-extended immediate byte.
6B /r ib          IMUL r32, r/m32, imm8   RMI    Valid        Valid            doubleword register ← r/m32 * sign-extended immediate byte.
REX.W + 6B /r ib  IMUL r64, r/m64, imm8   RMI    Valid        N.E.             Quadword register ← r/m64 * sign-extended immediate byte.
69 /r iw          IMUL r16, r/m16, imm16  RMI    Valid        Valid            word register ← r/m16 * immediate word.
69 /r id          IMUL r32, r/m32, imm32  RMI    Valid        Valid            doubleword register ← r/m32 * immediate doubleword.
REX.W + 69 /r id  IMUL r64, r/m64, imm32  RMI    Valid        N.E.             Quadword register ← r/m64 * immediate doubleword.
In the extracted text file the columns run together, cells wrap onto extra lines, and the ← and * symbols are lost. The formatting is broken, but as a method it should be the right way. Thanks!
I would bet on moving the information from the XED text database into an SQLite database and manipulating it from there.
The records are already parsed and delimited by {}.
The record fields are ICLASS, UNAME, VERSION, CATEGORY, ... etc.
Building the SQLite database should be easy.
Just an idea. 
Hi AW,
The format of the records put me off. I had studied a number of similar databases before, and it turned out that, despite their almost perfect form, all the operand information was lost. The operands are encoded in a representation suited to an instruction decoder, so the data itself tells us nothing about them, and that knowledge lives in the C++ source of the parser instead. I recognized the same format in XED right away, so I lost all desire to dig through the source code for operand descriptors.
CMPXCHG8B{
ICLASS : CMPXCHG8B
CPL : 3
CATEGORY : SEMAPHORE
EXTENSION : BASE
ISA_SET : PENTIUMREAL
ATTRIBUTES : LOCKABLE
FLAGS : MUST [ zf-mod ]
PATTERN : 0x0F 0xC7 MOD[mm] MOD!=3 REG[0b001] RM[nnn] not64 IMMUNE66() MODRM() nolock_prefix
OPERANDS : MEM0:rcw:q REG0=XED_REG_EDX:rcw:SUPP REG1=XED_REG_EAX:rcw:SUPP REG2=XED_REG_ECX:r:SUPP REG3=XED_REG_EBX:r:SUPP
PATTERN : 0x0F 0xC7 MOD[mm] MOD!=3 REG[0b001] RM[nnn] mode64 norexw_prefix IMMUNE66() MODRM() nolock_prefix
OPERANDS : MEM0:rcw:q REG0=XED_REG_EDX:rcw:SUPP REG1=XED_REG_EAX:rcw:SUPP REG2=XED_REG_ECX:r:SUPP REG3=XED_REG_EBX:r:SUPP
}

Maybe I'm wrong, though: the operand descriptor here is MEM0:rcw:q, so for CMPXCHG8B m64, the m64 operand is MEM0:rcw:q?! I will check the rest of the instructions and see whether the information is lost there. Thank you!
It works, kind of, but it's light-years away from what AbleWord can extract from a PDF.
All instructions can be found in the PDF file; the question is how to copy the tables correctly?!
Try AbleWord.
Hi jj2007,
I agree that well-presented text from a PDF is easier to process for the task at hand. But in my experience the extracted text always turns out messy, probably because the PDF format describes where each piece of text is placed on the page rather than storing it in reading order. I tried the program and it gave me an error...

I would rather not tie myself to one particular PDF-to-text conversion method, because each one produces some new text structure of its own.

Thank you!