moving thread from pb forums to here because of their server problems

Started by bobl, August 27, 2015, 11:12:39 PM

jj2007

New version attached, with over a dozen Replace() statements. Open MS Excel and drag *.txt over the exe

Occasionally, you may see #NAME? in Excel. This is because Excel interprets a hyphen or plus sign at the beginning of a tab-delimited field as the start of a formula.
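
For anyone who wants to guard against the leading hyphen or plus sign problem before the file reaches Excel, here is a rough Python sketch. It is not what the attached exe does. The names guard_field and guard_line are made up for illustration, and the apostrophe prefix is only an assumption: how Excel treats it depends on how the file is opened, so try it on your own version before relying on it.

```python
def is_number(text):
    """Return True if text parses as a plain number such as -12.5 or +3."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guard_field(field, prefix="'"):
    """Prefix a field that starts with - or + but is not a number.

    The prefix is an assumption; swap in whatever your Excel import accepts.
    """
    stripped = field.lstrip()
    if stripped[:1] in ("-", "+") and not is_number(stripped):
        return prefix + field
    return field


def guard_line(line, prefix="'"):
    """Apply guard_field to every tab-delimited field on one line."""
    return "\t".join(guard_field(f, prefix) for f in line.split("\t"))
```

Genuine negative numbers are left alone, so only text fields such as "- Revenue" get the prefix.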

hutch--

I gather that in the longer term you want to batch process a large number of files which may have at least slightly different notation, so I wonder if it's worth doing a sequence of searches with INSTR on each file to find out whether it has a known header or footer?

The Line Input code is an old timer that performs OK, but there is probably a faster way to do it. I envisaged something like a linear word search with INSTR to find the lead and trailing strings for each page, then grabbing each page with MID$. Alternatively, if it's only particular pages you require, with page numbers you can scan the text for the page notation, or if you need multiple pages, create an array of page offsets so you can index your way through them.
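
The page offset idea can be sketched roughly like this in Python, where str.find stands in for INSTR and slicing stands in for MID$. The marker "Page " and the function names are assumptions for illustration only; real reports will use whatever page notation each company prints.

```python
def page_offsets(text, marker):
    """Collect the start offset of every occurrence of marker in text."""
    offsets = []
    pos = text.find(marker)
    while pos != -1:
        offsets.append(pos)
        # Resume the search just past the marker that was found.
        pos = text.find(marker, pos + len(marker))
    return offsets


def get_page(text, offsets, index):
    """Return the text from one page marker up to the next (or end of text)."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(text)
    return text[start:end]


# Hypothetical sample: three pages, each introduced by "Page N".
sample = "Page 1\nalpha\nPage 2\nbeta\nPage 3\ngamma\n"
offsets = page_offsets(sample, "Page ")
second_page = get_page(sample, offsets, 1)
```

The whole file is read into one string and scanned once, after which any page can be pulled out by index without rereading the file line by line.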

bobl

JJ
Thanks for your extra work. It's very much appreciated.
Hutch.
That sounds like very good advice and thank you for it.
I've only got the reports for a handful of companies at the moment, but even these few confirm that I'm up against quite a bit of non-uniformity.

hutch--

Ok, I guess the trick is to make a lookup list of easily identifiable keywords or phrases that can identify a particular file layout from a given company. Now if you have multiple similar phrases you could stack the order to try the longer ones first then the shorter ones after it.

1. Annual Report and Accounts
2. Annual Report
etc ....
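
A minimal Python sketch of that lookup list, trying the longer phrases first. The layout names (layout_a, layout_b) and the function identify_layout are hypothetical placeholders, not anything taken from the actual reports.

```python
# Hypothetical lookup list: key phrase -> layout identifier.
LAYOUTS = {
    "Annual Report": "layout_b",
    "Annual Report and Accounts": "layout_a",
}


def identify_layout(text, layouts=LAYOUTS):
    """Return the layout for the longest key phrase found in text, or None."""
    # Sort longest first so "Annual Report and Accounts" is tried
    # before its shorter prefix "Annual Report".
    for phrase in sorted(layouts, key=len, reverse=True):
        if phrase in text:
            return layouts[phrase]
    return None
```

Without the longest-first ordering, a file headed "Annual Report and Accounts" would also match the shorter "Annual Report" and could be given the wrong layout.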

bobl

>you could stack the order to try the longer ones first then the shorter ones after it
That's a very good point.
Yes I'll do that and once again thanks for the advice.