Excel data extraction technique: a universal formula for extracting numbers from mixed text

2022-06-24 06:10:00 User 8639654

In the last article, Xiaohua walked through three data-extraction scenarios in which a specific formula is built by observing the features of the mixed text. Afterwards, some readers quietly asked: "Miss Xiaohua, I'm not clever enough to spot the features of the data, and I'm too lazy to write a different formula for each scenario. Is there one all-powerful universal formula that can brute-force any kind of mixed text?"

The answer, of course, is yes! However, two situations still have to be distinguished. One is extracting a numeric value, which has a magnitude, may be positive or negative, and may contain a decimal point. The other is extracting a numeric string, such as a telephone number or ID number, where the digits have no decimal point or minus sign and no magnitude.

How is the universal formula written for each of these two scenarios, and how does it work? Let Xiaohua explain.

4. A universal formula for extracting numerical values

Scenario features: apart from the target value, the text contains no other digits; otherwise they easily interfere with the result.

Universal formula:

{=-LOOKUP(9^9,-MIDB(A2,MIN(FINDB(LEFT(ROW($1:$11)-2,1),A2&-1/19)),ROW($1:$100)))}

The formula breaks down as follows:

①LEFT(ROW($1:$11)-2,1)

ROW($1:$11) is easy to understand: it returns the row numbers of rows 1 through 11, i.e. the 11-element set A {1,2,3,…,11}. Subtracting 2 turns it into set B {-1,0,1,2,…,9}. LEFT then extracts the first character of each element of B, producing character set C {"-","0","1","2",…,"9"}: the minus sign plus the ten digits 0-9. A number written in the text always begins with one of these 11 characters.

In short, this part constructs every character a number can start with. These characters let us lock onto the position of the number and then extract it.
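Step ① can be mirrored outside Excel to see why the first element becomes a minus sign. This is only a small Python sketch of the same logic, not something Excel runs; the variable names are made up for illustration.

```python
# Emulates LEFT(ROW($1:$11)-2, 1); all names here are illustrative.
set_a = list(range(1, 12))            # ROW($1:$11) -> 1, 2, ..., 11
set_b = [n - 2 for n in set_a]        # -1, 0, 1, ..., 9
set_c = [str(n)[0] for n in set_b]    # LEFT(..., 1): first character of each

print(set_c)
# prints ['-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```

The only element with more than one character is -1, so taking its first character is what yields "-".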

②FINDB(①,A2&-1/19)

FINDB finds the position of a character in the target text. Unlike FIND, it returns a byte position: when a double-byte (DBCS) language such as Chinese is the default language, each Chinese character or full-width symbol counts as 2 bytes (otherwise FINDB counts every character as 1, just like FIND). That is why, in the mixed text of cell A2, the minus sign "-" is found at position 5 rather than 3.

The formula searches A2&-1/19 instead of A2 alone so that every character of set C {"-","0","1","2",…,"9"} is guaranteed to appear somewhere in the searched text, which means FINDB never returns an error value. (-1/19 evaluates to -0.0526315789473684, which contains the minus sign and all ten digits.) Part ② returns the positions at which the characters of set C first appear in A2&-1/19, i.e. position set D {5,13,10,6,…}.

③MIN(②)

MIN(②) takes the minimum of position set D {5,13,10,6,…} from ②. This is the position in A2's mixed text where a minus sign or digit first appears, i.e. the starting position of the value to extract. It is also why no unrelated digits or minus signs may appear to the left of the target number.
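Steps ② and ③ can be sketched in Python as follows. The cell content "支出-299.19" is a hypothetical sample chosen to be consistent with the positions quoted above, the helper findb is an illustrative stand-in for Excel's FINDB, and GBK encoding is assumed to imitate a Chinese DBCS locale.

```python
def findb(needle: str, text: str, encoding: str = "gbk") -> int:
    """Return the 1-based byte position of needle in text (FINDB-like)."""
    index = text.find(needle)
    if index < 0:
        raise ValueError(f"{needle!r} not found")  # Excel would return #VALUE!
    # Bytes before the match, plus 1 for a 1-based position.
    return len(text[:index].encode(encoding)) + 1

a2 = "支出-299.19"                     # hypothetical sample cell (assumption)
padded = a2 + "-0.0526315789473684"    # the text of A2&-1/19
set_c = ["-"] + [str(d) for d in range(10)]

set_d = [findb(ch, padded) for ch in set_c]   # step 2: position set D
start = min(set_d)                            # step 3: MIN(...)

print(set_d[:4], start)
# prints [5, 13, 10, 6] 5
```

Without the appended -1/19 text, the lookups for characters that do not occur in the cell (here "0", "3", "4" and others) would fail, which is the error the padding is there to prevent.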

④-MIDB(A2,③,ROW($1:$100))

MIDB is used here instead of MID to match FINDB: the text is sliced by byte position. ROW($1:$100) returns the sequence 1 to 100, which serves as MIDB's third argument, the number of bytes to extract, so substrings of 1 to 100 bytes are extracted separately.

So, starting at the position determined in ③, MIDB cuts 100 substrings of 1 to 100 bytes out of the text in cell A2, giving the variable-length string array E {"-","-2","-29","-299",…,"-299.19"}. The minus sign in front of MIDB negates each of these strings. Strings that are not valid numbers produce the error #VALUE!, and the rest are converted to numbers, so array E becomes a new array F of numbers and error values: {#VALUE!;2;29;299;299;299.1;299.19;…;299.19}
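The construction of arrays E and F can be imitated with a short Python sketch. The sample text "支出-299.19" is hypothetical, midb and negate are illustrative helpers rather than Excel functions, and None stands in for #VALUE!. Python's float() does not accept exactly the same strings as Excel's text-to-number coercion, so treat this as an approximation.

```python
def midb(text: str, start: int, num_bytes: int, encoding: str = "gbk") -> str:
    """Slice text by 1-based byte position, roughly like MIDB in a DBCS locale."""
    data = text.encode(encoding)
    chunk = data[start - 1:start - 1 + num_bytes]
    # Assumption: a half-cut double-byte character is simply dropped here.
    return chunk.decode(encoding, errors="ignore")

def negate(text: str):
    """Imitate unary minus on a text value; None stands in for #VALUE!."""
    try:
        return -float(text)
    except ValueError:
        return None

a2 = "支出-299.19"   # hypothetical sample cell (assumption)
start = 5            # result of step 3 for this sample

array_e = [midb(a2, start, n) for n in range(1, 101)]   # ROW($1:$100)
array_f = [negate(s) for s in array_e]

print(array_e[:7])
print(array_f[:7])
```

Once the slice length reaches the end of the text, every further substring is the full "-299.19", which is why the tail of array F repeats 299.19.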

⑤-LOOKUP(9^9,④)

A LOOKUP query has three features:

1. The lookup range is assumed to be sorted in ascending order, i.e. the later a value is, the larger it is.

2. The return value is the largest value that is less than or equal to the lookup value.

3. Error values in the lookup range are ignored.

So we give LOOKUP a very large lookup value, 9^9 (387,420,489). Because of feature 1, LOOKUP treats the last non-error value in the range as the largest one, and that is the value it returns. Thanks to these features, LOOKUP neatly skips the error values and picks up the last valid value! Since ④ negated the text, the minus sign in front of LOOKUP then restores the original sign of the extracted value.
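The effect of step ⑤ on array F can be sketched in Python. This is a deliberately simplified model that only covers the case used here, a lookup value larger than every number in the array; it does not reproduce LOOKUP's general search behavior. The function name is made up and None stands in for #VALUE!.

```python
def lookup_last_number(lookup_value, values):
    """Simplified LOOKUP: skip errors (None) and return the last number.

    Only valid when lookup_value is at least as large as every number present.
    """
    numbers = [v for v in values if v is not None]
    if not numbers or max(numbers) > lookup_value:
        raise ValueError("outside the case this sketch models")
    return numbers[-1]

# Array F from the walkthrough: one error, then progressively longer numbers.
array_f = [None, 2.0, 29.0, 299.0, 299.0, 299.1, 299.19] + [299.19] * 93

result = -lookup_last_number(9 ** 9, array_f)   # outer minus restores the sign
print(result)
# prints -299.19
```

The sketch also makes the limit visible: a value whose absolute size exceeds 9^9 falls outside what this simplified model (and the chosen lookup value) is meant to handle.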

5. A universal formula for extracting numeric strings

Usage: extract every digit in the target cell in order and join them together.

Universal formula:

{=SUM(MID(0&A2,LARGE(ISNUMBER(--MID(A2,ROW($1:$100),1))*ROW($1:$100),ROW($1:$100))+1,1)*10^ROW($1:$100)/10)}

The formula breaks down briefly as follows:

① ISNUMBER(--MID(A2,ROW($1:$100),1))*ROW($1:$100)

MID(A2,ROW($1:$100),1) extracts the characters one at a time. The double minus sign tries to convert each character to a number, which separates digits from other characters, and ISNUMBER then tests each result and returns an array of logical values. Finally, multiplying by ROW($1:$100) makes each digit return its position in A2's mixed text, while every other character returns 0.

② LARGE(①,ROW($1:$100))

LARGE re-sorts the position values from ① in descending order. The position of a digit in the text is always greater than 0, and the later the digit appears, the larger its position value; every other character has the value 0. The point of this step is to push all the 0 values to the end and, at the same time, reverse the order of the digit positions.

③ MID(0&A2,②+1,1)

MID takes the position values from ②, adds 1, and extracts the characters one by one from 0&A2. Because non-digit positions have the value 0, every non-digit returns the leading 0 of 0&A2, while the digits themselves are unaffected. Because the digit positions in ② are reversed, the digits extracted here are also in reverse order.

④ SUM(③*10^ROW($1:$100)/10)

The first three steps produce an ordered array containing all the digits in cell A2, in reverse order, followed by a run of 0s that stand for the non-digit positions. To finish the extraction, the digits still have to be put back in forward order, the 0 placeholders removed, and everything merged. All of this is done by *10^ROW($1:$100)/10, which multiplies the k-th element by 10^(k-1): the first element (the last digit in the text) lands in the ones place, and each following digit moves one place further to the left. The placeholder 0s end up as leading zeros in front of the significant digits and simply disappear. The resulting multi-digit number is the result of the digit extraction.
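The four steps can be traced in Python on a small input. The sample text "订单A12号B345" and the function name are hypothetical, only ASCII digits are treated as digits, and Python's exact integers are used where Excel would use floating-point arithmetic, so this is a sketch of the logic rather than of Excel's numeric behavior.

```python
def extract_digits_as_number(text: str, width: int = 100) -> int:
    """Trace the SUM/MID/LARGE formula step by step (illustrative only)."""
    # Step 1: position of each digit, 0 for anything else (and past the end).
    positions = []
    for i in range(1, width + 1):
        ch = text[i - 1] if i <= len(text) else ""
        positions.append(i if ch != "" and ch in "0123456789" else 0)
    # Step 2: LARGE(..., ROW($1:$100)) -> positions in descending order.
    ranked = sorted(positions, reverse=True)
    # Step 3: MID(0&A2, p+1, 1); 1-based p+1 is 0-based index p, so p=0 gives "0".
    padded = "0" + text
    digits = [padded[p] for p in ranked]
    # Step 4: k-th element times 10^(k-1), then sum.
    return sum(int(d) * 10 ** (k - 1) for k, d in enumerate(digits, start=1))

print(extract_digits_as_number("订单A12号B345"))
# prints 12345
```

Because the result is a number, a digit string with leading zeros loses them: in this sketch "A007" yields 7.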

Actually, for extracting numeric strings, Excel 2019 and later versions offer a very simple, no-brainer solution: just join the digits directly with CONCAT.

The universal formula for Excel 2019 and later is as follows:

{=CONCAT(IFERROR(--MID($A2,ROW($1:$100),1),""))}
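For comparison, the CONCAT version amounts to keeping each character that converts to a number and joining the results as text. A minimal Python sketch of that idea follows; the function name and sample text are hypothetical and only ASCII digits are considered.

```python
def extract_digits_as_text(text: str, width: int = 100) -> str:
    """Keep the digits among the first `width` characters and join them."""
    kept = []
    for ch in text[:width]:          # MID($A2, ROW($1:$100), 1)
        if ch in "0123456789":       # --MID(...) succeeds only for a digit
            kept.append(ch)
        # otherwise IFERROR(..., "") contributes an empty string
    return "".join(kept)             # CONCAT(...)

print(extract_digits_as_text("订单A12号B345"))
# prints 12345
```

Unlike the SUM-based approach, the result here is text, so in this sketch "A007" comes out as "007" with its leading zeros intact.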

Copyright notice
This article was written by [User 8639654]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/07/20210726162029097i.html