Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
163 views
in Technique[技术] by (71.8m points)

excel - How to extract dates from string?

I am trying to extract dates from string by excel formula only. Below is my data sample

Only contains one date 01/05/2021 to this example
Project start date is 01/01/2021 and end date is 31/01/2021
There may multiple date like 1st 01/01/2021 2nd 01/06/2021 and 3rd 31/12/2021

enter image description here

My computer local date format is dd/mm/yyyy. I was trying to accomplish it by FILTERXML() formula. I have tried below formula, also tried few other methods but fails.

=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[translate(.,'dd/mm/yyyy','')!=.]"))

My expected output looks like-

enter image description here

question from:https://stackoverflow.com/questions/65952087/how-to-extract-dates-from-string

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you want to go purely xpath then you could try to fully validate your pattern dd/mm/yyyy in a few steps:1

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[substring(., 3, 1)= '/'][substring(., 6, 1)= '/'][string-length(translate(., '/' , '')) = 8][translate(., '/' , '')*0=0]"),"dd/mm/e"))
  • "<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>" - Create a valid XML-construct.
  • //s - Select s-nodes where:
    • [substring(., 3, 1)= '/'] - There is a forward slash at the 3rd index;
    • [substring(., 6, 1)= '/'] - There is a forward slash at the 6th index;
    • [string-length(translate(., '/' , '')) = 8] - The remainder of the node when we replace the forward slashes is of length eight.
    • [translate(., '/' , '')*0=0] - The remainder of the node when we replace the forward slashes is numeric.

Needless to say that if your string does not hold any other forward slashes but those in the dates you can simplify the above significantly1:

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[contains(., '/')]"),"dd/mm/e"))

enter image description here

Notice that if "dd/mm/yyyy" is recognized by Excel as dates, the returned array of simply using //s would return the numeric equivalent of these dates. If no other numeric values exist in your string you could benefit from that using Microsoft365 functionality1:

=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),TRANSPOSE(TEXT(FILTER(X,ISNUMBER(X)),"dd/mm/e")))

1: Note that you can remove the nested TEXT() function and numberformat your cells to dd/mm/e too.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...