Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
987 views
in Technique[技术] by (71.8m points)

math - Base conversion of arbitrary sized numbers (PHP)

I have a long "binary string" like the output of PHPs pack function.

How can I convert this value to base62 (0-9a-zA-Z)? The built in maths functions overflow with such long inputs, and BCmath doesn't have a base_convert function, or anything that specific. I would also need a matching "pack base62" function.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think there is a misunderstanding behind this question. Base conversion and encoding/decoding are different. The output of base64_encode(...) is not a large base64-number. It's a series of discrete base64 values, corresponding to the compression function. That is why BC Math does not work, because BC Math is concerned with single large numbers, not strings that are in reality groups of small numbers that represent binary data.

Here's an example to illustrate the difference:

base64_encode(1234) = "MTIzNA=="
base64_convert(1234) = "TS" //if the base64_convert function existed

base64 encoding breaks the input up into groups of 3 bytes (3*8 = 24 bits), then converts each sub-segment of 6 bits (2^6 = 64, hence "base64") to the corresponding base64 character (values are "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", where A = 0, / = 63).

In our example, base64_encode() treats "1234" as a string of 4 characters, not an integer (because base64_encode() does not operate on integers). Therefore it outputs "MTIzNA==", because (in US-ASCII/UTF-8/ISO-8859-1) "1234" is 00110001 00110010 00110011 00110100 in binary. This gets broken into 001100 (12 in decimal, character "M") 010011 (19 in decimal, character "T") 001000 ("I") 110011 ("z") 001101 ("N") 00. Since the last group isn't complete, it gets padded with 0's and the value is 000000 ("A"). Because everything is done by groups of 3 input characters, there are 2 groups: "123" and "4". The last group is padded with ='s to make it 3 chars long, so the whole output becomes "MTIzNA==".

converting to base64, on the other hand, takes a single integer value and converts it into a single base64 value. For our example, 1234 (decimal) is "TS" (base64), if we use the same string of base64 values as above. Working backward, and left-to-right: T = 19 (column 1), S = 18 (column 0), so (19 * 64^1) + (18 * 64^0) = 19 * 64 + 18 = 1234 (decimal). The same number can be represented as "4D2" in hexadecimal (base16): (4 * 16^2) + (D * 16^1) + (2 * 16^0) = (4 * 256) + (13 * 16) + (2 * 1) = 1234 (decimal).

Unlike encoding, which takes a string of characters and changes it, base conversion does not alter the actual number, just changes its presentation. The hexadecimal (base16) "FF" is the same number as decimal (base10) "255", which is the same number as "11111111" in binary (base2). Think of it like currency exchange, if the exchange rate never changed: $1 USD has the same value as £0.79 GBP (exchange rate as of today, but pretend it never changes).

In computing, integers are typically operated on as binary values (because it's easy to build 1-bit arithmetic units and then stack them together to make 32-bit/etc. arithmetic units). To do something as simple as "255 + 255" (decimal), the computer needs to first convert the numbers to binary ("11111111" + "11111111") and then perform the operation in the Arithmetic Logic Unit (ALU).

Almost all other uses of bases are purely for the convenience of humans (presentational) - computers display their internal value 11111111 (binary) as 255 (decimal) because humans are trained to operate on decimal numbers. The function base64_convert() doesn't exist as part of the standard PHP repertoire because it's not often useful to anyone: not many humans read base64 numbers natively. By contrast, binary 1's and 0's are sometimes useful for programmers (we can use them like on/off switches!), and hexadecimal is convenient for humans editing binary data because an entire 8-bit byte can be represented unambiguously as 00 through FF, without wasting too much space.

You may ask, "if base conversion is just for presentation, why does BC Math exist?" That's a fair question, and also exactly why I said "almost" purely for presentation: typical computers are limited to 32-bit or 64-bit wide numbers, which are usually plenty big enough. Sometimes you need to operate on really, really big numbers (RSA moduli for example), which don't fit in those registers. BC Math solves this problem by acting as an abstraction layer: it converts huge numbers into long strings of text. When it's time to do some operation, BC Math painstakingly breaks the long strings of text up into small chunks which the computer can handle. It's much, much slower than native operations, but it can handle arbitrary-sized numbers.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...