I'm trying to concatenate three columns into one using Spark SQL and the concat function.
One of the columns has the value 'E', and when it is concatenated with the other two columns, the result is unexpected.
Here is an example of what is happening:
select
concat('22','E','01') as string_E,
concat('22','D','01') as string_D
The expected value for string_E is '22E01', but it returns 220. When the value is 'D', however, the result is as expected: 22D01.
Does a single 'E' character have any special meaning in Spark? When run as 'EE' (doubled), it behaves as expected.
Apologies if this has already been covered somewhere; I haven't found it.
As jasonharper said, 22E01 is simply scientific (E) notation. It has a special meaning, but not only in Spark.
>>> spark.sql('''select
concat('22','E','01') as string_E,
cast(concat('22','E','01') as double) as double_E
''').collect()
[Row(string_E='22E01', double_E=220.0)]
>>>
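The PySpark output above shows that concat itself returns the string '22E01'; the 220 only appears once that string is read as a number. To see why only the single-'E' value is affected, here is a plain-Python sketch (not Spark) that tries to parse each of the question's concatenated strings as a number. Python's float() is used only as an assumed stand-in for whatever string-to-number conversion happens downstream, and the helper name as_number is hypothetical.

```python
# Try to read each concatenated value from the question as a number.
# float() here is only a stand-in for a generic string-to-number parser;
# it is not what Spark itself calls.
values = {
    "string_E": "22" + "E" + "01",    # '22E01'
    "string_D": "22" + "D" + "01",    # '22D01'
    "string_EE": "22" + "EE" + "01",  # '22EE01'
}


def as_number(text):
    """Return text parsed as a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


for name, text in values.items():
    # '22E01' means 22 x 10^1, so it parses as 220.0; the other two do not parse.
    print(name, repr(text), "->", as_number(text))
```

Only '22E01' is a valid number (22 × 10¹ = 220.0). '22D01' and '22EE01' are not valid exponent notation, which is why the 'D' and 'EE' variants come through unchanged.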
Cast the result to a string type explicitly to avoid that:
SELECT CAST(CONCAT('22', 'E', '01') AS VARCHAR(200)) AS string_E;
If that fails:
SELECT CONCAT('' + '22', '' + 'E', '' + '01') AS string_E;
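This forces SQL Server to treat everything as a string. Note that this second form relies on SQL Server, where + concatenates strings; in Spark SQL, + is numeric addition, so it does not carry over to Spark.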
22E01 is the number 220 expressed in scientific notation. I don't know why any sort of string-to-number conversion is going on here, but that's clearly the source of the value you received. – jasonharper, Feb 4 at 14:16
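Since the conversion seems to happen somewhere after concat, one quick way to find which concatenated keys are at risk is to test whether a numeric parser accepts them. This is a minimal sketch, again assuming Python's float() as a rough proxy for the downstream conversion; numeric_lookalikes, parses_as_number, and the A-Z column of codes are hypothetical illustrations, not part of any Spark API.

```python
import string


def parses_as_number(text):
    """True if float() accepts text (a rough proxy for numeric type inference)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def numeric_lookalikes(values):
    """Return the values that a numeric parser would silently turn into numbers."""
    return [v for v in values if parses_as_number(v)]


# Hypothetical column: '22' + <one uppercase letter> + '01' for every letter A-Z.
codes = ["22" + letter + "01" for letter in string.ascii_uppercase]
print(numeric_lookalikes(codes))  # ['22E01']
```

Only '22E01' is flagged: among single uppercase letters, E is the only one that turns "digits + letter + digits" into valid exponent notation. Python's float() also accepts a lowercase 'e' (e.g. '22e01'), so that case would be flagged too.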