Is 'E' character special in Spark SQL - Stack Overflow


I'm trying to concatenate three columns into one using Spark SQL and the concat function. One of the columns takes the value 'E', and when it is concatenated with the other two columns, the result is unexpected. Here is an example of what happens:

select
    concat('22','E','01') as string_E,
    concat('22','D','01') as string_D 

The expected value for string_E would be '22E01', but it returns 220. However, when the value is 'D', the returned value is as expected: 22D01.

Does the single 'E' character have any special meaning in Spark? When run with 'EE' (doubled), it behaves as expected...

Apologies if this was covered somewhere already; I haven't found it.


asked Feb 4 at 14:03 by kwasny
  • 22E01 is the number 220, expressed in scientific notation. I don't know why any sort of string-to-number conversion is going on here - but that's clearly the source of the value you received. – jasonharper Commented Feb 4 at 14:16
  • Thanks - I thought so but could not articulate this precisely enough – kwasny Commented Feb 4 at 16:13
  • 1 I can't reproduce this behavior in spark sql. Is this maybe happening when you open your output in Excel? – Andrew Commented Feb 4 at 19:40
  • Andrew - I'm not using Excel for this at all. I connect to Spark using Alation Compose, and the value 220 is returned directly as the query result. That is OK now that I understand 22E01 is in fact 22 * 10^1, which is exactly 220. Thanks for all the input; it helped me greatly to understand that my issue is not an issue at all. There is no point in 'fighting' the Spark parsing engine; I will have to find another workaround. – kwasny Commented Feb 5 at 9:01
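As the comments note, the behavior can be reproduced outside Spark with any numeric parser that accepts scientific notation (a plain-Python sketch, not Spark itself):

```python
# '22E01' is valid scientific (E) notation: 22 * 10**1 = 220.
# Any tool that silently converts strings to numbers will display 220,
# while '22D01' is not parseable as a number and survives as text.
print(float("22E01"))  # 220.0

try:
    float("22D01")
except ValueError:
    print("'22D01' is not a number")
```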

2 Answers


As jasonharper said.

22E01 is simply scientific/E notation. It has a special meaning, but not specifically for Spark: as the demo below shows, Spark's concat itself returns the string unchanged, and the conversion to 220 happens wherever the result is later parsed as a number.

>>> spark.sql('''select
...     concat('22','E','01') as string_E,
...     cast(concat('22','E','01') as double) as double_E
... ''').collect()
[Row(string_E='22E01', double_E=220.0)]

If the downstream tool honors column types, cast the result to a string explicitly:

SELECT CAST(CONCAT('22', 'E', '01') AS STRING) AS string_E;

(STRING is the Spark SQL type name; dialects such as SQL Server would use VARCHAR(200) instead. Note that the SQL Server idiom CONCAT('' + '22', ...) to force string treatment does not carry over to Spark, where + on strings triggers numeric coercion.)
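If the client tool cannot be told to respect string types, another option is to convert only on request in whatever layer post-processes the query results. A minimal sketch with a hypothetical helper (not part of Spark or Alation):

```python
# Hypothetical post-processing guard: keep every fetched value as text
# and convert it to a number only when explicitly asked to.
def fetch_value(raw: str, as_number: bool = False):
    return float(raw) if as_number else raw

print(fetch_value("22E01"))                  # '22E01' - string preserved
print(fetch_value("22E01", as_number=True))  # 220.0
```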
