13

I am loading a file into variables. The problem is that the file appears to have been formatted on Windows, so I get a ^M at the end of each line.

How do I remove it once the value is in a variable? I know I can fix the source in vi (I use OS X, by the way), but I can only read the original file, not modify it, so I have to remove the ^M from the variable.

From my understanding, \n is not the same as ^M, so the tr command won't work.

EDIT

It seems the question was not clear, so here is a clarification.

I parse the file line by line. Each line has two values separated by a tab, and there is a ^M at the end of each line. It looks like this:

    value1 value2^M
    value3 value4^M
    value5 value6^M
    value7 value8^M

My workflow is straightforward: the text file contains what you see above, and the loop splits each line into fields and reads the values. When I print the second value it has the ^M, which I would like to remove.

    while IFS=$'\t' read -r -a line
    do
        Type1="${line[0]}"
        Type2="${line[1]}"
    done < $TXTFILE

This means that when I print Type1 it is fine, but the Type2 variable contains the ^M. I tried tr and it didn't work; I also tried sed to remove the last character of the variable, and that didn't work either. I hope this clarifies my question. Thanks.

5
  • 1
    how are you reading the file into variable? might be possible to remove it during that... for ex: use sed 's|\r||' file instead of file Commented Nov 26, 2016 at 2:39
  • I was going to answer with tr but the question is way too broad. We don't know what the input or output are, nor what the script looks like. Commented Nov 26, 2016 at 3:17
  • I added more info to the question. I load the file with a standard loop in bash, changing IFS to tab, so I can split the line in 2 parts and get the 2 variables. I tried tr but it didn't work, same for sed, which is why I went ahead and posted the question here. Thanks Commented Nov 26, 2016 at 5:54
  • I find it strange that you accepted an unnecessarily complex solution. Did you try mine? Commented Nov 27, 2016 at 7:24
  • Also see Why is using a shell loop to process text considered bad practice? Commented Nov 27, 2016 at 16:31

4 Answers

11

^M is a carriage return (CR), which can be specified as \r for tr or within $'…'. \n specifies a line feed (LF), which is ^J. A Unix line ending is LF, while a Windows line ending is the two-character sequence CR-LF, so Windows text files viewed on a Unix system such as Linux or macOS look like they have ^M at the end of each line (except on the last line, if it is missing its final newline).

You can remove carriage returns from a file with tr:
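As a quick check of the mapping described above, here is a minimal sketch that dumps the bytes bash produces for $'\r' and $'\n'. The variable names are illustrative only and do not come from the question.

```shell
#!/usr/bin/env bash
# Illustrative variable names (not from the question).
# od -An -t x1 prints each byte in hex; tr removes od's padding.
cr_hex=$(printf '%s' $'\r' | od -An -t x1 | tr -d '[:space:]')
lf_hex=$(printf '%s' $'\n' | od -An -t x1 | tr -d '[:space:]')

printf 'CR (^M, \\r) = 0x%s\n' "$cr_hex"
printf 'LF (^J, \\n) = 0x%s\n' "$lf_hex"
```

The caret notation follows from the ASCII table: M is 0x4d and J is 0x4a, and clearing the 0x40 bit gives 0x0d and 0x0a respectively.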

tr -d '\r' <somefile.txt >somefile.txt.new && mv somefile.txt.new somefile.txt 

or more simply with dos2unix.

To avoid modifying the files, you can check each line when you read it and strip CR at the end of a line. For example, if you're using read to parse tab-separated values, then strip CR at the end of the last field. The parameter expansion ${VAR%$'\r'} yields the value of VAR minus a trailing CR, and yields the value of VAR if it doesn't end with CR.

    while IFS=$'\t' read -r -a line
    do
        line[$((${#line[@]}-1))]="${line[$((${#line[@]}-1))]%$'\r'}"
        Type1="${line[0]}"
        Type2="${line[1]}"
    done < "$TXTFILE"
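To see the parameter expansion in isolation, here is a small sketch using two made-up values, one ending in CR and one not. The variable names and sample values are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample values: one field with a trailing CR, one without.
with_cr=$'value2\r'
without_cr='value4'

# ${var%$'\r'} removes a single trailing CR if there is one,
# and returns the value unchanged otherwise.
stripped_a="${with_cr%$'\r'}"
stripped_b="${without_cr%$'\r'}"

# The CR is invisible when printed, so compare string lengths instead.
printf 'with CR:    %s -> %s characters\n' "${#with_cr}" "${#stripped_a}"
printf 'without CR: %s -> %s characters\n' "${#without_cr}" "${#stripped_b}"
```

Because the expansion is a no-op when there is no trailing CR, the same loop should keep working if the input file is later converted to Unix line endings.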
3
  • Thanks for clarifying that \r is in fact the same as ^M; the solution works Commented Nov 27, 2016 at 5:25
  • Note that OS/X comes with a very old version of bash, so I don't expect it would support line[-1] for which you need bash-4.3 or above. It comes with zsh that supports it (and has for decades), but note that in zsh, the first element is $line[1], not $line[0] (unless in ksh emulation). With older bash, you can always use line[${#line[@]}-1] Commented Nov 27, 2016 at 10:59
  • +1 for the background explanation, but I think jiliagre's solution is much better, and much simpler. Commented Nov 27, 2016 at 16:33
8

Here is the simplest way to fix your script: add "carriage return" as an internal field separator for the read command:

    while IFS=$'\t\r' read -r -a line
    do
        Type1="${line[0]}"
        Type2="${line[1]}"
    done < $TXTFILE
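For anyone who wants to try this without the original file, here is a self-contained sketch that builds a temporary CR-LF file and runs the same loop over it. The temporary file, its contents, and the second_fields array are assumptions made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two tab-separated lines with CR-LF endings.
sample_file=$(mktemp)
printf 'value1\tvalue2\r\nvalue3\tvalue4\r\n' > "$sample_file"

second_fields=()
# With CR in IFS, the trailing CR is treated as a field delimiter
# and never becomes part of the last field.
while IFS=$'\t\r' read -r -a line
do
    second_fields+=("${line[1]}")
done < "$sample_file"
rm -f "$sample_file"

# od -c would show \r explicitly if a CR had survived.
printf '%s\n' "${second_fields[@]}" | od -c
```

One thing to keep in mind with this approach: a CR anywhere in the line is treated as a delimiter, not only one at the end.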
2
  • 1
    @jiliagre, yes, you're right, it's taken as a delimiter there. That would be different in zsh where $IFS is taken as a separator. Commented Nov 27, 2016 at 11:13
  • 2
    @StéphaneChazelas Thanks, you are too obsessed with zsh ;-) Commented Nov 27, 2016 at 11:16
7

Use (for short strings):

${var//$'\015'} 

Example:

    $ var=$'This is a test of a CR (\r) character'
    $ echo "${var//$'\r'}"
    This is a test of a CR () character

For longer strings you may need sed or awk.
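As one possible external-tool variant, here is a sketch that pipes the variable through tr (used here in place of sed or awk, since it is enough for deleting a single character). The variable names and sample text are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical multi-line value with CR-LF line endings.
var=$'first line\r\nsecond line\r\n'

# tr -d '\r' deletes every CR. Note that $(...) also strips
# trailing newlines from the captured output.
clean=$(printf '%s' "$var" | tr -d '\r')

printf '%s\n' "$clean"
```

This spawns an extra process per call, so inside a tight loop the pure parameter expansion is probably the cheaper option.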

0

A more generally useful way to convert the content of "DOS" files, which have no other content marker than their CR+LF line endings (in contrast to Linux's LF only), is dos2unix.

For Ubuntu, first and once only, do

 sudo apt install dos2unix 

then use it as indicated below, here with od used to verify the output

    $ dos2unix < $TXTFILE | od -t x1z -w17
    0000000 76 61 6c 75 65 31 20 20 20 20 76 61 6c 75 65 32 0a  >value1    value2.<
    0000021 76 61 6c 75 65 33 20 20 20 20 76 61 6c 75 65 34 0a  >value3    value4.<
    0000042 76 61 6c 75 65 35 20 20 20 20 76 61 6c 75 65 36 0a  >value5    value6.<
    0000063 76 61 6c 75 65 37 20 20 20 20 76 61 6c 75 65 38 0a  >value7    value8.<
    0000104
    $ cat $TXTFILE | od -t x1z -w18
    0000000 76 61 6c 75 65 31 20 20 20 20 76 61 6c 75 65 32 0d 0a  >value1    value2..<
    0000022 76 61 6c 75 65 33 20 20 20 20 76 61 6c 75 65 34 0d 0a  >value3    value4..<
    0000044 76 61 6c 75 65 35 20 20 20 20 76 61 6c 75 65 36 0d 0a  >value5    value6..<
    0000066 76 61 6c 75 65 37 20 20 20 20 76 61 6c 75 65 38 0d 0a  >value7    value8..<
    0000110

This will translate not only the line endings but also other special characters, depending on the parameters to dos2unix or its counterpart unix2dos (which gets installed at the same time).
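Since the conversion works on standard input, it can feed the question's loop directly without touching the file on disk. The following is only a sketch under assumptions: the strip_cr helper, the temporary file, and the types array are hypothetical names, and the helper falls back to tr -d '\r' when dos2unix is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two tab-separated lines with CR-LF endings.
sample_file=$(mktemp)
printf 'value1\tvalue2\r\nvalue3\tvalue4\r\n' > "$sample_file"

# Hypothetical helper: filters stdin to stdout, using dos2unix when
# it is available and plain tr otherwise.
strip_cr() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix
    else
        tr -d '\r'
    fi
}

types=()
# Process substitution keeps the loop in the current shell,
# so the array is still populated after the loop ends.
while IFS=$'\t' read -r -a line
do
    types+=("${line[1]}")
done < <(strip_cr < "$sample_file")
rm -f "$sample_file"

printf '%s\n' "${types[@]}"
```

The original file is only ever opened for reading here, which matches the constraint stated in the question.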

4
  • isn't dos2unix there by default? Commented Nov 27, 2016 at 4:51
  • I clearly specified that I know how to do it via file modification; plus I am not using Linux but OS X, so to install dos2unix I have to use brew Commented Nov 27, 2016 at 5:23
  • dos2unix isn't FIXATED to using FILE-modification, it is "a filter" and can be used in pipes; just like tr. It should also be preferred over tr as it handles charsets on a higher level, not only single byte codes. Commented Nov 27, 2016 at 9:10
  • @Lưu Vĩnh Phúc, I'm using Ubuntu 16.04 and have a quite fresh install, and I had to install it. Commented Nov 27, 2016 at 9:11
